US20130073480A1 - Real time cross correlation of intensity and sentiment from social media messages - Google Patents

Info

Publication number
US20130073480A1
Authority
US
United States
Prior art keywords
sentiment
time series
social media
series
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/427,833
Inventor
Gautham Sastri
Lionel Alberti
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ISENTIUM LLC
Original Assignee
ISENTIUM TECHNOLOGIES Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed litigation Critical https://patents.darts-ip.com/?family=46878142&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=US20130073480(A1) "Global patent litigation dataset” by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.
Application filed by ISENTIUM TECHNOLOGIES Inc
Priority to US13/427,833
Assigned to ISENTIUM TECHNOLOGIES, INC. Assignment of assignors interest (see document for details). Assignors: SASTRI, GAUTHAM; ALBERTI, LIONEL
Publication of US20130073480A1
Assigned to ISENTIUM, LLC. Assignment of assignors interest (see document for details). Assignors: ISENTIUM TECHNOLOGIES INC.
Legal status: Abandoned (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 Asset management; Financial planning or analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Definitions

  • The present invention relates to a method and system using social media for real-time, event-driven trading of equities, commodities and other traded assets.
  • Sentiment analysis applies various analytical techniques in identifying subjective information from different information sources. Sentiment analysis, therefore, attempts to ascertain the feelings, thoughts, attitude, opinion, etc. of a speaker or a writer with respect to a topic.
  • the first approach in particular, a so called “bag of words” approach, attempts to apply a positive/negative document classifier based on occurrence frequencies of the various words in a document. Applying this approach various learning methods can be used to select or weight different parts of the text used in the classification process.
  • This approach fails to process the sentiment with respect to assets (for example, equities or commodities) in short digital messages such as tweets sent via the online social networking service Twitter.
  • Semantic orientation automatically classifies words into two classes, “good” and “bad”, and then computes an overall good/bad score for the text.
  • This method does not take into consideration the sentiment conveyed by parts of speech other than adjectives, including verbs, for example, to bounce, to crash, nouns, for example, a put, a call, and phrases, for example, ascending triangle, black Friday, head-and-shoulders.
  • It is an object of the present invention to provide a method for finding patterns in a target real-valued time series by utilizing sentiment and frequency derived from a stream of social media messages, wherein the target represents a quantifiable property of an asset being tracked.
  • The method includes identifying a target, which is a sampled real-valued time series; generating a sentiment time series, $s_s$, relating to an asset; generating a frequency time series, $s_f$, relating to an asset; and determining a pattern based upon the sentiment time series and the frequency time series.
  • It is another object of the present invention to provide a method wherein the interpolation method $I$ is a function of a time series $s_s$ and of a time $t$ that is $C^1$-piecewise continuous with respect to $t$, and such that if there exists a point $(t, v)$ in $s_s$, then $I(s_s, t) = v$.
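  • As a minimal sketch of one interpolation method that satisfies the stated property, the Python function below uses piecewise-linear interpolation, which is continuous, piecewise C^1 in t, and returns v whenever (t, v) is a sample of the series. The function name `interpolate`, the list-of-pairs representation, and the clamping behaviour outside the sampled range are assumptions made for illustration; the source does not prescribe a particular interpolation scheme.

```python
from bisect import bisect_right


def interpolate(series, t):
    """Piecewise-linear I(s, t).

    `series` is a list of (time, value) pairs sorted by strictly increasing time.
    Returns the sampled value exactly when t coincides with a sample time.
    """
    if not series:
        raise ValueError("empty time series")
    times = [point[0] for point in series]
    # Assumption: clamp to the end values outside the sampled range.
    if t <= times[0]:
        return series[0][1]
    if t >= times[-1]:
        return series[-1][1]
    i = bisect_right(times, t)  # first index whose time is strictly greater than t
    (t0, v0), (t1, v1) = series[i - 1], series[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```

  • Such a function lets a sentiment series and a frequency series sampled at irregular message times be evaluated on the same time grid as the target series.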
  • FIG. 1 is a schematic overview of the present system.
  • FIG. 2 is a representation of the graphical user interface in accordance with the present invention.
  • FIG. 3 is a partial view of the reaction indicator.
  • FIG. 4 is a graphical depiction showing the correlation of frequency and sentiment.
  • FIG. 5 is a screen shot showing the ingest and processing of various assets.
  • FIGS. 6A and 6B are screen shots when a moving spherical graphic object is clicked in the graphical user interface.
  • FIG. 7 is a screen shot showing various moving spherical graphic objects shrinking and growing based on social media intensity thereof.
  • a method and system using social media for event-driven trading are disclosed.
  • the present method and system 10 use social media for the real-time evaluation of publicly traded assets, in particular, equities and commodities, using information generated through social media interactions.
  • equities social comments transmitted using the social networking service Twitter
  • an “asset” is considered to be a resource with economic value that an individual, corporation or country owns or controls with the expectation that it will provide future benefit.
  • Assets include, but are not limited to, investments in equities, options, derivatives, commodities, bonds, futures, currencies, etc. It should further be appreciated that “equities” are stocks or any other securities representing an ownership interest.
  • the present method and system 10 with reference to the stock market, although the application of the present invention could be extended to commodities and other asset based markets.
  • the present system 10 is able to effectively predict swings in asset prices for effective and profitable trading thereof.
  • the present method and system 10 provide a sentiment calculator 22 that employs natural language processing in evaluating social media interactions by anticipating the sentiment of traders relating to specific equities and commodities in terms of the polarity of the sentiment and the strength of the sentiment.
  • the data generated by the sentiment calculator 22 is applied to a reaction indicator 31 in the form of a graphical user interface 30 that combines sentiment and frequency (which is indicative of the intensity of the sentiment) data relating to the assets.
  • sentiment and frequency are fully appreciated, the present system 10 and method provide a mechanism for cross-correlating the sentiment and intensity data (the perceived strength of the sentiment being expressed by the social media) with the actual fluctuations occurring with the price of assets.
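  • The cross-correlation of a social media signal with price fluctuations can be illustrated with a lagged Pearson correlation. The sketch below is an assumption about how such a comparison might be computed, not the method disclosed in the source: the function names, the use of Pearson's r, and the convention that a positive lag means the signal leads the target are all hypothetical choices. Both series are assumed to be sampled on the same regular time grid and to have equal length.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(vx * vy)


def cross_correlation(signal, target, max_lag):
    """Return {lag: r} for lag = 0..max_lag; lag k pairs signal[i] with target[i + k]."""
    scores = {}
    for lag in range(max_lag + 1):
        lead = signal[: len(signal) - lag]
        follow = target[lag:]
        scores[lag] = pearson(lead, follow)
    return scores


def best_lag(signal, target, max_lag):
    """Lag with the largest absolute correlation."""
    scores = cross_correlation(signal, target, max_lag)
    return max(scores, key=lambda lag: abs(scores[lag]))
```

  • In this sketch, `signal` could be a sentiment or frequency series and `target` a price series; a strong correlation at a positive lag would suggest that the social media signal moved before the price did.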
  • the present system 10 provides for the processing of social media messages generating data for the real-time evaluation of publicly traded assets, for example, stocks.
  • the system 10 includes an ingest component 11 for ingesting the social media messages; a filter module 14 eliminating expressions not considered useful language from social media messages; a natural language processor (NLP) 16 processing filtered social media messages; a sentiment calculator 22 applying rules to the filtered and NLP processed social media messages so as to compute a representation of values associated with the filtered and NLP processed social media messages; and a graphical user interface 30 displaying the values generated by the sentiment calculator 22 .
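  • The flow from ingest through filtering, natural language processing, sentiment calculation and display can be pictured as a chain of stages. The following Python skeleton is only a structural sketch: `run_pipeline` and its four callable parameters are hypothetical names, and the actual behaviour of each stage is whatever the corresponding module of the system 10 provides.

```python
def run_pipeline(messages, filter_fn, nlp_fn, sentiment_fn, display_fn):
    """Pass each ingested message through filter -> NLP -> sentiment -> display.

    All four stage functions are supplied by the caller; this skeleton only
    fixes the order in which they are applied.
    """
    results = []
    for raw in messages:
        cleaned = filter_fn(raw)       # stands in for filter module 14
        if not cleaned:
            continue                   # nothing left after noise removal
        parsed = nlp_fn(cleaned)       # stands in for NLP module 16
        value = sentiment_fn(parsed)   # stands in for sentiment calculator 22
        display_fn(raw, value)         # stands in for graphical user interface 30
        results.append(value)
    return results
```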
  • the ingest component 11 consumes, acquires or gathers a wide range of social media messages 12 and immediately filters the messages as will be explained below in greater detail.
  • the ingest component 11 is a data acquisition module.
  • the ingest component 11 allows the system 10 to automatically import raw social media messages, for example, tweets from Twitter or other social media sites.
  • the data that is, the raw social media messages, is acquired on the basis of a predefined set of keywords or combination of keywords the system 10 has been programmed to look for.
  • the filtered social media messages are then subjected to natural language processing via NLP module 16 based upon lexical databases 18 , 20 of both stock specific sentiment terminology (Stock-Lex 18 ) and general, non-stock specific, sentiment terminology (Sent-Lex 20 ).
  • the filtered and NLP processed social media messages are next processed by the sentiment calculator 22 and inference engine 24 .
  • the sentiment calculator 22 and inference engine 24 apply information from databases 26 , 28 respectively relating to the knowledge of the stock market world and the knowledge of the world.
  • the results of the sentiment calculator 22 and inference engine 24 are then presented to the user via a reaction indicator 31 in the form of a graphical user interface upon a computer monitor which displays sentiment per asset information.
  • sentiment calculation is part of the present system 10 for event-driven trading using social media messages.
  • the system 10 ingests content (that is, social media messages such as tweets) from one or multiple social media sources based on user-specified criteria.
  • the meaning of the information conveyed by the social media messages is determined using a natural language processing (NLP) module 16 .
  • the system 10 calculates “sentiment” and presents metrics relating thereto in real-time.
  • FIG. 5 shows social media messages, for example, “tweets”, with annotations relating to the sentiment scoring for the individual tweets.
  • sentiment calculations in accordance with the present invention may be used to anticipate the reaction of the traders before they act.
  • the sentiment calculator 22 of the present system 10 analyzes social media messages to calculate the sentiment with respect to events pertaining to objects.
  • Objects relate to assets being traded, via situations having a bearing on public sentiment and relating the value of the asset being traded (preferably on an exchange).
  • object(s) refers to anything related to an asset that can be publicly traded and monitored.
  • an “iphone” and the stock symbol “AAPL” are objects which relate to the asset Apple Inc., which can be publicly traded.
  • the sentiment calculator 22 represents one module of the present multilayered system 10 for processing short and noisy messages such as tweets, as depicted in the schematic shown in FIG. 1 .
  • Proper operation of the sentiment calculator 22, that is, sentiment calculation, requires that a filter module 14 configure input text into formats for use by the subsequent processing modules of the pipeline making up the present invention.
  • the filter module 14 is composed of a set of rules (using regular expressions) created to transform the ingested social media messages into expressions without noise. Noise is considered to be elements in the message which are not part of natural language, such as hash tags, URLs, etc. Therefore, the filter module 14 functions to bring tweets as close as possible to expressions in natural language by eliminating expressions that are not considered part of current language usage.
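  • A minimal sketch of regular-expression noise removal of this kind is shown below in Python. The specific patterns, the name `filter_message`, and the choice to strip hash tags only at the start and end of a message are assumptions for illustration; the actual rule set of the filter module 14 is not reproduced in the source.

```python
import re

# Hypothetical patterns; the source only states that regular expressions are used.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
LEADING_TAGS_RE = re.compile(r"^(?:\s*#\w+)+\s*")
TRAILING_TAGS_RE = re.compile(r"(?:\s*#\w+)+\s*$")


def filter_message(text):
    """Remove URLs and peripheral hash tags, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = LEADING_TAGS_RE.sub("", text)
    text = TRAILING_TAGS_RE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()
```

  • Hash tags inside the sentence are left in place in this sketch, because they may function as ordinary words of the message.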
  • the filter module 14 eliminates URLs and hash tags at the periphery of the tweet, normalizes symbols and abbreviations, for example “q1
  • Sentiment calculations in accordance with the present invention require that a Part of Speech (POS) Tagger 33 assign lexical categories to each of the filtered social media messages as they are broken into a stream of text, words, phrases, symbols, or other meaningful elements called tokens, that is, tokenized messages. Sentiment calculations in accordance with the present invention also require that a partial parser (PARS) 32 recover the structure of the main constituents/syntactic structures (a lemmatizer deriving the canonical form of lexical items, that is, a single or a group of words conveying a single meaning, to enable the lexical lookups) of the filtered social media messages.
  • MORPH 34 is a lemmatizer which reduces the spelling of a word to its lexical root, or base (lemma) form.
  • the base form for a verb is the simple infinitive.
  • the base form for a noun is the singular form.
  • the plural “mice” is a form of the lemma “mouse.”
  • MORPH 34 uses a list of numerous such rules to reduce an ingested and non-filtered word to its base form.
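  • The idea of a rule list that maps inflected forms to a base form can be sketched as follows. This Python example is deliberately naive and is not the rule set of MORPH 34 or of any existing lemmatizer: the irregular-form table, the suffix rules and the minimum stem length are all hypothetical, and real English morphology needs many more rules (for example, this sketch would reduce “plunged” to “plung”).

```python
# Hypothetical irregular forms, checked before any suffix rule.
IRREGULAR = {"mice": "mouse", "fell": "fall", "rose": "rise"}

# Hypothetical (suffix, replacement) rules, tried in order.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]


def lemmatize(word):
    """Reduce a word to an approximate base form using a small rule list."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    for suffix, replacement in SUFFIX_RULES:
        # Require a stem of at least 3 letters so short words are left alone.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: len(w) - len(suffix)] + replacement
    return w
```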
  • “MorphAdorner: A Java Library for the Morphological Adornment of English Language Texts”, Version 1.0, Apr. 30, 2009, Copyright © 2007, 2009 by Northwestern University, is incorporated herein by reference.
  • the data composed of the filtered and NLP processed social media messages is supplied to the sentiment calculator 22 that calculates sentiment compositionally in the syntactic context.
  • the process of sentiment calculation also employs an inference engine 24 that fine-tunes sentiment calculations using knowledge of the world. This process for sentiment calculation enables sentiment to be calculated on the basis of a set of rules deriving the polarity of stock events and their strength.
  • The fact that tweets are constrained to 140 characters means that messages sent via Twitter begin to resemble programming languages such as Fortran (which originally had a constraint of 72 characters per line).
  • the primary effect of this constraint is a limitation on the freedom available to the author of a tweet as he or she attempts to convey a specific message.
  • compiling tweets (akin to compiling a programming language) and achieving very high levels of accuracy in deriving sentiment whilst minimizing resource consumption and interpretation times. It therefore becomes feasible to ingest and process potentially millions of messages per hour using Common Off The Shelf (COTS) computers.
  • the technical advantage of the present system 10 relative to other known technologies is that the present system 10 is based on natural language processing techniques rather than machine learning techniques (for example, Naive Bayes, maximum entropy classification, and support vector machines), as described, for example, in Bo Pang and Lillian Lee (2002), “Thumbs up? Sentiment Classification using Machine Learning Techniques,” Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), 79-86.
  • the rule-based method that is used in accordance with the present invention avoids the shortcomings of the statistical method because it processes the social media messages directly instead of classifying social media messages on the basis of probabilistic algorithms.
  • Another advantage lies in the innovative contribution of the inference engine 24 , which contributes to reduce uncertainty and brings further support to decision making.
  • classification algorithms such as Naïve Bayes classifiers
  • Naïve Bayes classifiers are not used in the implementation of the present invention.
  • the present invention solves the problem by processing the actual content of the social media messages as they are formulated. It does not calculate the number of positive adjectives in a social media message, or in a set of social media messages, to compute sentiment, contrary to common practice.
  • the sentiment calculator 22 is a module of the multilayered architecture employed in accordance with the present system 10 , which can be customized for different domains, including, for example, finance, security and pharmaceutics. In the disclosed application in the stock market exchange, the sentiment calculator 22 calculates sentiments from stock exchange-related social media messages 12 in order to predict stock movements before human traders can act.
  • the innovation brought about by the sentiment calculation in accordance with the present invention is the event driven approach to sentiment mining.
  • Unstructured incoming social media messages 12 are processed in order to extract sentiment about pre-specified assets, as they participate in ongoing events.
  • the sentiment calculator 22 performs event-driven sentiment calculus.
  • The event-driven approach to sentiment mining as applied in accordance with the present invention can be represented in accordance with equation (1), where M stands for Modifier, Ev stands for Event, and x, ..., z stand for the participants of the event.
  • the asset the sentiment is about is a participant of the event.
  • This relational approach to sentiment mining contrasts with the statistical keyword search approach, classifying messages on the basis of the number of positive or negative qualifiers.
  • the statistical keyword search approach fails to provide sentiment-per-asset values.
  • the present invention takes an event to be a change in the relation between the participants of the event.
  • the participants of an event are: names, organizations, locations, expressions of time, quantities, monetary values, percentages, etc.
  • the present system 10 includes name entity recognition capacities and syntax-semantic capacities to provide the articulation of events and their participants.
  • the interpretation of syntactic structure is generally compositional: that is, the interpretation of the whole is a function of the interpretation of the parts. However, part of the semantics conveyed by natural language is non-compositional and idiosyncratic. The idiosyncratic meanings are listed in lexicons, assuming both generic (Sent-Lex 20) and domain-specific (Stock-Lex 18) lexicons.
  • “sentiment” about an asset participating in an event is considered in accordance with the present invention to be the orientation (that is, the polarity in opinions expressed regarding the asset) and the strength of the opinions on that asset that deviates from the normal state.
  • a sentiment is the expression of a psychological state relative to an event (whether that event be static or dynamic).
  • Because social media messages sent via the social networking site Twitter are limited to 140 characters, the lexical items, emoticons and other diacritics found in such messages cannot express the richness of thought and sentiment conveyed by traditional written natural language without further processing.
  • the present system and methodology focus on the properties of natural language employed in the social media messages to calculate the sentiment with respect to given objects in ongoing stock events referred to in social media exchanges.
  • sentiment is represented by an integer combining a polarity value (positive +, negative −, neutral n) and a strength value ranging within a pre-defined scale.
  • Using data generated by the filtering and natural language processing of the social media messages, the sentiment calculator 22 yields an integer that combines the polarity and the strength values of each pair of expressions relating an asset to an event, as explained below in greater detail.
  • Polarity is a value (that is, positive, negative or neutral) that is part of the lexical specification of words and phrases. These values will compose according to the Polarity rules, provided below.
  • Strength is an integer that is also part of the lexical specification of the words and phrases. The values for strength in accordance with a preferred embodiment of the present invention range from 1 to 3 (1 is low and 3 is high). These numbers will be added in the processing of messages according to the Strength rules, provided below. However, it is appreciated that values for strength could range from 1 to 5 or higher.
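  • One simple way to combine a polarity and a strength into a single integer is a signed value, sketched below in Python. This encoding is an assumption made for illustration: the source states only that the sentiment integer combines the two values, not how. The names `encode_sentiment` and `decode_sentiment` are hypothetical, ASCII "-" stands in for the negative polarity symbol, and in this particular encoding a neutral item always maps to 0, so its strength is not preserved.

```python
# Hypothetical mapping from polarity symbols to signs.
POLARITY_SIGN = {"+": 1, "-": -1, "n": 0}


def encode_sentiment(polarity, strength, max_strength=3):
    """Combine polarity ('+', '-', 'n') and strength (1..max_strength) into one integer."""
    if polarity not in POLARITY_SIGN:
        raise ValueError(f"unknown polarity: {polarity!r}")
    if not 1 <= strength <= max_strength:
        raise ValueError(f"strength must be between 1 and {max_strength}")
    return POLARITY_SIGN[polarity] * strength


def decode_sentiment(value):
    """Split a signed sentiment integer back into (polarity, strength)."""
    if value > 0:
        return "+", value
    if value < 0:
        return "-", -value
    return "n", 0
```

  • The `max_strength` parameter reflects the remark that the strength scale could run to 5 or higher rather than 3.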
  • the sentiment calculator 22 is embedded as part of the present overall system 10 that ingests social media messages from multiple social media sources based on user-specified criteria.
  • the social media messages go through a filtering layer/module 14 purging the messages of noise (URLs, hashtags, etc.).
  • the results of the filtering of social media messages are tokens which are assigned part of speech tags according to the lexical and contextual properties of the lexical items based upon NLP module 16 .
  • a parser 32 then recovers the major constituents/syntactic structures of the tokenized messages.
  • the sentiment calculator 22 takes the annotated partial parses as its input and yields a sentiment-per-asset on the basis of the sentiment values of the lexical items and the sentiment logic, calculating the sentiment of constituents on the basis of the sentiment values of their parts.
  • the sentiment calculator 22 interacts with the inference engine 24 to determine the sentiment with respect to the knowledge of the world.
  • the sentiment calculator 22 derives sentiment in terms of polarity and strength with respect to objects (for example, assets as referenced by tickers and commodity names) as they participate in ongoing stock events described by the ingested social media messages, for example, tweets, 12.
  • the generic representation in equation (1) as noted above can thus be instantiated by equation (2) for this application.
  • stock-market specific lexical items and phrases are qualified in the Stock-Lex 18 , and the sentiment calculator 22 applies to pairs of sentiment-marked lexical items compositionally in their syntactic configuration.
  • the sentiment calculator 22 is a module of the pipeline making up the present system 10 .
  • the components of this system 10 process incoming social media messages, and yield a sentiment-per-object/asset for each ingested incoming social media message in real-time.
  • the sentiment calculator 22 calculates the sentiment-per-asset for each incoming social media message ingested by the system 10 .
  • FIG. 1 represents the three main components of the system: Ingest 11 (the social media messages 12), Process 15 (the social media messages using the filter 14, NLP 16 and sentiment calculator 22), and Display 30 (the results of the processing step on a reaction indicator 31 in the form of a graphical user interface 30). It also identifies the specific NLP components/modules (POS 33, PARS 32, MORPH 34, Stock-Lex 18 and Sent-Lex 20) processing social media messages 12 from the ingest component 11 to the sentiment calculator 22 and its interaction with the inference engine 24 (which includes databases relating to Knowledge of the Stock Market world 23 and Knowledge of the world 25).
  • the architecture of the system 10 is shown in FIG. 1 .
  • the following explains the main features of each component of this architecture, where the lexicon, the part of speech tagger (POS) 33 and the parser (PARS) 32 can be parameterized to process different languages.
  • Simplex and complex keywords are used for ingesting the social media messages 12 , according to the requirements of the stock traders.
  • the hardware used for ingest are standard off the shelf computers gathering and processing social media messages using the pre-determined keywords.
  • the techniques used for collecting social media messages must take into consideration the requirements of stock traders (see Section 1.1) and must also enable the collection of social media messages with respect to specific assets, as described in Section 1.2.
  • the set of keywords for specific assets is defined in terms of generic categories that can be parameterized according to the finance domain.
  • different strategies for ingesting a large number of relevant social media messages are used. For example, the following strategies may be employed:
  • strategy (2) is used for Crude Oil, where only one keyword is used, “oil”, and a very large exclude list includes expressions such as “Soya oil”, “olive oil”, etc.
  • strategy (3) is preferred.
  • Strategy (3) employs a large include list made up of both unary and binary expressions including the word “gold”, the object “X”, and another word, a predicate, as in “gold industry”, “gold news”, “gold investor”, “gold invest”, “gold investment”, “gold plunge”, “gold raise”, “gold plunged”, “gold raised”, “gold plunging”, “gold raising”, “gold decline”, “gold declined”, “gold declining”, “gold rally”, “gold rallied”, “gold rallying”, “gold fall”, “gold falling”, “gold fell”, etc.
  • a large exclude list is still necessary to exclude for example jewelry items and colors.
  • Strategy (3) can be used for other commodities by substituting the names of other commodities for the variable in (3) and keeping the set of predicates constant. Thus, a very similar set of keywords may apply to other commodities.
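  • The include-list and exclude-list strategies described above can be sketched as a small matcher. The Python below is illustrative only: `make_matcher` and `commodity_includes` are hypothetical names, whole-word matching via regular expressions is an assumption, and the example lists used with it are abbreviated stand-ins for the much larger lists the text describes.

```python
import re


def _compile(phrases):
    """Compile each phrase as a case-insensitive whole-word pattern."""
    return [re.compile(r"\b" + re.escape(p) + r"\b", re.IGNORECASE) for p in phrases]


def make_matcher(include, exclude):
    """Build a predicate that accepts a message if it hits an include phrase
    and hits no exclude phrase."""
    include_res, exclude_res = _compile(include), _compile(exclude)

    def matches(message):
        if any(rx.search(message) for rx in exclude_res):
            return False
        return any(rx.search(message) for rx in include_res)

    return matches


def commodity_includes(commodity, predicates):
    """Generate binary include phrases by pairing a commodity name with each predicate."""
    return [f"{commodity} {predicate}" for predicate in predicates]
```

  • For instance, `make_matcher(["oil"], ["olive oil", "soya oil"])` mirrors the single-keyword strategy with an exclude list, while `commodity_includes("gold", ["rally", "plunge"])` mirrors the reuse of one predicate set across commodities.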
  • This technique using refined keyword strategy is used in conjunction with the methods described above to come up with a sufficiently large number of social media messages, and a high degree of correlation between derived sentiment and price movement, thereby meeting the two requirements of sentiment-price correlation and sufficient volume.
  • the refined keyword strategy for ingesting relevant social media messages increases the volume of ingested social media messages that will be fed into the other components of the system, described in the following paragraphs.
  • the ingested social media messages may include messages in a language other than English.
  • a language identifier/detector 19 is therefore employed to identify the language of an incoming message and assign it a code.
  • the ingested social media message (4) will be assigned the code (5), which stands for English.
  • Language identification is a prerequisite for the NLP processing in accordance with the present invention, as the overt syntactic properties vary between languages, as well as the form and content of the lexical items. It is thus necessary to ensure that the social media messages processed by the NLP module 16 will be English messages, or whatever language the system 10 was parameterized for.
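  • A toy illustration of language identification as a gate in front of the NLP stage is given below. It is a sketch under strong assumptions and not the detector 19 of the system: the stop-word heuristic, the tiny word lists, the two-letter codes "en" and "fr", and the fallback code "und" are all chosen here purely for illustration.

```python
# Hypothetical, deliberately tiny stop-word lists keyed by language code.
STOPWORDS = {
    "en": {"the", "is", "and", "of", "to", "in", "on", "for"},
    "fr": {"le", "la", "et", "les", "des", "est", "en", "pour"},
}


def identify_language(message, default="und"):
    """Return the code of the language whose stop words occur most often."""
    tokens = message.lower().split()
    scores = {
        code: sum(token in words for token in tokens)
        for code, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def keep_language(messages, code="en"):
    """Keep only the messages identified as the language the system is parameterized for."""
    return [m for m in messages if identify_language(m) == code]
```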
  • the filter module 14 is a pre-NLP processing module that brings social media messages 12 as close as possible to expressions in natural language by eliminating expressions that are not part of current use of language. For example, the filter module 14 eliminates URLs and hash tags at the periphery of the tweet, normalizes symbols and abbreviations, for example q1
  • the filter module 14 also performs sentence detection on the basis of typographic cues. This is a necessary step in the pre-NLP processing, since social media messages may include more than one sentence, see (6). As the NLP processing and the sentiment calculus are sentence bound, sentence boundary delimitation is necessary. For example, when applied to (7), the filter module 14 replaces the URL with a period, converts capitals into lower case, and yields (8):
  • the Sent-Lex 20 is a hand-crafted sentiment-based repository, or database, of the most frequent lexical items and phrases collected from the ingested social media messages, as well as from specialized vocabularies, that are indicative of sentiment.
  • the lexical items and phrases vary according to the domain of application, e.g. finance, security, pharmaceutics, etc. Words that are not sentiment bearing, such as definite articles and auxiliaries, are not part of the Sent-Lex 20 .
  • sentiment is associated with event-denoting verbs and nouns, as well as with sentiment-bearing modifiers of events or of participants of the events.
  • the lexical specifications are designed to be parameterized to specific domains of application.
  • the generic format of the lexical entry includes the lexical item, followed by fields of lexical specifications.
  • the first field specifies the category of the item, the second field specifies its polarity, the third field specifies its lexical strength, and the fourth field specifies the polarity of the semantic arguments of the lexical items and phrases, if applicable.
  • the lexical items and phrases and their features are stored in a lexical database, that is, the Sent-Lex 20 .
  • Each of the lexical items and phrases maintained in the Sent-Lex 20 is associated with a category tag, an inherent polarity value, and an inherent strength value; for some items, polarity and strength values are also associated with designated argument structure variables, as in (11).
  • the variable y in that verb's argument structure is associated with a positive value, as in (12); this is not the case for other verbs such as announce and report.
  • Google is associated with a positive sentiment.
  • the categories, nominal (NN), verbal (VB), adjectival (JJ), adverbial (RB) and their sub-categories, are intrinsically associated with polarity (+, −, n) and strength (1, 2, 3). Furthermore, the lexical specifications differentiate degree modifiers, such as very, too and much, from modifiers such as good and better. Degree-intensifiers contribute their own lexical value, and add an extra value 1 to the category they modify, see (14) below for examples.
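  • A lexical entry with the four fields described above (category, polarity, strength, and argument polarity) might be represented as in the sketch below. The class name, field names and sample entries are hypothetical, and `intensified_strength` implements only one possible reading of the degree-intensifier rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LexEntry:
    item: str                           # lexical item or phrase
    category: str                       # e.g. "NN", "VB", "JJ", "RB"
    polarity: str                       # "+", "-", or "n" (neutral)
    strength: int                       # 1 (lowest) to 3 (highest)
    arg_polarity: Optional[str] = None  # polarity of a semantic argument
    is_degree: bool = False             # degree modifier such as "very"

    def __post_init__(self) -> None:
        if self.polarity not in {"+", "-", "n"}:
            raise ValueError(f"bad polarity: {self.polarity!r}")
        if self.strength not in {1, 2, 3}:
            raise ValueError(f"bad strength: {self.strength!r}")


# Hypothetical entries: the values are illustrative, not taken from the source.
SENT_LEX = {
    "good": LexEntry("good", "JJ", "+", 1),
    "very": LexEntry("very", "RB", "n", 1, is_degree=True),
}


def intensified_strength(degree: LexEntry, modified: LexEntry) -> int:
    """Assumed reading: the intensifier contributes its own strength and
    adds an extra 1 to the strength of the item it modifies."""
    if not degree.is_degree:
        raise ValueError(f"{degree.item!r} is not a degree modifier")
    return degree.strength + modified.strength + 1


combined = intensified_strength(SENT_LEX["very"], SENT_LEX["good"])
```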
  • the Stock-Lex 18 is the stock-based lexical repository, or database, consisting of the most frequent lexical items and phrases used in the ingested social media messages that relate to stock-based knowledge, as well as most frequent items used in stock exchange and financial news wire such as the Financial Post (or other commodity exchange system depending upon the application to which the present system 10 is applied).
  • the Stock-Lex 18 thus includes a restricted set of stock-specific lexical items and phrases, associated with their domain specific polarity and strength values.
  • the polarity values are: positive, negative and neutral.
  • the lexical strength associated to the lexical items and phrases ranges from 1 to 3, where 1 is the lowest value and 3 is the highest value, see (15) for examples.
  • the stock-specific lexical items and phrases are part of the major lexical and phrasal categories, nominal, verbal, adjectival. Only event denoting nominal and verbal expressions are part of the Stock-Lex 18 , and only stock specific adjectival and adverbial modifiers are part of the Stock-Lex 18 .
  • Stock objects (tickers, company names, product names, etc.) have a neutral polarity and have no associated strength value.
  • the sentiment calculator 22 derives the sentiment with respect to specific stock objects.
  • the Stock-Lex 18 is a repository of the most frequent sentiment-bearing nouns, verbs, adjectives and adverbs used in social media stock market-related exchanges. Each lexical item is associated with a part of speech (POS), a polarity and a strength. The Stock-Lex 18 is handcrafted and contributes to the invention by providing sentiment specifications for event-denoting items and their dependents.
  • the innovation is two-fold: i) it specifies sentiment values for other categories than adjectives, contrary to common practice; ii) it specifies sentiment value for event denoting lexical items and their dependents, thus providing the lexical information used by the sentiment calculator 22 for the compositional calculus of the sentiment-per-asset.
  • each incoming filtered social media message is broken into a stream of text, words, phrases, symbols, or other meaningful elements called tokens, that is, tokenized messages and each token is assigned a Part Of Speech (POS) by the POS tagger 33 .
  • the Brill Tagger, a known methodology for performing part of speech tagging, is used, as it is sensitive to the lexical and distributional properties of lexical items and phrases in natural languages. The Brill Tagger is an “error-driven transformation-based tagger”. It is error-driven in the sense that it relies on supervised learning, and transformation-based in the sense that a tag is assigned to each word and then changed using a set of predefined rules.
  • the POS tagger 33 is necessary in accordance with the present system 10 to identify the lexical items that contribute to the sentiment calculus, namely adjectives (JJ), adverbs (RB), as well as event-denoting verbs (e.g., to upgrade) and nouns (e.g., an upgrade).
  • the POS Tagger 33 applies to the ingested filtered social media messages, tokenizes the string and assigns a part of speech to each token on the basis of a set of lexical and contextual rules, accounting for the distribution of categories in natural language texts.
  • Brill Tagger applies to (18) and derives the annotated tokenized string in (19).
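  • The two-stage idea behind transformation-based tagging (an initial lexical tag, then contextual rewrite rules) can be illustrated with a toy tagger. This is not the Brill Tagger: the lexicon, the single contextual rule and the default tag are assumptions made for the sketch, whereas a real tagger learns its rules from annotated data.

```python
# Hypothetical lexicon of most-likely tags; unknown words default to NN.
LEXICON = {
    "the": "DT",
    "an": "DT",
    "to": "TO",
    "upgrade": "NN",
    "stock": "NN",
}

# Contextual rules as (from_tag, to_tag, required_previous_tag).
# The single rule here rewrites a noun as a verb after "to".
RULES = [("NN", "VB", "TO")]


def tag(tokens: list[str]) -> list[tuple[str, str]]:
    # Step 1: lexical stage, assign each token its most likely tag.
    tags = [LEXICON.get(tok.lower(), "NN") for tok in tokens]
    # Step 2: contextual stage, rewrite tags based on the preceding tag.
    for from_tag, to_tag, prev_tag in RULES:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return list(zip(tokens, tags))


verb_reading = tag("to upgrade the stock".split())
noun_reading = tag("an upgrade".split())
```

  • In this sketch the same token, upgrade, is tagged VB after to and NN after an, which is the distinction the sentiment calculus relies on for event-denoting verbs and nouns.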
  • the tokenized and POS annotated messages resulting from the POS tagger 33 are fed to a partial parser 32 that recovers the main syntactic constituents of the social media messages.
  • the partial parser 32 employs a Cass parser, Abney's cascaded FST (Finite State Transducer), to recover the main syntactic constituents on the basis of the tokenized and POS annotated representations of social media messages, as illustrated in (20).
  • Partial parsing is designed for use with large amounts of noisy text. Robustness and speed are primary design considerations. Not all NLP applications require a complete syntactic analysis. Partial parsing is used in information retrieval as well as information extraction applications, such as facts and sentiment mining, where finding simple nominal and verbal constituents is enough. A full parser provides more information than needed, and it is less robust when expected information is missing, as is generally the case in social media messages, where syntactic reductions and truncation are necessary to convey meaning within limited character constraints, for example, 140 characters when considering tweets using Twitter.
  • the leaves of the parse tree are associated with their sentiment values via access to Stock-Lex 18 and the sentiment calculator 22 applies to the resulting semantically annotated tree.
  • the main properties of the calculator are described in the following section.
  • a sentiment is an integer, which can be either positive or negative, computed on the basis of the application of the rules of the sentiment calculus to pairs of lexical items in their local syntactic context; for example, nouns (that is, nominal lexical items) representing assets and nouns/verbs/adjectives (that is, nominal, verbal or adjectival lexical items) representing sentiment in the form of polarity and strength.
  • the computed sentiment value ranges within a pre-established scale.
  • the sentiment calculator 22 uses social media messages for the real-time evaluation of publicly traded equities and commodities wherein a sentiment is a positive or negative integer computed based upon pairs of lexical items in local syntactic context. In its most basic components the sentiment calculator employs a mechanism for determining lexical polarity in social media messages and a mechanism for determining a strength value of lexical items and phrases used in social media messages.
  • the sentiment calculus employed by the sentiment calculator 22 applies to the output of the annotated Cass tree produced by the partial parser 32 . It compositionally derives the sentiment associated to entities in the event denoted by the expression they are part of.
  • the sentiment logic is a compositional calculus deriving the sentiment value of a relation on the basis of the sentiment values of its parts.
  • the sentiment logic calculates sentiment values per asset with respect to stock market events described by the incoming social media messages. Namely, it calculates the sentiment with respect to given assets, as they occur in stock events.
  • the social media messages relating to an asset are gathered by a set of keywords used for ingesting the social media messages.
  • the sentiment calculus is based on the lexical polarity and strength value of the lexical items and phrases defined in the Stock-Lex 18 and how they are syntactically organized in the Cass tree.
  • the maximal local domain for the application of the calculus is the sentence; the minimal local domain is the smallest constituents including the keywords standing for the asset.
  • the sentiment calculus applies locally to the constituents including the asset within the sentences of the message.
  • the Cass parser derives the syntactic constituents of the sentences, including the adjectival (cx), as well as the nominal (nx) and the verbal (vx) constituents.
  • a head of a constituent is a lexical item, such as a verb, e.g., hit, or a noun, e.g., acquisition, that makes the constituent it is part of verbal (vx) or nominal (nx).
  • a head selects a complement, which is a syntactic constituent such as a nominal phrase, e.g. the market in hit the market, and AAPL in the acquisition of AAPL.
  • a modifier is an adjective or an adverb that modifies another constituent, a nominal constituent in the first case and a verbal constituent in the other case, e.g., strong market and strongly hit the market.
  • the subject-predicate relation is the relation between a subject, generally a nominal constituent, and a predicate, generally a verbal constituent, e.g., in the sentence AAPL hits the market, AAPL is the subject and hits the market is the predicate.
  • the sentiment calculus includes separate rules for calculating the polarity and the strength. They have the generic form of dyadic operators (Op (arg1, arg2)), and their specific form is dependent on the relation between arg1 and arg2, as well as the lexical polarity and strength values of the lexical items and phrases specified in the Stock-Lex 18 .
  • the rule applies locally in syntactic constituents/domains, e.g., nx, vx, cx, etc. It derives the strength of constituents on the basis of the strength of their parts and how they are syntactically related by the application of an arithmetic operation to the pair of arguments depending on the nature of the syntax-semantic relation and the polarity of the constituents.
  • the strength rules apply to the lexical items and phrases in the three universal syntactic relations, and the strength is calculated on the basis of elemental arithmetic operations.
  • the Strength rules include the following:
  • social media messages may include more than one sentence, may talk about more than one asset, more than one stock event, and they may express more than one sentiment.
  • the sentiment calculator 22 is sentence bound. Moreover it calculates sentiment in the local syntax-semantic domain of an asset. Thus, it ensures that the specific sentiment with respect to a given asset conveyed by a message is calculated. It applies iteratively in the local domain of the constituent including the asset (keyword, set of keywords), e.g. OIL, or GOLD, and the expression of a stock event (e.g., “lose”, “gain”, “sell”, “buy”) or a sentiment (e.g., “high”, “low”).
  • the following trace for the tweet (26) illustrates the application of the sentiment calculator 22 that calculates sentiment-per-asset in the local domain of the targeted asset: Oil.
  • the calculus assigned the value +3 to Oil, discarding the value of the computation for Canadian dollar, which is −5.
  • This example shows that every step of the computation by the modules of the system provides the structure for the application of the sentiment calculus.
  • This calculus applies in local syntactic domains and provides an integer that represents the sentiment (polarity and strength) with respect to designated assets.
  • Inference engine 24 is part of expert systems, which are designed to process a problem expressing an uncertainty with respect to a decision, and to provide a decision, or a set of decisions reducing the uncertainty. Inference engine 24 attempts to provide an answer to a problem, or clarify uncertainties where normally one or more human experts would need to be consulted.
  • the inference engine 24 of the present system 10 is part of the pipeline and provides a mechanism to sharpen the accuracy of the sentiment computed, by bringing both knowledge of the stock market world 23 and knowledge of the world 25 into the computation.
  • the inference engine 24 includes a data structure, and a set of inference rules (if X then Y) relating facts to sentiments. This knowledge interacts with the domain-specific knowledge stored in the lexicon and used by the sentiment calculator 22 .
  • the inference engine 24 includes a data structure, a knowledge base that uses some knowledge representation structure to capture the knowledge of a specific domain, for example a relational table relating entities in knowledge domains, and a set of inference rules applying to the entities in the relational table and drawing consequences.
  • inference rules use reasoning, which more closely resembles human reasoning.
  • the knowledge base consists of a relational table relating stock entities (tickers, company names, products, etc.), stock events (e.g., upgrade, downgrade) and facts, extracted from news wire.
  • the rules of the inference engine 24 apply to the elements of the relational table and infer sentiment values.
  • the knowledge base includes (28) below, and the inference rules (29) below, stating that if the price of gas oil (at the pump) is below $3 then the sentiment value is positive, +2, and if it is above $3 then the sentiment value is negative, −2.
  • This real world knowledge varies according to time and place.
  • the sentiment calculator 22 alone would not derive the negative sentiment associated to the second sentence in (27). While the sentiment calculator 22 assigns the value neutral to questions, the inference engine 24 assigns the sentiment value of −2.
  • the inference engine 24 ensures that the sentiment is grounded in the real world. It contributes to the innovative technology, which leads to both simplify and sharpen decision taking in stock market transactions.
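  • The gas price rule described above (if X then Y, relating a fact to a sentiment) can be written as a small function over a fact table. The dictionary layout, the key name `gas_price_usd` and the sample price are hypothetical; only the $3 threshold and the +2 and −2 values come from the text.

```python
from typing import Optional

# Hypothetical knowledge-base row; the key and the value are illustrative.
knowledge_base = {"gas_price_usd": 3.45}

GAS_PRICE_THRESHOLD_USD = 3.0  # threshold stated in the text


def infer_gas_sentiment(facts: dict[str, float]) -> Optional[int]:
    """Apply the if-then rule relating the pump price of gas to sentiment."""
    price = facts.get("gas_price_usd")
    if price is None:
        return None  # no fact available, so no inference is drawn
    if price < GAS_PRICE_THRESHOLD_USD:
        return 2
    if price > GAS_PRICE_THRESHOLD_USD:
        return -2
    return None  # the text does not state the case of exactly $3


sentiment = infer_gas_sentiment(knowledge_base)
```

  • Because this real world knowledge varies with time and place, the threshold is kept as a named constant rather than being embedded in the rule body.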
  • Sentiment calculations in accordance with the present system 10 are a result of the multilayered pipeline embodied by the present invention, which ingests social media messages, identifies the language of the social media messages, and filters out elements that are not part of the natural language for which the system 10 has been parameterized (here English).
  • the POS tagger 33 and the partial parser 32 modules of the NLP processor 16 assign parts of speech to the tokens of sentences, and recover the structure they are part of.
  • the sentiment calculus of the sentiment calculator 22 applies to the annotated structures and derives the sentiment value per asset based on the sentiment value of the event they are part of.
  • the inference engine 24 reduces uncertainty by relying on a relational database including knowledge of the world information and a set of inference rules.
  • the present sentiment calculation system includes a computer-implemented mechanism for obtaining and converting ingested unstructured social media messages regarding a plurality of objects/assets being tracked into a sentiment value for each object/asset.
  • the sentiment value includes a polarity value and strength value derived from a natural language processing algorithm containing a database of lexical items and phrases related to the objects being tracked.
  • the precise sentiment value per object is derived by the compositional calculus based on the sentiment values of lexical items (and phrases) and their syntactic organization.
  • the contextual sentiment value is based on the inference engine 24 deriving a sentiment value with respect to knowledge of the world.
  • the interaction of the sentiment calculus and the inference engine 24 yields accurate sentiment in real-time.
  • the sentiment cognitive-based calculus relates conceptual processing with natural language processing algorithm.
  • the data generated by the sentiment calculator 22 is applied to a graphical user interface 30 that combines sentiment and intensity data relating to the assets.
  • the graphical user interface 30 includes moving graphic objects displayed upon a monitor that depict social media market sentiment; a timeline slider object 46 ; and a vertical bar chart object 44 .
  • the graphical user interface 30 provides for the visualization of graphic objects in the form of moving spheres 40 where the sphere size and color depict social media market sentiment.
  • the moving spherical graphic objects 40 shrink and grow based on intensity changes.
  • the sphere color changes based on social media sentiment polarity.
  • the center sphere 40 a represents the weighted sentiment average. Clicking one of the moving spherical graphic objects 40 results in the display of a chart 42 (see FIGS. 6A & 6B ) graphing (based on what the trader selects) all or a choice of price, volume, social media frequency, social media sentiment, cross-correlation and a variety of price and sentiment derived technical indicators.
  • Sphere updates are based on a configurable polling time.
  • the graphical user interface 30 contains a time slider 46 to go back to a point in time and replay history.
  • a vertical bar chart 44 graphs the social media sentiment when the graphical user interface 30 is in full screen mode.
  • The purpose of the reaction indicator 31 is to provide a mechanism wherein hundreds of assets can be tracked, but only those that are “interesting” based on preprogrammed parameters will float to the surface and draw the viewer's attention.
  • the reaction indicator 31 provides a graphical user interface 30 displaying three graphical areas of objects, moving spherical graphic objects 40 , a timeline slider object 46 and a vertical bar chart object 44 .
  • the moving graphic objects may take shapes other than spheres, such as squares. Referring to FIG. 2 , the spherical moving graphical objects are represented at 40 , the timeline slider object at 46 and the vertical bar chart object at 44 .
  • the reaction indicator polls a data stream containing mathematically computed values for social media intensity, social media sentiment, social media frequency, social media weighted average frequency and social media weighted average sentiment, auto-refreshing the moving spherical graphic objects 40 and the vertical bar chart object 44 based on a configurable polling time. Intensity is defined as the ratio of short term frequency divided by long term frequency.
  • the mathematical computations for the data stream are calculated by an algorithm discussed herein in detail in a section related to cross correlation. The calculations are based upon information obtained from a multilayer pipeline architecture previously discussed.
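  • The definition of intensity as short term frequency divided by long term frequency can be illustrated as follows. The window lengths, the function names and the behavior when the long term frequency is zero are assumptions of this sketch; the text does not fix them.

```python
from bisect import bisect_left, bisect_right


def frequency(times: list[float], now: float, window: float) -> float:
    """Messages per unit time in [now - window, now]; times must be sorted."""
    lo = bisect_left(times, now - window)
    hi = bisect_right(times, now)
    return (hi - lo) / window


def intensity(times: list[float], now: float,
              short_window: float, long_window: float) -> float:
    """Ratio of short term frequency to long term frequency."""
    long_freq = frequency(times, now, long_window)
    if long_freq == 0:
        return 0.0  # assumed convention when no messages were seen
    return frequency(times, now, short_window) / long_freq


# Hypothetical message arrival times, in seconds.
arrivals = [0, 10, 20, 30, 40, 50, 55, 58, 59, 60]
value = intensity(arrivals, now=60, short_window=10, long_window=60)
```

  • A value above 1 indicates that messages are arriving faster over the short window than over the long one, which is what makes a sphere grow.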
  • the moving spherical graphic objects 40 shrink and grow based on the social media intensity attribute and are sized relative to each other taking into consideration the stage size and browser screen resolution.
  • the color of the moving spherical graphic object 40 is based on social media sentiment polarity where polarity is defined as negative, neutral or positive.
  • Each of the moving spherical graphic objects 40 displays a label, social media sentiment and social media frequency.
  • the center sphere 40 a object visualizes a weighted average of all sphere objects based on weights assigned to the spheres.
  • the weighted sphere object is represented at 40 a .
  • the weighted average sphere size is static relative to the other sphere objects, which shrink and grow, and displays weighted average social media sentiment and weighted average social media frequency, if sphere weights have been assigned. If sphere weights have not been assigned, the weighted average sphere object does not display any data.
  • the weighted average sphere object does not change color to reflect social media sentiment polarity.
  • An example where weights may play a role is in the instance where the visualization represents an Exchange Traded Fund (ETF).
  • ETF holds assets such as stocks, commodities or bonds.
  • the assets would be represented in the spheres.
  • the weight for each asset assigned would represent the percentage in the ETF for an amalgamation of all assets.
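  • The weighted average shown in the center sphere can be sketched as a weighted mean over the tracked assets. The function name, the asset labels and the weights are hypothetical; returning `None` mirrors the statement that no data is displayed when weights have not been assigned.

```python
from typing import Optional


def weighted_average(values: dict[str, float],
                     weights: dict[str, float]) -> Optional[float]:
    """Weighted mean of per-asset values; None when no weights are assigned."""
    total = sum(weights.get(asset, 0.0) for asset in values)
    if total == 0:
        return None
    weighted_sum = sum(v * weights.get(asset, 0.0) for asset, v in values.items())
    return weighted_sum / total


# Hypothetical ETF holding two assets at 75% and 25%.
sentiments = {"AAA": 2.0, "BBB": -1.0}
etf_weights = {"AAA": 0.75, "BBB": 0.25}
center_sentiment = weighted_average(sentiments, etf_weights)
```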
  • the timeline slider object 46 visualizes a timeline where the date and time on the left represent the earliest date and time where data exists for the collection of moving spherical graphic objects 40 .
  • the date and time on the right represent the current date and time.
  • Moving to various points on the timeline slider object 46 moves the moving spherical graphic objects 40 and the vertical bar chart object 44 to a point in time, pausing the real-time display, then replaying history. From the historical point in time selected, the moving spherical graphic objects 40 and the vertical bar chart object 44 will poll the data stream coming from the sentiment calculator 22 for social media intensity, social media sentiment, social media frequency, social media weighted average frequency and social media weighted average sentiment from the point in time selected, then rerun history as if it were happening in real time. Referring to FIG. 2 , the timeline slider object is represented at 46 .
  • the vertical bar chart object 44 utilizes the same data stream as the moving spherical graphic objects 40 to graph social media frequency, using the same color scheme as the spherical objects. Referring to FIG. 2 , the vertical bar chart object is represented at 44 .
  • Clicking on a moving spherical graphic object 40 will launch a chart graphing price, volume, social media sentiment, social media frequency, and cross-correlation, auto-refreshing based on a configurable time, e.g. every second, as seen in the screen shots depicted in FIGS. 6A and 6B .
  • Each of the moving spherical graphic objects 40 displays a symbol, such as an exclamation mark, within the sphere, preferably in the center, when an alert has been triggered.
  • a trigger results when the sentiment and intensity variables cross certain thresholds, whereupon the related moving spherical graphic object displays an exclamation mark, signaling a potential trading opportunity; for example, when the sentiment and intensity for a given asset A exceed a preprogrammed value indicating sell.
  • An exclamation mark will be displayed in the center of sphere A alerting the operator to take action.
  • the operator shall have the option of directly executing a trade via a combination of keyclicks.
  • the operator can program the reaction indicator 31 to automatically place a trade.
  • the operator can program the reaction indicator 31 to send an alert via e-mail or text message.
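  • The alert trigger can be sketched as a threshold test on sentiment and intensity. The class name, the threshold values and the readings are hypothetical, and the sketch only decides which assets to flag; placing a trade or sending a message is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    sentiment_threshold: float  # preprogrammed value, assumed for the sketch
    intensity_threshold: float  # preprogrammed value, assumed for the sketch

    def triggered(self, sentiment: float, intensity: float) -> bool:
        """True when both variables exceed their thresholds."""
        return (sentiment > self.sentiment_threshold
                and intensity > self.intensity_threshold)


def assets_to_flag(readings: dict[str, tuple[float, float]],
                   rule: AlertRule) -> list[str]:
    """Assets whose sphere would display the exclamation mark."""
    return [asset for asset, (sentiment, intensity) in readings.items()
            if rule.triggered(sentiment, intensity)]


rule = AlertRule(sentiment_threshold=3.0, intensity_threshold=2.0)
# Hypothetical (sentiment, intensity) readings per asset.
readings = {"A": (4.0, 2.5), "B": (4.0, 1.0), "C": (1.0, 3.0)}
flagged = assets_to_flag(readings, rule)
```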
  • the reaction indicator 31 comprises a plurality of moving graphic objects 40 which change size and color based upon social media market sentiment, intensity and frequency captured and correlated in real-time from a stream of online social media messages related to a market segment.
  • the moving spherical graphic objects 40 shrink or grow in size based upon the social media intensity attributed to each moving spherical graphic object 40 and the moving spherical graphic objects 40 change color based upon whether the social media sentiment attributed to each moving spherical graphic object is positive, negative or neutral.
  • the reaction indicator 31 also provides a weighted average of all displayed moving spherical graphic objects 40, based on weights assigned to the objects prior to capturing social media streams, which is displayed among the plurality of displayed objects.
  • the present system and method provides a mechanism for cross-correlating the sentiment and intensity data with the actual fluctuations in asset prices.
  • the present invention provides two methods to find patterns in a target real-valued time series by utilizing two other real-valued time series derived from a stream of social-media messages (Twitter for instance): sentiment and frequency.
  • the series used to find patterns in the target are called predictive.
  • the patterns can be depicted graphically on charts, together with the time series, to be used as a decision making tool.
  • the patterns can also serve as the input to an automated trading system to generate trading signals.
  • the curves are a depiction of the sentiment time-series for the target (thick curve labeled $s_s$) and the sentiment-frequency time series (thin curve $s_f$). The calculation of the sentiment-frequency series will be described later.
  • the method of the present invention finds patterns in a target real-valued time series by utilizing sentiment and frequency derived from a stream of social-media messages, wherein the target represents a quantifiable property of an asset being tracked.
  • the method includes identifying a target, which is a sampled real-valued time series; generating a sentiment time series, $s_s$ (which is plotted); generating a frequency time series, $s_f$ (which is plotted); and determining a pattern based upon the sentiment time series and the frequency time series.
  • a real-valued time series is defined as a sequence of pairs (time (t), value (s)), also called points, ordered by increasing time.
  • a simple time series could look like this: [(12:36,27),(13:03,37),(16:34,88)].
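  • the set of real-valued time series is $T_s = \mathcal{F}(\mathbb{R} \times \mathbb{R})$, that is, the set of finite subsets of $\mathbb{R} \times \mathbb{R}$, whose elements are endowed with the total order $\leq : ((t,s),(t',s')) \in (\mathbb{R} \times \mathbb{R})^2 \mapsto (t \leq t') \in \{\text{true}, \text{false}\}$.
  • each series $s \in T_s$ is naturally mapped to the vector $V(s) \in (\mathbb{R} \times \mathbb{R})^{\#(s)}$ such that $v_i$ is the $i$-th element of $s$.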
  • the vector of first components will be denoted by $V_1(s)$ and the vector of second components by $V_2(s)$.
  • $V(s)$ = [(12:36,27),(13:03,37),(16:34,88)]
  • $V_1(s)$ = [12:36,13:03,16:34]
  • $V_2(s)$ = [27,37,88].
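  • The mapping from a series to its vector of points, times and values can be reproduced with the example series from the text. Representing a series as a Python set of (time, value) pairs and naming the functions `V`, `V1` and `V2` are choices made for this sketch.

```python
from datetime import time

# The example series from the text, stored as an unordered set of points.
s = {(time(13, 3), 37), (time(12, 36), 27), (time(16, 34), 88)}


def V(series):
    """Points of the series ordered by increasing time."""
    return sorted(series, key=lambda point: point[0])


def V1(series):
    """Vector of first components (times)."""
    return [t for t, _ in V(series)]


def V2(series):
    """Vector of second components (values)."""
    return [value for _, value in V(series)]


values = V2(s)
```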
  • a semantic distinction is drawn between pulsated time series where points represent a punctual event (i.e., sequence of Diracs), such as the arrival of a message, and sampled time series that represent a discretization of a function that's defined at all times, such as the market price. It is thus natural to interpolate points of a sampled time series to try and recover the original function it was sampled from.
  • the target is an arbitrary sampled real-valued time series.
  • the algorithm has been applied with prices as target.
  • the sentiment time series $s_s$ is generated by the Natural Language Processing (NLP) module 16 . It is a pulsated time series. For each message in the input stream, the sentiment time series contains a pair whose time is the time when the message was posted, and whose value is the result of the NLP processor 16 . This value is called sentiment.
  • the frequency time series $s_f$ depends on two parameters: the sentiment time series and a positive number $w$ representing a time called window size. It is a pulsated time series. For each point $(t, s)$ in the sentiment series, the frequency series contains a point $(t, f)$ where $f$ is the number of points in the sentiment series in the time range $[t - w, t]$, divided by $w$. This number $f$ is called frequency.
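  • The frequency series can be computed directly from this definition. The sketch below assumes times are plain numbers in a common unit and uses a hypothetical sample series; the function name is illustrative.

```python
from bisect import bisect_left, bisect_right


def frequency_series(sentiment_series, w):
    """For each point (t, s), emit (t, f) where f is the number of points
    with time in [t - w, t], divided by w."""
    if w <= 0:
        raise ValueError("window size w must be positive")
    points = sorted(sentiment_series, key=lambda p: p[0])
    times = [t for t, _ in points]
    return [
        (t, (bisect_right(times, t) - bisect_left(times, t - w)) / w)
        for t in times
    ]


# Hypothetical sentiment series of (time, sentiment) points.
sentiment_points = [(0, 1), (1, -1), (2, 2), (5, 1)]
freq = frequency_series(sentiment_points, w=2)
```

  • Binary search on the sorted times keeps each window count logarithmic in the number of points, which matters for a real-time stream.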
  • a pattern P is defined as a cross-correlation $c$ in $[-1, 1]$, a positive window size $w$, a time lag $l$, and a time $t_s$. These numbers are interpreted as “the predictive series over $[t_s - w, t_s]$ correlates to the target series over $[t_s - w + l, t_s + l]$ with a cross-correlation of $c$”.
  • a pattern is thus an element of $[-1, 1] \times \mathbb{R}^{+} \times \mathbb{R} \times \mathbb{R}$.
  • if the lag is positive, the pattern is said to be predictive.
  • the cross-correlation determines the relevance of the pattern: the higher it is, the more relevant the pattern is considered.
  • the method is called the sentiment-frequency method. It uses the sentiment to create a sentiment-frequency series, and correlates the latter to the target using a plain statistical cross-correlation. It then identifies patterns by finding the optimal time lag.
  • the system first creates an average sentiment series $s_a$ such that for every point $(t, s)$ in the sentiment time series $s_s$ there is a point $(t, a)$ in the average sentiment series, where $a$ is the arithmetic average of all the sentiments in the time interval $[t - w, t]$.
  • the sentiment-frequency series is then $s_{sf} = \{(t, f \cdot a) : (t, f) \in s_f,\ (t, a) \in s_a\}$
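  • The average sentiment series and the sentiment-frequency series can be sketched together. Combining frequency and average sentiment as the product f times a is an assumption of this sketch, since the formula is only partly legible in the text; the function names and the sample series are also illustrative.

```python
def sentiments_in_window(points, t, w):
    """Sentiment values of all points with time in [t - w, t]."""
    return [s for u, s in points if t - w <= u <= t]


def average_sentiment_series(sentiment_series, w):
    """For each point (t, s), emit (t, a), a being the windowed mean."""
    points = sorted(sentiment_series, key=lambda p: p[0])
    out = []
    for t, _ in points:
        window = sentiments_in_window(points, t, w)
        out.append((t, sum(window) / len(window)))
    return out


def sentiment_frequency_series(sentiment_series, w):
    """For each point, emit (t, f * a). ASSUMPTION: the two quantities
    are combined by multiplication."""
    if w <= 0:
        raise ValueError("window size w must be positive")
    points = sorted(sentiment_series, key=lambda p: p[0])
    out = []
    for t, _ in points:
        window = sentiments_in_window(points, t, w)  # never empty: includes t
        f = len(window) / w
        a = sum(window) / len(window)
        out.append((t, f * a))
    return out


series = [(0, 1), (1, -1), (2, 2), (5, 1)]  # hypothetical sentiment points
avg = average_sentiment_series(series, w=2)
sf = sentiment_frequency_series(series, w=2)
```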
  • the series correlator produces a set of patterns based on a real-valued pulsated time series $s_p$, a real-valued sampled time series $s_s$, an interpolation method $I$ for $s_s$, and a window size $w$.
  • Interpolation is a classical subject and it will not be described here. Common interpolation methods are linear or cubic splines.
  • For any time $t$ and lag $l$, the vector $E_s(s_s, s_p, t, l)$ is defined so that for every $(t_p, p)$ in $s_p$ with $t_p$ in $[t - w, t]$, $E_s(s_s, s_p, t, l)$ contains the point $I(s_s, t_p + l)$.
  • $E_s(s_s, s_p, t, l)$ is called the interpolated vector.
  • the system also defines the vector $E_p(s_p, t)$ so that for every $(t_p, p)$ in $s_p$ with $t_p$ in $[t - w, t]$, $E_p(s_p, t)$ contains the point $p$.
  • the cross-correlation $CC(s_s, s_p, t, l)$ is defined as the scalar product of $E_p(s_p, t)$ and $E_s(s_s, s_p, t, l)$ divided by the product of their norms.
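  • The cross-correlation defined above can be sketched with linear interpolation standing in for the interpolation method. The clamping of times outside the sampled range, the zero result for an empty or null window, and the sample data are assumptions of this sketch.

```python
import math


def interpolate(sampled, t):
    """Linear interpolation of a sampled series at time t. ASSUMPTION:
    times outside the sampled range are clamped to the nearest end value."""
    pts = sorted(sampled)
    if t <= pts[0][0]:
        return pts[0][1]
    if t >= pts[-1][0]:
        return pts[-1][1]
    for (t0, v0), (t1, v1) in zip(pts, pts[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def cross_correlation(sampled, pulsated, t, lag, w):
    """Scalar product of the pulsated values in [t - w, t] and the sampled
    series interpolated at the lagged times, divided by their norms."""
    window = [(tp, p) for tp, p in pulsated if t - w <= tp <= t]
    e_p = [p for _, p in window]
    e_s = [interpolate(sampled, tp + lag) for tp, _ in window]
    norm = math.sqrt(sum(x * x for x in e_p)) * math.sqrt(sum(x * x for x in e_s))
    if norm == 0:
        return 0.0  # assumed convention for an empty or all-zero window
    return sum(a * b for a, b in zip(e_p, e_s)) / norm


target = [(0, 0.0), (10, 10.0)]                # hypothetical sampled series
predictive = [(1, 2.0), (2, 4.0), (3, 6.0)]    # hypothetical pulsated series
cc_no_lag = cross_correlation(target, predictive, t=3, lag=0, w=2)
cc_lag_1 = cross_correlation(target, predictive, t=3, lag=1, w=2)
```

  • As defined in the text, the vectors are not mean-centered, so this quantity is a normalized scalar product rather than a Pearson correlation.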
  • $CC(s_s, s_p, t, l)$ has a finite set of local maxima. There are many methods to find local maxima. One possible method is to use a gradient method on points spread evenly on the time interval that the series covers.
  • the local maxima of $CC_t : l \mapsto CC(s_s, s_p, t, l)$ simply move linearly with $t$ when no point of $s_p$ leaves or enters $[t - w, t]$.
  • the set of local maxima of $CC_t$, for $t$ or $(t - w)$ the time of a point in $s_p$, is a finite set that completely represents the set of local maxima of $CC_t$ for all $t$.
  • For every $w$, the system computes a finite set of times $t$ and lags $l$ and a cross-correlation $c$ for each of them. This defines a finite set of patterns $(c, w, t, l)$ which the system orders by relevance.
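  • Collecting patterns and ordering them by relevance can be sketched as below. A plain scan over an evenly spaced grid of lags is used here in place of the gradient method mentioned in the text, and the `Pattern` type, the function name and the toy correlation function are hypothetical.

```python
from typing import Callable, NamedTuple, Sequence


class Pattern(NamedTuple):
    c: float  # cross-correlation
    w: float  # window size
    t: float  # time
    l: float  # lag


def find_patterns(cc: Callable[[float, float], float], w: float,
                  times: Sequence[float], lags: Sequence[float]) -> list[Pattern]:
    """Scan a grid of lags for local maxima of l -> cc(t, l) at each time t,
    then order the resulting patterns by decreasing cross-correlation."""
    patterns = []
    for t in times:
        values = [cc(t, l) for l in lags]
        for i in range(1, len(lags) - 1):
            if values[i - 1] < values[i] >= values[i + 1]:
                patterns.append(Pattern(values[i], w, t, lags[i]))
    return sorted(patterns, key=lambda p: p.c, reverse=True)


def toy_cc(t: float, l: float) -> float:
    """Stand-in correlation function that peaks at lag 2."""
    return 1 - (l - 2) ** 2 / 10 - 0.1 * t


ranked = find_patterns(toy_cc, w=5, times=[0, 1], lags=[0, 1, 2, 3, 4])
```

  • In practice `cc` would wrap the cross-correlation of the two series for a fixed window size, and the candidate times would be those at which a point enters or leaves the window.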
  • the system for sentiment, intensity cross-correlation provides for time-based cross-correlation between the real-time sentiment value and frequency of a message stream relative to an object and a quantifiable property of that object.
  • the time correlation relates patterns in the sentiment and frequency to patterns in the object property.
  • the cross-correlation system further includes graphical depictions showing relations identified by the patterns between the object property and the sentiment, frequency, and any quantity derived from them.
  • the cross-correlation system also includes event prediction of future up and down movement of the object property based upon the aforementioned patterns, as well as trading signals generated on and trading strategies based on the aforementioned patterns.


Abstract

A method finds patterns in a target real-valued time series by utilizing sentiment and frequency derived from a stream of social media messages, wherein the target represents a quantifiable property of an asset being tracked. The method includes identifying a target, which is a sampled real-valued time series; generating a sentiment time series, $s_s$, relating to an asset; generating a frequency time series, $s_f$, relating to an asset; and determining a pattern based upon the sentiment time series and the frequency time series.

Description

    CROSS REFERENCE TO RELATED APPLICATION
  • This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/466,067, entitled “METHOD AND SYSTEM USING SOCIAL MEDIA FOR REAL-TIME EVENT DRIVEN TRADING”, filed Mar. 22, 2011.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates to a method and system using social media for real-time event driven trading of equities, commodities and other traded assets.
  • 2. Description of the Related Art
  • Sentiment analysis applies various analytical techniques in identifying subjective information from different information sources. Sentiment analysis, therefore, attempts to ascertain the feelings, thoughts, attitude, opinion, etc. of a speaker or a writer with respect to a topic.
  • Most work on sentiment analysis has relied on two main approaches. The first approach, in particular, a so-called “bag of words” approach, attempts to apply a positive/negative document classifier based on occurrence frequencies of the various words in a document. Applying this approach, various learning methods can be used to select or weight different parts of the text used in the classification process. This approach fails to process the sentiment with respect to assets (for example, equities or commodities) in short digital messages such as tweets sent via the online social networking service Twitter.
  • The second approach is “semantic orientation.” Semantic orientation automatically classifies words into two classes, “good” and “bad”, and then computes an overall good/bad score for the text. This method does not take into consideration the sentiment conveyed by parts of speech other than adjectives, including verbs, for example, to bounce, to crash, nouns, for example, a put, a call, and phrases, for example, ascending triangle, black Friday, head-and-shoulders.
  • Both methods fail to determine the sentiment with respect to specific assets in short digital messages such as tweets sent via the online social networking service Twitter. Their main pitfall is that they fail to process the sentiment in the syntax-semantic context of the message.
  • SUMMARY OF THE INVENTION
  • It is, therefore, an object of the present invention to provide a method for finding patterns in a target real-valued time series by utilizing sentiment and frequency derived from a stream of social media messages, wherein the target represents a quantifiable property of an asset being tracked. The method includes identifying a target, which is a sampled real-valued time series; generating a sentiment time series, ss, relating to an asset; generating a frequency time series, sf, relating to an asset; and determining a pattern based upon the sentiment time series and the frequency time series.
  • It is also an object of the present invention to provide a method wherein sentiment is an expression of a psychological state relative to an event.
  • It is another object of the present invention to provide a method wherein frequency represents the volume of social media messages about the asset.
  • It is a further object of the present invention to provide a method wherein the step of generating a sentiment time series is performed by language processing and is derived based upon pairs of lexical items in local syntactic context found in a volume of social media messages.
  • It is also an object of the present invention to provide a method wherein the step of generating a sentiment time series includes the creation of an average sentiment series, sa, such that for every point (t,s) in the sentiment time series, ss, there is a point (t, a) in an average sentiment series where “a” is the arithmetic average of all the sentiments in a time range [t−w, t].
  • It is another object of the present invention to provide a method wherein the step of generating a sentiment time series includes the creation of a sentiment-frequency series, ssf, to contain a point (t,vsf) for every (t, a) in the sentiment time series, ss, and (t, f) in the frequency time series, sf, where $v_{sf} = f^{a}\ (= e^{a \ln(f)})$.
  • It is a further object of the present invention to provide a method wherein the frequency time series, sf, is dependent upon the sentiment time series, ss, and a positive number w representing a time called window size.
  • It is also an object of the present invention to provide a method wherein for each point (t, s) in the sentiment time series, ss, the frequency time series, sf, contains a point (t, f) where f is the number of points in the sentiment time series, ss, in the time range [t−w, t], divided by w.
  • It is another object of the present invention to provide a method wherein the number f is called frequency and
  • $$f(t) = \frac{\#\left(s_s[t-w,\,t]\right)}{w}, \qquad s_f = \{\, (t, f(t)) \mid t \in V_1(s_s) \,\}$$
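  • As a minimal sketch of the series defined above (not the applicants' implementation), the following Python assumes the sentiment time series ss is a list of (t, s) pairs sorted by time and builds the frequency series sf, the average sentiment series sa, and the sentiment-frequency series from them. The function names are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right


def _window(times, t, w):
    """Index range [lo, hi) of the points whose time lies in [t - w, t]."""
    return bisect_left(times, t - w), bisect_right(times, t)


def frequency_series(ss, w):
    """s_f: for each (t, s) in s_s, f = (number of points in [t - w, t]) / w."""
    times = [t for t, _ in ss]  # assumption: ss is sorted by time
    out = []
    for t in times:
        lo, hi = _window(times, t, w)
        out.append((t, (hi - lo) / w))
    return out


def average_series(ss, w):
    """s_a: for each (t, s) in s_s, a = arithmetic mean of sentiments in [t - w, t]."""
    times = [t for t, _ in ss]
    out = []
    for t in times:
        lo, hi = _window(times, t, w)
        values = [s for _, s in ss[lo:hi]]  # never empty: the point at t is included
        out.append((t, sum(values) / len(values)))
    return out


def sentiment_frequency_series(sa, sf):
    """Sentiment-frequency series: v_sf = f ** a, which equals exp(a * ln(f))."""
    return [(t, f ** a) for (t, a), (_, f) in zip(sa, sf)]
```

  • Because every window [t−w, t] contains at least the point at t itself, f is always positive, so f raised to the power a is well defined for any real average sentiment a.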
  • It is a further object of the present invention to provide a method wherein the pattern P is a cross-correlation c in [−1,1], a positive window size w, a time lag l, and a time ts, and these numbers are interpreted as a predictive series over [ts−w, ts] correlating to the target series over [ts−w+l, ts+l] with a cross-correlation of c.
  • It is also an object of the present invention to provide a method wherein the step of determining a pattern employs a sentiment-frequency method that uses sentiment to create a sentiment-frequency series, sfs, and correlates to the target using a plain statistical cross-correlation.
  • It is another object of the present invention to provide a method wherein the step of determining a pattern includes the step of identifying an optimal time lag.
  • It is a further object of the present invention to provide a method wherein correlating two time-series using a plain statistical cross-correlation and finding the optimal lag is achieved with a series correlator.
  • It is also an object of the present invention to provide a method wherein the series correlator produces a set of patterns based on a real-valued pulsated time series sp, a real-valued sampled time series, ss, an interpolation method I for ss, and a window size w.
  • It is another object of the present invention to provide a method wherein the interpolation method I, is a function of a time series ss and of a time t that is C1-piecewise continuous with respect to t, and such that if there exists a point (t, v) in ss, I(ss, t)=v.
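  • The lagged correlation described above can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed series correlator: the interpolation method I is taken to be piecewise-linear (one function that returns v whenever (t, v) is a sample), the "plain statistical cross-correlation" is taken to be the Pearson coefficient, and the "optimal" lag is taken to be the candidate with the largest absolute correlation. All function names are hypothetical.

```python
import math
from bisect import bisect_left


def interpolate(series, t):
    """Piecewise-linear I(series, t); returns v exactly when (t, v) is a sample."""
    times = [p[0] for p in series]
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return series[i][1]
    if i == 0 or i == len(times):
        raise ValueError("t lies outside the sampled range")
    (t0, v0), (t1, v1) = series[i - 1], series[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def pearson(xs, ys):
    """Pearson correlation coefficient in [-1, 1]; 0.0 if either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def best_lag(predictive, target, ts, w, lags):
    """Correlate predictive over [ts - w, ts] with target over [ts - w + l, ts + l].

    Returns (c, l) for the candidate lag l with the largest |c|, or None.
    """
    window = [(t, v) for t, v in predictive if ts - w <= t <= ts]
    if len(window) < 2:
        return None
    xs = [v for _, v in window]
    best = None
    for lag in lags:
        try:
            ys = [interpolate(target, t + lag) for t, _ in window]
        except ValueError:
            continue  # target is not sampled far enough for this lag
        c = pearson(xs, ys)
        if best is None or abs(c) > abs(best[0]):
            best = (c, lag)
    return best
```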
  • Other objects and advantages of the present invention will become apparent from the following detailed description when viewed in conjunction with the accompanying drawings, which set forth certain embodiments of the invention.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a schematic overview of the present system.
  • FIG. 2 is a representation of the graphical user interface in accordance with the present invention.
  • FIG. 3 is a partial view of the reaction indicator.
  • FIG. 4 is a graphical depiction showing the correlation of frequency and sentiment.
  • FIG. 5 is a screen shot showing the ingest and processing of various assets.
  • FIGS. 6A and 6B are screen shots when a moving spherical graphic object is clicked in the graphical user interface.
  • FIG. 7 is a screen shot showing various moving spherical graphic objects shrinking and growing based on social media intensity thereof.
  • DESCRIPTION OF THE PREFERRED EMBODIMENT
  • The detailed embodiment of the present invention is disclosed herein. It should be understood, however, that the disclosed embodiment is merely exemplary of the invention, which may be embodied in various forms. Therefore, the details disclosed herein are not to be interpreted as limiting, but merely as a basis for teaching one skilled in the art how to make and/or use the invention.
  • In accordance with the present invention, and with reference to FIGS. 1 to 7, a method and system using social media for event-driven trading are disclosed. The present method and system 10 use social media for the real-time evaluation of publicly traded assets, in particular, equities and commodities, using information generated through social media interactions. For example, and with reference to FIG. 5, a series of “tweets” (social comments transmitted using the social networking service Twitter) are shown. As used herein an “asset” is considered to be a resource with economic value that an individual, corporation or country owns or controls with the expectation that it will provide future benefit. Assets include, but are not limited to, investments in equities, options, derivatives, commodities, bonds, futures, currencies, etc. It should further be appreciated that “equities” are stocks or any other securities representing an ownership interest.
  • It is appreciated the following discloses the present method and system 10 with reference to the stock market, although the application of the present invention could be extended to commodities and other asset based markets. By monitoring publicly available social media information, the present system 10 is able to effectively predict swings in asset prices for effective and profitable trading thereof.
  • As will be appreciated based upon the following disclosure, the present method and system 10 provide a sentiment calculator 22 that employs natural language processing in evaluating social media interactions by anticipating the sentiment of traders relating to specific equities and commodities in terms of the polarity of the sentiment and the strength of the sentiment. The data generated by the sentiment calculator 22 is applied to a reaction indicator 31 in the form of a graphical user interface 30 that combines sentiment and frequency (which is indicative of the intensity of the sentiment) data relating to the assets. Once sentiment and frequency are fully appreciated, the present system 10 and method provide a mechanism for cross-correlating the sentiment and intensity data (the perceived strength of the sentiment being expressed by the social media) with the actual fluctuations occurring with the price of assets.
  • Briefly, and in accordance with a preferred implementation of the present invention, the present system 10 provides for the processing of social media messages generating data for the real-time evaluation of publicly traded assets, for example, stocks. The system 10 includes an ingest component 11 for ingesting the social media messages; a filter module 14 eliminating expressions not considered useful language from social media messages; a natural language processor (NLP) 16 processing filtered social media messages; a sentiment calculator 22 applying rules to the filtered and NLP processed social media messages so as to compute a representation of values associated with the filtered and NLP processed social media messages; and a graphical user interface 30 displaying the values generated by the sentiment calculator 22.
  • With reference to FIG. 1, the ingest component 11 consumes, acquires or gathers a wide range of social media messages 12 and immediately filters the messages as will be explained below in greater detail. The ingest component 11 is a data acquisition module. The ingest component 11 allows the system 10 to automatically import raw social media messages, for example, tweets from Twitter or other social media sites. The data, that is, the raw social media messages, is acquired on the basis of a predefined set of keywords or combination of keywords the system 10 has been programmed to look for. The filtered social media messages are then subjected to natural language processing via NLP module 16 based upon lexical databases 18, 20 of both stock specific sentiment terminology (Stock-Lex 18) and general, non-stock specific, sentiment terminology (Sent-Lex 20). The filtered and NLP processed social media messages are next processed by the sentiment calculator 22 and inference engine 24. The sentiment calculator 22 and inference engine 24 apply information from databases 26, 28 respectively relating to the knowledge of the stock market world and the knowledge of the world. The results of the sentiment calculator 22 and inference engine 24 are then presented to the user via a reaction indicator 31 in the form of a graphical user interface upon a computer monitor which displays sentiment per asset information.
  • Sentiment Calculation
  • As discussed above, sentiment calculation is part of the present system 10 for event-driven trading using social media messages. As described above, the system 10 ingests content (that is, social media messages such as tweets) from one or multiple social media sources based on user-specified criteria. The meaning of the information conveyed by the social media messages is determined using a natural language processing (NLP) module 16. The system 10 then calculates “sentiment” and presents metrics relating thereto in real-time.
  • FIG. 5 shows social media messages, for example, “tweets”, with annotations relating to the sentiment scoring for the individual tweets. In this way, sentiment calculations in accordance with the present invention may be used to anticipate the reaction of the traders before they act.
  • In accordance with a preferred embodiment, the sentiment calculator 22 of the present system 10 analyzes social media messages to calculate the sentiment with respect to events pertaining to objects. Objects relate to assets being traded, via situations having a bearing on public sentiment and relating to the value of the asset being traded (preferably on an exchange). It should be appreciated that “object(s)” refers to anything related to an asset that can be publicly traded and monitored. For example, an “iphone” and the stock symbol “AAPL” are objects which relate to the asset Apple Inc., which can be publicly traded.
  • The sentiment calculator 22 represents one module of the present multilayered system 10 for processing short and noisy messages such as tweets, as depicted in the schematic shown in FIG. 1. Proper operation of the sentiment calculator 22, that is, sentiment calculations, requires that a filter module 14 configure input text into formats for use by the subsequent processing modules of the pipeline making up the present invention. The filter module 14 is composed of a set of rules (using regular expressions) created to transform the ingested social media messages into expressions without noise. Noise is considered to be elements in the message which are not part of natural language, such as hash tags, URLs, etc. Therefore, the filter module 14 functions to bring tweets as close as possible to expressions in natural language by eliminating expressions that are not considered part of current language usage. For example, the filter module 14 eliminates URLs and hash tags at the periphery of the tweet, normalizes symbols and abbreviations, for example “q1|1q|1st quarter|” is replaced by “first quarter”. Basically, the filter module 14 eliminates noisy elements from the data being ingested, such as URLs and hash tags so that it may be further processed by the NLP module 16.
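  • A regular-expression filter of the kind described above might look like the following sketch. The three patterns are illustrative assumptions, not the rule set of the filter module 14: one removes URLs, one removes hash tags at the periphery of the message only, and one applies the “first quarter” normalization given as an example in the text.

```python
import re

# Assumed patterns for illustration; the actual rule set is not disclosed here.
URL = re.compile(r"https?://\S+")
EDGE_HASHTAGS = re.compile(r"^(?:\s*#\w+)+|(?:\s*#\w+)+\s*$")
FIRST_QUARTER = re.compile(r"\b(?:q1|1q|1st quarter)\b", re.IGNORECASE)


def filter_message(text):
    """Remove URLs and peripheral hash tags, then normalize quarter abbreviations."""
    text = URL.sub(" ", text)
    text = " ".join(text.split())          # collapse whitespace left by removed URLs
    text = EDGE_HASHTAGS.sub("", text)     # hash tags inside the sentence are kept
    text = FIRST_QUARTER.sub("first quarter", text)
    return " ".join(text.split())
```

  • Hash tags in the middle of a message are left in place because they may act as ordinary words of the sentence; only those at the start or end are treated as noise in this sketch.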
  • Once filtered, the ingested social media messages are then sent to NLP module 16 for further processing. Sentiment calculations in accordance with the present invention require that a Part of Speech (POS) Tagger 33 assign lexical categories to each of the filtered social media messages as they are broken into a stream of text, words, phrases, symbols, or other meaningful elements called tokens, that is, tokenized messages. Sentiment calculations in accordance with the present invention also require that a partial parser (PARS) 32 recover the structure of the main constituents/syntactic structures (a lemmatizer deriving the canonical form of lexical items, that is, a single or a group of words conveying a single meaning, to enable the lexical lookups) of the filtered social media messages.
  • The system also employs MORPH 34, a lemmatizer that reduces the spelling of words to their lexical root or base/lemma form. In English, the base form for a verb is the simple infinitive. For example, the gerund “striking” and the past form “struck” are both forms of the lemma “(to) strike”. The base form for a noun is the singular form. For example, the plural “mice” is a form of the lemma “mouse.” Most English spellings can be lemmatized using regular rules of English grammar, as long as the word class is known. MORPH 34 uses a list of numerous such rules to reduce an ingested and non-filtered word to its base form. In accordance with a preferred embodiment the application MorphAdorner is utilized, the documentation of which, “MorphAdorner, A Java Library for the Morphological Adornment of English Language Texts”, Version 1.0. Apr. 30, 2009, Copyright© 2007, 2009 by Northwestern University, is incorporated herein by reference.
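  • To illustrate rule-based lemmatization with a known word class, the following toy sketch combines a table of irregular forms with suffix-stripping rules checked against a small lemma list. It does not use or reproduce MorphAdorner; the word lists, rule set, and function names are assumptions made for this example only.

```python
# Toy data for illustration; a real lemmatizer would use a full lexicon.
LEMMAS = {"strike", "fall", "plunge", "decline", "rally", "mouse", "stock"}
IRREGULAR = {
    ("struck", "verb"): "strike",
    ("fell", "verb"): "fall",
    ("mice", "noun"): "mouse",
}
SUFFIXES = {"verb": ("ing", "ed", "es", "s"), "noun": ("es", "s")}


def _candidates(word, word_class):
    """Yield possible base forms obtained by stripping a regular suffix."""
    for suffix in SUFFIXES.get(word_class, ()):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            yield stem                    # falls -> fall
            yield stem + "e"              # striking -> strike
            if stem.endswith("i"):
                yield stem[:-1] + "y"     # rallied -> rally


def lemmatize(word, word_class):
    """Return the lemma of word given its class, or the lowercased word if unknown."""
    w = word.lower()
    if (w, word_class) in IRREGULAR:
        return IRREGULAR[(w, word_class)]
    if w in LEMMAS:
        return w
    for candidate in _candidates(w, word_class):
        if candidate in LEMMAS:
            return candidate
    return w
```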
  • Finally, the data composed of the filtered and NLP processed social media messages is supplied to the sentiment calculator 22 that calculates sentiment compositionally in the syntactic context. The process of sentiment calculation also employs an inference engine 24 that fine-tunes sentiment calculations using knowledge of the world. This process for sentiment calculation enables sentiment to be calculated on the basis of a set of rules deriving the polarity of stock events and their strength.
  • The problem of identifying the sentiment of social media messages on asset markets can be detailed as follows:
      • i) the social media messages are short, for example a tweet using Twitter is limited to 140 characters;
      • ii) the social media messages lack several constituents that are normally part of English sentences;
      • iii) the social media messages are noisy, they include characters and expressions that are not part of English sentences;
      • iv) the social media messages may be in a language other than English;
      • v) in some cases, the social media messages are not complete English sentences and truncated messages are observed;
      • vi) reported information, such as headlines, which do not directly convey sentiment, as well as social media messages conveying sentiments are also part of the ingest; consequently sentiments cannot be differentiated from facts;
      • vii) the knowledge of the asset markets world includes constant as well as contingent knowledge; and
      • viii) the sentiment is thus a function of the natural language expressions used in the social media messages in conjunction with the knowledge of these expressions as they are used in asset market exchanges.
  • The fact that tweets are constrained to 140 characters means that messages sent via Twitter begin to resemble programming languages such as Fortran (which originally had a constraint of 72 characters per line). The primary effect of this constraint is a limitation on the freedom available to the author of a tweet as he or she attempts to convey a specific message. This means that it is now possible to envision compiling tweets (akin to compiling a programming language) and achieving very high levels of accuracy in deriving sentiment whilst minimizing resource consumption and interpretation times. It therefore becomes feasible to ingest and process potentially millions of messages per hour using Commercial Off-The-Shelf (COTS) computers.
  • The technical advantage of the present system 10 relative to other known technologies is that the present system 10 is based on natural language processing techniques rather than machine learning techniques (for example, Naive Bayes, maximum entropy classification, and support vector machines), as described for example in Pang and Lee. Bo Pang and Lillian Lee 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), 79-86.
  • As will be appreciated based upon the following disclosure, the rule-based method that is used in accordance with the present invention avoids the shortcomings of the statistical method because it processes the social media messages directly instead of classifying social media messages on the basis of probabilistic algorithms. Another advantage lies in the innovative contribution of the inference engine 24, which helps reduce uncertainty and lends further support to decision making.
  • Because of their limitations, classification algorithms, such as Naïve Bayes classifiers, are not used in the implementation of the present invention. The present invention solves the problem by processing the actual content of the social media messages as they are formulated. It does not calculate the number of positive adjectives in a social media message, or in a set of social media messages, to compute sentiment, contrary to common practice.
  • The features of the present invention that provide a solution or benefit are the following:
      • i) the ingestion of social media messages on the basis of targeted keywords according to the requirements of the stock traders (as discussed below in more detail);
      • ii) the filtering of items that are not part of natural language (English);
      • iii) the tagging of the items in the filtered messages with part-of-speech tags;
      • iv) the recovery of the syntactic structure associated with each social media message;
      • v) the application of the sentiment calculus rules to the output of the syntactic structure on the basis of the sentiment value of the lexical items and how they are syntactically combined;
      • vi) the stock-specific lexical items and phrases of the major lexical categories (event denoting Nouns and Verbs, stock-market specific Adjectives and Adverbs) are associated with lexical sentiment values; and
      • vii) the sentiment calculator applies to pairs of lexical items and their syntactic structures/constituents relating sentiment-marked lexical items in the syntactic configuration where they occur, ensuring the computation of an accurate sentiment-per-asset value.
  • The sentiment calculator 22 is a module of the multilayered architecture employed in accordance with the present system 10, which can be customized for different domains, including, for example, finance, security and pharmaceutics. In the disclosed application in the stock market exchange, the sentiment calculator 22 calculates sentiments from stock exchange-related social media messages 12 in order to predict stock movements before human traders can act.
  • The innovation brought about by the sentiment calculation in accordance with the present invention is the event driven approach to sentiment mining. Unstructured incoming social media messages 12 are processed in order to extract sentiment about pre-specified assets, as they participate in ongoing events. As will be explained below, the sentiment calculator 22 performs event-driven sentiment calculus.
  • The event-driven approach to sentiment mining as applied in accordance with the present invention can be represented in accordance with equation (1), where M stands for Modifier, Ev stands for Event, and x, . . . , z stand for the participants of the event. The asset the sentiment is about is a participant of the event.

  • (M(Ev(x, . . . ,z)))  (1)
  • This relational approach to sentiment mining contrasts with the statistical keyword search approach, classifying messages on the basis of the number of positive or negative qualifiers. The statistical keyword search approach fails to provide sentiment-per-asset values.
  • The present invention takes an event to be a change in the relation between the participants of the event. The participants of an event are: names, organizations, locations, expressions of time, quantities, monetary values, percentages, etc. The present system 10 includes named entity recognition capacities and syntax-semantic capacities to provide the articulation of events and their participants. The interpretation of syntactic structure is generally compositional: that is, the interpretation of the whole is a function of the interpretation of the parts. However, part of the semantics conveyed by natural language is non-compositional and idiosyncratic. The idiosyncratic meanings are listed in lexicons, assuming both generic (Sent-Lex 20) and domain-specific (Stock-Lex 18) lexicons.
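  • One possible way to hold the relational form M(Ev(x, . . . , z)) of equation (1) in memory is sketched below. The class and field names are hypothetical; the sketch only shows that the asset of interest is looked up among the participants of the event rather than counted as a keyword.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Event:
    """Ev(x, ..., z): an event and the participants it relates."""
    name: str
    participants: Tuple[str, ...]


@dataclass(frozen=True)
class ModifiedEvent:
    """M(Ev(x, ..., z)): an event together with an optional modifier."""
    event: Event
    modifier: Optional[str] = None

    def involves(self, asset):
        """True when the asset is one of the participants of the event."""
        return asset in self.event.participants

    def render(self):
        """Render the structure in the functional notation of equation (1)."""
        inner = f"{self.event.name}({', '.join(self.event.participants)})"
        return f"{self.modifier}({inner})" if self.modifier else inner
```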
  • As briefly discussed above, “sentiment” about an asset participating in an event is considered in accordance with the present invention to be the orientation (that is, the polarity in opinions expressed regarding the asset) and the strength of the opinions on that asset that deviates from the normal state. A sentiment is the expression of a psychological state relative to an event (whether that event be static or dynamic). Considering social media messages sent via the social networking site Twitter are limited to 140 characters, lexical items, emoticons and other diacritics found in such messages cannot express the richness of thought and sentiments conveyed by traditional written natural language without further processing. The present system and methodology focus on the properties of natural language employed in the social media messages to calculate the sentiment with respect to given objects in ongoing stock events referred to in social media exchanges.
  • In accordance with the present invention, sentiment is represented by an integer combining a polarity value (polarity positive +, negative −, neutral n) and a strength value ranging within a pre-defined scale. Using data generated by the filtering and natural language processing of the social media messages, the sentiment calculator 22 yields an integer that combines the polarity and the strength values of each pair of expressions relating an asset to an event as explained below in greater detail.
  • Polarity is a value (that is, positive, negative or neutral) that is part of the lexical specification of words and phrases. These values will compose according to the Polarity rules, provided below. Strength is an integer, that is, also part of the lexical specification of the words and phrases. The values for strength in accordance to a preferred embodiment of the present invention range from 1 to 3 (1 low and 3 is high). These numbers will be added in the processing of messages according to the Strength rules, provided below. However, it is appreciated that values for strength could range from 1 to 5 or higher.
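  • The lexical specification described above can be pictured with the following sketch. It assumes, purely for illustration, that the combined integer is the strength carrying the sign of the polarity, with neutral mapped to 0; the encoding actually used by the sentiment calculator 22 and the Polarity and Strength composition rules are not reproduced here. The class name is hypothetical.

```python
from dataclasses import dataclass

# Assumed encoding: "+" -> +1, "-" -> -1, "n" (neutral) -> 0.
POLARITY_SIGN = {"+": 1, "-": -1, "n": 0}


@dataclass(frozen=True)
class LexicalSentiment:
    polarity: str  # "+", "-", or "n"
    strength: int  # 1 (low) to 3 (high), as in the preferred embodiment

    def __post_init__(self):
        if self.polarity not in POLARITY_SIGN:
            raise ValueError("polarity must be '+', '-' or 'n'")
        if not 1 <= self.strength <= 3:
            raise ValueError("strength must be between 1 and 3")

    def as_integer(self):
        """Single integer combining polarity and strength (assumed encoding)."""
        return POLARITY_SIGN[self.polarity] * self.strength
```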
  • The sentiment calculator 22 is embedded as part of the present overall system 10 that ingests social media messages from multiple social media sources based on user-specified criteria. As discussed above, the social media messages go through a filtering layer/module 14 purging the messages of noise (URLs, hashtags, etc.). The results of the filtering of social media messages are tokens which are assigned part of speech tags according to the lexical and contextual properties of the lexical items based upon NLP module 16. A parser 32 then recovers the major constituents/syntactic structures of the tokenized messages. The sentiment calculator 22 takes the annotated partial parses as its input and yields a sentiment-per-asset on the basis of the sentiment values of the lexical items and the sentiment logic, calculating the sentiment of constituents on the basis of the sentiment values of their parts. The sentiment calculator 22 interacts with the inference engine 24 to determine the sentiment with respect to the knowledge of the world.
  • Stock-Exchange Domain
  • For example, and considering the present system 10 as applied in the stock-exchange domain, the sentiment calculator 22 derives sentiment in terms of polarity and strength with respect to objects (for example, assets as referenced by tickers and commodity names) as they participate in ongoing stock events described by the ingested social media messages, for example, tweets, 12. The generic representation in equation (1) as noted above can thus be instantiated by equation (2) for this application.

  • (M(stock-event(stock-object x, . . . ,stock-object z)))  (2)
  • As will be appreciated based upon the following disclosure, stock-market specific lexical items and phrases are qualified in the Stock-Lex 18, and the sentiment calculator 22 applies to pairs of sentiment-marked lexical items compositionally in their syntactic configuration.
  • As discussed above, the sentiment calculator 22 is a module of the pipeline making up the present system 10. The components of this system 10 process incoming social media messages, and yield a sentiment-per-object/asset for each ingested incoming social media message in real-time. The sentiment calculator 22 calculates the sentiment-per-asset for each incoming social media message ingested by the system 10.
  • FIG. 1 represents the three main components of the system: Ingest 11 (the social media messages 12), Process 15 (the social media messages using the filter 14, NLP 16 and sentiment calculator 22), and Display 30 (the results of the processing step on a reaction indicator 31 in the form of a graphical user interface 30). It also identifies the specific NLP components/modules (POS 33, PARS 32, MORPH 34, Stock-Lex 18 and Sent-Lex 20) processing social media messages 12 from the ingest component 11 to the sentiment calculator 22 and its interaction with the inference engine 24 (which includes databases relating to Knowledge of the Stock Market world 23 and Knowledge of the world 25).
  • As discussed above, the architecture of the system 10 is shown in FIG. 1. The following explains the main features of each component of this architecture, where the lexicon, the part of speech tagger (POS) 33 and the parser (PARS) 32 can be parameterized to process different languages. Thus, in addition to the fact that this system 10 can calculate sentiment in different domains, it can also process sentiment cross-linguistically.
  • 1. The Ingest Component
  • Simplex and complex keywords are used for ingesting the social media messages 12, according to the requirements of the stock traders. The hardware used for ingest consists of standard off-the-shelf computers gathering and processing social media messages using the pre-determined keywords. The techniques used for collecting social media messages must take into consideration the requirements of stock traders (see Section 1.1) and must enable the collection of social media messages with respect to specific assets, as described in Section 1.2.
  • 1.1 The Requirements of Stock Traders
  • From a stock trader's perspective, there must be a measurable and significant correlation between sentiment (as manifested in social media messages) and price movement. The correlation can be positive or negative. For example, there is usually a strong positive correlation between the performance of the financial sector and the S&P 500, and there is a negative correlation between volatility and the S&P 500. There also needs to be enough social media volume (for example, tweet volume) to provide confidence that the aggregate sentiment will have enough mass to move the asset price. In many cases, collecting all tweets pertaining to a single stock symbol will NOT meet the volume threshold that would produce a reliable correlation between sentiment and price. This can be mitigated by trading assets that have measured price correlations over an extended period of time, by ingesting and processing tweets that pertain to all price-correlated assets, and then using the sentiment derived from the above described aggregation of tweets to trade each individual asset.
  • 1.2 Collecting Social Media Messages Regarding Specific Assets
  • The set of keywords for specific assets is defined in terms of generic categories that can be parameterized according to the finance domain. Depending on the nature of the asset, different strategies for ingesting a large number of relevant social media messages are used. For example, the following strategies may be employed:
      • (2) single keyword and exclude list of irrelevant combinations; or
      • (3) binary keyword template of the form: object “X”+predicate
  • Strategy (2) is used, for example, for Crude Oil, where only one keyword, “oil”, is used and a very large exclude list includes expressions such as “Soya oil”, “olive oil”, etc. In the case of commodities, such as Gold, strategy (3) is preferred. Strategy (3) employs a large include list made up of both unary and binary expressions including the word “gold”, the object “X”, and another word, a predicate, as in “gold industry”, “gold news”, “gold investor”, “gold invest”, “gold investment”, “gold plunge”, “gold raise”, “gold plunged”, “gold raised”, “gold plunging”, “gold raising”, “gold decline”, “gold declined”, “gold declining”, “gold rally”, “gold rallied”, “gold rallying”, “gold fall”, “gold falls”, “gold falling”, “gold fell”, etc. A large exclude list is still necessary to exclude, for example, jewelry items and colors. Strategy (3) can be used for other commodities by substituting the names of other commodities for the variable in (3) and keeping the set of predicates constant. Thus, a very similar set of keywords may apply to other commodities.
  • This refined keyword strategy is used in conjunction with the methods described above to obtain a sufficiently large number of social media messages and a high degree of correlation between derived sentiment and price movement, thereby meeting the two requirements of sentiment-price correlation and sufficient volume.
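  • Strategies (2) and (3) can be sketched as simple matching functions. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the lists are heavily abbreviated versions of the "very large" lists the text describes, and only the binary part of the strategy (3) include list is modelled.

```python
import re

# Hypothetical, abbreviated lists; the text describes much larger ones.
OIL_EXCLUDE = ["soya oil", "olive oil"]
PREDICATES = ["industry", "news", "investor", "plunge", "rally", "fell"]

def matches_strategy_2(message, keyword, exclude):
    """Strategy (2): a single keyword plus an exclude list."""
    text = message.lower()
    if any(phrase in text for phrase in exclude):
        return False
    # Whole-word match, so "oil" does not fire on "turmoil".
    return re.search(r"\b" + re.escape(keyword) + r"\b", text) is not None

def build_include_list(obj, predicates):
    """Strategy (3): binary template, object "X" + predicate."""
    return [f"{obj} {pred}" for pred in predicates]

def matches_strategy_3(message, obj, predicates, exclude=()):
    """Match any "X predicate" expression unless an excluded phrase occurs."""
    text = message.lower()
    if any(phrase in text for phrase in exclude):
        return False
    return any(expr in text for expr in build_include_list(obj, predicates))
```

    Because the predicate set is held constant, calling `build_include_list("silver", PREDICATES)` yields the corresponding include list for another commodity, which is the substitution the text describes.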
  • The refined keyword strategy for ingesting relevant social media messages increases the volume of ingested social media messages that will be fed into the other components of the system, described in the following paragraphs.
  • 2. The Language Identifier Module
  • The ingested social media messages may include messages in a language other than English. A language identifier/detector 19 is therefore employed to identify the language of an incoming message and assign it a code. For example, the ingested social media message (4) will be assigned the code (5), which stands for English.
      • (4) Gold Rises but Lags as the Dollar Drops Sharply http://bit.ly/da88XX
      • (5) en
  • Language identification is a prerequisite for the NLP processing in accordance with the present invention, as the overt syntactic properties vary between languages, as well as the form and content of the lexical items. It is thus necessary to ensure that the social media messages processed by the NLP module 16 will be English messages, or whatever language the system 10 was parameterized for.
  • 3. The Filter Module
  • The filter module 14 is a pre-NLP processing module that brings social media messages 12 as close as possible to expressions in natural language by eliminating expressions that are not part of current use of language. For example, the filter module 14 eliminates URLs and hash tags at the periphery of the tweet and normalizes symbols and abbreviations; for example, q1|1q|1st quarter is replaced by “first quarter”.
  • The filter module 14 also performs sentence detection on the basis of typographic cues. This is a necessary step in the pre-NLP processing, since social media messages may include more than one sentence, see (6). As the NLP processing and the sentiment calculus are sentence bound, sentence boundary delimitation is necessary. For example, the filter module 14 applies to (7), replaces the URL by a period, converts capitals into lower case, and yields (8):
      • (6) Crude Oil Is Unchanged as US Stocks Decline, China's Processing Surges: The Cisco announcement sent stocks lower . . . _http://bit.ly/dyU40L
      • (7) Gold Rises but Lags as the Dollar Drops Sharply http://bit.ly/da88XX
      • (8) Gold rises but lags as the dollar drops sharply.
        Thus, the filter module 14 takes an ingested social media message as its input and transforms the social media message to a less noisy English expression, which is then subject to NLP processing.
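  • The transformation from (7) to (8) can be sketched with a few regular expressions. This is a minimal sketch, not the filter module itself: the function name and patterns are assumptions, only the q1|1q|1st quarter normalization named in the text is covered, and sentence detection is omitted.

```python
import re

URL_RE = re.compile(r"https?://\S+")
QUARTER_RE = re.compile(r"\b(?:q1|1q|1st quarter)\b", re.IGNORECASE)
TRAILING_HASHTAGS_RE = re.compile(r"(?:\s*#\w+)+\s*$")

def filter_message(message):
    """Turn a raw message into a less noisy sentence-like string."""
    # Normalize the abbreviation variants named in the text.
    text = QUARTER_RE.sub("first quarter", message)
    # Drop hash tags at the end of the message, then replace URLs by a period.
    text = TRAILING_HASHTAGS_RE.sub("", text)
    text = URL_RE.sub(".", text)
    # Lower-case, remove spaces before periods, collapse repeated periods/spaces.
    text = text.lower()
    text = re.sub(r"\s+\.", ".", text)
    text = re.sub(r"\.{2,}", ".", text)
    text = re.sub(r"\s+", " ", text).strip()
    # Restore the sentence-initial capital, as in example (8).
    return text[:1].upper() + text[1:]

RAW = "Gold Rises but Lags as the Dollar Drops Sharply http://bit.ly/da88XX"
CLEAN = filter_message(RAW)
```

    Applied to message (7), the sketch returns the string shown in (8).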
  • 4. The NLP Processor
  • 4.1 The Sent-Lex (Sentiment-Lex)
  • The Sent-Lex 20 is a hand-crafted sentiment-based repository, or database, of the most frequent lexical items and phrases collected from the ingested social media messages, as well as from specialized vocabularies, that are indicative of sentiment. The lexical items and phrases vary according to the domain of application, e.g. finance, security, pharmaceutics, etc. Words that are not sentiment bearing, such as definite articles and auxiliaries, are not part of the Sent-Lex 20. In the present event-driven approach to sentiment mining, sentiment is associated to event denoting verbs and nouns, as well as with sentiment-bearing modifiers of events or of participants of the events.
  • The lexical specifications are designed to be parameterized to specific domains of application. The generic format of the lexical entry includes the lexical item, followed by fields of lexical specifications. The first field specifies the category of the item, the second field specifies its polarity, the third field specifies its lexical strength, and the fourth field specifies the polarity of the semantic arguments of the lexical items and phrases, if applicable.
      • (10) Lexical item, category, polarity, strength, argument's polarity and strength
  • Thus, the lexical items and phrases and their features are stored in a lexical database, that is, the Sent-Lex 20. Each of the lexical items and phrases maintained in the Sent-Lex 20 is associated with a category tag, an inherent polarity value, and an inherent strength value; for some items, polarity and strength values are also associated to designated argument structure variables, as in (11). For example, in the case of the verb acquire, the acquired object, the variable y in that verb's argument structure, is associated with a positive value, as in (12). This is not the case for other verbs such as announce and report. Thus, in (13), Google is associated with a positive sentiment.
      • (11) Categorial tag: NN, VB, RB, . . . .
        • Polarity values: +, −, n
        • Strength values: 1, 2, 3, where 1 is min. and 3 is max.
        • Argument structure values associated to the argument variables: (x, y, z, w)
      • (12) acquire (x, y)
        • +2
      • (13) Apple acquired Google.
        • +2
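  • The lexical entry format in (10) to (13) maps naturally onto a small record type. The sketch below is an assumption about representation only: the class and field names are hypothetical, and the inherent polarity and strength of acquire are left unset because the text does not give them.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class LexEntry:
    item: str
    category: str                       # categorial tag, e.g. "NN", "VB", "RB"
    polarity: Optional[str] = None      # "+", "-" or "n"
    strength: Optional[int] = None      # 1 (min) to 3 (max)
    # Values attached to argument variables (x, y, z, w), e.g. {"y": ("+", 2)}.
    arg_values: Dict[str, Tuple[str, int]] = field(default_factory=dict)

# Entry (12): the acquired object y receives +2.
ACQUIRE = LexEntry(item="acquire", category="VB", arg_values={"y": ("+", 2)})

def argument_sentiment(entry, bindings):
    """Map each bound argument to the value its variable carries."""
    return {bindings[var]: value
            for var, value in entry.arg_values.items()
            if var in bindings}

# Example (13): "Apple acquired Google." binds x=Apple, y=Google.
RESULT = argument_sentiment(ACQUIRE, {"x": "Apple", "y": "Google"})
```

    For a verb such as announce, `arg_values` would simply be empty, so no argument receives a value from the verb.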
  • The categories, nominal (NN), verbal (VB), adjectival (JJ), adverbial (RB) and their sub-categories, are intrinsically associated to polarity (+, −, n), and Strength (1, 2, 3). Furthermore, the lexical specifications differentiate degree modifiers, such as very, too and much from modifiers, such as good and better. Degree-intensifiers contribute their own lexical value, and add an extra value 1 to the category they modify, see (14) below for examples.
      • (14) Sample of the JJ/RB database:
  • Item        Tag   Polarity  Strength  Intensifier
    Several     JJ    n         1
    impressive  JJ    +         2
    More        JJR   n         2
    Most        JJS   n         3
    Good        JJ    +         1
    Better      JJR   +         2
    Best        JJS   +         3
    Very        RB*   n         3         1
    Weak        JJ    −         1
    Weaker      JJR   −         2
    Weakest     JJS   −         3
    So          RB*   n         1         1
    Too         RB*   n         1         1
  • 4.2 The Stock-Lex (Specific Stock Trading Lexicon)
  • In the current application, the Stock-Lex 18 is the stock-based lexical repository, or database, consisting of the most frequent lexical items and phrases used in the ingested social media messages that relate to stock-based knowledge, as well as most frequent items used in stock exchange and financial news wire such as the Financial Post (or other commodity exchange system depending upon the application to which the present system 10 is applied). The Stock-Lex 18 thus includes a restricted set of stock-specific lexical items and phrases, associated with their domain specific polarity and strength values. The polarity values are: positive, negative and neutral. The lexical strength associated to the lexical items and phrases ranges from 1 to 3, where 1 is the lowest value and 3 is the highest value, see (15) for examples.
      • (15) decline, V, −, 2
        • decrease, V, −, 2
        • deleverage, V, +, 3
        • detain, V, −, 1
        • deteriorate, V, −, 2
        • develop, V, +, 1
        • die, V, −, 3
        • dip, V, −, 2
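  • The entries in (15) can be loaded into a lookup table keyed by lexical item. This is a minimal sketch assuming a comma-separated text representation; the function name is hypothetical and the typographic minus sign is normalized to an ASCII hyphen for convenience.

```python
# The sample entries from (15), one per line: item, category, polarity, strength.
STOCK_LEX_SAMPLE = """\
decline, V, −, 2
decrease, V, −, 2
deleverage, V, +, 3
detain, V, −, 1
deteriorate, V, −, 2
develop, V, +, 1
die, V, −, 3
dip, V, −, 2
"""

def parse_stock_lex(text):
    """Return {item: (category, polarity, strength)} from comma-separated lines."""
    lexicon = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        item, category, polarity, strength = [f.strip() for f in line.split(",")]
        # Normalize the typographic minus (U+2212) to an ASCII hyphen.
        lexicon[item] = (category, polarity.replace("\u2212", "-"), int(strength))
    return lexicon

STOCK_LEX = parse_stock_lex(STOCK_LEX_SAMPLE)
```

    A lookup such as `STOCK_LEX.get("dip")` then returns the category, polarity and strength for the item, or `None` for words that are not sentiment bearing.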
  • The stock-specific lexical items and phrases are part of the major lexical and phrasal categories, nominal, verbal, adjectival. Only event denoting nominal and verbal expressions are part of the Stock-Lex 18, and only stock specific adjectival and adverbial modifiers are part of the Stock-Lex 18.
  • Stock objects (tickers, company names, product names, etc.) have a neutral polarity and have no associated strength value. The sentiment calculator 22, as specified below, derives the sentiment with respect to specific stock objects.
  • The Stock-Lex 18 is a repository of the most frequent sentiment-bearing nouns, verbs, adjectives and adverbs used in social media stock market-related exchanges. Each lexical item is associated with a part of speech (POS), a polarity and a strength. The Stock-Lex 18 is handcrafted and contributes to the invention by providing sentiment specifications for event denoting items and their dependents. The innovation is two-fold: i) it specifies sentiment values for categories other than adjectives, contrary to common practice; ii) it specifies sentiment values for event denoting lexical items and their dependents, thus providing the lexical information used by the sentiment calculator 22 for the compositional calculus of the sentiment-per-asset.
  • 4.3 The POS Tagger
  • The sentiment calculus applies to lexical items and phrases in their syntactic context. In order to derive the syntactic context for sentiment calculus, each incoming filtered social media message is tokenized, that is, broken into a stream of words, phrases, symbols, or other meaningful elements called tokens, and each token is assigned a Part Of Speech (POS) by the POS tagger 33. In accordance with a preferred embodiment, Brill Tagger, a known methodology for performing part of speech tagging, is used, as it is sensitive to the lexical properties and distributional properties of lexical items and phrases in natural languages. It is appreciated that Brill Tagger is an “error-driven transformation-based tagger”. It is error-driven in the sense that it relies on supervised learning, and it is transformation-based in the sense that a tag is assigned to each word and changed using a set of predefined rules.
  • The POS tagger 33 is necessary in accordance with the present system 10 to identify the lexical items that contribute to the sentiment calculus, namely adjectives (JJ), adverbs (RB), as well as event denoting verbs (e.g., to upgrade) and nouns (e.g., an upgrade). Thus, the POS identification of the elements of the event structure (16), (17), reduces the complexity of sentiment mining, and contributes to the precision of the sentiment calculus.
  • Figure US20130073480A1-20130321-C00001
  • Thus, the POS Tagger 33 applies to the ingested filtered social media messages, tokenizes the string and assigns parts of speech to the tokens on the basis of a set of lexical and contextual rules, accounting for the distribution of categories in natural language texts. To illustrate, Brill Tagger applies to (18) and derives the annotated tokenized string in (19).
      • (18) Gold Rises but Lags as the Dollar Drops Sharply.
      • (19) Gold/NNP rises/VBZ but/CC lags/VBZ as/IN the/DT dollar/NN drops/VBZ sharply/RB ./.
        • where the Brill tag NNP stands for proper noun, VBZ for verb, CC for conjunction, IN for preposition, DT for determiner, NN for common noun, and RB for adverb.
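  • The annotated string in (19) is straightforward to consume downstream. The sketch below does not run a tagger; it only reads already-tagged output in the token/TAG form shown above (with a space before the final ./. token). The function names are hypothetical, and filtering by tag prefix is a coarse stand-in for identifying event denoting verbs and nouns.

```python
TAGGED = ("Gold/NNP rises/VBZ but/CC lags/VBZ as/IN the/DT "
          "dollar/NN drops/VBZ sharply/RB ./.")

# Tag prefixes for the categories the text says feed the sentiment calculus.
SENTIMENT_TAG_PREFIXES = ("JJ", "RB", "VB", "NN")

def parse_tagged(tagged):
    """Split 'token/TAG' chunks into (token, tag) pairs."""
    pairs = []
    for chunk in tagged.split():
        # rpartition splits on the last "/", so "./." yields (".", ".").
        token, _, tag = chunk.rpartition("/")
        pairs.append((token, tag))
    return pairs

def sentiment_candidates(pairs):
    """Keep tokens whose tag belongs to a sentiment-relevant category."""
    return [tok for tok, tag in pairs if tag.startswith(SENTIMENT_TAG_PREFIXES)]

PAIRS = parse_tagged(TAGGED)
CANDIDATES = sentiment_candidates(PAIRS)
```

    Function words such as but/CC, as/IN and the/DT drop out, which is the reduction in complexity the text attributes to POS identification.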
  • The majority of operating sentiment mining systems detect sentiments only on the basis of mining of adjectives with positive, e.g., good, great, excellent, or negative value, e.g., bad, worse, terrible, and so on. However, other parts of speech also convey sentiment. This is the case of adverbs in the verbal domain, which modify the event (action or state) described by the verbal projection they modify, like adjectives in the nominal domain, which modify the object denoted by the nominal projection. In this relational approach to sentiment mining as applied in accordance with the present invention, sets of POS are related to the elements of event structures; for example, in (16), (17) above, the M can be an adjective JJ or an adverb RB, and the event can be a noun NN or a verb VB. The identification of the POS of the tokens of the filtered social media messages reduces the complexity of sentiment mining and contributes to its efficiency.
  • 4.4 The Parser
  • The tokenized and POS annotated messages resulting from the POS tagger 33 are fed to a partial parser 32 that recovers the main syntactic constituents of the social media messages. The partial parser 32 employs a Cass parser, Abney's cascaded FST (Finite State Transducer), to recover the main syntactic constituents on the basis of the tokenized and POS annotated representations of social media messages, as illustrated in (20).
  • (20) Gold/NNP rises/VBZ but/CC lags/VBZ as/IN the/DT
    dollar/NN drops/VBZ sharply/RB ./.
     [c
      [c0
       [nx
        [name
         [nnp Gold]]]
       [vx
        [vbz rises]]]]
     [cc but]
     [vp
      [vx
       [vbz lags]]
      [pp
       [as as]
       [nx
        [dt the]
        [nn dollar]]]]
     [vp
      [vx
       [vbz drops]]
      [rb sharply]]
     [per .]
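  • Bracketed output of the kind shown in (20) can be read back into a nested structure for later annotation. This is a minimal sketch that assumes well-formed brackets in the `[label child ...]` layout above; it is not the Cass parser, and the function names are hypothetical.

```python
def parse_brackets(text):
    """Parse '[label child ...]' notation into (label, children) tuples.

    Leaves are plain strings. Returns the list of top-level nodes.
    Assumes the input is well formed and starts with "[".
    """
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "["
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip "]"
        return (label, children)

    nodes = []
    while pos < len(tokens):
        nodes.append(node())
    return nodes

def leaves(tree_node):
    """Collect the words under a node, left to right."""
    _label, children = tree_node
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words

NODES = parse_brackets("[vp [vx [vbz drops]] [rb sharply]] [per .]")
```

    The leaves recovered this way are the items that would be looked up in the lexicon before the sentiment calculus is applied to the tree.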
  • Partial parsing is designed for use with large amounts of noisy text. Robustness and speed are primary design considerations. Not all NLP applications require a complete syntactic analysis. Partial parsing is used in information retrieval as well as information extraction applications, such as facts and sentiment mining, where finding simple nominal and verbal constituents is enough. A full parser provides more information than needed, and it is less robust when expected information is missing, as is generally the case in social media messages, where syntactic reductions and truncations are necessary to convey meaning within limited character constraints, for example, 140 characters in the case of tweets on Twitter.
  • The leaves of the parse tree are associated with their sentiment values via access to Stock-Lex 18 and the sentiment calculator 22 applies to the resulting semantically annotated tree. The main properties of the calculator are described in the following section.
  • 5. The Sentiment Calculator
  • A sentiment is an integer, which can be either positive or negative, computed on the basis of the application of the rules of the sentiment calculus to pairs of lexical items in their local syntactic context; for example, nouns (that is, nominal lexical items) representing assets and nouns/verbs/adjectives (that is, nominal, verbal or adjectival lexical items) representing sentiment in the form of polarity and strength. The computed sentiment value ranges within a pre-established scale. In accordance with the present invention, the sentiment calculator 22 uses social media messages for the real-time evaluation of publicly traded equities and commodities wherein a sentiment is a positive or negative integer computed based upon pairs of lexical items in local syntactic context. In its most basic components the sentiment calculator employs a mechanism for determining lexical polarity in social media messages and a mechanism for determining a strength value of lexical items and phrases used in social media messages.
  • The sentiment calculus employed by the sentiment calculator 22 applies to the output of the annotated Cass tree produced by the partial parser 32. It compositionally derives the sentiment associated to entities in the event denoted by the expression they are part of. The sentiment logic is a compositional calculus deriving the sentiment value of a relation on the basis of the sentiment values of its parts.
  • In the specific domain of stock-market exchanges, the sentiment logic calculates sentiment values per asset with respect to stock market events described by the incoming social media messages. Namely, it calculates the sentiment with respect to given assets, as they occur in stock events.
  • As discussed above, the social media messages relating to an asset are gathered by a set of keywords used for ingesting the social media messages. The sentiment calculus is based on the lexical polarity and strength value of the lexical items and phrases defined in the Stock-Lex 18 and how they are syntactically organized in the Cass tree. The maximal local domain for the application of the calculus is the sentence; the minimal local domain is the smallest constituent including the keywords standing for the asset. The sentiment calculus applies locally to the constituents including the asset within the sentences of the message. The Cass parser derives the syntactic constituents of the sentences, including the adjectival (cx), as well as the nominal (nx) and the verbal (vx) constituents.
  • The polarity and strength rules apply to syntactic constituents in head-complement, modifier-modified, and subject-predicate relations, which are identified on Cass trees. These relations are defined as follows. A head of a constituent is a lexical item, such as a verb, e.g., hit, or a noun, e.g., acquisition, that makes the constituent it is part of verbal (vx) or nominal (nx). A head selects a complement, which is a syntactic constituent such as a nominal phrase, e.g. the market in hit the market, and AAPL in the acquisition of AAPL. A modifier is an adjective or an adverb that modifies another constituent, a nominal constituent in the first case and a verbal constituent in the other case, e.g., strong market and strongly hit the market. The subject-predicate relation is the relation between a subject, generally a nominal constituent, and a predicate, generally a verbal constituent, e.g., in the sentence AAPL hits the market, AAPL is the subject and hits the market is the predicate.
  • The sentiment calculus includes separate rules for calculating the polarity and the strength. They have the generic form of dyadic operators (Op (arg1, arg2)), and their specific form is dependent on the relation between arg1 and arg2, as well as the lexical polarity and strength values of the lexical items and phrases specified in the Stock-Lex 18.
      • Polarity (Pol): Pol (arg1, arg2), where arg1 is a head and arg2 is a dependent. The rule applies locally in syntactic constituents/domains, e.g., nx, vx, cx, etc. It derives the polarity of constituents on the basis of the polarity of their parts and how they are syntactically related. The polarity rules apply in three universal syntactic relations defined above (that is, head-complement, modification (modifier-modified), and predication (subject-predicate) relation), according to the polarity of the parts of the relations. The Polarity rules include the following:
    Pol Rules:
  • Pol ([x] [y])=Compose ([x], [y]) as specified by the following rules:
      • (21) if (x is NEG) and (y is +), then Pol (y=−) NEG, +=− no upgrade
        • if (x is NEG) and (y is −), then Pol (y=n) NEG, −=n not bad
        • if (x is NEG) and (y is n), then Pol (y=n) NEG, n=n no report
      • (22) if (Pol (x)=Pol (y)), then
        • if (x is n) and (y is n), then Pol (y=n) n, n=n average result
        • if (x is +) and (y is +), then Pol (x=+) +, +=+ announce an upgrade
        • if (x is −) and (y is −), then Pol (y=−) −, −=− downgrade to sell
      • (23) if (Pol (x)≠Pol (y)), and
        • if (x is +) and (y is n), then Pol (y=+) +, n=+ impressive report
        • if (x is +) and (y is −), then Pol (x=−) +, −=− impressive downgrade
        • if (x is −) and (y is n), then Pol (y=−) −, n=− weak report
        • if (x is −) and (y is +), then Pol (y=−) −, +=− missed rally
        • if (x is n) and (y is +), then Pol (y=+) n, +=+ average upgrade
        • if (x is n) and (y is −), then Pol (y=−) n, −=− average depreciation
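  • Rules (21) to (23) reduce to a small decision function. The following is a minimal sketch: the function name is hypothetical, polarities are written as the strings "+", "-" and "n", and "NEG" marks a negation in first position as in (21).

```python
VALID_POLARITIES = {"+", "-", "n"}

def compose_polarity(x, y):
    """Combine the polarities of two related constituents per rules (21)-(23)."""
    if y not in VALID_POLARITIES or x not in VALID_POLARITIES | {"NEG"}:
        raise ValueError(f"unsupported polarity pair: {x!r}, {y!r}")
    if x == "NEG":
        # Rules (21): negation flips "+" to "-"; "-" and "n" both become "n".
        return "-" if y == "+" else "n"
    if x == y:
        # Rules (22): equal polarities are preserved.
        return x
    if "-" in (x, y):
        # Rules (23): whenever one side is negative, the result is negative.
        return "-"
    # Remaining case in (23): "+" combined with "n" is positive.
    return "+"
```

    For instance, `compose_polarity("+", "-")` returns "-", the value rule (23) gives for "impressive downgrade", and `compose_polarity("NEG", "-")` returns "n", as for "not bad" in (21).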
  • Strength (Str):
  • Str (arg1, arg2), where arg1 is a head and arg2 is a dependent. The rule applies locally in syntactic constituents/domains, e.g., nx, vx, cx, etc. It derives the strength of constituents on the basis of the strength of their parts and how they are syntactically related by the application of an arithmetic operation to the pair of arguments depending on the nature of the syntax-semantic relation and the polarity of the constituents. The strength rules apply to the lexical items and phrases in the three universal syntactic relations, and the strength is calculated on the basis of elemental arithmetic operations. The Strength rules include the following:
  • Str Rules:
  • Function (arg1, arg2), where arg1 is a head and arg2 is its dependent
    Str ([x] [y])=Compose ([x], [y]) as specified by the following rule:
      • (24) if (x is the head (h)) and (y is the complement (o)), then Str (x)+Str (y)
        • if (x is the head (h)) and (y is the modifier (m)), then Str (x)+Str (y)
          Function (arg1, arg2), where arg1 is a modifier and arg2 is the modified
          Str ([x] [y])=Compose ([x], [y]) as specified by the following rules:
      • (25) if (x is JJ, RB) and (y is NN, VB), then Str (x)+Str (y)
        • if (x is a JJ, RB) and (y is a JJ, RB), then Str (x)+Str (y)
        • if (x is RB*) and (y is a JJ, RB), then Str (x)+Str (y)
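  • In every case listed in (24) and (25) the strength of the pair is the sum Str (x)+Str (y), so the strength of a constituent can be sketched as a fold over its parts. The function names are hypothetical; treating items without a strength value as contributing nothing is an assumption, and the extra intensifier value described in Section 4.1 is not modelled here.

```python
from functools import reduce

def compose_strength(x, y):
    """Rules (24)-(25): the strength of a related pair is Str(x) + Str(y)."""
    return x + y

def constituent_strength(strengths):
    """Fold the pairwise rule over a constituent's parts.

    None marks an item with no strength value (assumed to contribute 0).
    """
    present = [s for s in strengths if s is not None]
    return reduce(compose_strength, present, 0)

# A head of strength 3 with a modifier of strength 2.
PAIR = compose_strength(3, 2)
# Same values inside a constituent that also contains unvalued items.
TOTAL = constituent_strength([None, 3, None, 2])
```

    Polarity is handled separately by the Pol rules; only the magnitude is computed here.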
  • It is appreciated that social media messages may include more than one sentence, may talk about more than one asset or more than one stock event, and may express more than one sentiment. If the sentiment values of all the lexical items and phrases of a social media message are computed blindly, the resulting value is general and not necessarily asset specific. The sentiment calculator 22 is sentence bound. Moreover, it calculates sentiment in the local syntax-semantic domain of an asset. Thus, it ensures that the specific sentiment with respect to a given asset conveyed by a message is calculated. It applies iteratively in the local domain of the constituent including the asset (keyword, set of keywords), e.g. OIL, or GOLD, and the expression of a stock event (e.g., “lose”, “gain”, “sell”, “buy”) or a sentiment (e.g., “high”, “low”).
  • The following trace for the tweet (26) illustrates the application of the sentiment calculator 22 that calculates sentiment-per-asset in the local domain of the targeted asset: Oil. The calculus assigns the value +3 to Oil, discarding the value of the computation for Canadian dollar, which is −5.
      • (26) Canadian dollar falls for second week. Crude Oil prices raises.
  • [root {oil: Positive,3.0,null}
     [sen {_: Negative,5.0,null}
      [c {_: Negative,5.0,null}
       [c0 {_: Negative,3.0,null}
        [nx {_: Null,null,null}
         [jj [{_: Null,null,null}] (Canadian)]
         [nn [{_: Null,null,null}] (dollar)]
        ]
        [vx {_: Negative,3.0,null}
         [vbz [{_: Negative,3.0,null}] (falls)] <<<< {−}
        ]
       ]
       [pp {_: Neutral,2.0,null}
        [in [{_: Null,null,null}] (for)]
        [nx {_: Neutral,2.0,null}
         [jj [{_: Neutral,2.0,null}] (second)]
         [tunit [{_: Null,null,null}] (week)]
        ]
       ]
      ]
      [per [{_: Null,null,null}] (.)]
     ]
     [sen {oil: Positive,3.0,null}
      [c {oil: Positive,3.0,null}
       [c0 {oil: Positive,3.0,null}
        [nx {oil: Null,0.0,null}
         [jj [{_: Null,null,null}] (Crude)]
         [nn [{oil: Null,0.0,null}] (oil)] <<<< {K}
         [nns [{_: Null,null,null}] (prices)]
        ]
        [vx {_: Positive,3.0,null}
         [vbz [{_: Positive,3.0,null}] (raises)] <<<< {+}
        ]
       ]
      ]
     ]
    ]
  • This example shows that every step of the computation by the modules of the system provides the structure for the application of the sentiment calculus. This calculus applies in local syntactic domains and provides an integer that represents the sentiment (polarity and strength) with respect to designated assets.
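  • The sentence-bound, per-asset behaviour seen in the trace for (26) can be approximated as follows. This is a simplified sketch: it flattens each sentence into a token list instead of walking the Cass tree, the function names are hypothetical, and the lexical values are copied from the trace.

```python
def compose(a, b):
    """Combine two (polarity, strength) pairs; None marks 'no value'."""
    if a is None:
        return b
    if b is None:
        return a
    (pol_a, str_a), (pol_b, str_b) = a, b
    if pol_a == pol_b:
        pol = pol_a
    elif "-" in (pol_a, pol_b):
        pol = "-"
    else:
        pol = "+"
    return (pol, str_a + str_b)

def sentence_sentiment(tokens):
    """Fold the values of a sentence's tokens, left to right."""
    value = None
    for _word, sentiment in tokens:
        value = compose(value, sentiment)
    return value

def asset_sentiment(sentences, keyword):
    """Keep only the sentences whose tokens include the asset keyword."""
    return [sentence_sentiment(tokens) for tokens in sentences
            if any(word.lower() == keyword for word, _ in tokens)]

# Tweet (26), one token list per sentence, with values taken from the trace.
TWEET = [
    [("Canadian", None), ("dollar", None), ("falls", ("-", 3)),
     ("for", None), ("second", ("n", 2)), ("week", None)],
    [("Crude", None), ("Oil", None), ("prices", None), ("raises", ("+", 3))],
]
OIL = asset_sentiment(TWEET, "oil")
```

    The first sentence evaluates to a negative value of strength 5 but is discarded for the asset Oil, which receives only the positive value of strength 3 from its own sentence.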
  • 6. The Inference Engine
  • Inference engines are part of expert systems, which are designed to process a problem expressing an uncertainty with respect to a decision, and to provide a decision, or a set of decisions, reducing the uncertainty. An inference engine attempts to provide an answer to a problem, or to clarify uncertainties, where normally one or more human experts would need to be consulted.
  • The inference engine 24 of the present system 10 is part of the pipeline and provides a mechanism to sharpen the accuracy of the sentiment computed, by bringing both knowledge of the stock market world 23 and knowledge of the world 25 into the computation.
  • The inference engine 24 includes a data structure, and a set of inference rules (if X then Y) relating facts to sentiments. This knowledge interacts with the domain-specific knowledge stored in the lexicon and used by the sentiment calculator 22.
  • The inference engine 24 includes a data structure, a knowledge base that uses some knowledge representation structure to capture the knowledge of a specific domain, for example a relational table relating entities in knowledge domains, and a set of inference rules applying to the entities in the relational table and drawing consequences. One advantage of inference rules over traditional programming is that inference rules use reasoning, which more closely resembles human reasoning. In the specific application of stock-market trade, the knowledge base consists of a relational table relating stock entities (tickers, company names, products, etc.), stock events (e.g., upgrade, downgrade) and facts, extracted from news wire. The rules of the inference engine 24 apply to the elements of the relational table and infer sentiment values.
      • (27) Damn you OPEC! Will this be the summer we finally see $5/gal at the pump??? I sure hope not. Kills any similar fun from last summer
  • For example, the knowledge base includes (28) below, and the inference rules (29) below, stating that if the price of gas (at the pump) is below $3 then the sentiment value is positive, +2, and if it is above $3 then the sentiment value is negative, −2. This real world knowledge varies according to time and place.
      • (28) OPEC, oil, $X/gal, locations
      • (29) In “$X/gal” expressions, where X is a digit
        • if X is less than 3 then polarity=+, and strength is 2
        • if X is greater than 3 then polarity=−, and strength is 2
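  • Rule (29) can be sketched as a pattern match followed by a threshold test. This is a minimal sketch: the function name and regular expression are assumptions, decimal prices are accepted as an extension of the "digit" wording, and the case of exactly $3 returns no value because the rule does not cover it.

```python
import re

# Matches "$X/gal" expressions such as "$5/gal" (decimals allowed by assumption).
PRICE_PER_GALLON = re.compile(r"\$(\d+(?:\.\d+)?)/gal")

def infer_gas_sentiment(message, threshold=3.0):
    """Apply rule (29): below the threshold is (+, 2), above it is (-, 2)."""
    match = PRICE_PER_GALLON.search(message)
    if match is None:
        return None          # the rule does not apply to this message
    price = float(match.group(1))
    if price < threshold:
        return ("+", 2)
    if price > threshold:
        return ("-", 2)
    return None              # exactly at the threshold: not specified by (29)

MESSAGE_27 = ("Damn you OPEC! Will this be the summer we finally see $5/gal "
              "at the pump??? I sure hope not. Kills any similar fun from last summer")
INFERRED = infer_gas_sentiment(MESSAGE_27)
```

    Since the text notes that this knowledge varies with time and place, the threshold is a parameter rather than a constant.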
  • The sentiment calculator 22 alone would not derive the negative sentiment associated to the second sentence in (27). While the sentiment calculator 22 assigns the value neutral to questions, the inference engine 24 assigns the sentiment value of −2.
  • Thus, the inference engine 24 ensures that the sentiment is grounded in the real world. It contributes to the innovative technology, which both simplifies and sharpens decision making in stock market transactions.
  • Sentiment calculations in accordance with the present system 10 are a result of the multilayer pipeline embodied by the present invention, which ingests social media messages, identifies the language of the social media messages, and filters out elements that are not part of the natural language for which the system 10 has been parameterized (here English). The POS tagger 33 and the partial parser 32 modules of the NLP processor 16 assign parts of speech to the tokens of sentences, and recover the structure they are part of. The sentiment calculus of the sentiment calculator 22 applies to the annotated structures and derives the sentiment value per asset based on the sentiment value of the event they are part of. Finally, the inference engine 24 reduces uncertainty by relying on a relational database including knowledge of the world information and a set of inference rules.
  • The present sentiment calculation system includes a computer-implemented mechanism for obtaining and converting ingested unstructured social media messages regarding a plurality of objects/assets being tracked into a sentiment value for each object/asset. The sentiment value includes a polarity value and a strength value derived from a natural language processing algorithm containing a database of lexical items and phrases related to the objects being tracked. The precise sentiment value per object is derived by the compositional calculus based on the sentiment values of lexical items (and phrases) and their syntactic organization. The contextual sentiment value is based on the inference engine 24 deriving a sentiment value with respect to knowledge of the world. The interaction of the sentiment calculus and the inference engine 24 yields accurate sentiment in real-time. The cognitive-based sentiment calculus relates conceptual processing with natural language processing algorithms.
  • 7. Reaction Indicator
  • As discussed above, the data generated by the sentiment calculator 22 is applied to a graphical user interface 30 that combines sentiment and intensity data relating to the assets. The graphical user interface 30 includes moving graphic objects displayed upon a monitor that depict social media market sentiment; a timeline slider object 46; and a vertical bar chart object 44.
  • In accordance with the present invention, the graphical user interface 30 provides for the visualization of graphic objects in the form of moving spheres 40 where the sphere size and color depict social media market sentiment. The moving spherical graphic objects 40 shrink and grow based on intensity changes. The sphere color changes based on social media sentiment polarity. The center sphere 40 a represents the weighted sentiment average. Clicking one of the moving spherical graphic objects 40 results in the display of a chart 42 (see FIGS. 6A & 6B) graphing (based on what the trader selects) all or a choice of price, volume, social media frequency, social media sentiment, cross-correlation and a variety of price and sentiment derived technical indicators. Sphere updates are based on a configurable polling time.
  • The graphical user interface 30 contains a time slider 46 to go back to a point in time and replay history. A vertical bar chart 44 graphs the social media sentiment when the graphical user interface 30 is in full screen mode.
  • The purpose of the reaction indicator 31 is to provide a mechanism wherein hundreds of assets can be tracked, but only those that are “interesting” based on preprogrammed parameters will float to the surface and draw the viewer's attention.
  • More particularly, and with reference to FIGS. 2, 3, 4, 6 and 7, the reaction indicator 31 provides a graphical user interface 30 displaying three graphical areas of objects, moving spherical graphic objects 40, a timeline slider object 46 and a vertical bar chart object 44. It is noted the moving graphic objects may take shapes other than spheres, such as squares. Referring to FIG. 2, the spherical moving graphical objects are represented at 40, the timeline slider object at 46 and the vertical bar chart object at 44.
  • The reaction indicator polls a data stream containing mathematically computed values for social media intensity, social media sentiment, social media frequency, social media weighted average frequency and social media weighted average sentiment, and auto-refreshes the moving spherical graphic objects 40 and the vertical bar chart object 44 based on a configurable polling time. Intensity is defined as the ratio of short term frequency to long term frequency. The mathematical computations for the data stream are calculated by an algorithm discussed herein in detail in a section related to cross correlation. The calculations are based upon information obtained from the multilayer pipeline architecture previously discussed.
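  • The intensity ratio can be sketched directly from its definition. The following is an illustration under assumptions: frequency is taken to mean messages per unit of time, and the function name, parameter names and window lengths are hypothetical.

```python
def intensity(short_count, short_window, long_count, long_window):
    """Ratio of short-term message frequency to long-term message frequency.

    Counts are numbers of messages; windows are durations in the same unit.
    Returns None when the long-term frequency is zero (ratio undefined).
    """
    if short_window <= 0 or long_window <= 0:
        raise ValueError("window lengths must be positive")
    short_freq = short_count / short_window
    long_freq = long_count / long_window
    if long_freq == 0:
        return None
    return short_freq / long_freq

# Hypothetical figures: 30 messages in the last 5 minutes,
# 120 messages in the last 60 minutes.
BURST = intensity(30, 5, 120, 60)
```

    A value above 1 indicates that messages are currently arriving faster than the long-term rate, which under this reading is what would make a sphere grow.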
  • Referring to FIG. 7, the moving spherical graphic objects 40 shrink and grow based on the social media intensity attribute and are sized relative to each other taking into consideration the stage size and browser screen resolution. The color of the moving spherical graphic object 40 is based on social media sentiment polarity where polarity is defined as negative, neutral or positive. Each of the moving spherical graphic objects 40 displays a label, social media sentiment and social media frequency.
  • The center sphere object 40 a visualizes a weighted average of all sphere objects based on weights assigned to the spheres. Referring to FIG. 3, the weighted sphere object is represented at 40 a. The size of the weighted average sphere is static, unlike the other sphere objects, which shrink and grow. It displays weighted average social media sentiment and weighted average social media frequency if sphere weights have been assigned. If sphere weights have not been assigned, the weighted average sphere object does not display any data. The weighted average sphere object does not change color to reflect social media sentiment polarity. An example where weights may play a role is the instance where the visualization represents an Exchange Traded Fund (ETF). An ETF holds assets such as stocks, commodities or bonds. The assets would be represented in the spheres. The weight assigned to each asset would represent that asset's percentage of the ETF's total holdings.
  • The timeline slider object 46 visualizes a timeline where the date and time on the left represent the earliest date and time where data exists for the collection of moving spherical graphic objects 40. The date and time on the right represent the current date and time. Moving to various points on the timeline slider object 46 moves the moving spherical graphic objects 40 and the vertical bar chart object 44 to a point in time, pausing the real-time display, then replaying history. From the historical point in time selected, the moving spherical graphic objects 40 and the vertical bar chart object 44 will poll the data stream coming from the sentiment calculator 22 for social media intensity, social media sentiment, social media frequency, social media weighted average frequency and social media weighted average sentiment from the point in time selected, then rerun history as if it were happening in real time. Referring to FIG. 2, the timeline slider object is represented at 46.
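  • A minimal sketch of the weighted average shown in the center sphere follows. The function name, the ticker symbols, the sentiment values and the weights are all hypothetical; the only behavior taken from the description is that nothing is displayed when no weights have been assigned.

```python
def weighted_average(values, weights):
    """Weighted mean of per-asset values; None when no weights are assigned."""
    total = sum(weights.get(asset, 0.0) for asset in values)
    if total == 0:
        return None  # mirrors "does not display any data" when weights are missing
    return sum(v * weights.get(asset, 0.0) for asset, v in values.items()) / total

# Hypothetical ETF with three holdings and their portfolio percentages.
sentiment = {"AAA": 0.6, "BBB": -0.2, "CCC": 0.1}
weights = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}
center = weighted_average(sentiment, weights)
```

  • The same function could be applied to per-asset frequency to obtain the weighted average frequency.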
  • The vertical bar chart object 44 utilizes the same data stream as the moving spherical graphic objects 40 to graph social media frequency, using the same color scheme as the spherical objects. Referring to FIG. 2, the vertical bar chart object is represented at 44.
  • Clicking on a moving spherical graphic object 40 will launch a chart graphing price, volume, social media sentiment, social media frequency, and cross-correlation, auto-refreshing at a configurable interval (e.g., every second), as seen in the screen shots depicted in FIGS. 6A and 6B.
  • Each of the moving spherical graphic objects 40 displays a symbol, such as an exclamation mark within the sphere, preferably in the center, when an alert has been triggered. Specifically, when sentiment and intensity variables cross certain thresholds, the related moving spherical graphic object displays an exclamation mark, signaling a potential trading opportunity. For example, when the sentiment and intensity for a given asset A exceed a preprogrammed value indicating sell, an exclamation mark will be displayed in the center of sphere A, alerting the operator to take action. The operator shall have the option of directly executing a trade via a combination of keyclicks. The operator can program the reaction indicator 31 to automatically place a trade. The operator can also program the reaction indicator 31 to send an alert via e-mail or text message.
  • In summary, the reaction indicator 31 comprises a plurality of moving graphic objects 40 which change size and color based upon social media market sentiment, intensity and frequency captured and correlated in real-time from a stream of online social media messages related to a market segment. The moving spherical graphic objects 40 shrink or grow in size based upon the social media intensity attributed to each moving spherical graphic object 40, and the moving spherical graphic objects 40 change color based upon whether the social media sentiment attributed to each moving spherical graphic object is positive, negative or neutral. The reaction indicator 31 also provides a weighted average of all displayed moving spherical graphic objects 40, based on weights assigned to the objects prior to capturing social media streams; this weighted average is displayed among the plurality of displayed objects.
  • Sentiment, Intensity Cross-Correlation
  • As discussed above, once sentiment and intensity are fully appreciated, the present system and method provides a mechanism for cross-correlating the sentiment and intensity data with the actual fluctuations in asset prices. The present invention provides two methods to find patterns in a target real-valued time series by utilizing two other real-valued time series derived from a stream of social-media messages (Twitter for instance): sentiment and frequency.
      • The target is arbitrary. It represents a quantifiable property of the asset that is being tracked. For instance, we have applied the algorithm using stocks and commodities as assets, and their market prices as targets.
      • The sentiment, as defined previously, is relative to the asset underlying the target.
      • The frequency represents the volume of messages about the asset. It is derived from the sentiment time series and a parameter called the window size.
  • When supplied with a window size and applied in real time, those methods have a predictive value on the target. For this reason the series used to find patterns in the target, such as the sentiment series and the frequency series, are called predictive. As shown in FIG. 4, the patterns can be depicted graphically on charts, together with the time series, to be used as a decision-making tool.
  • The patterns can also serve as the input to an automated trading system to generate trading signals.
  • In the example shown in FIG. 4, the curves are a depiction of the sentiment time-series for the target (thick curve labeled ss) and the sentiment-frequency time series (thin curve sf). The calculation of the sentiment-frequency series will be described later.
  • From a visual inspection of the picture it is easy to see that the target is reproducing the bell pattern the sentiment-frequency curve had earlier. This improves the ability to predict the future movement of the target. Looking at the sentiment time series ss for the target only, it seems the target is dropping sharply. However, using the pattern of the sentiment-frequency, one can anticipate that the target will soon experience a rather important rebound. This is the predictive value of the method. A visual inspection of FIGS. 6A and 6B will reveal that sentiment, despite NOT being derived from price, can show extremely strong correlation to price, either as a leading indicator or a supporting indicator, both scenarios being extremely relevant and useful to stock traders.
  • As will be appreciated based upon the following disclosure, the method of the present invention finds patterns in a target real-valued time series by utilizing sentiment and frequency derived from a stream of social-media messages, wherein the target represents a quantifiable property of an asset being tracked. The method includes identifying a target, which is a sampled real-valued time series; generating a sentiment time series, ss (which is plotted); generating a frequency time series plot, sf (which is plotted); and determining a pattern based upon the sentiment time series and the frequency time series.
  • Formal Definitions
  • A real-valued time series is defined as a sequence of pairs (time (t), value (s)), also called points, ordered by increasing time. A simple time series could look like this: [(12:36,27),(13:03,37),(16:34,88)].
  • Formally, the space of time series is defined as $T_s = \mathcal{P}_F(\mathbb{R} \times \mathbb{R})$, that is, the set of finite subsets of $\mathbb{R} \times \mathbb{R}$, whose elements are endowed with the total order $< \,:\, ((t,s),(t',s')) \in (\mathbb{R} \times \mathbb{R})^2 \mapsto t < t' \in \{\text{true}, \text{false}\}$.
  • Using the order <, each series $s \in T_s$ is naturally mapped to the vector $V(s) \in (\mathbb{R} \times \mathbb{R})^{\#(s)}$ such that $v_i$ is the ith element of s. The vector of first components will be denoted by $V_1(s)$ and the vector of second components by $V_2(s)$.
  • For example,

  • s = {(12:36,27),(13:03,37),(16:34,88)}

  • V(s) = [(12:36,27),(13:03,37),(16:34,88)]

  • V_1(s) = [12:36,13:03,16:34]

  • V_2(s) = [27,37,88].
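  • The mapping from a series to its vectors can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, and times are written as zero-padded "HH:MM" strings so that plain string ordering matches time ordering.

```python
def as_vector(series):
    """V(s): the points of a series (a set of (time, value) pairs) ordered by time."""
    return sorted(series, key=lambda point: point[0])

def times(series):
    """V_1(s): the vector of first components."""
    return [t for t, _ in as_vector(series)]

def values(series):
    """V_2(s): the vector of second components."""
    return [v for _, v in as_vector(series)]

# The example series from the text, stored as an unordered set of points.
s = {("12:36", 27), ("13:03", 37), ("16:34", 88)}
```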
  • A semantic distinction is drawn between pulsated time series where points represent a punctual event (i.e., sequence of Diracs), such as the arrival of a message, and sampled time series that represent a discretization of a function that's defined at all times, such as the market price. It is thus natural to interpolate points of a sampled time series to try and recover the original function it was sampled from.
  • The target is an arbitrary sampled real-valued time series. The algorithm has been applied with prices as target.
  • The sentiment time series ss is generated by the Natural Language Processing (NLP) module 16. It is a pulsated time series. For each message in the input stream, the sentiment time series contains a pair whose time is the time when the message was posted, and whose value is the result of the NLP processor 16. This value is called sentiment.
  • The frequency time series sf depends on two parameters: the sentiment time series and a positive number w representing a time called window size. It is a pulsated time series. For each point (t, s) in the sentiment series, the frequency series contains a point (t, f) where f is the number of points in the sentiment series in the time range [t−w, t], divided by w. This number f is called frequency.
  • Formally,
  • $f(t) = \dfrac{\#(s_s \cap [t-w,\,t])}{w}, \qquad s_f = \{(t, f(t)) \mid t \in V_1(s_s)\}$
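  • The frequency series defined above can be sketched in Python as shown below. The function name and the sample sentiment points are hypothetical; times are plain numbers (for instance seconds) and the window [t−w, t] is treated as closed at both ends, as in the definition.

```python
from bisect import bisect_left, bisect_right

def frequency_series(sentiment_series, w):
    """s_f: for each point (t, s) of s_s, the point (t, #(s_s in [t-w, t]) / w)."""
    ts = sorted(t for t, _ in sentiment_series)
    out = []
    for t in ts:
        # Number of message times falling in the closed window [t - w, t].
        count = bisect_right(ts, t) - bisect_left(ts, t - w)
        out.append((t, count / w))
    return out

# Made-up sentiment points (time in seconds, sentiment value).
ss = [(0, 0.2), (30, -0.1), (50, 0.4), (55, 0.5), (60, 0.3)]
sf = frequency_series(ss, w=60)
```

  • Because every point lies in its own window, each frequency value is strictly positive.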
  • A pattern P is defined as a cross-correlation c in [−1,1], a positive window size w, a time lag l, and a time ts. These numbers are interpreted as “the predictive series over [ts−w, ts] correlates to the target series over [ts−w+l, ts+l] with a cross-correlation of c”.
  • Formally, a pattern is thus an element of $[-1,1] \times \mathbb{R}^{+} \times \mathbb{R} \times \mathbb{R}$.
  • If the lag is positive, it is said to be predictive. The cross-correlation determines the relevance of the pattern: the higher it is, the more relevant the pattern is considered.
  • Pattern Identification Method
  • The method is called the sentiment-frequency method. It uses the sentiment to create a sentiment-frequency series, and correlates the latter to the target using a plain statistical cross-correlation. It then identifies patterns by finding the optimal time lag.
  • Correlating two time-series using a plain statistical cross-correlation and finding the optimal lag is an independent component. This component is called the series correlator and is described below.
  • Sentiment-Frequency Method
  • The system first creates an average sentiment series sa such that for every point (t,s) in the sentiment time series ss there is a point (t, a) in the average sentiment series where a is the arithmetic average of all the sentiments in the time range, or interval [t−w, t].
  • Formally let,
  • $A_w : t \mapsto \dfrac{\sum_{(t',s) \in s_s \cap [t-w,\,t]} s}{\#(s_s \cap [t-w,\,t])}$
    $s_a = \{(t, A_w(t)) \mid t \in V_1(s_s)\}$
  • The system then creates the sentiment-frequency series ssf to contain a point (t, vsf) for every (t, a) in the average sentiment series sa and (t, f) in the frequency time series sf, where $v_{sf} = f^{a}$ $(= e^{a \ln f})$.
  • Formally, define:

  • $s_{sf} = \{(t, f^{a}) \mid (t,a) \in s_a,\ (t,f) \in s_f\}$
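  • As a hedged illustration, the construction of the average sentiment series and the sentiment-frequency series can be sketched in Python. All function names and the sample data are hypothetical, and the sketch assumes the sentiment series is a list of (time, sentiment) pairs with distinct times.

```python
def window_points(series, t, w):
    """Points of the series whose time lies in the closed window [t - w, t]."""
    return [(u, v) for u, v in series if t - w <= u <= t]

def average_sentiment_series(ss, w):
    """s_a: for each (t, s) in s_s, the point (t, mean sentiment over [t - w, t])."""
    out = []
    for t, _ in ss:
        vals = [v for _, v in window_points(ss, t, w)]
        out.append((t, sum(vals) / len(vals)))  # the window always contains (t, s)
    return out

def frequency_series(ss, w):
    """s_f: for each (t, s) in s_s, the point (t, count over [t - w, t] divided by w)."""
    return [(t, len(window_points(ss, t, w)) / w) for t, _ in ss]

def sentiment_frequency_series(ss, w):
    """s_sf: the point (t, f ** a) for matching points (t, a) in s_a and (t, f) in s_f."""
    sa = average_sentiment_series(ss, w)
    sf = frequency_series(ss, w)
    return [(t, f ** a) for (t, a), (_, f) in zip(sa, sf)]

# Made-up sentiment points (time, sentiment).
ss = [(0, 1.0), (1, 0.0), (2, -1.0)]
ssf = sentiment_frequency_series(ss, w=2)
```

  • Since f is always strictly positive, f raised to the power a is well defined for any real average sentiment a.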
  • Next the series correlator is applied to the sentiment-frequency series and the target.
  • Series Correlator
  • The series correlator produces a set of patterns based on a real-valued pulsated time series sp, a real-valued sampled time series ss, an interpolation method I for ss, and a window size w.
  • The interpolation method I, is a function of a time series ss and of a time t that is C1-piecewise continuous with respect to t, and such that if there exists a point (t,v) in ss, I(ss, t)=v. Interpolation is a classical subject and it will not be described here. Common interpolation methods are linear or cubic splines.
  • Formally,

  • $I \in T_s \to C^1(\mathbb{R}, \mathbb{R})$

  • $\forall (t,v) \in s_s,\ I(s_s, t) = v$
  • For any time t and lag l, we define the vector Es(ss, sp, t, l) so that for every (tp, p) in sp with tp in [t−w, t], Es(ss, sp, t, l) contains the point I(ss, tp+l). We call Es(ss, sp, t, l) the interpolated vector.
  • Formally,

  • $E_s(s_s, s_p, t, l)_i = I\big(s_s,\ V_1(s_p \cap [t-w,\,t])_i + l\big)$
  • The system also defines the vector Ep(sp,t) so that for every (tp,p) in sp with tp in [t−w,t], Ep(sp, t) contains the point p.
  • Formally,

  • $E_p(s_p, t) = V_2(s_p \cap [t-w,\,t])$
  • The cross-correlation CC(ss, sp, t, l) is defined as the scalar product of Ep(sp,t) and Es(ss,sp, t, l) divided by the product of their norms.
  • Formally,
  • $CC(s_s, s_p, t, l) = \dfrac{E_p(s_p,t) \cdot E_s(s_s,s_p,t,l)}{\lVert E_p(s_p,t) \rVert \, \lVert E_s(s_s,s_p,t,l) \rVert}$
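  • The cross-correlation defined above can be sketched in Python as follows. This is a sketch under assumptions, not the disclosed implementation: linear interpolation is chosen as the interpolation method I, values outside the sampled range are clamped to the nearest sample, and all names and data are hypothetical. As in the definition, the vectors are not mean-centered.

```python
import math
from bisect import bisect_right

def interpolate(sampled, t):
    """Linear interpolation I(s_s, t) over a time-sorted list of (time, value) pairs."""
    ts = [u for u, _ in sampled]
    if t <= ts[0]:
        return sampled[0][1]   # assumption: clamp before the first sample
    if t >= ts[-1]:
        return sampled[-1][1]  # assumption: clamp after the last sample
    i = bisect_right(ts, t)    # ts[i - 1] <= t < ts[i]
    (t0, v0), (t1, v1) = sampled[i - 1], sampled[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def cross_correlation(sampled, pulsed, t, lag, w):
    """CC(s_s, s_p, t, l): scalar product of E_p and E_s divided by their norms."""
    window = [(tp, p) for tp, p in pulsed if t - w <= tp <= t]
    e_p = [p for _, p in window]
    e_s = [interpolate(sampled, tp + lag) for tp, _ in window]
    dot = sum(a * b for a, b in zip(e_p, e_s))
    norm = math.sqrt(sum(a * a for a in e_p)) * math.sqrt(sum(b * b for b in e_s))
    return dot / norm if norm else 0.0

# Made-up data: the sampled series repeats the pulsed series one time unit later, scaled by 2.
pulsed = [(0, 1.0), (1, 2.0), (2, 3.0)]
sampled = [(0, 0.0), (1, 2.0), (2, 4.0), (3, 6.0), (4, 8.0)]
cc_lag0 = cross_correlation(sampled, pulsed, t=2, lag=0, w=2)
cc_lag1 = cross_correlation(sampled, pulsed, t=2, lag=1, w=2)
```

  • In this toy data the lag of one time unit aligns the two series exactly, so it yields a higher cross-correlation than a lag of zero.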
  • Since $t \mapsto I(s_s, t)$ is C1-piecewise continuous, for any fixed t, CC(ss, sp, t, l) has a finite set of local maximums. There are many methods to find local maximums. One possible method is to use a gradient method on points spread evenly on the time interval that the series covers.
  • From the definition above, the local maximums of $CC_t : l \in \mathbb{R} \mapsto CC(s_s, s_p, t, l)$ simply move linearly with t when no point of sp leaves or enters [t−w, t]. Hence the set of local maximums of CCt, taken over those t for which t or (t−w) is the time of a point in sp, is a finite set that completely represents the set of local maximums of CCt for all t.
  • For every w, the system computes a finite set of times t and lags l and a cross-correlation c for each of them. This defines a finite set of patterns (c, w, t, l) which the system orders by relevance.
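  • One simple way to locate the local maximums over the lag and order the resulting patterns is sketched below. Note that this sketch swaps in a plain scan over evenly spaced lags rather than the gradient method mentioned above, and that the function names, the lag range, the step count and the toy correlation function are hypothetical.

```python
import math

def local_maxima(f, lo, hi, steps):
    """Grid points in [lo, hi] where f is strictly greater than both neighbors."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [f(x) for x in xs]
    return [(xs[i], ys[i]) for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1]]

def rank_patterns(cc, w, t, lag_range, steps=200):
    """Patterns (c, w, t, l) at local maximums of l -> cc(t, l), most relevant first."""
    lo, hi = lag_range
    peaks = local_maxima(lambda lag: cc(t, lag), lo, hi, steps)
    return sorted(((c, w, t, lag) for lag, c in peaks), reverse=True)

# Toy stand-in for CC(s_s, s_p, t, l) with two bumps, at lag 2 and at lag -3.
def toy_cc(t, lag):
    return 0.9 * math.exp(-(lag - 2) ** 2) + 0.5 * math.exp(-(lag + 3) ** 2)

patterns = rank_patterns(toy_cc, w=60, t=0, lag_range=(-5, 5), steps=100)
```

  • The grid resolution bounds the accuracy of the reported lags; a finer grid or a local refinement step could be used where more precision is needed.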
  • Real-Time Target Prediction
  • The system runs the previous algorithm for t=now. The system then chooses the pattern with the most relevant predictive lag, and projects that the target will behave like the sentiment-frequency curve.
  • When applied to real-time several optimizations are made:
      • Non-predictive lags can be ignored (no data on the target is available in the future).
      • The system only computes new local maximums when a new point arrives in the sentiment series.
      • Updating the cross-correlation series can be optimized; not all the scalar products have to be recomputed.
      • The system can reuse the local maximums already identified to find the new ones.
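  • The selection step for real-time prediction can be sketched as follows, assuming patterns are stored as tuples (c, w, t, l) and that relevance is measured by the cross-correlation c. The function name and the sample patterns are hypothetical.

```python
def best_predictive_pattern(patterns):
    """Highest-correlation pattern (c, w, t, l) with a positive lag l, or None."""
    predictive = [p for p in patterns if p[3] > 0]  # non-predictive lags are ignored
    return max(predictive, key=lambda p: p[0]) if predictive else None

# Made-up patterns: the strongest one has a negative lag and is therefore skipped.
patterns = [(0.95, 60, 0, -4.0), (0.80, 60, 0, 2.5), (0.60, 60, 0, 7.0)]
chosen = best_predictive_pattern(patterns)
```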
  • In summary, the system for sentiment, intensity cross-correlation provides for time-based cross-correlation between the real-time sentiment value and frequency of a message stream relative to an object and a quantifiable property of that object. The time correlation relates patterns in the sentiment and frequency to patterns in the object property. The cross-correlation system further includes graphical depictions showing relations identified by the patterns between the object property and the sentiment, frequency, and any quantity derived from them. The cross-correlation system also includes event prediction of future up and down movement of the object property based upon the aforementioned patterns, as well as trading signals generated on and trading strategies based on the aforementioned patterns.
  • While the preferred embodiments have been shown and described, it will be understood that there is no intent to limit the invention by such disclosure; rather, the disclosure is intended to cover all modifications and alternate constructions falling within the spirit and scope of the invention.

Claims (15)

1. A method for finding patterns in a target real-valued time series by utilizing sentiment and frequency derived from a stream of social media messages, wherein the target represents a quantifiable property of an asset being tracked, comprising:
identifying a target, which is a sampled real-valued time series;
generating a sentiment time series, ss, relating to an asset;
generating a frequency time series, sf, relating to an asset;
determining a pattern based upon the sentiment time series and the frequency time series.
2. The method according to claim 1, wherein sentiment is an expression of a psychological state relative to an event.
3. The method according to claim 1, wherein frequency represents the volume of social media messages about the asset.
4. The method according to claim 1, wherein the step of generating a sentiment time series is performed by language processing and is derived based upon pairs of lexical items in local syntactic context found in a volume of social media messages.
5. The method according to claim 4, wherein the step of generating a sentiment time series includes the creation of an average sentiment series, sa, such that for every point (t,s) in the sentiment time series, ss, there is a point (t, a) in an average sentiment series where “a” is the arithmetic average of all the sentiments in a time range [t−w, t].
6. The method according to claim 5, wherein the step of generating a sentiment time series includes the creation of a sentiment-frequency series, ssf, to contain a point (t,vsf) for every (t, a) in the sentiment time series, ss, and (t, f) in the frequency time series, sf, where $v_{sf} = f^{a}$ $(= e^{a \ln f})$.
7. The method according to claim 1, wherein the frequency time series, sf, is dependent upon the sentiment time series, ss, and a positive number w representing a time called window size.
8. The method according to claim 7, wherein for each point (t, s) in the sentiment time series, ss, the frequency time series, sf, contains a point (t, f) where f is the number of points in the sentiment time series, ss, in the time range [t−w, t], divided by w.
9. The method according to claim 8, wherein the number f is called frequency and
$f(t) = \dfrac{\#(s_s \cap [t-w,\,t])}{w}, \qquad s_f = \{(t, f(t)) \mid t \in V_1(s_s)\}$
10. The method according to claim 9, wherein the pattern P is a cross-correlation c in [−1,1], a positive window size w, a time lag l, and a time ts, and these numbers are interpreted as a predictive series over [ts−w, ts] correlating to the target series over [ts−w+l, ts+l] with a cross-correlation of c.
11. The method according to claim 1, wherein the step of determining a pattern employs a sentiment-frequency method that uses sentiment to create a sentiment-frequency series, sfs, and correlates to the target using a plain statistical cross-correlation.
12. The method according to claim 11, wherein the step of determining a pattern includes the step of identifying an optimal time lag.
13. The method according to claim 12, wherein correlating two time-series using a plain statistical cross-correlation and finding the optimal lag is achieved with a series correlator.
14. The method according to claim 13, wherein the series correlator produces a set of patterns based on a real-valued pulsated time series sp, a real-valued sampled time series, ss, an interpolation method I for ss, and a window size w.
15. The method according to claim 11, wherein the interpolation method I, is a function of a time series ss and of a time t that is C1-piecewise continuous with respect to t, and such that if there exists a point (t, v) in ss, I(ss, t)=v.
US13/427,833 2011-03-22 2012-03-22 Real time cross correlation of intensity and sentiment from social media messages Abandoned US20130073480A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/427,833 US20130073480A1 (en) 2011-03-22 2012-03-22 Real time cross correlation of intensity and sentiment from social media messages

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161466067P 2011-03-22 2011-03-22
US13/427,833 US20130073480A1 (en) 2011-03-22 2012-03-22 Real time cross correlation of intensity and sentiment from social media messages

Publications (1)

Publication Number Publication Date
US20130073480A1 true US20130073480A1 (en) 2013-03-21

Family

ID=46878142

Family Applications (5)

Application Number Title Priority Date Filing Date
US13/427,828 Expired - Fee Related US8856056B2 (en) 2011-03-22 2012-03-22 Sentiment calculus for a method and system using social media for event-driven trading
US13/427,819 Active 2033-10-02 US9940672B2 (en) 2011-03-22 2012-03-22 System for generating data from social media messages for the real-time evaluation of publicly traded assets
US13/427,830 Abandoned US20120246054A1 (en) 2011-03-22 2012-03-22 Reaction indicator for sentiment of social media messages
US13/427,833 Abandoned US20130073480A1 (en) 2011-03-22 2012-03-22 Real time cross correlation of intensity and sentiment from social media messages
US15/904,819 Abandoned US20180182038A1 (en) 2011-03-22 2018-02-26 System for generating data from social media messages for the real-time evaluation of publicly traded assets

Family Applications Before (3)

Application Number Title Priority Date Filing Date
US13/427,828 Expired - Fee Related US8856056B2 (en) 2011-03-22 2012-03-22 Sentiment calculus for a method and system using social media for event-driven trading
US13/427,819 Active 2033-10-02 US9940672B2 (en) 2011-03-22 2012-03-22 System for generating data from social media messages for the real-time evaluation of publicly traded assets
US13/427,830 Abandoned US20120246054A1 (en) 2011-03-22 2012-03-22 Reaction indicator for sentiment of social media messages

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/904,819 Abandoned US20180182038A1 (en) 2011-03-22 2018-02-26 System for generating data from social media messages for the real-time evaluation of publicly traded assets

Country Status (1)

Country Link
US (5) US8856056B2 (en)

US10073794B2 (en) 2015-10-16 2018-09-11 Sprinklr, Inc. Mobile application builder program and its functionality for application development, providing the user an improved search capability for an expanded generic search based on the user's search criteria
US20170148097A1 (en) * 2015-11-23 2017-05-25 Indiana University Research And Technology Corporation Systems and methods for deriving financial information from emotional content analysis
US11004096B2 (en) 2015-11-25 2021-05-11 Sprinklr, Inc. Buy intent estimation and its applications for social media data
WO2017100361A1 (en) * 2015-12-08 2017-06-15 Dennehy Matthew T System and method for tracking stock fluctuations
TWI582683B (en) * 2015-12-08 2017-05-11 宏碁股份有限公司 Electronic device and the method for operation thereof
CN106886513A (en) * 2015-12-16 2017-06-23 宏碁股份有限公司 Electronic installation and its operating method
US10042842B2 (en) * 2016-02-24 2018-08-07 Utopus Insights, Inc. Theft detection via adaptive lexical similarity analysis of social media data streams
US10133735B2 (en) 2016-02-29 2018-11-20 Rovi Guides, Inc. Systems and methods for training a model to determine whether a query with multiple segments comprises multiple distinct commands or a combined command
US10031967B2 (en) * 2016-02-29 2018-07-24 Rovi Guides, Inc. Systems and methods for using a trained model for determining whether a query comprising multiple segments relates to an individual query or several queries
US20170300823A1 (en) * 2016-04-13 2017-10-19 International Business Machines Corporation Determining user influence by contextual relationship of isolated and non-isolated content
US20180025545A1 (en) * 2016-07-19 2018-01-25 Pol-Lin Tai Method for creating visualized effect for data
CN106295702B (en) * 2016-08-15 2019-10-25 西北工业大学 A kind of social platform user classification method based on the analysis of individual affective behavior
US20180053197A1 (en) * 2016-08-18 2018-02-22 International Business Machines Corporation Normalizing user responses to events
US10397326B2 (en) 2017-01-11 2019-08-27 Sprinklr, Inc. IRC-Infoid data standardization for use in a plurality of mobile applications
US10831796B2 (en) * 2017-01-15 2020-11-10 International Business Machines Corporation Tone optimization for digital content
US10489510B2 (en) * 2017-04-20 2019-11-26 Ford Motor Company Sentiment analysis of product reviews from social media
JP6959044B2 (en) * 2017-06-23 2021-11-02 株式会社野村総合研究所 Recording server, recording method and program
US11934937B2 (en) * 2017-07-10 2024-03-19 Accenture Global Solutions Limited System and method for detecting the occurrence of an event and determining a response to the event
US10717005B2 (en) 2017-07-22 2020-07-21 Niantic, Inc. Validating a player's real-world location using activity within a parallel reality game
US10574601B2 (en) * 2017-08-03 2020-02-25 International Business Machines Corporation Managing and displaying online messages along timelines
CN107526831B (en) * 2017-09-04 2020-03-31 华为技术有限公司 Natural language processing method and device
US11238535B1 (en) 2017-09-14 2022-02-01 Wells Fargo Bank, N.A. Stock trading platform with social network sentiment
US11164266B2 (en) * 2017-10-27 2021-11-02 International Business Machines Corporation Protection of water providing entities from loss due to environmental events
CN108009148B (en) * 2017-11-16 2021-04-27 天津大学 Text emotion classification representation method based on deep learning
US10528660B2 (en) 2017-12-02 2020-01-07 International Business Machines Corporation Leveraging word patterns in the language of popular influencers to predict popular trends
US11238087B2 (en) * 2017-12-21 2022-02-01 Microsoft Technology Licensing, Llc Social analytics based on viral mentions and threading
US10380613B1 (en) 2018-11-07 2019-08-13 Capital One Services, Llc System and method for analyzing cryptocurrency-related information using artificial intelligence
US10789430B2 (en) 2018-11-19 2020-09-29 Genesys Telecommunications Laboratories, Inc. Method and system for sentiment analysis
CA3120977A1 (en) * 2018-11-19 2020-05-28 Genesys Telecommunications Laboratories, Inc. Method and system for sentiment analysis
CN109949076B (en) * 2019-02-26 2022-02-18 北京首钢自动化信息技术有限公司 Method for establishing hypersphere mapping model, information recommendation method and device
US11461847B2 (en) * 2019-03-21 2022-10-04 The University Of Chicago Applying a trained model to predict a future value using contextualized sentiment data
US11573995B2 (en) * 2019-09-10 2023-02-07 International Business Machines Corporation Analyzing the tone of textual data
GB201915879D0 (en) * 2019-10-31 2019-12-18 Black Swan Data Ltd Using social data to improve long term sales forecasting
US12106061B2 (en) 2020-04-29 2024-10-01 Clarabridge, Inc. Automated narratives of interactive communications
US11546285B2 (en) * 2020-04-29 2023-01-03 Clarabridge, Inc. Intelligent transaction scoring
US11689487B1 (en) * 2020-07-16 2023-06-27 Kynami, Inc. System and method for identifying and blocking trolls on social networks
US10878505B1 (en) 2020-07-31 2020-12-29 Agblox, Inc. Curated sentiment analysis in multi-layer, machine learning-based forecasting model using customized, commodity-specific neural networks
US20220261818A1 (en) * 2021-02-16 2022-08-18 RepTrak Holdings, Inc. System and method for determining and managing reputation of entities and industries through use of media data
US20220383411A1 (en) * 2021-06-01 2022-12-01 Jpmorgan Chase Bank, N.A. Method and system for assessing social media effects on market trends
US11797517B2 (en) * 2021-06-21 2023-10-24 Yahoo Assets Llc Public content validation and presentation method and apparatus

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080270116A1 (en) * 2007-04-24 2008-10-30 Namrata Godbole Large-Scale Sentiment Analysis
US20100325031A1 (en) * 2009-06-18 2010-12-23 Penson Worldwide, Inc. Method and system for trading financial assets

Family Cites Families (98)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6122628A (en) * 1997-10-31 2000-09-19 International Business Machines Corporation Multidimensional data clustering and dimension reduction for indexing and searching
US7216304B1 (en) * 2000-01-05 2007-05-08 Apple Inc. Graphical user interface for computers having variable size icons
JP3573688B2 (en) * 2000-06-28 2004-10-06 松下電器産業株式会社 Similar document search device and related keyword extraction device
US7185065B1 (en) * 2000-10-11 2007-02-27 Buzzmetrics Ltd System and method for scoring electronic messages
US20070226640A1 (en) * 2000-11-15 2007-09-27 Holbrook David M Apparatus and methods for organizing and/or presenting data
US8285619B2 (en) 2001-01-22 2012-10-09 Fred Herz Patents, LLC Stock market prediction using natural language processing
US7159178B2 (en) * 2001-02-20 2007-01-02 Communispace Corp. System for supporting a virtual community
US7216073B2 (en) * 2001-03-13 2007-05-08 Intelligate, Ltd. Dynamic natural language understanding
US20080015871A1 (en) * 2002-04-18 2008-01-17 Jeff Scott Eder Varr system
JP2005010854A (en) * 2003-06-16 2005-01-13 Sony Computer Entertainment Inc Information presenting method and system
GB2403636A (en) * 2003-07-02 2005-01-05 Sony Uk Ltd Information retrieval using an array of nodes
US7213206B2 (en) * 2003-09-09 2007-05-01 Fogg Brian J Relationship user interface
US7865354B2 (en) * 2003-12-05 2011-01-04 International Business Machines Corporation Extracting and grouping opinions from text documents
JP4394517B2 (en) 2004-05-12 2010-01-06 富士通株式会社 Feature information extraction method, feature information extraction program, and feature information extraction device
WO2006039566A2 (en) 2004-09-30 2006-04-13 Intelliseek, Inc. Topical sentiments in electronically stored communications
US20060242040A1 (en) * 2005-04-20 2006-10-26 Aim Holdings Llc Method and system for conducting sentiment analysis for securities research
US20070005477A1 (en) 2005-06-24 2007-01-04 Mcatamney Pauline Interactive asset data visualization guide
US20070005564A1 (en) * 2005-06-29 2007-01-04 Mark Zehner Method and system for performing multi-dimensional searches
US7502789B2 (en) * 2005-12-15 2009-03-10 Microsoft Corporation Identifying important news reports from news home pages
EP1989639A4 (en) * 2006-02-28 2012-05-02 Buzzlogic Inc Social analytics system and method for analyzing conversations in social media
US20070239590A1 (en) * 2006-04-07 2007-10-11 Lee Gang P Two-step method and system for commodity trading
US7882014B2 (en) * 2006-04-28 2011-02-01 Pipeline Financial Group, Inc. Display of market impact in algorithmic trading engine
US7720835B2 (en) * 2006-05-05 2010-05-18 Visible Technologies Llc Systems and methods for consumer-generated media reputation management
US9269068B2 (en) * 2006-05-05 2016-02-23 Visible Technologies Llc Systems and methods for consumer-generated media reputation management
US20090070683A1 (en) * 2006-05-05 2009-03-12 Miles Ward Consumer-generated media influence and sentiment determination
US7676518B2 (en) * 2006-08-16 2010-03-09 Sap Ag Clustering for structured data
US8862591B2 (en) * 2006-08-22 2014-10-14 Twitter, Inc. System and method for evaluating sentiment
US8271429B2 (en) * 2006-09-11 2012-09-18 Wiredset Llc System and method for collecting and processing data
US7730316B1 (en) * 2006-09-22 2010-06-01 Fatlens, Inc. Method for document fingerprinting
US8078450B2 (en) * 2006-10-10 2011-12-13 Abbyy Software Ltd. Method and system for analyzing various languages and constructing language-independent semantic structures
US7693773B2 (en) * 2006-10-13 2010-04-06 Morgan Stanley Interactive user interface for displaying information related to publicly traded securities
US20080104225A1 (en) * 2006-11-01 2008-05-01 Microsoft Corporation Visualization application for mining of social networks
US7930302B2 (en) 2006-11-22 2011-04-19 Intuit Inc. Method and system for analyzing user-generated content
GB0701202D0 (en) * 2007-01-22 2007-02-28 Wanzke Detlev Data analysis
US20080249764A1 (en) * 2007-03-01 2008-10-09 Microsoft Corporation Smart Sentiment Classifier for Product Reviews
US7966241B2 (en) 2007-03-01 2011-06-21 Reginald Nosegbe Stock method for measuring and assigning precise meaning to market sentiment
US7734641B2 (en) * 2007-05-25 2010-06-08 Peerset, Inc. Recommendation systems and methods using interest correlation
US7987188B2 (en) 2007-08-23 2011-07-26 Google Inc. Domain-specific sentiment classification
US8280885B2 (en) * 2007-10-29 2012-10-02 Cornell University System and method for automatically summarizing fine-grained opinions in digital text
US8781989B2 (en) * 2008-01-14 2014-07-15 Aptima, Inc. Method and system to predict a data value
US8010539B2 (en) 2008-01-25 2011-08-30 Google Inc. Phrase based snippet generation
JP2011516938A (en) * 2008-02-22 2011-05-26 ソーシャルレップ・エルエルシー Systems and methods for measuring and managing distributed online conversations
US8239189B2 (en) * 2008-02-26 2012-08-07 Siemens Enterprise Communications Gmbh & Co. Kg Method and system for estimating a sentiment for an entity
US8463594B2 (en) 2008-03-21 2013-06-11 Sauriel Llc System and method for analyzing text using emotional intelligence factors
WO2009151502A2 (en) * 2008-04-08 2009-12-17 Allgress, Inc. Enterprise information security management software used to prove return on investment of security projects and activities using interactive graphs
US8117207B2 (en) * 2008-04-18 2012-02-14 Biz360 Inc. System and methods for evaluating feature opinions for products, services, and entities
AU2009260033A1 (en) * 2008-06-19 2009-12-23 Wize Technologies, Inc. System and method for aggregating and summarizing product/topic sentiment
US8446412B2 (en) * 2008-06-26 2013-05-21 Microsoft Corporation Static visualization of multiple-dimension data trends
US8219916B2 (en) * 2008-07-25 2012-07-10 Yahoo! Inc. Techniques for visual representation of user activity associated with an information resource
US20100121707A1 (en) * 2008-11-13 2010-05-13 Buzzient, Inc. Displaying analytic measurement of online social media content in a graphical user interface
US8669994B2 (en) * 2008-11-15 2014-03-11 New Bis Safe Luxco S.A R.L Data visualization methods
US20100332465A1 (en) 2008-12-16 2010-12-30 Frizo Janssens Method and system for monitoring online media and dynamically charting the results to facilitate human pattern detection
US9076125B2 (en) * 2009-02-27 2015-07-07 Microsoft Technology Licensing, Llc Visualization of participant relationships and sentiment for electronic messaging
JP4852119B2 (en) * 2009-03-25 2012-01-11 株式会社東芝 Data display device, data display method, and data display program
US8166032B2 (en) * 2009-04-09 2012-04-24 MarketChorus, Inc. System and method for sentiment-based text classification and relevancy ranking
JP5559306B2 (en) * 2009-04-24 2014-07-23 アルグレス・インコーポレイテッド Enterprise information security management software for predictive modeling using interactive graphs
US8504550B2 (en) * 2009-05-15 2013-08-06 Citizennet Inc. Social network message categorization systems and methods
US8346702B2 (en) * 2009-05-22 2013-01-01 Step 3 Systems, Inc. System and method for automatically predicting the outcome of expert forecasts
JP5879260B2 (en) * 2009-06-09 2016-03-08 イービーエイチ エンタープライズィーズ インコーポレイテッド Method and apparatus for analyzing content of microblog message
JP5795580B2 (en) * 2009-07-16 2015-10-14 ブルーフィン ラボズ インコーポレイテッド Estimating and displaying social interests in time-based media
US8386482B2 (en) * 2009-09-02 2013-02-26 Xurmo Technologies Private Limited Method for personalizing information retrieval in a communication network
US20110112995A1 (en) * 2009-10-28 2011-05-12 Industrial Technology Research Institute Systems and methods for organizing collective social intelligence information using an organic object data model
US20120137367A1 (en) * 2009-11-06 2012-05-31 Cataphora, Inc. Continuous anomaly detection based on behavior modeling and heterogeneous information analysis
US8281238B2 (en) * 2009-11-10 2012-10-02 Primal Fusion Inc. System, method and computer program for creating and manipulating data structures using an interactive graphical interface
US11132748B2 (en) * 2009-12-01 2021-09-28 Refinitiv Us Organization Llc Method and apparatus for risk mining
US20120316916A1 (en) * 2009-12-01 2012-12-13 Andrews Sarah L Methods and systems for generating corporate green score using social media sourced data and sentiment analysis
US20120296845A1 (en) * 2009-12-01 2012-11-22 Andrews Sarah L Methods and systems for generating composite index using social media sourced data and sentiment analysis
US8356025B2 (en) 2009-12-09 2013-01-15 International Business Machines Corporation Systems and methods for detecting sentiment-based topics
US9201863B2 (en) 2009-12-24 2015-12-01 Woodwire, Inc. Sentiment analysis from social media content
US8849649B2 (en) * 2009-12-24 2014-09-30 Metavana, Inc. System and method for determining sentiment expressed in documents
JP5284990B2 (en) * 2010-01-08 2013-09-11 インターナショナル・ビジネス・マシーンズ・コーポレーション Processing method for time series analysis of keywords, processing system and computer program
US8402035B2 (en) 2010-03-12 2013-03-19 General Sentiment, Inc. Methods and systems for determing media value
WO2011119410A2 (en) 2010-03-24 2011-09-29 Taykey, Ltd. A system and methods thereof for mining web based user generated content for creation of term taxonomies
US8965835B2 (en) * 2010-03-24 2015-02-24 Taykey Ltd. Method for analyzing sentiment trends based on term taxonomies of user generated content
US9613139B2 (en) 2010-03-24 2017-04-04 Taykey Ltd. System and methods thereof for real-time monitoring of a sentiment trend with respect of a desired phrase
US20110246921A1 (en) 2010-03-30 2011-10-06 Microsoft Corporation Visualizing sentiment of online content
US8725494B2 (en) 2010-03-31 2014-05-13 Attivio, Inc. Signal processing approach to sentiment analysis for entities in documents
US8326880B2 (en) * 2010-04-05 2012-12-04 Microsoft Corporation Summarizing streams of information
US20110251977A1 (en) 2010-04-13 2011-10-13 Michal Cialowicz Ad Hoc Document Parsing
US20110258256A1 (en) 2010-04-14 2011-10-20 Bernardo Huberman Predicting future outcomes
US20110264531A1 (en) * 2010-04-26 2011-10-27 Yahoo! Inc. Watching a user's online world
US20110275046A1 (en) * 2010-05-07 2011-11-10 Andrew Grenville Method and system for evaluating content
CN102884530A (en) 2010-05-16 2013-01-16 捷通国际有限公司 Data collection, tracking, and analysis for multiple media including impact analysis and influence tracking
US20110320542A1 (en) 2010-06-28 2011-12-29 Bank Of America Corporation Analyzing Social Networking Information
US9582908B2 (en) * 2010-10-26 2017-02-28 Inetco Systems Limited Method and system for interactive visualization of hierarchical time series data
US8955001B2 (en) * 2011-07-06 2015-02-10 Symphony Advanced Media Mobile remote media control platform apparatuses and methods
US9292602B2 (en) 2010-12-14 2016-03-22 Microsoft Technology Licensing, Llc Interactive search results page
US8706647B2 (en) * 2010-12-17 2014-04-22 University Of Southern California Estimating value of user's social influence on other users of computer network system
US8380607B2 (en) * 2010-12-17 2013-02-19 Indiana University Research And Technology Corporation Predicting economic trends via network communication mood tracking
US8805714B2 (en) * 2011-01-20 2014-08-12 Ipc Systems, Inc. User interface displaying communication information
WO2012116236A2 (en) * 2011-02-23 2012-08-30 Nova Spivack System and method for analyzing messages in a network or across networks
US8660581B2 (en) * 2011-02-23 2014-02-25 Digimarc Corporation Mobile device indoor navigation
US8650023B2 (en) * 2011-03-21 2014-02-11 Xerox Corporation Customer review authoring assistant
US8838438B2 (en) * 2011-04-29 2014-09-16 Cbs Interactive Inc. System and method for determining sentiment from text content
US9100669B2 (en) * 2011-05-12 2015-08-04 At&T Intellectual Property I, Lp Method and apparatus for associating micro-blogs with media programs
US20130018954A1 (en) * 2011-07-15 2013-01-17 Samsung Electronics Co., Ltd. Situation-aware user sentiment social interest models
US10165067B2 (en) * 2012-06-29 2018-12-25 Nuvi, Llc Systems and methods for visualization of electronic social network content
US20140207579A1 (en) * 2013-01-18 2014-07-24 Salesforce.Com, Inc. Syndication of online message content using social media

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11082722B2 (en) 2011-01-26 2021-08-03 Afterlive.tv Inc. Method and system for generating highlights from scored data streams
US10440402B2 (en) 2011-01-26 2019-10-08 Afterlive.tv Inc Method and system for generating highlights from scored data streams
US10769194B2 (en) 2011-07-13 2020-09-08 Bluefin Labs, Inc. Topic and time based media affinity estimation
US8600984B2 (en) * 2011-07-13 2013-12-03 Bluefin Labs, Inc. Topic and time based media affinity estimation
US9009130B2 (en) 2011-07-13 2015-04-14 Bluefin Labs, Inc. Topic and time based media affinity estimation
US20130018896A1 (en) * 2011-07-13 2013-01-17 Bluefin Labs, Inc. Topic and Time Based Media Affinity Estimation
US11301505B2 (en) 2011-07-13 2022-04-12 Bluefin Labs, Inc. Topic and time based media affinity estimation
US9753923B2 (en) 2011-07-13 2017-09-05 Bluefin Labs, Inc. Topic and time based media affinity estimation
US9104734B2 (en) 2012-02-07 2015-08-11 Social Market Analytics, Inc. Systems and methods of detecting, measuring, and extracting signatures of signals embedded in social media data streams
US10031909B2 (en) 2012-02-07 2018-07-24 Social Market Analytics, Inc. Systems and methods of detecting, measuring, and extracting signatures of signals embedded in social media data streams
US10846479B2 (en) 2012-02-07 2020-11-24 Social Market Analytics, Inc. Systems and methods of detecting, measuring, and extracting signatures of signals embedded in social media data streams
US9418389B2 (en) * 2012-05-07 2016-08-16 Nasdaq, Inc. Social intelligence architecture using social media message queues
US11086885B2 (en) * 2012-05-07 2021-08-10 Nasdaq, Inc. Social intelligence architecture using social media message queues
US20210349907A1 (en) * 2012-05-07 2021-11-11 Nasdaq, Inc. Social intelligence architecture using social media message queues
US11803557B2 (en) * 2012-05-07 2023-10-31 Nasdaq, Inc. Social intelligence architecture using social media message queues
US10185996B2 (en) * 2015-07-15 2019-01-22 Foundation Of Soongsil University Industry Cooperation Stock fluctuation prediction method and server
US20170083817A1 (en) * 2015-09-23 2017-03-23 Isentium, Llc Topic detection in a social media sentiment extraction system
JP2020123401A (en) * 2015-11-16 2020-08-13 ウバープル カンパニー リミテッド Method for displaying asset information
JP7021289B2 (en) 2015-11-16 2022-02-16 ウバープル カンパニー リミテッド How to display asset information

Also Published As

Publication number Publication date
US20120246104A1 (en) 2012-09-27
US20120246054A1 (en) 2012-09-27
US20180182038A1 (en) 2018-06-28
US8856056B2 (en) 2014-10-07
US9940672B2 (en) 2018-04-10
US20120324023A1 (en) 2012-12-20

Similar Documents

Publication Publication Date Title
US20180182038A1 (en) System for generating data from social media messages for the real-time evaluation of publicly traded assets
Derakhshan et al. Sentiment analysis on stock social media for stock price movement prediction
US20170083817A1 (en) Topic detection in a social media sentiment extraction system
CN110799981B (en) Systems and methods for domain-independent aspect level emotion detection
Nassirtoussi et al. Text mining of news-headlines for FOREX market prediction: A Multi-layer Dimension Reduction Algorithm with semantics and sentiment
Nassirtoussi et al. Text mining for market prediction: A systematic review
Smailović et al. Stream-based active learning for sentiment analysis in the financial domain
Lutz et al. Sentence-level sentiment analysis of financial news using distributed text representations and multi-instance learning
Urolagin Text mining of tweet for sentiment classification and association with stock prices
Kim et al. Stock price prediction through sentiment analysis of corporate disclosures using distributed representation
CN113449108A (en) Financial news stream burst detection method based on hierarchical clustering
Ao Sentiment analysis based on financial tweets and market information
Zhao et al. Dynamic impacts of online investor sentiment on international crude oil prices
Sun et al. Financial fraud detection based on the part-of-speech features of textual risk disclosures in financial reports
Perikos et al. Opinion mining and visualization of online users reviews: a case study in Booking.com
Dash Information Extraction from Unstructured Big Data: A Case Study of Deep Natural Language Processing in Fintech
Kamal et al. A Comprehensive Review on Summarizing Financial News Using Deep Learning
Wade Transformers and tradition: using Generative AI and Deep Learning for financial markets prediction
Xu Data Mining in Social Media for Stock Market Prediction
Gutiérrez et al. Similarity analysis of federal reserve statements using document embeddings: the Great Recession vs. COVID-19
Schlaubitz Natural Language Processing in finance: analysis of sentiment and complexity of news and earnings reports of swiss SMEs and their relevance for stock returns
Lwanga Stock market price prediction using sentiment analysis: a case study of Nairobi stock exchange market
Bhalla et al. A Review of Various Sentiment Analysis Techniques, Methodologies and their Applications.
Venturini Analysing the impact of ECB Communication on Financial Markets: A Text Mining approach
Albahli et al. Opinion mining for stock trend prediction using deep learning

Legal Events

Date Code Title Description
AS Assignment

Owner name: ISENTIUM TECHNOLOGIES, INC., CANADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ALBERTI, LIONEL;SASTRI, GAUTHAM;SIGNING DATES FROM 20120404 TO 20120517;REEL/FRAME:028309/0968

AS Assignment

Owner name: ISENTIUM, LLC, FLORIDA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ISENTIUM TECHNOLOGIES INC.;REEL/FRAME:033536/0088

Effective date: 20140812

STCB Information on status: application discontinuation

Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION