US10628737B2 - Identifying constructive sub-dialogues - Google Patents

Publication number
US10628737B2
Authority
US
United States
Prior art keywords
dialogue
sub
features
new
comment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US15/406,565
Other versions
US20180203846A1 (en)
Inventor
Courtney Napoles Cohen
Aasish Pappu
Joel Tetreault
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Verizon Patent and Licensing Inc
Original Assignee
Oath Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oath Inc
Priority to US15/406,565
Assigned to YAHOO! INC. reassignment YAHOO! INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: COHEN, COURTNEY NAPOLES, TETREAULT, JOEL, PAPPU, AASISH
Assigned to YAHOO HOLDINGS, INC. reassignment YAHOO HOLDINGS, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: YAHOO! INC.
Assigned to OATH INC. reassignment OATH INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: YAHOO HOLDINGS, INC.
Publication of US20180203846A1
Application granted
Publication of US10628737B2
Assigned to VERIZON MEDIA INC. reassignment VERIZON MEDIA INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: OATH INC.
Assigned to VERIZON PATENT AND LICENSING INC. reassignment VERIZON PATENT AND LICENSING INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: VERIZON MEDIA INC.
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06F17/241
    • G06F17/279
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/169 Annotation, e.g. comment data or footnotes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0445
    • G06N7/005
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/2833
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/566 Grouping or aggregating service requests, e.g. for unified processing

Definitions

  • Online forums such as reddit and comment sections for online articles allow users to converse with one another through the posting of comments in comment threads.
  • a processor-executed method is described.
  • software on a website hosting an online forum extracts a plurality of sub-dialogues from each thread in a corpus from the online forum.
  • Each sub-dialogue includes a plurality of comments and each thread includes at least one sub-dialogue.
  • the software obtains one or more sub-dialogue annotations associated with each sub-dialogue.
  • the one or more sub-dialogue annotations include an annotation as to whether the sub-dialogue is constructive.
  • the software extracts a plurality of features from each sub-dialogue and uses them and the sub-dialogue annotations associated with the sub-dialogue to train a classifier that determines whether a particular sub-dialogue is constructive.
  • the software obtains a new sub-dialogue from a thread currently displayed in the online forum and extracts the plurality of features from the new sub-dialogue.
  • the software inputs the features extracted from the new sub-dialogue into the classifier and obtains a determination as to whether the new sub-dialogue is constructive. Then the software uses the determination to re-locate the new sub-dialogue in the thread.
  • an apparatus is described, namely, computer-readable media which persistently store a program for a website hosting an online forum.
  • the program extracts a plurality of sub-dialogues from each thread in a corpus from the online forum.
  • Each sub-dialogue includes a plurality of comments and each thread includes at least one sub-dialogue.
  • the program obtains one or more sub-dialogue annotations associated with each sub-dialogue.
  • the one or more sub-dialogue annotations include an annotation as to whether the sub-dialogue is constructive.
  • the program extracts a plurality of features from each sub-dialogue and uses them and the sub-dialogue annotations associated with the sub-dialogue to train a classifier that determines whether a particular sub-dialogue is constructive.
  • the program obtains a new sub-dialogue from a thread currently displayed in the online forum and extracts the plurality of features from the new sub-dialogue.
  • the program inputs the features extracted from the new sub-dialogue into the classifier and obtains a determination as to whether the new sub-dialogue is constructive. Then the program uses the determination to re-locate the new sub-dialogue in the thread.
  • Another example embodiment also involves a processor-executed method.
  • software on a website hosting an online forum extracts a plurality of sub-dialogues from each thread in a corpus from the online forum.
  • Each sub-dialogue includes a plurality of comments and each thread includes at least one sub-dialogue.
  • the software obtains one or more sub-dialogue annotations associated with each sub-dialogue.
  • the sub-dialogue annotations are obtained from a human annotator.
  • the one or more sub-dialogue annotations include an annotation as to whether the sub-dialogue is constructive.
  • the software extracts a plurality of features from each sub-dialogue and uses them and the sub-dialogue annotations associated with the sub-dialogue to train a classifier that determines whether a particular sub-dialogue is constructive. Then the software obtains a new sub-dialogue from a thread currently displayed in the online forum and extracts the plurality of features from the new sub-dialogue. The software inputs the features extracted from the new sub-dialogue into the classifier and obtains a determination as to whether the new sub-dialogue is constructive. Each sub-dialogue is sequentially modeled using conditional random fields. Then the software uses the determination to re-locate the new sub-dialogue in the thread.
  • FIG. 1 is a network diagram showing a website hosting a content-aggregation service and a website hosting an online forum, in accordance with an example embodiment.
  • FIG. 2 is a diagram of a pipeline of software modules for determining the constructiveness of a sub-dialogue, in accordance with an example embodiment.
  • FIG. 3 is a flowchart diagram of a process for determining the constructiveness of a sub-dialogue, in accordance with an example embodiment.
  • FIG. 4 shows the comments in a thread in an online forum, in accordance with an example embodiment.
  • FIG. 5 shows a table listing specified sub-dialogue annotations and specified comment annotations, in accordance with an example embodiment.
  • FIG. 6A shows a co-occurrence graph of sub-dialogue annotations with other sub-dialogue annotations, in accordance with an example embodiment.
  • FIG. 6B shows a co-occurrence graph of sub-dialogue annotations with comment annotations, in accordance with an example embodiment.
  • FIG. 7 shows a table listing precision scores for particular features, in accordance with an example embodiment.
  • terms, such as “a,” “an,” or “the,” again, may be understood to convey a singular usage or to convey a plural usage, depending at least in part upon context.
  • the term “based on” may be understood as not necessarily intended to convey an exclusive set of factors and may, instead, allow for existence of additional factors not necessarily expressly described, again, depending at least in part on context.
  • Constructive conversations do not require “conclusions.” A conversation or argument does not have to have a winner or conclusion to be constructive, as long as there is a clear exchange of ideas, opinions, and information done somewhat respectfully. A constructive conversation should contain one or multiple points of agreement and/or disagreement, all mostly on topic, and be relatively respectful. Comments should contain new information (be informative) and/or attempt to persuade. Comments may also seek to contribute humor, sarcasm, or even meanness if in the context of a passionate attempt at persuasiveness. How much “meanness” degrades the constructiveness is subjective—some people are more tolerant than others of fearful language when heated arguments occur.
  • Non-constructive conversations are those which are largely unproductive. Usually, the initial commenter's point does not get properly addressed (i.e., the conversation does not contain a clear communicative goal or is disconnected), the conversation comprises few attempts at persuasiveness, and each speech act can be taken in isolation. A sub-dialogue can also be deemed non-constructive if it is largely negative (i.e., an exchange of insults) or “all over the place” in terms of topic.
  • FIG. 1 is a network diagram showing a website hosting a content-aggregation service and a website hosting an online forum, in accordance with an example embodiment.
  • a personal computer 102 e.g., a laptop or other mobile computer
  • a mobile device 103 e.g., a smartphone such as an iPhone, Android, Windows Phone, etc., or a tablet computer such as an iPad, Galaxy, etc.
  • a network 101 e.g., a wide area network (WAN) including the Internet, which might be wireless in part or in whole
  • Websites hosting a content-aggregation service, including websites hosting a social-networking service, often display content to a user using graphical user interface (GUI) functionality called a “content stream”.
  • Such websites determine inclusion or prominence of an item (e.g., an article) in the content stream based at least in part on a personalized user-interest profile which records the user's explicit and implicit relevance feedback as to previous items of content presented in the content stream.
  • Explicit relevance feedback might take the form of user input to a GUI dialog inquiring about the user's interests.
  • Implicit relevance feedback might include the viewing/listening history of the user, e.g., click-throughs and/or other measures of time spent (e.g., time spent viewing, time spent listening, time spent playing, etc.) by the user on categorized content.
  • website 104 might be a website such as Yahoo! News or Google News, which ingests content from the Internet through “push” technology (e.g., a subscription to a web feed such as an RSS feed) and/or “pull” technology (e.g., web crawling), including articles (or Uniform Resource Locators (URLs) for articles).
  • website 104 might host an online social network such as Facebook or Twitter.
  • online social network is to be broadly interpreted to include, for example, any online service, including a social-media service, that allows its users to, among other things: (a) selectively access (e.g., according to a friend list, contact list, buddy list, social graph, interest graph, or other control list) content (e.g., text including articles and web links, images, videos, animations, audio recordings, games and other software, etc.) associated with each other's profiles (e.g., Facebook walls, Flickr photo albums, Pinterest boards, etc.); (b) selectively (e.g., according to a friend list, contact list, buddy list, social graph, interest graph, distribution list, or other control list) broadcast content (e.g., text including articles and/or comments, web links, images, videos, animations, audio recordings, games and other software, etc.) to each other's newsfeeds (e.g.
  • content-aggregation service is to be broadly interpreted to include any online service, including a social-media service, that allows its users to, among other things, access and/or annotate (e.g., comment on) content (e.g., text including articles and/or comments, web links, images, videos, animations, audio recordings, games and other software, etc.) aggregated/ingested by the online service (e.g., using its own curators and/or its own algorithms) and/or posted by its users and presented in a “wall” view or “stream” view.
  • a website hosting a content-aggregation service might have social features based on a friend list, contact list, buddy list, social graph, interest graph, distribution list, or other control list that is accessed over the network from a separate website hosting an online social network through an application programming interface (API) exposed by the separate website.
  • Yahoo! News might identify the content items (e.g., articles) in its newsfeed (e.g., as displayed on the front page of Yahoo! News) that have been viewed/read by a user's friends, as listed on a Facebook friend list that the user has authorized Yahoo! News to access.
  • websites 104 and 106 might be composed of a number of servers (e.g., racked servers) connected by a network (e.g., a local area network (LAN) or a WAN) to each other in a cluster (e.g., a load-balancing cluster, a Beowulf cluster, a Hadoop cluster, etc.) or other distributed system which might run website software (e.g., web-server software, database software, search-engine software, etc.), and distributed-computing and/or cloud software such as Map-Reduce, Google File System, Hadoop, Hadoop File System, Hadoop YARN, Pig, Hive, Dremel, CloudBase, etc.
  • the servers in website 104 might be connected to persistent storage 105 and the servers in website 106 might be connected to persistent storage 107 .
  • Persistent storages 105 and 107 might include flash memory, a redundant array of independent disks (RAID), and/or a storage area network (SAN), in an example embodiment.
  • the servers for websites 104 and 106 and/or the persistent storage in persistent storages 105 and 107 might be hosted wholly or partially in a public and/or private cloud, e.g., where the cloud resources serve as a platform-as-a-service (PaaS) or an infrastructure-as-a-service (IaaS).
  • Persistent storages 105 and 107 might be used to store content (e.g., text including articles and/or comments, web links, images, videos, animations, audio recordings, games and other software, etc.) and/or its related data. Additionally, persistent storage 105 might be used to store data related to users and their social contacts (e.g., Facebook friends), as well as software including algorithms and other processes, as described in detail below, for re-locating comments in a thread of comments on an article in a content stream. In an example embodiment, the content stream might be ordered from top to bottom (a) in reverse chronology (e.g., latest in time on top), or (b) according to interestingness scores, including the rankings discussed below.
  • some of the content (and/or its related data) might be stored in persistent storages 105 and 107 and might have been received from a content delivery or distribution network (CDN), e.g., Akamai Technologies. Or, alternatively, some of the content (and/or its related data) might be delivered directly from the CDN to the personal computer 102 or the mobile device 103 , without being stored in persistent storages 105 and 107 .
  • Personal computer 102 and the servers at websites 104 and 106 might include (1) hardware consisting of one or more microprocessors (e.g., from the x86 family, the ARM family, or the PowerPC family), volatile storage (e.g., RAM), and persistent storage (e.g., flash memory, a hard disk, or a solid-state drive), and (2) an operating system (e.g., Windows, Mac OS, Linux, Windows Server, Mac OS Server, etc.) that runs on the hardware.
  • mobile device 103 might include (1) hardware consisting of one or more microprocessors (e.g., from the ARM family or the x86 family), volatile storage (e.g., RAM), and persistent storage (e.g., flash memory such as microSD), (2) an operating system (e.g., iOS, webOS, Windows Mobile, Android, Linux, Symbian OS, RIM BlackBerry OS, etc.) that runs on the hardware, and (3) one or more accelerometers, one or more gyroscopes, and a global positioning system (GPS) or other location-identifying type capability.
  • personal computer 102 and mobile device 103 might each include a browser as an application program or as part of an operating system. Examples of browsers that might execute on personal computer 102 include Internet Explorer, Mozilla Firefox, Safari, and Google Chrome. Examples of browsers that might execute on mobile device 103 include Safari, Mozilla Firefox, Android Browser, and webOS Browser. It will be appreciated that users of personal computer 102 and/or mobile device 103 might use browsers to access content presented by websites 104 and 106 . Alternatively, users of personal computer 102 and/or mobile device 103 might use other application programs (or apps, including hybrid apps that display HTML content) to access content presented by websites 104 and 106 .
  • FIG. 2 is a diagram of a pipeline of software modules for determining the constructiveness of a sub-dialogue, in accordance with an example embodiment.
  • pipeline 201 includes three software modules, identified as modules 202 - 204 , which might run on the servers at website 104 .
  • module 202 extracts sub-dialogues from the comment section of articles displayed (e.g., in a content stream) on Yahoo! News.
  • a “sub-dialogue” consists of two or more comments in a thread of comments, e.g., in chronological order from the top (earliest) to the bottom (latest).
  • the sub-dialogues are: (1) provided to human annotators who label the sub-dialogues as constructive or non-constructive and who annotate the sub-dialogues and comments, e.g., using the specified annotations shown in FIG. 5 , as will be described in further detail below; (2) turned into representations (e.g., vectors) whose sequential values are specified features which will also be described in further detail below.
  • the labeled and annotated sub-dialogues and their corresponding representations might be used by module 203 to train binary classifier 204 , which in an example embodiment might use logistic regression with L1 regularization, as described in Lee et al., “Efficient L1 Regularized Logistic Regression” (American Association for Artificial Intelligence 2006), which is incorporated herein by reference. Then once trained, binary classifier 204 might be used to determine whether unseen sub-dialogues are constructive or not constructive, as shown in the figure.
  • FIG. 3 is a flowchart diagram of a process for determining the constructiveness of a sub-dialogue, in accordance with an example embodiment.
  • the operations shown in this figure might be performed by software running on servers at website 104 (e.g., Yahoo! News, Google News, Facebook, Twitter, etc.) using persistent storage 105 or on servers at website 106 (e.g., an online forum such as reddit) using persistent storage 107 .
  • some of the operations shown in this figure might be performed by software (e.g., a client application including, for example, a webpage with embedded JavaScript or ActionScript) running on a client device (e.g., personal computer 102 or mobile device 103 ). It will be appreciated that these operations provide specifics for the general operations depicted in FIG. 2 .
  • software extracts the sub-dialogues from each thread in a corpus from an online forum (e.g., the comment section to articles on Yahoo! News), where each sub-dialogue consists of a series (e.g., two or more) of comments, in operation 301 .
  • the software obtains specified sub-dialogue annotations (e.g., the sub-dialogue annotations listed in FIG. 5 ) for each sub-dialogue and specified comment annotations (e.g., the comment annotations listed in FIG.
  • the specified sub-dialogue annotations include an annotation as to whether the sub-dialogue is constructive.
  • the software might obtain only the constructiveness annotation from the human annotator (e.g., a trained annotator or a worker from Amazon Mechanical Turk).
  • the software verifies the constructiveness annotation for each sub-dialogue, using the other sub-dialogue annotations for the sub-dialogue and the associated specified comment annotations.
  • the software extracts specified features from each sub-dialogue, where the specified features, as described in detail below, are represented as sequential values in a vector.
  • a sequential representation e.g., a struct or record
  • the software uses the specified features for each sub-dialogue and the specified sub-dialogue annotations (e.g., constructiveness) associated with the sub-dialogue to train a binary classifier (e.g., logistic regression with L1 regularization) that determines whether a particular sub-dialogue is constructive.
  • the software might also use the associated specified comment annotations for the sub-dialogue to train the binary classifier.
  • the software obtains a new sub-dialogue from a thread currently displayed in the online forum (e.g., the comment section to articles on Yahoo!
  • the software inputs the specified features extracted from the new sub-dialogue into the trained binary classifier to obtain a determination as to whether the new sub-dialogue is constructive, in operation 307 . Then in operation 308 , the software uses the determination to re-locate the new sub-dialogue in the displayed thread. For example, if a sub-dialogue is determined to be constructive, it might be moved toward the top of a thread. If a sub-dialogue is determined to be non-constructive, it might be moved toward the bottom of a thread.
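The re-location step in operation 308 can be sketched as a stable reordering of the sub-dialogues in a thread. This is a minimal illustration only: the `SubDialogue` container, the `relocate` function, and the stand-in `toy_classifier` are hypothetical names that do not come from the patent, and any callable returning a boolean can play the role of the trained binary classifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class SubDialogue:
    """A run of comments from one thread (hypothetical container)."""
    comment_ids: Sequence[int]


def relocate(
    sub_dialogues: List[SubDialogue],
    is_constructive: Callable[[SubDialogue], bool],
) -> List[SubDialogue]:
    """Return a new ordering with constructive sub-dialogues first.

    sorted() is stable, so sub-dialogues that receive the same
    determination keep their original relative order.
    """
    return sorted(sub_dialogues, key=lambda sd: not is_constructive(sd))


def toy_classifier(sd: SubDialogue) -> bool:
    # Stand-in for the trained classifier (assumption for the demo only).
    return len(sd.comment_ids) >= 3


thread = [
    SubDialogue((1, 2)),
    SubDialogue((3, 4, 5)),
    SubDialogue((6, 7)),
]
reordered = relocate(thread, toy_classifier)
print([tuple(sd.comment_ids) for sd in reordered])
# → [(3, 4, 5), (1, 2), (6, 7)]
```

Using a stable sort means that only the constructive/non-constructive determination changes the order; the chronology within each group is preserved.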
  • the binary classifier might use logistic regression with L1 regularization.
  • An off-the-shelf (OTS) version of such a binary classifier is included in scikit-learn.
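To make the effect of L1 regularization concrete, the following is a dependency-free sketch of logistic regression trained by proximal gradient descent (a gradient step followed by soft-thresholding). It is not the algorithm of Lee et al. and not scikit-learn's implementation; the function names, hyperparameters, and toy data are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w: float, t: float) -> float:
    """Proximal operator of t * |w|: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def train_l1_logreg(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.1,    # L1 penalty strength (assumed value)
    lr: float = 0.5,     # step size (assumed value)
    epochs: int = 1000,
) -> Tuple[List[float], float]:
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the mean log-loss, then L1 shrinkage.
        # The bias is left unpenalized.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> bool:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5


# Toy data: feature 0 separates the classes, feature 1 is uninformative.
X = [[1.0, 0.3], [0.9, 0.7], [0.1, 0.4], [0.0, 0.6]]
y = [1, 1, 0, 0]  # 1 = constructive, 0 = not constructive
w, b = train_l1_logreg(X, y)
print(w, b)
print([predict(w, b, x) for x in X])
```

On this toy data the penalty keeps the weight of the uninformative feature at zero, which is the sparsity property that makes L1 regularization attractive when many candidate features are available.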
  • the binary classifier might use convolutional neural networks.
  • An off-the-shelf (OTS) version of such a binary classifier is included in TensorFlow.
  • sub-dialogues might be represented using features.
  • features might be calculated for each comment and concatenated together to form a sub-dialogue, so that each comment has its own feature space and/or comment features are weighted equally.
  • features might be calculated for a sub-dialogue “as a whole”.
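The two representations just described can be contrasted in a short sketch. The function names are hypothetical, and averaging each feature across comments is only one assumed way of computing features for a sub-dialogue "as a whole"; the patent does not specify the aggregation.

```python
from typing import List, Sequence


def concatenated(per_comment: Sequence[Sequence[float]]) -> List[float]:
    """Each comment keeps its own slots in the sub-dialogue vector."""
    return [value for features in per_comment for value in features]


def as_a_whole(per_comment: Sequence[Sequence[float]]) -> List[float]:
    """One slot per feature, here the mean over all comments (assumption)."""
    return [sum(column) / len(column) for column in zip(*per_comment)]


# Two comments, each described by (sentence count, thumbs up).
per_comment = [[2.0, 10.0], [4.0, 30.0]]
print(concatenated(per_comment))  # → [2.0, 10.0, 4.0, 30.0]
print(as_a_whole(per_comment))    # → [3.0, 20.0]
```

The concatenated form grows with the number of comments, while the aggregated form has a fixed length regardless of how long the sub-dialogue is.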
  • a window might be used.
  • a sub-dialogue with a window of 3 might include a particular comment and the comment prior (e.g., chronologically) to the particular comment and the comment following (e.g., chronologically) the particular comment.
  • other windows might be used, e.g., 5, 7, 9, etc.
  • every window size that is compatible with a particular thread, including the thread itself might be used in a “brute force” approach.
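Window-based extraction of sub-dialogues can be sketched as a sliding window over the chronologically ordered comments of a thread. The function names are hypothetical. The brute-force variant below enumerates every size from two comments up to the whole thread, which is an assumption: the examples in the text list only odd window sizes.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def windows(comments: Sequence[T], size: int) -> List[Sequence[T]]:
    """All runs of `size` consecutive comments, in thread order."""
    if size < 2:
        raise ValueError("a sub-dialogue needs at least two comments")
    return [comments[i:i + size] for i in range(len(comments) - size + 1)]


def all_windows(comments: Sequence[T]) -> List[Sequence[T]]:
    """Brute force: every window size from 2 up to the whole thread."""
    out: List[Sequence[T]] = []
    for size in range(2, len(comments) + 1):
        out.extend(windows(comments, size))
    return out


thread = ["c1", "c2", "c3", "c4"]
print(windows(thread, 3))
# → [['c1', 'c2', 'c3'], ['c2', 'c3', 'c4']]
print(len(all_windows(thread)))
# → 6
```

With a window of 3, each extracted sub-dialogue is a comment together with its chronological predecessor and successor, matching the example given above.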
  • the feature values for a sub-dialogue might be weighted, e.g., to reflect decay (or staleness).
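One possible way to weight feature values for staleness is exponential decay by comment age. The half-life form, the function names, and the sample numbers below are assumptions for illustration; the patent says only that values might be weighted to reflect decay.

```python
from typing import List, Sequence


def decay_weights(timestamps: Sequence[float], half_life: float) -> List[float]:
    """Weight 1.0 for the newest comment, halving every `half_life` seconds."""
    newest = max(timestamps)
    return [0.5 ** ((newest - t) / half_life) for t in timestamps]


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


timestamps = [0.0, 60.0, 120.0]   # posting times in seconds
values = [4.0, 2.0, 1.0]          # one feature value per comment
weights = decay_weights(timestamps, half_life=60.0)
print(weights)  # → [0.25, 0.5, 1.0]
print(weighted_mean(values, weights))
```

Older comments contribute less to the aggregated feature value, so the representation leans toward the most recent part of the sub-dialogue.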
  • a sub-dialogue might be sequentially modeled using conditional random fields (CRF) or recurrent neural networks.
  • FIG. 4 shows the comments in a thread in an online forum, in accordance with an example embodiment.
  • the online forum is reddit, but other online forums and comment sections could be substituted here without loss of generality.
  • thread 401 was started by an initial post 402 describing “Loral hops”. Following the initial post, 36 comments were posted, including comments 403 - 406 .
  • Comment 403 appears to be directed to the initial post 402 and describes the purchase of one half pound of Loral hops.
  • comment 404 poses a question as to whether Loral hops would work as a single hop addition for an altbier.
  • comments 405 and 406 respond positively to the question.
  • software performing the operations described in FIG. 3 might determine that the sub-dialogue consisting of comments 404 - 406 is a constructive sub-dialogue and relocate those comments above comment 403 .
  • FIG. 5 shows a table listing specified sub-dialogue annotations and specified comment annotations, in accordance with an example embodiment.
  • these annotations might be added by human annotators (e.g., a trained annotator or a worker from Amazon Mechanical Turk) and used to train a binary classifier as described above with respect to FIGS. 2 and 3 .
  • the specified sub-dialogue annotations listed at the top of Table 1 in FIG. 5 include Constructiveness (whose enumerated values are “constructive” and “not constructive”), as well as groupings under Type and Agreement.
  • the sub-dialogue annotations might consist solely of Constructiveness.
  • Persuasiveness can take an enumerated value of “persuasive” or “not persuasive”
  • FIG. 6A shows a co-occurrence graph of sub-dialogue annotations with other sub-dialogue annotations, in accordance with an example embodiment.
  • co-occurrence graph 601 might be used to verify the sub-dialogue annotations, including the sub-dialogue annotation as to constructiveness, added by human annotators (e.g., a trained annotator or a worker from Amazon Mechanical Turk) and used to train a binary classifier as described above with respect to FIGS. 2 and 3 .
  • “constructiveness” on either the left or top axis
  • “not constructive” on either the left or top axis
  • FIG. 6B shows a co-occurrence graph of sub-dialogue annotations with comment annotations, in accordance with an example embodiment.
  • co-occurrence graph 602 might be used to verify the sub-dialogue annotations, including the sub-dialogue annotation as to constructiveness, added by human annotators (e.g., a trained annotator or a worker from Amazon Mechanical Turk) and used to train a binary classifier as described above with respect to FIGS. 2 and 3 .
  • the “constructiveness” on the top axis
  • “not constructive” on the top axis
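The counts behind a co-occurrence graph such as those in FIGS. 6A and 6B can be computed by tallying how often each pair of annotations is applied to the same sub-dialogue. The sketch below is illustrative: the function name is hypothetical and the labels other than “constructive” and “not constructive” are placeholders, not the annotations listed in FIG. 5.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set, Tuple


def cooccurrence(label_sets: Iterable[Set[str]]) -> Counter:
    """Count unordered label pairs that appear on the same sub-dialogue."""
    counts: Counter = Counter()
    for labels in label_sets:
        # Sorting gives each unordered pair one canonical key.
        for pair in combinations(sorted(set(labels)), 2):
            counts[pair] += 1
    return counts


annotated = [
    {"constructive", "agreement"},        # placeholder labels
    {"constructive", "agreement"},
    {"not constructive", "off-topic"},
]
counts = cooccurrence(annotated)
print(counts[("agreement", "constructive")])      # → 2
print(counts[("not constructive", "off-topic")])  # → 1
```

A pairing that is rare in such a table, when it shows up on a newly annotated sub-dialogue, is one plausible signal that the constructiveness annotation deserves a second look.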
  • sub-dialogues might be represented using features, which in turn might be represented using sequential values in a vector, struct, record, etc.
  • the features used to represent sub-dialogues might include features from one or more of the following feature groups:
  • Comment: Features describing the length and popularity of a comment: the number of sentences and average token-length of sentences and character-length of tokens, and the number of thumbs up, thumbs down, and thumbs up and down received.
  • Word embeddings from Word2Vec (see Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean, “Distributed representations of words and phrases and their compositionality” (2013), in Advances in Neural Information Processing Systems, pages 3111-3119, which is incorporated herein by reference).
  • Entity: Counts of the named entities by type and average person name length.
  • Influence: The total number of comments made and sub-dialogues participated in by the commenter over the course of two consecutive months; the total number of thumbs up, thumbs down, and both thumbs up and down received, and the percent of thumbs received; the total active time of the user during the period; and the activity rate (number of comments/time active).
  • Lexicon features: Counts of phrases from different lexicons that appear in the comment. The lexicons are pronouns, expressions conveying certainty, hedges, comparisons, contingencies, expansions, hate words, and opinions. There are also binary features indicating if there are agreement or disagreement phrases in the comment.
  • Subjectivity: Normalized count of hedge words, pronouns, and passive constructions; and the subjectivity and polarity scores estimated using TextBlob.
  • Temporal: Maximum, minimum, and mean difference between comments and the total elapsed time between the first and last comment.
  • Thread: Features describing the thread structure and popularity. Structure features are the number of comments, commenters, and the average number of comments per person. Popularity features are the counts of thumbs up and thumbs down in a thread as well as the percent of thumbs up out of total number of thumbs up or down.
  • FIG. 7 shows a table listing precision scores for particular features, in accordance with an example embodiment.
  • an L1-regularized logistic regression classifier was trained using feature groups in isolation. As indicated in Table 3 in FIG. 7 , the following features gave high precision when determining constructiveness: the counts of named entities, the counts of thumbs up and thumbs down in a thread, the comment length, lower- and upper-case characteristics, and the formality score.
  • the inventions might employ various computer-implemented operations involving data stored in computer systems. Any of the operations described herein that form part of the inventions are useful machine operations.
  • the inventions also relate to a device or an apparatus for performing these operations.
  • the apparatus may be specially constructed for the required purposes, such as the carrier network discussed above, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer.
  • various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
  • the inventions can also be embodied as computer readable code on a computer readable medium.
  • the computer readable medium is any data storage device that can store data, which can thereafter be read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, DVDs, Flash, magnetic tapes, and other optical and non-optical data storage devices.
  • the computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.

Abstract

Software on a website hosting an online forum extracts a plurality of sub-dialogues from each thread in a corpus from the online forum. The software obtains one or more sub-dialogue annotations associated with each sub-dialogue. The sub-dialogue annotations include an annotation as to whether the sub-dialogue is constructive. The software extracts a plurality of features from each sub-dialogue and uses them and the sub-dialogue annotations associated with the sub-dialogue to train a classifier that determines whether a particular sub-dialogue is constructive. Then the software obtains a new sub-dialogue from a thread currently displayed in the online forum and extracts the plurality of features from the new sub-dialogue. The software inputs the features extracted from the new sub-dialogue into the classifier and obtains a determination as to whether the new sub-dialogue is constructive.

Description

BACKGROUND
Online forums such as reddit and comment sections for online articles allow users to converse with one another through the posting of comments in comment threads.
Constructive dialogues in such forums and comment sections tend to increase user engagement with the service hosting the forum or publishing the article, whereas dialogues that are not constructive tend to have the opposite effect.
Determining whether a dialogue is constructive or not is usually not a difficult task for a literate human. However, it has proven to be a difficult task to automate, at least in part because of the “comment-centric” approaches adopted by most researchers.
Consequently, “dialogue-centric” approaches to the automation of this task remain an active area of research and experimentation.
SUMMARY
In an example embodiment, a processor-executed method is described. According to the method, software on a website hosting an online forum extracts a plurality of sub-dialogues from each thread in a corpus from the online forum. Each sub-dialogue includes a plurality of comments and each thread includes at least one sub-dialogue. The software obtains one or more sub-dialogue annotations associated with each sub-dialogue. The one or more sub-dialogue annotations include an annotation as to whether the sub-dialogue is constructive. The software extracts a plurality of features from each sub-dialogue and uses them and the sub-dialogue annotations associated with the sub-dialogue to train a classifier that determines whether a particular sub-dialogue is constructive. Then the software obtains a new sub-dialogue from a thread currently displayed in the online forum and extracts the plurality of features from the new sub-dialogue. The software inputs the features extracted from the new sub-dialogue into the classifier and obtains a determination as to whether the new sub-dialogue is constructive. Then the software uses the determination to re-locate the new sub-dialogue in the thread.
In another example embodiment, an apparatus is described, namely, computer-readable media which persistently store a program for a website hosting an online forum. The program extracts a plurality of sub-dialogues from each thread in a corpus from the online forum. Each sub-dialogue includes a plurality of comments and each thread includes at least one sub-dialogue. The program obtains one or more sub-dialogue annotations associated with each sub-dialogue. The one or more sub-dialogue annotations include an annotation as to whether the sub-dialogue is constructive. The program extracts a plurality of features from each sub-dialogue and uses them and the sub-dialogue annotations associated with the sub-dialogue to train a classifier that determines whether a particular sub-dialogue is constructive. Then the program obtains a new sub-dialogue from a thread currently displayed in the online forum and extracts the plurality of features from the new sub-dialogue. The program inputs the features extracted from the new sub-dialogue into the classifier and obtains a determination as to whether the new sub-dialogue is constructive. Then the program uses the determination to re-locate the new sub-dialogue in the thread.
Another example embodiment also involves a processor-executed method. According to the method, software on a website hosting an online forum extracts a plurality of sub-dialogues from each thread in a corpus from the online forum. Each sub-dialogue includes a plurality of comments and each thread includes at least one sub-dialogue. The software obtains one or more sub-dialogue annotations associated with each sub-dialogue. The sub-dialogue annotations are obtained from a human annotator. The one or more sub-dialogue annotations include an annotation as to whether the sub-dialogue is constructive. The software extracts a plurality of features from each sub-dialogue and uses them and the sub-dialogue annotations associated with the sub-dialogue to train a classifier that determines whether a particular sub-dialogue is constructive. Then the software obtains a new sub-dialogue from a thread currently displayed in the online forum and extracts the plurality of features from the new sub-dialogue. The software inputs the features extracted from the new sub-dialogue into the classifier and obtains a determination as to whether the new sub-dialogue is constructive. Each sub-dialogue is sequentially modeled using conditional random fields. Then the software uses the determination to re-locate the new sub-dialogue in the thread.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a network diagram showing a website hosting a content-aggregation service and a website hosting an online forum, in accordance with an example embodiment.
FIG. 2 is a diagram of a pipeline of software modules for determining the constructiveness of a sub-dialogue, in accordance with an example embodiment.
FIG. 3 is a flowchart diagram of a process for determining the constructiveness of a sub-dialogue, in accordance with an example embodiment.
FIG. 4 shows the comments in a thread in an online forum, in accordance with an example embodiment.
FIG. 5 shows a table listing specified sub-dialogue annotations and specified comment annotations, in accordance with an example embodiment.
FIG. 6A shows a co-occurrence graph of sub-dialogue annotations with other sub-dialogue annotations, in accordance with an example embodiment.
FIG. 6B shows a co-occurrence graph of sub-dialogue annotations with comment annotations, in accordance with an example embodiment.
FIG. 7 shows a table listing precision scores for particular features, in accordance with an example embodiment.
DETAILED DESCRIPTION
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the exemplary embodiments. However, it will be apparent to one skilled in the art that the example embodiments may be practiced without some of these specific details. In other instances, process operations and implementation details have not been described in detail, if already well known.
Subject matter will now be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, specific example embodiments. Subject matter may, however, be embodied in a variety of different forms and, therefore, covered or claimed subject matter is intended to be construed as not being limited to any example embodiments set forth herein; example embodiments are provided merely to be illustrative. Likewise, a reasonably broad scope for claimed or covered subject matter is intended. Among other things, for example, subject matter may be embodied as methods, devices, components, or systems. Accordingly, embodiments may, for example, take the form of hardware, software, firmware or any combination thereof (other than software per se). The following detailed description is, therefore, not intended to be taken in a limiting sense.
Throughout the specification and claims, terms may have nuanced meanings suggested or implied in context beyond an explicitly stated meaning. Likewise, the phrase “in an example embodiment” as used herein does not necessarily refer to the same embodiment and the phrase “in another example embodiment” as used herein does not necessarily refer to a different embodiment. It is intended, for example, that claimed subject matter include combinations of example embodiments in whole or in part.
In general, terminology may be understood at least in part from usage in context. For example, terms, such as “and”, “or”, or “and/or,” as used herein may include a variety of meanings that may depend at least in part upon the context in which such terms are used. Typically, “or” if used to associate a list, such as A, B or C, is intended to mean A, B, and C, here used in the inclusive sense, as well as A, B or C, here used in the exclusive sense. In addition, the term “one or more” as used herein, depending at least in part upon context, may be used to describe any feature, structure, or characteristic in a singular sense or may be used to describe combinations of features, structures or characteristics in a plural sense. Similarly, terms, such as “a,” “an,” or “the,” again, may be understood to convey a singular usage or to convey a plural usage, depending at least in part upon context. In addition, the term “based on” may be understood as not necessarily intended to convey an exclusive set of factors and may, instead, allow for existence of additional factors not necessarily expressly described, again, depending at least in part on context.
As used in this disclosure, the terms “constructive” and “non-constructive” have the following meanings.
Constructive: Constructive conversations do not require “conclusions.” A conversation or argument does not have to have a winner or conclusion to be constructive, as long as there is a clear exchange of ideas, opinions, and information done somewhat respectfully. A constructive conversation should contain one or multiple points of agreement and/or disagreement, all mostly on topic, and be relatively respectful. Comments should contain new information (be informative) and/or attempt to persuade. Comments may also seek to contribute humor, sarcasm, or even meanness if in the context of a passionate attempt at persuasiveness. How much “meanness” degrades the constructiveness is subjective—some people are more tolerant than others of disrespectful language when heated arguments occur.
Non-constructive: Non-constructive conversations are those which are largely unproductive. Usually, the initial commenter's point does not get properly addressed (i.e., the conversation does not contain a clear communicative goal or is disconnected), the conversation contains few attempts at persuasiveness, and each speech act can be taken in isolation. A sub-dialogue can also be deemed non-constructive if it is largely negative (i.e., an exchange of insults) or “all over the place” in terms of topic.
FIG. 1 is a network diagram showing a website hosting a content-aggregation service and a website hosting an online forum, in accordance with an example embodiment. As depicted in this figure, a personal computer 102 (e.g., a laptop or other mobile computer) and a mobile device 103 (e.g., a smartphone such as an iPhone, Android, Windows Phone, etc., or a tablet computer such as an iPad, Galaxy, etc.) are connected by a network 101 (e.g., a wide area network (WAN) including the Internet, which might be wireless in part or in whole) with a website 104 hosting a content-aggregation service that publishes articles with comment sections and a website 106 hosting an online forum (e.g., reddit). Websites hosting a content-aggregation service, including websites hosting a social-networking service, often display content to a user using graphical user interface (GUI) functionality called a “content stream”. Such websites determine inclusion or prominence of an item (e.g., an article) in the content stream based at least in part on a personalized user-interest profile which records the user's explicit and implicit relevance feedback as to previous items of content presented in the content stream. Explicit relevance feedback might take the form of user input to a GUI dialog inquiring about the user's interests. Implicit relevance feedback might include the viewing/listening history of the user, e.g., click-throughs and/or other measures of time spent (e.g., time spent viewing, time spent listening, time spent playing, etc.) by the user on categorized content. In an example embodiment, website 104 might be a website such as Yahoo! News or Google News, which ingests content from the Internet through “push” technology (e.g., a subscription to a web feed such as an RSS feed) and/or “pull” technology (e.g., web crawling), including articles (or Uniform Resource Locators (URLs) for articles).
Alternatively, in an example embodiment, website 104 might host an online social network such as Facebook or Twitter. As used here and elsewhere in this disclosure, the term “online social network” is to be broadly interpreted to include, for example, any online service, including a social-media service, that allows its users to, among other things: (a) selectively access (e.g., according to a friend list, contact list, buddy list, social graph, interest graph, or other control list) content (e.g., text including articles and web links, images, videos, animations, audio recordings, games and other software, etc.) associated with each other's profiles (e.g., Facebook walls, Flickr photo albums, Pinterest boards, etc.); (b) selectively (e.g., according to a friend list, contact list, buddy list, social graph, interest graph, distribution list, or other control list) broadcast content (e.g., text including articles and/or comments, web links, images, videos, animations, audio recordings, games and other software, etc.) to each other's newsfeeds (e.g., content/activity streams such as Facebook's News Feed, Twitter's Timeline, Google Plus's Stream, etc.); and/or (c) selectively communicate (e.g., according to a friend list, contact list, buddy list, social graph, interest graph, distribution list, or other control list) with each other (e.g., using a messaging protocol such as email, instant messaging, short message service (SMS), etc.).
And as used in this disclosure, the term “content-aggregation service” is to be broadly interpreted to include any online service, including a social-media service, that allows its users to, among other things, access and/or annotate (e.g., comment on) content (e.g., text including articles and/or comments, web links, images, videos, animations, audio recordings, games and other software, etc.) aggregated/ingested by the online service (e.g., using its own curators and/or its own algorithms) and/or posted by its users and presented in a “wall” view or “stream” view. It will be appreciated that a website hosting a content-aggregation service might have social features based on a friend list, contact list, buddy list, social graph, interest graph, distribution list, or other control list that is accessed over the network from a separate website hosting an online social network through an application programming interface (API) exposed by the separate website. Thus, for example, Yahoo! News might identify the content items (e.g., articles) in its newsfeed (e.g., as displayed on the front page of Yahoo! News) that have been viewed/read by a user's friends, as listed on a Facebook friend list that the user has authorized Yahoo! News to access.
In an example embodiment, websites 104 and 106 might be composed of a number of servers (e.g., racked servers) connected by a network (e.g., a local area network (LAN) or a WAN) to each other in a cluster (e.g., a load-balancing cluster, a Beowulf cluster, a Hadoop cluster, etc.) or other distributed system which might run website software (e.g., web-server software, database software, search-engine software, etc.), and distributed-computing and/or cloud software such as Map-Reduce, Google File System, Hadoop, Hadoop File System, Hadoop YARN, Pig, Hive, Dremel, CloudBase, etc. The servers in website 104 might be connected to persistent storage 105 and the servers in website 106 might be connected to persistent storage 107. Persistent storages 105 and 107 might include flash memory, a redundant array of independent disks (RAID), and/or a storage area network (SAN), in an example embodiment. In an alternative example embodiment, the servers for websites 104 and 106 and/or the persistent storage in persistent storages 105 and 107 might be hosted wholly or partially in a public and/or private cloud, e.g., where the cloud resources serve as a platform-as-a-service (PaaS) or an infrastructure-as-a-service (IaaS).
Persistent storages 105 and 107 might be used to store content (e.g., text including articles and/or comments, web links, images, videos, animations, audio recordings, games and other software, etc.) and/or its related data. Additionally, persistent storage 105 might be used to store data related to users and their social contacts (e.g., Facebook friends), as well as software including algorithms and other processes, as described in detail below, for re-locating comments in a thread of comments on an article in a content stream. In an example embodiment, the content stream might be ordered from top to bottom (a) in reverse chronology (e.g., latest in time on top), or (b) according to interestingness scores, including the rankings discussed below. In an example embodiment, some of the content (and/or its related data) might be stored in persistent storages 105 and 107 and might have been received from a content delivery or distribution network (CDN), e.g., Akamai Technologies. Or, alternatively, some of the content (and/or its related data) might be delivered directly from the CDN to the personal computer 102 or the mobile device 103, without being stored in persistent storages 105 and 107.
Personal computer 102 and the servers at websites 104 and 106 might include (1) hardware consisting of one or more microprocessors (e.g., from the x86 family, the ARM family, or the PowerPC family), volatile storage (e.g., RAM), and persistent storage (e.g., flash memory, a hard disk, or a solid-state drive), and (2) an operating system (e.g., Windows, Mac OS, Linux, Windows Server, Mac OS Server, etc.) that runs on the hardware. Similarly, in an example embodiment, mobile device 103 might include (1) hardware consisting of one or more microprocessors (e.g., from the ARM family or the x86 family), volatile storage (e.g., RAM), and persistent storage (e.g., flash memory such as microSD), (2) an operating system (e.g., iOS, webOS, Windows Mobile, Android, Linux, Symbian OS, RIM BlackBerry OS, etc.) that runs on the hardware, and (3) one or more accelerometers, one or more gyroscopes, and a global positioning system (GPS) or other location-identifying type capability.
Also in an example embodiment, personal computer 102 and mobile device 103 might each include a browser as an application program or as part of an operating system. Examples of browsers that might execute on personal computer 102 include Internet Explorer, Mozilla Firefox, Safari, and Google Chrome. Examples of browsers that might execute on mobile device 103 include Safari, Mozilla Firefox, Android Browser, and webOS Browser. It will be appreciated that users of personal computer 102 and/or mobile device 103 might use browsers to access content presented by websites 104 and 106. Alternatively, users of personal computer 102 and/or mobile device 103 might use other application programs (or apps, including hybrid apps that display HTML content) to access content presented by websites 104 and 106.
FIG. 2 is a diagram of a pipeline of software modules for determining the constructiveness of a sub-dialogue, in accordance with an example embodiment. As shown in this figure, pipeline 201 includes three software modules, identified as modules 202-204, which might run on the servers at website 104. In an example embodiment, module 202 extracts sub-dialogues from the comment section of articles displayed (e.g., in a content stream) on Yahoo! News. As used in this disclosure, a “sub-dialogue” consists of two or more comments in a thread of comments, e.g., in chronological order from the top (earliest) to the bottom (latest). Once extracted, the sub-dialogues are: (1) provided to human annotators who label the sub-dialogues as constructive or non-constructive and who annotate the sub-dialogues and comments, e.g., using the specified annotations shown in FIG. 5, as will be described in further detail below; (2) turned into representations (e.g., vectors) whose sequential values are specified features which will also be described in further detail below. The labeled and annotated sub-dialogues and their corresponding representations (e.g., vectors) might be used by module 203 to train binary classifier 204, which in an example embodiment might use logistic regression with L1 regularization, as described in Lee et al., “Efficient L1 Regularized Logistic Regression” (American Association for Artificial Intelligence 2006), which is incorporated herein by reference. Then once trained, binary classifier 204 might be used to determine whether unseen sub-dialogues are constructive or not constructive, as shown in the figure.
FIG. 3 is a flowchart diagram of a process for determining the constructiveness of a sub-dialogue, in accordance with an example embodiment. In an example embodiment, the operations shown in this figure might be performed by software running on servers at website 104 (e.g., Yahoo! News, Google News, Facebook, Twitter, etc.) using persistent storage 105 or on servers at website 106 (e.g., an online forum such as reddit) using persistent storage 107. In an alternative example embodiment, some of the operations shown in this figure might be performed by software (e.g., a client application including, for example, a webpage with embedded JavaScript or ActionScript) running on a client device (e.g., personal computer 102 or mobile device 103). It will be appreciated that these operations provide specifics for the general operations depicted in FIG. 2.
As depicted in FIG. 3, software (e.g., software running on servers at website 104 or website 106) extracts the sub-dialogues from each thread in a corpus from an online forum (e.g., the comment section to articles on Yahoo! News), where each sub-dialogue consists of a series (e.g., two or more) of comments, in operation 301. In operation 302, the software obtains specified sub-dialogue annotations (e.g., the sub-dialogue annotations listed in FIG. 5) for each sub-dialogue and specified comment annotations (e.g., the comment annotations listed in FIG. 5) for each comment from a human annotator (e.g., a trained annotator or a worker from Amazon Mechanical Turk). The specified sub-dialogue annotations include an annotation as to whether the sub-dialogue is constructive. In an alternative example embodiment, the software might obtain only the constructiveness annotation from the human annotator (e.g., a trained annotator or a worker from Amazon Mechanical Turk). In operation 303, the software verifies the constructiveness annotation for each sub-dialogue, using the other sub-dialogue annotations for the sub-dialogue and the associated specified comment annotations. And in operation 304, the software extracts specified features from each sub-dialogue, where the specified features, as described in detail below, are represented as sequential values in a vector. In an alternative example embodiment, a sequential representation (e.g., a struct or record) other than a vector might be used for the specified features. In operation 305, the software uses the specified features for each sub-dialogue and the specified sub-dialogue annotations (e.g., constructiveness) associated with the sub-dialogue to train a binary classifier (e.g., logistic regression with L1 regularization) that determines whether a particular sub-dialogue is constructive.
In an example embodiment, the software might also use the associated specified comment annotations for the sub-dialogue to train the binary classifier. In operation 306, the software obtains a new sub-dialogue from a thread currently displayed in the online forum (e.g., the comment section to articles on Yahoo! News) and extracts the specified features from the new sub-dialogue. The software inputs the specified features extracted from the new sub-dialogue into the trained binary classifier to obtain a determination as to whether the new sub-dialogue is constructive, in operation 307. Then in operation 308, the software uses the determination to re-locate the new sub-dialogue in the displayed thread. For example, if a sub-dialogue is determined to be constructive, it might be moved toward the top of a thread. If a sub-dialogue is determined to be non-constructive, it might be moved toward the bottom of a thread.
As noted above, in an example embodiment, the binary classifier might use logistic regression with L1 regularization. An off-the-shelf (OTS) version of such a binary classifier is included in scikit-learn. In an alternative example embodiment, the binary classifier might use convolutional neural networks. An off-the-shelf (OTS) version of such a binary classifier is included in TensorFlow.
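By way of illustration only, the training and classification steps might be sketched as follows. To remain self-contained, the sketch does not call scikit-learn; it instead fits L1-regularized logistic regression with a simple proximal gradient loop, which is a swapped-in technique for the example and not necessarily how any off-the-shelf classifier is implemented. The function names, the two toy features (fraction of persuasive comments, fraction of insulting comments), the toy labels, and the hyperparameters `lam`, `lr`, and `epochs` are all assumptions.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, t):
    # Proximal operator of t * |value|: shrink toward zero by t.
    if value > t:
        return value - t
    if value < -t:
        return value + t
    return 0.0


def train_l1_logreg(X, y, lam=0.01, lr=0.5, epochs=500):
    """Fit weights and bias by proximal gradient descent (assumed settings)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the log-loss, then L1 shrinkage on the weights only.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    # True means "constructive", False means "not constructive".
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5


# Hypothetical feature vectors: [persuasive fraction, insulting fraction].
X = [[1.0, 0.0], [0.8, 0.1], [0.9, 0.2], [0.1, 0.9], [0.0, 1.0], [0.2, 0.7]]
y = [1, 1, 1, 0, 0, 0]  # 1 = annotated constructive, 0 = not constructive

w, b = train_l1_logreg(X, y)
print(predict(w, b, [0.95, 0.05]), predict(w, b, [0.05, 0.95]))
```

The L1 penalty is applied through the shrinkage step, which can drive uninformative weights to exactly zero; the bias term is left unpenalized in this sketch.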
As indicated above, sub-dialogues might be represented using features. In an example embodiment, features might be calculated for each comment and concatenated together to form a sub-dialogue, so that each comment has its own feature space and/or comment features are weighted equally. In the same or an alternative example embodiment, features might be calculated for a sub-dialogue “as a whole”. When determining which comments to include in a sub-dialogue for purposes of concatenation or “as a whole”, a window might be used. For example, a sub-dialogue with a window of 3 might include a particular comment and the comment prior (e.g., chronologically) to the particular comment and the comment following (e.g., chronologically) the particular comment. In an alternative example embodiment, other windows might be used, e.g., 5, 7, 9, etc. Also, in an example embodiment, every window size that is compatible with a particular thread, including the thread itself, might be used in a “brute force” approach.
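The windowing and the two feature layouts described above might be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, comments are plain strings in chronological order, and the two per-comment features (token count and mean token length) merely stand in for the specified features.

```python
from typing import Dict, List


def windowed_sub_dialogues(thread: List[str], window: int = 3) -> List[List[str]]:
    """Return every run of `window` chronologically adjacent comments."""
    if window < 2:
        raise ValueError("a sub-dialogue needs at least two comments")
    return [thread[i:i + window] for i in range(len(thread) - window + 1)]


def comment_features(comment: str) -> List[float]:
    # Placeholder features: number of tokens and mean token length.
    tokens = comment.split()
    mean_len = sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0
    return [float(len(tokens)), mean_len]


def concatenated_features(sub_dialogue: List[str]) -> List[float]:
    # Each comment keeps its own slots in the vector (its own feature space).
    vector: List[float] = []
    for comment in sub_dialogue:
        vector.extend(comment_features(comment))
    return vector


def whole_features(sub_dialogue: List[str]) -> List[float]:
    # Features computed once over the sub-dialogue "as a whole".
    return comment_features(" ".join(sub_dialogue))


def brute_force_windows(thread: List[str]) -> Dict[int, List[List[str]]]:
    # Every window size the thread allows, up to the thread itself.
    return {size: windowed_sub_dialogues(thread, size)
            for size in range(2, len(thread) + 1)}


thread = ["Would Loral work alone?", "Yes it should", "I agree", "Thanks all"]
windows = windowed_sub_dialogues(thread, window=3)
print(len(windows), len(concatenated_features(windows[0])))
```

With a window of 3, each interior comment appears once as the middle comment of a window, flanked by the comment before it and the comment after it.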
In an example embodiment, the feature values for a sub-dialogue might be weighted, e.g., to reflect decay (or staleness). Also, in an example embodiment, a sub-dialogue might be sequentially modeled using conditional random fields (CRF) or recurrent neural networks.
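One way the decay weighting might be realized is sketched below. The exponential form, the one-hour half-life, and the function names are assumptions chosen for illustration; the disclosure does not specify a particular weighting scheme.

```python
from typing import List


def decay_weight(age_seconds: float, half_life_seconds: float = 3600.0) -> float:
    # Weight halves every `half_life_seconds`; a brand-new comment has weight 1.
    return 0.5 ** (age_seconds / half_life_seconds)


def decayed_feature_vector(features: List[List[float]],
                           ages: List[float],
                           half_life_seconds: float = 3600.0) -> List[float]:
    """Weighted average of per-comment feature vectors, favoring fresh comments."""
    weights = [decay_weight(age, half_life_seconds) for age in ages]
    total = sum(weights)
    dim = len(features[0])
    return [sum(w * f[j] for w, f in zip(weights, features)) / total
            for j in range(dim)]


# Two comments: one just posted, one an hour old (hypothetical values).
vector = decayed_feature_vector([[1.0, 0.0], [0.0, 1.0]], ages=[0.0, 3600.0])
print(vector)
```

In this sketch the hour-old comment contributes half as much as the fresh one, so stale comments influence the sub-dialogue representation less.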
FIG. 4 shows the comments in a thread in an online forum, in accordance with an example embodiment. In this example, the online forum is reddit, but other online forums and comment sections could be substituted here without loss of generality. As depicted in this figure, thread 401 was started by an initial post 402 describing “Loral hops”. Following the initial post, 36 comments were posted, including comments 403-406. Comment 403 appears to be directed to the initial post 402 and describes the purchase of one half pound of Loral hops. But comment 404 poses a question as to whether Loral hops would work as a single hop addition for an altbier. And comments 405 and 406 respond positively to the question. In an example embodiment, software performing the operations described in FIG. 3 might determine that the sub-dialogue consisting of comments 404-406 is a constructive sub-dialogue and relocate those comments above comment 403.
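The re-location in this example might be sketched as follows. The sketch assumes the classifier's determination is already available (it is hard-coded here), identifies comments by their reference numerals, and simplifies "toward the top" to "to the very top"; the function name `relocate` is hypothetical.

```python
from typing import List


def relocate(thread: List[int], sub_dialogue: List[int],
             constructive: bool) -> List[int]:
    """Move a sub-dialogue's comments to the top if constructive, else the bottom."""
    members = set(sub_dialogue)
    moved = [c for c in thread if c in members]      # keeps chronological order
    rest = [c for c in thread if c not in members]
    return moved + rest if constructive else rest + moved


thread_401 = [403, 404, 405, 406]
reordered = relocate(thread_401, [404, 405, 406], constructive=True)
print(reordered)
```

The relative order of comments inside the sub-dialogue is preserved, so the question in comment 404 still precedes its answers after the move.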
FIG. 5 shows a table listing specified sub-dialogue annotations and specified comment annotations, in accordance with an example embodiment. In an example embodiment, these annotations might be added by human annotators (e.g., a trained annotator or a worker from Amazon Mechanical Turk) and used to train a binary classifier as described above with respect to FIGS. 2 and 3. It will be appreciated that the specified sub-dialogue annotations listed at the top of Table 1 in FIG. 5 include Constructiveness (whose enumerated values are “constructive” and “not constructive”), as well as groupings under Type and Agreement. In an alternative example embodiment, the sub-dialogue annotations might consist solely of Constructiveness. The specified comment annotations listed at the bottom of Table 1 in FIG. 5 include groupings under Sentiment, Tone, Agreement, Topic, Audience, and Persuasiveness, each of which is associated with two or more enumerated values. The specified comment annotation Persuasiveness can take an enumerated value of “persuasive” or “not persuasive”.
FIG. 6A shows a co-occurrence graph of sub-dialogue annotations with other sub-dialogue annotations, in accordance with an example embodiment. In an example embodiment, co-occurrence graph 601 might be used to verify the sub-dialogue annotations, including the sub-dialogue annotation as to constructiveness, added by human annotators (e.g., a trained annotator or a worker from Amazon Mechanical Turk) and used to train a binary classifier as described above with respect to FIGS. 2 and 3. As shown by graph 601, “constructive” (on either the left or top axis) tends to co-occur with “agreement throughout”, “initial disagreement converging to agreement”, “personal stories”, and “positive/respectful”. And “not constructive” (on either the left or top axis) tends to co-occur with “flamewar (insulting)”, “off-topic/digression”, and “snarky/humorous”.
FIG. 6B shows a co-occurrence graph of sub-dialogue annotations with comment annotations, in accordance with an example embodiment. In an example embodiment, co-occurrence graph 602 might be used to verify the sub-dialogue annotations, including the sub-dialogue annotation as to constructiveness, added by human annotators (e.g., a trained annotator or a worker from Amazon Mechanical Turk) and used to train a binary classifier as described above with respect to FIGS. 2 and 3. As shown by graph 602, “constructive” (on the top axis) tends to co-occur with comment labels of “persuasive”, “positive”, “informative”, and “sympathetic”. And “not constructive” (on the top axis) tends to co-occur with comment labels of “not persuasive”, “negative”, “funny”, “mean”, “sarcastic”, “off-topic with article”, and “off-topic with conversation”.
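The counts behind a co-occurrence graph such as those in FIGS. 6A and 6B might be tallied as follows. This is only a sketch: the `cooccurrence` function, the record layout, and the three annotated sub-dialogues are hypothetical and are not data from the figures.

```python
from collections import Counter
from itertools import product


def cooccurrence(records):
    """Count how often each sub-dialogue label appears with each comment label.

    records: iterable of (sub_dialogue_labels, comment_labels) pairs, one pair
    per annotated sub-dialogue. Each label pair is counted at most once per
    sub-dialogue.
    """
    counts = Counter()
    for sub_labels, comment_labels in records:
        for pair in product(set(sub_labels), set(comment_labels)):
            counts[pair] += 1
    return counts


# Hypothetical annotations for three sub-dialogues (illustration only).
records = [
    (["constructive"], ["persuasive", "informative"]),
    (["constructive"], ["informative", "sympathetic"]),
    (["not constructive"], ["sarcastic", "mean"]),
]
counts = cooccurrence(records)
print(counts[("constructive", "informative")])      # 2
print(counts[("not constructive", "informative")])  # 0
```

A label pair that is frequent for “constructive” and rare for “not constructive” is the kind of pattern the graphs are described as showing.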
As noted earlier, sub-dialogues might be represented using features, which in turn might be represented using sequential values in a vector, struct, record, etc. In an example embodiment, the features used to represent sub-dialogues might include features from one or more of the following feature groups:
Case: Raw counts of: capitalized words, sentences without capitalization, sentences beginning with a lowercase letter, exclamation points, question marks, ellipses, and contractions.
Comment: Features describing the length and popularity of a comment: the number of sentences and average token-length of sentences and character-length of tokens, the number of thumbs up, thumbs down, and thumbs up and down received.
Constituency: Max parse tree depth and normalized counts of parse constructions.
Dependency: Raw counts of dependency triples (lexicalized and backed-off to POS (Part-Of-Speech) tags).
Embeddings: Word embeddings from Word2Vec (see Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean, Distributed representations of words and phrases and their compositionality (2013), in Advances in neural information processing systems, pages 3111-3119, which is incorporated herein by reference).
Entity: Counts of the named entities by type and average person name length.
Indices: Scores of the comments according to pre-existing style tools: the formality score (see Ellie Pavlick and Ani Nenkova, Inducing lexical style properties for paraphrase and genre differentiation (2015), in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 218-224, Denver, Colo., May-June, Association for Computational Linguistics, which is incorporated herein by reference); politeness score (see Cristian Danescu-Niculescu-Mizil, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, and Christopher Potts, A computational approach to politeness with application to social factors (2013), in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 250-259, Sofia, Bulgaria, August, Association for Computational Linguistics, which is incorporated herein by reference); idea flow (see Vlad Niculae and Cristian Danescu-Niculescu-Mizil, Conversational markers of constructive discussions (2016), in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 568-578, San Diego, Calif., June, Association for Computational Linguistics, which is incorporated herein by reference); Flesch-Kincaid grade level (a readability measure); and SMOG index (another readability measure).
Influence: The total number of comments made and sub-dialogues participated in by the commenter over the course of two consecutive months; the total number of thumbs up, thumbs down, and both thumbs up and down received, and the percent of thumbs received; the total active time of the user during the period; and the activity rate (number of comments/time active).
Lexical: Token log frequency in the Web 1T corpus (see Thorsten Brants and Alex Franz (2009), Web 1T 5-gram, 10 European languages version 1, Linguistic Data Consortium, Philadelphia, which is incorporated herein by reference).
Lexicon features: Counts of phrases from different lexicons that appear in the comment. The lexicons are pronouns, expressions conveying certainty, hedges, comparisons, contingencies, expansions, hate words, and opinions. There are also binary features indicating if there are agreement or disagreement phrases in the comment.
N-gram: TF-IDF of token n-grams and normalized counts of part-of-speech n-grams (n=1, 2, 3).
Similarity: Comparison of the comment to the article headline, initial comment, preceding comment, and all previous comments measured by n-gram overlap and cosine similarity.
Subjectivity: Normalized count of hedge words, pronouns, and passive constructions; and the subjectivity and polarity scores estimated using TextBlob.
Temporal: Maximum, minimum, and mean difference between comments and the total elapsed time between the first and last comment.
Thread: Features describing the thread structure and popularity. Structure features are the number of comments, commenters, and the average number of comments per person. Popularity features are the counts of thumbs up and thumbs down in a thread as well as the percent of thumbs up out of the total number of thumbs up or down.
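To make the idea of representing a sub-dialogue as sequential values in a vector concrete, the sketch below computes a small subset of the Case, Temporal, and Thread groups and concatenates them. It is an illustration under assumptions, not the feature extractor of the embodiment: the comment dictionary keys (`author`, `text`, `time`, `up`, `down`), the function names, the word-matching regular expression, and the sample comments are all hypothetical.

```python
import re
from statistics import mean


def case_features(text):
    """Raw counts of capitalized words, exclamation points, and question marks."""
    words = re.findall(r"[A-Za-z']+", text)
    capitalized = sum(1 for w in words if w[0].isupper())
    return [capitalized, text.count("!"), text.count("?")]


def temporal_features(timestamps):
    """Max, min, and mean gap between comments, plus total elapsed time."""
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return [0, 0, 0, 0]
    return [max(gaps), min(gaps), mean(gaps), ordered[-1] - ordered[0]]


def thread_features(comments):
    """Comment count, commenter count, comments per person, and thumbs statistics."""
    n_comments = len(comments)
    n_authors = len({c["author"] for c in comments})
    per_person = n_comments / n_authors if n_authors else 0.0
    up = sum(c["up"] for c in comments)
    down = sum(c["down"] for c in comments)
    pct_up = up / (up + down) if up + down else 0.0
    return [n_comments, n_authors, per_person, up, down, pct_up]


def sub_dialogue_vector(comments):
    """Concatenate the feature groups into one sequential vector of values."""
    text = " ".join(c["text"] for c in comments)
    return (case_features(text)
            + temporal_features([c["time"] for c in comments])
            + thread_features(comments))


# Hypothetical sub-dialogue; times are seconds since the first comment.
comments = [
    {"author": "a", "text": "Would Loral work in an altbier?", "time": 0, "up": 3, "down": 0},
    {"author": "b", "text": "Yes! It works well.", "time": 60, "up": 2, "down": 1},
    {"author": "a", "text": "Thanks, I will try it.", "time": 180, "up": 1, "down": 0},
]
vector = sub_dialogue_vector(comments)
print(vector)
```

The position of each value in the vector is fixed by the order of concatenation, so the same ordering has to be used at training time and when classifying a new sub-dialogue.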
FIG. 7 shows a table listing precision scores for particular features, in accordance with an example embodiment. In an experiment, an L1-regularized logistic regression classifier was trained using feature groups in isolation. As indicated in Table 3 in FIG. 7, the following features gave high precision when determining constructiveness: the counts of named entities, the counts of thumbs up and thumbs down in a thread, the comment length, lower- and upper-case characteristics, and the formality score.
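The experiment of training on feature groups in isolation might look like the sketch below. It is not the experiment behind Table 3: the solver (proximal gradient descent written with the standard library only), the hyperparameters, the group names `entity` and `noise`, and the eight-row toy data set are assumptions made for illustration, and precision is measured on the training rows because the toy set has no held-out data.

```python
import math


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def train_l1_logreg(X, y, lam=0.05, lr=1.0, epochs=1000):
    """Fit logistic regression with an L1 penalty by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the bias is left unpenalized
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def predict(w, b, x):
    """Label 1 (constructive) when the linear score is positive."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


def precision(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def precision_by_group(X, y, groups):
    """Train one classifier per feature group, using only that group's columns."""
    scores = {}
    for name, cols in groups.items():
        Xg = [[row[j] for j in cols] for row in X]
        w, b = train_l1_logreg(Xg, y)
        scores[name] = precision(y, [predict(w, b, x) for x in Xg])
    return scores


# Toy data: column 0 tracks the label, column 1 is unrelated to it.
X = [[1.0, 0.0], [0.9, 1.0], [0.8, 0.0], [1.0, 1.0],
     [0.1, 0.0], [0.0, 1.0], [0.2, 0.0], [0.1, 1.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
scores = precision_by_group(X, y, {"entity": [0], "noise": [1]})
print(scores)
```

On this toy data the L1 penalty keeps the weight of the unrelated column at zero, so that group never predicts “constructive”, while the informative column separates the two classes.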
With the above embodiments in mind, it should be understood that the inventions might employ various computer-implemented operations involving data stored in computer systems. Any of the operations described herein that form part of the inventions are useful machine operations. The inventions also relate to a device or an apparatus for performing these operations. The apparatus may be specially constructed for the required purposes, such as the carrier network discussed above, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
The inventions can also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data, which can thereafter be read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, DVDs, Flash, magnetic tapes, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network of coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
Although example embodiments of the inventions have been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications can be practiced within the scope of the following claims. Moreover, the operations described above can be ordered, modularized, and/or distributed in any suitable way. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the inventions are not to be limited to the details given herein, but may be modified within the scope and equivalents of the following claims. In the following claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims or implicitly required by the disclosure.

Claims (20)

What is claimed is:
1. A method for performing a plurality of operations, comprising:
extracting a plurality of sub-dialogues from a thread in a corpus from an online forum, wherein a sub-dialogue includes a plurality of comments and the thread includes at least one sub-dialogue;
obtaining one or more sub-dialogue annotations associated with the sub-dialogue, wherein the one or more sub-dialogue annotations are associated with an annotation as to whether the sub-dialogue is constructive based upon one or more features of a first comment of the sub-dialogue and one or more features of a second comment of the sub-dialogue;
extracting a plurality of features from the sub-dialogue, the plurality of features comprising a first set of sub-dialogue features and a second set of sub-dialogue features represented using a vector of values, wherein the extracting the plurality of features comprises:
determining comment features comprising a first comment feature associated with the first comment and a second comment feature associated with the second comment;
weighting the comment features to generate weighted comment features;
generating the first set of sub-dialogue features based upon the weighted comment features; and
generating the second set of sub-dialogue features based upon a window of chronologically ordered comments;
using the vector representing the plurality of features from the sub-dialogue and at least one sub-dialogue annotation associated with the sub-dialogue to train a classifier that determines whether a particular sub-dialogue is constructive;
obtaining a new sub-dialogue from a thread currently displayed in the online forum and extracting one or more features from the new sub-dialogue, the one or more features from the new sub-dialogue represented as a new vector;
inputting the new vector representing the one or more features extracted from the new sub-dialogue into the classifier and obtaining a determination as to whether the new sub-dialogue is constructive, wherein the determination is performed using values of the one or more features of the new sub-dialogue included in the new vector; and
re-locating the new sub-dialogue in the thread based upon the determination, wherein each operation of the plurality of operations is performed by one or more processors.
2. The method of claim 1, the re-locating comprising:
re-locating the new sub-dialogue from a current location in the thread to a first location in the thread responsive to determining that the new sub-dialogue is constructive, the first location above the current location; or
re-locating the new sub-dialogue from the current location in the thread to a second location in the thread responsive to determining that the new sub-dialogue is non-constructive, the second location below the current location.
3. The method of claim 1, comprising determining that the new sub-dialogue is constructive based upon a determination that the new sub-dialogue at least one of:
comprises one or more points of agreement between two or more comments included in the new sub-dialogue;
comprises one or more points of disagreement between the two or more comments included in the new sub-dialogue; or
is on topic across the two or more comments included in the new sub-dialogue.
4. The method of claim 1, comprising determining that the new sub-dialogue is non-constructive based upon a determination that the new sub-dialogue at least one of:
comprises a disconnect between two or more comments included in the new sub-dialogue;
comprises a lack of a communicative goal between the two or more comments included in the new sub-dialogue; or
is off topic across the two or more comments included in the new sub-dialogue.
5. The method of claim 1, comprising determining that the new sub-dialogue is constructive based upon a determination that the new sub-dialogue comprises one or more points of agreement between two or more comments included in the new sub-dialogue.
6. The method of claim 1, comprising determining that the new sub-dialogue is constructive based upon a determination that the new sub-dialogue comprises one or more points of disagreement between two or more comments included in the new sub-dialogue.
7. The method of claim 1, comprising determining that the new sub-dialogue is constructive based upon a determination that the new sub-dialogue is on topic across two or more comments included in the new sub-dialogue.
8. The method of claim 1, comprising determining that the new sub-dialogue is non-constructive based upon a determination that the new sub-dialogue comprises a disconnect between two or more comments included in the new sub-dialogue.
9. The method of claim 1, comprising determining that the new sub-dialogue is non-constructive based upon a determination that the new sub-dialogue comprises a lack of a communicative goal between two or more comments included in the new sub-dialogue.
10. One or more non-transitory computer-readable media persistently storing a program, wherein the program, when executed, instructs a processor to perform operations comprising:
extract a plurality of sub-dialogues from a thread in a corpus from an online forum, wherein a sub-dialogue includes a plurality of comments and the thread includes at least one sub-dialogue;
obtain one or more sub-dialogue annotations associated with the sub-dialogue, wherein the one or more sub-dialogue annotations are associated with an annotation as to whether the sub-dialogue is constructive based upon one or more features of a first comment of the sub-dialogue and one or more features of a second comment of the sub-dialogue;
extract a plurality of features from the sub-dialogue, the plurality of features comprising a first set of sub-dialogue features and a second set of sub-dialogue features represented using a vector of values, wherein the extracting the plurality of features comprises:
determining comment features comprising a first comment feature associated with the first comment and a second comment feature associated with the second comment;
weighting the comment features to generate weighted comment features;
generating the first set of sub-dialogue features based upon the weighted comment features; and
generating the second set of sub-dialogue features based upon a window of chronologically ordered comments;
use the vector representing the plurality of features from the sub-dialogue and at least one sub-dialogue annotation associated with the sub-dialogue to train a classifier that determines whether a particular sub-dialogue is constructive;
obtain a new sub-dialogue from a thread currently displayed in the online forum and extract one or more features from the new sub-dialogue, the one or more features from the new sub-dialogue represented as a new vector;
input the new vector representing the one or more features extracted from the new sub-dialogue into the classifier and obtain a determination as to whether the new sub-dialogue is constructive, wherein the determination is performed using values of the one or more features of the new sub-dialogue included in the new vector; and
re-locate the new sub-dialogue in the thread based upon the determination.
11. The non-transitory computer-readable media of claim 10, wherein the one or more sub-dialogue annotations are obtained from a human annotator.
12. The non-transitory computer-readable media of claim 10, wherein the classifier uses logistic regression with L1 regularization.
13. The non-transitory computer-readable media of claim 10, wherein the classifier uses convolutional neural networks.
14. The non-transitory computer-readable media of claim 10, wherein the plurality of features are represented as sequential values in a vector.
15. The non-transitory computer-readable media of claim 10, wherein the first set of sub-dialogue features are different than the second set of sub-dialogue features.
16. The non-transitory computer-readable media of claim 10, wherein the sub-dialogue is sequentially modeled using conditional random fields.
17. The non-transitory computer-readable media of claim 16, wherein the window is a window of three sequential comments.
18. The non-transitory computer-readable media of claim 10, wherein the plurality of features include a measure of similarity to preceding comments in the sub-dialogue.
19. A method for performing a plurality of operations, comprising:
extracting a plurality of sub-dialogues from a thread in a corpus from an online forum, wherein a sub-dialogue includes a plurality of comments and the thread includes at least one sub-dialogue;
obtaining one or more sub-dialogue annotations associated with the sub-dialogue, wherein the one or more sub-dialogue annotations are associated with an annotation as to whether the sub-dialogue is constructive based upon one or more features of a first comment of the sub-dialogue and one or more features of a second comment of the sub-dialogue, wherein the one or more sub-dialogue annotations are received from an annotator;
extracting a plurality of features from the sub-dialogue, the plurality of features comprising a first set of sub-dialogue features and a second set of sub-dialogue features represented using a vector of values, wherein the extracting the plurality of features comprises:
determining comment features comprising a first comment feature associated with the first comment and a second comment feature associated with the second comment;
weighting the comment features to generate weighted comment features;
generating the first set of sub-dialogue features based upon the weighted comment features; and
generating the second set of sub-dialogue features based upon a window of chronologically ordered comments;
using the vector representing the plurality of features from the sub-dialogue and at least one sub-dialogue annotation associated with the sub-dialogue to train a classifier that determines whether a particular sub-dialogue is constructive;
obtaining a new sub-dialogue from a thread currently displayed in the online forum and extracting one or more features from the new sub-dialogue, the one or more features from the new sub-dialogue represented as a new vector;
inputting the new vector representing the one or more features extracted from the new sub-dialogue into the classifier and obtaining a determination as to whether the new sub-dialogue is constructive, wherein the sub-dialogue is sequentially modeled using conditional random fields and wherein the determination is performed using the values of the one or more features of the new sub-dialogue included in the new vector; and
re-locating the new sub-dialogue in the thread based upon the determination, wherein each operation of the plurality of operations is performed by one or more processors.
20. The method of claim 19, wherein the classifier uses at least one of logistic regression with L1 regularization or convolutional neural networks.

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/406,565 US10628737B2 (en) 2017-01-13 2017-01-13 Identifying constructive sub-dialogues

Publications (2)

Publication Number Publication Date
US20180203846A1 US20180203846A1 (en) 2018-07-19
US10628737B2 true US10628737B2 (en) 2020-04-21

Family

ID=62840936

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/406,565 Expired - Fee Related US10628737B2 (en) 2017-01-13 2017-01-13 Identifying constructive sub-dialogues

Country Status (1)

Country Link
US (1) US10628737B2 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10460748B2 (en) 2017-10-04 2019-10-29 The Toronto-Dominion Bank Conversational interface determining lexical personality score for response generation with synonym replacement
US11308110B2 (en) * 2019-08-15 2022-04-19 Rovi Guides, Inc. Systems and methods for pushing content
US11159458B1 (en) 2020-06-10 2021-10-26 Capital One Services, Llc Systems and methods for combining and summarizing emoji responses to generate a text reaction from the emoji responses

Patent Citations (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7594189B1 (en) * 2005-04-21 2009-09-22 Amazon Technologies, Inc. Systems and methods for statistically selecting content items to be used in a dynamically-generated display
US7962555B2 (en) * 2006-09-29 2011-06-14 International Business Machines Corporation Advanced discussion thread management using a tag-based categorization system
US7930302B2 (en) * 2006-11-22 2011-04-19 Intuit Inc. Method and system for analyzing user-generated content
US20100030798A1 (en) * 2007-01-23 2010-02-04 Clearwell Systems, Inc. Systems and Methods for Tagging Emails by Discussions
US9779094B2 (en) * 2008-07-29 2017-10-03 Veritas Technologies Llc Systems and methods for tagging emails by discussions
US20110202512A1 (en) * 2010-02-14 2011-08-18 Georges Pierre Pantanelli Method to obtain a better understanding and/or translation of texts by using semantic analysis and/or artificial intelligence and/or connotations and/or rating
US8386335B1 (en) * 2011-04-04 2013-02-26 Google Inc. Cross-referencing comments
US9866516B1 (en) * 2011-07-19 2018-01-09 Open Invention Network, Llc Method and apparatus of processing social networking-based user input information
US20130103623A1 (en) * 2011-10-21 2013-04-25 Educational Testing Service Computer-Implemented Systems and Methods for Detection of Sentiment in Writing
US20140093845A1 (en) * 2011-10-26 2014-04-03 Sk Telecom Co., Ltd. Example-based error detection system for automatic evaluation of writing, method for same, and error detection apparatus for same
US20130179766A1 (en) * 2012-01-05 2013-07-11 Educational Testing Service System and Method for Identifying Organizational Elements in Argumentative or Persuasive Discourse
US20130282362A1 (en) * 2012-03-28 2013-10-24 Lockheed Martin Corporation Identifying cultural background from text
US9104750B1 (en) * 2012-05-22 2015-08-11 Google Inc. Using concepts as contexts for query term substitutions
US20150186497A1 (en) * 2012-10-02 2015-07-02 Banjo, Inc. Dynamic event detection system and method
US9386107B1 (en) * 2013-03-06 2016-07-05 Blab, Inc. Analyzing distributed group discussions
US9674128B1 (en) * 2013-03-06 2017-06-06 Blab, Inc. Analyzing distributed group discussions
US9552399B1 (en) * 2013-03-08 2017-01-24 Blab, Inc. Displaying information about distributed group discussions
US9542669B1 (en) * 2013-03-14 2017-01-10 Blab, Inc. Encoding and using information about distributed group discussions
US20140344261A1 (en) * 2013-05-20 2014-11-20 Chacha Search, Inc Method and system for analyzing a request
US20140351257A1 (en) * 2013-05-22 2014-11-27 Matthew Zuzik Voting and expiring system to rank internet content
US20150032829A1 (en) * 2013-07-29 2015-01-29 Dropbox, Inc. Identifying relevant content in email
US9680782B2 (en) * 2013-07-29 2017-06-13 Dropbox, Inc. Identifying relevant content in email
US20150179168A1 (en) * 2013-12-20 2015-06-25 Microsoft Corporation Multi-user, Multi-domain Dialog System
US9665551B2 (en) * 2014-08-05 2017-05-30 Linkedin Corporation Leveraging annotation bias to improve annotations
US20170034107A1 (en) * 2015-07-29 2017-02-02 International Business Machines Corporation Annotating content with contextually relevant comments
US9923860B2 (en) * 2015-07-29 2018-03-20 International Business Machines Corporation Annotating content with contextually relevant comments
US20170084269A1 (en) * 2015-09-17 2017-03-23 Panasonic Intellectual Property Management Co., Ltd. Subject estimation system for estimating subject of dialog
US20170206271A1 (en) * 2016-01-20 2017-07-20 Facebook, Inc. Generating Answers to Questions Using Information Posted By Users on Online Social Networks
US9560152B1 (en) * 2016-01-27 2017-01-31 International Business Machines Corporation Personalized summary of online communications
US20170228361A1 (en) * 2016-02-10 2017-08-10 Yong Zhang Electronic message information retrieval system
US20170300862A1 (en) * 2016-04-14 2017-10-19 Linkedln Corporation Machine learning algorithm for classifying companies into industries
US20180032898A1 (en) * 2016-07-27 2018-02-01 Facebook, Inc. Systems and methods for comment sampling
US20180046614A1 (en) * 2016-08-09 2018-02-15 Panasonic Intellectual Property Management Co., Ltd. Dialogie act estimation method, dialogie act estimation apparatus, and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FitzGerald, Nicholas, et al. "Exploiting conversational features to detect high-quality blog comments." Canadian Conference on Artificial Intelligence. Springer, Berlin, Heidelberg, 2011. (Year: 2011). *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10810373B1 (en) * 2018-10-30 2020-10-20 Oath Inc. Systems and methods for unsupervised neologism normalization of electronic content using embedding space mapping
US11636266B2 (en) 2018-10-30 2023-04-25 Yahoo Assets Llc Systems and methods for unsupervised neologism normalization of electronic content using embedding space mapping

Also Published As

Publication number Publication date
US20180203846A1 (en) 2018-07-19

Legal Events

Date Code Title Description
AS Assignment

Owner name: YAHOO! INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:COHEN, COURTNEY NAPOLES;PAPPU, AASISH;TETREAULT, JOEL;SIGNING DATES FROM 20170112 TO 20170113;REEL/FRAME:040995/0212

AS Assignment

Owner name: YAHOO HOLDINGS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:YAHOO! INC.;REEL/FRAME:042963/0211

Effective date: 20170613

AS Assignment

Owner name: OATH INC., NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:YAHOO HOLDINGS, INC.;REEL/FRAME:045240/0310

Effective date: 20171231

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: VERIZON MEDIA INC., NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:OATH INC.;REEL/FRAME:054258/0635

Effective date: 20201005

AS Assignment

Owner name: VERIZON PATENT AND LICENSING INC., NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:VERIZON MEDIA INC.;REEL/FRAME:057453/0431

Effective date: 20210801

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20240421