US10628737B2 - Identifying constructive sub-dialogues - Google Patents
- Publication number
- US10628737B2 (application US15/406,565)
- Authority
- US
- United States
- Prior art keywords
- dialogue
- sub
- features
- new
- comment
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G06N3/08: Learning methods (G Physics > G06 Computing; calculating or counting > G06N Computing arrangements based on specific computational models > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks)
- G06F17/241
- G06F17/279
- G06F40/169: Annotation, e.g. comment data or footnotes (G06F Electric digital data processing > G06F40/00 Handling natural language data > G06F40/10 Text processing > G06F40/166 Editing, e.g. inserting or deleting)
- G06F40/35: Discourse or dialogue representation (G06F40/00 Handling natural language data > G06F40/30 Semantic analysis)
- G06N3/044: Recurrent networks, e.g. Hopfield networks (G06N3/02 Neural networks > G06N3/04 Architecture, e.g. interconnection topology)
- G06N3/0445
- G06N7/005
- G06N7/01: Probabilistic graphical models, e.g. probabilistic networks (G06N7/00 Computing arrangements based on specific mathematical models)
- H04L67/02: Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] (H Electricity > H04 Electric communication technique > H04L Transmission of digital information, e.g. telegraphic communication > H04L67/00 Network arrangements or protocols for supporting network services or applications > H04L67/01 Protocols)
- H04L67/10: Protocols in which an application is distributed across nodes in the network (H04L67/01 Protocols)
- H04L67/2833
- H04L67/566: Grouping or aggregating service requests, e.g. for unified processing (H04L67/50 Network services > H04L67/56 Provisioning of proxy services)
Definitions
- Online forums such as reddit and comment sections for online articles allow users to converse with one another through the posting of comments in comment threads.
- In an example embodiment, a processor-executed method is described.
- According to the method, software on a website hosting an online forum extracts a plurality of sub-dialogues from each thread in a corpus from the online forum.
- Each sub-dialogue includes a plurality of comments, and each thread includes at least one sub-dialogue.
- The software obtains one or more sub-dialogue annotations associated with each sub-dialogue.
- The one or more sub-dialogue annotations include an annotation as to whether the sub-dialogue is constructive.
- The software extracts a plurality of features from each sub-dialogue and uses those features, together with the sub-dialogue annotations associated with the sub-dialogue, to train a classifier that determines whether a particular sub-dialogue is constructive.
- The software obtains a new sub-dialogue from a thread currently displayed in the online forum and extracts the plurality of features from the new sub-dialogue.
- The software inputs the features extracted from the new sub-dialogue into the classifier and obtains a determination as to whether the new sub-dialogue is constructive. Then the software uses the determination to re-locate the new sub-dialogue in the thread.
- In another example embodiment, an apparatus is described, namely, computer-readable media that persistently store a program for a website hosting an online forum.
- The program extracts a plurality of sub-dialogues from each thread in a corpus from the online forum.
- Each sub-dialogue includes a plurality of comments, and each thread includes at least one sub-dialogue.
- The program obtains one or more sub-dialogue annotations associated with each sub-dialogue.
- The one or more sub-dialogue annotations include an annotation as to whether the sub-dialogue is constructive.
- The program extracts a plurality of features from each sub-dialogue and uses those features, together with the sub-dialogue annotations associated with the sub-dialogue, to train a classifier that determines whether a particular sub-dialogue is constructive.
- The program obtains a new sub-dialogue from a thread currently displayed in the online forum and extracts the plurality of features from the new sub-dialogue.
- The program inputs the features extracted from the new sub-dialogue into the classifier and obtains a determination as to whether the new sub-dialogue is constructive. Then the program uses the determination to re-locate the new sub-dialogue in the thread.
- Another example embodiment also involves a processor-executed method.
- According to the method, software on a website hosting an online forum extracts a plurality of sub-dialogues from each thread in a corpus from the online forum.
- Each sub-dialogue includes a plurality of comments, and each thread includes at least one sub-dialogue.
- The software obtains one or more sub-dialogue annotations associated with each sub-dialogue.
- The sub-dialogue annotations are obtained from a human annotator.
- The one or more sub-dialogue annotations include an annotation as to whether the sub-dialogue is constructive.
- The software extracts a plurality of features from each sub-dialogue and uses those features, together with the sub-dialogue annotations associated with the sub-dialogue, to train a classifier that determines whether a particular sub-dialogue is constructive. Each sub-dialogue is sequentially modeled using conditional random fields.
- The software then obtains a new sub-dialogue from a thread currently displayed in the online forum and extracts the plurality of features from the new sub-dialogue. The software inputs the features extracted from the new sub-dialogue into the classifier and obtains a determination as to whether the new sub-dialogue is constructive. Then the software uses the determination to re-locate the new sub-dialogue in the thread.
- FIG. 1 is a network diagram showing a website hosting a content-aggregation service and a website hosting an online forum, in accordance with an example embodiment.
- FIG. 2 is a diagram of a pipeline of software modules for determining the constructiveness of a sub-dialogue, in accordance with an example embodiment.
- FIG. 3 is a flowchart diagram of a process for determining the constructiveness of a sub-dialogue, in accordance with an example embodiment.
- FIG. 4 shows the comments in a thread in an online forum, in accordance with an example embodiment.
- FIG. 5 shows a table listing specified sub-dialogue annotations and specified comment annotations, in accordance with an example embodiment.
- FIG. 6A shows a co-occurrence graph of sub-dialogue annotations with other sub-dialogue annotations, in accordance with an example embodiment.
- FIG. 6B shows a co-occurrence graph of sub-dialogue annotations with comment annotations, in accordance with an example embodiment.
- FIG. 7 shows a table listing precision scores for particular features, in accordance with an example embodiment.
- Terms such as “a,” “an,” or “the” may be understood to convey a singular usage or a plural usage, depending at least in part upon context.
- The term “based on” may be understood as not necessarily intended to convey an exclusive set of factors and may, instead, allow for the existence of additional factors not necessarily expressly described, again depending at least in part on context.
- Constructive conversations do not require “conclusions.” A conversation or argument does not have to have a winner or conclusion to be constructive, as long as there is a clear exchange of ideas, opinions, and information done somewhat respectfully. A constructive conversation should contain one or multiple points of agreement and/or disagreement, all mostly on topic, and be relatively respectful. Comments should contain new information (be informative) and/or attempt to persuade. Comments may also seek to contribute humor, sarcasm, or even meanness if in the context of a passionate attempt at persuasiveness. How much “meanness” degrades the constructiveness is subjective—some people are more tolerant than others of fearful language when heated arguments occur.
- Non-constructive conversations are those that are largely unproductive. Usually, the initial commenter's point is not properly addressed (i.e., the conversation lacks a clear communicative goal or is disconnected), the conversation contains few attempts at persuasion, and each speech act can be taken in isolation. A sub-dialogue can also be deemed non-constructive if it is largely negative (i.e., an exchange of insults) or “all over the place” in terms of topic.
- FIG. 1 is a network diagram showing a website hosting a content-aggregation service and a website hosting an online forum, in accordance with an example embodiment.
- The network diagram includes a personal computer 102 (e.g., a laptop or other mobile computer), a mobile device 103 (e.g., a smartphone such as an iPhone, Android, or Windows Phone, or a tablet computer such as an iPad or Galaxy), and a network 101 (e.g., a wide area network (WAN) including the Internet, which might be wireless in part or in whole).
- Websites hosting a content-aggregation service, including websites hosting a social-networking service, often display content to a user using graphical user interface (GUI) functionality called a “content stream.”
- Such websites determine inclusion or prominence of an item (e.g., an article) in the content stream based at least in part on a personalized user-interest profile which records the user's explicit and implicit relevance feedback as to previous items of content presented in the content stream.
- Explicit relevance feedback might take the form of user input to a GUI dialog inquiring about the user's interests.
- Implicit relevance feedback might include the viewing/listening history of the user, e.g., click-throughs and/or other measures of time spent (e.g., time spent viewing, time spent listening, time spent playing, etc.) by the user on categorized content.
- In an example embodiment, website 104 might be a website such as Yahoo! News or Google News, which ingests content from the Internet, including articles (or Uniform Resource Locators (URLs) for articles), through “push” technology (e.g., a subscription to a web feed such as an RSS feed) and/or “pull” technology (e.g., web crawling).
- Alternatively, website 104 might host an online social network such as Facebook or Twitter.
- The term “online social network” is to be broadly interpreted to include, for example, any online service, including a social-media service, that allows its users to, among other things: (a) selectively access (e.g., according to a friend list, contact list, buddy list, social graph, interest graph, or other control list) content (e.g., text including articles and web links, images, videos, animations, audio recordings, games and other software, etc.) associated with each other's profiles (e.g., Facebook walls, Flickr photo albums, Pinterest boards, etc.); (b) selectively (e.g., according to a friend list, contact list, buddy list, social graph, interest graph, distribution list, or other control list) broadcast content (e.g., text including articles and/or comments, web links, images, videos, animations, audio recordings, games and other software, etc.) to each other's newsfeeds (e.g.
- The term “content-aggregation service” is to be broadly interpreted to include any online service, including a social-media service, that allows its users to, among other things, access and/or annotate (e.g., comment on) content (e.g., text including articles and/or comments, web links, images, videos, animations, audio recordings, games and other software, etc.) aggregated/ingested by the online service (e.g., using its own curators and/or its own algorithms) and/or posted by its users and presented in a “wall” view or “stream” view.
- In an example embodiment, a website hosting a content-aggregation service might have social features based on a friend list, contact list, buddy list, social graph, interest graph, distribution list, or other control list that is accessed over the network from a separate website hosting an online social network through an application programming interface (API) exposed by the separate website.
- Yahoo! News might identify the content items (e.g., articles) in its newsfeed (e.g., as displayed on the front page of Yahoo! News) that have been viewed/read by a user's friends, as listed on a Facebook friend list that the user has authorized Yahoo! News to access.
- Websites 104 and 106 might be composed of a number of servers (e.g., racked servers) connected by a network (e.g., a local area network (LAN) or a WAN) to each other in a cluster (e.g., a load-balancing cluster, a Beowulf cluster, a Hadoop cluster, etc.) or other distributed system which might run website software (e.g., web-server software, database software, search-engine software, etc.), and distributed-computing and/or cloud software such as Map-Reduce, Google File System, Hadoop, Hadoop File System, Hadoop YARN, Pig, Hive, Dremel, CloudBase, etc.
- The servers in website 104 might be connected to persistent storage 105, and the servers in website 106 might be connected to persistent storage 107.
- Persistent storages 105 and 107 might include flash memory, a redundant array of independent disks (RAID), and/or a storage area network (SAN), in an example embodiment.
- The servers for websites 104 and 106 and/or the persistent storage in persistent storages 105 and 107 might be hosted wholly or partially in a public and/or private cloud, e.g., where the cloud resources serve as a platform-as-a-service (PaaS) or an infrastructure-as-a-service (IaaS).
- Persistent storages 105 and 107 might be used to store content (e.g., text including articles and/or comments, web links, images, videos, animations, audio recordings, games and other software, etc.) and/or its related data. Additionally, persistent storage 105 might be used to store data related to users and their social contacts (e.g., Facebook friends), as well as software including algorithms and other processes, as described in detail below, for re-locating comments in a thread of comments on an article in a content stream. In an example embodiment, the content stream might be ordered from top to bottom (a) in reverse chronology (e.g., latest in time on top), or (b) according to interestingness scores, including the rankings discussed below.
- Some of the content (and/or its related data) might be stored in persistent storages 105 and 107 and might have been received from a content delivery or distribution network (CDN), e.g., Akamai Technologies. Alternatively, some of the content (and/or its related data) might be delivered directly from the CDN to the personal computer 102 or the mobile device 103, without being stored in persistent storages 105 and 107.
- Personal computer 102 and the servers at websites 104 and 106 might include (1) hardware consisting of one or more microprocessors (e.g., from the x86 family, the ARM family, or the PowerPC family), volatile storage (e.g., RAM), and persistent storage (e.g., flash memory, a hard disk, or a solid-state drive), and (2) an operating system (e.g., Windows, Mac OS, Linux, Windows Server, Mac OS Server, etc.) that runs on the hardware.
- Mobile device 103 might include (1) hardware consisting of one or more microprocessors (e.g., from the ARM family or the x86 family), volatile storage (e.g., RAM), and persistent storage (e.g., flash memory such as microSD), (2) an operating system (e.g., iOS, webOS, Windows Mobile, Android, Linux, Symbian OS, RIM BlackBerry OS, etc.) that runs on the hardware, and (3) one or more accelerometers, one or more gyroscopes, and a global positioning system (GPS) or other location-identifying type capability.
- Personal computer 102 and mobile device 103 might each include a browser as an application program or as part of an operating system. Examples of browsers that might execute on personal computer 102 include Internet Explorer, Mozilla Firefox, Safari, and Google Chrome. Examples of browsers that might execute on mobile device 103 include Safari, Mozilla Firefox, Android Browser, and webOS Browser. It will be appreciated that users of personal computer 102 and/or mobile device 103 might use browsers to access content presented by websites 104 and 106. Alternatively, users of personal computer 102 and/or mobile device 103 might use other application programs (or apps, including hybrid apps that display HTML content) to access content presented by websites 104 and 106.
- FIG. 2 is a diagram of a pipeline of software modules for determining the constructiveness of a sub-dialogue, in accordance with an example embodiment.
- As shown in FIG. 2, pipeline 201 includes three software modules, identified as modules 202-204, which might run on the servers at website 104.
- In an example embodiment, module 202 extracts sub-dialogues from the comment sections of articles displayed (e.g., in a content stream) on Yahoo! News.
- A “sub-dialogue” consists of two or more comments in a thread of comments, e.g., in chronological order from the top (earliest) to the bottom (latest).
- The sub-dialogues are (1) provided to human annotators, who label the sub-dialogues as constructive or non-constructive and who annotate the sub-dialogues and comments, e.g., using the specified annotations shown in FIG. 5, as described in further detail below; and (2) turned into representations (e.g., vectors) whose sequential values are specified features, which are also described in further detail below.
- The labeled and annotated sub-dialogues and their corresponding representations might be used by module 203 to train binary classifier 204, which in an example embodiment might use logistic regression with L1 regularization, as described in Lee et al., “Efficient L1 Regularized Logistic Regression” (American Association for Artificial Intelligence, 2006), which is incorporated herein by reference. Once trained, binary classifier 204 might be used to determine whether unseen sub-dialogues are constructive or not constructive, as shown in the figure.
- FIG. 3 is a flowchart diagram of a process for determining the constructiveness of a sub-dialogue, in accordance with an example embodiment.
- The operations shown in this figure might be performed by software running on servers at website 104 (e.g., Yahoo! News, Google News, Facebook, Twitter, etc.) using persistent storage 105, or on servers at website 106 (e.g., an online forum such as reddit) using persistent storage 107.
- Alternatively, some of the operations shown in this figure might be performed by software (e.g., a client application including, for example, a webpage with embedded JavaScript or ActionScript) running on a client device (e.g., personal computer 102 or mobile device 103). It will be appreciated that these operations provide specifics for the general operations depicted in FIG. 2.
- In operation 301, the software extracts the sub-dialogues from each thread in a corpus from an online forum (e.g., the comment sections for articles on Yahoo! News), where each sub-dialogue consists of a series (e.g., two or more) of comments.
- The software obtains specified sub-dialogue annotations (e.g., the sub-dialogue annotations listed in FIG. 5) for each sub-dialogue and specified comment annotations (e.g., the comment annotations listed in FIG. 5).
- The specified sub-dialogue annotations include an annotation as to whether the sub-dialogue is constructive.
- Alternatively, the software might obtain only the constructiveness annotation from the human annotator (e.g., a trained annotator or a worker from Amazon Mechanical Turk).
- The software verifies the constructiveness annotation for each sub-dialogue, using the other sub-dialogue annotations for the sub-dialogue and the associated specified comment annotations.
- The software extracts specified features from each sub-dialogue, where the specified features, as described in detail below, are represented as sequential values in a vector or other sequential representation (e.g., a struct or record).
- The software uses the specified features for each sub-dialogue and the specified sub-dialogue annotations associated with the sub-dialogue (e.g., constructiveness) to train a binary classifier (e.g., logistic regression with L1 regularization) that determines whether a particular sub-dialogue is constructive.
- The software might also use the associated specified comment annotations for the sub-dialogue to train the binary classifier.
- The software obtains a new sub-dialogue from a thread currently displayed in the online forum (e.g., the comment sections for articles on Yahoo! News).
- In operation 307, the software inputs the specified features extracted from the new sub-dialogue into the trained binary classifier to obtain a determination as to whether the new sub-dialogue is constructive. Then, in operation 308, the software uses the determination to re-locate the new sub-dialogue in the displayed thread. For example, if a sub-dialogue is determined to be constructive, it might be moved toward the top of the thread. If a sub-dialogue is determined to be non-constructive, it might be moved toward the bottom of the thread.
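The re-location in operation 308 can be pictured as a stable partition of the displayed thread. The following is a minimal sketch under stated assumptions, not the patented implementation: it assumes the thread has already been split into segments (classified sub-dialogues, plus any leftover comments treated as not constructive), and the `SubDialogue`, `relocate`, and `flatten` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubDialogue:
    """Hypothetical record: comment IDs in chronological order plus the classifier's verdict."""
    comment_ids: List[str]
    constructive: bool


def relocate(segments: List[SubDialogue]) -> List[SubDialogue]:
    """Move constructive segments toward the top and non-constructive ones toward the bottom.

    sorted() is stable, so segments with the same verdict keep their original
    (chronological) order relative to one another.
    """
    # The key is False (sorts first) for constructive segments.
    return sorted(segments, key=lambda seg: not seg.constructive)


def flatten(segments: List[SubDialogue]) -> List[str]:
    """Render the re-ordered thread as a flat list of comment IDs."""
    return [cid for seg in segments for cid in seg.comment_ids]


if __name__ == "__main__":
    # FIG. 4 example: comment 403 on its own, then the constructive sub-dialogue 404-406.
    thread = [
        SubDialogue(["403"], constructive=False),
        SubDialogue(["404", "405", "406"], constructive=True),
    ]
    print(flatten(relocate(thread)))  # ['404', '405', '406', '403']
```

A stable sort keeps chronological order within each group; a deployed system might instead rank segments by the classifier's probability rather than a binary verdict.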
- In an example embodiment, the binary classifier might use logistic regression with L1 regularization.
- An off-the-shelf (OTS) version of such a binary classifier is included in scikit-learn.
- Alternatively, the binary classifier might use convolutional neural networks.
- An OTS version of such a binary classifier is included in TensorFlow.
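To illustrate L1-regularized logistic regression, the sketch below trains a classifier with proximal gradient descent (ISTA): an ordinary gradient step on the mean log-loss followed by soft-thresholding, which drives the weights of uninformative features to exactly zero. This solver is swapped in for brevity and is not the algorithm of Lee et al.; the function names and the default `lam`, `lr`, and `epochs` values are assumptions. In practice, scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` is an off-the-shelf alternative.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w: float, t: float) -> float:
    # Proximal operator of t * |w|: shrink toward zero, clamp to 0 when |w| <= t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def train_l1_logreg(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Minimize mean log-loss + lam * ||w||_1 with ISTA; the bias b is not penalized."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # The log-loss gradient with respect to the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth loss, then the L1 proximal (shrinkage) step.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_constructive(w: List[float], b: float, x: Sequence[float]) -> bool:
    # Label a sub-dialogue constructive when P(constructive | x) >= 0.5.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5
```

Larger values of `lam` zero out more weights, which is what makes an L1 model useful for seeing which feature groups matter.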
- Sub-dialogues might be represented using features.
- In an example embodiment, features might be calculated for each comment and concatenated together to form a sub-dialogue representation, so that each comment has its own feature space and/or comment features are weighted equally.
- Alternatively, features might be calculated for a sub-dialogue “as a whole.”
- A window might also be used.
- For example, a sub-dialogue with a window of 3 might include a particular comment, the comment prior (e.g., chronologically) to the particular comment, and the comment following (e.g., chronologically) the particular comment.
- Other window sizes might be used, e.g., 5, 7, 9, etc.
- In a “brute force” approach, every window size that is compatible with a particular thread, including the thread itself, might be used.
- The feature values for a sub-dialogue might be weighted, e.g., to reflect decay (or staleness).
- A sub-dialogue might be sequentially modeled using conditional random fields (CRF) or recurrent neural networks.
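The windowing and weighting options above can be sketched as follows. This assumes comments are indexed in chronological order and that a window is centred on a comment and clipped at the ends of the thread. Reading the “brute force” approach as every contiguous run of comments is one interpretation, and the exponential half-life decay (older comments weigh less) is a hypothetical choice, since the text does not specify a decay function; all function names are illustrative.

```python
from typing import List, Sequence


def window_indices(n_comments: int, size: int = 3) -> List[List[int]]:
    """For each comment i, return the indices of a window of `size` comments
    centred on i (size 3 -> previous, current, next), clipped at the thread's ends."""
    if size < 1 or size % 2 == 0:
        raise ValueError("window size must be a positive odd number")
    half = size // 2
    return [list(range(max(0, i - half), min(n_comments, i + half + 1)))
            for i in range(n_comments)]


def all_spans(n_comments: int, min_len: int = 2) -> List[List[int]]:
    """Brute-force reading: every contiguous run of at least `min_len` comments,
    up to and including the whole thread."""
    return [list(range(start, start + length))
            for length in range(min_len, n_comments + 1)
            for start in range(n_comments - length + 1)]


def concatenate(per_comment: Sequence[Sequence[float]]) -> List[float]:
    """Concatenation option: each comment keeps its own slice of the feature space."""
    return [value for vec in per_comment for value in vec]


def decayed_average(per_comment: Sequence[Sequence[float]], half_life: float = 2.0) -> List[float]:
    """Whole-sub-dialogue option with decay: a weighted mean of per-comment vectors,
    where a comment k positions before the latest one gets weight 0.5 ** (k / half_life)."""
    n = len(per_comment)
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    total = sum(weights)
    return [sum(w * vec[j] for w, vec in zip(weights, per_comment)) / total
            for j in range(len(per_comment[0]))]
```

For a five-comment thread, `window_indices(5, 3)` yields `[[0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4]]`; the edge windows are shorter because they are clipped rather than padded.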
- FIG. 4 shows the comments in a thread in an online forum, in accordance with an example embodiment.
- In this example, the online forum is reddit, but other online forums and comment sections could be substituted without loss of generality.
- Thread 401 was started by an initial post 402 describing “Loral hops.” Following the initial post, 36 comments were posted, including comments 403-406.
- Comment 403 appears to be directed to the initial post 402 and describes the purchase of a half pound of Loral hops.
- Comment 404 poses a question as to whether Loral hops would work as a single hop addition for an altbier.
- Comments 405 and 406 respond positively to the question.
- Software performing the operations described in FIG. 3 might determine that the sub-dialogue consisting of comments 404-406 is constructive and relocate those comments above comment 403.
- FIG. 5 shows a table listing specified sub-dialogue annotations and specified comment annotations, in accordance with an example embodiment.
- These annotations might be added by human annotators (e.g., a trained annotator or a worker from Amazon Mechanical Turk) and used to train a binary classifier as described above with respect to FIGS. 2 and 3.
- The specified sub-dialogue annotations listed at the top of Table 1 in FIG. 5 include Constructiveness (whose enumerated values are “constructive” and “not constructive”), as well as groupings under Type and Agreement.
- Alternatively, the sub-dialogue annotations might consist solely of Constructiveness.
- Persuasiveness can take an enumerated value of “persuasive” or “not persuasive.”
- FIG. 6A shows a co-occurrence graph of sub-dialogue annotations with other sub-dialogue annotations, in accordance with an example embodiment.
- Co-occurrence graph 601 might be used to verify the sub-dialogue annotations, including the sub-dialogue annotation as to constructiveness, added by human annotators (e.g., a trained annotator or a worker from Amazon Mechanical Turk) and used to train a binary classifier as described above with respect to FIGS. 2 and 3.
- In co-occurrence graph 601, “constructive” and “not constructive” can each be found on either the left or the top axis.
- FIG. 6B shows a co-occurrence graph of sub-dialogue annotations with comment annotations, in accordance with an example embodiment.
- Co-occurrence graph 602 might be used to verify the sub-dialogue annotations, including the sub-dialogue annotation as to constructiveness, added by human annotators (e.g., a trained annotator or a worker from Amazon Mechanical Turk) and used to train a binary classifier as described above with respect to FIGS. 2 and 3.
- In co-occurrence graph 602, the constructiveness values (“constructive” and “not constructive”) appear on the top axis.
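Co-occurrence graphs such as 601 and 602 could be built from simple pair counts. In this sketch, each sub-dialogue's annotations are a set of label strings; apart from “constructive” and “not constructive,” any label names used with it are placeholders, and how the counts are then used to verify annotations is left open, as in the text. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set, Tuple


def cooccurrence(annotation_sets: Iterable[Set[str]]) -> Counter:
    """Sub-dialogue vs sub-dialogue annotations (cf. FIG. 6A): count how often two
    labels are assigned to the same sub-dialogue. Pairs are stored in sorted order."""
    counts: Counter = Counter()
    for labels in annotation_sets:
        for a, b in combinations(sorted(labels), 2):
            counts[(a, b)] += 1
    return counts


def cross_cooccurrence(
    items: Iterable[Tuple[Set[str], Iterable[Set[str]]]]
) -> Counter:
    """Sub-dialogue vs comment annotations (cf. FIG. 6B): for each sub-dialogue, count
    every (sub-dialogue label, comment label) pair over the comments it contains."""
    counts: Counter = Counter()
    for sd_labels, comment_label_sets in items:
        for c_labels in comment_label_sets:
            for s in sd_labels:
                for c in c_labels:
                    counts[(s, c)] += 1
    return counts
```

Normalizing each row of these counts (for example, by the number of sub-dialogues carrying the row label) would turn them into the relative frequencies a co-occurrence graph typically displays.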
- As noted above, sub-dialogues might be represented using features, which in turn might be represented using sequential values in a vector, struct, record, etc.
- The features used to represent sub-dialogues might include features from one or more of the following feature groups:
- Comment Features describing the length and popularity of a comment: the number of sentences and average token-length of sentences and character-length of tokens, the number of thumbs up, thumbs down, and thumbs up and down received.
- Word embeddings from Word2Vec see Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean, Distributed representations of words and phrases and their compositionality (2013), in Advances in neural information processing systems, pages 3111-3119, which is incorporated herein by reference).
- Entity Counts of the named entities by type and average person name length.
- Influence The total number of comments made and sub-dialogues participated in by the commenter over the course of two consecutive months; the total number of thumbs up, thumbs down, and both thumbs up and down received, and the percent of thumbs received; the total active time of the user during the period; and the activity rate (number of comments/time active).
- Lexicon features Counts of phrases from different lexicons that appear in the comment.
- the lexicons are pronouns, expressions conveying certainty, hedges, comparisons, contingencies, expansions, hate words, and opinions. There are also binary features indicating if there are agreement or disagreement phrases in the comment.
- Subjectivity Normalized count of hedge words, pronouns, and passive constructions; and the subjectivity and polarity scores estimated using TextBlob.
- Temporal Maximum, minimum, and mean difference between comments and the total elapsed time between the first and last comment.
- Thread Features describing the thread structure and popularity. Structure features are the number of comments, commenters, and the average number of comments per person. Popularity features are the counts of thumbs up and thumbs down in a thread as well the percent of thumbs up out of total number of thumbs up or down.
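To make the idea of a sub-dialogue feature vector concrete, the Python sketch below computes three of the groups listed above (the length part of the Comment group, Temporal, and Thread) and concatenates them into a flat list of floats. The `Comment` record, the regex sentence splitter, whitespace tokenization, averaging per-comment features over the sub-dialogue, and reading "difference between comments" as gaps between consecutive comments are all assumptions for illustration; the description does not specify tokenization, aggregation, or feature order.

```python
import re
from dataclasses import dataclass
from statistics import mean


@dataclass
class Comment:
    # Hypothetical comment record; field names are illustrative.
    author: str
    text: str
    timestamp: float  # assumed to be seconds
    thumbs_up: int = 0
    thumbs_down: int = 0


def comment_length_features(text: str) -> list[float]:
    """Sentence count, mean tokens per sentence, mean characters per token."""
    # Naive splitting on terminal punctuation and whitespace (an assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = text.split()
    avg_tokens = mean(len(s.split()) for s in sentences) if sentences else 0.0
    avg_chars = mean(len(t) for t in tokens) if tokens else 0.0
    return [float(len(sentences)), float(avg_tokens), float(avg_chars)]


def temporal_features(comments: list[Comment]) -> list[float]:
    """Max, min, and mean gap between consecutive comments (assumed reading), plus elapsed time."""
    times = sorted(c.timestamp for c in comments)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])] or [0.0]
    return [float(max(gaps)), float(min(gaps)), float(mean(gaps)),
            float(times[-1] - times[0])]


def thread_features(thread: list[Comment]) -> list[float]:
    """Structure (comments, commenters, comments per person) and popularity."""
    n_comments = len(thread)
    n_commenters = len({c.author for c in thread})
    up = sum(c.thumbs_up for c in thread)
    down = sum(c.thumbs_down for c in thread)
    pct_up = up / (up + down) if up + down else 0.0
    return [float(n_comments), float(n_commenters), n_comments / n_commenters,
            float(up), float(down), pct_up]


def feature_vector(sub_dialogue: list[Comment], thread: list[Comment]) -> list[float]:
    """Concatenate averaged comment-length, temporal, and thread features."""
    if not sub_dialogue or not thread:
        raise ValueError("sub_dialogue and thread must be non-empty")
    per_comment = [comment_length_features(c.text) for c in sub_dialogue]
    # Average each per-comment feature column over the sub-dialogue.
    averaged = [float(mean(column)) for column in zip(*per_comment)]
    return averaged + temporal_features(sub_dialogue) + thread_features(thread)
```

Keeping each group in its own function mirrors the grouping above and makes it easy to train on one group at a time.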
- FIG. 7 shows a table listing precision scores for particular features, in accordance with an example embodiment.
- An L1-regularized logistic regression classifier was trained on each feature group in isolation. As indicated in Table 3 in FIG. 7, the following features gave high precision when determining constructiveness: the counts of named entities, the counts of thumbs up and thumbs down in a thread, the comment length, lower- and upper-case characteristics, and the formality score.
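The per-group evaluation described above (train on one feature group at a time, then compare precision) can be sketched as follows. To stay dependency-free, this sketch fits the L1-regularized logistic regression with plain proximal gradient descent (ISTA) in standard-library Python; a practical system would more likely use a library implementation such as scikit-learn's `LogisticRegression` with an L1 penalty. The learning rate, regularization strength, and epoch count are arbitrary choices, the function names are hypothetical, and nothing here reproduces Table 3.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w: float, t: float) -> float:
    """Proximal operator of t * |w|: shrink w toward zero by t, clipping at zero."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def train_l1_logreg(X, y, lam=0.01, lr=0.1, epochs=500):
    """Minimize mean log-loss + lam * ||w||_1 by proximal gradient descent (ISTA).

    The bias b is left unregularized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth log-loss, then the L1 proximal (shrinkage) step.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, X):
    """Label 1 when the predicted probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


def precision(y_true, y_pred):
    """Fraction of predicted positives that are true positives (0.0 if none predicted)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    predicted_pos = sum(y_pred)
    return tp / predicted_pos if predicted_pos else 0.0


def precision_by_group(groups, y_train, y_test):
    """Train one classifier per feature group in isolation and report its precision.

    groups maps a group name to (X_train, X_test), each built from that group's features only.
    """
    scores = {}
    for name, (X_train, X_test) in groups.items():
        w, b = train_l1_logreg(X_train, y_train)
        scores[name] = precision(y_test, predict(w, b, X_test))
    return scores
```

The L1 penalty drives uninformative weights to exactly zero, which is one reason it suits comparing feature groups: a group whose weights all shrink to zero contributes nothing to the prediction.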
- The inventions might employ various computer-implemented operations involving data stored in computer systems. Any of the operations described herein that form part of the inventions are useful machine operations.
- The inventions also relate to a device or an apparatus for performing these operations.
- The apparatus may be specially constructed for the required purposes, such as the carrier network discussed above, or it may be a general-purpose computer selectively activated or configured by a computer program stored in the computer.
- Various general-purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
- The inventions can also be embodied as computer-readable code on a computer-readable medium.
- The computer-readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of computer-readable media include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, DVDs, Flash, magnetic tapes, and other optical and non-optical data storage devices.
- The computer-readable medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Evolutionary Computation (AREA)
- Signal Processing (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Pure & Applied Mathematics (AREA)
- Computational Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Algebra (AREA)
- Probability & Statistics with Applications (AREA)
- Databases & Information Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Transfer Between Computers (AREA)
Abstract
Description
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/406,565 US10628737B2 (en) | 2017-01-13 | 2017-01-13 | Identifying constructive sub-dialogues |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/406,565 US10628737B2 (en) | 2017-01-13 | 2017-01-13 | Identifying constructive sub-dialogues |
Publications (2)
Publication Number | Publication Date |
---|---|
US20180203846A1 (en) | 2018-07-19 |
US10628737B2 (en) | 2020-04-21 |
Family ID: 62840936
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/406,565 Expired - Fee Related US10628737B2 (en) | 2017-01-13 | 2017-01-13 | Identifying constructive sub-dialogues |
Country Status (1)
Country | Link |
---|---|
US (1) | US10628737B2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10810373B1 (en) * | 2018-10-30 | 2020-10-20 | Oath Inc. | Systems and methods for unsupervised neologism normalization of electronic content using embedding space mapping |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10460748B2 (en) | 2017-10-04 | 2019-10-29 | The Toronto-Dominion Bank | Conversational interface determining lexical personality score for response generation with synonym replacement |
US11308110B2 (en) * | 2019-08-15 | 2022-04-19 | Rovi Guides, Inc. | Systems and methods for pushing content |
US11159458B1 (en) | 2020-06-10 | 2021-10-26 | Capital One Services, Llc | Systems and methods for combining and summarizing emoji responses to generate a text reaction from the emoji responses |
Citations (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7594189B1 (en) * | 2005-04-21 | 2009-09-22 | Amazon Technologies, Inc. | Systems and methods for statistically selecting content items to be used in a dynamically-generated display |
US20100030798A1 (en) * | 2007-01-23 | 2010-02-04 | Clearwell Systems, Inc. | Systems and Methods for Tagging Emails by Discussions |
US7930302B2 (en) * | 2006-11-22 | 2011-04-19 | Intuit Inc. | Method and system for analyzing user-generated content |
US7962555B2 (en) * | 2006-09-29 | 2011-06-14 | International Business Machines Corporation | Advanced discussion thread management using a tag-based categorization system |
US20110202512A1 (en) * | 2010-02-14 | 2011-08-18 | Georges Pierre Pantanelli | Method to obtain a better understanding and/or translation of texts by using semantic analysis and/or artificial intelligence and/or connotations and/or rating |
US8386335B1 (en) * | 2011-04-04 | 2013-02-26 | Google Inc. | Cross-referencing comments |
US20130103623A1 (en) * | 2011-10-21 | 2013-04-25 | Educational Testing Service | Computer-Implemented Systems and Methods for Detection of Sentiment in Writing |
US20130179766A1 (en) * | 2012-01-05 | 2013-07-11 | Educational Testing Service | System and Method for Identifying Organizational Elements in Argumentative or Persuasive Discourse |
US20130282362A1 (en) * | 2012-03-28 | 2013-10-24 | Lockheed Martin Corporation | Identifying cultural background from text |
US20140093845A1 (en) * | 2011-10-26 | 2014-04-03 | Sk Telecom Co., Ltd. | Example-based error detection system for automatic evaluation of writing, method for same, and error detection apparatus for same |
US20140344261A1 (en) * | 2013-05-20 | 2014-11-20 | Chacha Search, Inc | Method and system for analyzing a request |
US20140351257A1 (en) * | 2013-05-22 | 2014-11-27 | Matthew Zuzik | Voting and expiring system to rank internet content |
US20150032829A1 (en) * | 2013-07-29 | 2015-01-29 | Dropbox, Inc. | Identifying relevant content in email |
US20150179168A1 (en) * | 2013-12-20 | 2015-06-25 | Microsoft Corporation | Multi-user, Multi-domain Dialog System |
US20150186497A1 (en) * | 2012-10-02 | 2015-07-02 | Banjo, Inc. | Dynamic event detection system and method |
US9104750B1 (en) * | 2012-05-22 | 2015-08-11 | Google Inc. | Using concepts as contexts for query term substitutions |
US9386107B1 (en) * | 2013-03-06 | 2016-07-05 | Blab, Inc. | Analyzing distributed group discussions |
US9542669B1 (en) * | 2013-03-14 | 2017-01-10 | Blab, Inc. | Encoding and using information about distributed group discussions |
US9552399B1 (en) * | 2013-03-08 | 2017-01-24 | Blab, Inc. | Displaying information about distributed group discussions |
US9560152B1 (en) * | 2016-01-27 | 2017-01-31 | International Business Machines Corporation | Personalized summary of online communications |
US20170034107A1 (en) * | 2015-07-29 | 2017-02-02 | International Business Machines Corporation | Annotating content with contextually relevant comments |
US20170084269A1 (en) * | 2015-09-17 | 2017-03-23 | Panasonic Intellectual Property Management Co., Ltd. | Subject estimation system for estimating subject of dialog |
US9665551B2 (en) * | 2014-08-05 | 2017-05-30 | Linkedin Corporation | Leveraging annotation bias to improve annotations |
US20170206271A1 (en) * | 2016-01-20 | 2017-07-20 | Facebook, Inc. | Generating Answers to Questions Using Information Posted By Users on Online Social Networks |
US20170228361A1 (en) * | 2016-02-10 | 2017-08-10 | Yong Zhang | Electronic message information retrieval system |
US20170300862A1 (en) * | 2016-04-14 | 2017-10-19 | Linkedln Corporation | Machine learning algorithm for classifying companies into industries |
US9866516B1 (en) * | 2011-07-19 | 2018-01-09 | Open Invention Network, Llc | Method and apparatus of processing social networking-based user input information |
US20180032898A1 (en) * | 2016-07-27 | 2018-02-01 | Facebook, Inc. | Systems and methods for comment sampling |
US20180046614A1 (en) * | 2016-08-09 | 2018-02-15 | Panasonic Intellectual Property Management Co., Ltd. | Dialogie act estimation method, dialogie act estimation apparatus, and medium |
- 2017-01-13: US application US15/406,565 (patent US10628737B2), not active, Expired - Fee Related
Patent Citations (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7594189B1 (en) * | 2005-04-21 | 2009-09-22 | Amazon Technologies, Inc. | Systems and methods for statistically selecting content items to be used in a dynamically-generated display |
US7962555B2 (en) * | 2006-09-29 | 2011-06-14 | International Business Machines Corporation | Advanced discussion thread management using a tag-based categorization system |
US7930302B2 (en) * | 2006-11-22 | 2011-04-19 | Intuit Inc. | Method and system for analyzing user-generated content |
US20100030798A1 (en) * | 2007-01-23 | 2010-02-04 | Clearwell Systems, Inc. | Systems and Methods for Tagging Emails by Discussions |
US9779094B2 (en) * | 2008-07-29 | 2017-10-03 | Veritas Technologies Llc | Systems and methods for tagging emails by discussions |
US20110202512A1 (en) * | 2010-02-14 | 2011-08-18 | Georges Pierre Pantanelli | Method to obtain a better understanding and/or translation of texts by using semantic analysis and/or artificial intelligence and/or connotations and/or rating |
US8386335B1 (en) * | 2011-04-04 | 2013-02-26 | Google Inc. | Cross-referencing comments |
US9866516B1 (en) * | 2011-07-19 | 2018-01-09 | Open Invention Network, Llc | Method and apparatus of processing social networking-based user input information |
US20130103623A1 (en) * | 2011-10-21 | 2013-04-25 | Educational Testing Service | Computer-Implemented Systems and Methods for Detection of Sentiment in Writing |
US20140093845A1 (en) * | 2011-10-26 | 2014-04-03 | Sk Telecom Co., Ltd. | Example-based error detection system for automatic evaluation of writing, method for same, and error detection apparatus for same |
US20130179766A1 (en) * | 2012-01-05 | 2013-07-11 | Educational Testing Service | System and Method for Identifying Organizational Elements in Argumentative or Persuasive Discourse |
US20130282362A1 (en) * | 2012-03-28 | 2013-10-24 | Lockheed Martin Corporation | Identifying cultural background from text |
US9104750B1 (en) * | 2012-05-22 | 2015-08-11 | Google Inc. | Using concepts as contexts for query term substitutions |
US20150186497A1 (en) * | 2012-10-02 | 2015-07-02 | Banjo, Inc. | Dynamic event detection system and method |
US9386107B1 (en) * | 2013-03-06 | 2016-07-05 | Blab, Inc. | Analyzing distributed group discussions |
US9674128B1 (en) * | 2013-03-06 | 2017-06-06 | Blab, Inc. | Analyzing distributed group discussions |
US9552399B1 (en) * | 2013-03-08 | 2017-01-24 | Blab, Inc. | Displaying information about distributed group discussions |
US9542669B1 (en) * | 2013-03-14 | 2017-01-10 | Blab, Inc. | Encoding and using information about distributed group discussions |
US20140344261A1 (en) * | 2013-05-20 | 2014-11-20 | Chacha Search, Inc | Method and system for analyzing a request |
US20140351257A1 (en) * | 2013-05-22 | 2014-11-27 | Matthew Zuzik | Voting and expiring system to rank internet content |
US20150032829A1 (en) * | 2013-07-29 | 2015-01-29 | Dropbox, Inc. | Identifying relevant content in email |
US9680782B2 (en) * | 2013-07-29 | 2017-06-13 | Dropbox, Inc. | Identifying relevant content in email |
US20150179168A1 (en) * | 2013-12-20 | 2015-06-25 | Microsoft Corporation | Multi-user, Multi-domain Dialog System |
US9665551B2 (en) * | 2014-08-05 | 2017-05-30 | Linkedin Corporation | Leveraging annotation bias to improve annotations |
US20170034107A1 (en) * | 2015-07-29 | 2017-02-02 | International Business Machines Corporation | Annotating content with contextually relevant comments |
US9923860B2 (en) * | 2015-07-29 | 2018-03-20 | International Business Machines Corporation | Annotating content with contextually relevant comments |
US20170084269A1 (en) * | 2015-09-17 | 2017-03-23 | Panasonic Intellectual Property Management Co., Ltd. | Subject estimation system for estimating subject of dialog |
US20170206271A1 (en) * | 2016-01-20 | 2017-07-20 | Facebook, Inc. | Generating Answers to Questions Using Information Posted By Users on Online Social Networks |
US9560152B1 (en) * | 2016-01-27 | 2017-01-31 | International Business Machines Corporation | Personalized summary of online communications |
US20170228361A1 (en) * | 2016-02-10 | 2017-08-10 | Yong Zhang | Electronic message information retrieval system |
US20170300862A1 (en) * | 2016-04-14 | 2017-10-19 | Linkedln Corporation | Machine learning algorithm for classifying companies into industries |
US20180032898A1 (en) * | 2016-07-27 | 2018-02-01 | Facebook, Inc. | Systems and methods for comment sampling |
US20180046614A1 (en) * | 2016-08-09 | 2018-02-15 | Panasonic Intellectual Property Management Co., Ltd. | Dialogie act estimation method, dialogie act estimation apparatus, and medium |
Non-Patent Citations (1)
Title |
---|
FitzGerald, Nicholas, et al. "Exploiting conversational features to detect high-quality blog comments." Canadian Conference on Artificial Intelligence. Springer, Berlin, Heidelberg, 2011. (Year: 2011). * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10810373B1 (en) * | 2018-10-30 | 2020-10-20 | Oath Inc. | Systems and methods for unsupervised neologism normalization of electronic content using embedding space mapping |
US11636266B2 (en) | 2018-10-30 | 2023-04-25 | Yahoo Assets Llc | Systems and methods for unsupervised neologism normalization of electronic content using embedding space mapping |
Also Published As
Publication number | Publication date |
---|---|
US20180203846A1 (en) | 2018-07-19 |
Similar Documents
Publication | Title |
---|---|
US10902076B2 (en) | Ranking and recommending hashtags |
US10699077B2 (en) | Scalable multilingual named-entity recognition |
US10810499B2 (en) | Method and apparatus for recommending social media information |
Gattani et al. | Entity extraction, linking, classification, and tagging for social media: a wikipedia-based approach |
JP6749110B2 (en) | Language identification in social media |
Luo et al. | An effective approach to tweets opinion retrieval |
US20200004882A1 (en) | Misinformation detection in online content |
US10628737B2 (en) | Identifying constructive sub-dialogues |
WO2019037258A1 (en) | Information recommendation method, device and system, and computer-readable storage medium |
US20170315996A1 (en) | Focused sentiment classification |
US11568274B2 (en) | Surfacing unique facts for entities |
US9183598B2 (en) | Identifying event-specific social discussion threads |
US10269080B2 (en) | Method and apparatus for providing a response to an input post on a social page of a brand |
Okuno et al. | A challenge of authorship identification for ten-thousand-scale microblog users |
US10621261B2 (en) | Matching a comment to a section of a content item based upon a score for the section |
Torshizi et al. | Automatic Twitter rumor detection based on LSTM classifier |
US9020957B1 (en) | Systems and methods for enhancing social networking content |
US20170193074A1 (en) | Finding Related Articles for a Content Stream Using Iterative Merge-Split Clusters |
Kapočiūtė-Dzikienė et al. | Authorship attribution of internet comments with thousand candidate authors |
Hashavit et al. | Implicit user modeling in group chat |
Cole | An information diffusion approach for detecting emotional contagion in online social networks |
Feyisetan et al. | Quick-and-clean extraction of linked data entities from microblogs |
Wang et al. | Mining personal interests of microbloggers based on free tags in SINA Weibo |
Wang et al. | Recognizing sentiment of relations between entities in text |
Dandannavar et al. | Sentiment Analysis of Real World Big Data–A Review of General Approaches |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: YAHOO! INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:COHEN, COURTNEY NAPOLES;PAPPU, AASISH;TETREAULT, JOEL;SIGNING DATES FROM 20170112 TO 20170113;REEL/FRAME:040995/0212 |
AS | Assignment | Owner name: YAHOO HOLDINGS, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:YAHOO! INC.;REEL/FRAME:042963/0211. Effective date: 20170613 |
AS | Assignment | Owner name: OATH INC., NEW YORK. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:YAHOO HOLDINGS, INC.;REEL/FRAME:045240/0310. Effective date: 20171231 |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
AS | Assignment | Owner name: VERIZON MEDIA INC., NEW YORK. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:OATH INC.;REEL/FRAME:054258/0635. Effective date: 20201005 |
AS | Assignment | Owner name: VERIZON PATENT AND LICENSING INC., NEW JERSEY. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:VERIZON MEDIA INC.;REEL/FRAME:057453/0431. Effective date: 20210801 |
FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
FP | Lapsed due to failure to pay maintenance fee | Effective date: 20240421 |