US20170221128A1 - Sentiment Extraction From Consumer Reviews For Providing Product Recommendations
- Publication number
- US20170221128A1 (application US15/489,059)
- Authority
- US
- United States
- Prior art keywords
- product
- computer readable
- program code
- readable program
- feature
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G06F17/2765—
-
- G06F17/2785—
-
- G06F17/30675—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Definitions
- the present disclosure is directed to recommending products to users based upon topical and sentiment data extracted from documents about products and providing users quotes from documents relevant to the features of interest to the user.
- a potential buyer may want a camera from which the photographs come out with very true colors as opposed to oversaturated colors.
- Other features, such as the weight of the camera or the complexity of the controls are of lesser concern to this potential buyer.
- a review with many stars may extol the virtues of the ease of changing batteries of the camera and a review of few stars may complain that the camera only has a 3× optical zoom. Neither of these reviews is relevant to the potential buyer. To determine that, however, the potential buyer must wade through comments, if any, provided by the reviewer that explain why the reviewer scored the camera a certain way. This is a highly time-consuming process.
- the present disclosure presents systems and methods for providing topical sentiment-based recommendations based on a rules-based analysis of electronically stored customer communications.
- One aspect of a system incorporates polarity, topicality and relevance to customer request.
- the system also is able to present to a user, in response to a query about a specific feature about a product, a quote from another consumer's review that is responsive to the user's query.
- the system provides methods for utilizing rule-based natural language processing (NLP) and information extraction (IE) techniques to determine the polarity and topicality of an expression.
- a system for automatic generation of customer recommendation communications is also provided.
- the present disclosure also provides systems for computing a numeric metric of the aggregate opinion about a hierarchy of topics expressed in a set of customer expressions.
- the disclosure herein also illustrates a system for analyzing a document to determine a product being discussed in the document and sentiments about the product as well as individual features of the product and sentiments about the individual features.
- the system further identifies exemplary sentences or phrases from the analyzed document that provide an opinion of the product or a feature of the product.
- sentiments about the product and its features are aggregated to determine a score which is stored in addition to storing the exemplary sentences or phrases from each analyzed document.
- the system returns recommendations that include scores and sentiments about the product and/or features of interest to the user.
- FIG. 1 is a diagram of the system according to one embodiment.
- FIG. 2 is a diagram of the system architecture according to one embodiment.
- FIG. 3 is a graphic illustrating a method for building a query for each atomic semantic template.
- FIG. 3A is a graphic illustrating a method for building an atomic semantic template.
- FIG. 3B is a graphic illustrating a method for building a dialectic tree with defeasible logic programming.
- FIG. 4 is a flow chart illustrating a method for sentiment extraction from a document determining a score and representative quote for a product feature according to one embodiment.
- FIG. 5 is a flow chart illustrating a method for recommending a product to a user in response to a search query.
- FIG. 6 is a screenshot illustrating the user interface displaying recommendations returned in response to a search query according to one embodiment.
- Text model is an expression which includes a feature and a sentiment.
- Feature is a characteristic of the product and that characteristic can be an item or an abstract characteristic.
- a feature can be a swimming pool or a characteristic such as “family-friendly.”
- Features related to travel themes, amenities of travel products and suitability for categories of travelers include “view”, “good for children”, “good for pets,” “safe for single female travelers,” “safe for teenagers,” “location”, and “ambiance”. If a document indicates that a hotel has a babysitting service, the feature “family friendly” can then be implicit from the feature babysitting service.
- “Sentiment” is an expression of a subjective judgment or determination. “Sentiment” can be embodied by adjectives, verb expressions, negations or indirect indications. Sentiment is also assigned a value, -1, -0.5, 0, 0.5, or 1. For example, the adjectives “awful,” “so-so,” “good,” and “great” are, respectively, -1, -0.5, 0.5, and 1. The verb expressions “I dislike” and “I like” are, respectively, -0.5 and 0.5. Negations, including, “not”, “would not”, “no”, and “instead” usually refer to a negative sentiment and have a value of -1.
- sentiments overlap with features.
- the phrase “clean room” indicates a positive sentiment in the word, “clean.”
- “cleanliness” is a feature of a product, such as a hotel.
- a sentiment value of 0 indicates that there is no sentiment indicated in the phrase extracted.
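- The value assignment described above can be pictured as a lookup table. The following is a minimal sketch only: the words and values come from the examples in the text, while the table layout and the names `SENTIMENT_VALUES` and `sentiment_value` are hypothetical.

```python
# Values are the ones quoted in the text; the dictionary layout and the
# function name are illustrative assumptions, not part of the disclosure.
SENTIMENT_VALUES = {
    "awful": -1.0,
    "so-so": -0.5,
    "good": 0.5,
    "great": 1.0,
    "i dislike": -0.5,
    "i like": 0.5,
}


def sentiment_value(expression: str) -> float:
    """Return the value of a known expression, or 0 when no sentiment is indicated."""
    return SENTIMENT_VALUES.get(expression.strip().lower(), 0.0)
```

- An expression that is absent from the table, such as a bare feature name, falls back to 0, which matches the reading that 0 means no sentiment is indicated.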
- “Atoms” are the building blocks of templates. They include sentiments, features, amplifiers (“even,” “very,” “never,” “always”), negations, and mental and special states (“knowing”, “believing”, “changing”, “returning”, “liking”) which are correlated with sentiments. Mental and special states may attenuate sentiments or make them irrelevant. Atoms are associated with rules which extract sentiments from text directly, and use other semantic constructions to assess the overall sentiment meaning of sentences. Mental state atoms usually dilute a sentiment or evidence (“did not know if restaurant was good” vs “restaurant was not good”). Certain state atoms might invalidate the sentiment value as well: “swimming pool was not open” as opposed to “it did not look like swimming pool at all!”
- “Semantic template” is an unordered set of atomic semantic templates. Simple examples are <sentiment><amplifier><feature>, <negation><sentiment><feature>, <mental><sentiment><feature>.
- An “atomic semantic template” includes a <feature> and a <sentiment>.
- “Syntactic template” is a word, an ordered list of words or word occurrence rules with certain limitations for the forms of these words. There are multiple syntactic templates for a given semantic template. Each syntactic template corresponds to multiple phrasings of the same meaning. For the atomic semantic template <sentiment><amplifier><feature>, a syntactic template would be <“very” or “quite” or “rather”><“poor” or “bad” or “unpleasant”><“staff” or “bellman” or “porter” or “reception”>. Such a syntactic template forms a rule that would be satisfied by a phrase like “very poor porter” or “quite unpleasant experience with reception.”
- Complex semantic templates consist of multiple atomic semantic templates, like “feature1-sentiment1 but feature2-sentiment2” where sentiment1 and sentiment2 have different signs, one positive and the other negative. For example, “room was clean, but the service was not perfect” would be covered by this semantic template.
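- To make the notion of a syntactic template concrete, the sketch below treats it as an ordered list of slots, each slot being a set of allowed words, and checks whether a phrase fills the slots in order. The slot words are taken from the example in the text; the name `satisfies` and the tolerance for extra words between slots are assumptions made for illustration.

```python
import re

# One syntactic template for <sentiment><amplifier><feature>; each slot
# lists alternative words drawn from the example in the text.
TEMPLATE = [
    {"very", "quite", "rather"},          # amplifier slot
    {"poor", "bad", "unpleasant"},        # sentiment slot
    {"bellman", "porter", "reception"},   # feature slot
]


def satisfies(phrase: str, template=TEMPLATE) -> bool:
    """True when each slot is filled, in order, by some word of the phrase."""
    tokens = re.findall(r"[a-z]+", phrase.lower())
    position = 0
    for slot in template:
        # Skip ahead to the next token that fills the current slot.
        while position < len(tokens) and tokens[position] not in slot:
            position += 1
        if position == len(tokens):
            return False
        position += 1
    return True
```

- Under these assumptions both “very poor porter” and “quite unpleasant experience with reception” satisfy the template, while a phrase with no sentiment word does not.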
- “Information retrieval queries” are the implementations of syntactic templates for search through the indexed text. Queries include disjunctions of the allowed forms of words “pool or pools or swimming pool” with distances (numbers of words in between) and prohibited words. An example of a prohibited word in the case of “pool” is “billiard” when the word “pool” is meant to refer to a “swimming pool.” These queries take into account stemmed and tokenized forms of words. An example of an information retrieval query is a span query.
- Span query is a query that combines the constraint on the occurrence of words with the constraints of mutual positions of the words. Span queries implement information extraction based on syntactic templates. An example of a span query would be “put-1-down” which allows 0 or 1 word in between “put” and “down.”
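- The “put-1-down” example can be sketched without any search library. The code below is an illustration, not the disclosed implementation: `span_match` and `mentions_feature` are hypothetical names, and the prohibited-word check is applied to the whole token list for simplicity.

```python
def span_match(tokens, first, second, max_gap):
    """Return (start, end) token positions of the first span in which
    `second` follows `first` with at most `max_gap` words in between,
    or None when no such span exists."""
    for i, token in enumerate(tokens):
        if token != first:
            continue
        # Candidate positions for `second`: up to max_gap words after `first`.
        for j in range(i + 1, min(i + max_gap + 2, len(tokens))):
            if tokens[j] == second:
                return (i, j)
    return None


def mentions_feature(tokens, forms, prohibited):
    """True when an allowed word form occurs and no prohibited word does."""
    has_form = any(token in forms for token in tokens)
    has_prohibited = any(token in prohibited for token in tokens)
    return has_form and not has_prohibited
```

- With `max_gap=1`, “put it down” and “put down” match while “put the book down” does not, and a sentence about a “billiard pool” is rejected for the swimming pool feature.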
- A “lexicography” is the set of words which occur in syntactic templates. The words that are part of a lexicography are keywords. There are lexicographies for each of the features of atomic semantic templates, and there are global lexicographies for sentiments, amplifiers and negations. Additionally, there are lexicographies of positive and negative sentiments for each individual feature. The structure of lexicographies is hierarchical. The global positive sentiment lexicography includes “nice”, “great”, and “quiet.” However, the general positive sentiment “quiet” is not part of the positive lexicography for the feature “nightlife.” A lexicography for the feature “family friendly” would include other words that are associated with a place being family friendly, such as swimming pool, playground and the like. As explained in reference to FIG. 3A, span queries in the disclosed method act to extract quotations. Therefore the lexicography used in building the span queries is necessarily more detailed than a more traditional lexicography.
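- One possible way to represent the hierarchy is a global set plus per-feature exclusions. This is a sketch under that assumption; the names `GLOBAL_POSITIVE`, `FEATURE_EXCLUDED` and `positive_lexicography` are hypothetical, and only the words quoted in the text are used.

```python
# Global positive sentiment words quoted in the text.
GLOBAL_POSITIVE = {"nice", "great", "quiet"}

# Words that stop being positive for a particular feature (assumed layout).
FEATURE_EXCLUDED = {
    "nightlife": {"quiet"},
}


def positive_lexicography(feature: str) -> set:
    """Positive sentiment words that apply to the given feature."""
    return GLOBAL_POSITIVE - FEATURE_EXCLUDED.get(feature, set())
```

- A feature with no entry in `FEATURE_EXCLUDED` simply inherits the global set.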
- FIG. 1 is a diagram of the system according to one embodiment.
- the system comprises a client 105 , a network 110 , and a recommendation system 100 .
- the system 100 further comprises a front end server 115 , a recommendation engine 120 , an extraction engine 125 and a database 130 .
- the elements of the recommendation system 100 are communicatively coupled to each other.
- the client 105 communicates with the recommendation system 100 via the network 110 .
- the client 105 is any computing device adapted to access the recommendation system 100 via the network 110 .
- Possible clients 105 include, but are not limited to, personal computers, workstations, internet-enabled mobile telephones and the like. While only two clients 105 are shown for example purposes, it is contemplated that there will be numerous clients 105 .
- the network 110 is typically the Internet, but may also be any network, including but not limited to a local area network, a metropolitan area network, a wide area network, a mobile, wired or wireless network, a private network, or a virtual private network, and any combination thereof.
- the front end server 115 , recommendation engine 120 and extraction engine 125 are implemented as server programs executing on one or more server-class computers comprising a central processing unit (CPU), memory, network interface, peripheral interfaces, and other well known components.
- the computers themselves preferably run an open-source operating system such as LINUX, have generally high performance CPUs, 1 gigabyte or more of memory, and 100 gigabytes or more of disk storage.
- Other types of computers can be used, and it is expected that as more powerful computers are developed in the future, they can be configured in accordance with the teachings here.
- the functionality implemented by any of the elements can be provided from computer program products that are stored in tangible computer accessible storage mediums (e.g., RAM, hard disk, or optical/magnetic media).
- FIG. 2 is a diagram of the system architecture according to one embodiment.
- the recommendation system 100 comprises the front end server 115 , recommendation engine 120 , extraction engine 125 , and recommendation database 130 .
- the extraction engine 125 further comprises a template database 250 , a rules engine 270 , a text indexing engine 275 and a document database 280 .
- the template database stores the syntactic templates 255 , atomic semantic templates 258 , semantic templates 260 and combination semantic templates 265 .
- the document database 280 stores the documents to be analyzed after indexing.
- the databases may be stored using any storage structure known in the art. Examples of database structure include a hash map or hash table.
- the rules engine is implemented on a server computer as described for the other servers in reference to FIG. 1 .
- a system and method as disclosed comprises obtaining information about products from various sources on the internet. These sources include descriptions of the product on the websites of the manufacturer and retailers and comments about the product from consumers which may be at the websites of the manufacturer and retailers but also anywhere else on the internet such as chat rooms, message boards and on web logs. Queries are assembled to extract relevant information from the obtained information. Using the queries, information about the products, features of the products and sentiments about the products and features of the product are extracted and stored. This data is used to determine a score not only for the product as a whole but also features of the product. In addition, the method is capable of local extraction of phrases and sentences about an individual feature of the product and identifying that phrase or sentence as belonging to the individual feature. That is also stored for later presentation to a user searching for that individual feature of a product.
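- The text does not say how extracted sentiment values are combined into a score. Purely as an illustration, the sketch below assumes a simple arithmetic mean per feature; the record layout `(feature, sentiment_value, quote)` and the name `feature_scores` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def feature_scores(extractions):
    """Average the sentiment values extracted for each feature.

    `extractions` is an iterable of (feature, sentiment_value, quote)
    records; averaging is an assumption, not the disclosed formula.
    """
    values_by_feature = defaultdict(list)
    for feature, value, _quote in extractions:
        values_by_feature[feature].append(value)
    return {feature: mean(values) for feature, values in values_by_feature.items()}
```

- The quotes are carried in the records so that a representative one can be stored alongside the score, as the text describes.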
- the system analyzes the search query for features of interest to the user in addition to the product of interest to the user and makes a recommendation of not only a product but also the overall score for that product, a score for the feature of interest to the user and a quote about the feature of interest to the user.
- a lexicography is built for the particular product for which documents will be analyzed. Building lexicographies is well known in the art. The lexicography for a product or feature may be built manually or it may be built by machine. In either scenario, representative documents about the product or feature are analyzed and words that are useful in analyzing and classifying the product or feature are extracted and added to the lexicography.
- For a hotel feature such as “family-friendly,” words that are explicit, such as references to family and relatives, are added first. More complex features like “pet-friendly” are then added. This is more complex because references to pets in reviews of hotels are not always relevant to whether or not the hotel allowed pets to stay there. Therefore a more detailed analysis of prohibitive words in syntactic templates is required.
- FIG. 3 is a flow chart illustrating a method for building a query for each semantic template. These queries are then used to extract information from source documents. This process is applied separately for positive sentiments and negative sentiments.
- a complex template is built from atomic templates by enumerating 305 atomic semantic queries. As an example, to build a complex template to ascertain sentiments about a swimming pool at a hotel, atomic templates would include <sentiment><amplifier><feature>. Phrases that would be extracted with such a template include, “there was a very nice swimming pool,” and “the pool was very good.” These both are extracted because the order of the words in an atomic template is usually not relevant.
- a syntactic template is built 315 for keywords in the lexicography.
- an atomic semantic template is built 320 .
- the atomic semantic template covers all of the possible phrasings of the same sentiment—the various syntactic templates above for positive sentiments about the swimming pool for example.
- the disjunction of the syntactic templates is used to build 325 a complex semantic template.
- Prohibitive clauses are then added 330 to avoid quote extraction in the wrong context.
- a prohibitive clause is an expression that does not have sentiment associated with it for that feature. For example, for the feature, “pet-friendly,” “dog house” conveys neither positive nor negative sentiment.
- a span query is built 335 for each complex semantic template.
- the resulting span query is assigned 340 a sentiment value which can be expressed as a number, for example, -1, -0.5, 0, 0.5, or 1.
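- The steps above (syntactic templates from keywords, their disjunction, prohibitive clauses, and an assigned sentiment value) can be sketched as follows. The class `SpanQuery`, the functions `build_span_query` and `covers`, and the value 1.0 used in the test are all hypothetical; word order is ignored here because the text notes that order within an atomic template is usually not relevant.

```python
import re
from dataclasses import dataclass
from itertools import product


@dataclass
class SpanQuery:
    phrasings: list         # disjunction of syntactic templates (tuples of keywords)
    prohibited: set         # prohibitive clauses that block extraction
    sentiment_value: float  # e.g. -1, -0.5, 0, 0.5 or 1


def build_span_query(sentiments, amplifiers, features, prohibited, value):
    """Combine lexicography keywords into one query with an assigned value."""
    phrasings = [tuple(p) for p in product(amplifiers, sentiments, features)]
    return SpanQuery(phrasings, set(prohibited), value)


def covers(query, sentence):
    """True when some phrasing is present and no prohibitive clause occurs."""
    text = sentence.lower()
    if any(clause in text for clause in query.prohibited):
        return False
    words = set(re.findall(r"[a-z]+", text))
    return any(all(word in words for word in phrasing)
               for phrasing in query.phrasings)
```

- Treating each keyword combination as one phrasing keeps the example short; a fuller version would also carry the word-distance constraints of a span query.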
- Span queries are traditionally implemented as search queries. However, in the disclosed method, the span query also acts as an extraction tool to identify the topic and sentiment of an individual phrase or sentence and then extract that very phrase or sentence as a quotation. Span queries utilized in this capacity are necessarily more complex than those utilized as search queries in conventional keyword searching of documents.
- FIG. 3A is a graphic that describes in further detail the building of an atomic semantic template from its components and the resulting lattice of semantic templates used to analyze a document. These semantic templates make it possible to identify the topic and sentiment of individual sentences and phrases in a document. This improvement allows not only for showing users quotations from documents that are relevant to the user's query, but also allows for determining a score for a product or feature with fewer source documents than are required by traditional statistics-based methods.
- the atomic semantic templates in FIG. 3A are for the analysis of travel products.
- the negative and positive indicators 350 are: Pos1, Pos0.5, Neg1, Neg0.5.
- the parts of speech 352 include N for nouns, A for adjectives, and V for verbs.
- An amplifier 355 is a word or phrase which increases the evidence of the meaning of the semantic template to which it is attached. Amplifiers 355 include “never,” “very,” and “always.”
- Mental atoms 360 can distort the meaning of the atomic semantic template.
- Mental atoms, which can also be referred to as predicates, have two possible arguments. The first is 1Arg 361 over the agent who commits a mental or communicative action. The second is 2Arg 362 over the agent who passively receives this action. Examples of mental atoms 360 include “knowing” and “recommending.”
- Negating words 363 are those that reverse the polarity of what the sentence would be without the negating word. Negating words include “not” and “no.” For example, “I will come back” and “I will not come back” are similar except that they convey opposite sentiment evidenced only by the word “not” being added to the second sentence.
- the example “did not know about this beach” 383 illustrates the interplay of a mental atom 360, “know,” and a negating word 363, “not.” “Not” could reverse the polarity of the sentiment of what the phrase would otherwise be except that the presence of the word “knowing” distorts the meaning. That the speaker did not know of the presence of a beach is neither positive nor negative about the destination where that beach is located.
- the presence of mental atoms 360 in the atomic semantic template accounts for this eventuality.
- Special cases 365 are concepts that, in addition to the mental atoms 360 , can distort the meaning of the atomic semantic template.
- the special cases 365 identify a group of words that convey the concept.
- the concept for a special case is dependent on the product or feature being analyzed. In the case of travel products, whether or not a consumer would return to that destination, hotel, or restaurant, is an indication of that consumer's satisfaction with the destination, hotel or restaurant. However, in purchasing a camera, the concept of “returning” is not a special case. In the instant example “letting” is another special case 365 . Whether or not a consumer would let someone else go to that destination is indicative of consumer satisfaction. Another word that expresses this special case is “allow.” In 375 , the words “come back” are a phrase that expresses the special case 365 of “returning.”
- Domain atoms 370 are characteristics of a feature. They can be expressed as nouns or adjectives. Examples include “dog friendly,” “clean,” and “quiet.”
- An atomic semantic template is considered to cover a sentence or phrase from a document if this sentence satisfies all constraints from the template.
- Some words can fulfill multiple components of an atomic semantic template. For example, “never” is an amplifier as well as a negating word and can fulfill both requirements in an atomic semantic template. “I will not come back” and “I will never come back” are both covered by the template <Neg><Ret> 375. Only “I will never come back” is covered by <Neg><Ret><Amplifier> 380. That is because “never” is both an amplifier 355 and a negating word 363.
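- The coverage rule in this example can be sketched by letting one word belong to several atom classes. The word lists below are limited to those named in the text; the dictionary `ATOMS` and the function `covered_by` are hypothetical names.

```python
import re

# A word may appear under more than one atom, as "never" does.
ATOMS = {
    "Neg": {"not", "no", "never"},
    "Amplifier": {"never", "very", "always"},
    "Ret": {"come back"},
}


def covered_by(sentence, template):
    """True when every atom of the template is expressed in the sentence."""
    # Normalise to space-padded lowercase words so phrases match whole words.
    text = " " + " ".join(re.findall(r"[a-z]+", sentence.lower())) + " "
    return all(
        any(f" {expression} " in text for expression in ATOMS[atom])
        for atom in template
    )
```

- Here “never” satisfies both the Neg and the Amplifier constraint at once, which is why only the “never” sentence is covered by the three-atom template.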
- Atoms may include the terms in the form of nouns, adjectives or other parts of speech.
- the part of speech may be included in the atomic semantic template if it is relevant to distinguishing the feature.
- An example is <sentiment><feature(noun)> which is fulfilled by “not good for dogs.”
- Another example 385 that covers “not good for dogs” is <not><pos><domain>.
- Semantic templates are assigned a template sentiment value. This value may be expressed as a number, -1, -1/2, 0, +1/2, or 1.
- the atomic semantic templates are arranged in a hierarchy and there is order between some pairs of semantic templates. This order reflects how specific to a given feature or sentiment the atomic semantic templates are.
- Where ST_1 > ST_2, ST_2 is an extension of ST_1 because it is more specific.
- Ordinarily the template sentiment value of ST_1 equals the template sentiment value of ST_2. If, however, ST_1 and ST_2 are not ordered in this way, then the template sentiment values are not necessarily the same. Conflicts such as these can occur when an atomic semantic template is an extension of more than one other semantic template.
- semantic templates closer to the top of the figure are less specific and therefore > semantic templates lower down on the figure.
- ST <Neg><Ret> is less specific (>) than <Neg><Ret><Amplifier>.
- additional semantic templates with additional constraints are built.
- There may be more than one extension semantic template for a given semantic template, and in this instance they are not all continually more specific. This is referred to as the extension of the template being inconsistent. If they were consistent, the specificity would be described as follows: ST > ST_1 & ST > ST_2 : ST > ST_1, ST_2. When they are not consistent, there are conflicts between the atomic semantic templates.
- the conflict between the atomic semantic templates must be resolved. Broadly, this requires a method for reasoning when there are conflicting theorems.
- the conflicting theorems are the conflicting atomic semantic templates.
- Two major classes of methods for reasoning when there are conflicting theorems are non-monotonic reasoning and argumentation-based approaches.
- the conflict between the atomic semantic templates is resolved using an argumentation-based approach which uses the order of the specificity of the atomic semantic templates and defeasible reasoning.
- the approach is a rules-based system that includes two classes of rules—rules that are absolute and those that may be violated.
- the rules-based system determines which of the conflicting semantic templates that cover the phrase or sentence is most specific.
- the sentiment of that most specific semantic template will be the sentiment of that extracted phrase or sentence.
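- As a rough illustration of picking the most specific covering template, the sketch below models each template as a set of atoms and treats a template as more specific when its atoms strictly include those of another. That set-inclusion model, the function name `resolve`, and the sentiment values in the test are assumptions made for the example.

```python
def resolve(covering):
    """Pick the sentiment value of the most specific covering template.

    `covering` maps each covering template (a frozenset of atom names) to
    its template sentiment value. Returns None when more than one template
    remains most specific, i.e. when the conflict needs further arbitration.
    """
    most_specific = [
        template for template in covering
        # `<` on frozensets is the proper-subset test: drop any template
        # that has a covering extension.
        if not any(template < other for other in covering)
    ]
    if len(most_specific) == 1:
        return covering[most_specific[0]]
    return None
```

- When `resolve` returns None, two unrelated templates both cover the sentence, which is the situation the argumentation-based approach is meant to settle.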
- a defeasible logic program (“DLP”) is an example of such a rules-based system.
- DLP is known in the art, for example as set forth by Garcia and Simari in “Defeasible Logic Programming: an argumentative approach” in Theory and Practice of Logic Programming 4(1), 95-138 (2004).
- DLP is applied to resolve the inconsistent semantic templates.
- a dialectic tree is built.
- DLP is a set of facts (that the sentence is covered by all of the relevant semantic templates); strict rules, Π; and defeasible rules, Δ.
- the defeasible rules are of the form (ST_A -< ST_B), where ST_B > ST_A (an atomic semantic template usually, but not always, implies its extension). ST_A may occur in a defeasible rule as a negation (~ST_A -< ST_B) if there is no order relation between ST_A and ST_B.
- an argument structure <A, h> is a minimal non-contradictory set of defeasible rules A, obtained from a defeasible derivation for a given literal h.
- A is minimal: there is no proper subset A_0 of A such that A_0 satisfies conditions (1) and (2), that is, A_0 still yields a defeasible derivation for h and is non-contradictory.
- <A_1, h_1> attacks <A_2, h_2> if there exists a sub-argument <A, h> of <A_2, h_2> (A ⊆ A_2) so that h and h_1 are inconsistent.
- Argumentation line is a sequence of argument structures where each element in a sequence attacks its predecessor. There are a number of acceptability requirements for argumentation lines outlined in Garcia and Simari.
- the definition of dialectic tree gives us an algorithm to discover implicit self-attack relations in users' claims. Let <A_0, h_0> be an argument structure from a program P.
- a dialectical tree for <A_0, h_0> is defined as follows:
- the root of the tree is labeled with <A_0, h_0>
- every vertex (except the root) represents an attack relation to its parent, and leaves correspond to non-attacked arguments. Each path from the root to a leaf corresponds to one different acceptable argumentation line.
- the dialectical tree provides a structure for considering all the possible acceptable argumentation lines that can be generated for deciding whether a most specific ST is defeated. This tree is called dialectical because it represents an exhaustive dialectical analysis for the argument in its root.
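- Once a dialectical tree exists, deciding whether its root is defeated can be done with the bottom-up marking used in defeasible logic programming: leaves are undefeated, and an inner node is defeated exactly when at least one of its defeaters is undefeated. The text does not spell out this marking step, so the sketch below is an assumption drawn from the general technique, and the tree in the test is a hypothetical one with made-up labels.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    defeaters: list = field(default_factory=list)  # child arguments attacking this one


def is_undefeated(node: Node) -> bool:
    """A node is undefeated when every one of its defeaters is defeated.

    A leaf has no defeaters, so all() over an empty list is True and the
    leaf is undefeated.
    """
    return all(not is_undefeated(child) for child in node.defeaters)
```

- Because the check recurses over every child, it considers each argumentation line from the root to a leaf, in keeping with the exhaustive analysis the tree is meant to represent.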
- the atomic semantic templates are represented only by their subscript identifiers, a, b, f and h in place of ST_a, ST_b, ST_f and ST_h.
- ST_a is supported by <A, ST_a> = <{(ST_a -< ST_b), (ST_b -< ST_c)}, ST_a> and there exist three defeaters for it, each of them starting three different argumentation lines:
- <B_1, ~ST_b> = <{(~ST_b -< ST_c, ST_d)}, ~ST_b> (proper defeater),
- <B_2, ~ST_b> = <{(~ST_b -< ST_c, ST_f), (ST_f -< ST_g)}, ~ST_b> (proper defeater),
- <B_3, ~ST_b> = <{(~ST_b -< ST_e)}, ~ST_b> (blocking defeater).
- the argument structure <B_1, ~ST_b> has the counter-argument <{(ST_b -< ST_c)}, ST_b>, but it is not a defeater because the former is more specific. Thus, no defeaters for <B_1, ~ST_b> exist and the argumentation line ends there.
- the argument structure <B_3, ~ST_b> has a blocking defeater <{(ST_b -< ST_e)}, ST_b>. It is the disagreement subargument of <A, ST_a>; therefore, it cannot be introduced because it produces an argumentation line that is not acceptable.
- the argument structure <B_2, ~ST_b> has two defeaters that can be introduced:
- <C_1, ~ST_f> = <{(~ST_f -< ST_g, ST_h), (ST_h -< ST_j)}, ~ST_f> (proper defeater),
- the argument <C_1, ~ST_f> has a blocking defeater that can be introduced in the line:
- <D_1, ~ST_h> = <{(~ST_h -< ST_k)}, ~ST_h>.
- the resultant sentiment value is the one of ST_a.
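The structure of the example tree can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `ArgNode` and the function `argumentation_lines` are hypothetical names, argument structures are reduced to their labels, and only the defeaters explicitly listed in the example are included. It enumerates each root-to-leaf path, which corresponds to one argumentation line; it does not perform specificity comparison or decide which argument prevails.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ArgNode:
    """One argument structure in a dialectical tree (label only, hypothetical)."""
    label: str
    defeaters: list[ArgNode] = field(default_factory=list)


def argumentation_lines(node: ArgNode) -> list[list[str]]:
    """Return every root-to-leaf path; each path is one argumentation line."""
    if not node.defeaters:
        return [[node.label]]
    lines = []
    for child in node.defeaters:
        for tail in argumentation_lines(child):
            lines.append([node.label] + tail)
    return lines


# Tree for the example, using only the defeaters the text lists.
d1 = ArgNode("D1")
c1 = ArgNode("C1", [d1])
tree = ArgNode("A", [ArgNode("B1"), ArgNode("B2", [c1]), ArgNode("B3")])

for line in argumentation_lines(tree):
    # Read "X <- Y" as "Y attacks X".
    print(" <- ".join(line))
```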
- DLP is one way to determine both the topic and sentiment of a single phrase or sentence extracted from a source document. This degree of accuracy is not possible with traditional statistics-based methods.
- FIG. 4 is a flow chart illustrating a method for sentiment extraction from a document according to one embodiment.
- Documents from which data is extracted are obtained through conventional means. An example is “scraping.”
- Documents are received 405 and stored in the document database 280 after being indexed.
- the text indexing engine 275 tokenizes and stems the text and eliminates stop words. This step can be accomplished by software known in the art such as LUCENE™ available from the Apache Software Foundation.
- the text indexing engine 275 identifies ambiguous multiwords, words that have multiple meanings and therefore can be confused with the keywords in the lexicographies, and replaces them with an expression which cannot be confused with those in lexicographies. This addresses exceptions for atypical meanings.
- An example of an atypical meaning would be the phrase “would never complain about the room.” “Never” in this context does not express a negative sentiment about the room, and the phrase “would never complain about the room” would be substituted by “would never complain” so that the syntactic template rule is not satisfied.
- the rules engine 270 runs 410 queries on the documents. The first and last token number for each satisfied occurrence is found 415 . The first and last token number indicates the positions of the first and last words in the satisfied syntactic template. The product of the query is then those tokens and the document ID of the source document so that the exact location of the quote and the source document are known.
- Another query is run to do a local extraction 425 of quotes that illustrate product features and associated sentiment. Local extraction is possible because of the DLP which resolves conflicts between semantic templates.
- the sentiment for a quote starts with the score of the atomic semantic template, adjusted if the sentiment is in the title as described above for determining the value of an extracted sentiment. Additionally, for a quote, the sentiment is adjusted by the overall sentiment for the document from which it is extracted as well as the volatility of the sentiment in the document. Both the overall sentiment for the document and the volatility of the sentiment in the document are described in further detail below.
- the overall sentiment for a document is then determined 430 .
- This determination is an addition of all of the sentiments for features from the document.
- a document yields the following features and sentiments: good room service +0.5; great balcony +1; not clean −1; very noisy −1; fridge was loud (noisy again) −1; and nice view +1.
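As a worked illustration of the addition described above, the following short Python sketch sums the six feature sentiments from the example. The function name `overall_sentiment` and the list layout are assumptions made for illustration only.

```python
# Feature sentiments from the example document, in order of appearance.
feature_sentiments = [
    ("good room service", 0.5),
    ("great balcony", 1.0),
    ("not clean", -1.0),
    ("very noisy", -1.0),
    ("fridge was loud", -1.0),
    ("nice view", 1.0),
]


def overall_sentiment(pairs):
    """Overall document sentiment as the plain sum of feature sentiments."""
    return sum(value for _, value in pairs)


print(overall_sentiment(feature_sentiments))  # -0.5
```

Under this reading, the three negative sentiments outweigh the positive ones and the document as a whole scores slightly negative.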
- the volatility of the sentiment for the document is determined 435 .
- the volatility of the sentiment for the document is a determination of the degree of uniformity of the sentiment expressed in the document. This is determined by pair-wise comparisons of sentiments expressed in the document in the order in which they appear. By analyzing the sentiments in order, there is not just a comparison of the most positive and most negative sentiments but also how often the sentiment changes within the document. For a document that is a user review of a particular hotel, if the user expresses generally the same level of satisfaction (regardless of the degree of satisfaction) the document has low volatility of sentiment. If the user writing the review adored the swimming pool but found room service to be appallingly slow, the document volatility will likely be high.
- the order of A, B, C and D is the order in which they appeared in the original document from which they were extracted.
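The exact volatility formula is not reproduced in this text, so the following Python sketch shows only one possible reading of "pair-wise comparisons of sentiments in the order in which they appear": the mean absolute change between consecutive sentiment values. The function name `volatility` and the choice of a mean are assumptions, not the disclosed computation.

```python
def volatility(sentiments):
    """Mean absolute change between consecutive sentiments (assumed measure)."""
    if len(sentiments) < 2:
        return 0.0
    # Compare each sentiment with the one that follows it in document order.
    changes = [abs(later - earlier)
               for earlier, later in zip(sentiments, sentiments[1:])]
    return sum(changes) / len(changes)


steady = [1.0, 1.0, 0.5, 1.0]               # uniformly satisfied reviewer
mixed = [0.5, 1.0, -1.0, -1.0, -1.0, 1.0]   # sentiment swings back and forth

print(volatility(steady), volatility(mixed))
```

Because the values are compared in order, a review that flips between praise and complaint scores higher than one with the same extremes grouped together.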
- the determination of volatility can be used as an indicator of whether or not sarcasm is being used by the user writing the review as well.
- Reviewers may use sarcasm to express either satisfaction or dissatisfaction. If that sarcasm is not identified by the atomic semantic template, the sentiment expressed may be taken at face value when the document is queried with the templates. Under those circumstances, the volatility of the document however will highlight the seeming inconsistency.
- An example of such a review is, “The pool was gorgeous and the drinks were brought quickly. The sun was shining all week. It was just torture being there.” At face value, this review would result in a middle or lower score for the hotel being reviewed. In the disclosed system, this is counteracted because in addition to an overall score, the volatility of the review is assessed. Because the first sentence includes two very positive sentiments about the hotel and the last sentence is very negative, the volatility score for this review would be higher. That leads to reducing the score for the review and its quotes, and therefore makes it less likely that this quote will be shown to
- the sentiment volatility may be incorporated into a reliability score for the document.
- the lower the volatility score of the document (or the higher the reliability), the more useful the document is and the more likely that a quote from the document will be used when recommending the product to a user of the system.
- a heuristic model is used to extract the words that express the topic and sentiment of that phrase or sentence. This allows for enough words to be extracted to give the quote context but analyzes those words for relevance to the topic and sentiment. In contrast, traditional extraction of quotes uses less analysis and takes a pre-determined number of words before and after the word or words of interest without any assessment of whether those words are relevant.
- syntactic analysis determines that the additional extracted words are relevant to the topic determined for that quote—for example, the swimming pool. Semantic analysis determines if the additionally extracted words are related to the product in general—travel in this case—by looking for keywords for travel.
- the product features, associated sentiments, quotes, overall sentiment and sentiment volatility for a document are stored 440 in the recommendation database 130 .
- a score is determined 445 using sentiments extracted from multiple documents.
- the values of the sentiments extracted are summed in the manner that the overall score for a document is calculated as described earlier. Additionally, the determination can include the overall sentiment score for the product from multiple documents and the volatility determinations of the source document.
- a representative quote is determined 450 from the quotes extracted from all documents for that product feature. For example, for the feature “family-friendly” for a given hotel, the following quote from a review may be chosen, “my kids loved the pool.” The quotes are ranked by their sentiment value and the quote with the most positive sentiment value for the feature being illustrated is the representative quote.
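The ranking step can be sketched as follows in Python. The record layout, the function name `representative_quote`, and every quote other than “my kids loved the pool” are hypothetical and serve only to illustrate choosing the most positive quote for a feature.

```python
# Hypothetical extracted quotes; only the first one comes from the text.
quotes = [
    {"text": "my kids loved the pool", "feature": "family-friendly", "sentiment": 1.0},
    {"text": "the playground was okay", "feature": "family-friendly", "sentiment": 0.5},
    {"text": "the bar was great", "feature": "nightlife", "sentiment": 1.0},
]


def representative_quote(all_quotes, feature):
    """Return the quote with the most positive sentiment for one feature."""
    candidates = [q for q in all_quotes if q["feature"] == feature]
    if not candidates:
        return None
    return max(candidates, key=lambda q: q["sentiment"])


print(representative_quote(quotes, "family-friendly")["text"])
```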
- the numeric score for the product feature as well as the representative quote are also stored 455 in the recommendation database 130 .
- FIG. 5 is a flow chart illustrating a method for recommending a product to a user in response to a search query.
- a search query is entered by a user at the client 105 and is received 505 via the network 110 at the recommendation system 100 by the front end server 115 .
- An example is shown at 605 .
- the recommendation engine extracts 510 features from the query.
- when the user has requested information on “Family Friendly Hotel Campground” in Maui, the features extracted are “family-friendly” and “campground.”
- the system sorts 515 the extracted product features based on the lexicography.
- a semantic template is built 520 using the user's search query. That template is used to search 525 the recommendation database for products with product features that match the user's query. Searching of the recommendation database with the atomic semantic template may be implemented with a span query.
- the atomic semantic template built for querying the database is similar in nature to the atomic semantic template built to extract that sentiment from source documents originally. Additionally, for each product that matches the user's query, a quote is retrieved that is relevant to the feature of the product of interest to the user. In the above example, there may be many quotes about all of the hotels about any number of the hotel's features.
- the retrieved quotes 625 are relevant to that feature. Relevant quotes include those about children's activities, the swimming pool or babysitting services. Quotes about the hotel bar's top shelf liquor collection would not be relevant and not be retrieved. Additionally, the features of the hotel that make it family-friendly are also extracted and listed for the user 620 . The results of the search are sorted 530 and the recommended products and quotes returned 535 to the user at the client 105 via the front end server 115 and network 110 . The sorting of the results is according to the score for the feature, “family-friendly,” for the hotel. In FIG. 6 , the search results 610 are shown to the user with the first listing 615 having a 98% relevancy.
- the disclosed system and method provide a more efficient recommendation product for users because, in addition to general sentiment scores from reviewers, the user is presented with product ratings that are specific to the feature that is of interest to the user. Additionally, the characteristics of the product that led to that score are enumerated for the user, and quotes from reviewers about the feature are presented.
- a user is provided mechanisms, e.g., by receiving and/or transmitting control signals, to control access to particular information as described herein.
- these benefits accrue regardless of whether all or portions of components, e.g., server systems, to support their functionality are located locally or remotely relative to the user.
- Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory and executable by a processor. Examples of this are illustrated and described for FIGS. 3, 3A, 3B, 4 and 5.
- These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art.
- An algorithm is here, and generally, conceived to be a self-consistent sequence of steps (instructions) leading to a desired result, for example, as in describing DLP.
- the steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical, magnetic or optical signals capable of being stored, transferred, combined, compared and otherwise manipulated.
- Some embodiments may be described using the expressions “coupled” and “connected” along with their derivatives. It should be understood that these terms are not intended as synonyms for each other. For example, some embodiments may be described using the term “connected” to indicate that two or more elements are in direct physical or electrical contact with each other. In another example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. The embodiments are not limited in this context.
- processing refers to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical quantities (e.g., electronic) within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices.
- any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment.
- the appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
Description
- Field of Art
- The present disclosure is directed to recommending products to users based upon topical and sentiment data extracted from documents about products and providing users quotes from documents relevant to the features of interest to the user.
- Description of Related Art
- When purchasing online, consumers are interested in researching the product or service they are looking to purchase. Currently, this means reading through reviews written on websites of different vendors that happen to offer the product or service. For example, if the consumer is interested in purchasing a digital camera, several on-line vendors allow consumers to post reviews of cameras on the website. Gathering information from such reviews is still a daunting process as there is little way to sort the reviews for the features that are of interest to any one potential buyer so the potential buyer must read through them manually. Sometimes reviewers rate a product with a given number of stars in addition to making comments. An average high or low number of stars is not necessarily very informative to a potential buyer, especially if he or she is especially concerned about certain features on the camera. For example, a potential buyer may want a camera from which the photographs come out with very true colors as opposed to oversaturated colors. Other features, such as the weight of the camera or the complexity of the controls are of lesser concern to this potential buyer. A review with many stars may extol the virtues of the ease of changing batteries of the camera and a review of few stars may complain that the camera only has a 3× optical zoom. Neither of these reviews is relevant to the potential buyer. In order to determine that however the potential buyer must wade through comments, if any, provided by the reviewer that explain why the reviewer scored the camera a certain way. This is a highly time consuming process.
- Analyzing a document for presence of sentiment within the document via a fine-grained NLP-based textual analysis is disclosed in J. Wiebe, T. Wilson, and M. Bell, “Identifying collocations for recognizing opinions,” in Proceedings of ACL/EACL '01 Workshop on Collocation, (Toulouse, France), July 2001. B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? sentiment classification using machine learning techniques,” in Proceedings of EMNLP 2002, discloses a machine learning classification-based approach utilizing statistics to analyze movie reviews and extract an overall sentiment about the movie. That means that a significantly large sample size is required in order to provide meaningful results. This statistical approach averages through the whole review document and results in a global assessment of the feature and associated sentiment. The approach is not sensitive enough to allow for the extraction of a sentence or phrase from a review and identify both its sentiment and its topic.
- More recently, other researchers have made progress in determining not only the sentiment but the topic about which the sentiment is being expressed. Nigam and Hurst have published a method for determining an overall sentiment for a particular product by analyzing multiple messages posted on-line about that product. “Towards a Robust Metric of Opinion” in Computing Attitude and Affect in Text: Theory and Applications. Shanahan, J., J. Qu, and J. Wiebe, Eds. Dordrecht, Netherlands: Springer, 2006, pp. 265-280. This method also does not allow for, in addition to the overall sentiment and topic determination, local extraction to determine a specific quote from the analyzed messages that exemplifies the sentiment about that topic. Such a quote would be useful to serve as an argument for why this particular product is recommended.
- The present disclosure presents systems and methods for providing topical sentiment-based recommendations based on a rules-based analysis of electronically stored customer communications. One aspect of a system incorporates polarity, topicality and relevance to the customer request. The system also is able to present to a user, in response to a query about a specific feature of a product, a quote from another consumer's review that is responsive to the user's query. The system provides methods for utilizing rule-based natural language processing (NLP) and information extraction (IE) techniques to determine the polarity and topicality of an expression. Additionally, a system for automatic generation of customer recommendation communications is also provided. The present disclosure also provides systems for computing a numeric metric of the aggregate opinion about a hierarchy of topics expressed in a set of customer expressions.
- The disclosure herein also illustrates a system for analyzing a document to determine a product being discussed in the document and sentiments about the product as well as individual features of the product and sentiments about the individual features. The system further identifies exemplary sentences or phrases from the analyzed document that provide an opinion of the product or a feature of the product.
- Further, sentiments about the product and its features are aggregated to determine a score which is stored in addition to storing the exemplary sentences or phrases from each analyzed document. In response to a query from a user the system returns recommendations that include scores and sentiments about the product and/or features of interest to the user.
- The features and advantages described in the specification are not all inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the disclosed subject matter.
- The disclosed embodiments have other advantages and features which will be more readily apparent from the detailed description, the appended claims, and the accompanying figures (or drawings). A brief introduction of the figures is below.
- FIG. 1 is a diagram of the system according to one embodiment.
- FIG. 2 is a diagram of the system architecture according to one embodiment.
- FIG. 3 is a graphic illustrating a method for building a query for each atomic semantic template.
- FIG. 3A is a graphic illustrating a method for building an atomic semantic template.
- FIG. 3B is a graphic illustrating a method for building a dialectic tree with defeasible logic programming.
- FIG. 4 is a flow chart illustrating a method for sentiment extraction from a document determining a score and representative quote for a product feature according to one embodiment.
- FIG. 5 is a flow chart illustrating a method for recommending a product to a user in response to a search query.
- FIG. 6 is a screenshot illustrating the user interface displaying recommendations returned in response to a search query according to one embodiment.
- The Figures (Figs.) and the following description relate to preferred embodiments by way of illustration only. It should be noted that from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimable subject matter.
- Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality. The figures depict embodiments of the disclosed system (or method) for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.
- In one embodiment, as discussed herein, the following terms are introduced below. The terms are provided for ease of discussion with respect to the principles disclosed herein.
- “Text model” is an expression which includes a feature and a sentiment.
- “Feature” is a characteristic of the product and that characteristic can be an item or an abstract characteristic. A feature can be a swimming pool or a characteristic such as “family-friendly.” Features related to travel themes, amenities of travel products and suitability for categories of travelers include “view”, “good for children”, “good for pets,” “safe for single female travelers,” “safe for teenagers,” “location”, and “ambiance”. If a document indicates that a hotel has a babysitting service, the feature “family friendly” can then be implicit from the feature babysitting service.
- “Sentiment” is an expression of a subjective judgment or determination. “Sentiment” can be embodied by adjectives, verb expressions, negations or indirect indications. Sentiment is also assigned a value, −1, −0.5, 0, 0.5, or 1. For example, the adjectives “awful,” “so-so,” “good,” and “great” are, respectively, −1, −0.5, 0.5, and 1. The verb expressions “I dislike” and “I like” are, respectively, −0.5 and 0.5. Negations, including, “not”, “would not”, “no”, and “instead” usually refer to a negative sentiment and have a value of −1. Examples of indirect indications of sentiment and their values include “would return,” (+0.5); “worth doing,” (+1); “never wanted to leave the room,” (+1 for the feature “Romantic travel”); “never come again” (−1); and “safe to travel on my own,” (+1). In some instances, sentiments overlap with features. For example, the phrase “clean room” indicates a positive sentiment in the word, “clean.” Additionally “cleanliness” is a feature of a product, such as a hotel. A sentiment value of 0 indicates that there is no sentiment indicated in the phrase extracted.
- “Atoms” are the building blocks of templates. They include sentiments, features, amplifiers (“even,” “very,” “never,” “always”), negations, and mental and special states (“knowing”, “believing”, “changing”, “returning”, “liking”) which are correlated with sentiments. Mental and special states may attenuate sentiments or make them irrelevant. Atoms are associated with rules which extract sentiments from text directly, and use other semantic constructions to assess the overall sentiment meaning of sentences. Mental state atoms usually dilute a sentiment or evidence (“did not know if restaurant was good” vs “restaurant was not good”). Certain state atoms might invalidate the sentiment value as well: “swimming pool was not open” as opposed to, “it did not look like swimming pool at all!”
- “Semantic template” is an unordered set of atomic semantic templates. Simple examples are <sentiment><amplifier><feature>, <negation><sentiment><feature>, <mental><sentiment><feature>. An “atomic semantic template” includes a <feature> and a <sentiment>.
- “Syntactic template” is a word, an ordered list of words or word occurrence rules with certain limitations for the forms of these words. There are multiple syntactic templates for a given semantic template. Each syntactic template corresponds to multiple phrasings of the same meaning. For the atomic semantic template <sentiment><amplifier><feature>, a syntactic template would be <“very” or “quite” or “rather”><“poor” or “bad” or “unpleasant”><“staff” or “bellman” or “porter” or “reception”>. Such a syntactic template forms a rule that would be satisfied by a phrase like “very poor porter” or “quite unpleasant experience with reception.”
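A syntactic template of this kind can be sketched in Python as an ordered list of slots, each slot being a set of allowed words. This is a minimal sketch under stated assumptions: the function name `match_template`, the `max_gap` parameter, and the abbreviated keyword sets are hypothetical, and stemming is omitted. It returns the first and last token positions of the first match.

```python
def match_template(tokens, slots, max_gap=2):
    """Return (first, last) token indices of the first match, or None.

    Each slot is a set of allowed words; slots must occur in order, with
    at most `max_gap` other words between consecutive slots.
    """
    def extend(slot_idx, prev_pos):
        # All slots placed: prev_pos is the position of the last slot.
        if slot_idx == len(slots):
            return prev_pos
        stop = min(prev_pos + 2 + max_gap, len(tokens))
        for pos in range(prev_pos + 1, stop):
            if tokens[pos] in slots[slot_idx]:
                last = extend(slot_idx + 1, pos)
                if last is not None:
                    return last
        return None

    for first, token in enumerate(tokens):
        if token in slots[0]:
            last = extend(1, first)
            if last is not None:
                return first, last
    return None


# Abbreviated keyword sets for <amplifier><sentiment><feature>.
slots = [
    {"very", "quite", "rather"},
    {"poor", "bad", "unpleasant"},
    {"bellman", "porter", "reception"},
]

print(match_template("very poor porter".split(), slots))
print(match_template("quite unpleasant experience with reception".split(), slots))
```

Allowing a small gap between slots is what lets the second phrase match even though two other words separate the sentiment from the feature.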
- Complex semantic templates consist of multiple atomic semantic templates, like “<feature1><sentiment1> but <feature2><sentiment2>” where sentiment1 and sentiment2 have different signs, one positive and the other negative. For example, “room was clean, but the service was not perfect” would be covered by this semantic template.
- “Information retrieval queries” are the implementations of syntactic templates for search through the indexed text. Queries include disjunctions of the allowed forms of words “pool or pools or swimming pool” with distances (numbers of words in between) and prohibited words. An example of a prohibited word in the case of “pool” is “billiard” when the word “pool” is meant to refer to a “swimming pool.” These queries take into account stemmed and tokenized forms of words. An example of an information retrieval query is a span query.
- “Span query” is a query that combines the constraint on the occurrence of words with the constraints of mutual positions of the words. Span queries implement information extraction based on syntactic templates. An example of a span query would be “put-1-down” which allows 0 or 1 word in between “put” and “down.”
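The “put-1-down” example can be illustrated with a small Python sketch. The function `span_match` and its `slop` parameter are hypothetical names chosen for illustration; this is plain Python over a token list, not the API of any particular search library.

```python
def span_match(tokens, first_word, second_word, slop):
    """Find spans where second_word follows first_word with at most
    `slop` other words in between. Returns (start, end) index pairs."""
    hits = []
    for i, token in enumerate(tokens):
        if token != first_word:
            continue
        # Look at the next slop + 1 positions after first_word.
        for j in range(i + 1, min(i + 2 + slop, len(tokens))):
            if tokens[j] == second_word:
                hits.append((i, j))
                break
    return hits


print(span_match("put it down".split(), "put", "down", slop=1))        # [(0, 2)]
print(span_match("put down".split(), "put", "down", slop=1))           # [(0, 1)]
print(span_match("put the book down".split(), "put", "down", slop=1))  # []
```

The returned index pairs correspond to the first and last token numbers of a satisfied occurrence, which is what allows a quote to be located exactly in its source document.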
- “Lexicography” are the words which occur in syntactic templates. The words that are part of a lexicography are keywords. There are lexicographies for each of the features of atomic semantic templates. There are global lexicographies for sentiments, amplifiers and negations. Additionally there are lexicographies of positive and negative sentiments that correspond to individual features. There are positive and negative lexicographies for each feature. The structure of lexicographies is hierarchical. The global positive sentiment lexicography includes “nice”, “great”, “quiet.” However, general positive sentiment “quiet” is not part of the positive lexicography for the feature “nightlife.” An example of a lexicography for the feature “family friendly” would include other words that are associated with a place being family friendly. Those include swimming pool, playground and the like. As explained in reference to FIG. 3A, span queries in the disclosed method act to extract quotations. Therefore the lexicography used in building the span queries is necessarily more detailed than a more traditional lexicography.
- FIG. 1 is a diagram of the system according to one embodiment. The system comprises a client 105, a network 110, and a recommendation system 100. The system 100 further comprises a front end server 115, a recommendation engine 120, an extraction engine 125 and a database 130. The elements of the recommendation system 100 are communicatively coupled to each other. The client 105 communicates with the recommendation system 100 via the network 110.
- The client 105 is any computing device adapted to access the recommendation system 100 via the network 110. Possible clients 105 include, but are not limited to, personal computers, workstations, internet-enabled mobile telephones and the like. While only two clients 105 are shown for example purposes, it is contemplated that there will be numerous clients 105.
- The network 110 is typically the Internet, but may also be any network, including but not limited to a local area network, a metropolitan area network, a wide area network, a mobile, wired or wireless network, a private network, or a virtual private network, and any combination thereof.
- The front end server 115, recommendation engine 120 and extraction engine 125 are implemented as server programs executing on one or more server-class computers comprising a central processing unit (CPU), memory, network interface, peripheral interfaces, and other well known components. The computers themselves preferably run an open-source operating system such as LINUX, have generally high performance CPUs, 1 gigabyte or more of memory, and 100 G or more of disk storage. Of course, other types of computers can be used, and it is expected that as more powerful computers are developed in the future, they can be configured in accordance with the teachings here. The functionality implemented by any of the elements can be provided from computer program products that are stored in tangible computer accessible storage mediums (e.g., RAM, hard disk, or optical/magnetic media).
- FIG. 2 is a diagram of the system architecture according to one embodiment. The recommendation system 100 comprises the front end server 115, recommendation engine 120, extraction engine 125, and recommendation database 130. The extraction engine 125 further comprises a template database 250, a rules engine 270, a text indexing engine 275 and a document database 280. The template database stores the syntactic templates 255, atomic semantic templates 258, semantic templates 260 and combination semantic templates 265. The document database 280 stores the documents to be analyzed after indexing.
- The databases may be stored using any storage structure known in the art. Examples of database structure include a hash map or hash table. The rules engine is implemented on a server computer as described for the other servers in reference to FIG. 1.
- The function of each element of the system is described in further detail in reference to FIGS. 3-7.
- Very broadly, a system and method as disclosed comprises obtaining information about products from various sources on the internet. These sources include descriptions of the product on the websites of the manufacturer and retailers and comments about the product from consumers which may be at the websites of the manufacturer and retailers but also anywhere else on the internet such as chat rooms, message boards and on web logs. Queries are assembled to extract relevant information from the obtained information. Using the queries, information about the products, features of the products and sentiments about the products and features of the product are extracted and stored. This data is used to determine a score not only for the product as a whole but also features of the product. In addition, the method is capable of local extraction of phrases and sentences about an individual feature of the product and identifying that phrase or sentence as belonging to the individual feature. That is also stored for later presentation to a user searching for that individual feature of a product.
- When a user enters a search query for a product, the system analyzes the search query for features of interest to the user in addition to the product of interest to the user and makes a recommendation of not only a product but also the overall score for that product, a score for the feature of interest to the user and a quote about the feature of interest to the user.
- A lexicography is built for the particular product for which documents will be analyzed. Building lexicographies is well known in the art. The lexicography for a product or feature may be built manually or it may be built by machine. In either scenario, representative documents about the product or feature are analyzed and words that are useful in analyzing and classifying the product or feature are extracted and added to the lexicography. In building a lexicography for the feature, “family-friendly,” for a hotel, feature words that are explicit, such as references to family and relatives are added. More complex features like “pet-friendly” are then added. This is more complex because references to pets in reviews of hotels are not always relevant to whether or not the hotel did or did not allow pets to stay there. Therefore a more detailed analysis of prohibitive words in syntactic templates is required. Other activity-related features also require more detailed analysis. For example, “bat” and “tank” both have meaning in the sports of baseball and diving as well as in other domains such as animals for “bat” and reading for “diving” as in “diving into books.” Furthermore, N-gram word occurrence analysis is performed to verify that syntactic templates created cover most frequent expressions in available product reviews.
- From the lexicography, queries are built for analyzing the documents collected about the product.
FIG. 3 is a flow chart illustrating a method for building a query for each semantic template. These queries are then used to extract information from source documents. This process is applied separately for positive sentiments and negative sentiments. First, a complex template is built from atomic templates by enumerating 305 atomic semantic queries. As an example, to build a complex template to ascertain sentiments about a swimming pool at a hotel, atomic templates would include <sentiment><amplifier><feature>. Phrases that would be extracted with such a template include, “there was a very nice swimming pool,” and “the pool was very good.” Both are extracted because the order of the words in an atomic template is usually not relevant. The lexicography for a particular feature (either the positive or the negative) is embedded 310 into a syntactic template. A syntactic template is built 315 for keywords in the lexicography. Next an atomic semantic template is built 320. The atomic semantic template covers all of the possible phrasings of the same sentiment, for example the various syntactic templates above for positive sentiments about the swimming pool. The disjunction of the syntactic templates is used to build 325 a complex semantic template. Prohibitive clauses are then added 330 to avoid quote extraction in the wrong context. A prohibitive clause is an expression that does not have sentiment associated with it for that feature. For example, for the feature, “pet-friendly,” “dog house” conveys neither positive nor negative sentiment. Therefore, it would be a prohibitive clause when looking to extract quotes that indicate either positive or negative sentiment about the pet friendliness of a hotel. Building of atomic semantic templates is described in further detail in reference to FIG. 3A. A span query is built 335 for each complex semantic template. 
The resulting span query is assigned 340 a sentiment value which can be expressed as a number, for example, −1, −0.5, 0, 0.5, or 1. Span queries are traditionally implemented as search queries. However, in the disclosed method, the span query also acts as an extraction tool to identify the topic and sentiment of an individual phrase or sentence and then extract that very phrase or sentence as a quotation. Span queries utilized in this capacity are necessarily more complex than those utilized as search queries in conventional keyword searching of documents. -
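The way a span query can double as an extraction tool can be sketched in a few lines. This is a minimal illustration only, written in plain Python rather than against any search library; the names `tokenize`, `span_match`, `SLOTS` and the `max_span` window are assumptions introduced here and do not come from the disclosure. It finds the tightest window that contains one word for every slot of <sentiment><amplifier><feature>, in any order, and reports the first and last token positions so the matching words can be lifted out as a quotation.

```python
import re
from itertools import product

# Hypothetical slot lexicon for the template <sentiment><amplifier><feature>.
SLOTS = {
    "sentiment": {"nice", "good", "great"},
    "amplifier": {"very", "really"},
    "feature": {"pool"},
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def span_match(tokens, slots, max_span=6):
    """Return (first, last) token positions of the tightest window that holds
    one word from every slot, in any order, or None if there is no such window."""
    hits_per_slot = []
    for words in slots.values():
        hits = [i for i, tok in enumerate(tokens) if tok in words]
        if not hits:
            return None  # a slot with no matching word means the template is not satisfied
        hits_per_slot.append(hits)
    best = None
    for combo in product(*hits_per_slot):
        first, last = min(combo), max(combo)
        width = last - first + 1
        if width <= max_span and (best is None or width < best[1] - best[0] + 1):
            best = (first, last)
    return best

tokens = tokenize("There was a very nice swimming pool")
span = span_match(tokens, SLOTS)
quote = " ".join(tokens[span[0]:span[1] + 1])
```

For the two phrases quoted in the text, the sketch matches both word orders. A production system would more likely express the same idea with the span queries of its indexing library.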
FIG. 3A is a graphic that describes in further detail the building of an atomic semantic template from its components and the resulting lattice of semantic templates used to analyze a document. These semantic templates make it possible to identify the topic and sentiment of individual sentences and phrases in a document. This improvement allows not only for showing users quotations from documents that are relevant to the user's query, but also allows for determining a score for a product or feature with fewer source documents than are required by traditional statistics-based methods. The atomic semantic templates in FIG. 3A are for the analysis of travel products. The negative and positive indicators 350 are: Pos1, Pos0.5, Neg1, Neg0.5. The parts of speech 352 include N for nouns, A for adjectives, and V for verbs. An amplifier 355 is a word or phrase which increases the evidence of the meaning of the semantic template to which it is attached. Amplifiers 355 include “never,” “very,” and “always.” - 
Mental atoms 360 can distort the meaning of the atomic semantic template. Mental atoms, which can also be referred to as predicates, have two possible arguments. The first is 1Arg 361 over the agent who commits a mental or communicative action. The second is 2Arg 362 over the agent who passively receives this action. Examples of mental atoms 360 include “knowing” and “recommending.” - Negating
words 363 are those that reverse the polarity of what the sentence would be without the negating word. Negating words include “not” and “no.” For example, “I will come back” and “I will not come back” are similar except that they convey opposite sentiment evidenced only by the word “not” being added to the second sentence. - The example “did not know about this beach” 383 illustrates the interplay of a
mental atom 360, “know,” and a negating word 363, “not.” “Not” could reverse the polarity of the sentiment of what the phrase would otherwise be except that the presence of the word “knowing” distorts the meaning. That the speaker did not know of the presence of a beach is neither positive nor negative about the destination where that beach is located. The presence of mental atoms 360 in the atomic semantic template accounts for this eventuality. - 
Special cases 365 are concepts that, in addition to the mental atoms 360, can distort the meaning of the atomic semantic template. The special cases 365 identify a group of words that convey the concept. The concept for a special case is dependent on the product or feature being analyzed. In the case of travel products, whether or not a consumer would return to that destination, hotel, or restaurant is an indication of that consumer's satisfaction with the destination, hotel or restaurant. However, in purchasing a camera, the concept of “returning” is not a special case. In the instant example “letting” is another special case 365. Whether or not a consumer would let someone else go to that destination is indicative of consumer satisfaction. Another word that expresses this special case is “allow.” In 375, the words “come back” are a phrase that expresses the special case 365 of “returning.” - 
Domain atoms 370 are characteristics of a feature. They can be expressed as nouns or adjectives. Examples include “dog friendly,” “clean,” and “quiet.” - An atomic semantic template is considered to cover a sentence or phrase from a document if this sentence satisfies all constraints from the template. Some words can fulfill multiple components of an atomic semantic template. For example, “never” is an amplifier as well as a negating word and can fulfill both requirements in an atomic semantic template. “I will not come back” and “I will never come back” are both covered by the template <Neg><Ret> 375. Only “I will never come back” is covered by <Neg><Ret><Amplifier> 380. That is because “never” is both an
amplifier 355 and a negating word 363. - Atoms may include the terms in the form of nouns, adjectives or other parts of speech. The part of speech may be included in the atomic semantic template if it is relevant to distinguishing the feature. An example is <sentiment><feature(noun)> which is fulfilled by “not good for dogs.” Another example 385 that covers “not good for dogs” is <not><pos><domain>.
- Semantic templates are assigned a template sentiment value. This value may be expressed as a number, −1, −½, 0, +½, or 1.
- As shown in
FIG. 3A, the atomic semantic templates are arranged in a hierarchy and there is order between some pairs of semantic templates. This order reflects how specific to a given feature or sentiment the atomic semantic templates are. Generally, ST1>ST2 if ST2=ST1+“additional atom” wherein A>B indicates that B is more specific than A and A<B indicates that A is more specific than B. In the above example, ST2 is an extension of ST1 because it is more specific. For determining the sentiment within the hierarchy, if ST1>ST2 then the template sentiment value of ST1=template sentiment value of ST2. If, however, ST1<ST2 then the template sentiment values are not necessarily the same. Conflicts such as these can occur when an atomic semantic template is an extension of more than one other semantic template. - In
FIG. 3A, semantic templates closer to the top of the figure are less specific and therefore > semantic templates lower down on the figure. For example, the template <Neg><Ret> is less specific (>) than <Neg><Ret><Amplifier>. For a given semantic template, additional semantic templates with additional constraints are built. - At the top of the hierarchy, two semantic templates are shown, <not><pos><domain>, 385 and <neg><ret>, 375. “Will not come back” is covered by 375 and “Not good for dogs” is covered by 385. Template 385 is extended to create more complex semantic templates: “They would not let my dogs in,” which is covered by <not><let><They><domain> 387; “I would not let my dogs in, so dirty,” which is covered by <not><let><I><domain><neg1> 390; and “They let their dogs bark all night,” which is covered by <let><They><domain><neg1> 393. A more specific template for 375 is 380, <neg><ret><amplifier>, which covers “Never come back.” These extensions are built by adding amplifiers, predicate-arguments, mental atoms and special cases. As explained previously, the template sentiment for 375 and 380 is the same because ST2=ST1+<amplifier>.
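The coverage and specificity relations described above can be illustrated with a small sketch. It is only an approximation under stated assumptions: templates are modeled as sets of atom names, the word lists in `LEXICON` are hypothetical, and matching is done by simple phrase lookup rather than by the syntactic templates of the disclosure. A template covers a sentence when every one of its atoms is filled, and template X is more specific than template Y when X contains all of Y's atoms plus at least one more.

```python
import re

# Hypothetical mini-lexicon: atom name -> words or phrases that can fill it.
LEXICON = {
    "Neg": {"not", "never", "no"},
    "Ret": {"come back", "return"},
    "Amplifier": {"never", "very", "always"},
}

# Templates keyed by the reference numerals used in the text.
TEMPLATES = {
    "375": frozenset({"Neg", "Ret"}),
    "380": frozenset({"Neg", "Ret", "Amplifier"}),
}

def fills(atom, sentence):
    """True if any word or phrase for the atom occurs in the sentence."""
    text = sentence.lower()
    return any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in LEXICON[atom])

def covering(sentence):
    """Names of all templates whose atoms are all filled by the sentence."""
    return {name for name, atoms in TEMPLATES.items()
            if all(fills(atom, sentence) for atom in atoms)}

def most_specific(names):
    """Drop every template that is a proper subset of another covering template."""
    return {n for n in names
            if not any(TEMPLATES[n] < TEMPLATES[m] for m in names)}
```

Because “never” appears under both `Neg` and `Amplifier`, “I will never come back” is covered by both templates, and `most_specific` keeps only 380. When two covering templates are not ordered by this subset relation, the sketch returns both, which corresponds to the conflict case discussed next.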
- In the case of 385, there is more than one extension semantic template and in this instance they are not all continually more specific. This is referred to as the extension of the template being inconsistent. If they were consistent, the specificity would be described as follows: ST>ST1 & ST>ST2: ST>ST1, ST2. When they are not consistent, there are conflicts between the atomic semantic templates.
- In order to complete the analysis of a phrase or sentence and determine both the topic and sentiment, the conflict between the atomic semantic templates must be resolved. Broadly, this requires a method for reasoning when there are conflicting theorems. In the disclosed method, the conflicting theorems are the conflicting atomic semantic templates. Two major classes of methods for reasoning when there are conflicting theorems are non-monotonic reasoning and argumentation-based approaches.
- Within argumentation-based approaches, there are several options. These vary in how the strength of the arguments is measured. Some methods are based on the abstract binary order between the rules.
- In the disclosed method, the conflict between the atomic semantic templates is resolved using an argumentation-based approach which uses the order of the specificity of the atomic semantic templates and defeasible reasoning. The approach is a rules-based system that includes two classes of rules—rules that are absolute and those that may be violated. Such a rules-based system determines which of the conflicting semantic templates that cover the phrase or sentence is most specific. The sentiment of that most specific semantic template will be the sentiment of that extracted phrase or sentence.
- A defeasible logic program (“DLP”) is an example of such a rules-based system. DLP is known in the art, for example as set forth by Garcia and Simari in “Defeasible Logic Programming: an argumentative approach” in Theory and Practice of Logic Programming 4(1), 95-138 (2004). For the disclosed method, DLP is applied to resolve the inconsistent semantic templates. In DLP, a dialectic tree is built. A DLP is a set of facts (that the sentence is covered by all of the relevant semantic templates); strict rules, Π; and defeasible rules, Δ. The strict rules are of the following form: STA:−STB, when it is always true that template sentiment value (STA)=template sentiment value (STB). The defeasible rules are of the form (STA−<STB), where STB>STA (an atomic semantic template usually, but not always, implies its extension). STA may occur in a defeasible rule as a negation (˜STA−<STB) if there is no order relation between STA and STB.
- Let P=(Π, Δ) be a de.l.p. and L a ground literal. A defeasible derivation of L from P consists of a finite sequence L1, L2, . . . , Ln=L of ground literals, and each literal Li is in the sequence because:
- (a) Li is a fact in Π, or
- (b) there exists a rule Ri in P (strict or defeasible) with head Li and body B1, B2, . . . , Bk and every literal of the body is an element Lj of the sequence appearing before Li (j<i).
- Let h be a literal, and P=(Π, Δ) a DLP. <A, h> is an argument structure for h, if A is a set of defeasible rules of Δ, such that:
- 1. there exists a defeasible derivation for h from (Π∪A),
- 2. the set (Π∪A) is non-contradictory, and
- 3. A is minimal: there is no proper subset A0 of A such that A0 satisfies conditions (1) and (2).
- Hence an argument structure <A, h> is a minimal non-contradictory set of defeasible rules, obtained from a defeasible derivation for a given literal h. <A1, h1> attacks <A2, h2> if there exists a sub-argument <A, h> of <A2, h2> (A⊆A2) so that h and h1 are inconsistent. An argumentation line is a sequence of argument structures where each element in the sequence attacks its predecessor. There are a number of acceptability requirements for argumentation lines outlined in Garcia and Simari. The definition of a dialectical tree gives an algorithm to discover implicit self-attack relations in users' claims. Let <A0, h0> be an argument structure from a program P. A dialectical tree for <A0, h0> is defined as follows:
- 1. The root of the tree is labeled with <A0, h0>
- 2. Let N be a non-root vertex of the tree labeled <An, hn> and Λ=[<A0, h0>, <A1, h1>, . . . , <An, hn>] the sequence of labels of the path from the root to N. Let [<B0, q0>, <B1, q1>, . . . , <Bk, qk>] all attack <An, hn>. For each attacker <Bi, qi> with acceptable argumentation line [Λ,<Bi, qi>], there is an arc between N and its child Ni.
- For example, suppose the following semantic templates cover a sentence: {STc, STd, STe, STj, STg, STk, STi}, and STa is the most specific template, whose sentiment value should be determined. Other semantic templates link STa with {STc, STd, STe, STj, STg, STk, STi} in the following DLP:
- STa−<STb; STb−<STc; ˜STb−<STc, STd; ˜STb−<STe; ˜STf−<STg, STh; STh−<STj; ˜STb−<STc, STf; STf−<STg; ˜STf−<STi; ˜STh−<STk
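The notion of a defeasible derivation defined earlier can be made concrete with a short sketch. It is a simplified illustration, not the DLP procedure of the disclosure: it takes only a subset of the rules listed above (those involving STa, STb and STf), writes negation with an ASCII `~`, and uses a plain forward-chaining loop named `derivation`, which is a name introduced here. It shows that both STb and ˜STb can be derived from the same facts, which is exactly the conflict that the dialectical analysis has to resolve.

```python
# Each rule is (head, body); "~X" stands for the negation of X.
# Only a subset of the rules from the text is used in this sketch.
RULES = [
    ("STa", ["STb"]),
    ("STb", ["STc"]),
    ("~STb", ["STc", "STd"]),
    ("~STb", ["STe"]),
    ("~STb", ["STc", "STf"]),
    ("STf", ["STg"]),
    ("~STf", ["STi"]),
]

# Facts: the templates that cover the sentence.
FACTS = {"STc", "STd", "STe", "STj", "STg", "STk", "STi"}

def derivation(goal, facts, rules):
    """Return a sequence of literals ending in goal in which every literal is
    a fact or the head of a rule whose body appears earlier, or None."""
    sequence = sorted(facts)  # facts need no justification
    known = set(facts)
    changed = True
    while changed and goal not in known:
        changed = False
        for head, body in rules:
            if head not in known and all(b in known for b in body):
                sequence.append(head)
                known.add(head)
                changed = True
    if goal not in known:
        return None
    return sequence[: sequence.index(goal) + 1]

for_sta = derivation("STa", FACTS, RULES)
against_stb = derivation("~STb", FACTS, RULES)
```

The sketch only establishes which literals are derivable. Deciding which of two contradictory derivations prevails requires the argument comparison and dialectical tree described in the following paragraphs.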
- In a dialectical tree every vertex (except the root) represents an attack relation to its parent, and leaves correspond to non-attacked arguments. Each path from the root to a leaf corresponds to one different acceptable argumentation line. As shown in
FIG. 3B, the dialectical tree provides a structure for considering all the possible acceptable argumentation lines that can be generated for deciding whether a most specific ST is defeated. This tree is called dialectical because it represents an exhaustive dialectical analysis for the argument in its root. In FIG. 3B, the atomic semantic templates are represented only by their subscript identifiers, a, b, f and h in place of STa, STb, STf and STh. - Here, STa is supported by <A, STa>=<{(STa−<STb), (STb−<STc)}, STa> and there exist three defeaters for it, each of them starting a different argumentation line:
- <B1, ˜STb>=<{(˜STb−<STc, STd)}, ˜STb> (proper defeater),
- <B2, ˜STb>=<{(˜STb−<STc, STf), (STf−<STg)}, ˜STb> (proper defeater),
- <B3, ˜STb>=<{(˜STb−<STe)}, ˜STb> (blocking defeater).
- The argument structure <B1, ˜STb> has the counter-argument <{(STb−<STc)}, STb>, but it is not a defeater because the former is more specific. Thus, no defeaters for <B1, ˜STb> exist and the argumentation line ends there. The argument structure <B3, ˜STb> has a blocking defeater <{(STb−<STc)}, STb>. It is the disagreement subargument of <A, STa>; therefore, it cannot be introduced because it produces an argumentation line that is not acceptable.
- The argument structure <B2, ˜STb> has two defeaters that can be introduced:
- <C1, ˜STf>=<{(˜STf−<STg, STh), (STh−<STj)}, ˜STf> (proper defeater),
- <C2, ˜STf>=<{(˜STf−<STi)}, ˜STf> (blocking defeater),
- Thus, one of the lines is split in two argumentation lines. The argument <C1, ˜STf> has a blocking defeater that can be introduced in the line:
- <D1, ˜STh>=<{(˜STh−<STk)}, ˜STh>. Both <D1, ˜STh> and <C2, ˜STf> have a blocking defeater, but they cannot be introduced, because they make the argumentation line not acceptable. Hence the resultant sentiment value is the one of STa.
- As illustrated, DLP is one way to determine both the topic and sentiment of a single phrase or sentence extracted from a source document. This degree of accuracy is not possible with traditional statistics-based methods.
- The lexicography and templates assembled are used to extract sentiment from documents, extract quotes illustrating the sentiment and determine scores for the individual documents as well as aggregating the data from the documents to determine scores for products as a whole and features of those products.
FIG. 4 is a flow chart illustrating a method for sentiment extraction from a document according to one embodiment. Documents from which data is extracted are obtained through conventional means. An example is “scraping.” Documents are received 405 and stored in the document database 280 after being indexed. The text indexing engine 275 tokenizes and stems the text and eliminates stop words. This step can be accomplished by software known in the art such as LUCENE™ available from the Apache Software Foundation. Additionally, the text indexing engine 275 identifies ambiguous multiwords, words that have multiple meanings and therefore can be confused with the keywords in the lexicographies, and replaces them with an expression which cannot be confused with those in the lexicographies. This addresses exceptions for atypical meanings. An example of an atypical meaning would be the phrase “would never complain about the room.” “Never” in this context does not express a negative sentiment about the room, and the phrase “would never complain about the room” would be substituted by “would never complain” so that the syntactic template rule is not satisfied. In order to extract the product features and sentiment values about those product features from the documents, the rules engine 270 runs 410 queries on the documents. The first and last token number for each satisfied occurrence is found 415. The first and last token numbers indicate the positions of the first and last words in the satisfied syntactic template. The product of the query is then those tokens and the document ID of the source document so that the exact location of the quote and the source document are known. - The value of the sentiment about the feature is determined by the sentiment value of the atomic semantic template. Additionally, if that sentiment appears in the title of the document the value of the sentiment is multiplied by 1.5. 
In the case of a review of a hotel, a reviewer will often title the review with some statement about the hotel. If that same sentiment appears in the body of the review as well, that sentiment in the body is also multiplied by 1.5. For example, if a review is titled, “Great ocean view!” that sentiment would ordinarily be a +1 but is 1.5*(+1.0)=+1.5 because it appeared in the title of the review. If, in addition, the reviewer reiterates that sentiment in the body by saying, for example, “enjoyed the view,” that sentiment, while otherwise only worth +1, would have a sentiment value of +1.5 because it echoes the sentiment in the title.
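The title adjustment amounts to a single multiplication, sketched below. The function name `adjusted_value` and its two flags are hypothetical, and the sketch assumes the 1.5 factor is applied once even when a sentiment both appears in the title and echoes it; the text does not say how that case is handled.

```python
TITLE_MULTIPLIER = 1.5  # factor stated in the text

def adjusted_value(template_value, in_title=False, echoes_title=False):
    """Scale a template sentiment value when the sentiment is in the title
    or repeats a sentiment that was expressed in the title."""
    if in_title or echoes_title:
        return template_value * TITLE_MULTIPLIER
    return template_value

title_value = adjusted_value(1.0, in_title=True)     # "Great ocean view!"
body_value = adjusted_value(1.0, echoes_title=True)  # "enjoyed the view"
```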
- Another query is run to do a
local extraction 425 of quotes that illustrate product features and associated sentiment. Local extraction is possible because of the DLP which resolves conflicts between semantic templates. The sentiment for a quote starts with the score of the atomic semantic template adjusted if the sentiment is in the title as described above for determining the value of an extracted sentiment. Additionally for a quote, the sentiment is adjusted by the overall sentiment for the document from which it is extracted as well as the volatility of the sentiment in the document. Both the overall sentiment for the document and volatility of the sentiment in the document are described in further detail below. - The overall sentiment for a document is then determined 430. This determination is an addition of all of the sentiments for features from the document. For example a document yields the following features and sentiments: good room service +0.5;
great balcony +1; not clean −1; very noisy −1; fridge was loud (noisy again) −1; and nice view +1. The sentiment for the document=(+0.5)+(+1)+(−1)+(−1)+(−1)+(+1)=−0.5. - In addition to the overall sentiment, the volatility of the sentiment for the document is determined 435. The volatility of the sentiment for the document is a determination of the degree of uniformity of the sentiment expressed in the document. This is determined by pair-wise comparisons of sentiments expressed in the document in the order in which they appear. By analyzing the sentiments in order, there is not just a comparison of the most positive and most negative sentiments but also how often the sentiment changes within the document. For a document that is a user review of a particular hotel, if the user expresses generally the same level of satisfaction (regardless of the degree of satisfaction) the document has low volatility of sentiment. If the user writing the review adored the swimming pool but found room service to be appallingly slow, the document volatility will likely be high. The determination of volatility is the sum of the absolute value of the difference between each sentiment and the next sentiment from the document. For sentiments, A, B, C and D, volatility=|A-B|+|B-C|+|C-D|. The order of A, B, C and D is the order in which they appeared in the original document from which they were extracted.
- The volatility of the sentiment in the example document, for which the overall sentiment was previously determined, is
- |(+0.5)−(+1)|+|(+1)−(−1)|+|(−1)−(−1)|+|(−1)−(−1)|+|(−1)−(+1)|=0.5+2+0+0+2=4.5. - The determination of volatility can be used as an indicator of whether or not sarcasm is being used by the user writing the review as well. Reviewers may use sarcasm to express either satisfaction or dissatisfaction. If that sarcasm is not identified by the atomic semantic template, the sentiment expressed may be taken at face value when the document is queried with the templates. Under those circumstances, however, the volatility of the document will highlight the seeming inconsistency. An example of such a review is, “The pool was gorgeous and the drinks were brought quickly. The sun was shining all week. It was just torture being there.” At face value, this review would result in a middle or lower score for the hotel being reviewed. In the disclosed system, this is counteracted because in addition to an overall score, the volatility of the review is assessed. Because the first sentence includes two very positive sentiments about the hotel and the last sentence is very negative, the volatility score for this review would be high. That leads to reducing the score for the review and its quotes, making it less likely that a quote from this review will be shown to a user.
- Alternatively or additionally, the sentiment volatility may be incorporated into a reliability score for the document. The reliability score is the inverse of the volatility score. For the example document discussed previously with a volatility of 4.5, the reliability=1/(4.5)=0.22.
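The three document-level figures for the example review can be reproduced with a few lines of Python. The function names are chosen here for illustration, and the handling of a zero-volatility document (returning infinity) is an assumption, since the text does not say what the reliability of a perfectly uniform document should be.

```python
# Sentiments of the example document, in the order they appear in the review.
SENTIMENTS = [0.5, 1.0, -1.0, -1.0, -1.0, 1.0]

def overall_sentiment(values):
    """Sum of all feature sentiments in the document."""
    return sum(values)

def volatility(values):
    """Sum of absolute differences between each sentiment and the next one."""
    return sum(abs(a - b) for a, b in zip(values, values[1:]))

def reliability(values):
    """Inverse of the volatility; infinite for a perfectly uniform document
    (an assumption made for this sketch)."""
    v = volatility(values)
    return 1.0 / v if v else float("inf")

overall = overall_sentiment(SENTIMENTS)    # -0.5
vol = volatility(SENTIMENTS)               # 4.5
rel = round(reliability(SENTIMENTS), 2)    # 0.22
```

Because the differences are taken between neighbors in document order, reordering the same six sentiments would change the volatility while leaving the overall sentiment unchanged.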
- Generally, the lower the volatility score of the document (or the higher the reliability), the more useful the document is and the more likely that a quote from the document will be used when recommending the product to a user of the system.
- When extracting the words that make up the quotes, a heuristic model is used to extract the words that express the topic and sentiment of that phrase or sentence. This allows for enough words to be extracted to give the quote context but analyzes those words for relevance to the topic and sentiment. In contrast, traditional extraction of quotes uses less analysis and takes a pre-determined number of words before and after the word or words of interest without any assessment of whether those words are relevant. In the heuristic model, syntactic analysis determines that the additional extracted words are relevant to the topic determined for that quote—for example, the swimming pool. Semantic analysis determines if the additionally extracted words are related to the product in general—travel in this case—by looking for keywords for travel. For example, if the review stated in part, “My husband and I were fighting all week but the pool was very relaxing, with a great hot tub” a traditional extraction method extracting a quote about the swimming pool extracts words from both sides of “the pool was very relaxing” leading to “ . . . were fighting all week but the pool was very relaxing, with a great hot tub . . . ” Using a heuristic model, the extracted quote would not include the portion about the reviewer and her husband's relationship. Instead, it would read, “ . . . the pool was very relaxing, with a great hot tub . . . .”
- The product features, associated sentiments, quotes, overall sentiment and sentiment volatility for a document are stored 440 in the
recommendation database 130. - For each product feature, a score is determined 445 using sentiments extracted from multiple documents. The values of the sentiments extracted are summed in the manner that the overall score for a document is calculated as described earlier. Additionally, the determination can include the overall sentiment score for the product from multiple documents and the volatility determinations of the source document.
- Additionally, for each product feature, a representative quote is determined 450 from the quotes extracted from all documents for that product feature. For example, for the feature “family-friendly” for a given hotel, the following quote from a review may be chosen, “my kids loved the pool.” The quotes are ranked by their sentiment value and the quote with the most positive sentiment value for the feature being illustrated is the representative quote.
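Aggregating across documents and choosing the representative quote can be sketched as follows. The `Quote` record, the function names and the sample data are all made up for illustration; only the quote “my kids loved the pool” comes from the text. The feature score is computed as a plain sum of the extracted sentiment values, and the optional adjustment by document-level sentiment and volatility is left out.

```python
from dataclasses import dataclass

@dataclass
class Quote:
    text: str
    feature: str
    sentiment: float
    doc_id: str

# Hypothetical extraction results from three reviews of one hotel.
QUOTES = [
    Quote("my kids loved the pool", "family-friendly", 1.0, "doc1"),
    Quote("babysitting was fine", "family-friendly", 0.5, "doc2"),
    Quote("no kids menu", "family-friendly", -0.5, "doc3"),
    Quote("great bar", "nightlife", 1.0, "doc1"),
]

def feature_score(quotes, feature):
    """Sum the sentiment values extracted for one feature across documents."""
    return sum(q.sentiment for q in quotes if q.feature == feature)

def representative_quote(quotes, feature):
    """Pick the quote with the most positive sentiment for the feature."""
    candidates = [q for q in quotes if q.feature == feature]
    return max(candidates, key=lambda q: q.sentiment) if candidates else None

score = feature_score(QUOTES, "family-friendly")
best = representative_quote(QUOTES, "family-friendly")
```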
- The numeric score for the product feature as well as the representative quote are also stored 455 in the
recommendation database 130. -
FIG. 5 is a flow chart illustrating a method for recommending a product to a user in response to a search query. Referring to both FIGS. 5 and 6, a search query is entered by a user at the client 105 and is received 505 via the network 110 at the recommendation system 100 by the front end server 115. An example is shown at 605. The recommendation engine extracts 510 features from the query. In the example, the user has requested information on “Family Friendly Hotel Campground” in Maui; the features extracted are “family-friendly” and “campground.” The system sorts 515 the extracted product features based on the lexicography. Because certain words are already associated in the lexicography with family-friendly, had the query included both swimming pool and family-friendly, those features would be treated as synonymous and covered by the same templates. A semantic template is built 520 using the user's search query. That template is used to search 525 the recommendation database for products with product features that match the user's query. Searching of the recommendation database with the atomic semantic template may be implemented with a span query. The atomic semantic template built for querying the database is similar in nature to the atomic semantic template built to extract that sentiment from source documents originally. Additionally, for each product that matches the user's query, a quote is retrieved that is relevant to the feature of the product of interest to the user. In the above example, there may be many quotes about all of the hotels about any number of the hotel's features. Because the user was interested in a family-friendly hotel, the retrieved quotes 625 are relevant to that feature. Relevant quotes include those about children's activities, the swimming pool or babysitting services. Quotes about the hotel bar's top shelf liquor collection would not be relevant and would not be retrieved. 
Additionally, the features of the hotel that make it family-friendly are also extracted and listed for the user 620. The results of the search are sorted 530 and the recommended products and quotes returned 535 to the user at the client 105 via the front end server 115 and network 110. The sorting of the results is according to the score for the feature, “family-friendly,” for the hotel. In FIG. 6, the search results 610 are shown to the user with the first listing 615 having a 98% relevancy. - The disclosed system and method provides a more efficient recommendation product for users because in addition to general sentiment scores from reviewers, the user is presented with product ratings that are specific to the feature that is of interest to the user. Additionally, the characteristics of the product that led to that score are enumerated for the user, and quotes from reviewers about the feature are presented.
- Further, the features and advantages described in the specification provide a beneficial use to those making use of a system and a method as described in embodiments herein. For example, a user is provided mechanisms, e.g., by receiving and/or transmitting control signals, to control access to particular information as described herein. Further, these benefits accrue regardless of whether all or portions of components, e.g., server systems, to support their functionality are located locally or remotely relative to the user.
- Numerous specific details have been set forth herein to provide a thorough understanding of the embodiments. It will be understood by those skilled in the art, however, that the embodiments may be practiced without these specific details. In other instances, well-known operations, components and circuits have not been described in detail so as not to obscure the embodiments. It can be appreciated that the specific structural and functional details disclosed herein may be representative and do not necessarily limit the scope of the embodiments.
- In addition, some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory and executable by a processor. Examples of this are illustrated and described for
FIGS. 3, 3A, 3B, 4 and 5 . These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps (instructions) leading to a desired result, for example, as in describing DLP. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical, magnetic or optical signals capable of being stored, transferred, combined, compared and otherwise manipulated. It is convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. Furthermore, it is also convenient at times, to refer to certain arrangements of steps requiring physical manipulations of physical quantities as modules or code devices, without loss of generality. - Some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. It should be understood that these terms are not intended as synonyms for each other. For example, some embodiments may be described using the term “connected” to indicate that two or more elements are in direct physical or electrical contact with each other. In another example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. The embodiments are not limited in this context.
- Unless specifically stated otherwise, it may be appreciated that terms such as “processing,” “computing,” “calculating,” “determining,” or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical quantities (e.g., electronic) within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices. The embodiments are not limited in this context.
- As used herein any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
- Upon reading this disclosure, those of skill in the art will appreciate still additional alternative systems and methods for providing recommendations to users that include specific quotations. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the embodiments are not limited to the precise construction and components disclosed herein and that various modifications, changes and variations which will be apparent to those skilled in the art may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope of the disclosure and appended additional claimable subject matter.
Claims (21)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/489,059 US20170221128A1 (en) | 2008-05-12 | 2017-04-17 | Sentiment Extraction From Consumer Reviews For Providing Product Recommendations |
US18/297,694 US20230360106A1 (en) | 2008-05-12 | 2023-04-10 | Sentiment extraction from consumer reviews for providing product recommendations |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/119,465 US9646078B2 (en) | 2008-05-12 | 2008-05-12 | Sentiment extraction from consumer reviews for providing product recommendations |
US15/489,059 US20170221128A1 (en) | 2008-05-12 | 2017-04-17 | Sentiment Extraction From Consumer Reviews For Providing Product Recommendations |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/119,465 Continuation US9646078B2 (en) | 2008-05-12 | 2008-05-12 | Sentiment extraction from consumer reviews for providing product recommendations |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/297,694 Continuation US20230360106A1 (en) | 2008-05-12 | 2023-04-10 | Sentiment extraction from consumer reviews for providing product recommendations |
Publications (1)
Publication Number | Publication Date |
---|---|
US20170221128A1 (en) | 2017-08-03 |
Family
ID=41267711
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/119,465 Active 2031-01-03 US9646078B2 (en) | 2008-05-12 | 2008-05-12 | Sentiment extraction from consumer reviews for providing product recommendations |
US15/489,059 Abandoned US20170221128A1 (en) | 2008-05-12 | 2017-04-17 | Sentiment Extraction From Consumer Reviews For Providing Product Recommendations |
US18/297,694 Pending US20230360106A1 (en) | 2008-05-12 | 2023-04-10 | Sentiment extraction from consumer reviews for providing product recommendations |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/119,465 Active 2031-01-03 US9646078B2 (en) | 2008-05-12 | 2008-05-12 | Sentiment extraction from consumer reviews for providing product recommendations |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/297,694 Pending US20230360106A1 (en) | 2008-05-12 | 2023-04-10 | Sentiment extraction from consumer reviews for providing product recommendations |
Country Status (2)
Country | Link |
---|---|
US (3) | US9646078B2 (en) |
WO (1) | WO2009140296A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170132676A1 (en) * | 2015-11-09 | 2017-05-11 | Anupam Madiratta | System and method for hotel discovery and generating generalized reviews |
CN108959555A (en) * | 2018-06-29 | 2018-12-07 | 北京百度网讯科技有限公司 | Extended method, device, computer equipment and the storage medium of query formulation |
US11914961B2 (en) | 2021-01-07 | 2024-02-27 | Oracle International Corporation | Relying on discourse trees to build ontologies |
Families Citing this family (103)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090306967A1 (en) * | 2008-06-09 | 2009-12-10 | J.D. Power And Associates | Automatic Sentiment Analysis of Surveys |
AU2009260033A1 (en) * | 2008-06-19 | 2009-12-23 | Wize Technologies, Inc. | System and method for aggregating and summarizing product/topic sentiment |
US9129008B1 (en) | 2008-11-10 | 2015-09-08 | Google Inc. | Sentiment-based classification of media content |
US9286389B2 (en) | 2009-05-20 | 2016-03-15 | Tripledip Llc | Semiotic square search and/or sentiment analysis system and method |
US9235646B2 (en) * | 2009-05-28 | 2016-01-12 | Tip Top Technologies, Inc. | Method and system for a search engine for user generated content (UGC) |
US8849649B2 (en) * | 2009-12-24 | 2014-09-30 | Metavana, Inc. | System and method for determining sentiment expressed in documents |
JP5390463B2 (en) | 2010-04-27 | 2014-01-15 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Defect predicate expression extraction device, defect predicate expression extraction method, and defect predicate expression extraction program for extracting predicate expressions indicating defects |
US20110307294A1 (en) * | 2010-06-10 | 2011-12-15 | International Business Machines Corporation | Dynamic generation of products for online recommendation |
US8949211B2 (en) | 2011-01-31 | 2015-02-03 | Hewlett-Packard Development Company, L.P. | Objective-function based sentiment |
US20120246092A1 (en) * | 2011-03-24 | 2012-09-27 | Aaron Stibel | Credibility Scoring and Reporting |
US8914400B2 (en) * | 2011-05-17 | 2014-12-16 | International Business Machines Corporation | Adjusting results based on a drop point |
JP5548654B2 (en) * | 2011-06-22 | 2014-07-16 | 楽天株式会社 | Information processing apparatus, information processing method, information processing program, and recording medium on which information processing program is recorded |
US8661403B2 (en) * | 2011-06-30 | 2014-02-25 | Truecar, Inc. | System, method and computer program product for predicting item preference using revenue-weighted collaborative filter |
US20130018875A1 (en) * | 2011-07-11 | 2013-01-17 | Lexxe Pty Ltd | System and method for ordering semantic sub-keys utilizing superlative adjectives |
US10311113B2 (en) * | 2011-07-11 | 2019-06-04 | Lexxe Pty Ltd. | System and method of sentiment data use |
US20130066800A1 (en) * | 2011-09-12 | 2013-03-14 | Scott Falcone | Method of aggregating consumer reviews |
US9098600B2 (en) * | 2011-09-14 | 2015-08-04 | International Business Machines Corporation | Deriving dynamic consumer defined product attributes from input queries |
EP2570938A1 (en) * | 2011-09-16 | 2013-03-20 | Lexxe Pty Ltd. | System and method for ordering semantic sub-keys utilizing superlative adjectives |
US9009024B2 (en) * | 2011-10-24 | 2015-04-14 | Hewlett-Packard Development Company, L.P. | Performing sentiment analysis |
US9104771B2 (en) | 2011-11-03 | 2015-08-11 | International Business Machines Corporation | Providing relevant product reviews to the user to aid in purchasing decision |
US20130159919A1 (en) | 2011-12-19 | 2013-06-20 | Gabriel Leydon | Systems and Methods for Identifying and Suggesting Emoticons |
US8868558B2 (en) * | 2011-12-19 | 2014-10-21 | Yahoo! Inc. | Quote-based search |
US20130204833A1 (en) * | 2012-02-02 | 2013-08-08 | Bo PANG | Personalized recommendation of user comments |
US10636041B1 (en) | 2012-03-05 | 2020-04-28 | Reputation.Com, Inc. | Enterprise reputation evaluation |
US9697490B1 (en) | 2012-03-05 | 2017-07-04 | Reputation.Com, Inc. | Industry review benchmarking |
US20130297383A1 (en) * | 2012-05-03 | 2013-11-07 | International Business Machines Corporation | Text analytics generated sentiment tree |
US8918312B1 (en) * | 2012-06-29 | 2014-12-23 | Reputation.Com, Inc. | Assigning sentiment to themes |
US9607325B1 (en) * | 2012-07-16 | 2017-03-28 | Amazon Technologies, Inc. | Behavior-based item review system |
US9336297B2 (en) | 2012-08-02 | 2016-05-10 | Paypal, Inc. | Content inversion for user searches and product recommendations systems and methods |
US9105036B2 (en) * | 2012-09-11 | 2015-08-11 | International Business Machines Corporation | Visualization of user sentiment for product features |
US10095692B2 (en) * | 2012-11-29 | 2018-10-09 | Thomson Reuters Global Resources Unlimited Company | Template bootstrapping for domain-adaptable natural language generation |
US9477704B1 (en) * | 2012-12-31 | 2016-10-25 | Teradata Us, Inc. | Sentiment expression analysis based on keyword hierarchy |
US9311294B2 (en) * | 2013-03-15 | 2016-04-12 | International Business Machines Corporation | Enhanced answers in DeepQA system according to user preferences |
CN105531698B (en) | 2013-03-15 | 2019-08-13 | 美国结构数据有限公司 | Equipment, system and method for batch and real time data processing |
US10331786B2 (en) | 2013-08-19 | 2019-06-25 | Google Llc | Device compatibility management |
US9734192B2 (en) * | 2013-09-20 | 2017-08-15 | Oracle International Corporation | Producing sentiment-aware results from a search query |
US20150095330A1 (en) * | 2013-10-01 | 2015-04-02 | TCL Research America Inc. | Enhanced recommender system and method |
US9727371B2 (en) | 2013-11-22 | 2017-08-08 | Decooda International, Inc. | Emotion processing systems and methods |
CN104679769B (en) * | 2013-11-29 | 2018-04-06 | 国际商业机器公司 | The method and device classified to the usage scenario of product |
US9373075B2 (en) | 2013-12-12 | 2016-06-21 | International Business Machines Corporation | Applying a genetic algorithm to compositional semantics sentiment analysis to improve performance and accelerate domain adaptation |
US10607255B1 (en) | 2013-12-17 | 2020-03-31 | Amazon Technologies, Inc. | Product detail page advertising |
US9727617B1 (en) | 2014-03-10 | 2017-08-08 | Google Inc. | Systems and methods for searching quotes of entities using a database |
US9817907B1 (en) * | 2014-06-18 | 2017-11-14 | Google Inc. | Using place of accommodation as a signal for ranking reviews and point of interest search results |
US9632998B2 (en) * | 2014-06-19 | 2017-04-25 | International Business Machines Corporation | Claim polarity identification |
US9317566B1 (en) | 2014-06-27 | 2016-04-19 | Groupon, Inc. | Method and system for programmatic analysis of consumer reviews |
US11250450B1 (en) | 2014-06-27 | 2022-02-15 | Groupon, Inc. | Method and system for programmatic generation of survey queries |
US9043196B1 (en) | 2014-07-07 | 2015-05-26 | Machine Zone, Inc. | Systems and methods for identifying and suggesting emoticons |
US10878017B1 (en) | 2014-07-29 | 2020-12-29 | Groupon, Inc. | System and method for programmatic generation of attribute descriptors |
US20160034951A1 (en) * | 2014-07-30 | 2016-02-04 | Microsoft Corporation | Allocating prominent display space for query answers |
US20160055555A1 (en) * | 2014-08-21 | 2016-02-25 | Credibility Corp. | Contextual and Holistic Credibility |
US10318566B2 (en) | 2014-09-24 | 2019-06-11 | International Business Machines Corporation | Perspective data analysis and management |
US10055478B2 (en) | 2014-09-24 | 2018-08-21 | International Business Machines Corporation | Perspective data analysis and management |
US20160110778A1 (en) * | 2014-10-17 | 2016-04-21 | International Business Machines Corporation | Conditional analysis of business reviews |
US10977667B1 (en) | 2014-10-22 | 2021-04-13 | Groupon, Inc. | Method and system for programmatic analysis of consumer sentiment with regard to attribute descriptors |
US10078843B2 (en) * | 2015-01-05 | 2018-09-18 | Saama Technologies, Inc. | Systems and methods for analyzing consumer sentiment with social perspective insight |
US9720905B2 (en) | 2015-06-22 | 2017-08-01 | International Business Machines Corporation | Augmented text search with syntactic information |
US20180260860A1 (en) * | 2015-09-23 | 2018-09-13 | Giridhari Devanathan | A computer-implemented method and system for analyzing and evaluating user reviews |
US20170091838A1 (en) | 2015-09-30 | 2017-03-30 | International Business Machines Corporation | Product recommendation using sentiment and semantic analysis |
US10409792B1 (en) * | 2015-09-30 | 2019-09-10 | Groupon, Inc. | Apparatus and method for data object generation and control |
US10235699B2 (en) * | 2015-11-23 | 2019-03-19 | International Business Machines Corporation | Automated updating of on-line product and service reviews |
US20170213138A1 (en) * | 2016-01-27 | 2017-07-27 | Machine Zone, Inc. | Determining user sentiment in chat data |
US10467277B2 (en) | 2016-03-25 | 2019-11-05 | Raftr, Inc. | Computer implemented detection of semiotic similarity between sets of narrative data |
US11093706B2 (en) | 2016-03-25 | 2021-08-17 | Raftr, Inc. | Protagonist narrative balance computer implemented analysis of narrative data |
US9842100B2 (en) | 2016-03-25 | 2017-12-12 | TripleDip, LLC | Functional ontology machine-based narrative interpreter |
CN107402912B (en) * | 2016-05-19 | 2019-12-31 | 北京京东尚科信息技术有限公司 | Method and device for analyzing semantics |
US10311069B2 (en) | 2016-06-02 | 2019-06-04 | International Business Machines Corporation | Sentiment normalization using personality characteristics |
US10558691B2 (en) | 2016-08-22 | 2020-02-11 | International Business Machines Corporation | Sentiment normalization based on current authors personality insight data points |
US10387467B2 (en) | 2016-08-22 | 2019-08-20 | International Business Machines Corporation | Time-based sentiment normalization based on authors personality insight data points |
US10445742B2 (en) | 2017-01-31 | 2019-10-15 | Walmart Apollo, Llc | Performing customer segmentation and item categorization |
US10657575B2 (en) | 2017-01-31 | 2020-05-19 | Walmart Apollo, Llc | Providing recommendations based on user-generated post-purchase content and navigation patterns |
US10839154B2 (en) | 2017-05-10 | 2020-11-17 | Oracle International Corporation | Enabling chatbots by detecting and supporting affective argumentation |
US11373632B2 (en) | 2017-05-10 | 2022-06-28 | Oracle International Corporation | Using communicative discourse trees to create a virtual persuasive dialogue |
US11615145B2 (en) | 2017-05-10 | 2023-03-28 | Oracle International Corporation | Converting a document into a chatbot-accessible form via the use of communicative discourse trees |
US10817670B2 (en) | 2017-05-10 | 2020-10-27 | Oracle International Corporation | Enabling chatbots by validating argumentation |
US11960844B2 (en) | 2017-05-10 | 2024-04-16 | Oracle International Corporation | Discourse parsing using semantic and syntactic relations |
US10599885B2 (en) | 2017-05-10 | 2020-03-24 | Oracle International Corporation | Utilizing discourse structure of noisy user-generated content for chatbot learning |
US11386274B2 (en) | 2017-05-10 | 2022-07-12 | Oracle International Corporation | Using communicative discourse trees to detect distributed incompetence |
US11586827B2 (en) | 2017-05-10 | 2023-02-21 | Oracle International Corporation | Generating desired discourse structure from an arbitrary text |
JP7086993B2 (en) | 2017-05-10 | 2022-06-20 | オラクル・インターナショナル・コーポレイション | Enable rhetorical analysis by using a discourse tree for communication |
US10891947B1 (en) | 2017-08-03 | 2021-01-12 | Wells Fargo Bank, N.A. | Adaptive conversation support bot |
WO2019050501A1 (en) * | 2017-09-05 | 2019-03-14 | TripleDip, LLC | Functional ontology machine-based narrative interpreter |
EP3688626A1 (en) | 2017-09-28 | 2020-08-05 | Oracle International Corporation | Enabling autonomous agents to discriminate between questions and requests |
US11907990B2 (en) * | 2017-09-28 | 2024-02-20 | International Business Machines Corporation | Desirability of product attributes |
WO2019067869A1 (en) | 2017-09-28 | 2019-04-04 | Oracle International Corporation | Determining cross-document rhetorical relationships based on parsing and identification of named entities |
US11537645B2 (en) * | 2018-01-30 | 2022-12-27 | Oracle International Corporation | Building dialogue structure by using communicative discourse trees |
EP3791292A1 (en) | 2018-05-09 | 2021-03-17 | Oracle International Corporation | Constructing imaginary discourse trees to improve answering convergent questions |
US11455494B2 (en) | 2018-05-30 | 2022-09-27 | Oracle International Corporation | Automated building of expanded datasets for training of autonomous agents |
US11645459B2 (en) | 2018-07-02 | 2023-05-09 | Oracle International Corporation | Social autonomous agent implementation using lattice queries and relevancy detection |
US11238519B1 (en) * | 2018-08-01 | 2022-02-01 | Amazon Technologies, Inc. | Search results based on user-submitted content |
US11238508B2 (en) * | 2018-08-22 | 2022-02-01 | Ebay Inc. | Conversational assistant using extracted guidance knowledge |
US11562135B2 (en) | 2018-10-16 | 2023-01-24 | Oracle International Corporation | Constructing conclusive answers for autonomous agents |
US11107092B2 (en) * | 2019-01-18 | 2021-08-31 | Sprinklr, Inc. | Content insight system |
US11321536B2 (en) | 2019-02-13 | 2022-05-03 | Oracle International Corporation | Chatbot conducting a virtual social dialogue |
US11715134B2 (en) | 2019-06-04 | 2023-08-01 | Sprinklr, Inc. | Content compliance system |
US11163962B2 (en) * | 2019-07-12 | 2021-11-02 | International Business Machines Corporation | Automatically identifying and minimizing potentially indirect meanings in electronic communications |
US11144730B2 (en) | 2019-08-08 | 2021-10-12 | Sprinklr, Inc. | Modeling end to end dialogues using intent oriented decoding |
US11783388B2 (en) * | 2020-02-26 | 2023-10-10 | Airbnb, Inc. | Detecting user preferences of subscription living users |
US11188968B2 (en) | 2020-02-28 | 2021-11-30 | International Business Machines Corporation | Component based review system |
US11514507B2 (en) | 2020-03-03 | 2022-11-29 | International Business Machines Corporation | Virtual image prediction and generation |
CN111737579B (en) * | 2020-06-28 | 2024-06-25 | 北京达佳互联信息技术有限公司 | Object recommendation method and device, electronic equipment and storage medium |
US20220398635A1 (en) * | 2021-05-21 | 2022-12-15 | Airbnb, Inc. | Holistic analysis of customer sentiment regarding a software feature and corresponding shipment determinations |
US11921754B2 (en) | 2021-06-29 | 2024-03-05 | Walmart Apollo, Llc | Systems and methods for categorization of ingested database entries to determine topic frequency |
CN115329757A (en) * | 2022-10-17 | 2022-11-11 | 广州数说故事信息科技有限公司 | Product innovation concept mining method and device, storage medium and terminal equipment |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5331556A (en) * | 1993-06-28 | 1994-07-19 | General Electric Company | Method for natural language data processing using morphological and part-of-speech information |
US20040078190A1 (en) * | 2000-09-29 | 2004-04-22 | Fass Daniel C | Method and system for describing and identifying concepts in natural language text for information retrieval and processing |
US20060248440A1 (en) * | 1998-07-21 | 2006-11-02 | Forrest Rhoads | Systems, methods, and software for presenting legal case histories |
US20070073533A1 (en) * | 2005-09-23 | 2007-03-29 | Fuji Xerox Co., Ltd. | Systems and methods for structural indexing of natural language text |
US20070106499A1 (en) * | 2005-08-09 | 2007-05-10 | Kathleen Dahlgren | Natural language search system |
US20080133488A1 (en) * | 2006-11-22 | 2008-06-05 | Nagaraju Bandaru | Method and system for analyzing user-generated content |
US20090112892A1 (en) * | 2007-10-29 | 2009-04-30 | Claire Cardie | System and method for automatically summarizing fine-grained opinions in digital text |
US20090193328A1 (en) * | 2008-01-25 | 2009-07-30 | George Reis | Aspect-Based Sentiment Summarization |
US7945600B1 (en) * | 2001-05-18 | 2011-05-17 | Stratify, Inc. | Techniques for organizing data to support efficient review and analysis |
US8977953B1 (en) * | 2006-01-27 | 2015-03-10 | Linguastat, Inc. | Customizing information by combining pair of annotations from at least two different documents |
Family Cites Families (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5706406A (en) * | 1995-05-22 | 1998-01-06 | Pollock; John L. | Architecture for an artificial agent that reasons defeasibly |
US5963940A (en) * | 1995-08-16 | 1999-10-05 | Syracuse University | Natural language information retrieval system and method |
US6584464B1 (en) * | 1999-03-19 | 2003-06-24 | Ask Jeeves, Inc. | Grammar template query system |
US6601026B2 (en) * | 1999-09-17 | 2003-07-29 | Discern Communications, Inc. | Information retrieval by natural language querying |
US6751621B1 (en) * | 2000-01-27 | 2004-06-15 | Manning & Napier Information Services, Llc. | Construction of trainable semantic vectors and clustering, classification, and searching using trainable semantic vectors |
US20020103809A1 (en) * | 2000-02-02 | 2002-08-01 | Searchlogic.Com Corporation | Combinatorial query generating system and method |
EP1241602A1 (en) * | 2001-03-14 | 2002-09-18 | MeVis Technology GmbH & Co. KG | Method and computer system for computing and displaying a phase space |
US7526425B2 (en) * | 2001-08-14 | 2009-04-28 | Evri Inc. | Method and system for extending keyword searching to syntactically and semantically annotated data |
NO316480B1 (en) * | 2001-11-15 | 2004-01-26 | Forinnova As | Method and system for textual examination and discovery |
US7158983B2 (en) * | 2002-09-23 | 2007-01-02 | Battelle Memorial Institute | Text analysis technique |
US20040153330A1 (en) * | 2003-02-05 | 2004-08-05 | Fidelity National Financial, Inc. | System and method for evaluating future collateral risk quality of real estate |
US7260519B2 (en) * | 2003-03-13 | 2007-08-21 | Fuji Xerox Co., Ltd. | Systems and methods for dynamically determining the attitude of a natural language speaker |
US20040243554A1 (en) * | 2003-05-30 | 2004-12-02 | International Business Machines Corporation | System, method and computer program product for performing unstructured information management and automatic text analysis |
JP2005202535A (en) * | 2004-01-14 | 2005-07-28 | Hitachi Ltd | Document tabulation method and device, and storage medium storing program used therefor |
WO2006039566A2 (en) * | 2004-09-30 | 2006-04-13 | Intelliseek, Inc. | Topical sentiments in electronically stored communications |
US7689557B2 (en) * | 2005-06-07 | 2010-03-30 | Madan Pandit | System and method of textual information analytics |
US8046348B1 (en) * | 2005-06-10 | 2011-10-25 | NetBase Solutions, Inc. | Method and apparatus for concept-based searching of natural language discourse |
US20070073745A1 (en) * | 2005-09-23 | 2007-03-29 | Applied Linguistics, Llc | Similarity metric for semantic profiling |
US7912755B2 (en) * | 2005-09-23 | 2011-03-22 | Pronto, Inc. | Method and system for identifying product-related information on a web page |
US7480652B2 (en) * | 2005-10-26 | 2009-01-20 | Microsoft Corporation | Determining relevance of a document to a query based on spans of query terms |
CN1794233A (en) * | 2005-12-28 | 2006-06-28 | 刘文印 | Network user interactive asking answering method and its system |
US8195683B2 (en) * | 2006-02-28 | 2012-06-05 | Ebay Inc. | Expansion of database search queries |
US7603350B1 (en) * | 2006-05-09 | 2009-10-13 | Google Inc. | Search result ranking based on trust |
US8862591B2 (en) * | 2006-08-22 | 2014-10-14 | Twitter, Inc. | System and method for evaluating sentiment |
US7734623B2 (en) * | 2006-11-07 | 2010-06-08 | Cycorp, Inc. | Semantics-based method and apparatus for document analysis |
US7996210B2 (en) * | 2007-04-24 | 2011-08-09 | The Research Foundation Of The State University Of New York | Large-scale sentiment analysis |
US8346756B2 (en) * | 2007-08-31 | 2013-01-01 | Microsoft Corporation | Calculating valence of expressions within documents for searching a document index |
US8954867B2 (en) * | 2008-02-26 | 2015-02-10 | Biz360 Inc. | System and method for gathering product, service, entity and/or feature opinions |
US8239189B2 (en) * | 2008-02-26 | 2012-08-07 | Siemens Enterprise Communications Gmbh & Co. Kg | Method and system for estimating a sentiment for an entity |
US7925743B2 (en) | 2008-02-29 | 2011-04-12 | Networked Insights, Llc | Method and system for qualifying user engagement with a website |
US20100088154A1 (en) * | 2008-10-06 | 2010-04-08 | Aditya Vailaya | Systems, methods and computer program products for computing and outputting a timeline value, indication of popularity, and recommendation |
US8166032B2 (en) * | 2009-04-09 | 2012-04-24 | MarketChorus, Inc. | System and method for sentiment-based text classification and relevancy ranking |
US20130018892A1 (en) | 2011-07-12 | 2013-01-17 | Castellanos Maria G | Visually Representing How a Sentiment Score is Computed |
US8671098B2 (en) * | 2011-09-14 | 2014-03-11 | Microsoft Corporation | Automatic generation of digital composite product reviews |
US8949243B1 (en) | 2011-12-28 | 2015-02-03 | Symantec Corporation | Systems and methods for determining a rating for an item from user reviews |
US20130263019A1 (en) | 2012-03-30 | 2013-10-03 | Maria G. Castellanos | Analyzing social media |
US20130311315A1 (en) | 2012-05-21 | 2013-11-21 | Ebay Inc. | Systems and methods for managing group buy transactions |
US10949753B2 (en) | 2014-04-03 | 2021-03-16 | Adobe Inc. | Causal modeling and attribution |
- 2008-05-12: US application US12/119,465, patent US9646078B2 (Active)
- 2009-05-12: WO application PCT/US2009/043658, publication WO2009140296A1 (Application Filing)
- 2017-04-17: US application US15/489,059, publication US20170221128A1 (Abandoned)
- 2023-04-10: US application US18/297,694, publication US20230360106A1 (Pending)
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5331556A (en) * | 1993-06-28 | 1994-07-19 | General Electric Company | Method for natural language data processing using morphological and part-of-speech information |
US20060248440A1 (en) * | 1998-07-21 | 2006-11-02 | Forrest Rhoads | Systems, methods, and software for presenting legal case histories |
US20040078190A1 (en) * | 2000-09-29 | 2004-04-22 | Fass Daniel C | Method and system for describing and identifying concepts in natural language text for information retrieval and processing |
US7945600B1 (en) * | 2001-05-18 | 2011-05-17 | Stratify, Inc. | Techniques for organizing data to support efficient review and analysis |
US20070106499A1 (en) * | 2005-08-09 | 2007-05-10 | Kathleen Dahlgren | Natural language search system |
US20070073533A1 (en) * | 2005-09-23 | 2007-03-29 | Fuji Xerox Co., Ltd. | Systems and methods for structural indexing of natural language text |
US8977953B1 (en) * | 2006-01-27 | 2015-03-10 | Linguastat, Inc. | Customizing information by combining pair of annotations from at least two different documents |
US20080133488A1 (en) * | 2006-11-22 | 2008-06-05 | Nagaraju Bandaru | Method and system for analyzing user-generated content |
US20090112892A1 (en) * | 2007-10-29 | 2009-04-30 | Claire Cardie | System and method for automatically summarizing fine-grained opinions in digital text |
US20090193328A1 (en) * | 2008-01-25 | 2009-07-30 | George Reis | Aspect-Based Sentiment Summarization |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170132676A1 (en) * | 2015-11-09 | 2017-05-11 | Anupam Madiratta | System and method for hotel discovery and generating generalized reviews |
US10956948B2 (en) * | 2015-11-09 | 2021-03-23 | Anupam Madiratta | System and method for hotel discovery and generating generalized reviews |
CN108959555A (en) * | 2018-06-29 | 2018-12-07 | 北京百度网讯科技有限公司 | Extended method, device, computer equipment and the storage medium of query formulation |
US11914961B2 (en) | 2021-01-07 | 2024-02-27 | Oracle International Corporation | Relying on discourse trees to build ontologies |
Also Published As
Publication number | Publication date |
---|---|
US20090282019A1 (en) | 2009-11-12 |
US9646078B2 (en) | 2017-05-09 |
US20230360106A1 (en) | 2023-11-09 |
WO2009140296A1 (en) | 2009-11-19 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20230360106A1 (en) | Sentiment extraction from consumer reviews for providing product recommendations | |
US10921956B2 (en) | System and method for assessing content | |
Asghar et al. | Sentiment analysis on youtube: A brief survey | |
Zhao et al. | Personalized reason generation for explainable song recommendation | |
Malik et al. | Comparing mobile apps by identifying ‘Hot’ features | |
US9720904B2 (en) | Generating training data for disambiguation | |
Routray et al. | A survey on sentiment analysis | |
Wicaksono et al. | Automatic extraction of advice-revealing sentences for advice mining from online forums | |
Sharma et al. | A multi-criteria review-based hotel recommendation system | |
KR101074820B1 (en) | Recommendation searching system using internet and method thereof | |
Nidhi et al. | Twitter-user recommender system using tweets: A content-based approach | |
US10380244B2 (en) | Server and method for providing content based on context information | |
Itani | Sentiment analysis and resources for informal Arabic text on social media | |
AleEbrahim et al. | Summarising customer online reviews using a new text mining approach | |
Pak | Automatic, adaptive, and applicative sentiment analysis | |
Hamborg | Media Bias Analysis | |
Moghaddam et al. | Opinion mining in online reviews: Recent trends | |
Mishra et al. | VisualTextRank: Unsupervised Graph-based Content Extraction for Automating Ad Text to Image Search | |
Liu et al. | ASK: A taxonomy of accuracy, social, and knowledge information seeking posts in social question and answering | |
Song et al. | OpenFact: Factuality enhanced open knowledge extraction | |
Tonkin | A day at work (with text): A brief introduction | |
Campos et al. | Extracting Context Data from User Reviews for Recommendation: A Linked Data Approach. | |
Abirami et al. | Sentiment analysis | |
Nguyen et al. | Opinion spam recognition method for online reviews using ontological features | |
Chen | Aspect-based sentiment analysis for social recommender systems. |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: UPTAKE NETWORKS, INC., DELAWARE Free format text: CHANGE OF NAME;ASSIGNOR:THREEALL, INC.;REEL/FRAME:044069/0640 Effective date: 20080716 Owner name: GROUPON, INC., ILLINOIS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:UPTAKE NETWORKS, INC.;REEL/FRAME:043740/0706 Effective date: 20120223 Owner name: THREEALL, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GALITSKY, BORIS;MCKENNA, EUGENE WILLIAM;REEL/FRAME:043740/0158 Effective date: 20080620 |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
AS | Assignment | Owner name: JPMORGAN CHASE BANK, N.A., ILLINOIS Free format text: SECURITY INTEREST;ASSIGNORS:GROUPON, INC.;LIVINGSOCIAL, LLC;REEL/FRAME:053294/0495 Effective date: 20200717 |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
STCV | Information on status: appeal procedure | Free format text: NOTICE OF APPEAL FILED |
STCV | Information on status: appeal procedure | Free format text: APPEAL BRIEF (OR SUPPLEMENTAL BRIEF) ENTERED AND FORWARDED TO EXAMINER |
STCV | Information on status: appeal procedure | Free format text: EXAMINER'S ANSWER TO APPEAL BRIEF MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: TC RETURN OF APPEAL |
STCV | Information on status: appeal procedure | Free format text: APPEAL READY FOR REVIEW |
STCV | Information on status: appeal procedure | Free format text: ON APPEAL -- AWAITING DECISION BY THE BOARD OF APPEALS |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION |
AS | Assignment | Owner name: LIVINGSOCIAL, LLC (F/K/A LIVINGSOCIAL, INC.), ILLINOIS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:066676/0001 Effective date: 20240212 Owner name: GROUPON, INC., ILLINOIS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:066676/0001 Effective date: 20240212 Owner name: LIVINGSOCIAL, LLC (F/K/A LIVINGSOCIAL, INC.), ILLINOIS Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN INTELLECTUAL PROPERTY RIGHTS;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:066676/0251 Effective date: 20240212 Owner name: GROUPON, INC., ILLINOIS Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN INTELLECTUAL PROPERTY RIGHTS;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:066676/0251 Effective date: 20240212 |