WO2019125486A1 - Natural language grammars adapted for interactive experiences - Google Patents


Info

Publication number
WO2019125486A1
Authority
WO
WIPO (PCT)
Prior art keywords
bid
conversation
prospect
natural language
intent
Prior art date
Application number
PCT/US2017/068211
Other languages
French (fr)
Inventor
Qindi Zhang
Joel McKenzie
Original Assignee
Soundhound, Inc.
Priority date
Filing date
Publication date
Application filed by Soundhound, Inc. filed Critical Soundhound, Inc.
Priority to PCT/US2017/068211 priority Critical patent/WO2019125486A1/en
Priority to JP2018230121A priority patent/JP7178248B2/en
Priority to EP18214971.6A priority patent/EP3502923A1/en
Priority to CN201811572622.6A priority patent/CN110110317A/en
Publication of WO2019125486A1 publication Critical patent/WO2019125486A1/en
Priority to JP2020036823A priority patent/JP7129439B2/en
Priority to JP2021093124A priority patent/JP7525445B2/en
Priority to JP2024114807A priority patent/JP2024147731A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0251Targeted advertisements
    • G06Q30/0269Targeted advertisements based on user profile or attribute
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0273Determination of fees for advertising

Definitions

  • the present invention is in the field of speech-based conversational human-machine interfaces using natural language grammars (class 704/E17.015 or 704/251).
  • advertising has consisted of: advertisers bidding for ad opportunities by providing ads and money; publishers accepting the highest bids and delivering the associated ads, amid their desirable content, to prospective customers of the advertisers’ goods or services (prospects); and, hopefully, converting the prospects into paying customers of the advertisers.
  • Another valuable form of advertising is product placement. That is, including the advertised product or service as an integral part of the content story. For example, in the movie E.T., the extraterrestrial followed a trail of Reese’s Pieces™ brand candy, and in the television show Sex and the City, the character Carrie often showed and mentioned Manolo Blahnik brand shoes.
Telemarketing

  • Another form of advertising is telemarketing. That is, calling prospects on the phone. Telemarketing is notoriously annoying to prospects, at least because it is not timed to a moment of convenience for the prospect, and, since it must occur in a single unbroken conversation, it is a blatant sales pitch. Telemarketing is notoriously ineffective: it has low conversion rates, requires relatively expensive human operators to handle unpredictable prospect responses, requires relatively simple scripts that human operators can follow, wears on the patience and psychological health of the operators, and fails to reach the most lucrative prospects.
  • the present disclosure is directed to systems and methods of: defining ad units that support conversational natural language interactions with prospects; delivering ads; bidding for ad inventory; providing information from prospects to publishers and from publishers to advertisers; performing machine learning from customer interactions; and related aspects of the advertising ecosystem.
  • ad conversions can be accurately and meaningfully attributed to the acquisition channel. Conversion rates are higher and more measurable, and targeting is more influential and more precise.
  • Delivering natural language interactive ads from the interfaces of trusted machines is an ultimate form of product placement, where the published content is the behavior of the machine and the product is placed thereamid.
  • FIG. 1 illustrates engagement between a conversational robot device and a prospect, according to an embodiment.
  • FIG. 2 illustrates a flow chart of a method of providing experiences, according to an embodiment.
  • FIG. 3 illustrates a human-machine dialog with conversation state and bids based on conversation state, according to an embodiment.
  • FIG. 4 illustrates interpreting expression in the context of conversation state also used for analyzing bids, according to an embodiment.
  • FIG. 5 illustrates a dialog script, according to an embodiment.
  • FIG. 6 illustrates advertisers bidding for a publisher to deliver their ads to prospects, according to an embodiment.
  • FIG. 7 illustrates the flow of ads from advertisers to prospects through an advertising ecosystem, according to an embodiment.
  • FIG. 8 illustrates interactive ad delivery with actions and conversational feedback through an ecosystem, according to an embodiment.
  • FIG. 9 illustrates a system for interaction between a prospect and ad server, according to an embodiment.
  • FIG. 10A illustrates a rotating disk non-transitory computer readable medium, according to an embodiment.
  • FIG. 10B illustrates a flash random access memory non-transitory computer readable medium, according to an embodiment.
  • FIG. 10C illustrates the bottom side of a computer processor based system-on-chip, according to an embodiment.
  • FIG. 10D illustrates the top side of a computer processor based system-on-chip, according to an embodiment.
  • FIG. 11 illustrates a server, according to an embodiment.
  • FIG. 12 illustrates a block diagram of a system-on-chip for devices, according to an embodiment.
  • FIG. 13 illustrates a block diagram of a server processor, according to an embodiment.
  • FIG. 14A illustrates components of a conversational robot device, according to an embodiment.
  • FIG. 14B illustrates a voice assistant device, according to an embodiment.
  • FIG. 14C illustrates an audio earpiece device, according to an embodiment.
  • FIG. 14D illustrates a mobile phone device, according to an embodiment.
  • FIG. 14E illustrates an automobile device, according to an embodiment.
  • natural language expressions can be human expressions of one or more tokens of information, such as by speech or other means such as typing, tapping, gesturing, or direct cognitive coupling.
  • Natural language grammars can define one or more intents, each with a set of possible expressions that map to it. Domains can be sets of grammars specific to a topic of conversation or data set.
  • Intents can be data structures that represent a human’s hypothesized desired action expressed by a natural language expression. Some examples of actions are retrieving information and responding to a query, sending a message, performing a motion, or making a software development kit (SDK) function call to a web application programming interface (API) for performing any desired action.
  • Interpreting can be the process of deriving an intent from a token sequence or token sequence hypothesis.
  • advertisers are entities who define ad units and pay for them to be delivered to prospects.
  • Ad units comprise content, a bid, and one or more natural language grammars.
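As a concrete illustration of the three-part ad unit described above (content, a bid, and one or more natural language grammars), a minimal sketch follows. The class and field names, and the StarPaste example values, are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class AdUnit:
    """One ad unit: content to deliver, a bid, and grammars to interpret replies."""
    name: str
    introductory_message: str  # content intended to initiate an interaction
    bid: float                 # money offered for delivering this ad unit
    # grammar sketched as a mapping from recognizable phrases to intent names
    grammars: dict = field(default_factory=dict)

# illustrative instance echoing the StarPaste example used later in the document
star_paste = AdUnit(
    name="StarPaste",
    introductory_message="did you hear about the new StarPaste?",
    bid=0.02,
    grammars={"tell me about it": "engage", "don't care": "disinterest"},
)
```

A real grammar would match whole classes of phrasings rather than literal strings; the dict stands in for that machinery.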
  • ads begin by delivering an introductory message defined as content within an interactive ad unit.
  • the introductory message is content intended to initiate a specific human-machine interaction.
  • Some introductory messages are as short as a single phrase.
  • Content can be one or more of images, video, audio clips, spoken words, and so forth, wherein the content has meaning to prospects.
  • Some introductory messages are as long as conventional commercial messages.
  • Some embodiments present an introductory message to initiate a new conversation.
  • Some embodiments work an introductory message into a conversation as part of a machine’s conversation turn, such as responding to a request for ingredients of a cooking recipe with a specific name of a brand of the ingredient.
  • publishers are entities that deliver ads to prospects and are paid for doing so. Generally, they deliver such ads amid other content that prospects desire, expect, or request.
  • a publisher uses a server to control or serve content for one or more remote devices.
  • a device alone performs the publisher function.
  • a publisher works with one or more intermediaries to receive or determine what ads to deliver to prospects.
  • a publisher receives information from or about a prospect or a user of the device that delivers ads.
  • Some examples of device embodiments are anthropomorphic robots, household appliances, household assistant devices, wearable electronic devices, mobile phones, desktop computers, automobiles, billboards, kiosks, vending machines, and other such devices with human-machine interfaces.
  • After finishing delivery, or after starting delivery, of the introductory message, publishers proceed to identify a natural language expression, provided by a prospect, indicating an engagement. Such embodiments interpret the natural language expression according to a natural language grammar defined within the interactive ad unit. In response to the interpretation finding that the natural language expression matches the grammar, such embodiments determine that an engagement has occurred.
  • Some embodiments use intermediaries, which are entities that facilitate the delivery of ad units to inventory, such as by performing one or more of: connecting advertisers to publishers; aggregating ad units; aggregating inventory; analyzing bids based on prospect attributes, context, conversation state, mood, etc.; setting pricing according to supply and bids; and providing information to advertisers about prospect engagement and conversion.
  • the manner of engagement varies between embodiments. According to embodiments using voice-enabled, conversational devices, an engagement is a spoken response interpreted by an ad unit grammar.
  • Various embodiments use various modes and means of natural language interaction. Some are verbal, using text-to-speech (TTS) for a machine to output speech audio and automatic speech recognition (ASR) for the machine to recognize human speech audio. Such embodiments provide an audio experience. Some embodiments use music, sound effects, or other audio information to provide an audio experience. Some embodiments perform interactions using modes of text input and output, including displaying or transmitting text. Some embodiments use other neural sensing and stimulation modes of natural language based human-machine interaction.
  • Some embodiments use a visual display instead of or in addition to means of audible machine output such as speakers and some use text entry means such as keyboards or touchscreens, instead of or in addition to ASR of audio signals captured from microphones or other air pressure wave sensing devices.
  • Ads spoken by a machine can be more convincing if delivered in the voice that prospects are conditioned to expect from the machine.
  • Some embodiments that deliver ads using TTS to generate speech audio output do so richly by accepting text with metadata, such as in a markup language such as Speech Synthesis Markup Language (SSML). This provides for, while using the machine’s voice, setting the tone and placing emphasis on certain words, as is useful for attracting and persuading prospects.
  • a modal dialog can be a sequence of human-machine interactions in which a machine interacts with its user in turns to prompt the user to provide information or take an action that somebody desires. For example, to produce an intent for sending a text message, a machine might successively ask its user to provide the name of the recipient, then the message. For another example, to calculate a monthly mortgage payment, a machine might successively ask its user to provide the home value, the percent of down payment, the interest rate, the loan term, and other information. It could frustrate some users if another interaction interrupts their modal dialog. Some embodiments only deliver an introductory message when a conversational interface is not engaged in a modal dialog. Some embodiments, upon realizing an opportunity to deliver an introductory message, do so immediately after completion of a modal dialog.
  • Some embodiments can interrupt an original activity, initiate engagement, and require a response before continuing the original activity. For example, while using a machine capability of playing a game, some embodiments interrupt the game to deliver an introductory message in a casual style. Some such embodiments have a timeout, after which, if there has been no prospect engagement, the original activity resumes.
  • Various embodiments benefit advertisers by giving precise control over when ads are delivered so that they are more effective. Some embodiments deliver some ads not immediately, but at a time that the conversation indicates would be useful. For example, in a machine conversation about the weather, if the weather will be rainy, the machine delivers an ad for umbrellas only at a time that the prospect is preparing to go outside.
  • a timeout at the end of a period of no voice activity can indicate an end to the ad conversation.
  • a timeout period is 5 seconds, though shorter or longer periods are appropriate for different embodiments.
  • if the machine turn ended with a question, the timeout is longer than if the machine turn ended with a statement.
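The timeout behavior described above might be sketched as follows. The 5-second base comes from the example above; the doubling after a question is an illustrative assumption.

```python
def reply_timeout_seconds(machine_turn: str, base: float = 5.0) -> float:
    """Choose how long to wait for voice activity before ending the ad conversation.

    After a question, wait longer for an answer than after a plain statement.
    The 2x factor is an illustrative assumption, not taken from the source.
    """
    if machine_turn.rstrip().endswith("?"):
        return base * 2
    return base
```

In practice the machine turn would carry explicit dialog-act metadata rather than relying on trailing punctuation.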
  • an indication of a lack of interest ends an ad conversation.
  • an expression from a prospect that has a high score in a domain grammar other than the ad grammar can indicate a lack of interest in the ad.
  • Some embodiments store in conversation state the fact that an ad conversation is in progress and, accordingly, give higher weight to ad grammars when computing scores relative to other domain grammars.
  • conversion ends a conversation.
  • Ad units can define what constitutes conversion. For example, a request in a navigation domain to a store selling an item can be a conversion, a request to order a product or service online can be a conversion, and a request for a future follow-up can be a conversion.
  • Some embodiments allow an ad unit to maintain a conversational engagement indefinitely, such as with dynamic small talk, if a prospect responds accordingly.
  • a prospect ends the conversation by allowing the system to timeout or by making an expression that scores sufficiently highly in a domain other than the ad grammar.
  • Some embodiments consider ad units as domains. Many ad units can be simultaneously present for the natural language interpreter and compete with each other and non-ad domains for the highest interpretation score in response to each of a prospect’s expressions. Some such embodiments allow grammar expressions to give dynamic weights to different possible phrasings of intents.
  • Some embodiments use grammar-based interpreters. Some embodiments use purely statistical language model based interpreters.
  • a benefit of some embodiments of conversational ads is that they provide at least as much improvement in effectiveness for audio ads for people who are visually disabled and for people who are otherwise visually engaged, such as by driving cars or piloting helicopters or space drones.
  • FIG. 1 shows a scenario of engagement.
  • An anthropomorphic robot assistant natural language conversational publishing device 11, having an ad unit, provides an introductory message using built-in TTS software saying, “did you hear about the new StarPaste?”.
  • a human prospect 12 hears the introductory message and engages by saying, “tell me about it”.
  • FIG. 2 shows a flowchart of the operation of delivering an interactive ad experience.
  • a system comprises a database 21 of ad units, each having bids, content, and grammars.
  • the process begins in step 22 by evaluating the bids of the ad units in database 21 and choosing the ad unit with the highest bid value. In some embodiments, the evaluation of bids happens offline, well prior to delivery of ad units. In some embodiments, bid evaluation happens in real time.
  • the process continues in step 23 to deliver the introductory message of the ad unit chosen for its highest bid value. In step 24, after delivering the introductory message, the system receives a natural language expression from a prospect.
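The overall FIG. 2 flow (evaluate bids and pick a winner, deliver its introductory message, receive and interpret a prospect expression, then act on a matched intent) might be sketched as below. The data shapes and the helper callables are illustrative assumptions.

```python
def deliver_ad_experience(ad_units, get_expression, perform_action):
    """Sketch of the FIG. 2 flow.

    ad_units: list of dicts with 'bid', 'intro', and 'grammar' (phrase -> intent).
    get_expression: callable returning the prospect's natural language reply.
    perform_action: callable invoked with the matched intent, if any.
    """
    winner = max(ad_units, key=lambda u: u["bid"])   # step 22: choose highest bid
    delivered = winner["intro"]                      # step 23: deliver intro message
    expression = get_expression().lower().strip()    # step 24: receive expression
    intent = winner["grammar"].get(expression)       # step 25: interpret vs. grammar
    if intent is not None:                           # expression matched the grammar
        perform_action(intent)                       # step 27: perform the action
    return delivered, intent
```

A real interpreter would score the expression against the grammar rather than look up a literal string; the dict lookup stands in for that step.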
  • The system proceeds in step 25 to interpret the natural language expression according to the grammar 26 from the ad unit chosen for its highest bid. If the expression matches the grammar 26, the process proceeds to step 27, in which the system performs the action indicated by the intent of the grammar 26 matched by the expression.

Conversation state
  • Ads are most effective when they are most relevant to what is in a prospect’s mind at the time the ad is delivered. There is no better case of relevance than for an ad to be delivered when its content matches a prospect’s immediate situation or immediate topic of conversation. These are the best indicators available today of what is in a prospect’s mind at any moment.
  • Some embodiments allow the introductory message of an ad to comprise content conditioned by context such as time, location, user profile, and conversation state. For example, an introductory message may begin by speaking, “good morning”, “good afternoon”, or “good evening” depending on the time of day.
  • An introductory message may say the name of the nearest store that offers a product, where the name of the store varies by location, such as, “did you know that they have the new Star brand toothpaste at Wal-Mart™?” or, “did you know that they have the new Star brand toothpaste at Carrefour™?”.
  • An introductory message may say the name of the prospect based on a user profile.
  • An introductory message may refer to specific entities within the state of a recent conversation such as, after requesting the score of the Eagles game, “before you go to the next Eagles game you should try the new Star brand toothpaste”, or when buying a new outfit, “before you go to tomorrow’s party, you should try the new Star brand toothpaste”.
  • Some embodiments that support conditional introductory message content allow the specification of the condition based on a range. For example: if the weather forecast is for temperatures between 20 and 30 degrees Celsius (68 and 86 Fahrenheit), the message is, “check out the new Star Wear styles of activewear”; if the temperature forecast is greater than 30 degrees Celsius (86 Fahrenheit), the message is, “check out the new Star Wear swimsuit lineup”; and if the temperature forecast is less than 20 degrees Celsius (68 degrees Fahrenheit), the message is, “check out the new Star Wear down parka collection”.
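The range-conditioned introductory message example above reduces to a simple selection function. The thresholds (20 and 30 degrees Celsius) come from the example; the boundary handling at exactly 20 and 30 degrees is an assumption.

```python
def intro_for_forecast(temp_c: float) -> str:
    """Pick the conditional introductory message for a temperature forecast."""
    if temp_c > 30:
        return "check out the new Star Wear swimsuit lineup"
    if temp_c >= 20:  # between 20 and 30 degrees Celsius
        return "check out the new Star Wear styles of activewear"
    return "check out the new Star Wear down parka collection"
```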
  • Some embodiments support ad bid values being conditional based on conversation state. Some embodiments support conditioning according to thresholds, and some support conditioning according to ranges. Some embodiments support ad bid values being conditional based on context other than conversation state.
  • conversation state is a data structure that stores grammar-relevant variables and values.
  • Some embodiments drop conversation state variables after a certain number of conversational dialog turns.
  • Some embodiments include time stamps in conversation state and drop conversation state variables after a certain amount of time.
  • Some embodiments use different numbers of dialog turns or amounts of time depending on the kind of variable or its value.
  • Some embodiments include one or more sentiment values associated with various expressions stored in conversation state. Some embodiments maintain a current mood variable representing the mood in the environment of the human-machine interface.
  • Some embodiments save the conversation state when initiating an ad experience and then, after ending an ad conversation, reinstate the conversation state from before it delivered the ad experience.
  • Some embodiments maintain conversation state variables on a server. Some embodiments maintain conversation state variables on a client device. Some embodiments transfer conversation state with each human expression and machine response sent between clients and servers.
  • Some embodiments share conversation state between different apps or different devices in order to provide a seamless experience for a prospect interacting with a single virtual agent through multiple devices.
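The conversation state described above, a store of grammar-relevant variables whose entries expire after some number of dialog turns or some amount of time, might be sketched as follows. The class name and the default expiry values are illustrative assumptions.

```python
import time

class ConversationState:
    """Variable store whose entries expire by dialog-turn count or by age."""

    def __init__(self, max_turns: int = 10, max_age_seconds: float = 600.0):
        self.max_turns = max_turns
        self.max_age = max_age_seconds
        self._vars = {}  # name -> (value, turn when set, timestamp when set)
        self.turn = 0

    def set(self, name, value):
        self._vars[name] = (value, self.turn, time.time())

    def get(self, name, default=None):
        entry = self._vars.get(name)
        if entry is None:
            return default
        value, turn_set, stamp = entry
        # drop variables that are too many turns old or too much time old
        if self.turn - turn_set > self.max_turns or time.time() - stamp > self.max_age:
            del self._vars[name]
            return default
        return value

    def next_turn(self):
        self.turn += 1
```

Embodiments that use different expiries per kind of variable would store a per-entry limit instead of the two global defaults used here.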
  • FIG. 3 shows an example dialog between a prospect on the left and a conversational device on the right.
  • the prospect asks, “what’s the score of the eagles game”.
  • the device performs speech recognition and natural language interpretation according to a large array of domain grammars, some local and some on a remote server.
  • the device recognizes “Eagles” as the name of a sports team, and, based on that and the fact that the query uses the word “score”, interprets the query as one about sports.
  • the device sends a request to a web API of a provider of live sports information, which identifies that the Eagles are currently in an active game against the Hawks.
  • the provider responds through the API to the device that the score is “5-7”.
  • the device gives a spoken natural language response to the prospect, “eagles 5 hawks 7”.
  • the prospect instructs the device, “text what’s up to 555-1234”.
  • the device interprets the expression, realizing a high score in an SMS domain.
  • the device uses a call to an SMS web API provider and parses the expression to determine that “what’s up” is the message of the text and “555-1234” is the phone number to which the text should be sent.
  • the web API provides a confirmation response to the device, which provides a natural language response, “message sent”, to the prospect.
  • the prospect proceeds to ask, “where can I get some toothpaste”.
  • the device interprets this as highly scoring in a shopping domain, reads the location from a geolocation service software function, and sends the location information to a provider of maps indicating geolocations of retail establishments.
  • the device proceeds to access APIs for successively farther retail establishments from the device geolocation until finding one, in this case, Star-Mart, which responds positively that it has toothpaste in stock.
  • the device proceeds to provide the natural language response, “there’s toothpaste at Star-Mart”.
  • the prospect asks, “will it rain tomorrow”.
  • the device interprets this as highly scoring in a weather domain, accesses a weather information provider API to get the local weather forecast, and replies, “no”.
  • the prospect proceeds to ask, “what’s the capital of France”.
  • the device interprets this as a question for Wikipedia, sends a request to the Wikipedia API for the capital city information in the Wikipedia article on France, and gets a response, “Paris”.
  • the device provides the natural language response, “Paris is the capital of France”.
  • the device builds an array of conversation state variables and values.
  • the system adds to conversation state a variable sports_team with value eagles and a variable sports_request with value score. That is the information that the system uses to access the sports web API to get a response.
  • the response comprises two pieces of information that the device stores in conversation state. Specifically, a variable opponent with value hawks and a variable score_value with value 5-7.
  • the device also maintains an array of bid functions for various ad units.
  • the array includes three bid functions, one for each of an ad unit for StarShoes, an ad unit for StarPaste, and an ad unit for StarTheater.
  • the ad bid functions describe an amount of money offered for delivering the ad unit. Some embodiments accept bids in units of money for 1000 deliveries of an ad unit. Some embodiments support, and tend to process, more complex ad bid functions than shown in FIG. 3.
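Bid functions conditioned on conversation state, as in the FIG. 3 example above, might be sketched as callables evaluated against the current state. The function names, bid amounts, and the `shopping_item` state key are illustrative assumptions.

```python
def choose_ad(bid_functions, conversation_state):
    """Evaluate each ad unit's bid function against the conversation state
    and return the highest-bidding ad unit's name plus all computed bids."""
    bids = {name: fn(conversation_state) for name, fn in bid_functions.items()}
    return max(bids, key=bids.get), bids

bid_functions = {
    # flat bid regardless of conversation state
    "StarShoes": lambda state: 2.0,
    # bid more when the conversation recently mentioned shopping for toothpaste
    "StarPaste": lambda state: 6.0 if state.get("shopping_item") == "toothpaste" else 1.0,
    "StarTheater": lambda state: 3.0,
}
```

Real bid functions would likely be far more complex, as the document notes, but the shape (conversation state in, money amount out) is the same.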
  • the device delivers the introductory message of the StarPaste ad unit.
  • the introductory message is TTS-generated natural language speech saying, “did you know that StarPaste is on sale at Star-Mart for just 2 bitcoins?”. This invites the prospect to engage with the ad.
  • FIG. 4 shows a process according to an embodiment.
  • a publisher receives and interprets expressions according to a grammar 42 and an array of conversation state variables 43. Accordingly, the publisher produces an intent.
  • in stage 44, the publisher performs an action specified by the intent.
  • the interpretation stage also produces references to specific entities, which the publisher stores in the array of conversation state variables 43.
  • a stage 45 receives a multiplicity of ad units. It analyzes the bids of the ad units in the context of the conversation state 43. Accordingly, the analyzing stage 45 chooses a highest bidding ad unit and outputs its introductory message. The purpose is to engage a prospect such that the prospect makes an expression that matches the ad unit’s grammar.
Mood
  • Some embodiments detect a mood and deliver ad unit introductory messages conditionally based on the mood. Some embodiments support ad content, such as the introductory message content, being conditioned by the mood. Some embodiments support ad bid values being conditioned by mood. Some embodiments support conditioning according to thresholds, and some support conditioning according to continuous values. Some embodiments reevaluate the mood frequently, and some embodiments reevaluate the mood at occasional intervals.
  • the conversational ads of some embodiments guide prospects through a script.
  • scripts can have multiple conditional conversation paths, and the machine offers explicit choices. For example, the machine might ask, “would you like to know more about the health benefits, the flavor, or the brightness of StarPaste?”
  • a script can encourage prospects to convert the engagement into a purchase. Different prospects have different interests. By analyzing their natural language interactions, a scripted ad unit can classify the prospect. By classifying the prospect, the script is able to provide information that most effectively encourages the prospect to convert.
  • the machine uses information gleaned from prospect responses to profile the instantaneous state of the prospect’s thoughts and feelings. For example, if the prospect speaks more quickly and loudly than usual, the ad might offer calming words. For another example, if the conversation state indicates recent discussion of money, the ad might offer information about a discount price.
  • Some embodiments send feedback from the human-machine interface to the publisher, intermediary, or advertiser at each step in the dialog script.
  • the feedback indicates the script location.
  • the feedback includes transcripts or audio recordings of the captured prospect expressions.
  • Some embodiments include personally identifying information. Some embodiments charge advertisers or intermediaries extra money for extra information.
  • FIG. 5 shows an embodiment of an ad dialog script. It begins with an introductory message 51 telling the prospect, “did you hear about the new StarPaste?”. The interface proceeds to wait for a recognized expression from a prospect and match any received expression to an ad unit grammar. If an expression matches a grammar showing disinterest, such as one including a phrase like “don’t care” or “shut up”, or silence, the script replies with “sorry”, ends the conversation, and the interface device sends feedback to that effect to the advertiser.
  • If a prospect’s expression matches a grammar indicating engagement 54, such as “tell me more” or “what about it”, the script proceeds to offer a follow-up content item, which is a message, “it gives a healthier mouth and more sparkle”.
  • the script calls for classifying the prospect as one who has a greater interest in health than vanity or the other way around.
  • the message includes information (“healthier mouth”) that appeals to health-concerned prospects or information (“more sparkle”) that appeals to vanity-concerned prospects.
  • If a next prospect expression matches a grammar related to health, such as by saying, “healthy how?” or “does it have fluoride?”, in path 55 the script replies with a message to entice health-concerned prospects to move forward. The message is, “dentist recommended with double fluoride”. If a next prospect expression matches a grammar related to vanity, such as by saying, “brighter how?” or “what about breath?”, in path 56 the script replies with a message to entice vanity-concerned prospects to move forward. The message is, “number 1 in Hollywood for shiny fresh breath”.
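The FIG. 5 script above forms a small decision tree: each node has a message and grammar-to-next-node edges. In the sketch below, the literal phrase tuples stand in for full grammars, and the node names are illustrative assumptions.

```python
# Dialog script as a decision tree, following the FIG. 5 example.
SCRIPT = {
    "intro": {
        "message": "did you hear about the new StarPaste?",
        "paths": {
            ("don't care", "shut up"): "end",
            ("tell me more", "what about it"): "followup",
        },
    },
    "followup": {
        "message": "it gives a healthier mouth and more sparkle",
        "paths": {
            ("healthy how?", "does it have fluoride?"): "health",   # path 55
            ("brighter how?", "what about breath?"): "vanity",      # path 56
        },
    },
    "health": {"message": "dentist recommended with double fluoride", "paths": {}},
    "vanity": {"message": "number 1 in Hollywood for shiny fresh breath", "paths": {}},
    "end": {"message": "sorry", "paths": {}},
}

def step(node: str, expression: str) -> str:
    """Advance the script: return the next node for a prospect expression."""
    for phrases, nxt in SCRIPT[node]["paths"].items():
        if expression in phrases:
            return nxt
    return "end"  # unmatched expressions end the ad conversation
```

The path taken also classifies the prospect (health-concerned vs. vanity-concerned), which is the script-location feedback some embodiments send back to the advertiser.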
  • Some embodiments support, and some ad units define, much more complex scripts with many conditional paths, multiple dimensions of prospect classification, and conditional, personalized message content.

Privacy
  • Some embodiments can deliver ads that would embarrass or shame a prospect if other people heard or saw the ad’s introductory message.
  • some ads may be for gifts for surprise parties, some ads may be for products that treat private medical conditions, and some ads may be for services that serve prurient interests.
  • Some embodiments allow for ads, or ad bids, to be conditioned by the presence of people other than the target prospect. Some embodiments do so with cameras and image processing to detect people. Some embodiments use audio voice discriminating algorithms to detect whether multiple speakers are present. Some embodiments detect the radio signal emissions from personal portable electronic devices such as Bluetooth, Wi-Fi, or cellular network signals from mobile phones.
  • Some embodiments that perform detection of the presence of others lower a privacy level variable when other people are present. Some such embodiments detect attributes of people present, such as using image processing or voice characterization to determine whether male or female people or adults or children are present.
  • Some embodiments raise a privacy level variable when the natural language human-machine interface is through a personal interface, such as a headphone, wearable device, or personal mobile handset.
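The privacy handling described in these passages can be sketched as a small gate; the numeric levels, function names, and gating rule below are illustrative assumptions, not the document's actual representation.

```python
# Illustrative privacy levels; the document names "personal", "shared",
# and "public" levels but does not specify an encoding.
PUBLIC, SHARED, PERSONAL = 0, 1, 2

def privacy_level(others_present, personal_interface):
    """Raise the level when output goes through a headphone, wearable,
    or personal handset; lower it when other people are detected
    (e.g. by camera, voice discrimination, or radio signal emissions)."""
    if personal_interface:
        return PERSONAL
    return PUBLIC if others_present else SHARED

def may_deliver(ad_required_level, others_present, personal_interface):
    # Only deliver the ad (or even consider its bid) if the current
    # privacy level is at or above the level the ad unit requires.
    return privacy_level(others_present, personal_interface) >= ad_required_level
```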
  • Some embodiments parse ad unit code to detect a privacy level requirement associated with the ad unit. They only allow the ad to be displayed, and in some embodiments only consider the ad in the bidding, if the current privacy level is at or above the level defined in the ad unit code.
Intents and actions
  • Natural language understanding systems interpret expressions to determine intents.
  • Intents are data structures that can represent actions that the speaker wishes a machine to perform.
  • interpreting is according to context-free grammar rules, such as ones created by system designers.
  • interpreting is by application of statistical language models to the words expressed. This can be done in various ways, such as by using trained neural networks.
  • interactive natural language ads comprise grammars with rules that define specific intents according to words expressed by prospects. For example, a grammar may recognize the words “tell me more” as an intent of requesting additional information, which means requesting the interface to perform an action of delivering more information about the product or service of the ad. A grammar may recognize the words “shut up” as an intent requesting the action of ending the ad conversation. A grammar for a mobile phone ad may recognize the words “how’s its battery life” as a request for specific information about the battery capacity and power consumption of the advertised phone.
  • a grammar may recognize the words “get it” as an intent requesting an action of ordering the advertised product or service.
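A minimal sketch of such grammar-to-intent mapping follows the examples above. The rule format and intent fields are invented for illustration; the document does not specify this representation.

```python
# Hypothetical grammar rules: phrasings mapped to intent data structures.
# Intents here are plain dicts; a full system would carry richer fields
# (e.g. the action to trigger and its arguments).
GRAMMAR_RULES = [
    (("tell me more",), {"intent": "more_info"}),
    (("shut up", "stop"), {"intent": "end_conversation"}),
    (("how's its battery life",), {"intent": "battery_info"}),
    (("get it", "order it"), {"intent": "place_order"}),
]

def interpret(expression):
    """Return the intent data structure for a matching rule, else None."""
    text = expression.strip().lower()
    for phrasings, intent in GRAMMAR_RULES:
        if text in phrasings:
            return intent
    return None
```

Each returned intent would then trigger a system function such as ordering, messaging, or content delivery, as the following bullets describe.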
  • a text message sending function provides means of sending text messages.
  • An order placement function provides means of ordering the delivery of an advertised product or service.
  • a mapping function provides means for giving prospects directions to a place for purchasing the advertised goods or services.
  • a browser provides means of presenting web content, such as hypertext markup language (HTML) and cascading style sheet (CSS), about the advertised goods or services.
  • a browser also provides means for executing certain kinds of scripts, such as ones written in the JavaScript language. Such scripts can cause computers to perform any action of which they are capable.
  • An autonomous vehicle provides a means for transporting a prospect to a location to purchase the advertised goods or services.
  • inventory is the ad space/time that a publisher has available to sell. Many methods are known for ad bidding on available inventory.
  • FIG. 6 shows an embodiment in which advertisers 62a, 62b, 62c provide ad units, comprising bids, along with money to a publisher 61.
  • the publisher provides desirable content along with ads to prospects 63a, 63b, 63c. Eventually, ad conversion happens and prospects spend money on goods or services from the advertiser.
  • FIG. 7 shows a system in which advertisers 72a, 72b, 72c provide ad units to a media buyer 75, who aggregates and packages ad units for distribution to multiple publishers and negotiates pricing.
  • Typical media buyers are large ad agencies.
  • the media buyer 75 provides ad units to a trading desk 76, which bids on an ad exchange 77.
  • Large ad agencies tend to have trading desks.
  • Ad exchange 77 matches ad unit bids with inventory and makes a market with programmatic auction pricing.
  • AppNexus™ is an example of an ad exchange.
  • another advertiser 72d and the media buyer 75 provide ad units to an ad network 74.
  • Google™ AdWords™ is an example of an ad network.
  • the ad network facilitates automatic placement of ads within available inventory.
  • a publisher 71 offers inventory on the exchange 77 and the ad network 74 to deliver to prospects 73a, 73b, 73c.
  • Some ad exchanges implement programmatic bidding, which provides high liquidity in the ads market.
  • Programmatic bidding in various embodiments depends on user profile, present location, location history, present activity, time of day, time of week, time of year, types of ads recently delivered, types of information in conversation state, specific values of information in conversation state, mood, feelings, and neural activity patterns.
  • a publisher maintains conversation state for a conversational human-machine interface.
  • the publisher analyzes bids for ads in relation to the conversational human-machine interface.
  • the bids indicate interesting values of conversation state variables and are influenced by at least one current conversation state variable having the interesting conversation state variable value.
  • Some embodiments support bids based on the presence of keywords in conversation state. Some embodiments support bids based on the presence of a particular domain of conversation in conversation state. For example, weather is a domain and sports is another domain. Some embodiments support bids based on the presence of a particular variable within conversation state. For example, sports_team is a variable. Some embodiments support bids based on specific values of variables. For example, “eagles” can be a value of the variable sports_team. Some embodiments support bids based on sets of possible values of variables, such as “eagles”, “hawks”, and “falcons” as values of a variable sports_team.
  • Some embodiments support bids based on ranges of values of numeric variables, such as a temperature variable with a value between 20 and 30 degrees Celsius (68 to 86 Fahrenheit). Some embodiments support bids based on combinations of the above criteria. Some embodiments support specifying bids as programmatic equations of any of the presence of types of variables, specific values, sets of values, and ranges. Such equations determine a degree of match between a bid and conversation state. Equations can also express negative influences on bid value based on the factors described above.
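A bid expressed as a programmatic equation over conversation state might look like the following sketch. It combines a domain match, a value-set match, a numeric range, and a negative keyword influence; the weights and variable names are assumptions for illustration.

```python
# Hypothetical bid function evaluated against conversation state.
def bid_value(state):
    value = 0.0
    if state.get("domain") == "sports":
        value += 1.0                                   # domain-of-conversation match
    if state.get("sports_team") in {"eagles", "hawks", "falcons"}:
        value += 2.0                                   # value-set match on a variable
    temp = state.get("temperature_c")
    if temp is not None and 20 <= temp <= 30:
        value += 0.5                                   # numeric-range match
    if "rain" in state.get("keywords", ()):
        value -= 1.0                                   # negative influence
    return max(value, 0.0)
```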
  • Some embodiments support bids based on mood. Such embodiments detect a current mood value at the human-machine interface and evaluate bids in the context of the mood value. Some embodiments receive an externally computed mood value and use the value to influence the computed bid value.
  • Some embodiments support bids conditioned by whether a prospect is in a private listening environment. Such embodiments detect the environment and compute bid values accordingly. Some such embodiments determine the listening environment by detecting the presence of other people.
  • Some embodiments perform voice activity detection at the human-machine interface and only output an ad introductory message when there is no voice activity. Some embodiments wait for a certain amount of time after detecting no voice activity before outputting an ad introductory message.
  • Some embodiments support bids based on a specific intent of a natural language expression. Some embodiments support bids based on a specific action triggered by one or more intents.
  • Some embodiments charge advertisers for their bids only when engagement occurs. Some embodiments charge advertisers only when a conversion occurs. Some embodiments charge advertisers based on how many interactions a prospect has with the ad before conversion or disengagement. Some embodiments charge advertisers based on what kinds of interactions a prospect has, such as whether or not the prospect makes a price request or whether or not the prospect expresses a negative sentiment.
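The charging models above can be sketched as follows. The rates and event labels are invented for illustration; the document only names the models, not their pricing.

```python
# Hypothetical billing for one ad conversation, given a list of
# interaction event labels recorded during the conversation.
def charge(events, model):
    if model == "per_engagement":
        return 0.10 if "engagement" in events else 0.0
    if model == "per_conversion":
        return 1.00 if "conversion" in events else 0.0
    if model == "per_interaction":
        # Charge for each prospect interaction before conversion
        # or disengagement.
        return 0.02 * len([e for e in events if e.startswith("expression")])
    raise ValueError("unknown charging model: %s" % model)
```

A fuller model could also weight interaction kinds, e.g. charging more when the prospect makes a price request, as the text suggests.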
  • Some embodiments are methods of defining bids and methods of analyzing bids. Some embodiments are systems that perform the analysis of bids. Some embodiments are non-transitory computer readable media that store bids or bid functions.
  • Some embodiments, by processing natural language conversations in real time, extract extremely valuable information for ad targeting. Such information is useful when applied to real-time bidding algorithms for conversational ad units. It is also useful to advertisers if fed back from human-machine interactions.
  • Some embodiments feed information back for real-time bids by interpreting natural language expressions from prospects to create intent data structures. They proceed to analyze the intent data structures to derive analytical results in real-time. Then they provide the analytical results to real-time bids of advertisers.
  • Some embodiments provide feedback, such as whether engagement or conversion happened, on a per-expression basis. Some embodiments provide feedback on a per-ad conversation basis.
  • the analytical results comprise an indication of an intent. Some embodiments, with the analytical results, feed back an indication of the domain of conversation prior to the engagement. Some embodiments, with the analytical results, feed back an indication of the domain of conversation immediately following the ad interaction.
  • Some embodiments feed personally identifying information back to advertisers. Some embodiments specifically hide personally identifying information. One reason to do so is to ensure privacy of system users. Another use of personally identifying information is to sell it to advertisers for a higher payment than for anonymized prospect interaction analysis (analytics).
  • Some embodiments use feedback analytics about topics, intents, semantic information, mood/emotions, questions asked, engagement rate, other engagement metrics, conversion rate of ads and categories of ads, reach, deliveries, amount spent, and other conversion metrics.
  • Analytics can include which/how many ads, of an array of competing ad bids, matched a condition for ad unit delivery. This improves the ability to determine timing of delivering ads.
  • Procter & Gamble™ defines the first moment of truth as the time when a prospect purchases a product and the second moment of truth as the time when the prospect first uses the product.
  • Google™ defines the zero moment of truth as the time when a prospect first searches for information about a type of product.
  • Some key query phrases indicate a zero moment of truth, such as “show me ...” or “where can I get ...”.
  • Some intents derived from natural language interpretations indicate a zero moment of truth. Numerous language patterns can match grammars that create intents related to research about products or services.
  • Some embodiments identify ad opportunities by monitoring natural language interactions at human-machine interfaces. They interpret natural language expressions to identify zero moment of truth intents. At such time, such embodiments identify a type of product or service referenced in the query or in conversation state, then upon identifying a referenced product or service, alert an ad bid for the product or service and raise the price of ad delivery because of its timely relevance. Some such embodiments proceed to deliver the ad unit identified by the bid. Some such embodiments deliver an introductory message for an ad unit for the product or service. Some such embodiments proceed to receive follow-up expressions from a prospect and match them to a grammar.
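Zero-moment-of-truth detection as described above might be sketched like this. The phrase prefixes follow the examples in the text, but the premium multiplier and function names are assumptions.

```python
# Hypothetical detection of zero-moment-of-truth (ZMOT) query phrasings.
ZMOT_PREFIXES = ("show me ", "where can i get ")

def check_zmot(expression, base_price):
    """If the expression signals product research, return the referenced
    product type and a bid price raised for timely relevance; otherwise
    return (None, base_price)."""
    text = expression.strip().lower()
    for prefix in ZMOT_PREFIXES:
        if text.startswith(prefix):
            product = text[len(prefix):]       # type of product referenced
            return product, base_price * 1.5   # timely-relevance premium
    return None, base_price
```

In a real embodiment the product reference could also come from conversation state rather than the query itself, and the alerted bid would then compete in the normal auction.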
  • FIG. 8 shows a timeline of information flow between an advertiser, an intermediary, a publisher, and a prospect according to an ad conversation scenario of an embodiment.
  • an advertiser provides an ad unit to an intermediary.
  • the intermediary prices the ad unit based on its bid in an auctioning process.
  • the publisher identifies the ad unit with the winning bid.
  • the publisher proceeds, at time t2, to provide meta information to the intermediary and the advertiser.
  • the meta information can comprise an identification of the winning ad, the conversation state, most recent conversation domain, and mood at the time of choosing the ad, the geolocation and user profile of the prospect, and dynamic pricing information, such as whether there is a zero moment of truth ad delivery premium.
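A hypothetical shape for that meta information, with invented field names and values:

```python
# Illustrative meta information the publisher might send back at time t2.
# Every field name here is an assumption; the document only lists the
# kinds of information included.
meta_info = {
    "winning_ad": "starpaste_ad_001",
    "conversation_state": {"domain": "weather", "keywords": ["morning"]},
    "recent_domain": "weather",
    "mood": "cheerful",
    "geolocation": (37.38, -122.06),
    "user_profile": {"age_range": "25-34"},
    "pricing": {"zmot_premium": True, "price": 0.15},
}
```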
  • the publisher delivers the introductory message of the ad unit.
  • the prospect provides an engagement spoken expression, also known, using browser-ad terminology, as a spoken click (slick).
  • the slick requests specific information on the pricing of the advertised product.
  • the publisher sends a web API request to the advertiser for the pricing information.
  • the advertiser provides a web API response to the publisher with marked up text for a TTS response.
  • the publisher uses TTS software built into its human-machine interface device to synthesize speech in the machine’s normal voice and outputs the synthesized speech audio through a speaker.
  • the prospect provides a second slick, asking to place an order.
  • the publisher makes a web API request to an intermediary for order fulfillment.
  • the order fulfillment intermediary processes the order for delivery and, at time t10, provides a web API response to the publisher.
  • the publisher performs an action of emailing a receipt and provides a TTS response to the prospect confirming the order.
  • the publisher reports the conversion to the advertiser for use in analytics, and the publisher outputs a thanking message as TTS speech audio.
  • Various embodiments have various arrangements of a human-machine interface device and one or more connected servers for publisher, advertiser, and intermediary roles.
  • a single device communicates with an ad server that comprises at least one ad unit.
  • advertisers have multiple divisions in different offices, some having programmers that develop grammars, some having artists that develop ad content, and some having ad departments that combine grammars and content into ad units and combine ad units into campaigns.
  • the advertiser passes the ads to an ad agency, which stores ads on database servers and accesses them by media buyers who interact with other intermediaries.
  • the media buyers deliver ads and bids to ad networks and to third-party trading desks, which manage supply side networks.
  • the trading desks deliver ad bids to third-party exchanges, which match bids to inventory.
  • Ad units that win bids are sent to publishers with servers distributed geographically for low latency access to prospects through devices.
  • Ad units are stored locally on different types of devices designed by different consumer product manufacturers, and the ad unit content is delivered to prospects under the control of the publisher.
  • FIG. 9 shows an embodiment.
  • a prospect 91 communicates with a publisher device 92. It comprises a speaker 93 that delivers audio content to the prospect 91.
  • the publisher device 92 further comprises a microphone 94, which receives audio that includes natural language speech from the prospect 91 .
  • the publisher device 92 communicates, through network 95, with ad server 96.
  • the ad server 96 delivers ad units and executes ad bid functions to select the most valuable ad units to deliver.
  • FIG. 10A shows a non-transitory computer readable rotating disk medium 101 that stores computer code that, if executed by a computer processor, would cause the computer processor to perform methods or partial method steps described herein.
  • FIG. 10B shows a non-transitory computer readable Flash random access memory (RAM) chip medium 102 that stores computer code that, if executed by a computer processor, would cause the computer processor to perform methods or partial method steps described herein.
  • FIG. 10C shows the bottom (solder ball) side of a packaged system-on-chip (SoC) 103 comprising multiple computer processor cores. The SoC is a component of some embodiments and, by executing computer code, performs methods or partial method steps described herein.
  • FIG. 10D shows the top side of the SoC 103.
  • FIG. 11 shows a rack-based server system 111, used as a component of various embodiments.
  • Such servers are useful as advertiser servers, publisher servers, and servers for various intermediary functions.
  • FIG. 12 shows a block diagram of the cores within the system-on-chip 103. It comprises a multi-core computer processor (CPU) 121 and a multi-core graphics accelerator processor (GPU) 122.
  • the CPU 121 and GPU 122 are connected through a network-on-chip 123 to a DRAM interface 124 and a Flash RAM interface 125.
  • a display interface 126 controls a display, enabling the system to output Moving Picture Experts Group (MPEG) video and Joint Photographic Experts Group (JPEG) still image ad content.
  • An I/O interface 127 provides for speaker and microphone access for the human-machine interface of a device controlled by SoC 103.
  • a network interface 128 provides access for the device to communicate with servers over the internet.
  • FIG. 13 shows an embodiment of the server 111.
  • a multiprocessor CPU array 131 and a GPU array 132 connect through a board-level interconnect 133 to a DRAM subsystem 134 that stores computer code and a network interface 135 that provides internet access to other servers or publisher devices.
  • Various embodiments of devices can be used to publish interactive natural language ads. Some are mundane, and some are exciting.
  • FIG. 14A illustrates components of the exciting anthropomorphic robot assistant natural language conversational publishing device 141. It comprises a speaker 142 on each side of the device in order to output audio.
  • the device comprises a microphone array 143, which comprises several microelectromechanical system (MEMS) microphones, physically arranged to receive sound with different amounts of delay.
  • the device comprises an internal processor that runs software that performs digital signal processing (DSP) to use the microphone array 143 to detect the direction of detected speech.
  • the device 141 further comprises a module 144 with two cameras to provide stereoscopic image and video capture. Further DSP software runs neural network-based object recognition on models trained on human forms in order to detect the location and relative orientation of one or more prospects.
  • the device 141 further comprises a display screen 145 that, for some ad units, outputs visual ad content such as JPEG still images and MPEG video streams.
  • the device 141 further comprises a wheel 146a and a wheel 146b, each of which can turn independently or in unison. By turning in unison, the device is able to move, such as to follow a prospect around. By turning independently, the device is able to turn, such as to face and monitor the movement and activity of a prospect.
  • the device 141 further comprises a power switch 147, which a prospect can use to shut the device up if it becomes annoying.
  • FIG. 14B shows an embodiment of a home virtual assistant and music playing device 148.
  • FIG. 14C shows an embodiment of a Bluetooth-enabled earpiece device 149.
  • FIG. 14D shows an embodiment of a mundane mobile phone 1410.
  • FIG. 14E shows an embodiment of an automobile 1411.
  • Some embodiments, such as the home virtual assistant 148, give little privacy to their users. Some embodiments, such as earpiece 149, give great privacy to prospects. Some embodiments, such as the mobile phone 1410, have visual display screens. Some embodiments are screenless, such as the earpiece 149, which has no display screen. Some embodiments, such as the home virtual assistant 148, are stationary. Some embodiments, such as the automobile 1411, are mobile. Some embodiments, such as the mobile phone 1410, are mobile.
  • Some elements in the flow of ads may be present in different countries, though the functioning of the methods and systems and computer-readable media of ad delivery constitute full embodiments. In other words, passing ad units or their components through servers in different countries does not avoid direct infringement of claimed methods, systems, and computer readable media.
  • The ad unit code listing section below is a code listing for an example ad unit. It uses a specialized programming language with syntax similar to that of C.
  • Lines 3-55 describe a bid function.
  • the bid value assigned in lines 40-53 is conditioned on whether it is blocked for specific user IDs.
  • the bid value is further conditioned on the privacy level being personal, as opposed to shared or public.
  • the bid value is further conditioned by a mood value being above a threshold.
  • the bid value is further positively affected by the presence in conversation state of specific recent domains, specified in lines 6-8.
  • the bid value is negatively affected by the presence of certain domains in conversation state, specified in lines 9-13.
  • the bid is further positively and negatively affected by the presence of keywords in recent conversation state, the keywords specified in lines 15-27.
  • the bid is further positively and negatively affected by the presence of certain meta information descriptive of the present prospect, environment, and machine state.
  • the degrees of effect on the bid value of each of different domains, keywords, and meta information are scaled to the ratio of 2.0, 1.5, and 2.5 respectively.
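That 2.0 : 1.5 : 2.5 scaling might combine the three contributions as in this sketch; the individual contribution scores are stand-ins for the domain, keyword, and meta-information matching described in the surrounding bullets.

```python
# Illustrative scaling of bid contributions to the ratio the example
# ad unit uses: domains 2.0, keywords 1.5, meta information 2.5.
DOMAIN_SCALE, KEYWORD_SCALE, META_SCALE = 2.0, 1.5, 2.5

def scaled_bid(domain_score, keyword_score, meta_score):
    """Combine per-category match scores into a single bid value."""
    return (DOMAIN_SCALE * domain_score
            + KEYWORD_SCALE * keyword_score
            + META_SCALE * meta_score)
```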
  • Lines 61-68 describe an introductory message. It includes a reference to an image for display, an audio clip for output, and some text to be spoken, marked up with word emphasis.
  • the ad unit is restricted to non-offensive content in line 71.
  • the ad unit is assigned to provide a high level of information reporting from publisher to advertiser in line 79.
  • in line 82, the ad is configured to listen for prospect responses for 10.0 seconds before considering the conversation to have ended without an engagement.
  • Lines 85-96 define a grammar, including certain phrasings with optional wordings.
  • the grammar intents are function calls to functions defined outside of the ad unit or defined below within the ad unit.
  • Lines 99-119 specify content elements, including video clips, animations, images, video streams, and TTS text with mark-up for conditional content, such as finding the nearest city to a current device latitude and longitude.
  • Lines 122-167 define custom functions called by the grammar intents.
  • Lines 123-132 specify a function for delivering a sequence of additional information content in response to prospect requests for more information.
  • Lines 134-137 specify a function for sending a text message.
  • Lines 139-142 specify a function for sending an email message.
  • Lines 144-147 specify a function for looking up the price of a particular stock keeping unit (SKU) number for the advertised product.
  • Lines 149-152 specify a function for displaying an image of the product if the publishing device has a display screen.
  • Lines 154-164 specify a function for finding the nearest store with the SKU available, the price of the product at the nearest store, and whether the product is currently on sale.
  • Lines 165-167 specify a function for setting an indication, per user ID, of whether the prospect indicated a non-interest in the product. If so, the ad will not be delivered in the future for that user ID.
  • the code shown in the example below is illustrative of just some capabilities of some embodiments. Sophisticated advertisers will create ad units with more code, more sophisticated bidding, more complex and varied grammars, more dependencies, and more and subtler content. Some embodiments will provide other system functions available for use in ad units and support more complex code constructs.
  • Some embodiments provide simple templates for less sophisticated advertisers. For example, an embodiment provides for an SKU and TTS strings and automatically provides capabilities for looking up pricing and locations, displaying web content, and providing grammars for answering product-general queries. An ad definition for a simple template looks like the following.
  • the system provides the grammar, including responding to“tell me more” with TTS of the MORE text and pricing lookup using the SKU value.
  • check_dont_show(user_id()) ? 0 :
  • send_email("http://www.starpastes.com/robo_ad.html");
  • int store_id = nearest_stock("9781542841443");
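A guess at how such a simple template might behave once filled in: the advertiser supplies only an SKU and TTS strings, and the system supplies the grammar and pricing lookup. All field and function names below are invented; the actual template syntax is not reproduced here.

```python
# Hypothetical filled-in simple template: SKU plus TTS strings.
simple_ad = {
    "sku": "9781542841443",
    "intro_tts": "Did you hear about the new StarPaste?",
    "more_tts": "It gives a healthier mouth and more sparkle.",
}

def answer(expression, ad, price_lookup):
    """System-provided grammar: "tell me more" speaks the MORE text and
    appends a price found by SKU lookup; other expressions are unhandled."""
    if expression.strip().lower() == "tell me more":
        return "%s It costs $%.2f." % (ad["more_tts"], price_lookup(ad["sku"]))
    return None
```

The `price_lookup` callable stands in for the system's automatic pricing and location capabilities mentioned above.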


Abstract

Natural language grammars interpret expressions at the conversational human-machine interfaces of devices. Under conditions favoring engagement, as specified in a unit of conversational code, the device initiates a discussion using one or more of TTS, images, video, audio, and animation depending on the device capabilities of screen and audio output. Conversational code units specify conditions based on conversation state, mood, and privacy. Grammars provide intents that cause calls to system functions. Units can provide scripts for guiding the conversation. The device, or supporting server system, can provide feedback to creators of the conversational code units for analysis and machine learning.

Description

NATURAL LANGUAGE GRAMMARS ADAPTED FOR INTERACTIVE EXPERIENCES
FIELD OF THE INVENTION
[0001] The present invention is in the field of speech-based conversational human-machine interfaces using natural language grammars (class 704/E17.015 or 704/251).
BACKGROUND
[0002] Before becoming the head of RCA™ and founder of NBC™, David Sarnoff, an early pioneer in radio and television broadcast technology, once said, “The radio music box has no imaginable commercial value. Who would pay for a message sent to nobody in particular?” At that time, he had not yet recognized the forthcoming giant industry of transmitting advertisements through telecommunications systems and networks.
[0003] Visual advertisements had been in printed publications for almost as long as publications had been printed. Eventually, to Mr. Sarnoff’s doubtless surprise, it became routine to broadcast audio advertisements several times per hour, within other radio broadcast content. Television broadcasters, similarly, adopted the approach of broadcasting video advertisements several times per hour within television broadcast content. Naturally, the advent of internet-connected computers, mobile handsets, tablets, music players, virtual reality, and other such devices brought about visual, video, and audio advertising through those devices, interspersed with other content desired by their users.
[0004] Whether in the medium of print, radio, television, or internet-connected devices, advertising has consisted of: advertisers bidding for ad opportunities by providing ads and money; publishers accepting the highest bids and delivering the associated ads, amid their desirable content, to prospective customers of the advertisers’ goods or services (prospects); and, hopefully, converting the prospects into paying customers of the advertisers.
[0005] With print, radio, and television advertising there was no direct way to identify how many prospects engaged with an advertisement, such as by reading or paying attention to the broadcast message. There was also no direct way to characterize engagements, such as by what time and location engagements occurred and by what kinds of prospects. Internet-based devices brought about the ability to directly identify and characterize prospect engagement. That is, generally, the act of a prospect clicking a link in a browser. Because clicking a link sends an information request to the advertiser, the internet provided ways to measure engagement and more accurately predict conversion (purchase of an advertised good or service).
[0006] Another valuable form of advertising is product placement. That is, including the advertised product or service as an integral part of the content story. For example, in the movie E.T., the extraterrestrial followed a trail of Reese’s Pieces™ brand candy and in the television show Sex and the City, the character Carrie often showed and mentioned Manolo Blahnik brand shoes.
[0007] Another form of advertising is telemarketing. That is, calling prospects on the phone. Telemarketing is notoriously annoying to prospects, at least because it is not timed to a moment of convenience to the prospect, and, since it must occur in a single unbroken conversation, it is a blatant sales pitch. Telemarketing is notoriously ineffective, has low conversion rates, requires relatively expensive human operators to handle unpredictable prospect responses, requires relatively simple scripts that human operators can follow, is wearing on the patience and psychological health of the operators, and fails to reach the most lucrative prospects.
SUMMARY OF THE INVENTION
[0008] Today, most ads are not interactive. Some are static images, some are audio messages, some are video. Much research has gone into accurately targeting prospects with non-interactive ads. The greatest extent of interactivity in conventional ads is the ability to click one or a small number of buttons within visual ads to see small amounts of specific additional information or to play games.
[0009] The present disclosure is directed to systems and methods of: defining ad units that support conversational natural language interactions with prospects; delivering ads; bidding for ad inventory; providing information from prospects to publishers and from publishers to advertisers; performing machine learning from customer interactions; and related aspects of the advertising ecosystem.
[0010] By using certain embodiments, ad conversions can be accurately and meaningfully attributed to the acquisition channel. Conversion rates are higher and more measurable, and targeting is more influential and more precise.
[0011] Delivering natural language interactive ads from the interfaces of trusted machines is an ultimate form of product placement, where the published content is the behavior of the machine and the product is placed thereamid.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] FIG. 1 illustrates engagement between a conversational robot device and a prospect, according to an embodiment.
[0013] FIG. 2 illustrates a flow chart of a method of providing experiences, according to an embodiment.
[0014] FIG. 3 illustrates a human-machine dialog with conversation state and bids based on conversation state, according to an embodiment.

[0015] FIG. 4 illustrates interpreting an expression in the context of conversation state also used for analyzing bids, according to an embodiment.
[0016] FIG. 5 illustrates a dialog script, according to an embodiment.
[0017] FIG. 6 illustrates advertisers bidding for a publisher to deliver their ads to prospects, according to an embodiment.
[0018] FIG. 7 illustrates the flow of ads from advertisers to prospects through an advertising ecosystem, according to an embodiment.
[0019] FIG. 8 illustrates interactive ad delivery with actions and conversational feedback through an ecosystem, according to an embodiment.
[0020] FIG. 9 illustrates a system for interaction between a prospect and ad server, according to an embodiment.
[0021] FIG. 10A illustrates a rotating disk non-transitory computer readable medium, according to an embodiment.
[0022] FIG. 10B illustrates a flash random access memory non-transitory computer readable medium, according to an embodiment.
[0023] FIG. 10C illustrates the bottom side of a computer processor based system-on-chip, according to an embodiment.
[0024] FIG. 10D illustrates the top side of a computer processor based system-on-chip, according to an embodiment.
[0025] FIG. 11 illustrates a server, according to an embodiment.
[0026] FIG. 12 illustrates a block diagram of a system-on-chip for devices, according to an embodiment.
[0027] FIG. 13 illustrates a block diagram of a server processor, according to an embodiment.

[0028] FIG. 14A illustrates components of a conversational robot device, according to an embodiment.
[0029] FIG. 14B illustrates a voice assistant device, according to an embodiment.
[0030] FIG. 14C illustrates an audio earpiece device, according to an embodiment.
[0031] FIG. 14D illustrates a mobile phone device, according to an embodiment.
[0032] FIG. 14E illustrates an automobile device, according to an embodiment.
DETAILED DESCRIPTION
[0033] According to various embodiments, natural language expressions can be human expressions of one or more tokens of information, such as by speech or other means such as typing, tapping, gesturing, or direct cognitive coupling. Natural language grammars can be one or more intents, each with a set of possible expressions that map to it. Domains can be sets of grammars specific to a topic of conversation or data set. Intents can be data structures that represent a human’s hypothesized desired action expressed by a natural language expression. Some examples of actions are retrieving information and responding to a query, sending a message, performing a motion, or making a software development kit (SDK) function call to a web application programming interface (API) for performing any desired action. Interpreting can be the process of deriving an intent from a token sequence or token sequence hypothesis.
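The intent and interpretation concepts above can be sketched in code. The following is a minimal illustrative sketch, not from the patent: the data structure names, the toy "text message" pattern, and the slot names are all assumptions made for illustration.

```python
# Illustrative sketch of an intent: a data structure representing the
# hypothesized desired action, plus the tokens of information it needs.
from dataclasses import dataclass, field

@dataclass
class Intent:
    action: str                                 # e.g. "send_message"
    slots: dict = field(default_factory=dict)   # extracted information

def interpret(tokens):
    """Toy interpreter: derive an intent from a token sequence."""
    # Pattern: "text <message...> to <number>" maps to a send_message intent.
    if tokens and tokens[0] == "text" and "to" in tokens:
        i = len(tokens) - 1 - tokens[::-1].index("to")   # last "to"
        return Intent("send_message",
                      {"message": " ".join(tokens[1:i]),
                       "number": tokens[i + 1]})
    return None   # no grammar matched this expression
```

In this sketch, interpreting the expression "text what's up to 555-1234" yields a send_message intent whose slots hold the message text and the phone number, which an action-execution layer could pass to an SMS function.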
[0034] According to some embodiments, advertisers are entities who define ad units and pay for them to be delivered to prospects. Ad units comprise content, a bid, and one or more natural language grammars. According to some embodiments, ads begin by delivering an introductory message defined as content within an interactive ad unit. The introductory message is content intended to initiate a specific human-machine interaction. Some introductory messages are as short as a single phrase. Content, according to various embodiments, can be one or more of images, video, audio clips, spoken words, and so forth, wherein the content has meaning to prospects. Some introductory messages are as long as conventional commercial messages. Some embodiments present an introductory message to initiate a new conversation. Some embodiments work an introductory message into a conversation as part of a machine’s conversation turn, such as responding to a request for ingredients of a cooking recipe with a specific name of a brand of the ingredient.
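An ad unit as described above comprises content, a bid, and one or more natural language grammars. One plausible representation, with illustrative field names and a simple phrase list standing in for a full grammar, is:

```python
# Hypothetical representation of an interactive ad unit: content (including
# the introductory message), a bid, and a stand-in for its grammars.
from dataclasses import dataclass

@dataclass
class AdUnit:
    name: str
    introductory_message: str       # content intended to initiate interaction
    bid: float                      # amount offered for delivering the unit
    engagement_phrases: tuple       # stand-in for the unit's grammars

star_paste = AdUnit(
    name="StarPaste",
    introductory_message="did you hear about the new StarPaste?",
    bid=0.9,
    engagement_phrases=("tell me more", "what about it"),
)

def matches_grammar(unit, expression):
    """True when a prospect expression indicates engagement with the unit."""
    return expression.strip().lower() in unit.engagement_phrases
```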
[0035] According to some embodiments, publishers are entities that deliver ads to prospects and are paid for doing so. Generally, they deliver such ads amid other content that prospects desire, expect, or request. In some embodiments, a publisher uses a server to control or serve content for one or more remote devices. In some embodiments, a device alone performs the publisher function. In some embodiments, a publisher works with one or more intermediaries to receive or determine what ads to deliver to prospects. In some embodiments, a publisher receives information from or about a prospect or a user of the device that delivers ads. Some examples of device embodiments are anthropomorphic robots, household appliances, household assistant devices, wearable electronic devices, mobile phones, desktop computers, automobiles, billboards, kiosks, vending machines, and other such devices with human-machine interfaces.
[0036] According to some embodiments, after finishing delivery of the introductory message, or after starting its delivery, publishers proceed to identify a natural language expression, provided by a prospect, indicating an engagement. Such embodiments interpret the natural language expression according to a natural language grammar defined within the interactive ad unit. In response to the interpretation finding that the natural language expression matches the grammar, such embodiments determine that an engagement has occurred.
[0037] Examples of expressions that some embodiments can process are conversation-launching engagement questions such as:
[0038] “tell me more [about that ad | product]”
[0039] “what was that [ad]”,
[0040] personal assistant engagements such as:
[0041] “email me a link”
[0042] “remind me about this [tonight, tomorrow]”,
[0043] generic retail question engagements such as:
[0044] “show [it to] me”
[0045] “where can I buy it”
[0046] “[how much | what] does it cost”
[0047] “buy it [online] [now]”,
[0048] product-specific engagement questions such as:
[0049] “how many rolls is that”
[0050] “what colors does it come in”
[0051] “how much does it weigh”,
[0052] or other question engagements without prepared responses.
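The bracketed notation in the examples above can be expanded mechanically into the concrete phrasings a grammar accepts. The sketch below assumes one plausible reading of that notation: square brackets mark an optional segment, and "|" separates alternatives inside it.

```python
import itertools
import re

def expand(pattern):
    """Expand a bracket pattern such as "buy it [online] [now]" into the set
    of concrete phrasings it matches. Each bracketed segment is optional,
    with "|" separating alternatives inside it."""
    parts = re.split(r"\[([^\]]*)\]", pattern)   # literal / bracket / literal...
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            choices.append([part])                               # literal text
        else:
            choices.append([a.strip() for a in part.split("|")] + [""])
    # Join every combination and normalize the internal whitespace.
    return {" ".join(" ".join(c).split()) for c in itertools.product(*choices)}
```

For example, expand("buy it [online] [now]") yields four phrasings: "buy it", "buy it online", "buy it now", and "buy it online now".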
[0053] Some embodiments use intermediaries, which are entities that facilitate the delivery of ad units to inventory, such as by performing one or more of: connecting advertisers to publishers; aggregating ad units; aggregating inventory; analyzing bids based on prospect attributes, context, conversation state, mood, etc.; setting pricing according to supply and bids; and providing information to advertisers about prospect engagement and conversion. The manner of engagement varies between embodiments. According to embodiments using voice-enabled, conversational devices, an engagement is a spoken response interpreted by an ad unit grammar.
[0054] Various embodiments use various modes and means of natural language interaction. Some are verbal, using text-to-speech (TTS) for a machine to output speech audio and automatic speech recognition (ASR) for the machine to recognize human speech audio. Such embodiments provide an audio experience. Some embodiments use music, sound effects, or other audio information to provide an audio experience. Some embodiments perform interactions using modes of text input and output, including displaying or transmitting text. Some embodiments use other neural sensing and stimulation modes of natural language based human-machine interaction.
[0055] Some embodiments use a visual display instead of or in addition to means of audible machine output such as speakers and some use text entry means such as keyboards or touchscreens, instead of or in addition to ASR of audio signals captured from microphones or other air pressure wave sensing devices.
[0056] Ads spoken by a machine can be more convincing if delivered in the voice that prospects are conditioned to expect from the machine. Some embodiments that deliver ads using TTS to generate speech audio output do so richly by accepting text with metadata, such as in a markup language such as Speech Synthesis Markup Language (SSML). This provides for, while using the machine’s voice, setting the tone and placing emphasis on certain words, as is useful for attracting and persuading prospects.
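A minimal fragment of the kind of marked-up TTS input described above, using element names from the W3C SSML specification (engine support for specific attributes varies); the message text here is the StarPaste example used elsewhere in this disclosure:

```python
# A TTS request carrying SSML metadata: the machine's usual voice, with
# emphasis and prosody set on chosen words of the introductory message.
import xml.etree.ElementTree as ET

ssml = """\
<speak>
  did you hear about the <emphasis level="strong">new</emphasis> StarPaste?
  <break time="300ms"/>
  <prosody rate="slow">dentist recommended</prosody>
</speak>"""

root = ET.fromstring(ssml)   # confirm the markup is well-formed XML
```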
[0057] Many people find it easier to specify their intent with successive small increments of information needed to fully realize the intent. Modal dialogs can facilitate such interactions. A modal dialog can be a sequence of human-machine interactions in which a machine interacts with its user in turns to prompt the user to provide information or take an action that somebody desires. For example, to produce an intent for sending a text message, a machine might successively ask its user to provide the name of the recipient, then the message. For another example, to calculate a monthly mortgage payment a machine might successively ask its user to provide: the home value, the percent of down payment, the interest rate, the loan term, and other information. It could frustrate some users if another interaction interrupts their modal dialog. Some embodiments only deliver an introductory message when a conversational interface is not engaged in a modal dialog. Some embodiments, upon realizing an opportunity to deliver an introductory message do so immediately after completion of a modal dialog.
[0058] Some embodiments, even outside of modal dialogs, can interrupt an original activity, initiate engagement, and require a response before continuing the original activity. For example, while using a machine capability of playing a game, some embodiments interrupt the game to deliver an introductory message in a casual style. Some such embodiments have a timeout, after which, if there has been no prospect engagement, the original activity resumes.
[0059] Various embodiments benefit advertisers by giving precise control over when ads are delivered so that they are more effective. Some embodiments deliver some ads not immediately, but at a time that the conversation indicates would be useful. For example, in a machine conversation about the weather, if the forecast is for rain, the machine delivers an ad for umbrellas only at a time when the prospect is preparing to go outside.
[0060] According to some embodiments, there are multiple conditions for ending an ad conversation. A timeout at the end of a period of no voice activity can indicate an end to the ad conversation. According to some embodiments, a timeout period is 5 seconds, though shorter or longer periods are appropriate for different embodiments. In some embodiments, if a machine’s turn in the ad conversation ended with a question, the timeout is longer than if the machine turn ended with a statement. Some embodiments, to avoid performing false speech recognition of background babble, require a prospect to use a wake phrase, such as “okay robot”, to provide a response. Some embodiments can use a click of a button on a screen to initiate responses.
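The timeout rules above can be sketched as follows. The 5-second base period comes from the text; the longer period used after a machine question is an assumed value for illustration.

```python
BASE_TIMEOUT_S = 5.0       # period given in the text
QUESTION_TIMEOUT_S = 8.0   # assumed longer period after a machine question

def ad_conversation_timed_out(last_machine_turn, silence_seconds):
    """End the ad conversation after a period of no voice activity,
    waiting longer when the machine's last turn ended with a question."""
    limit = (QUESTION_TIMEOUT_S if last_machine_turn.rstrip().endswith("?")
             else BASE_TIMEOUT_S)
    return silence_seconds >= limit
```

With these values, six seconds of silence ends the conversation after a machine statement but not after a machine question.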
[0061] According to some embodiments, an indication of a lack of interest ends an ad conversation. For example, an expression from a prospect that has a high score in a domain grammar other than the ad grammar can indicate a lack of interest in the ad. Some embodiments store in conversation state the fact that an ad conversation is in progress and, accordingly, give higher weight to ad grammars when computing scores relative to other domain grammars.
[0062] According to some embodiments, conversion ends a conversation. Ad units can define what constitutes conversion. For example, a request in a navigation domain to a store selling an item can be a conversion, a request to order a product or service online can be a conversion, and a request for a future follow-up can be a conversion. Some embodiments allow an ad unit to maintain a conversational engagement indefinitely, such as with dynamic small talk, if a prospect responds accordingly. In some such embodiments, a prospect ends the conversation by allowing the system to timeout or by making an expression that scores sufficiently highly in a domain other than the ad grammar.
[0063] Some embodiments consider ad units as domains. Many ad units can be simultaneously present for the natural language interpreter and compete with each other and non-ad domains for the highest interpretation score in response to each of a prospect’s expressions. Some such embodiments allow grammar expressions to give dynamic weights to different possible phrasings of intents.

[0064] Some embodiments use grammar-based interpreters. Some embodiments use purely statistical language model based interpreters.
[0065] Whereas clickable ads in web pages were more effective than print or television ads for visual modes of communication, a benefit of some embodiments of conversational ads is that they provide at least as much improvement in effectiveness for audio ads for people who are visually disabled and people who are visually otherwise engaged, such as by driving cars or piloting helicopters or space drones.
[0066] FIG. 1 shows a scenario of engagement. An anthropomorphic robot assistant natural language conversational publishing device 11, having an ad unit, provides an introductory message using built-in TTS software saying, “did you hear about the new StarPaste?”. A human prospect 12 hears the introductory message and engages by saying, “tell me about it”.
[0067] FIG. 2 shows a flowchart of the operation of delivering an interactive ad experience. A system comprises a database 21 of ad units, each having bids, content, and grammars. The process begins in step 22 by evaluating the bids of the ad units in database 21 and choosing the ad unit with the highest bid value. In some embodiments, the evaluation of bids happens offline, well prior to delivery of ad units. In some embodiments, bid evaluation happens in real time. The process continues in step 23 to deliver the introductory message of the ad unit chosen for its highest bid value. In step 24, after delivering the introductory message, the system receives a natural language expression from a prospect. The system proceeds in step 25 to interpret the natural language expression according to the grammar 26 from the ad unit chosen for its highest bid. If the expression matches the grammar 26, the process proceeds to step 27, in which the system performs the action indicated by the intent of the grammar 26 matched by the expression.

Conversation state
[0068] Ads are most effective when they are most relevant to what is in a prospect’s mind at the time the ad is delivered. There is no better case of relevance than for an ad to be delivered when its content matches a prospect’s immediate situation or immediate topic of conversation. These are the best indicators available today of what is in a prospect’s mind at any moment.
[0069] Some embodiments allow the introductory message of an ad to comprise content conditioned by context such as time, location, user profile, and conversation state. For example, an introductory message may begin by speaking, “good morning”, “good afternoon”, or “good evening” depending on the time of day. An introductory message may say the name of the nearest store that offers a product, where the name of the store varies by location, such as, “did you know that they have the new Star brand toothpaste at Wal-Mart™?” or, “did you know that they have the new Star brand toothpaste at Carrefour™?”. An introductory message may say the name of the prospect based on a user profile. For example, “hey, Maria, did you hear about the new Star brand toothpaste?” or, “hey, Mario, did you hear about the new Star brand toothpaste?”. An introductory message may refer to specific entities within the state of a recent conversation, such as, after a request for the score of the Eagles game, “before you go to the next Eagles game you should try the new Star brand toothpaste”, or, when the prospect is buying a new outfit, “before you go to tomorrow’s party, you should try the new Star brand toothpaste”.
[0070] Some embodiments that support conditional introductory message content allow the specification of the condition based on a range. For example, if the weather forecast is for temperatures between 20 and 30 degrees Celsius (68 and 86 degrees Fahrenheit), the message is, “check out the new Star Wear styles of activewear”; if the temperature forecast is greater than 30 degrees Celsius (86 degrees Fahrenheit), the message is, “check out the new Star Wear swimsuit lineup”; and if the temperature forecast is less than 20 degrees Celsius (68 degrees Fahrenheit), the message is, “check out the new Star Wear down parka collection”.
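The temperature-range conditioning of this example can be written out directly; the handling of the boundaries at exactly 20 and 30 degrees is an assumption, since the text does not specify it.

```python
# Range-conditioned introductory message content, per the Star Wear example.
def star_wear_message(forecast_celsius):
    if forecast_celsius > 30:
        return "check out the new Star Wear swimsuit lineup"
    if forecast_celsius >= 20:   # the 20-30 degree Celsius band
        return "check out the new Star Wear styles of activewear"
    return "check out the new Star Wear down parka collection"
```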
[0071] Some embodiments support ad bid values being conditional based on conversation state. Some embodiments support conditioning according to thresholds, and some support conditioning according to ranges. Some embodiments support ad bid values being conditional based on context other than conversation state.
[0072] In some embodiments, conversation state is a data structure that stores grammar-relevant variables and values. Some embodiments drop conversation state variables after a certain number of conversational dialog turns. Some embodiments include time stamps in conversation state and drop conversation state variables after a certain amount of time. Some embodiments use different numbers of dialog turns or amounts of time depending on the kind of variable or its value.
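A conversation state store that drops variables after a certain number of dialog turns can be sketched as below; the default limit of 5 turns and the class shape are assumed for illustration.

```python
# Sketch of a conversation state store with turn-based expiry of variables.
class ConversationState:
    def __init__(self, max_turns=5):
        self.max_turns = max_turns
        self.turn = 0
        self._vars = {}                     # name -> (value, turn when set)

    def set(self, name, value):
        self._vars[name] = (value, self.turn)

    def get(self, name, default=None):
        entry = self._vars.get(name)
        return entry[0] if entry is not None else default

    def next_turn(self):
        """Advance one dialog turn and expire stale variables."""
        self.turn += 1
        self._vars = {n: (v, t) for n, (v, t) in self._vars.items()
                      if self.turn - t < self.max_turns}
```

A time-stamp-based variant would store a timestamp instead of the turn counter and expire entries on elapsed time; per-variable limits could replace the single max_turns value.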
[0073] Some embodiments include one or more sentiment values associated with various expressions stored in conversation state. Some embodiments maintain a current mood variable representing the mood in the environment of the human-machine interface.
[0074] Some embodiments save the conversation state when initiating an ad experience and then, after ending an ad conversation, reinstate the conversation state from before it delivered the ad experience.
[0075] Some embodiments maintain conversation state variables on a server. Some embodiments maintain conversation state variables on a client device. Some embodiments transfer conversation state with each human expression and machine response sent between clients and servers.

[0076] Some embodiments share conversation state between different apps or different devices in order to provide a seamless experience for a prospect interacting with a single virtual agent through multiple devices.
[0077] FIG. 3 shows an example dialog between a prospect on the left and a conversational device on the right. First, the prospect asks, “what’s the score of the eagles game”. The device performs speech recognition and natural language interpretation according to a large array of domain grammars, some local and some on a remote server. The device recognizes “Eagles” as the name of a sports team and, based on that and the fact that the query uses the word “score”, interprets the query as one about sports. The device sends a request to a web API of a provider of live sports information, which identifies that the Eagles are currently in an active game against the Hawks. The provider responds through the API to the device that the score is “5-7”. The device gives a spoken natural language response to the prospect, “eagles 5 hawks 7”.
[0078] Next, the prospect instructs the device, “text what’s up to 555-1234”. The device interprets the expression, realizing a high score in an SMS domain. The device uses a call to an SMS web API provider and parses the expression to determine that “what’s up” is the message of the text and “555-1234” is the phone number to which the text should be sent. The web API provides a confirmation response to the device, which provides a natural language response, “message sent”, to the prospect.
[0079] The prospect proceeds to ask, “where can I get some toothpaste”. The device interprets this as scoring highly in a shopping domain, reads the location from a geolocation service software function, and sends the location information to a provider of maps indicating geolocations of retail establishments. The device proceeds to access APIs for successively farther retail establishments from the device geolocation until finding one, in this case Star-Mart, which responds positively that it has toothpaste in stock. The device proceeds to provide the natural language response, “there’s toothpaste at Star-Mart”.
[0080] Next, the prospect asks, “will it rain tomorrow”. The device interprets this as scoring highly in a weather domain, accesses a weather information provider API to get the local weather forecast, and replies, “no”. The prospect proceeds to ask, “what’s the capital of France”. The device interprets this as a question for Wikipedia, sends a request to the Wikipedia API for the capital city information in the Wikipedia article on France, and gets a response, “Paris”. The device provides the natural language response, “Paris is the capital of France”.
[0081] During the conversation, the device builds an array of conversation state variables and values. As a result of the first query, the system adds to conversation state a variable sports_team with value eagles and a variable sports_request with value score. That is the information that the system uses to access the sports web API to get a response. The response comprises two pieces of information that the device stores in conversation state: specifically, a variable opponent with value hawks and a variable score_value with value 5-7. As a result of the second interaction, the device adds to conversation state variables text_message = “what’s up” and text_number = 555-1234. The third interaction causes the device to store shopping_product = toothpaste, shopping_request = nearest_store, and shopping_response = star-mart. The fourth interaction causes the device to add to conversation state weather_request = will_rain, weather_time = tomorrow, and weather_response = no_rain. The fifth interaction causes the device to add to conversation state wikipedia_request = capital_city, wikipedia_article = France, and wikipedia_answer = Paris.
[0082] The device also maintains an array of bid functions for various ad units. The array includes three bid functions, one for each of an ad unit for StarShoes, an ad unit for StarPaste, and an ad unit for StarTheater. The ad bid functions describe an amount of money offered for delivering the ad unit. Some embodiments accept bids in units of money per 1000 deliveries of an ad unit. Some embodiments support, and tend to process, more complex ad bid functions than shown in FIG. 3.
[0083] Though the conversation state includes a variable shopping_product, because its value is not shoes, the StarShoes ad bid is for a 0.3 amount of money. Because conversation state includes a shopping_product variable and its value is toothpaste, the StarPaste ad bid is for a 0.9 amount of money. Although conversation state includes a wikipedia_request variable, because its value is not person_age, the StarTheater ad bid is for a 0.8 amount of money. Therefore, StarPaste is the ad unit with the highest bid function value.
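These three bid functions can be written out as below. Only the values that result from the FIG. 3 conversation state (0.3, 0.9, and 0.8) come from the text; the bid amounts on the branches that FIG. 3 does not exercise are assumptions.

```python
# Each bid function maps conversation state to an amount of money offered.
def star_shoes_bid(state):
    return 1.0 if state.get("shopping product") == "shoes" else 0.3

def star_paste_bid(state):
    return 0.9 if state.get("shopping product") == "toothpaste" else 0.2

def star_theater_bid(state):
    return 1.0 if state.get("wikipedia_request") == "person age" else 0.8

# The conversation state built during the FIG. 3 dialog (abridged).
state = {"shopping product": "toothpaste", "wikipedia_request": "capital_city"}

bids = {"StarShoes": star_shoes_bid(state),
        "StarPaste": star_paste_bid(state),
        "StarTheater": star_theater_bid(state)}
winner = max(bids, key=bids.get)   # StarPaste, at 0.9
```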
[0084] After a certain period of silence, the device delivers the introductory message of the StarPaste ad unit. The introductory message is TTS-generated natural language speech saying, “did you know that StarPaste is on sale at Star-Mart for just 2 bitcoins?”. This invites the prospect to engage with the ad.
[0085] FIG. 4 shows a process according to an embodiment. In a stage 41, a publisher receives and interprets expressions according to a grammar 42 and an array of conversation state variables 43. Accordingly, the publisher produces an intent. In stage 44, the publisher performs an action specified by the intent. The interpretation stage also produces references to specific entities, which the publisher stores in the array of conversation state variables 43. Meanwhile, a stage 45 receives a multiplicity of ad units. It analyzes the bids of the ad units in the context of the conversation state 43. Accordingly, the analyzing stage 45 chooses a highest-bidding ad unit and outputs its introductory message. The purpose is to engage a prospect such that the prospect makes an expression that matches the ad unit’s grammar.

Mood
[0086] Prospects, when encountering ads, assign mental feelings to advertised products or services according to the prospects’ feeling at the time of encountering the ad. The best instantaneous gauge of a prospect’s current feeling is emotion detection. Many emotion detection algorithms exist that use natural language, speech prosody, camera input, and other forms of sensors and algorithms to estimate people’s emotions. Whereas emotion is a property of a person, mood is an ambient property of interaction between people or between people and machines.
[0087] Some embodiments detect a mood and deliver ad unit introductory messages conditionally based on the mood. Some embodiments support ad content, such as the introductory message content, being conditioned by the mood. Some embodiments support ad bid values being conditioned by mood. Some embodiments support conditioning according to thresholds, and some support conditioning according to continuous values. Some embodiments reevaluate the mood frequently, and some embodiments reevaluate the mood at occasional intervals.
[0088] Using mood to condition delivery of ads allows advertisers to control mental feelings that prospects assign to advertised products and services.
Scripted conversations
[0089] The conversational ads of some embodiments guide prospects through a script. In some embodiments, scripts can have multiple conditional conversation paths, and the machine offers explicit choices. For example, the machine might ask, “would you like to know more about the health benefits, the flavor, or the brightness of StarPaste?”
[0090] Whereas a purpose of an introductory message is to achieve an engagement, a script can encourage prospects to convert the engagement into a purchase. Different prospects have different interests. By analyzing their natural language interactions, a scripted ad unit can classify the prospect. By classifying the prospect, the script is able to provide information that most effectively encourages the prospect to convert.
[0091] In some embodiments, the machine uses information gleaned from prospect responses to profile the instantaneous state of the prospect’s thoughts and feelings. For example, if the prospect speaks more quickly and loudly than usual, the ad might offer calming words. For another example, if the conversation state indicates recent discussion of money, the ad might offer information about a discount price.
[0092] It can be useful to advertisers and intermediaries to know how a prospect engages with an ad. Some embodiments send feedback from the human-machine interface to the publisher, intermediary, or advertiser at each step in the dialog script. In some embodiments, the feedback indicates the script location. In some embodiments, the feedback includes transcripts or audio recordings of the captured prospect expressions. Some embodiments include personally identifying information. Some embodiments charge advertisers or intermediaries extra money for extra information.
[0093] FIG. 5 shows an embodiment of an ad dialog script. It begins with an introductory message 51 telling the prospect, “did you hear about the new StarPaste?”. The interface proceeds to wait for a recognized expression from a prospect and match any received expression to an ad unit grammar. If an expression matches a grammar showing disinterest, such as one including the phrase “don’t care” or “shut up”, or if there is only silence, the script replies with “sorry”, ends the conversation, and the interface device sends feedback to that effect to the advertiser.
[0094] If, after the introductory message 51 , a prospect’s expression matches with a strong score in a domain other than the ad 53, the script ends the ad conversation, the interface sends feedback to that effect to the advertiser, and the interface proceeds with a conversation in the other domain.
[0095] If, after the introductory message 51, a prospect’s expression matches a grammar indicating engagement 54, such as “tell me more” or “what about it”, the script proceeds to offer a follow-up content item, which is a message, “it gives a healthier mouth and more sparkle”. The script calls for classifying the prospect as one who has a greater interest in health than in vanity, or the other way around. The message includes information (“healthier mouth”) that appeals to health-concerned prospects and information (“more sparkle”) that appeals to vanity-concerned prospects.
[0096] If a next prospect expression matches a grammar related to health, such as by saying, “healthy how?” or “does it have fluoride?”, in path 55 the script replies with a message to entice health-concerned prospects to move forward: “dentist recommended with double fluoride”. If a next prospect expression matches a grammar related to vanity, such as by saying, “brighter how?” or “what about breath?”, in path 56 the script replies with a message to entice vanity-concerned prospects to move forward: “number 1 in Hollywood for shiny fresh breath”.
[0097] After taking path 55 or path 56, the script proceeds to encourage the prospect to convert to a purchase with message 57, saying, “StarPaste is on sale now at Star-Mart for just 2 bitcoins”. At this point, the script ends the conversation and the interface sends corresponding feedback to the advertiser.
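The FIG. 5 script can be sketched as a small state machine. The state names are illustrative, the keyword checks are a crude stand-in for the ad unit grammars, and the transition conditions are simplified relative to the figure.

```python
# Each state holds (message to speak, transitions keyed by classification).
SCRIPT = {
    "intro": ("did you hear about the new StarPaste?",
              {"engaged": "follow_up"}),
    "follow_up": ("it gives a healthier mouth and more sparkle",
                  {"health": "health_pitch", "vanity": "vanity_pitch",
                   "engaged": "health_pitch"}),
    "health_pitch": ("dentist recommended with double fluoride",
                     {"engaged": "close"}),
    "vanity_pitch": ("number 1 in Hollywood for shiny fresh breath",
                     {"engaged": "close"}),
    "close": ("StarPaste is on sale now at Star-Mart for just 2 bitcoins", {}),
}

def classify(expression):
    """Naive prospect classification standing in for grammar matching."""
    text = expression.lower()
    if "don't care" in text or "shut up" in text:
        return "disinterest"
    if "health" in text or "fluoride" in text:
        return "health"
    if "bright" in text or "breath" in text:
        return "vanity"
    return "engaged"

def step(state, expression):
    """Return the next script state, ending on disinterest or no match."""
    _, transitions = SCRIPT[state]
    return transitions.get(classify(expression), "end")
```

A publisher driving this script would also send feedback reporting the script location reached at each step, as described above.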
[0098] Some embodiments support, and some ad units define, much more complex scripts with many conditional paths, multiple dimensions of prospect classification, and conditional, personalized message content.

Privacy
[0099] Some embodiments can deliver ads that would embarrass or shame a prospect if other people heard or saw the ad’s introductory message. For example, some ads may be for gifts for surprise parties, some ads may be for products that treat private medical conditions, and some ads may be for services that serve prurient interests.
[0100] Some embodiments allow for ads, or ad bids, to be conditioned by the presence of people other than the target prospect. Some embodiments do so with cameras and image processing to detect people. Some embodiments use audio voice discriminating algorithms to detect whether multiple speakers are present. Some embodiments detect the radio signal emissions from personal portable electronic devices such as Bluetooth, Wi-Fi, or cellular network signals from mobile phones.
[0101] Some embodiments that perform detection of the presence of others lower a privacy level variable when other people are present. Some such embodiments detect attributes of people present, such as using image processing or voice characterization to determine whether male or female people or adults or children are present.
[0102] Some embodiments raise a privacy level variable when the natural language human-machine interface is through a personal interface, such as a headphone, wearable device, or personal mobile handset.
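These privacy-level adjustments, combined with a per-ad-unit privacy requirement, can be sketched as below. The numeric scale and field name are assumptions made for illustration.

```python
# Sketch of a privacy level: lowered when other people are detected present,
# raised when the interface is a personal one (headphone, wearable, handset).
def current_privacy_level(others_present, personal_interface):
    if personal_interface:
        return 2   # personal interface: highest privacy
    if others_present:
        return 0   # others detected by camera, voice, or radio signals
    return 1       # baseline: no others detected on a shared interface

def ad_allowed(required_level, others_present, personal_interface):
    """Only deliver (or bid) an ad whose privacy requirement is met."""
    return current_privacy_level(others_present,
                                 personal_interface) >= required_level
```

For example, an ad for a private medical product declared with a requirement of 2 would be withheld from a shared household speaker but delivered through an earpiece.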
[0103] Some embodiments parse ad unit code to detect a privacy level requirement associated with the ad unit. They allow the ad to be delivered, and in some embodiments consider the ad in the bidding, only if the current privacy level is at or above the level defined in the ad unit code.

Intents and actions
[0104] Natural language understanding systems interpret expressions to determine intents. Intents are data structures that can represent actions that the speaker wishes a machine to perform. In some embodiments, interpreting is according to context-free grammar rules, such as ones created by system designers. In some embodiments, interpreting is by application of statistical language models to the words expressed. This can be done in various ways, such as by using trained neural networks.
[0105] According to some embodiments, interactive natural language ads comprise grammars with rules that define specific intents according to words expressed by prospects. For example, a grammar may recognize the words “tell me more” as an intent of requesting additional information, which means requesting the interface to perform an action of delivering more information about the product or service of the ad. A grammar may recognize the words “shut up” as an intent requesting the action of ending the ad conversation. A grammar for a mobile phone ad may recognize the words “how’s its battery life” as a request for specific information about the battery capacity and power consumption of the advertised phone. A grammar may recognize the words “get it” as an intent requesting an action of ordering the advertised product or service.

[0106] Many means can be used for executing actions defined by intents of natural language grammars. For example, a text message sending function provides means of sending text messages. An order placement function provides means of ordering the delivery of an advertised product or service. A mapping function provides means for giving prospects directions to a place for purchasing the advertised goods or services. A browser provides means of presenting web content, such as hypertext markup language (HTML) and cascading style sheets (CSS), about the advertised goods or services. A browser also provides means for executing certain kinds of scripts, such as ones written in the JavaScript language. Such scripts can cause computers to perform any action of which they are capable. An autonomous vehicle provides a means for transporting a prospect to a location to purchase the advertised goods or services. Various means for executing actions indicated by intents, as appropriate for different applications of embodiments, will be apparent to ordinarily skilled artisans.
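The phrase-to-intent mapping described in paragraph [0105] can be sketched as a lookup from fixed phrasings to action names. The specific phrases come from the source; the action names and the exact-match lookup are simplifying assumptions (real grammars support optional wordings and statistical interpretation).

```python
# Minimal sketch of a grammar mapping expressions to intent actions,
# per paragraph [0105]. Action names are assumed for illustration.
from typing import Optional

GRAMMAR = {
    "tell me more": "deliver_more_info",
    "shut up": "end_conversation",
    "how's its battery life": "report_battery_specs",
    "get it": "place_order",
}

def interpret(expression: str) -> Optional[str]:
    """Return the intent's action name, or None if no grammar rule matches."""
    return GRAMMAR.get(expression.strip().lower())
```

A matched intent would then be dispatched to one of the execution means listed in paragraph [0106], such as a text message sending or order placement function.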
[0107] Various embodiments, to the extent that they are able, execute the action indicated by the intent. Some embodiments, if unable to carry out the intended action, reply to the prospect, to the advertiser, or both, that they are unable to do so.
Bidding
[0108] According to some embodiments, inventory is the ad space/time that a publisher has available to sell. Many methods are known for ad bidding on available inventory.
[0109] FIG. 6 shows an embodiment in which advertisers 62a, 62b, 62c provide ad units, comprising bids, along with money to a publisher 61. The publisher provides desirable content along with ads to prospects 63a, 63b, 63c. Hopefully, ad conversion happens and prospects spend money on goods or services from the advertiser.
[0110] Systems of bidding can include numerous intermediaries and types of intermediaries between advertisers and publishers. FIG. 7 shows a system in which advertisers 72a, 72b, 72c provide ad units to a media buyer 75, who aggregates and packages ad units for distribution to multiple publishers and negotiates pricing. Typical media buyers are large ad agencies. The media buyer 75 provides ad units to a trading desk 76, which bids on an ad exchange 77. Large ad agencies tend to have trading desks. Ad exchange 77 matches ad unit bids with inventory and makes a market with programmatic auction pricing. AppNexus™ is an example of an ad exchange. In an alternate path, another advertiser 72d and the media buyer 75 provide ad units to an ad network 74. Google™ AdWords™ is an example of an ad network. The ad network facilitates automatic placement of ads within available inventory. A publisher 71 offers inventory on the exchange 77 and the ad network 74 to deliver to prospects 73a, 73b, 73c.
[0111] Some ad exchanges implement programmatic bidding, which provides high liquidity in the ads market. Programmatic bidding in various embodiments depends on user profile, present location, location history, present activity, time of day, time of week, time of year, types of ads recently delivered, types of information in conversation state, specific values of information in conversation state, mood, feelings, and neural activity patterns.
[0112] According to some embodiments, a publisher maintains conversation state for a conversational human-machine interface. The publisher analyzes bids for ads in relation to the conversational human-machine interface. A bid indicates conversation state variable values that are of interest to the advertiser, and the bid is influenced by at least one current conversation state variable having such a value.
[0113] Some embodiments support bids based on the presence of keywords in conversation state. Some embodiments support bids based on the presence of a particular domain of conversation in conversation state. For example, weather is a domain and sports is another domain. Some embodiments support bids based on the presence of a particular variable within conversation state. For example, sports_team is a variable. Some embodiments support bids based on specific values of variables. For example, “eagles” can be a value of the variable sports_team. Some embodiments support bids based on sets of possible values of variables, such as “eagles”, “hawks”, and “falcons” as values of the variable sports_team. Some embodiments support bids based on ranges of values of numeric variables, such as a temperature variable with a value between 20 and 30 degrees Celsius (68 to 86 degrees Fahrenheit). Some embodiments support bids based on combinations of the above criteria. Some embodiments support specifying bids as programmatic equations of any of the presence of types of variables, specific values, sets of values, and ranges. Such equations determine a degree of match between a bid and conversation state. Equations can also express negative influences on bid value based on the factors described above.
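The matching criteria enumerated in paragraph [0113] can be sketched as a scoring function over a bid specification and conversation state. The field names, uniform weights, and additive scoring are illustrative assumptions; the source allows arbitrary programmatic equations, including negative influences.

```python
# Sketch of evaluating a bid against conversation state, covering the
# criteria in paragraph [0113]: keywords, domain, accepted sets of
# variable values, and numeric ranges. Field names are assumptions.
def bid_match_score(bid: dict, state: dict) -> float:
    """Return a degree of match between a bid and conversation state."""
    score = 0.0
    # presence of bid keywords among recent conversation keywords
    score += sum(1.0 for kw in bid.get("keywords", [])
                 if kw in state.get("keywords", []))
    # presence of the current conversation domain in the bid's domain list
    if state.get("domain") in bid.get("domains", []):
        score += 1.0
    # variables whose current value is in a bid-specified set of values
    for var, accepted in bid.get("variable_values", {}).items():
        if state.get(var) in accepted:
            score += 1.0
    # numeric variables whose current value falls in a bid-specified range
    for var, (lo, hi) in bid.get("variable_ranges", {}).items():
        value = state.get(var)
        if value is not None and lo <= value <= hi:
            score += 1.0
    return score
```

A real-time auction would multiply such a degree-of-match score into the monetary bid value, as the example ad unit code listing below does with its domain, keyword, and meta-information weights.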
[0114] Some embodiments support bids based on mood. Such embodiments detect a current mood value at the human-machine interface and evaluate bids in the context of the mood value. Some embodiments receive an externally computed mood value and use the value to influence the computed bid value.
[0115] Some embodiments support bids conditioned by whether a prospect is in a private listening environment. Such embodiments detect the environment and compute bid values accordingly. Some such embodiments determine the listening environment by detecting the presence of other people.
[0116] Some embodiments perform voice activity detection at the human- machine interface and only output an ad introductory message when there is no voice activity. Some embodiments wait for a certain amount of time after detecting no voice activity before outputting an ad introductory message.
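The silence-gating behavior of paragraph [0116] amounts to a simple timing check. The quiet-interval length is an illustrative assumption; embodiments choose their own wait times.

```python
# Sketch of gating an ad introductory message on voice inactivity,
# per paragraph [0116]. The 2.0-second quiet interval is an assumed
# example value, not specified by the source.
def may_play_intro(now: float, last_voice_activity: float,
                   quiet_seconds: float = 2.0) -> bool:
    """True once no voice activity has been detected for quiet_seconds."""
    return (now - last_voice_activity) >= quiet_seconds
```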
[0117] Some embodiments support bids based on a specific intent of a natural language expression. Some embodiments support bids based on a specific action triggered by one or more intents.
[0118] Some embodiments charge advertisers for their bids only when engagement occurs. Some embodiments charge advertisers only when a conversion occurs. Some embodiments charge advertisers based on how many interactions a prospect has with the ad before conversion or disengagement. Some embodiments charge advertisers based on what kinds of interactions a prospect has, such as whether or not the prospect makes a price request or whether or not the prospect expresses a negative sentiment.
[0119] Some embodiments are methods of defining bids and methods of analyzing bids. Some embodiments are systems that perform the analysis of bids. Some embodiments are non-transitory computer readable media that store bids or bid functions.
Feedback and analytics
[0120] Knowing the present state of mind of a prospect, the exact topic in mind, and the specific objects of a prospect’s thoughts is extremely valuable for offline analysis by advertisers and analytics companies. However, knowing such information in real-time is much more valuable.
[0121] It is only with difficulty, complexity, and low accuracy that ad feedback based on clicks can determine a prospect’s state of mind, topic, and objects of thought. Natural language processing of a prospect’s communications, such as by processing the content of email messages, is useful for determining state of mind, topic, and objects of thought. However, it is not real-time.
[0122] Some embodiments, by processing natural language conversations in real-time, extract extremely valuable information for ad targeting. Such information is useful when applied to real-time bidding algorithms for conversational ad units. It is also useful to advertisers if fed back from human-machine interactions.
[0123] Some embodiments feed information back for real-time bids by interpreting natural language expressions from prospects to create intent data structures. They proceed to analyze the intent data structures to derive analytical results in real-time. Then they provide the analytical results to real-time bids of advertisers.

[0124] Some embodiments provide feedback, such as whether engagement or conversion happened, on a per-expression basis. Some embodiments provide feedback on a per-ad conversation basis.
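The interpret-analyze-provide pipeline of paragraph [0123] can be sketched as a small function wiring the three stages together. The callables and the dictionary shapes are illustrative assumptions.

```python
# Sketch of the real-time feedback loop in paragraph [0123]:
# interpret an expression into an intent data structure, analyze it,
# and distribute the analytical result to real-time bidders.
# All parameter names are assumed for illustration.
def feed_back(expression, interpret, analyze, bidders):
    """Run one expression through the feedback pipeline."""
    intent = interpret(expression)    # intent data structure
    result = analyze(intent)          # analytical result, derived in real-time
    for bidder in bidders:
        bidder(result)                # influences each advertiser's bid
    return result
```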
[0125] In some embodiments, the analytical results comprise an indication of an intent. Some embodiments, with the analytical results, feed back an indication of the domain of conversation prior to the engagement. Some embodiments, with the analytical results, feed back an indication of the domain of conversation immediately following the ad interaction.
[0126] Some embodiments feed personally identifying information back to advertisers. Some embodiments specifically hide personally identifying information. One reason to do so is to ensure privacy of system users. Another use of personally identifying information is to sell it to advertisers for a higher payment than for anonymized prospect interaction analysis (analytics).
[0127] Some embodiments use feedback analytics about topics, intents, semantic information, mood/emotions, questions asked, engagement rate, other engagement metrics, conversion rate of ads and categories of ads, reach, deliveries, amount spent, and other conversion metrics. Analytics can include which/how many ads, of an array of competing ad bids, matched a condition for ad unit delivery. This improves the ability to determine timing of delivering ads.
[0128] Procter & Gamble™ defines the first moment of truth as the time when a prospect purchases a product and the second moment of truth as the time when the prospect first uses the product. Google™ defines the zero moment of truth as the time when a prospect first searches for information about a type of product. Some key query phrases indicate a zero moment of truth, such as “show me ...” or “where can I get ...”. Some intents derived from natural language interpretations indicate a zero moment of truth. Numerous language patterns can match grammars that create intents related to research about products or services.
[0129] Some embodiments identify ad opportunities by monitoring natural language interactions at human-machine interfaces. They interpret natural language expressions to identify zero moment of truth intents. At such time, such embodiments identify a type of product or service referenced in the query or in conversation state, then upon identifying a referenced product or service, alert an ad bid for the product or service and raise the price of ad delivery because of its timely relevance. Some such embodiments proceed to deliver the ad unit identified by the bid. Some such embodiments deliver an introductory message for an ad unit for the product or service. Some such embodiments proceed to receive follow-up expressions from a prospect and match them to a grammar.
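The zero-moment-of-truth flow in paragraph [0129] can be sketched as prefix matching on the key query phrases from paragraph [0128], followed by a price premium. The prefix list, extraction method, and 1.5× premium are illustrative assumptions.

```python
# Sketch of zero-moment-of-truth detection per paragraph [0129]:
# match research-style phrasings, extract the referenced product type,
# and raise the ad delivery price for timely relevance.
# The premium factor is an assumed example value.
ZMOT_PREFIXES = ("show me ", "where can i get ")

def zmot_opportunity(query: str, base_price: float, premium: float = 1.5):
    """Return (product_type, price); product_type is None if no ZMOT intent."""
    q = query.strip().lower()
    for prefix in ZMOT_PREFIXES:
        if q.startswith(prefix):
            product = q[len(prefix):]          # referenced product or service
            return product, base_price * premium
    return None, base_price
```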
[0130] FIG. 8 shows a timeline of information flow between an advertiser, an intermediary, a publisher, and a prospect according to an ad conversation scenario of an embodiment. At time t0, before the conversation begins, an advertiser provides an ad unit to an intermediary. The intermediary prices the ad unit based on its bid in an auctioning process.
[0131] At time t1, which can be much later than time t0, the publisher identifies the ad unit with the winning bid. The publisher proceeds, at time t2, to provide meta information to the intermediary and the advertiser. The meta information can comprise an identification of the winning ad; the conversation state, most recent conversation domain, and mood at the time of choosing the ad; the geolocation and user profile of the prospect; and dynamic pricing information, such as whether there is a zero moment of truth ad delivery premium.

[0132] Shortly thereafter, at time t3, the publisher delivers the introductory message of the ad unit. In this scenario, the prospect provides an engagement spoken expression, also known, using browser-ad terminology, as a spoken click (slick). The slick requests specific information on the pricing of the advertised product. At time t5, the publisher sends a web API request to the advertiser for the pricing information. At time t6, the advertiser provides a web API response to the publisher with marked-up text for a TTS response. At time t7, the publisher uses TTS software built into its human-machine interface device to synthesize speech in the machine’s normal voice and outputs the synthesized speech audio through a speaker.
[0133] At time t8, the prospect provides a second slick, asking to place an order. At time t9, the publisher makes a web API request to an intermediary for order fulfillment. The order fulfillment intermediary processes the order for delivery and, at time t10, provides a web API response to the publisher. At time t11, the publisher performs an action of emailing a receipt and provides a TTS response to the prospect confirming the order.
[0134] At time t12 the publisher reports the conversion to the advertiser for use in analytics, and the publisher outputs a thanking message as TTS speech audio.
[0135] Various embodiments have various arrangements of a human-machine interface device and one or more connected servers for publisher, advertiser, and intermediary roles. In one embodiment, a single device communicates with an ad server that comprises at least one ad unit.
[0136] In another embodiment, advertisers have multiple divisions in different offices, some having programmers that develop grammars, some having artists that develop ad content, and some having ad departments that combine grammars and content into ad units and combine ad units into campaigns. The advertiser passes the ads to an ad agency, which stores ads on database servers, where they are accessed by media buyers who interact with other intermediaries. The media buyers deliver ads and bids to ad networks and to third-party trading desks, which manage supply-side networks. The trading desks deliver ad bids to third-party exchanges, which match bids to inventory. Ad units that win bids are sent to publishers with servers distributed geographically for low-latency access to prospects through devices. Ad units are stored locally on different types of devices designed by different consumer product manufacturers, and the ad unit content is delivered to prospects under the control of the publisher.
[0137] FIG. 9 shows an embodiment. A prospect 91 communicates with a publisher device 92, which comprises a speaker 93 that delivers audio content to the prospect 91. The publisher device 92 further comprises a microphone 94, which receives audio that includes natural language speech from the prospect 91.
[0138] The publisher device 92 communicates, through network 95, with ad server 96. The ad server 96 delivers ad units and executes ad bid functions to select the most valuable ad units to deliver.
[0139] FIG. 10A shows a non-transitory computer readable rotating disk medium 101 that stores computer code that, if executed by a computer processor, would cause the computer processor to perform methods or partial method steps described herein.
[0140] FIG. 10B shows a non-transitory computer readable Flash random access memory (RAM) chip medium 102 that stores computer code that, if executed by a computer processor, would cause the computer processor to perform methods or partial method steps described herein.
[0141] FIG. 10C shows the bottom (solder ball) side of a packaged system-on- chip (SoC) 103 comprising multiple computer processor cores that comprises a component of some embodiments and that, by executing computer code, perform methods or partial method steps described herein. FIG. 10D shows the top side of the SoC 103.
[0142] FIG. 11 shows a rack-based server system 111, used as a component of various embodiments. Such servers are useful as advertiser servers, publisher servers, and servers for various intermediary functions.
[0143] FIG. 12 shows a block diagram of the cores within the system-on-chip 103. It comprises a multi-core computer processor (CPU) 121 and a multi-core graphics accelerator processor (GPU) 122. The CPU 121 and GPU 122 are connected through a network-on-chip 123 to a DRAM interface 124 and a Flash RAM interface 125. A display interface 126 controls a display, enabling the system to output Moving Picture Experts Group (MPEG) video and Joint Photographic Experts Group (JPEG) still image ad content. An I/O interface 127 provides for speaker and microphone access for the human-machine interface of a device controlled by SoC 103. A network interface 128 provides access for the device to communicate with servers over the internet.
[0144] FIG. 13 shows an embodiment of the server 111. A multiprocessor CPU array 131 and a GPU array 132 connect through a board-level interconnect 133 to a DRAM subsystem 134 that stores computer code and a network interface 135 that provides internet access to other servers or publisher devices.
[0145] Various embodiments of devices can be used to publish interactive natural language ads. Some are mundane, and some are exciting.
[0146] FIG. 14A illustrates components of the exciting anthropomorphic robot assistant natural language conversational publishing device 141. It comprises a speaker 142 on each side of the device in order to output audio. The device comprises a microphone array 143, which comprises several microelectromechanical system (MEMS) microphones, physically arranged to receive sound with different amounts of delay. The device comprises an internal processor that runs software that performs digital signal processing (DSP) to use the microphone array 143 to detect the direction of detected speech. The device 141 further comprises a module 144 with two cameras to provide stereoscopic image and video capture. Further DSP software runs neural network-based object recognition on models trained on human forms in order to detect the location and relative orientation of one or more prospects. The device 141 further comprises a display screen 145 that, for some ad units, outputs visual ad content such as JPEG still images and MPEG video streams. The device 141 further comprises a wheel 146a and a wheel 146b, each of which can turn independently or in unison. By turning in unison, the device is able to move, such as to follow a prospect around. By turning independently, the device is able to turn, such as to face and monitor the movement and activity of a prospect. The device 141 further comprises a power switch 147, which a prospect can use to shut the device up if it becomes annoying.
[0147] FIG. 14B shows an embodiment of a home virtual assistant and music playing device 148. FIG. 14C shows an embodiment of a Bluetooth-enabled earpiece device 149. FIG. 14D shows an embodiment of a mundane mobile phone 1410. FIG. 14E shows an embodiment of an automobile 1411.
[0148] Some embodiments, such as the home virtual assistant 148, give little privacy to their users. Some embodiments, such as the earpiece 149, give great privacy to prospects. Some embodiments, such as the mobile phone 1410, have visual display screens. Some embodiments are screenless, such as the earpiece 149, which has no display screen. Some embodiments, such as the home virtual assistant 148, are stationary. Some embodiments, such as the automobile 1411 and the mobile phone 1410, are mobile.

[0149] Some elements in the flow of ads may be present in different countries, though the functioning of the methods and systems and computer-readable media of ad delivery constitute full embodiments. In other words, passing ad units or their components through servers in different countries does not avoid direct infringement of claimed methods, systems, and computer readable media.
Code examples
[0150] In the example ad unit code listing section below is a code listing for an example ad unit. It uses a specialized programming language with syntax similar to that of C. Lines 3-55 describe a bid function. The bid value assigned in lines 40-53 is conditioned on whether the ad is blocked for specific user IDs. The bid value is further conditioned on the privacy level being personal, as opposed to shared or public. The bid value is further conditioned by a mood value being above a threshold. The bid value is further positively affected by the presence in conversation state of specific recent domains, specified in lines 6-8. The bid value is negatively affected by the presence of certain domains in conversation state, specified in lines 9-13. The bid is further positively and negatively affected by the presence of keywords in recent conversation state, the keywords specified in lines 15-27. The bid is further positively and negatively affected by the presence of certain meta information descriptive of the present prospect, environment, and machine state. The degrees of effect on the bid value of the domains, keywords, and meta information are scaled by factors of 2.0, 1.5, and 2.5, respectively.
[0151] Lines 61-68 describe an introductory message. It includes a reference to an image for display, an audio clip for output, and some text to be spoken, marked up with word emphasis.

[0152] The ad unit is restricted to non-offensive content in line 71. The ad unit is assigned to provide a high level of information reporting from publisher to advertiser in line 79. The ad is configured to listen for prospect responses for 10.0 seconds before considering the conversation to have ended without an engagement in line 82.
[0153] Lines 85-96 define a grammar, including certain phrasings with optional wordings. The grammar intents are function calls to functions defined outside of the ad unit or defined below within the ad unit.
[0154] Lines 99-119 specify content elements, including video clips, animations, images, video streams, and TTS text with mark-up for conditional content, such as finding the nearest city to a current device latitude and longitude.
[0155] Lines 122-167 define custom functions called by the grammar intents. Lines 123-132 specify a function for delivering a sequence of additional information content in response to prospect requests for more information. Lines 134-137 specify a function for sending a text message. Lines 139-142 specify a function for sending an email message. Lines 144-147 specify a function for looking up the price of a particular stock keeping unit (SKU) number for the advertised product. Lines 149-152 specify a function for displaying an image of the product if the publishing device has a display screen. Lines 154-164 specify a function for finding the nearest store with the SKU available, the price of the product at the nearest store, and whether the product is currently on sale.
[0156] Lines 165-167 specify a function for setting an indication, per user ID, of whether the prospect indicated a non-interest in the product. If so, the ad will not be delivered in the future for that user ID.
[0157] The code shown in the example below is illustrative of just some capabilities of some embodiments. Sophisticated advertisers will create ad units with more code, more sophisticated bidding, more complex and varied grammars, more dependencies, and more and subtler content. Some embodiments will provide other system functions available for use in ad units and support more complex code constructs.
[0158] Some embodiments provide simple templates for less sophisticated advertisers. For example, an embodiment provides for an SKU and TTS strings and automatically provides capabilities for looking up pricing and locations, displaying web content, and providing grammars for answering product-general queries. An ad definition for a simple template looks like the following.
[0159] BID = 1.5
[0160] SKU = 9781542841443
[0161] INTRO = “did you hear about the new StarPaste?”
[0162] MORE = “it gives a healthier mouth and more sparkle”
[0163] The system provides the grammar, including responding to “tell me more” with TTS of the MORE text and pricing lookup using the SKU value.
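The template expansion described in paragraph [0163] can be sketched as a function that turns the four template fields into an ad unit with a system-provided grammar. The returned dictionary shape and the price_lookup helper are illustrative assumptions.

```python
# Sketch of expanding the simple ad template of paragraphs [0159]-[0162]
# into a full ad unit, per paragraph [0163]: the system supplies the
# grammar, answering "tell me more" with the MORE text and price queries
# via SKU lookup. Helper names and structure are assumed.
def expand_template(template: dict, price_lookup) -> dict:
    """Build an ad unit from a simple BID/SKU/INTRO/MORE template."""
    sku = template["SKU"]
    return {
        "bid": template["BID"],
        "intro": template["INTRO"],
        "grammar": {
            "tell me more": lambda: template["MORE"],       # TTS of MORE text
            "how much is it": lambda: price_lookup(sku),    # pricing by SKU
        },
    }
```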
Example ad unit code listing
001 ad StarPaste {
002     float PPKI = 0.25; // default price per thousand impressions
003     bid {
004         float mood_min = 0.5; // lower threshold for mood
005         // domain of conversation weighting
006         domain cntxt_domain_pos = {"health",
007             "grocery",
008             "dating"};
009         domain cntxt_domain_neg = {"math",
010             "geography",
011             "MoonPaste", // a competing product ad
012             "MoonPaste", // repetition boosts weight
013             "MoonPaste"};
014         // keyword weighting
015         words cntxt_words_pos = {"smell",
016             "breath",
017             "breath", // repetition boosts weight
018             "breath",
019             "fresh",
020             "date",
021             "night",
022             "sleep",
023             "morning"};
024         words cntxt_words_neg = {"directions",
025             "text",
026             "call",
027             "problem"};
028         // meta information weighting
029         meta cntxt_meta_pos = {"location=US",
030             "gender=female",
031             "age>16",
032             "environment=home",
033             "environment=retail-grocery",
034             "people_present<4",
035             "male_present=FALSE"};
036         meta cntxt_meta_neg = {"time=9to5",
037             "environment=restaurant",
038             "male_present=TRUE"};
039         // don't give ad to prospects in a bad mood
040         float bid_val =
041             check_dont_show(user_id()) ? 0 :
042             (privacy_level < PERSONAL) ? 0 :
043             (mood < mood_min) ? 0 :
044             PPKI * mood
045             // only give ad if recent conversation domains include
046             // more positive than negative domains
047             * 2.0 * max(count(cntxt_domain_pos) - count(cntxt_domain_neg), 0)
048             // only give ad if recent keywords include
049             // more positive than negative keywords
050             * 1.5 * max(count(cntxt_words_pos) - count(cntxt_words_neg), 0)
051             // only give ad if recent meta information includes
052             // more positive than negative meta information
053             * 2.5 * max(count(cntxt_meta_pos) - count(cntxt_meta_neg), 0);
054         return bid_val;
055     }
056
057     // introductory message
058     // use image if human-machine interface has a screen
059     // use tts if human-machine interface has TTS service
060     // use audio if tts service is unavailable
061     intro {
062         image = "./media/StarPaste.jpg"; // ref to image content
063         // for devices with audio out but no TTS ability
064         audio = "./media/jingle.mp3";
065         // system outputs speech from the global tts string variable
066         tts = "have you heard about the <emphasis level="strong">new
067             </emphasis> Star brand StarPaste?";
068     }
069
070     // global setting to block ads with offensive content or grammars
071     non_offensive_restriction = TRUE;
072
073     // global setting to have system report to the advertiser:
074     // every conversation expression (ones that hit the grammar and an
075     // immediately following one if it hits no other grammars)
076     // along with: user ID, time stamp, location, language/accent,
077     // people present (specific ID if detected or number and
078     // characteristics), mood, conversation state
079     reporting_level = HIGH;
080
081     // global setting for seconds until timeout if no response
082     conversation_timeout = 10.0;
083
084     // the grammar
085     grammar {
086         "tell me more" => next_more();
087         "[can you|please] text [me] a link [to me]" => text_url();
088         "[can you|please] send [me] a link [to me]" => email_url();
089         "how much is it" => say_cost();
090         "[how much | what] does it cost" => say_cost();
091         "show me" => show_product();
092         "where can I [buy | get] [it | some]" => where_to_buy();
093         "i don't care" => set_dont_care();
094         "dont tell me about it" => set_dont_care();
095         "i hate [it | that]" => set_dont_care();
096     }
097
098     // content objects defined for screen and screenless devices
099     content more1 {
100         tts = "it's made with ground unicorn horn for a magic sparkle";
101         video = "./media/sparkle.mpg"; // ref to video content
102     }
103     content more2 {
104         tts = "use it today and get a date by tonight, guaranteed";
105         flash = "./media/date.swf"; // ref to Flash animation
106     }
107     content more3 {
108         tts = "it's <nearest_city(lat_long())>'s silkiest tooth paste";
109         image = "./media/lingerie.gif"; // ref to gif animation of silk
110     }
111     content more4 {
112         tts = rand(
113             "why don't you add it to your cart",
114             "ask your shopping assistant for express delivery",
115             "it's available everywhere that fine toiletries are sold",
116             );
117         // ref to live stream of a celebrity spokesperson
118         stream = "http://www.starpastes.com/celebrity.strm";
119     }
120
121     // grammars can call any ad-specific functions
122     int more_idx = 0;
123     function next_more {
124         if (more_idx <= 4) more_idx++;
125         case (more_idx) {
126             // the deliver function handles output of content objects
127             1: deliver(more1);
128             2: deliver(more2);
129             3: deliver(more3);
130             4: deliver(more4);
131         }
132     }
133     // send SMS
134     function text_url {
135         send_text("http://www.starpastes.com/robo_ad.html");
136         tts = "here you go";
137     }
138     // send email
139     function email_url {
140         send_email("http://www.starpastes.com/robo_ad.html");
141         tts = "done";
142     }
143     // speak cost
144     function say_cost {
145         float price = get_price("9781542841443"); // StarPaste SKU
146         tts = "it's only <price>";
147     }
148     // display product on screen
149     function show_product {
150         if (DISPLAY_SCREEN == TRUE)
151             image = "./media/StarPaste.jpg";
152     }
153     // say where to buy the product, price, and if it's on sale
154     function where_to_buy {
155         // find nearest store with item in stock
156         int store_id = nearest_stock("9781542841443");
157         // speak the store name and price using the price_check function
158         tts = "it's available at <store_name(store_id)> for just
159             <price_check(store_id, "9781542841443")>";
160         // check whether the item is on sale using the store_sales function
161         if (store_sales(store_id, "9781542841443"))
162             tts = "it's on sale at <store_name(store_id)>
163                 <emphasis level="strong">today</emphasis>";
164     }
165     function set_dont_care {
166         set_dont_show(user_id());
167     }
168 }

Claims

What is claimed is:
1. A non-transitory computer readable medium storing ad units, the ad units comprising:
an introductory message; and
a definition of a natural language grammar associated with each of the ad units, the natural language grammar comprising at least one intent,
wherein, if an ad server system interprets an expression, from a prospect, matched by the grammar, after starting delivery of the introductory message, the ad server system would determine the intent.
2. A method comprising specifying a bid function for an ad unit wherein the bid function, if executed by a system storing values of conversation state variables representing a human-machine conversation, would cause the computation of a bid value that depends on the stored values of the conversation state variables.
3. The method of claim 2 wherein the conversation state variable represents one or more keywords.
4. The method of claim 2 wherein the conversation state variable represents a domain.
5. The method of claim 2 wherein the bid value further depends on the result of a programmatic equation.
6. The method of claim 2 wherein the computation of the bid value would depend on the value of a mood variable.
7. The method of claim 2 wherein the bid function is conditioned by an indication of a desire for privacy.
8. A non-transitory computer readable medium storing code that defines a real-time bid wherein the bid depends on an intent of a natural language expression.
9. A non-transitory computer readable medium storing code that defines a real-time bid wherein the bid depends on a domain of conversation.
10. A method of feeding back information for a real-time bid, the method comprising: interpreting a natural language expression from a prospect to create an intent data structure;
analyzing the intent data structure to derive analytical results in real-time; and providing the analytical results to a real-time bid of an advertiser.
11. The method of claim 10 wherein the analytical results comprise an indication of an intent.
12. The method of claim 10 wherein the analytical results comprise an indication of the domain of conversation prior to the engagement.
13. The method of claim 10 wherein the analytical results comprise an indication of the domain of conversation immediately following the ad interaction.
14. The method of claim 10 wherein the analytical results include personally identifying information.
15. The method of claim 10 wherein the analysis hides personally identifying information.
16. A method of identifying an ad opportunity comprising:
monitoring natural language interactions at a human-machine interface;
interpreting a natural language expression as a query that identifies a zero moment of truth intent;
identifying a type of product or service referenced in the query; and
alerting an ad bid.
17. The method of claim 16 further comprising delivering an ad unit identified by the ad bid.
18. The method of claim 16 further comprising delivering an introductory message associated with the ad bid.
19. The method of claim 18, further comprising receiving a follow-up that matches a grammar.
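Claims 2 through 7 describe a bid function whose value depends on stored conversation state variables. A minimal sketch of such a function, with hypothetical state-variable names (`keywords`, `domain`, `mood`, `privacy_requested`) and arbitrary weights chosen only for illustration:

```python
# Illustrative bid function over conversation state, in the spirit of
# claims 2-7; every name and weight here is hypothetical.
def compute_bid(state, base_bid=1.0):
    """Return a bid value computed from stored conversation state."""
    if state.get("privacy_requested"):
        return 0.0                                # claim 7: suppress on desire for privacy
    bid = base_bid
    if "coffee" in state.get("keywords", []):
        bid *= 2.0                                # claim 3: keyword match raises the bid
    if state.get("domain") == "shopping":
        bid *= 1.5                                # claim 4: domain of conversation
    bid *= 1.0 + 0.2 * state.get("mood", 0.0)     # claim 6: mood variable
    return round(bid, 4)                          # claim 5: a programmatic equation
```

An ad server holding conversation state could evaluate such a function at auction time, e.g. `compute_bid({"keywords": ["coffee"], "domain": "shopping", "mood": 0.5})` yields a bid scaled up by the keyword, domain, and mood factors.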
PCT/US2017/068211 2017-12-22 2017-12-22 Natural language grammars adapted for interactive experiences WO2019125486A1 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
PCT/US2017/068211 WO2019125486A1 (en) 2017-12-22 2017-12-22 Natural language grammars adapted for interactive experiences
JP2018230121A JP7178248B2 (en) 2017-12-22 2018-12-07 Natural Language Grammar Adapted for Interactive Experiences
EP18214971.6A EP3502923A1 (en) 2017-12-22 2018-12-20 Natural language grammars adapted for interactive experiences
CN201811572622.6A CN110110317A (en) 2017-12-22 2018-12-21 Natural language grammars adapted for interactive experiences
JP2020036823A JP7129439B2 (en) 2017-12-22 2020-03-04 Natural Language Grammar Adapted for Interactive Experiences
JP2021093124A JP7525445B2 (en) 2017-12-22 2021-06-02 Method, program and computer for delivering context-aware introductory messages
JP2024114807A JP2024147731A (en) 2017-12-22 2024-07-18 Programs and methods

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2017/068211 WO2019125486A1 (en) 2017-12-22 2017-12-22 Natural language grammars adapted for interactive experiences

Publications (1)

Publication Number Publication Date
WO2019125486A1 true WO2019125486A1 (en) 2019-06-27

Family

ID=66993708

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/068211 WO2019125486A1 (en) 2017-12-22 2017-12-22 Natural language grammars adapted for interactive experiences

Country Status (3)

Country Link
JP (4) JP7178248B2 (en)
CN (1) CN110110317A (en)
WO (1) WO2019125486A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11900928B2 (en) 2017-12-23 2024-02-13 Soundhound Ai Ip, Llc System and method for adapted interactive experiences

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
CN110517096A (en) * 2019-08-30 2019-11-29 百度在线网络技术(北京)有限公司 Content method for implantation, device, electronic equipment and storage medium
JP7015291B2 (en) * 2019-11-19 2022-02-02 株式会社ラストワンマイル Information processing equipment
JP7314442B2 (en) * 2020-07-17 2023-07-26 株式会社三鷹ホールディングス point signage business system
WO2022108073A1 (en) * 2020-11-20 2022-05-27 삼성전자주식회사 Server and control method therefor

Citations (7)

Publication number Priority date Publication date Assignee Title
KR20090072599A (en) * 2007-12-28 2009-07-02 (주)다음소프트 Method and apparatus for advertisement by chatting robot used in messenger
US20120016678A1 (en) * 2010-01-18 2012-01-19 Apple Inc. Intelligent Automated Assistant
KR20130112221A (en) * 2012-04-03 2013-10-14 주식회사 로보플래닛 System and method for providing conversation service connected with advertisements and contents using robot
KR20160023935A (en) * 2014-08-20 2016-03-04 한국해양대학교 산학협력단 Apparatus and method for providing advertisement by using vending machine
KR20170027705A (en) * 2014-04-17 2017-03-10 소프트뱅크 로보틱스 유럽 Methods and systems of handling a dialog with a robot
US20170221157A1 (en) * 2014-05-06 2017-08-03 International Business Machines Corporation Real-time social group based bidding system
JP2017220077A (en) * 2016-06-09 2017-12-14 真由美 稲場 Program for obtaining function for supporting communication by understanding personality and likings of other party

Family Cites Families (20)

Publication number Priority date Publication date Assignee Title
WO2000021232A2 (en) * 1998-10-02 2000-04-13 International Business Machines Corporation Conversational browser and conversational systems
JP2004234492A (en) 2003-01-31 2004-08-19 Nec Software Tohoku Ltd Chat system and advertisement providing method
US20070038737A1 (en) * 2005-07-19 2007-02-15 International Business Machines Corporation System and method for networking educational equipment
JP2009524157A (en) * 2006-01-23 2009-06-25 チャチャ サーチ,インコーポレイテッド Target mobile device advertisement
DE102006036338A1 (en) * 2006-08-03 2008-02-07 Siemens Ag Method for generating a context-based speech dialog output in a speech dialogue system
US8270580B2 (en) * 2008-04-01 2012-09-18 Microsoft Corporation Interactive voice advertisement exchange
CN101651699B (en) * 2008-08-15 2014-02-19 华为技术有限公司 Method, device and system for acquiring advertisement content and launching advertisement services
US11012732B2 (en) * 2009-06-25 2021-05-18 DISH Technologies L.L.C. Voice enabled media presentation systems and methods
US8275384B2 (en) * 2010-03-20 2012-09-25 International Business Machines Corporation Social recommender system for generating dialogues based on similar prior dialogues from a group of users
US20120130822A1 (en) * 2010-11-19 2012-05-24 Microsoft Corporation Computing cost per interaction for interactive advertising sessions
US10972530B2 (en) * 2016-12-30 2021-04-06 Google Llc Audio-based data structure generation
US20130179271A1 (en) * 2012-01-11 2013-07-11 Paul Adams Grouping and Ordering Advertising Units Based on User Activity
US9626692B2 (en) * 2012-10-08 2017-04-18 Facebook, Inc. On-line advertising with social pay
JP5705816B2 (en) * 2012-12-04 2015-04-22 ヤフー株式会社 Advertisement information providing apparatus and advertisement information providing method
US9701530B2 (en) * 2013-11-22 2017-07-11 Michael J. Kline System, method, and apparatus for purchasing, dispensing, or sampling of products
GB201404234D0 (en) * 2014-03-11 2014-04-23 Realeyes O Method of generating web-based advertising inventory, and method of targeting web-based advertisements
JP6262613B2 (en) * 2014-07-18 2018-01-17 ヤフー株式会社 Presentation device, presentation method, and presentation program
JP6570226B2 (en) * 2014-08-20 2019-09-04 ヤフー株式会社 Response generation apparatus, response generation method, and response generation program
US20170091612A1 (en) * 2015-09-30 2017-03-30 Apple Inc. Proactive assistant with memory assistance
US10455088B2 (en) * 2015-10-21 2019-10-22 Genesys Telecommunications Laboratories, Inc. Dialogue flow optimization and personalization

Also Published As

Publication number Publication date
JP2020091907A (en) 2020-06-11
CN110110317A (en) 2019-08-09
JP7525445B2 (en) 2024-07-30
JP7178248B2 (en) 2022-11-25
JP2021131908A (en) 2021-09-09
JP2019125357A (en) 2019-07-25
JP7129439B2 (en) 2022-09-01
JP2024147731A (en) 2024-10-16

Similar Documents

Publication Publication Date Title
US20240185853A1 (en) System and method for adapted interactive experiences
JP7525445B2 (en) Method, program and computer for delivering context-aware introductory messages
US20200320604A1 (en) Information Provision System, Information Provision Method, and Storage Medium
Xiao et al. Exploring the factors influencing consumer engagement behavior regarding short-form video advertising: A big data perspective
Mishra et al. From “touch” to a “multisensory” experience: The impact of technology interface and product type on consumer responses
US10096319B1 (en) Voice-based determination of physical and emotional characteristics of users
Maroufkhani et al. How do interactive voice assistants build brands' loyalty?
WO2021056837A1 (en) Customization platform and method for service quality evaluation product
US10839424B1 (en) Voice user interface advertising control method
US20110029365A1 (en) Targeting Multimedia Content Based On Authenticity Of Marketing Data
Stephen et al. The effects of content characteristics on consumer engagement with branded social media content on Facebook
CN113421143A (en) Processing method and device for assisting live broadcast and electronic equipment
CN108475282B (en) Communication system and communication control method
Liu et al. The power of talk: Exploring the effects of streamers’ linguistic styles on sales performance in B2B livestreaming commerce
Kreutzer et al. Fields of application of artificial intelligence—customer service, marketing and sales
CN111612588B (en) Commodity presenting method and device, computing equipment and computer readable storage medium
Akrimi et al. An analysis of perceived usability, perceived interactivity and website personality and their effects on consumer satisfaction
Abbasi et al. Do pop-up ads in online videogames influence children’s inspired-to behavior?
Chang et al. A content-based metric for social media influencer marketing
Wang et al. Understanding the effect of group emotions on consumer instant order cancellation behavior in livestreaming E-commerce: Empirical evidence from TikTok
Vernuccio et al. The perceptual antecedents of brand anthropomorphism in the name-brand voice assistant context
Chan et al. Encouraging Purchase Intention in TikTok Live Streaming: The Role of Live Streaming Shopping Attributes
JP2020160641A (en) Virtual person selection device, virtual person selection system and program
WO2021075337A1 (en) Information processing device, information processing method, and information processing program
EP3502923A1 (en) Natural language grammars adapted for interactive experiences

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17935802

Country of ref document: EP

Kind code of ref document: A1