US20170046124A1 - Responding to Human Spoken Audio Based on User Input - Google Patents

Responding to Human Spoken Audio Based on User Input

Info

Publication number
US20170046124A1
Authority
US
United States
Prior art keywords
intelligent assistant
assistant device
natural language
audio
language processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/336,714
Inventor
Jonathon Nostrant
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Interactive Voice Inc
Original Assignee
Interactive Voice Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Interactive Voice Inc
Priority to US15/336,714
Publication of US20170046124A1
Status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G06F 3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/903 Querying
    • G06F 16/9032 Query formulation
    • G06F 16/90332 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 Noise filtering
    • G10L 2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 Noise filtering

Definitions

  • the present invention relates generally to systems and methods of responding to human spoken audio, and more specifically, to systems and methods that interpret human spoken audio and then generate a response based on the interpretation of the human spoken audio.
  • the present technology may be directed to methods that comprise: receiving audio input for generating a speech signal using at least one microphone communicatively coupled to an intelligent assistant device; transmitting the audio input from the intelligent assistant device to a natural language processor; converting the audio input from speech to a text query using the natural language processor; processing the text query using artificial intelligence (AI) logic using the natural language processor; determining an Application Programming Interface (API) from a plurality of APIs for processing the text query using the natural language processor; and transmitting a response from the API to the intelligent assistant device or another device communicatively coupled to the intelligent assistant device for output using the natural language processor.
  • the present technology may be directed to a system that comprises: an intelligent assistant device comprising a processor which executes logic to perform operations comprising: receiving audio input for generating a speech signal using at least one microphone communicatively coupled to the intelligent assistant device; and a natural language processor communicatively coupled with the intelligent assistant device that executes logic to perform operations comprising: receiving the audio input from the intelligent assistant device; converting the audio input from speech to a text query; processing the text query using artificial intelligence (AI) logic; determining an Application Programming Interface (API) from a plurality of APIs for processing the text query; and transmitting a response from the API to the intelligent assistant device or another device communicatively coupled to the intelligent assistant device for output.
  • FIG. 1 is a system for processing human spoken audio, in accordance with embodiments of the present invention
  • FIG. 2 illustrates a flowchart of processing human spoken audio, in accordance with embodiments of the present invention
  • FIG. 3 illustrates a display of interactions utilizing a device command interpreter, in accordance with embodiments of the present invention
  • FIG. 4 illustrates a front perspective view of an intelligent assistant device, in accordance with embodiments of the present invention
  • FIG. 5 illustrates a rear perspective view of an intelligent assistant device, in accordance with embodiments of the present invention
  • FIG. 6 illustrates an overhead view of an intelligent assistant device, in accordance with embodiments of the present invention.
  • FIG. 7 illustrates side views of an intelligent assistant device, in accordance with embodiments of the present invention.
  • FIG. 8 illustrates another front perspective view of an intelligent assistant device, in accordance with embodiments of the present invention.
  • FIG. 9 provides a block diagram of components of an intelligent assistant device, in accordance with embodiments of the present invention.
  • FIG. 10 is a perspective view of an exemplary intelligent assistant device
  • FIG. 11 is a perspective view of another exemplary intelligent assistant device
  • FIG. 11A is a schematic diagram of an intelligent assistant device
  • FIGS. 12A-G collectively illustrate a flow of data through an exemplary system architecture
  • FIG. 13 illustrates an exemplary computing system that may be used to implement embodiments according to the present technology.
  • FIGS. 14A and 14B collectively include various views of another exemplary intelligent assistant device.
  • the present technology provides hardware and software components that interact to interpret and respond to human spoken audio.
  • the hardware components include a microphone that receives audio comprising human spoken audio.
  • the audio that comprises human spoken audio may in some instances be transmitted to a cloud computing cluster (e.g., a cloud-based computing environment) for processing.
  • a cloud-based computing environment is a resource that typically combines the computational power of a large grouping of processors and/or that combines the storage capacity of a large grouping of computer memories or storage devices.
  • systems that provide a cloud resource may be utilized exclusively by their owners; or such systems may be accessible to outside users who deploy applications within the computing infrastructure to obtain the benefit of large computational or storage resources.
  • the cloud may be formed, for example, by a network of web servers, with each web server (or at least a plurality thereof) providing processor and/or storage resources. These servers may manage workloads provided by multiple users (e.g., cloud resource customers or other users). Typically, each user places workload demands upon the cloud that vary in real-time, sometimes dramatically. The nature and extent of these variations typically depend on the type of business associated with the user.
  • the audio commands that comprise human spoken audio may be processed to clarify the human spoken audio components from other audio aspects that may have also been recorded, such as background noise.
  • the present technology may utilize a digital signal processing (DSP) beam-forming microphone assembly, which is included in an end user device.
  • various digital signal processes may be utilized at the cloud level to remove background noise or other audio artifacts.
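  • As a rough illustration of the cloud-level clean-up described above, the sketch below applies a simple energy-based noise gate to a recorded signal. It is a minimal stand-in, assuming mono samples in a NumPy array; the patent's actual beam-forming and echo-cancellation chain is not specified here, and all names below are illustrative.

```python
import numpy as np

def noise_gate(samples: np.ndarray, frame_len: int = 512, threshold_db: float = -40.0) -> np.ndarray:
    """Silence frames whose RMS level falls below a threshold relative to the peak.

    A crude stand-in for the beam-forming / noise-reduction / echo-cancellation
    described in the text: quiet background frames are zeroed, while frames that
    plausibly contain speech pass through unchanged.
    """
    samples = samples.astype(np.float64)
    peak = np.max(np.abs(samples)) or 1.0
    cleaned = samples.copy()
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        level_db = 20.0 * np.log10(rms / peak + 1e-12)
        if level_db < threshold_db:
            cleaned[start:start + frame_len] = 0.0
    return cleaned

# Example: quiet background noise followed by a louder "speech" burst.
noisy = np.concatenate([np.random.randn(2048) * 0.01, np.random.randn(2048) * 0.5])
print(np.count_nonzero(noise_gate(noisy)))  # roughly only the second half survives
```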
  • the processed human spoken audio may then be transmitted to a text processor.
  • the text processor uses speech-to-text software (such as from Nuance®) and converts the human spoken audio into a string of text that represents the human spoken audio (hereinafter, “string of text”).
  • natural language processor may include, but is not limited to, any system, process, or combination of systems and methods that evaluate, process, parse, convert, translate or otherwise analyze and/or transform natural language commands.
  • an exemplary natural language processor may convert natural language content from an audio-format into a text format (e.g., speech to text), and may also evaluate the content for sentiment, mood, context (e.g., domain), and so forth.
  • these natural language commands may include audio format and/or text format natural language content, as well as other content formats that would be known to one of ordinary skill in the art.
  • the string of text may be broken down into formal representations such as first-order logic structures that contain contextual clues or keyword targets. These contextual clues and/or keyword targets are then used by a computer system to understand and manipulate the string of text.
  • the natural language processor may identify the most likely semantics of the string of text based on an assessment of the multiple possible semantics which could be derived from the string of text.
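  • To make the idea of keyword targets concrete, the sketch below shows a toy rule-based pass that pulls a coarse intent and a location out of a text query. The intents, regular expressions, and field names are assumptions made for illustration only; the patent does not specify the processor's internal representation.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParsedQuery:
    intent: str                      # e.g. "weather", "alarm", "unknown"
    location: Optional[str] = None   # a keyword target extracted from the text

def parse(text: str) -> ParsedQuery:
    """Break a string of text into a coarse intent plus keyword targets."""
    lowered = text.lower().rstrip(" ?!.")
    if "weather" in lowered:
        # Location target: the words following "in"/"for", or whatever trails "weather".
        match = (re.search(r"\b(?:in|for)\s+(.+)$", lowered)
                 or re.search(r"\bweather\s+(.+)$", lowered))
        location = match.group(1).strip() if match else None
        return ParsedQuery(intent="weather", location=location)
    if "alarm" in lowered or "wake me" in lowered:
        return ParsedQuery(intent="alarm")
    return ParsedQuery(intent="unknown")

print(parse("What's the weather look like today in Los Angeles?"))
print(parse("Weather Los Angeles"))
```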
  • one or more of the features described herein such as noise reduction, natural language processing, speech-to-text services (and vice versa), text parsing, and other features described herein may be executed at the device level (e.g., on the intelligent assistant device).
  • many or all of the aforementioned features may be executed at the cloud level, such that the intelligent assistant device receives audio commands and returns responses.
  • most or all of the processing of the audio commands may occur at the cloud level.
  • the intelligent assistant device and the cloud may share processing duties such that the intelligent assistant device executes some of the features of the present technology and the cloud executes other processes. This cooperative content processing relationship between the intelligent assistant device and the cloud may function to load balance the duties required to process audio commands and return responses to the end user.
  • data from the string of text will be prepared for delivery to an appropriate application program interface (API).
  • a string of text comprising the query “What's the weather look like today in Los Angeles, Calif.?” may be processed by the computer system and distributed to a weather API.
  • the weather API may process the data from the string of text to access a weather forecast for Los Angeles, Calif. associated with the day that the query was asked.
  • one aspect of the natural language processor may be formatting the data from the string of text to correspond to the data structure format of an API that has been determined to be appropriate.
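  • A minimal sketch of that formatting step is shown below, assuming a hypothetical weather API that expects city, region, date, and units fields; a real API would dictate its own required structure.

```python
from datetime import date

def to_weather_request(location: str) -> dict:
    """Reshape the location keyword target into the parameter structure a
    (hypothetical) weather API expects."""
    city, _, region = location.partition(",")
    return {
        "city": city.strip().title(),              # "los angeles" -> "Los Angeles"
        "region": region.strip().title() or None,  # "calif." -> "Calif."
        "date": date.today().isoformat(),          # the day the query was asked
        "units": "imperial",
    }

print(to_weather_request("los angeles, calif."))
```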
  • an API response may be generated. This API response may then be converted into an appropriate pre-spoken text string, also referred to as a fulfillment. Further, the API response may be recorded in a database and then converted to a speech response. Once the API response has been converted to a speech response, the speech response may be distributed to a hardware component to playback the API speech response.
  • the API response may also be saved and/or paired with the query data and saved in a cache for efficient lookup. That is, rather than processing the same query data multiple times, the present technology may obtain previously generated responses to identical or substantially identical query data from the cache. Temporal limitations may be placed upon the responses stored in the cache. For example, responses for queries regarding weather may be obtained from the cache only for relevant periods of time, such as an hour.
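  • The cache described above might look like the following sketch, where each stored response carries a timestamp and a per-domain time limit. The one-hour window for weather follows the example in the text; the class interface itself is an assumption.

```python
import time

class ResponseCache:
    """Pair query data with a previously generated response, subject to a
    per-domain temporal limit (e.g. weather answers are reused for one hour)."""

    TTL_SECONDS = {"weather": 3600, "default": 300}

    def __init__(self):
        self._entries = {}  # query text -> (stored_at, response)

    def put(self, query: str, response: str) -> None:
        self._entries[query] = (time.time(), response)

    def get(self, query: str, domain: str = "default"):
        entry = self._entries.get(query)
        if entry is None:
            return None
        stored_at, response = entry
        ttl = self.TTL_SECONDS.get(domain, self.TTL_SECONDS["default"])
        if time.time() - stored_at > ttl:
            del self._entries[query]  # stale: fall through to a fresh API call
            return None
        return response

cache = ResponseCache()
cache.put("weather los angeles", "76 Degrees")
print(cache.get("weather los angeles", domain="weather"))  # "76 Degrees" within the hour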
  • a hardware unit may act as a base station that is connected over a Wi-Fi network to both the natural language processor and other enabled devices.
  • other enabled devices may act as a microphone, transmitting human spoken audio via the base station to the natural language processor or transmitting human spoken audio directly to the intelligent assistant device and then to the natural language processor.
  • enabled devices may also receive commands from a general server based on interpretation of the human spoken data using the natural language processor. For instance, a Wi-Fi enabled thermostat may receive a command from the base station when the natural language processor has interpreted human spoken audio to request an increase in temperature within a room controlled by the user device that received the human spoken audio data.
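  • For instance, relaying that interpreted temperature command to a Wi-Fi thermostat might look like the sketch below; the endpoint path and JSON payload are purely hypothetical, since each enabled device would define its own interface.

```python
import json
import urllib.request

def send_thermostat_command(thermostat_ip: str, delta_degrees: float) -> None:
    """Relay an interpreted "raise the temperature" command from the base station
    to a Wi-Fi enabled thermostat on the local network (hypothetical device API)."""
    payload = json.dumps({"action": "adjust_setpoint", "delta": delta_degrees}).encode()
    request = urllib.request.Request(
        f"http://{thermostat_ip}/api/setpoint",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        response.read()

# Example (requires a device answering at that address):
# send_thermostat_command("192.168.1.40", delta_degrees=2.0)
```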
  • the intelligent assistant device may utilize other communication media including, but not limited to, Bluetooth, near field communications, RFID, infrared, or other communicative media that would be known to one of ordinary skill in the art with the present disclosure before them.
  • FIG. 1 is a system 100 for processing human spoken audio, in accordance with embodiments of the present invention.
  • a user spoken question or command 105 is received at microphone 110 .
  • the user spoken question or command is within audio data.
  • the audio data comprises human spoken audio.
  • the audio data may then be transmitted to hardware device 115 (such as an intelligent assistant device) or connected hardware device 120 , such as a cellular phone or digital music player. If the audio data is transmitted to connected hardware device 120 , the audio data may then be further transmitted to hardware device 115 after being received at connected hardware device 120 .
  • hardware device 115 may comprise microphone 110 such that hardware device 115 is able to record user commands and distribute recorded user commands to a speech and text system for initial processing.
  • audio 125 and user identification information 130 are transmitted to a server 135, also referred to as a command processing server.
  • the audio may be cleaned at the server 135 .
  • a combination of a microphone array (e.g., using audio captured by multiple microphones), beam-forming, noise-reduction, and/or echo cancellation may be used to clean up the received audio and remove audio characteristics that are not human spoken audio.
  • the audio is provided to a speech-to-text service 140 .
  • the audio is converted to a string of text that represents human spoken audio.
  • the string of text may be stored in a database 142 .
  • database 142 may be used for storage of user data and for the storage of processed queries. Further, database 142 may be used to manage learned behaviors based on unique hardware identification.
  • Natural language processor 145 may parse unstructured text documents and extract key concepts such as context, category, meaning, and keywords. Additionally, the natural language processor 145 may comprise artificial intelligence (AI) logic to process the string of text into a discernible query. Further, natural language processor 145 may utilize machine learning and/or a neural network to analyze the string of text. For example, natural language processor 145 may utilize a method of interpreting and learning from patterns and behaviors of the users and attributing data to such behaviors.
  • natural language processor 145 may be run on a server system, which may also be used to run a neural network. Additionally, the natural language processor 145 may determine which query API 150 is most appropriate to receive the query associated with the string of text. Further, once a query API 150 is determined, the natural language processor 145 may modify the query to comply with the structure of queries appropriate to the determined query API 150.
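  • A simple way to picture that routing step is a lookup from the recognized domain to a registered query API, as in the sketch below; the registry contents and endpoint URLs are illustrative assumptions.

```python
# Hypothetical registry mapping a recognized domain to the query API 150 that
# should receive the (suitably reformatted) query.
QUERY_API_REGISTRY = {
    "weather": "https://weather.example.invalid/v1/forecast",
    "news": "https://news.example.invalid/v1/headlines",
}

def route_query(domain: str) -> str:
    """Return the query API endpoint judged most appropriate for the domain."""
    try:
        return QUERY_API_REGISTRY[domain]
    except KeyError:
        raise ValueError(f"no query API registered for domain {domain!r}")

print(route_query("weather"))
```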
  • the query generated at the natural language processor 145 is then provided to a query API 150 .
  • An exemplary API may comprise a variety of open source or licensed APIs that are used to take natural language processor output and retrieve the necessary data.
  • the query API 150 processes the query and provides a query response to server 135 .
  • the query response may be transmitted to a format response component 155 .
  • the format response component 155 may comprise, for example, a text-to-speech translator.
  • a text-to-speech translator may comprise a system used to take the natural language processor output in text format and output it as spoken audio.
  • the answer 160 may then be provided to hardware device 115 , such as via a device interface comprising a process of returning the spoken audio, from the text-to-speech component to hardware device 115 . From hardware device 115 , the answer may be transmitted through a speaker 165 to a system spoken audio response 170 .
  • FIG. 2 illustrates a flowchart for processing human spoken audio, in accordance with embodiments of the present invention.
  • a customer speaks a first trigger command, “Hello, ivee.” It will be understood that the trigger command may be end user defined.
  • a microphone captures the first trigger command, which may then be transmitted to device 210 .
  • the first trigger command may also be referred to as an initiating command. An initiating command may prompt the device to ready itself for a subsequent audio command.
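  • A toy sketch of that two-stage trigger handling is shown below, operating on already-transcribed text; the default phrase matches the example above, and the class interface is an assumption.

```python
class TriggerGate:
    """Wait for a user-defined initiating command (e.g. "Hello, ivee") before
    treating the next utterance as an audio command."""

    def __init__(self, trigger_phrase: str = "hello, ivee"):
        self.trigger_phrase = trigger_phrase
        self.armed = False

    def handle(self, transcript: str):
        text = transcript.lower().strip(" .,!?")
        if not self.armed:
            if text == self.trigger_phrase:
                self.armed = True
                return "Command Please"      # device prompts for the audio command
            return None                       # ignore speech until triggered
        self.armed = False
        return ("COMMAND", text)              # e.g. ("COMMAND", "weather los angeles")

gate = TriggerGate()
print(gate.handle("Hello, ivee"))            # -> "Command Please"
print(gate.handle("Weather Los Angeles"))    # -> ("COMMAND", "weather los angeles")
```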
  • a customer speaks a second trigger command (also referred to as an audio command) such as, “Weather Los Angeles,” at 206 .
  • the audio command may comprise a natural language or spoken word command, such as the audio command at 206 .
  • the second trigger command is captured at 208 by one or more microphones and transmitted to device 210 .
  • Device 210 is coupled with a private API 230 via WiFi connection 222 . Further, device 210 provides an audio query 224 to private API 230 . In particular, audio query 224 is derived from an audio command.
  • Audio query 224 may then be provided to a speech/text processor 232 , which translates the audio query 224 into text query “Weather Los Angeles” 226 .
  • Text query “Weather Los Angeles” 226 is then provided to private API 230 , which directs text query “Weather Los Angeles” 226 to an AI Logic component 234 .
  • AI Logic component 234 then provides text query “Weather Los Angeles” 226 to a third party API 236 .
  • an appropriate third party API 236 may be a weather API.
  • Third party API 236 then generates text answer “76 Degrees” 228 .
  • Text answer “76 Degrees” 228 may then be provided to AI Logic component 234 . Further, text answer “76 Degrees” 228 may then be transmitted from AI Logic component 234 to private API 230 . Further, text answer “76 Degrees” 228 is provided to a speech/text processor 232 where text answer “76 Degrees” 228 is translated to audio answer 238 . Audio answer 238 may then be provided to private API 230 and then provided to device 210 . From device 210 , audio answer 238 is output as an audio “76 Degrees” 240 . In particular, audio response “76 Degrees” 240 is in response to audio command “Weather Los Angeles” 204 . Further, audio “Command Please” 242 is in response to the initiating command “Hello, ivee” 202 .
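  • The FIG. 2 exchange can be summarized as a short round trip through stub components, as sketched below. Every function is a stand-in so the hop sequence is visible; none of this represents the real private API 230 or third party API 236.

```python
def speech_to_text(audio_query: bytes) -> str:
    return "Weather Los Angeles"            # stand-in for speech/text processor 232

def third_party_weather_api(city: str) -> str:
    return "76 Degrees"                     # stand-in for third party API 236

def ai_logic(text_query: str) -> str:
    city = text_query.replace("Weather", "").strip()
    return third_party_weather_api(city)    # AI Logic 234 routes to the weather API

def text_to_speech(text_answer: str) -> bytes:
    return text_answer.encode()             # stand-in for audio answer 238

def private_api(audio_query: bytes) -> bytes:
    """Private API 230: audio query in, audio answer out, per FIG. 2."""
    text_query = speech_to_text(audio_query)
    text_answer = ai_logic(text_query)
    return text_to_speech(text_answer)

print(private_api(b"<captured audio>"))     # device 210 plays back "76 Degrees"
```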
  • FIG. 3 illustrates a display 300 of interactions utilizing a device command interpreter 305 , in accordance with embodiments of the present invention.
  • device command interpreter 305 interacts with an interface with a device 310 .
  • device command interpreter 305 receives a command from the interface with the device 310 .
  • Device command interpreter 305 also provides commands to the interface with the device 310 .
  • device command interpreter 305 interacts with a text/speech processor 325 .
  • device command interpreter 305 may provide a request for text-to-speech translation by providing a string of text to a text-to-speech (“TTS”) component 315 of the text/speech processor 325.
  • device command interpreter 305 may provide a request for speech-to-text translation by providing a voice file to a speech-to-text component 320 of the text/speech processor 325 . Further, device command interpreter 305 may receive scenario information from scenario building component 330 .
  • device command interpreter 305 is also communicatively coupled with language interpreter 335 .
  • device command interpreter 305 may provide a sentence with scenario information to language interpreter 335 .
  • the language interpreter 335 may generate analyzed sentence information and send the analyzed sentence information with scenario information to a decision making engine 340 .
  • the decision making engine 340 may select a most appropriate action.
  • the decision making engine 340 may utilize user accent references from a voice database 345 . Based on the analyzed sentence information and the scenario information, the decision making engine 340 may generate a selected most appropriate action from scenarios and, further, may provide the selected most appropriate action to the device command interpreter 305 .
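  • As a toy illustration of that selection step, the sketch below scores candidate scenario actions by keyword overlap with the analyzed sentence information; the scoring rule and scenario table are assumptions, not the engine's actual method.

```python
def select_action(analyzed_keywords: set, scenarios: dict) -> str:
    """Pick the scenario action whose trigger keywords best overlap the analyzed
    sentence information (a stand-in for decision making engine 340)."""
    def score(action: str) -> int:
        return len(analyzed_keywords & scenarios[action])
    best = max(scenarios, key=score)
    return best if score(best) > 0 else "fallback"

scenarios = {
    "report_weather": {"weather", "forecast", "temperature"},
    "set_alarm": {"alarm", "wake", "remind"},
}
print(select_action({"weather", "los", "angeles"}, scenarios))  # -> "report_weather"
```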
  • the device command interpreter 305 may also send a request to build sentence information to a sentence generator component 350 .
  • the sentence generator component 350 may provide a built sentence string to device command interpreter 305 .
  • the device command interpreter 305 may request service on servers by providing a service request to an add on service interface 355 .
  • the add on service interface 355 may provide the service request to a voice database web server 360 . Further, a response generated by voice database web server 360 may be provided to the device command interpreter 305 via the add on service interface 355 .
  • device command interpreter 305 may interact with a user database 370 via a user information database 365 .
  • device command interpreter 305 may provide user information and device authentication to the user database 370 via an interface of the user information database 365 .
  • device command interpreter 305 may interact with a streaming interface 380 of a device via communications module 375 .
  • device command interpreter 305 may provide a file for download and/or text-to-speech voice data to a file downloader.
  • Communications module 375 may include any data communications module that is capable of providing control of data streaming processes to the streaming interface 380 of the device.
  • the streaming interface 380 of the device may provide a data stream to communications module 375 .
  • the communications module 375 may provide voice streaming up to device command interpreter 305.
  • FIG. 4 illustrates a front perspective view 400 of an intelligent assistant device, in accordance with embodiments of the present invention.
  • intelligent assistant device comprises a screen 405 , a frame 410 , and a device stand 415 .
  • FIG. 5 illustrates a rear perspective view 500 of an intelligent assistant device, in accordance with embodiments of the present invention.
  • intelligent assistant device comprises a speaker 505 , input slots 510 , device stand 515 , and button 520 .
  • FIG. 6 illustrates an overhead view 600 of an intelligent assistant device, in accordance with embodiments of the present invention.
  • the intelligent assistant device comprises an audio button 605 , a snooze button 610 , a mode button 615 , and an intelligent assistant device stand 620 .
  • FIG. 7 illustrates side views 705 a and 705 b of an intelligent assistant device, in accordance with embodiments of the present invention.
  • the intelligent assistant device comprises buttons 710 and intelligent assistant device stand 715 .
  • FIG. 8 illustrates another front perspective view 800 of an intelligent assistant device, in accordance with embodiments of the present invention.
  • the intelligent assistant device comprises a screen 805 , a frame 810 , and a device stand 815 .
  • screen 805 comprises a city indicator, a weather indicator, a date indicator, a time indicator, an alarm indicator, a message indicator, and a battery indicator.
  • the screens are merely exemplary and other indicators may also likewise be utilized in accordance with the present technology.
  • the indicators utilized in screen 805 may relate to the types or domains of natural language queries that may be processed by the device.
  • FIG. 9 provides a block diagram 900 of components of an intelligent assistant device, in accordance with embodiments of the present invention.
  • FIG. 9 comprises microphones 902 which provide audio data to an audio processor module 904 .
  • Audio processor module 904 provides analog data to a sensory natural language processor 906 .
  • audio processor module 904 provides an analog or SPI (Serial Peripheral Interface) signal to a processor 908, where the processor comprises a main Atmel chip.
  • light sensor 910 and temperature sensor 912 also provide data to processor 908 .
  • Buttons and/or switches 914 also provide data to processor 908 via a touch sensor controller 916 .
  • data is communicated between processor 908 and sensory natural language processor 906.
  • Sensory natural language processor 906 is also coupled with an external memory for firmware 918 .
  • Processor 908 also exchanges information with a Fast Super-Twisted Nematic (FSTN) Liquid Crystal Display (LCD) module with driver 920, as well as a WiFi module 922. Further, processor 908 is communicatively coupled with an Electrically Erasable Programmable Read-Only Memory (EEPROM) 924 for user information and/or settings. Processor 908 is also communicatively coupled with radio module 926 and audio mux 928. Audio mux 928 is an audio amplifier chip. Audio mux 928 also receives data from aux audio input 930. Further, sensory natural language processor 906 also provides data to audio mux 928. Additionally, audio mux 928 provides data to audio amp 932 and stereo speaker 934. FIG. 9 also comprises, for example, a USB jack (or other similar communicative interface) for recharging 936 that charges rechargeable battery 938.
  • another exemplary embodiment may utilize a plurality of microphones of a smartphone base to implement a natural language processor, in accordance with embodiments of the present invention.
  • audio is received at the plurality of microphones at a smartphone base.
  • the audio is received at an application running a natural language processor, such as natural language processor 145 .
  • the application comprises a clean-up component that utilizes a combination of a microphone array, beam-forming, noise-reduction, and/or echo cancellation system to clean up the audio received.
  • the clean-up component may remove non-human spoken audio and/or may remove garbled human spoken audio (e.g., background conversations) that do not comprise primary human spoken audio (e.g., the human spoken audio of the primary user).
  • by using audio clean-up processes, such as beamforming, a user can interact with a smartphone application from approximately ten feet away or closer.
  • audio received from microphones of auxiliary hardware devices, such as a dock for smartphone devices, may be used to interact with an application that comprises a natural language processor, such as natural language processor 145.
  • FIG. 10 is a perspective view of an exemplary intelligent assistant device, which includes a base station in combination with a clock.
  • the intelligent assistant device may include any of the natural language processing and third party information features described above in addition to features commonly utilized with alarm clocks.
  • the alarm clock may be controlled by the features and operations of the personal digital assistant device associated therewith.
  • FIG. 11 is a perspective view of another exemplary intelligent assistant device, which includes a sleek and uni-body construction.
  • FIG. 11A is a schematic diagram of various components of an intelligent assistant device, for use with any of the intelligent assistant device products described herein.
  • FIGS. 12A-G collectively illustrate an exemplary flow diagram of data through an exemplary system that includes an intelligent assistant device 1200 .
  • an intelligent assistant device 1200 may be communicatively coupled with various systems through a client API 1205 . More specifically, the intelligent assistant device 1200 may communicatively couple with a speech processor 1210 of FIG. 12B , which in turn, couples with an external speech recognition engine 1215 , in some instances. Again, the intelligent assistant device 1200 may include an integral speech recognition application.
  • a frames scheduler may be utilized to schedule and correlate responses with other objects such as advertisements.
  • the intelligent assistant device 1200 may also communicatively couple with a notifications server 1220 as shown in FIG. 12C .
  • the notifications server 1220 may cooperate with the frames scheduler and an advertisements engine to query relevant advertisements and integrate the same into a response, which is returned to the intelligent assistant device 1200 .
  • the system may utilize a command fulfiller 1225 that creates API requests and processes responses to those requests. Additionally, the command fulfiller 1225 may also generate return response objects.
  • the command fulfiller 1225 may communicatively couple with the speech processor 1210 of FIG. 12B , as well as various sub-classes of command fulfillers 1230 . These sub-classes of command fulfillers 1230 may query third party information sources, such as an external knowledge engine 1235 .
  • the sub-classes of command fulfillers 1230 may be domain specific, such as news, weather, and so forth.
  • FIG. 12E illustrates the use of a frame generator 1240 that processes information obtained by the sub-classes of command fulfillers 1230 of FIG. 12D . Additionally, a plug-in framework for third party applications module 1245 is shown. This module 1245 allows for communicative coupling and interfacing of third party applications 1265 of FIG. 12G , via a developer API 1270 .
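  • From a third party application's point of view, registering against the plug-in framework (module 1245) could look like the sketch below; the decorator and registry are purely illustrative, since the surface of developer API 1270 is not specified here.

```python
THIRD_PARTY_HANDLERS = {}   # domain -> handler registered through the developer API

def register_plugin(domain: str):
    """Hypothetical registration hook exposed to third party applications 1265."""
    def decorator(handler):
        THIRD_PARTY_HANDLERS[domain] = handler
        return handler
    return decorator

@register_plugin("recipes")
def recipe_handler(text_query: str) -> str:
    return f"Top recipe result for: {text_query}"

print(THIRD_PARTY_HANDLERS["recipes"]("chicken soup"))
```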
  • a user management system 1250 may allow for setup of the intelligent assistant device 1200 by an end user.
  • the end user may utilize a web-based portal 1255 of FIG. 12F that allows for the end user to setup and manage their device via a device management API 1260 .
  • FIG. 13 illustrates an exemplary computing system 1300 that may be used to implement an embodiment of the present systems and methods.
  • the system 1300 of FIG. 13 may be implemented in the contexts of the likes of computing systems, networks, servers, or combinations thereof.
  • the computing system 1300 of FIG. 13 includes one or more processors 1310 and main memory 1320 .
  • Main memory 1320 stores, in part, instructions and data for execution by processor 1310 .
  • Main memory 1320 may store the executable code when in operation.
  • the system 1300 of FIG. 13 further includes a mass storage device 1330 , portable storage device 1340 , output devices 1350 , user input devices 1360 , a display system 1370 , and peripheral devices 1380 .
  • The components shown in FIG. 13 are depicted as being connected via a single bus 1390.
  • the components may be connected through one or more data transport means.
  • Processor unit 1310 and main memory 1320 may be connected via a local microprocessor bus, and the mass storage device 1330 , peripheral device(s) 1380 , portable storage device 1340 , and display system 1370 may be connected via one or more input/output (I/O) buses.
  • Mass storage device 1330, which may be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 1310. Mass storage device 1330 may store the system software for implementing embodiments of the present invention for purposes of loading that software into main memory 1320.
  • Portable storage device 1340 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, compact disk, digital video disc, or USB storage device, to input and output data and code to and from the computer system 1300 of FIG. 13 .
  • the system software for implementing embodiments of the present invention may be stored on such a portable medium and input to the computer system 1300 via the portable storage device 1340 .
  • User input devices 1360 provide a portion of a user interface.
  • User input devices 1360 may include an alphanumeric keypad, such as a keyboard, for inputting alpha-numeric and other information, or a pointing device, such as a mouse, a trackball, stylus, or cursor direction keys.
  • Additional user input devices 1360 may comprise, but are not limited to, devices such as speech recognition systems, facial recognition systems, motion-based input systems, gesture-based systems, and so forth.
  • user input devices 1360 may include a touchscreen.
  • the system 1300 as shown in FIG. 13 includes output devices 1350 . Suitable output devices include speakers, printers, network interfaces, and monitors.
  • Display system 1370 may include a liquid crystal display (LCD) or other suitable display device.
  • Display system 1370 receives textual and graphical information, and processes the information for output to the display device.
  • Peripheral device(s) 1380 may include any type of computer support device to add additional functionality to the computer system. Peripheral device(s) 1380 may include a modem or a router.
  • the components provided in the computer system 1300 of FIG. 13 are those typically found in computer systems that may be suitable for use with embodiments of the present invention and are intended to represent a broad category of such computer components that are well known in the art.
  • the computer system 1300 of FIG. 13 may be a personal computer, hand held computing system, telephone, mobile computing system, workstation, server, minicomputer, mainframe computer, or any other computing system.
  • the computer may also include different bus configurations, networked platforms, multi-processor platforms, etc.
  • Various operating systems may be used including Unix, Linux, Windows, Mac OS, Palm OS, Android, iOS (known as iPhone OS before June 2010), QNX, and other suitable operating systems.
  • FIGS. 14A and 14B collectively provide views of an exemplary embodiment of an intelligent assistant device that functions as a base for receiving a second hardware device, such as a cellular telephone. It will be understood that the intelligent assistant device may include any communicative interface that allows for one or more devices to interface with the intelligent assistant device via a physical connection.
  • Computer-readable storage media refer to any medium or media that participate in providing instructions to a central processing unit (CPU), a processor, a microcontroller, or the like. Such media may take forms including, but not limited to, non-volatile and volatile media such as optical or magnetic disks and dynamic memory, respectively. Common forms of computer-readable storage media include a floppy disk, a flexible disk, a hard disk, magnetic tape, and any other magnetic storage medium, a CD-ROM disk, digital video disk (DVD), any other optical storage medium, RAM, PROM, EPROM, a FLASHEPROM, any other memory chip or cartridge.
  • Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be coupled with the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider such as AT&T, MCI, Sprint, EarthLink, MSN, GTE, etc.).
  • These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • User Interface Of Digital Computer (AREA)
  • Machine Translation (AREA)

Abstract

Systems and methods for responding to human spoken audio are provided herein. Exemplary methods may include receiving audio input for generating a speech signal using at least one microphone communicatively coupled to an intelligent assistant device. The method may also include transmitting the audio input from the intelligent assistant device to a natural language processor, the audio input having been converted from speech to a text query. The method may further include processing the text query using artificial intelligence (AI) logic, determining an Application Programming Interface (API) from a plurality of APIs for processing the text query, and transmitting a response from the API to the intelligent assistant device or another device communicatively coupled to the intelligent assistant device for output.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of U.S. patent application Ser. No. 13/734,282, filed on Jan. 4, 2013 and entitled “Systems and Methods for Responding to Human Spoken Audio,” which claims the benefit of U.S. Provisional Patent Application Ser. No. 61/584,752, filed on Jan. 9, 2012, and entitled “System and Methods for Responding to Human Spoken Audio Using a Natural Language Processor.” All of the above applications are hereby incorporated herein by reference in their entirety including all references cited therein.
  • FIELD OF THE INVENTION
  • The present invention relates generally to systems and methods of responding to human spoken audio, and more specifically, to systems and methods that interpret human spoken audio and then generate a response based on the interpretation of the human spoken audio.
  • SUMMARY OF THE PRESENT TECHNOLOGY
  • According to some embodiments, the present technology may be directed to methods that comprise: receiving audio input for generating a speech signal using at least one microphone communicatively coupled to an intelligent assistant device; transmitting the audio input from the intelligent assistant device to a natural language processor; converting the audio input from speech to a text query using the natural language processor; processing the text query using artificial intelligence (AI) logic using the natural language processor; determining an Application Programming Interface (API) from a plurality of APIs for processing the text query using the natural language processor; and transmitting a response from the API to the intelligent assistant device or another device communicatively coupled to the intelligent assistant device for output using the natural language processor.
  • According to some embodiments, the present technology may be directed to a system that comprises: an intelligent assistant device comprising a processor which executes logic to perform operations comprising: receiving audio input for generating a speech signal using at least one microphone communicatively coupled to the intelligent assistant device; and a natural language processor communicatively coupled with the intelligent assistant device that executes logic to perform operations comprising: receiving the audio input from the intelligent assistant device; converting the audio input from speech to a text query; processing the text query using artificial intelligence (AI) logic; determining an Application Programming Interface (API) from a plurality of APIs for processing the text query; and transmitting a response from the API to the intelligent assistant device or another device communicatively coupled to the intelligent assistant device for output.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Certain embodiments of the present technology are illustrated by the accompanying figures. It will be understood that the figures are not necessarily to scale and that details not necessary for an understanding of the technology or that render other details difficult to perceive may be omitted. It will be understood that the technology is not necessarily limited to the particular embodiments illustrated herein.
  • FIG. 1 is a system for processing human spoken audio, in accordance with embodiments of the present invention;
  • FIG. 2 illustrates a flowchart of processing human spoken audio, in accordance with embodiments of the present invention;
  • FIG. 3 illustrates a display of interactions utilizing a device command interpreter, in accordance with embodiments of the present invention;
  • FIG. 4 illustrates a front perspective view of an intelligent assistant device, in accordance with embodiments of the present invention;
  • FIG. 5 illustrates a rear perspective view of an intelligent assistant device, in accordance with embodiments of the present invention;
  • FIG. 6 illustrates an overhead view of an intelligent assistant device, in accordance with embodiments of the present invention;
  • FIG. 7 illustrates side views of an intelligent assistant device, in accordance with embodiments of the present invention;
  • FIG. 8 illustrates another front perspective view of an intelligent assistant device, in accordance with embodiments of the present invention;
  • FIG. 9 provides a block diagram of components of an intelligent assistant device, in accordance with embodiments of the present invention;
  • FIG. 10 is a perspective view of an exemplary intelligent assistant device;
  • FIG. 11 is a perspective view of another exemplary intelligent assistant device;
  • FIG. 11A is a schematic diagram of an intelligent assistant device;
  • FIGS. 12A-G collectively illustrate a flow of data through an exemplary system architecture;
  • FIG. 13 illustrates an exemplary computing system that may be used to implement embodiments according to the present technology; and
  • FIGS. 14A and 14B collectively include various views of another exemplary intelligent assistant device.
  • DESCRIPTION OF EXEMPLARY EMBODIMENTS
  • While this technology is susceptible of embodiment in many different forms, there is shown in the drawings and will herein be described in detail several specific embodiments with the understanding that the present disclosure is to be considered as an exemplification of the principles of the technology and is not intended to limit the technology to the embodiments illustrated.
  • The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
  • It will be understood that like or analogous elements and/or components, referred to herein, may be identified throughout the drawings with like reference characters. It will be further understood that several of the figures are merely schematic representations of the present technology. As such, some of the components may have been distorted from their actual scale for pictorial clarity.
  • The present technology provides hardware and software components that interact to interpret and respond to human spoken audio. In some embodiments, the hardware components include a microphone that receives audio comprising human spoken audio. The audio that comprises human spoken audio may in some instances be transmitted to a cloud computing cluster (e.g., a cloud-based computing environment) for processing. In general, a cloud-based computing environment is a resource that typically combines the computational power of a large grouping of processors and/or that combines the storage capacity of a large grouping of computer memories or storage devices. For example, systems that provide a cloud resource may be utilized exclusively by their owners; or such systems may be accessible to outside users who deploy applications within the computing infrastructure to obtain the benefit of large computational or storage resources.
  • The cloud may be formed, for example, by a network of web servers, with each web server (or at least a plurality thereof) providing processor and/or storage resources. These servers may manage workloads provided by multiple users (e.g., cloud resource customers or other users). Typically, each user places workload demands upon the cloud that vary in real-time, sometimes dramatically. The nature and extent of these variations typically depend on the type of business associated with the user.
  • With respect to the present technology, the audio commands that comprise human spoken audio may be processed to clarify the human spoken audio components from other audio aspects that may have also been recorded, such as background noise. In some instances, the present technology may utilize a digital signal processing (DSP) beam-forming microphone assembly, which is included in an end user device. In other embodiments, various digital signal processes may be utilized at the cloud level to remove background noise or other audio artifacts. The processed human spoken audio may then be transmitted to a text processor. The text processor uses speech-to-text software (such as from Nuance®) and converts the human spoken audio into a string of text that represents the human spoken audio (hereinafter, “string of text”).
  • Once the human spoken audio has been processed into a string of text, the text processor may then return the string of text to a processing server, which then transmits the string of text to a natural language processor. The term “natural language processor” may include, but is not limited to, any system, process, or combination of systems and methods that evaluate, process, parse, convert, translate or otherwise analyze and/or transform natural language commands. For example, an exemplary natural language processor may convert natural language content from an audio format into a text format (e.g., speech to text), and may also evaluate the content for sentiment, mood, context (e.g., domain), and so forth. Again, these natural language commands may include audio format and/or text format natural language content, as well as other content formats that would be known to one of ordinary skill in the art.
  • At the natural language processor, the string of text may be broken down into formal representations such as first-order logic structures that contain contextual clues or keyword targets. These contextual clues and/or keyword targets are then used by a computer system to understand and manipulate the string of text. The natural language processor may identify the most likely semantics of the string of text based on an assessment of the multiple possible semantics which could be derived from the string of text.
  • It will be understood that in some embodiments, one or more of the features described herein such as noise reduction, natural language processing, speech-to-text services (and vice versa), text parsing, and other features described herein may be executed at the device level (e.g., on the intelligent assistant device). In other instances, many or all of the aforementioned features may be executed at the cloud level, such that the intelligent assistant device receives audio commands and returns responses. Thus, most or all of the processing of the audio commands may occur at the cloud level. In some instances, the intelligent assistant device and the cloud may share processing duties such that the intelligent assistant device executes some of the features of the present technology and the cloud executes other processes. This cooperative content processing relationship between the intelligent assistant device and the cloud may function to load balance the duties required to process audio commands and return responses to the end user.
  • Based on the computer system's understanding of the semantics of the string of text, data from the string of text will be prepared for delivery to an appropriate application program interface (API). For example, a string of text comprising the query, “What's the weather look like today in Los Angeles, Calif.?” may be processed by the computer system and distributed to a weather API. Further, the weather API may process the data from the string of text to access a weather forecast for Los Angeles, Calif. associated with the day that the query was asked.
  • Since APIs may have different data structure requirements for processing queries, one aspect of the natural language processor may be formatting the data from the string of text to correspond to the data structure format of an API that has been determined to be appropriate.
  • Once an API has processed the query data derived from human spoken audio, an API response may be generated. This API response may then be converted into an appropriate pre-spoken text string, also referred to as a fulfillment. Further, the API response may be recorded in a database and then converted to a speech response. Once the API response has been converted to a speech response, the speech response may be distributed to a hardware component to playback the API speech response. The API response may also be saved and/or paired with the query data and saved in a cache for efficient lookup. That is, rather than processing the same query data multiple times, the present technology may obtain previously generated responses to identical or substantially identical query data from the cache. Temporal limitations may be placed upon the responses stored in the cache. For example, responses for queries regarding weather may be obtained from the cache only for relevant periods of time, such as an hour.
  • In some embodiments, a hardware unit (e.g., intelligent assistant device) may act as a base station that is connected over a Wi-Fi network to both the natural language processor as well as other enabled devices. For instance, other enabled devices may act as a microphone, transmitting human spoken audio via the base station to the natural language processor or transmitting human spoken audio directly to the intelligent assistant device and then to the natural language processor. Additionally, enabled devices may also receive commands from a general server based on interpretation of the human spoken data using the natural language processor. For instance, a Wi-Fi enabled thermostat may receive a command from the base station when the natural language processor has interpreted human spoken audio to request an increase in temperature within a room controlled by the user device that received the human spoken audio data. In some instances, the intelligent assistant device may utilize other communication media including, but not limited to, Bluetooth, near field communications, RFID, infrared, or other communicative media that would be known to one of ordinary skill in the art with the present disclosure before them.
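A hedged sketch of the base-station behavior follows: once the natural language processor interprets a request to raise the temperature, the base station forwards a command to the enabled thermostat. The device classes and command format are assumptions for illustration.

```python
# Illustrative base station routing an interpreted command to an enabled device.
class Thermostat:
    def __init__(self, room: str, temp_f: float = 70.0):
        self.room, self.temp_f = room, temp_f

    def handle(self, command: dict) -> str:
        if command.get("action") == "increase_temperature":
            self.temp_f += command.get("delta_f", 2.0)
        return f"{self.room} thermostat now set to {self.temp_f:.0f}F"

class BaseStation:
    def __init__(self):
        self.devices = {}

    def register(self, name: str, device) -> None:
        self.devices[name] = device

    def route(self, interpretation: dict) -> str:
        device = self.devices[interpretation["target"]]
        return device.handle(interpretation["command"])

base = BaseStation()
base.register("living_room_thermostat", Thermostat("Living room"))
print(base.route({
    "target": "living_room_thermostat",
    "command": {"action": "increase_temperature", "delta_f": 3.0},
}))
```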
  • In accordance with the discussion above, FIG. 1 is a system 100 for processing human spoken audio, in accordance with embodiments of the present invention. In particular, a user spoken question or command 105 is received at microphone 110. The user spoken question or command is within audio data. As such, the audio data comprises human spoken audio. The audio data may then be transmitted to hardware device 115 (such as an intelligent assistant device) or connected hardware device 120, such as a cellular phone or digital music player. If the audio data is transmitted to connected hardware device 120, the audio data may then be further transmitted to hardware device 115 after being received at connected hardware device 120. Alternatively, hardware device 115 may comprise microphone 110 such that hardware device 115 is able to record user commands and distribute recorded user commands to a speech and text system for initial processing.
  • From hardware device 115, audio 125 and user identification information 130 are transmitted to a server 135, also referred to as a command processing server. The audio may be cleaned at the server 135. In particular, a combination of a microphone array (e.g., using audio captured by multiple microphones), beam-forming, noise-reduction, and/or echo cancellation systems may be used to clean up the received audio by removing audio characteristics that are not human spoken audio. Once audio is received at the server 135, the audio is provided to a speech-to-text service 140.
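The actual cleanup stage involves microphone arrays, beam-forming, and echo cancellation; as a very rough stand-in, the sketch below applies an energy-based noise gate to synthetic 16-bit PCM samples so that only louder (presumably spoken) portions of the signal are kept.

```python
# Crude, illustrative noise gate standing in for the audio cleanup stage.
def noise_gate(samples, threshold=500):
    """Zero out low-energy samples; louder samples pass through unchanged."""
    return [s if abs(s) >= threshold else 0 for s in samples]

pcm = [12, -30, 2200, -1800, 90, 2500, -2600, 40]  # synthetic PCM values
print(noise_gate(pcm))  # quiet background samples are suppressed
```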
  • At the speech-to-text service 140, the audio is converted to a string of text that represents human spoken audio. In embodiments, the string of text may be stored in a database 142. In particular, database 142 may be used for storage of user data and for the storage of processed queries. Further, database 142 may be used to manage learned behaviors based on unique hardware identification.
  • The string of text may then be transmitted to a natural language processor 145. Natural language processor 145 may parse unstructured text documents and extract key concepts such as context, category, meaning, and keywords. Additionally, the natural language processor 145 may comprise artificial intelligence (AI) logic to process the string of text into a discernible query. Further, natural language processor 145 may utilize machine learning and/or a neural network to analyze the string of text. For example, natural language processor 145 may utilize a method of interpreting and learning from patterns and behaviors of the users and attributing data to such behaviors.
  • Further, natural language processor 145 may be run on a server system that also hosts a neural network. Additionally, the natural language processor 145 may determine which query API 150 is most appropriate to receive the query associated with the string of text. Further, once a query API 150 is determined, the natural language processor 145 may modify the query to comply with the structure of queries appropriate to the determined query API 150.
  • The query generated at the natural language processor 145 is then provided to a query API 150. An exemplary API may comprise a variety of open source or licensed APIs that are used to take natural language processor output and retrieve the necessary data. The query API 150 processes the query and provides a query response to server 135. Once the query response is received at server 135, the query response may be transmitted to a format response component 155. The format response component 155 may comprise, for example, a text-to-speech translator. In particular, a text-to-speech translator may comprise a system used to take the natural language processor output in text format and output it as spoken audio. The answer 160 may then be provided to hardware device 115, such as via a device interface that returns the spoken audio from the text-to-speech component to hardware device 115. From hardware device 115, the answer may be output through a speaker 165 as a system spoken audio response 170.
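Putting the FIG. 1 stages together, the sketch below chains stand-in helpers for speech-to-text, natural language processing, API lookup, and text-to-speech. Every function body is a placeholder; only the ordering of the stages reflects the flow described above.

```python
# End-to-end sketch of the FIG. 1 flow with placeholder stages.
def speech_to_text(audio: bytes) -> str:
    return "what's the weather look like today in los angeles"

def natural_language_processor(text: str) -> dict:
    return {"domain": "weather", "city": "Los Angeles, CA", "date": "today"}

def query_api(query: dict) -> str:
    return "76 degrees and sunny in Los Angeles today"

def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")  # stand-in for synthesized audio

def handle_command(audio: bytes) -> bytes:
    text = speech_to_text(audio)        # speech-to-text service 140
    query = natural_language_processor(text)  # natural language processor 145
    answer_text = query_api(query)      # query API 150
    return text_to_speech(answer_text)  # format response component 155

print(handle_command(b"fake-audio"))
```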
  • FIG. 2 illustrates a flowchart for processing human spoken audio, in accordance with embodiments of the present invention. At 202, a customer speaks a first trigger command, “Hello, ivee.” It will be understood that the trigger command may be end user defined. At 204, a microphone captures the first trigger command, which may then be transmitted to device 210. The first trigger command may also be referred to as an initiating command. An initiating command may prompt the device to ready itself for a subsequent audio command.
  • In another embodiment, a customer speaks a second trigger command (also referred to as an audio command) such as, “Weather Los Angeles,” at 206. In some instances, the audio command may comprise a natural language or spoken word command, such as the audio command at 206. The second trigger command is captured at 208 by one or more microphones and transmitted to device 210. Device 210 is coupled with a private API 230 via WiFi connection 222. Further, device 210 provides an audio query 224 to private API 230. In particular, audio query 224 is derived from an audio command.
  • Audio query 224 may then be provided to a speech/text processor 232, which translates the audio query 224 into text query “Weather Los Angeles” 226. Text query “Weather Los Angeles” 226 is then provided to private API 230, which directs text query “Weather Los Angeles” 226 to an AI Logic component 234. AI Logic component 234 then provides text query “Weather Los Angeles” 226 to a third party API 236. For example, for the text query “Weather Los Angeles” 226, an appropriate third party API 236 may be a weather API. Third party API 236 then generates text answer “76 Degrees” 228.
  • Text answer “76 Degrees” 228 may then be provided to AI Logic component 234. Further, text answer “76 Degrees” 228 may then be transmitted from AI Logic component 234 to private API 230. Further, text answer “76 Degrees” 228 is provided to a speech/text processor 232 where text answer “76 Degrees” 228 is translated to audio answer 238. Audio answer 238 may then be provided to private API 230 and then provided to device 210. From device 210, audio answer 238 is output as an audio “76 Degrees” 240. In particular, audio response “76 Degrees” 240 is in response to audio command “Weather Los Angeles” 206. Further, audio “Command Please” 242 is in response to the initiating command “Hello, ivee” 202.
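The two-stage trigger handling of FIG. 2 can be pictured as a small state machine: an initiating command (“Hello, ivee”) readies the device, and the next utterance is treated as the audio command. The trigger phrase is end-user definable per the description above; the rest of the sketch is a simplified assumption.

```python
# Illustrative trigger-then-command state machine for the FIG. 2 flow.
class TriggerStateMachine:
    def __init__(self, trigger_phrase="hello, ivee"):
        self.trigger_phrase = trigger_phrase
        self.armed = False

    def hear(self, utterance: str):
        if not self.armed:
            if utterance.strip().lower() == self.trigger_phrase:
                self.armed = True
                return "Command please"      # prompt for the audio command
            return None                      # ignore non-trigger speech
        self.armed = False
        return f"processing audio command: {utterance}"

dev = TriggerStateMachine()
print(dev.hear("Hello, ivee"))          # -> Command please
print(dev.hear("Weather Los Angeles"))  # -> processing audio command: ...
```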
  • FIG. 3 illustrates a display 300 of interactions utilizing a device command interpreter 305, in accordance with embodiments of the present invention. For example, device command interpreter 305 interacts with an interface with a device 310. In particular, device command interpreter 305 receives a command from the interface with the device 310. Device command interpreter 305 also provides commands to the interface with the device 310. Further, device command interpreter 305 interacts with a text/speech processor 325. In particular, device command interpreter 305 may provide a request for text-to-speech translation by providing a string of text to a text-to-speech (“TTS”) component 315 of the text/speech processor 325. Additionally, device command interpreter 305 may provide a request for speech-to-text translation by providing a voice file to a speech-to-text component 320 of the text/speech processor 325. Further, device command interpreter 305 may receive scenario information from scenario building component 330.
  • Additionally, device command interpreter 305 is also communicatively coupled with language interpreter 335. In particular, device command interpreter 305 may provide a sentence with scenario information to language interpreter 335. Further, the language interpreter 335 may generate analyzed sentence information and send the analyzed sentence information with scenario information to a decision making engine 340. The decision making engine 340 may select a most appropriate action. Further, the decision making engine 340 may utilize user accent references from a voice database 345. Based on the analyzed sentence information and the scenario information, the decision making engine 340 may generate a selected most appropriate action from scenarios and, further, may provide the selected most appropriate action to the device command interpreter 305.
  • The device command interpreter 305 may also send a request to build sentence information to a sentence generator component 350. In response, the sentence generator component 350 may provide a built sentence string to device command interpreter 305. Additionally, the device command interpreter 305 may request service on servers by providing a service request to an add on service interface 355. The add on service interface 355 may provide the service request to a voice database web server 360. Further, a response generated by voice database web server 360 may be provided to the device command interpreter 305 via the add on service interface 355.
  • Further, device command interpreter 305 may interact with a user database 370 via a user information database 365. In particular, device command interpreter 305 may provide user information and device authentication to the user database 370 via an interface of the user information database 365. Additionally, device command interpreter 305 may interact with a streaming interface 380 of a device via communications module 375. In particular, device command interpreter 305 may provide a file for download and/or text-to-speech voice data to a file downloader. Communications module 375 may include any data communications module that is capable of providing control of data streaming processes to the streaming interface 380 of the device. In response, the streaming interface 380 of the device may provide a data stream to communications module 375. The communications module 375 may then provide the voice stream to device command interpreter 305.
  • FIG. 4 illustrates a front perspective view 400 of an intelligent assistant device, in accordance with embodiments of the present invention. In particular, intelligent assistant device comprises a screen 405, a frame 410, and a device stand 415. Further, FIG. 5 illustrates a rear perspective view 500 of an intelligent assistant device, in accordance with embodiments of the present invention. In particular, intelligent assistant device comprises a speaker 505, input slots 510, device stand 515, and button 520.
  • FIG. 6 illustrates an overhead view 600 of an intelligent assistant device, in accordance with embodiments of the present invention. In particular, the intelligent assistant device comprises an audio button 605, a snooze button 610, a mode button 615, and an intelligent assistant device stand 620. FIG. 7 illustrates side views 705 a and 705 b of an intelligent assistant device, in accordance with embodiments of the present invention. In particular, the intelligent assistant device comprises buttons 710 and intelligent assistant device stand 715.
  • FIG. 8 illustrates another front perspective view 800 of an intelligent assistant device, in accordance with embodiments of the present invention. In particular, the intelligent assistant device comprises a screen 805, a frame 810, and a device stand 815. Further, screen 805 comprises a city indicator; a weather indicator; a date indicator; a time indicator; an alarm indicator; a message indicator; and a battery indicator. It will be understood that the screens are merely exemplary and other indicators may also likewise be utilized in accordance with the present technology. In some instances, the indicators utilized in screen 805 may relate to the types or domains of natural language queries that may be processed by the device.
  • FIG. 9 provides a block diagram 900 of components of an intelligent assistant device, in accordance with embodiments of the present invention. In particular, FIG. 9 comprises microphones 902 which provide audio data to an audio processor module 904. Audio processor module 904 provides analog data to a sensory natural language processor 906. Further, audio processor module 904 provides an analog or SPI (Serial Peripheral Interface) signal to a processor 908, where processor 908 comprises a main Atmel chip. Further, light sensor 910 and temperature sensor 912 also provide data to processor 908. Buttons and/or switches 914 also provide data to processor 908 via a touch sensor controller 916. Additionally, data is communicated between processor 908 and sensory natural language processor 906. Sensory natural language processor 906 is also coupled with an external memory for firmware 918.
  • Processor 908 also exchanges information with a Fast Super-Twisted Nematic (FSTN) Liquid Crystal Display (LCD) module with driver 920, as well as a WiFi module 922. Further, processor 908 is communicatively coupled with an Electrically Erasable Programmable Read-Only Memory (EEPROM) 924 for user information and/or settings. Processor 908 is also communicatively coupled with radio module 926 and audio mux 928. Audio mux 928 is an audio amplifier chip. Audio mux 928 also receives data from aux audio input 930. Further, sensory natural language processor 906 also provides data to audio mux 928. Additionally, audio mux 928 provides data to audio amp 932 and stereo speaker 934. FIG. 9 also comprises, for example, a USB jack (or other similar communicative interface) for recharging 936 that charges a rechargeable battery 938.
  • In addition to the embodiments described above, another exemplary embodiment may utilize a plurality of microphones of a smartphone base to implement a natural language processor, in accordance with embodiments of the present invention. In particular, audio is received at the plurality of microphones at a smartphone base. The audio is received at an application running a natural language processor, such as natural language processor 145. Further, the application comprises a clean-up component that utilizes a combination of a microphone array, beam-forming, noise-reduction, and/or echo cancellation system to clean up the audio received. In particular, the clean-up component may remove non-human spoken audio and/or may remove garbled human spoken audio (e.g., background conversations) that do not comprise primary human spoken audio (e.g., the human spoken audio of the primary user). By using this process, a user can interact with a smartphone application from approximately ten feet away and closer. As such, by using audio clean-up processes such as beamforming, audio received from microphones of auxiliary hardware devices, such as a dock for smartphone devices, may be used to interact with an application that comprises a natural language processor, such as natural language processor 145.
  • FIG. 10 is a perspective view of an exemplary intelligent assistant device, which includes a base station in combination with a clock. The intelligent assistant device may include any of the natural language processing and third party information features described above in addition to features commonly utilized with alarm clocks. Thus, the alarm clock may be controlled by the features and operations of the personal digital assistant device associated therewith. FIG. 11 is a perspective view of another exemplary intelligent assistant device, which includes a sleek and uni-body construction.
  • FIG. 11A is a schematic diagram of various components of an intelligent assistant device, for use with any of the intelligent assistant device products described herein.
  • FIGS. 12A-G collectively illustrate an exemplary flow diagram of data through an exemplary system that includes an intelligent assistant device 1200. In FIG. 12A, an intelligent assistant device 1200 may be communicatively coupled with various systems through a client API 1205. More specifically, the intelligent assistant device 1200 may communicatively couple with a speech processor 1210 of FIG. 12B, which in turn, couples with an external speech recognition engine 1215, in some instances. Again, the intelligent assistant device 1200 may include an integral speech recognition application.
  • A frames scheduler may be utilized to schedule and correlate responses with other objects such as advertisements.
  • The intelligent assistant device 1200 may also communicatively couple with a notifications server 1220 as shown in FIG. 12C. The notifications server 1220 may cooperate with the frames scheduler and an advertisements engine to query relevant advertisements and integrate the same into a response, which is returned to the intelligent assistant device 1200.
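The advertisement integration described for FIG. 12C might be pictured as appending an additional frame to the response returned to the device; the frame format and the ad lookup in the sketch below are assumptions for illustration only.

```python
# Simplified sketch of integrating a relevant advertisement into a response.
ADS_BY_DOMAIN = {"weather": "Umbrellas 20% off at Example Outfitters"}

def integrate_advertisement(response_frames: list, domain: str) -> list:
    ad = ADS_BY_DOMAIN.get(domain)
    if ad:
        response_frames = response_frames + [{"type": "advertisement", "text": ad}]
    return response_frames

frames = [{"type": "answer", "text": "76 degrees and sunny"}]
print(integrate_advertisement(frames, "weather"))
```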
  • As shown in FIG. 12D, the system may utilize a command fulfiller 1225 that creates API requests and processes responses to those requests. Additionally, the command fulfiller 1225 may also generate return response objects. The command fulfiller 1225 may communicatively couple with the speech processor 1210 of FIG. 12B, as well as various sub-classes of command fulfillers 1230. These sub-classes of command fulfillers 1230 may query third party information sources, such as an external knowledge engine 1235. The sub-classes of command fulfillers 1230 may be domain specific, such as news, weather, and so forth.
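The command-fulfiller pattern of FIG. 12D can be sketched as a base class that builds API requests and processes responses, with domain-specific sub-classes querying their own third-party sources. Class and method names below are illustrative, not drawn from the disclosure.

```python
# Illustrative base command fulfiller with a domain-specific sub-class.
class CommandFulfiller:
    def fulfill(self, query: dict) -> dict:
        request = self.build_request(query)
        raw = self.call_source(request)
        return {"domain": query["domain"], "response": raw}

    def build_request(self, query: dict) -> dict:
        raise NotImplementedError

    def call_source(self, request: dict) -> str:
        raise NotImplementedError

class WeatherFulfiller(CommandFulfiller):
    def build_request(self, query: dict) -> dict:
        return {"city": query["city"]}

    def call_source(self, request: dict) -> str:
        return f"76 degrees in {request['city']}"  # placeholder knowledge source

print(WeatherFulfiller().fulfill({"domain": "weather", "city": "Los Angeles"}))
```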
  • FIG. 12E illustrates the use of a frame generator 1240 that processes information obtained by the sub-classes of command fulfillers 1230 of FIG. 12D. Additionally, a plug-in framework for third party applications module 1245 is shown. This module 1245 allows for communicative coupling and interfacing of third party applications 1265 of FIG. 12G, via a developer API 1270.
  • Additionally, a user management system 1250 may allow for end user setup of the intelligent assistant device 1200 via an end user. The end user may utilize a web-based portal 1255 of FIG. 12F that allows for the end user to setup and manage their device via a device management API 1260.
  • FIG. 13 illustrates an exemplary computing system 1300 that may be used to implement an embodiment of the present systems and methods. The system 1300 of FIG. 13 may be implemented in the contexts of the likes of computing systems, networks, servers, or combinations thereof. The computing system 1300 of FIG. 13 includes one or more processors 1310 and main memory 1320. Main memory 1320 stores, in part, instructions and data for execution by processor 1310. Main memory 1320 may store the executable code when in operation. The system 1300 of FIG. 13 further includes a mass storage device 1330, portable storage device 1340, output devices 1350, user input devices 1360, a display system 1370, and peripheral devices 1380.
  • The components shown in FIG. 13 are depicted as being connected via a single bus 1390. The components may be connected through one or more data transport means. Processor unit 1310 and main memory 1320 may be connected via a local microprocessor bus, and the mass storage device 1330, peripheral device(s) 1380, portable storage device 1340, and display system 1370 may be connected via one or more input/output (I/O) buses.
  • Mass storage device 1330, which may be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 1310. Mass storage device 1330 may store the system software for implementing embodiments of the present invention for purposes of loading that software into main memory 1320.
  • Portable storage device 1340 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, compact disk, digital video disc, or USB storage device, to input and output data and code to and from the computer system 1300 of FIG. 13. The system software for implementing embodiments of the present invention may be stored on such a portable medium and input to the computer system 1300 via the portable storage device 1340.
  • User input devices 1360 provide a portion of a user interface. User input devices 1360 may include an alphanumeric keypad, such as a keyboard, for inputting alpha-numeric and other information, or a pointing device, such as a mouse, a trackball, stylus, or cursor direction keys. Additional user input devices 1360 may comprise, but are not limited to, devices such as speech recognition systems, facial recognition systems, motion-based input systems, gesture-based systems, and so forth. For example, user input devices 1360 may include a touchscreen. Additionally, the system 1300 as shown in FIG. 13 includes output devices 1350. Suitable output devices include speakers, printers, network interfaces, and monitors.
  • Display system 1370 may include a liquid crystal display (LCD) or other suitable display device. Display system 1370 receives textual and graphical information, and processes the information for output to the display device.
  • Peripheral device(s) 1380 may include any type of computer support device to add additional functionality to the computer system. Peripheral device(s) 1380 may include a modem or a router.
  • The components provided in the computer system 1300 of FIG. 13 are those typically found in computer systems that may be suitable for use with embodiments of the present invention and are intended to represent a broad category of such computer components that are well known in the art. Thus, the computer system 1300 of FIG. 13 may be a personal computer, hand held computing system, telephone, mobile computing system, workstation, server, minicomputer, mainframe computer, or any other computing system. The computer may also include different bus configurations, networked platforms, multi-processor platforms, etc. Various operating systems may be used including Unix, Linux, Windows, Mac OS, Palm OS, Android, iOS (known as iPhone OS before June 2010), QNX, and other suitable operating systems.
  • FIGS. 14A and 14B collectively provide views of an exemplary embodiment of an intelligent assistant device that functions as a base for receiving a second hardware device, such as a cellular telephone. It will be understood that the intelligent assistant device may include any communicative interface that allows for one or more devices to interface with the intelligent assistant device via a physical connection.
  • It is noteworthy that any hardware platform suitable for performing the processing described herein is suitable for use with the systems and methods provided herein. Computer-readable storage media refer to any medium or media that participate in providing instructions to a central processing unit (CPU), a processor, a microcontroller, or the like. Such media may take forms including, but not limited to, non-volatile and volatile media such as optical or magnetic disks and dynamic memory, respectively. Common forms of computer-readable storage media include a floppy disk, a flexible disk, a hard disk, magnetic tape, and any other magnetic storage medium, a CD-ROM disk, digital video disk (DVD), any other optical storage medium, RAM, PROM, EPROM, a FLASH EPROM, and any other memory chip or cartridge.
  • Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be coupled with the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Exemplary embodiments were chosen and described in order to best explain the principles of the present technology and its practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
  • Aspects of the present invention are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
  • While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. The descriptions are not intended to limit the scope of the technology to the particular forms set forth herein. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments. It should be understood that the above description is illustrative and not restrictive. To the contrary, the present descriptions are intended to cover such alternatives, modifications, and equivalents as may be included within the spirit and scope of the technology as defined by the appended claims and otherwise appreciated by one of ordinary skill in the art. The scope of the technology should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the appended claims along with their full scope of equivalents.

Claims (20)

What is claimed is:
1. A system comprising:
an intelligent assistant device comprising a processor which executes logic to perform operations comprising:
receiving audio input for generating a speech signal using at least one microphone communicatively coupled to the intelligent assistant device; and
a natural language processor communicatively coupled with the intelligent assistant device that executes logic to perform operations comprising:
receiving the audio input from the intelligent assistant device;
converting the audio input from speech to a text query;
processing the text query using artificial intelligence (AI) logic;
determining an Application Programming Interface (API) from a plurality of APIs for processing the text query; and
transmitting a response from the API to the intelligent assistant device or another device communicatively coupled to the intelligent assistant device for output.
2. The system of claim 1, wherein the natural language processor further uses machine learning to analyze a string of text.
3. The system of claim 1, wherein the natural language processor uses a neural network to analyze the string of text.
4. The system of claim 1, wherein the natural language processor interprets and learns from patterns and behaviors of a user, attributing data to the patterns and behaviors such that a response to a future command from the user can be automatically generated by the intelligent assistant device.
5. The system of claim 1, wherein the intelligent assistant device acts as a base station connected to at least one enabled device such that the audio input received by the intelligent assistant device is used to adjust an operation of the connected at least one enabled device.
6. The system of claim 5, wherein the at least one enabled device receives data to transmit to or from the intelligent assistant device.
7. The system of claim 5, wherein the at least one enabled device is connected to the intelligent assistant device via Bluetooth.
8. The system of claim 5, wherein the at least one enabled device comprises a smartphone comprising:
at least one microphone for receiving audio commands;
at least one user input interface;
a mobile application for processing audio and user input commands; and
a natural language processor for performing automatic speech recognition of the audio commands.
9. The system of claim 5, wherein the at least one enabled device comprises at least one smart home device.
10. The system of claim 5, wherein the at least one smart home device receives commands from a general server connected to the intelligent assistant device.
11. The system of claim 1, wherein the intelligent assistant device utilizes digital signal processing to separate background noise in the audio input.
12. The system of claim 1, wherein the intelligent assistant device includes indicators that provide interactive feedback.
13. A method, comprising:
receiving audio input for generating a speech signal using at least one microphone communicatively coupled to an intelligent assistant device;
transmitting the audio input from the intelligent assistant device to a natural language processor;
converting the audio input from speech to a text query using the natural language processor;
processing the text query using artificial intelligence (AI) logic using the natural language processor;
determining an Application Programming Interface (API) from a plurality of APIs for processing the text query using the natural language processor; and
transmitting a response from the API to the intelligent assistant device or another device communicatively coupled to the intelligent assistant device for output using the natural language processor.
14. The method of claim 13, further comprising processing the text query using machine learning to analyze a string of text using the natural language processor.
15. The method of claim 13, further comprising processing the text query using a neural network to analyze the string of text using the natural language processor.
16. The method of claim 13, further comprising connecting the intelligent assistant device to at least one enabled device such that the audio input received by the intelligent assistant device is used to adjust an operation of the connected at least one enabled device.
17. The method of claim 16, wherein the at least one enabled device receives data to transmit to or from the intelligent assistant device.
18. The method of claim 16, wherein the at least one enabled device comprises a smartphone comprising:
at least one microphone for receiving audio commands;
at least one user input interface;
a mobile application for processing audio and user input commands; and
a natural language processor for performing automatic speech recognition of the audio commands.
19. The method of claim 16, wherein the at least one enabled device comprises at least one smart home device that receives commands from a general server connected to the intelligent assistant device.
20. An interactive device, comprising:
at least one microphone;
at least one speaker; and
a processor that executes logic stored in memory to perform operations comprising:
receiving audio input for generating a speech signal using the at least one microphone;
transmitting the audio input from the device to a natural language processor;
receiving a response to the audio input from a server; and
outputting the response.
US15/336,714 2012-01-09 2016-10-27 Responding to Human Spoken Audio Based on User Input Abandoned US20170046124A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/336,714 US20170046124A1 (en) 2012-01-09 2016-10-27 Responding to Human Spoken Audio Based on User Input

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201261584752P 2012-01-09 2012-01-09
US13/734,282 US9542956B1 (en) 2012-01-09 2013-01-04 Systems and methods for responding to human spoken audio
US15/336,714 US20170046124A1 (en) 2012-01-09 2016-10-27 Responding to Human Spoken Audio Based on User Input

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/734,282 Continuation US9542956B1 (en) 2012-01-09 2013-01-04 Systems and methods for responding to human spoken audio

Publications (1)

Publication Number Publication Date
US20170046124A1 true US20170046124A1 (en) 2017-02-16

Family

ID=57706028

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/734,282 Expired - Fee Related US9542956B1 (en) 2012-01-09 2013-01-04 Systems and methods for responding to human spoken audio
US15/336,714 Abandoned US20170046124A1 (en) 2012-01-09 2016-10-27 Responding to Human Spoken Audio Based on User Input

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US13/734,282 Expired - Fee Related US9542956B1 (en) 2012-01-09 2013-01-04 Systems and methods for responding to human spoken audio

Country Status (1)

Country Link
US (2) US9542956B1 (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160179464A1 (en) * 2014-12-22 2016-06-23 Microsoft Technology Licensing, Llc Scaling digital personal assistant agents across devices
US20180315093A1 (en) * 2017-05-01 2018-11-01 International Business Machines Corporation Method and system for targeted advertising based on natural language analytics
WO2018208453A1 (en) * 2017-05-09 2018-11-15 Microsoft Technology Licensing, Llc Random factoid generation
US20190130899A1 (en) * 2012-08-03 2019-05-02 Veveo, Inc. Method for using pauses detected in speech input to assist in interpreting the input during conversational interaction for information retrieval
US20190155934A1 (en) * 2017-11-22 2019-05-23 International Business Machines Corporation Search query enhancement with context analysis
EP3502928A1 (en) * 2017-12-22 2019-06-26 Sap Se Intelligent natural language query processor
CN110400561A (en) * 2018-04-16 2019-11-01 松下航空电子公司 Method and system for the vehicles
US20200150981A1 (en) * 2018-11-09 2020-05-14 International Business Machines Corporation Dynamic Generation of User Interfaces Based on Dialogue
WO2020166796A1 (en) * 2019-02-11 2020-08-20 삼성전자주식회사 Electronic device and control method therefor
US10769189B2 (en) * 2015-11-13 2020-09-08 Microsoft Technology Licensing, Llc Computer speech recognition and semantic understanding from activity patterns
US10959018B1 (en) * 2019-01-18 2021-03-23 Amazon Technologies, Inc. Method for autonomous loudspeaker room adaptation
US11017028B2 (en) * 2018-10-03 2021-05-25 The Toronto-Dominion Bank Systems and methods for intelligent responses to queries based on trained processes
US20210165630A1 (en) * 2012-12-14 2021-06-03 Amazon Technologies, Inc. Response endpoint selection
US11048869B2 (en) * 2016-08-19 2021-06-29 Panasonic Avionics Corporation Digital assistant and associated methods for a transportation vehicle
US11107457B2 (en) 2017-03-29 2021-08-31 Google Llc End-to-end text-to-speech conversion
US20210304020A1 (en) * 2020-03-17 2021-09-30 MeetKai, Inc. Universal client api for ai services
US11183190B2 (en) * 2019-05-21 2021-11-23 Lg Electronics Inc. Method and apparatus for recognizing a voice
US20210398527A1 (en) * 2018-10-16 2021-12-23 Huawei Technologies Co., Ltd. Terminal screen projection control method and terminal
US11328711B2 (en) * 2019-07-05 2022-05-10 Korea Electronics Technology Institute User adaptive conversation apparatus and method based on monitoring of emotional and ethical states
US11347749B2 (en) 2018-05-24 2022-05-31 Sap Se Machine learning in digital paper-based interaction
US11381903B2 (en) 2014-02-14 2022-07-05 Sonic Blocks Inc. Modular quick-connect A/V system and methods thereof
US11429883B2 (en) 2015-11-13 2022-08-30 Microsoft Technology Licensing, Llc Enhanced computer experience from activity prediction
WO2022271385A1 (en) * 2021-06-21 2022-12-29 Roots For Education Llc Automatic generation of lectures derived from generic, educational or scientific contents, fitting specified parameters
US11790176B2 (en) * 2019-03-19 2023-10-17 Servicenow, Inc. Systems and methods for a virtual agent in a cloud computing environment
US11869497B2 (en) 2020-03-10 2024-01-09 MeetKai, Inc. Parallel hypothetical reasoning to power a multi-lingual, multi-turn, multi-domain virtual assistant
US11921712B2 (en) 2020-10-05 2024-03-05 MeetKai, Inc. System and method for automatically generating question and query pairs
US11991253B2 (en) 2020-03-17 2024-05-21 MeetKai, Inc. Intelligent layer to power cross platform, edge-cloud hybrid artificial intelligence services

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10630751B2 (en) 2016-12-30 2020-04-21 Google Llc Sequence dependent data message consolidation in a voice activated computer network environment
US10956485B2 (en) * 2011-08-31 2021-03-23 Google Llc Retargeting in a search environment
US9703757B2 (en) 2013-09-30 2017-07-11 Google Inc. Automatically determining a size for a content item for a web page
US10431209B2 (en) 2016-12-30 2019-10-01 Google Llc Feedback controller for data transmissions
US10614153B2 (en) 2013-09-30 2020-04-07 Google Llc Resource size-based content item selection
RU2654789C2 (en) * 2014-05-30 2018-05-22 Общество С Ограниченной Ответственностью "Яндекс" Method (options) and electronic device (options) for processing the user verbal request
WO2018140420A1 (en) 2017-01-24 2018-08-02 Honeywell International, Inc. Voice control of an integrated room automation system
US10984329B2 (en) 2017-06-14 2021-04-20 Ademco Inc. Voice activated virtual assistant with a fused response
EP3447659A1 (en) * 2017-08-21 2019-02-27 Vestel Elektronik Sanayi ve Ticaret A.S. Digital assistant and method of operation
EP3496090A1 (en) * 2017-12-07 2019-06-12 Thomson Licensing Device and method for privacy-preserving vocal interaction
US20190332848A1 (en) 2018-04-27 2019-10-31 Honeywell International Inc. Facial enrollment and recognition system
US20190390866A1 (en) 2018-06-22 2019-12-26 Honeywell International Inc. Building management system with natural language interface
CN112639963A (en) * 2020-03-19 2021-04-09 深圳市大疆创新科技有限公司 Audio acquisition device, audio receiving device and audio processing method

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020032751A1 (en) * 2000-05-23 2002-03-14 Srinivas Bharadwaj Remote displays in mobile communication networks
US20020174177A1 (en) * 2001-04-25 2002-11-21 Sharon Miesen Voice activated navigation of a computer network
US20030018531A1 (en) * 2000-09-08 2003-01-23 Mahaffy Kevin E. Point-of-sale commercial transaction processing system using artificial intelligence assisted by human intervention
US20040107108A1 (en) * 2001-02-26 2004-06-03 Rohwer Elizabeth A Apparatus and methods for implementing voice enabling applications in a coverged voice and data network environment
US20060106617A1 (en) * 2002-02-04 2006-05-18 Microsoft Corporation Speech Controls For Use With a Speech System
US20060235696A1 (en) * 1999-11-12 2006-10-19 Bennett Ian M Network based interactive speech recognition system
US20070106497A1 (en) * 2005-11-09 2007-05-10 Microsoft Corporation Natural language interface for driving adaptive scenarios
US20070124142A1 (en) * 2005-11-25 2007-05-31 Mukherjee Santosh K Voice enabled knowledge system
US20120210233A1 (en) * 2010-11-04 2012-08-16 Davis Bruce L Smartphone-Based Methods and Systems
US9076448B2 (en) * 1999-11-12 2015-07-07 Nuance Communications, Inc. Distributed real time speech recognition system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8498871B2 (en) * 2001-11-27 2013-07-30 Advanced Voice Recognition Systems, Inc. Dynamic speech recognition and transcription among users having heterogeneous protocols
KR100764174B1 (en) * 2006-03-03 2007-10-08 삼성전자주식회사 Apparatus for providing voice dialogue service and method for operating the apparatus
US20080294462A1 (en) * 2007-05-23 2008-11-27 Laura Nuhaan System, Method, And Apparatus Of Facilitating Web-Based Interactions Between An Elderly And Caregivers
KR101577607B1 (en) * 2009-05-22 2015-12-15 삼성전자주식회사 Apparatus and method for language expression using context and intent awareness
US9183560B2 (en) * 2010-05-28 2015-11-10 Daniel H. Abelow Reality alternate

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060235696A1 (en) * 1999-11-12 2006-10-19 Bennett Ian M Network based interactive speech recognition system
US9076448B2 (en) * 1999-11-12 2015-07-07 Nuance Communications, Inc. Distributed real time speech recognition system
US20020032751A1 (en) * 2000-05-23 2002-03-14 Srinivas Bharadwaj Remote displays in mobile communication networks
US20030018531A1 (en) * 2000-09-08 2003-01-23 Mahaffy Kevin E. Point-of-sale commercial transaction processing system using artificial intelligence assisted by human intervention
US20040107108A1 (en) * 2001-02-26 2004-06-03 Rohwer Elizabeth A Apparatus and methods for implementing voice enabling applications in a coverged voice and data network environment
US20020174177A1 (en) * 2001-04-25 2002-11-21 Sharon Miesen Voice activated navigation of a computer network
US20060106617A1 (en) * 2002-02-04 2006-05-18 Microsoft Corporation Speech Controls For Use With a Speech System
US20070106497A1 (en) * 2005-11-09 2007-05-10 Microsoft Corporation Natural language interface for driving adaptive scenarios
US20070124142A1 (en) * 2005-11-25 2007-05-31 Mukherjee Santosh K Voice enabled knowledge system
US20120210233A1 (en) * 2010-11-04 2012-08-16 Davis Bruce L Smartphone-Based Methods and Systems

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190130899A1 (en) * 2012-08-03 2019-05-02 Veveo, Inc. Method for using pauses detected in speech input to assist in interpreting the input during conversational interaction for information retrieval
US11024297B2 (en) * 2012-08-03 2021-06-01 Veveo, Inc. Method for using pauses detected in speech input to assist in interpreting the input during conversational interaction for information retrieval
US20210165630A1 (en) * 2012-12-14 2021-06-03 Amazon Technologies, Inc. Response endpoint selection
US20230141659A1 (en) * 2012-12-14 2023-05-11 Amazon Technologies, Inc. Response endpoint selection
US11381903B2 (en) 2014-02-14 2022-07-05 Sonic Blocks Inc. Modular quick-connect A/V system and methods thereof
US20160179464A1 (en) * 2014-12-22 2016-06-23 Microsoft Technology Licensing, Llc Scaling digital personal assistant agents across devices
US9690542B2 (en) * 2014-12-22 2017-06-27 Microsoft Technology Licensing, Llc Scaling digital personal assistant agents across devices
US10769189B2 (en) * 2015-11-13 2020-09-08 Microsoft Technology Licensing, Llc Computer speech recognition and semantic understanding from activity patterns
US11429883B2 (en) 2015-11-13 2022-08-30 Microsoft Technology Licensing, Llc Enhanced computer experience from activity prediction
US11048869B2 (en) * 2016-08-19 2021-06-29 Panasonic Avionics Corporation Digital assistant and associated methods for a transportation vehicle
US11107457B2 (en) 2017-03-29 2021-08-31 Google Llc End-to-end text-to-speech conversion
US11862142B2 (en) 2017-03-29 2024-01-02 Google Llc End-to-end text-to-speech conversion
US20180315094A1 (en) * 2017-05-01 2018-11-01 International Business Machines Corporation Method and system for targeted advertising based on natural language analytics
US20180315093A1 (en) * 2017-05-01 2018-11-01 International Business Machines Corporation Method and system for targeted advertising based on natural language analytics
US10671602B2 (en) 2017-05-09 2020-06-02 Microsoft Technology Licensing, Llc Random factoid generation
WO2018208453A1 (en) * 2017-05-09 2018-11-15 Microsoft Technology Licensing, Llc Random factoid generation
US11200241B2 (en) * 2017-11-22 2021-12-14 International Business Machines Corporation Search query enhancement with context analysis
US20190155934A1 (en) * 2017-11-22 2019-05-23 International Business Machines Corporation Search query enhancement with context analysis
US10853396B2 (en) 2017-12-22 2020-12-01 Sap Se Intelligent natural language query processor
EP3502928A1 (en) * 2017-12-22 2019-06-26 Sap Se Intelligent natural language query processor
CN110400561A (en) * 2018-04-16 2019-11-01 松下航空电子公司 Method and system for the vehicles
US11347749B2 (en) 2018-05-24 2022-05-31 Sap Se Machine learning in digital paper-based interaction
US11977551B2 (en) 2018-05-24 2024-05-07 Sap Se Digital paper based interaction to system data
US11531673B2 (en) 2018-05-24 2022-12-20 Sap Se Ambiguity resolution in digital paper-based interaction
US11928112B2 (en) * 2018-10-03 2024-03-12 The Toronto-Dominion Bank Systems and methods for intelligent responses to queries based on trained processes
US20210240778A1 (en) * 2018-10-03 2021-08-05 The Toronto-Dominion Bank Systems and methods for intelligent responses to queries based on trained processes
US11017028B2 (en) * 2018-10-03 2021-05-25 The Toronto-Dominion Bank Systems and methods for intelligent responses to queries based on trained processes
US20210398527A1 (en) * 2018-10-16 2021-12-23 Huawei Technologies Co., Ltd. Terminal screen projection control method and terminal
US20200150981A1 (en) * 2018-11-09 2020-05-14 International Business Machines Corporation Dynamic Generation of User Interfaces Based on Dialogue
US10959018B1 (en) * 2019-01-18 2021-03-23 Amazon Technologies, Inc. Method for autonomous loudspeaker room adaptation
US11710498B2 (en) 2019-02-11 2023-07-25 Samsung Electronics Co., Ltd. Electronic device and control method therefor
CN113261055A (en) * 2019-02-11 2021-08-13 三星电子株式会社 Electronic device and control method thereof
WO2020166796A1 (en) * 2019-02-11 2020-08-20 삼성전자주식회사 Electronic device and control method therefor
EP3863012A4 (en) * 2019-02-11 2021-12-22 Samsung Electronics Co., Ltd. Electronic device and control method therefor
US11790176B2 (en) * 2019-03-19 2023-10-17 Servicenow, Inc. Systems and methods for a virtual agent in a cloud computing environment
US11183190B2 (en) * 2019-05-21 2021-11-23 Lg Electronics Inc. Method and apparatus for recognizing a voice
US11328711B2 (en) * 2019-07-05 2022-05-10 Korea Electronics Technology Institute User adaptive conversation apparatus and method based on monitoring of emotional and ethical states
US11869497B2 (en) 2020-03-10 2024-01-09 MeetKai, Inc. Parallel hypothetical reasoning to power a multi-lingual, multi-turn, multi-domain virtual assistant
US20210304020A1 (en) * 2020-03-17 2021-09-30 MeetKai, Inc. Universal client api for ai services
US11991253B2 (en) 2020-03-17 2024-05-21 MeetKai, Inc. Intelligent layer to power cross platform, edge-cloud hybrid artificial intelligence services
US11995561B2 (en) * 2020-03-17 2024-05-28 MeetKai, Inc. Universal client API for AI services
US11921712B2 (en) 2020-10-05 2024-03-05 MeetKai, Inc. System and method for automatically generating question and query pairs
WO2022271385A1 (en) * 2021-06-21 2022-12-29 Roots For Education Llc Automatic generation of lectures derived from generic, educational or scientific contents, fitting specified parameters

Also Published As

Publication number Publication date
US9542956B1 (en) 2017-01-10

Similar Documents

Publication Publication Date Title
US20170046124A1 (en) Responding to Human Spoken Audio Based on User Input
US10121465B1 (en) Providing content on multiple devices
US20210065716A1 (en) Voice processing method and electronic device supporting the same
CN107112014B (en) Application focus in speech-based systems
EP2973543B1 (en) Providing content on multiple devices
US11810557B2 (en) Dynamic and/or context-specific hot words to invoke automated assistant
JP2020016875A (en) Voice interaction method, device, equipment, computer storage medium, and computer program
US11721338B2 (en) Context-based dynamic tolerance of virtual assistant
CN110428825B (en) Method and system for ignoring trigger words in streaming media content
US20210149627A1 (en) System for processing user utterance and control method of same
JP2022547598A (en) Techniques for interactive processing using contextual data
US20230043528A1 (en) Using backpropagation to train a dialog system
WO2021074736A1 (en) Providing adversarial protection of speech in audio signals
KR20190068133A (en) Electronic device and method for speech recognition
US10997963B1 (en) Voice based interaction based on context-based directives
US11783836B2 (en) Personal electronic captioning based on a participant user's difficulty in understanding a speaker
US20230169272A1 (en) Communication framework for automated content generation and adaptive delivery
US11462208B2 (en) Implementing a correction model to reduce propagation of automatic speech recognition errors
US20210407512A1 (en) System for Voice-To-Text Tagging for Rich Transcription of Human Speech
CN111724773A (en) Application opening method and device, computer system and medium
EP3792912B1 (en) Improved wake-word recognition in low-power devices
US11722572B2 (en) Communication platform shifting for voice-enabled device
US12008990B1 (en) Providing content on multiple devices
US20210264910A1 (en) User-driven content generation for virtual assistant
KR20150106181A (en) Messenger service system, messenger service method and apparatus for messenger service using pattern of common word in the system

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION