EP3047481A1 - Local and remote speech processing - Google Patents
Info
- Publication number
- EP3047481A1 (application EP14846698A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech
- command
- expression
- function
- audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Description
- Homes, offices, automobiles, and public spaces are becoming more wired and connected with the proliferation of computing devices such as notebook computers, tablets, entertainment systems, and portable communication devices.
- As computing devices evolve, the ways in which users interact with these devices continue to evolve. For example, people can interact with computing devices through mechanical devices (e.g., keyboards, mice, etc.), electrical devices (e.g., touch screens, touch pads, etc.), and optical devices (e.g., motion detectors, cameras, etc.).
- Another way to interact with computing devices is through audio devices that capture and respond to human speech.
- FIG. 1 is a block diagram of an illustrative voice interaction computing architecture that includes a local audio device and a remote speech processing service.
- FIGS. 2-4 are flow diagrams illustrating example processes for detecting command expressions that may be performed by a local audio device in conjunction with a remote speech processing service.
- This disclosure pertains generally to a speech interface system that provides or facilitates speech-based interactions with a user.
- the system includes a local device having a microphone that captures audio containing user speech.
- Spoken user commands may be prefaced by a keyword, referred to as a trigger expression or wake expression. Audio following a trigger expression may be streamed to a remote service for speech recognition and the service may respond by performing a function or providing a command to be performed by the audio device.
- certain command expressions are detected by or at the local device rather than by the remote service.
- the local device is configured to detect a trigger or alert expression, which indicates that subsequent speech is intended by the user to form a command.
- the local device initiates a communication session with the remote service and begins streaming received audio to the service.
- the remote service performs speech recognition on the received audio and attempts to identify user intent based on the recognized speech.
- the remote service may perform a corresponding function.
- The function may be performed in conjunction with the local device. For example, the remote service may send a command to the local device indicating that the local device should execute the command to perform a corresponding function.
- the local device monitors or analyzes the audio to detect an occurrence of a local command expression following the trigger expression. Upon detecting a local command expression in the audio, the local device immediately implements a corresponding function. In addition, further actions by the remote service are stopped or cancelled to avoid duplicate actions with respect to a single user utterance. Actions by the remote service may be stopped by explicitly notifying the remote service that the utterance has been acted upon locally, by terminating or cancelling a communications session, and/or by foregoing execution of any commands that are specified by the remote service in response to remote recognition of user speech.
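The dual-path flow described above can be sketched in a few lines of Python. This is a hypothetical illustration, not an implementation from the disclosure: the class and method names, and the choice of command words, are assumptions made for clarity.

```python
# Hypothetical sketch of the local-detection flow described above.
# The names and command words are illustrative, not from the disclosure.

LOCAL_COMMANDS = {"stop", "pause", "hang-up"}

class LocalDevice:
    def __init__(self):
        self.streaming_to_remote = False   # audio being sent to the service
        self.handled_locally = False       # a local command was acted upon
        self.remote_cancelled = False      # pending remote request cancelled

    def on_trigger_expression(self):
        # Trigger detected: start streaming subsequent audio to the service.
        self.streaming_to_remote = True
        self.handled_locally = False
        self.remote_cancelled = False

    def on_spotted_word(self, word):
        # Concurrently, spot local command expressions in the same audio.
        if self.streaming_to_remote and word in LOCAL_COMMANDS:
            self.handled_locally = True    # perform the function immediately
            self.cancel_remote_request()   # avoid a duplicate remote action

    def cancel_remote_request(self):
        self.streaming_to_remote = False
        self.remote_cancelled = True

device = LocalDevice()
device.on_trigger_expression()
device.on_spotted_word("stop")
```

The key point mirrored here is that local handling and cancellation of the remote request happen together, so a single utterance is never acted upon twice.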
- FIG. 1 shows an example of a voice interaction system 100.
- the system 100 may include or may utilize a local voice-based audio device 102, which may be located within an environment 104 such as a home, and which may be used for interacting with a user 106.
- the voice interaction system 100 may also include or utilize a remote, network-based speech command service 108 that is configured to receive audio, to recognize speech in the audio, and to perform a function, referred to herein as a service-identified function, in response to the recognized speech.
- the service-identified function may be implemented by the speech command service 108 independently of the audio device, and/or may be implemented by providing a command to the audio device 102 for local execution.
- the primary mode of user interaction with the audio device 102 may be through speech.
- the audio device 102 may receive spoken command expressions from the user 106 and may provide services in response to the commands.
- the user may speak a predefined wake or trigger expression (e.g., "Awake"), which may be followed by commands or instructions (e.g., "I'd like to go to a movie. Please tell me what's playing at the local cinema.”).
- Provided services may include performing actions or activities, rendering media, obtaining and/or providing information, providing information via generated or synthesized speech via the audio device 102, initiating Internet-based services on behalf of the user 106, and so forth.
- the local audio device 102 and the speech command service 108 are configured to act in conjunction with each other to receive and respond to command expressions from the user 106.
- the command expressions may include local command expressions that are detected and acted upon by the local device 102 independently of the speech command service 108.
- the command expressions may also include commands that are interpreted and acted upon by or in conjunction with the remote speech command service 108.
- The audio device 102 may have one or more microphones 110 and one or more audio speakers or transducers 112 to facilitate audio interactions with the user 106.
- The microphone 110 produces a microphone signal, also referred to as an input audio signal, representing audio from the environment 104, including sounds or expressions uttered by the user 106.
- The microphone 110 may comprise a microphone array that is used in conjunction with audio beamforming techniques to produce an input audio signal that is focused in a selectable direction. Similarly, a plurality of directional microphones 110 may be used to produce an audio signal corresponding to one of multiple available directions.
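The beamforming idea mentioned above can be illustrated with a minimal delay-and-sum sketch. The per-microphone delays would in practice be derived from array geometry and the chosen steering direction; here they are supplied directly, and all names are assumptions, since the disclosure does not prescribe an algorithm.

```python
# Minimal delay-and-sum beamformer sketch (illustrative only).

def delay_and_sum(channels, delays):
    """channels: equal-length lists of samples, one list per microphone.
    delays: per-channel delay in samples toward the steering direction."""
    n = len(channels[0])
    out = []
    for i in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            j = i - d
            if 0 <= j < n:                 # ignore samples shifted off-range
                acc += ch[j]
        out.append(acc / len(channels))    # average the aligned channels
    return out

# Two in-phase channels with zero delay simply average back to the signal.
mono = [0.0, 1.0, 0.0, -1.0]
focused = delay_and_sum([mono, mono], [0, 0])
```

Signals arriving from the steered direction add coherently after the delays, while off-axis sound partially cancels, which is what "focusing" the input audio signal means here.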
- The audio device 102 includes operational logic, which in many cases may comprise a processor 114 and memory 116.
- The processor 114 may include multiple processors and/or a processor having multiple cores.
- The processor 114 may also comprise or include a digital signal processor for processing audio signals.
- The memory 116 may contain applications and programs in the form of computer-executable instructions that are executed by the processor 114 to perform acts or actions that implement desired functionality of the audio device 102, including the functionality specifically described below.
- The memory 116 may be a type of computer-readable storage media and may include volatile and nonvolatile memory. Thus, the memory 116 may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or other memory technology.
- The audio device 102 may include a plurality of applications, services, and/or functions 118, referred to collectively below as functional components 118, which are executable by the processor 114 to provide services and functionality.
- The applications and other functional components 118 may include media playback services such as music players.
- Other services or operations performed or provided by the applications and other functional components 118 may include, as examples, requesting and consuming entertainment (e.g., gaming, finding and playing music, movies or other content, etc.), personal management (e.g., calendaring, note taking, etc.), online shopping, financial transactions, database inquiries, person-to-person voice communications, and so forth.
- The functional components 118 may be pre-installed on the audio device 102, and may implement core functionality of the audio device 102. In other embodiments, one or more of the applications or other functional components 118 may be installed by the user 106 or otherwise installed after the audio device 102 has been initialized by the user 106, and may implement additional or customized functionality as desired by the user 106.
- The processor 114 may be configured by audio processing functionality or components 120 to process input audio signals generated by the microphone 110 and/or output audio signals provided to the speaker 112.
- The audio processing components 120 may implement acoustic echo cancellation to reduce audio echo generated by acoustic coupling between the microphone 110 and the speaker 112.
- the audio processing components 120 may also implement noise reduction to reduce noise in received audio signals, such as elements of input audio signals other than user speech.
- The audio processing components 120 may include one or more audio beamformers that are responsive to multiple microphones 110 to generate an audio signal that is focused in a direction from which user speech has been detected.
- The audio device 102 may also be configured to implement one or more expression detectors or speech recognition components 122, which may be used to detect a trigger expression in speech captured by the microphone 110.
- the term "trigger expression” is used herein to indicate a word, phrase, or other utterance that is used to signal the audio device 102 that subsequent user speech is intended by the user to be interpreted as a command.
- The one or more speech recognition components 122 may also be used to detect commands or command expressions in the speech captured by the microphone 110.
- command expression is used herein to indicate a word, phrase, or other utterance that corresponds to or is associated with a function that is to be performed by the audio device 102 or by a service or other device that is accessible to the audio device 102, such as the speech command service 108.
- The words "stop", "pause", and "hang-up" may be used as command expressions.
- The "stop" and "pause" command expressions may indicate that media playback activities should be interrupted.
- The "hang-up" command expression may indicate that a current person-to-person communication should be terminated.
- Other command expressions, corresponding to different functions, may also be used.
- Command expressions may comprise conversation-style directives, such as "Find a nearby Italian restaurant.”
- Command expressions may include local command expressions that are to be interpreted by the audio device 102 without relying on the speech command service 108.
- local command expressions are relatively short expressions such as single words or short phrases, which can be easily detected by the audio device 102.
- Local command expressions may correspond to device functions for which relatively low response latencies are desired, such as media control or media playback control functions.
- the services of the speech command service 108 may be utilized for other command expressions for which greater response latencies are acceptable.
- Command expressions that are to be acted upon by the speech command service will be referred to herein as remote command expressions.
- the speech recognition components 122 may be implemented using automated speech recognition (ASR) techniques.
- large vocabulary speech recognition techniques may be used for keyword detection, and the output of the speech recognition may be monitored for occurrences of the keyword.
- the speech recognition may use hidden Markov models and Gaussian mixture models to recognize voice input and to provide a continuous word stream corresponding to the voice input. The word stream may then be monitored to detect one or more specified words or expressions.
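The monitoring step described above, in which the recognizer's continuous word stream is watched for specified expressions, can be realized as a small sliding-window check. The sketch below assumes the recognizer yields one word at a time; the recognizer itself (HMM/GMM based) is out of scope, and the expression set is illustrative.

```python
# Sketch of monitoring a continuous ASR word stream for specified
# expressions. The expression set is an illustrative assumption.

COMMAND_EXPRESSIONS = {("stop",), ("pause",), ("hang", "up")}
MAX_LEN = max(len(e) for e in COMMAND_EXPRESSIONS)

def monitor(word_stream):
    """Yield each command expression detected in the word stream."""
    window = []
    for word in word_stream:
        window.append(word.lower())
        window = window[-MAX_LEN:]         # keep a short sliding window
        for n in range(1, len(window) + 1):
            if tuple(window[-n:]) in COMMAND_EXPRESSIONS:
                yield tuple(window[-n:])   # expression ends at this word

hits = list(monitor(["please", "stop", "the", "music"]))
```

Because the window only needs to be as long as the longest expression, this check adds negligible latency on top of the recognizer.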
- the speech recognition components 122 may be implemented by one or more keyword spotters.
- a keyword spotter is a functional component or algorithm that evaluates an audio signal to detect the presence of one or more predefined words or expressions in the audio signal.
- a keyword spotter uses simplified ASR techniques to detect a specific word or a limited number of words rather than attempting to recognize a large vocabulary.
- a keyword spotter may provide a notification when a specified word is detected in a voice signal, rather than providing a textual or word-based output.
- a keyword spotter using these techniques may compare different words based on hidden Markov models (HMMs), which represent words as series of states.
- an utterance is analyzed by comparing its model to a keyword model and to a background model. Comparing the model of the utterance with the keyword model yields a score that represents the likelihood that the utterance corresponds to the keyword. Comparing the model of the utterance with the background model yields a score that represents the likelihood that the utterance corresponds to a generic word other than the keyword. The two scores can be compared to determine whether the keyword was uttered.
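The two-model decision rule just described amounts to a likelihood-ratio test. In the sketch below, each "model" is reduced to a function returning a log-likelihood for a feature sequence; real keyword spotters would score HMM state sequences instead, so everything here is an illustrative assumption.

```python
# Sketch of the keyword-vs-background decision rule described above.
# The toy "models" are illustrative stand-ins for HMM scoring.

def decide(utterance_features, keyword_loglik, background_loglik,
           threshold=0.0):
    """Return True if the keyword was likely uttered."""
    score_kw = keyword_loglik(utterance_features)
    score_bg = background_loglik(utterance_features)
    # Log-likelihood ratio test: a positive margin favors the keyword.
    return (score_kw - score_bg) > threshold

# Toy models: the "keyword" model prefers feature values near 1.0,
# the background model prefers values near 0.0.
kw = lambda feats: -sum((f - 1.0) ** 2 for f in feats)
bg = lambda feats: -sum(f ** 2 for f in feats)

detected = decide([0.9, 1.1, 1.0], kw, bg)   # keyword-like features
rejected = decide([0.1, 0.0], kw, bg)        # background-like features
```

The threshold trades false accepts against false rejects; raising it makes the spotter more conservative.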
- The audio device 102 may further comprise control functionality 124, referred to herein as a controller or control logic, that is configured to interact with the other components of the audio device 102 in order to implement the logical functionality of the audio device 102.
- The control logic 124 may comprise executable instructions, programs, and/or program modules that are stored in the memory 116 and executed by the processor 114.
- the speech command service 108 may in some instances be part of a network-accessible computing platform that is maintained and accessible via a network 126 such as the Internet.
- Network-accessible computing platforms such as this may be referred to using terms such as “on-demand computing”, “software as a service (SaaS)", “platform computing”, “network-accessible platform”, “cloud services”, “data centers”, and so forth.
- the audio device 102 and/or the speech command service 108 may communicatively couple to the network 126 via wired technologies (e.g., wires, universal serial bus (USB), fiber optic cable, etc.), wireless technologies (e.g., radio frequencies (RF), cellular, mobile telephone networks, satellite, Bluetooth, etc.), or other connection technologies.
- the network 126 is representative of any type of communication network, including data and/or voice network, and may be implemented using wired infrastructure (e.g., coaxial cable, fiber optic cable, etc.), a wireless infrastructure (e.g., RF, cellular, microwave, satellite, Bluetooth®, etc.), and/or other connection technologies.
- Although the audio device 102 is described herein as a voice-controlled or speech-based interface device, the techniques described herein may be implemented in conjunction with various different types of devices, such as telecommunications devices and components, hands-free devices, entertainment devices, media playback devices, and so forth.
- the speech command service 108 generally provides functionality for receiving an audio stream from the audio device 102, recognizing speech in the audio stream, determining user intent from the recognized speech, and performing an action or service in response to the user intent.
- the provided action may in some cases be performed in conjunction with the audio device 102 and in these cases the speech command service 108 may return a response to the audio device 102 indicating a command that is to be executed by the audio device 102.
- The speech command service 108 includes operational logic, which in many cases may comprise one or more servers, computers, and/or processors 128.
- the speech command service 108 may also have memory 130 containing applications and programs in the form of instructions that are executed by the processor 128 to perform acts or actions that implement desired functionality of the speech command service, including the functionality specifically described herein.
- the memory 130 may be a type of computer storage media and may include volatile and nonvolatile memory. Thus, the memory 130 may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or other memory technology.
- the speech command service 108 may comprise speech recognition components 132.
- the speech recognition components 132 may include automatic speech recognition (ASR) functionality that recognizes human speech in an audio signal.
- the speech command service 108 may also comprise a natural language understanding component (NLU) 134 that determines user intent based on recognized speech.
- the speech command service 108 may also comprise a command interpreter and action dispatcher 136 (referred to below simply as a command interpreter 136) that determines functions or commands corresponding to user intents.
- commands may correspond to functions that are to be performed at least in part by the audio device 102, and the command interpreter 136 may in those cases provide responses to the audio device 102 indicating commands for implementing such functions.
- Examples of commands or functions that may be performed by the audio device in response to directives from the command interpreter 136 may include playing music or other media, increasing/decreasing the volume of the speaker 112, generating audible speech through the speaker 112, initiating certain types of communications with users of similar devices, and so forth.
- the speech command service 108 may also perform functions, in response to speech recognized from received audio, that involve entities or devices that are not shown in FIG. 1.
- the speech command service 108 may interact with other network-based services to obtain information or services on behalf of the user 106.
- the speech command service 108 may itself have various elements and functionality that may be responsive to speech uttered by the user 106.
- The microphone 110 of the audio device 102 captures or receives audio containing speech of the user 106.
- the audio is processed by the audio processing components 120 and the processed audio is received by the speech recognition components 122.
- the speech recognition components 122 analyze the audio to detect occurrences of a trigger expression in the speech contained in the audio.
- Upon detection of the trigger expression, the controller 124 begins sending or streaming received audio to the speech command service 108 along with a request for the speech command service 108 to recognize and interpret the user speech, and to initiate a function corresponding to any interpreted intent.
- Concurrently with sending the audio to the speech command service 108, the speech recognition components 122 continue to analyze the received audio to detect an occurrence of a local command expression in the user speech.
- Upon detection of a local command expression, the controller 124 initiates or performs a device function that corresponds to the local command expression. For example, in response to the local command expression "stop", the controller 124 may initiate a function that stops media playback.
- The controller 124 may interact with one or more of the functional components 118 when initiating or performing the function.
- In response to receiving the audio, the speech command service 108 concurrently analyzes the audio to recognize speech, to determine a user intent, and to determine a service-identified function that is to be implemented in response to the user intent.
- the audio device 102 may take actions to cancel, nullify, or invalidate any service-identified functions that may eventually be initiated by the speech command service 108.
- the audio device 102 may cancel its previous request by sending a cancellation message to the speech command service 108 and/or by stopping the streaming of the audio to the speech command service 108.
- the audio device may ignore or discard any responses or service-specified commands that are received from the speech command service 108 in response to the earlier request.
- the audio device may inform the speech command service 108 of actions that have been performed locally in response to the local command expression, and the speech command service 108 may modify its subsequent behavior based on this information. For example, the speech command service 108 may forego actions that it might otherwise have performed in response to recognized speech in the received audio.
- FIG. 2 illustrates an example method 200 that may be performed by the audio device 102 in conjunction with the speech command service 108 in order to recognize and respond to user speech.
- the method 200 will be described in the context of the system 100 of FIG. 1, although the method 200 may also be performed in other environments and may be implemented in different ways.
- Actions on the left side of FIG. 2 are performed at or by the local audio device 102. Actions on the right side of FIG. 2 are performed at or by the remote speech command service 108.
- An action 202 comprises receiving an audio signal that has been captured by or in conjunction with the microphone 110.
- the audio signal contains or represents audio from the environment 104, and may contain user speech.
- the audio signal may be an analog electrical signal or may comprise a digital signal such as a digital audio stream.
- An action 204 comprises detecting an occurrence of a trigger expression in the received audio and/or in the user speech. This may be performed by the speech recognition components 122 as described above, which may in some embodiments comprise keyword spotters. If the trigger expression is not detected, the action 204 is repeated in order to continuously monitor for occurrences of the trigger expression. The remaining actions shown in FIG. 2 are performed in response to detecting the trigger expression.
- If the trigger expression is detected in the action 204, an action 206 is performed, comprising sending subsequently received audio to the speech command service 108 along with a service request 208 for the speech command service 108 to recognize speech in the audio and to implement a function corresponding to the recognized speech. Functions initiated by the speech command service 108 in this manner are referred to herein as service-identified functions, and may in certain cases be performed in conjunction with the audio device 102. For example, a function may be initiated by sending a command to the audio device 102.
- The sending 206 may comprise streaming or otherwise transmitting a digital audio stream 210 to the speech command service 108, representing or containing audio that is received from the microphone 110 subsequent to detection of the trigger expression.
- the action 206 may comprise opening or initiating a communication session between the audio device 102 and the speech command service 108.
- the request 208 may be used to establish a communication session with the speech command service 108 for the purpose of recognizing speech, understanding intent, and determining actions or functions to be performed in response to user speech.
- the request 208 may be followed or accompanied by the streamed audio 210.
- the audio stream 210 provided to the speech command service 108 may include portions of received audio beginning at a time just prior to utterance of the trigger expression.
- the communication session may be associated with a communication or session identifier (ID) that identifies the communication session established between the audio device 102 and the speech command service 108.
- The session ID may be used or included in future communications relating to a particular user utterance or audio stream.
- the session ID may be generated by the audio device 102 and provided in the request 208 to the speech command service 108.
- the session ID may be generated by the speech command service 108 and provided by the speech command service 108 in acknowledgment of the request 208.
- the term "request(ID)" is used herein to indicate a request having a particular session ID.
- a response from the speech command service 108 relating to the same session, request, or audio stream may be indicated by the term "response(ID)".
- each communication session and corresponding session ID may correspond to a single user utterance.
- the audio device 102 may establish a session upon detecting the trigger expression. The audio device 102 may then continue to stream audio to the speech command service 108 as part of the same session until the end of the user utterance.
- the speech command service 108 may provide responses to the audio device 102 through the session, using the same session ID. Responses may in some cases indicate commands to be executed by the audio device 102 in response to speech recognized by the speech command service 108 in the received audio 210.
- the communication session may remain open until the audio device 102 receives a response from the speech command service 108 or until the audio device 102 cancels the request.
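The session lifecycle described above, with one session per utterance, request(ID)/response(ID) messages, and cancellation, can be sketched as simple bookkeeping. The data structures and names below are illustrative assumptions; only the request/response/cancel notation comes from the text.

```python
import uuid

# Hypothetical sketch of per-utterance session bookkeeping.

class SessionTable:
    def __init__(self):
        self.sessions = {}                 # session ID -> state

    def request(self):
        sid = uuid.uuid4().hex             # device-generated session ID
        self.sessions[sid] = "streaming"
        return sid

    def cancel(self, sid):
        # The utterance was handled locally; mark the session cancelled.
        self.sessions[sid] = "cancelled"

    def response(self, sid, command):
        # A response(ID) closes the session; a late response for a
        # cancelled session is discarded to avoid duplicate actions.
        if self.sessions.pop(sid, None) == "streaming":
            return command
        return None

table = SessionTable()
sid = table.request()
table.cancel(sid)                          # local command already acted on
ignored = table.response(sid, "stop")      # late remote command: discarded

sid2 = table.request()
executed = table.response(sid2, "play")    # normal case: command executed
```

Keying every message on the session ID is what lets the device safely discard a remote command that arrives after the same utterance was handled locally.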
- the speech command service 108 receives the request 208 and audio stream 210 in an action 212.
- the speech command service 108 performs an action 214 of recognizing speech in the received audio and determining a user intent as expressed by the recognized speech, using the speech recognition and natural language understanding components 132 and 134 of the speech command service 108.
- An action 216, performed by the command interpreter 136, comprises identifying and initiating a service-identified function in fulfillment of the determined user intent.
- The service-identified function may in some cases be performed by the speech command service 108, independently of the audio device 102. In other cases, the speech command service 108 may identify a function that is to be performed by the audio device 102, and may send a corresponding command to the audio device 102 for execution by the audio device 102.
- Concurrently with the actions being performed by the speech command service 108, the local audio device 102 performs further actions to determine whether the user has uttered a local command expression and to perform a corresponding local function in response to any such uttered local command expression.
- An action 218, performed in response to detecting the trigger expression in the action 204, comprises analyzing audio received in the action 202 to detect an occurrence of a local command expression that follows or immediately follows the trigger expression in the received user speech. This may be performed by the speech recognition components 122 of the audio device 102 as described above, which may in some embodiments comprise keyword spotters.
- If a local command expression is detected in the action 218, an action 220 is performed of immediately initiating a device function that has been associated with the local command expression.
- the local command expression "stop" might be associated with a function that stops media playback.
- In addition, the audio device 102 performs an action 222 of stopping or cancelling the request 208 to the speech command service 108. This may include cancelling or nullifying implementation of the service-identified function that may have otherwise been implemented by the speech command service 108 in response to the received request 208 and accompanying audio 210.
- the action 222 may comprise sending an explicit notification or command to the speech command service 108, requesting that the speech command service 108 cancel any further recognition activities with respect to the service request 208, and/or to cancel implementation of any service-identified functions that may otherwise have been initiated in response to recognized speech.
- the audio device 102 may simply notify the speech command service 108 regarding any functions that have been performed locally in response to local recognition of the local command expression, and the speech command service 108 may respond by cancelling the service request 208 or by performing other actions as may be appropriate.
- the speech command service 108 may implement the service-identified function by identifying a command to be executed by the audio device 102. In response to receiving a notification that the service request 208 is to be cancelled, the speech command service 108 may forego sending the command to the audio device 102. Alternatively, the speech command service may be allowed to complete its processing and to send a command to the audio device 102, whereupon the audio device 102 may ignore the command or forego execution of the command.
- the speech command service may be configured to notify the audio device 102 before initiating a service-identified function, and may delay implementation of the service-identified function until receiving permission from the audio device 102.
- the audio device 102 may be configured to deny such permission when the local command expression has been recognized locally.
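A sketch of that permission handshake, with assumed method names (the patent does not prescribe an API):

```python
# The device tracks whether it has already acted on a local command
# expression, and denies the service permission in that case so the
# service foregoes its service-identified function.
class AudioDevice:
    def __init__(self) -> None:
        self.local_command_handled = False

    def on_local_command(self, expression: str) -> None:
        self.local_command_handled = True   # local function already initiated

    def grant_permission(self, request_id: str) -> bool:
        # Deny when the local command expression was recognized locally.
        return not self.local_command_handled
```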
- the actions of the speech command service 108 shown in FIG. 2 are performed in parallel and asynchronously with the actions 218, 220, and 222 of the audio device 102. It is assumed in some implementations that the audio device 102 is able to detect and act upon the local command expression relatively quickly, so that it may perform the action 222 of cancelling the request 208 and subsequent processing by the speech command service 108 before the service-identified function of the action 216 has been implemented or executed.
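The race between the two paths can be sketched with two concurrent tasks (the timings are assumed; they stand in for network latency and for the relative speed of local keyword spotting):

```python
import asyncio

async def remote_recognition(results: list) -> None:
    # Stands in for the round trip and full speech recognition (action 216).
    await asyncio.sleep(0.2)
    results.append("service-identified function")

async def local_spotting(remote: asyncio.Task, results: list) -> None:
    # Local keyword spotting is assumed to be much faster than the round trip.
    await asyncio.sleep(0.01)
    results.append("local function")   # action 220: initiate immediately
    remote.cancel()                    # action 222: cancel the request

async def main() -> list:
    results: list = []
    remote = asyncio.ensure_future(remote_recognition(results))
    await local_spotting(remote, results)
    await asyncio.gather(remote, return_exceptions=True)
    return results
```

Running `asyncio.run(main())` yields only the local function: the remote task is cancelled before the service-identified function would have executed.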
- FIG. 3 illustrates an example method 300 in which the speech command service 108 returns commands to the audio device 102, and in which the audio device 102 is configured to ignore the commands or forego execution of the commands in situations in which a local command expression has already been detected and acted upon by the audio device 102.
- Initial actions are similar or identical to those described above. Actions performed by the audio device 102 are shown on the left and actions performed by the speech command service 108 are shown on the right.
- An action 302 comprises receiving an audio signal containing user speech.
- An action 304 comprises analyzing the audio signal to detect a trigger expression in the user speech. Subsequent actions shown in FIG. 3 are performed in response to detecting the trigger expression.
- An action 306 comprises sending a request 308 and audio 310 to the speech command service 108.
- An action 312 comprises receiving the request 308 and the audio 310 at the speech command service 108.
- An action 314 comprises recognizing user speech and determining user intent based on the recognized user speech.
- the speech command service 108 performs an action 316 of sending a command 318 to the audio device 102 for execution by the audio device 102 in order to implement a service-identified function corresponding to the recognized user intent.
- the command may comprise a "stop" command, indicating that the audio device 102 is to stop playback of music.
- An action 320, performed by the audio device 102, comprises receiving and executing the command.
- the action 320 is shown in a dashed box to indicate that it is performed conditionally, based on whether a local command expression has been detected and acted upon by the audio device 102. Specifically, the action 320 is not performed if a local command expression has been detected by the audio device 102.
- Concurrently with the actions performed by the speech command service 108, the audio device 102 performs an action 322 of analyzing received audio to detect an occurrence of a local command expression that follows or immediately follows the trigger expression in the received user speech. In response to detecting the local command expression, an action 324 is performed of immediately initiating a local device function that has been associated with the local command expression.
- the audio device 102 performs an action 326 of foregoing execution of the received command 318. More specifically, any commands received from the speech command service 108 in response to the request 308 are discarded or ignored. Responses and commands corresponding to the request 308 may be identified by session IDs associated with the responses.
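The session-ID bookkeeping might look like the following (field names assumed; the patent states only that responses are identified by session IDs):

```python
# Commands arriving from the service carry the session ID of the request
# that produced them; commands for a cancelled session are discarded
# (action 326) rather than executed.
class CommandFilter:
    def __init__(self) -> None:
        self.cancelled_sessions: set = set()

    def cancel_session(self, session_id: str) -> None:
        self.cancelled_sessions.add(session_id)

    def should_execute(self, command: dict) -> bool:
        return command.get("session_id") not in self.cancelled_sessions
```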
- If the local command expression is not detected, the audio device performs the action 320 of executing the command 318 received from the speech command service 108.
- FIG. 4 shows an example method 400 in which the audio device 102 is configured to actively cancel requests to the speech command service 108 after locally detecting a local command expression.
- Initial actions are similar or identical to those described above. Actions performed by the audio device 102 are shown on the left and actions performed by the speech command service 108 are shown on the right.
- An action 402 comprises receiving an audio signal containing user speech.
- An action 404 comprises analyzing the audio signal to detect a trigger expression in the user speech. Subsequent actions shown in FIG. 4 are performed in response to detecting the trigger expression.
- An action 406 comprises sending a request 408 and audio 410 to the speech command service 108.
- An action 412 comprises receiving the request 408 and the audio 410 at the speech command service 108.
- An action 414 comprises recognizing user speech and determining user intent based on the recognized user speech.
- An action 416 comprises determining whether the request 408 has been cancelled by the audio device 102.
- the audio device 102 may send a cancellation message or may terminate the current communication session in order to cancel the request. If the request has been cancelled by the audio device 102, no further action is taken by the speech command service. If the request has not been cancelled, an action 418 is performed, which comprises sending a command 420 to the audio device 102 for execution by the audio device 102 in order to implement a service-identified function corresponding to the recognized user intent.
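On the service side, the check of action 416 reduces to consulting the set of cancelled requests before dispatching the command; a sketch with an assumed data model:

```python
from typing import Optional

def handle_recognized_intent(request_id: str, intent: str,
                             cancelled_requests: set) -> Optional[dict]:
    """Actions 416/418: send a command only if the request still stands."""
    if request_id in cancelled_requests:
        return None                      # cancelled: take no further action
    # Not cancelled: send a command implementing the service-identified function.
    return {"request_id": request_id, "command": intent}
```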
- An action 422, performed by the audio device 102, comprises receiving and executing the command.
- the action 422 is shown in a dashed box to indicate that it is performed conditionally, depending on whether a command has been sent and received from the speech command service 108, which in turn depends on whether the audio device 102 has cancelled the request 408.
- Concurrently with the actions performed by the speech command service 108, the audio device 102 performs an action 424 of analyzing received audio to detect an occurrence of a local command expression that follows or immediately follows the trigger expression in the received user speech. In response to detecting the local command expression, an action 426 is performed of immediately initiating a local device function that has been associated with the local command expression.
- the audio device 102 performs an action 428 of requesting the speech command service 108 to cancel the request 408 and/or to cancel implementation of any service-identified functions that may have otherwise been performed in response to recognized speech in the audio received by the speech command service 108 from the audio device 102.
- This may comprise communicating with the speech command service 108, such as by sending a cancellation notification or request.
- the cancellation may comprise replying to a communication or notification from the speech command service 108 of a pending implementation of a service-identified function by the speech command service.
- the audio device 102 may reply and may request cancellation of the pending implementation.
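One possible wire format for such a cancellation notification (purely illustrative; the patent does not define a message schema):

```python
import json

def make_cancellation(request_id: str, reason: str) -> str:
    # Sent from the device to the service over the existing communication
    # session, expressing the cancellation of actions 222 and 428.
    return json.dumps({
        "type": "cancel_request",
        "request_id": request_id,
        "reason": reason,               # e.g. "local_command_handled"
    })
```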
- the audio device 102 may cancel the implementation of any function that might have otherwise been performed in response to detecting the local command expression, and may instruct the speech command service 108 to proceed with implementation of the pending function.
- If the local command expression is not detected in the action 424, the audio device 102 performs the action 422 of executing the command 420 received from the speech command service 108.
- the action 422 may occur asynchronously, upon receiving the command 420 from the speech command service.
- The embodiments described above may be implemented programmatically, such as with computers, processors, digital signal processors, analog processors, and so forth. In other embodiments, however, one or more of the components, functions, or elements may be implemented using specialized or dedicated circuits, including analog circuits and/or digital logic circuits.
- The term "component," as used herein, is intended to include any hardware, software, logic, or combinations of the foregoing that are used to implement the functionality attributed to the component.
- One or more non-transitory computer-readable media storing computer-executable instructions that, when executed, cause one or more processors to perform acts comprising:
- a method comprising:
- cancelling implementation of the first function comprises requesting the speech command service to cancel implementation of the first function.
- cancelling implementation of the first function comprises requesting the speech command service to cancel the pending implementation of the first function.
- cancelling implementation of the first function comprises terminating the communication session.
- cancelling implementation of the first function comprises forgoing execution of the command.
- a system comprising:
- one or more speech recognition components configured to recognize user speech in received audio, to detect a trigger expression in the user speech, and to detect a local command expression in the user speech;
- control logic configured to perform acts in response to detection by the one or more speech recognition components of the trigger expression in the user speech, the acts comprising: sending the audio to a speech command service to recognize speech in the audio and to implement a first function corresponding to the recognized speech;
- cancelling implementation of the at least one of the first and second functions comprises requesting the speech command service to cancel implementation of the first function.
- cancelling implementation of the at least one of the first and second functions comprises ignoring a command received from the speech command service.
- cancelling implementation of the at least one of the first and second functions comprises informing the speech command service that the second function has been initiated.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201314033302A | 2013-09-20 | 2013-09-20 | |
PCT/US2014/054700 WO2015041892A1 (en) | 2013-09-20 | 2014-09-09 | Local and remote speech processing |
Publications (2)
Publication Number | Publication Date |
---|---|
EP3047481A1 true EP3047481A1 (en) | 2016-07-27 |
EP3047481A4 EP3047481A4 (en) | 2017-03-01 |
Family
ID=52689281
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP14846698.0A Withdrawn EP3047481A4 (en) | 2013-09-20 | 2014-09-09 | Local and remote speech processing |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP3047481A4 (en) |
JP (1) | JP2016531375A (en) |
CN (1) | CN105793923A (en) |
WO (1) | WO2015041892A1 (en) |
- 2014-09-09 JP JP2016543926A patent/JP2016531375A/en active Pending
- 2014-09-09 CN CN201480050711.8A patent/CN105793923A/en active Pending
- 2014-09-09 EP EP14846698.0A patent/EP3047481A4/en not_active Withdrawn
- 2014-09-09 WO PCT/US2014/054700 patent/WO2015041892A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
See references of WO2015041892A1 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190179610A1 (en) * | 2017-12-12 | 2019-06-13 | Amazon Technologies, Inc. | Architecture for a hub configured to control a second device while a connection to a remote system is unavailable |
US10713007B2 (en) * | 2017-12-12 | 2020-07-14 | Amazon Technologies, Inc. | Architecture for a hub configured to control a second device while a connection to a remote system is unavailable |
Also Published As
Publication number | Publication date |
---|---|
JP2016531375A (en) | 2016-10-06 |
WO2015041892A1 (en) | 2015-03-26 |
EP3047481A4 (en) | 2017-03-01 |
CN105793923A (en) | 2016-07-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2015041892A1 (en) | Local and remote speech processing | |
US11600271B2 (en) | Detecting self-generated wake expressions | |
US9672812B1 (en) | Qualifying trigger expressions in speech-based systems | |
US10354649B2 (en) | Altering audio to improve automatic speech recognition | |
CN108351872B (en) | Method and system for responding to user speech | |
CN107004411B (en) | Voice application architecture | |
EP3084633B1 (en) | Attribute-based audio channel arbitration | |
US10079017B1 (en) | Speech-responsive portable speaker | |
US9734845B1 (en) | Mitigating effects of electronic audio sources in expression detection | |
US9098467B1 (en) | Accepting voice commands based on user identity | |
US9324322B1 (en) | Automatic volume attenuation for speech enabled devices | |
US9293134B1 (en) | Source-specific speech interactions | |
US10297250B1 (en) | Asynchronous transfer of audio data | |
US11004453B2 (en) | Avoiding wake word self-triggering | |
KR20190075800A (en) | Intelligent personal assistant interface system | |
US9224404B2 (en) | Dynamic audio processing parameters with automatic speech recognition | |
CN102591455A (en) | Selective Transmission of Voice Data | |
US20240005918A1 (en) | System For Recognizing and Responding to Environmental Noises | |
US10923122B1 (en) | Pausing automatic speech recognition | |
EP2760019B1 (en) | Dynamic audio processing parameters with automatic speech recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
20160302 | 17P | Request for examination filed | |
| AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| AX | Request for extension of the european patent | Extension state: BA ME |
| DAX | Request for extension of the european patent (deleted) | |
20170127 | A4 | Supplementary search report drawn up and despatched | |
| RIC1 | Information provided on ipc code assigned before grant | Ipc: G10L 15/22 20060101ALI20170123BHEP; Ipc: G10L 15/08 20060101ALI20170123BHEP; Ipc: G10L 15/00 20130101ALI20170123BHEP; Ipc: G10L 15/32 20130101ALI20170123BHEP; Ipc: G10L 15/30 20130101AFI20170123BHEP |
20180719 | 17Q | First examination report despatched | |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
20181130 | 18D | Application deemed to be withdrawn | |