CN105793923A - Local and remote speech processing - Google Patents

Local and remote speech processing

Info

Publication number
CN105793923A
CN105793923A CN201480050711.8A
Authority
CN
China
Prior art keywords
function
audio
service
expression
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201480050711.8A
Other languages
Chinese (zh)
Inventor
Nikko Strom
Peter Spalding Vanlund
Bjorn Hoffmeister
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Amazon Technologies Inc
Original Assignee
Amazon Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Amazon Technologies Inc filed Critical Amazon Technologies Inc
Publication of CN105793923A publication Critical patent/CN105793923A/en

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/28 — Constructional details of speech recognition systems
    • G10L 15/32 — Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/28 — Constructional details of speech recognition systems
    • G10L 15/30 — Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/08 — Speech classification or search
    • G10L 2015/088 — Word spotting
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 — Execution procedure of a spoken command

Abstract

A user device may be configured to detect a user-uttered trigger expression and to respond by interpreting subsequent words or phrases as commands. The commands may be recognized by sending audio containing the words or phrases to a remote service that is configured to perform speech recognition. Certain commands may be designated as local commands and may be detected locally rather than relying on the remote service. Upon detection of the trigger expression, audio is streamed to the remote service and also analyzed locally to detect utterances of local commands. Upon detecting a local command, a corresponding function is immediately initiated, and subsequent activities or responses by the remote service are canceled or ignored.

Description

Local and remote speech processing
Related application
This application claims priority to U.S. Patent Application No. 14/033,302, titled "Local and Remote Speech Processing," filed September 20, 2013, which is incorporated herein by reference in its entirety.
Background
Homes, offices, automobiles, and public spaces are becoming more and more closely connected with computing devices such as notebook computers, tablets, entertainment systems, and portable communication devices. As computing devices evolve, the ways in which users interact with these devices continue to evolve as well. For example, people can interact with computing devices through mechanical devices (e.g., keyboards, mice), electrical devices (e.g., touch screens, trackpads), and optical devices (e.g., motion detectors, cameras). Another way of interacting with a computing device is through an audio device that captures human speech and responds to it.
Brief description of the drawings
The detailed description is given with reference to the accompanying figures. In the figures, the leftmost digit of a reference number identifies the figure in which that reference number first appears. The same reference numbers used in different figures indicate similar or identical components or features.
Fig. 1 is a block diagram of an illustrative speech (voice) interactive computing architecture that includes a local audio device and a remote speech processing service.
Figs. 2-4 are flowcharts illustrating example processes, performed jointly by the local audio device and the remote speech processing service, for detecting and executing command expressions.
Detailed description
The present disclosure relates generally to a speech interface system that provides or facilitates voice-based interactions with a user. The system includes a local device having a microphone that captures audio containing user speech. A spoken user command may be preceded by a keyword, referred to as a trigger expression or wake expression. Audio following the trigger expression may be streamed to a remote service for speech recognition, and the service may respond to a command by performing a function or by providing a command to be executed by the audio device.
Communication with the remote service can introduce a response latency, which in most cases can be kept within acceptable limits. However, some spoken commands may call for lower latency. For example, spoken commands relating to certain types of media presentation, such as "stop," "pause," "hang up," and so forth, may need to be performed with little perceptible latency.
According to various embodiments, certain command expressions, referred to herein as local commands or local command expressions, are detected at the local device rather than by the remote service. More specifically, the local device is configured to detect a trigger or alert expression, which indicates that subsequent speech is intended by the user to form a command. Upon detecting the trigger expression, the local device initiates a communication session with the remote service and begins streaming the received audio to the service. In response, the remote service performs speech recognition on the received audio and attempts to identify a user intent based on the recognized speech. In response to the identified user intent, the remote service may perform a corresponding function. In some cases, the function may be performed in conjunction with the local device. For example, the remote service may send a command to the local device, indicating that the local device should execute the command in order to perform the corresponding function.
In parallel with the activities of the remote service, the local device monitors or analyzes the audio following the trigger expression to detect occurrences of local command expressions. When a local command expression is detected in the audio, the local device immediately implements the corresponding function. In addition, other actions performed by the remote service are halted or canceled to avoid duplicate activity with respect to the single user utterance. Actions by the remote service may be halted by explicitly notifying the remote service that the utterance has been implemented locally, by terminating or canceling the communication session, and/or by discarding any commands specified by the remote service in response to its recognition of the user speech.
Fig. 1 illustrates an example voice interaction system 100. The system 100 may include or make use of a local, speech-based audio device 102, which may be located in an environment 104 such as a home and may be used to interact with a user 106. The voice interaction system 100 may also include or utilize a remote, network-based voice command service 108, which is configured to receive audio, to recognize speech in the audio, and to perform functions in response to the recognized speech, referred to herein as service-identified functions. Service-identified functions may be implemented by the voice command service 108 independently of the audio device, and/or may be implemented by providing commands to the audio device 102 for local execution.
In some embodiments, speech may be the primary mode of user interaction with the audio device 102. For example, the audio device 102 may receive spoken command expressions from the user 106 and may respond to those commands by providing services. The user may speak a predefined wake or trigger expression (e.g., "Awake"), which may be followed by a command or instruction (e.g., "I'd like to go to a movie. Please tell me what's playing at the local cinema."). Provided services may include performing actions or activities, presenting media, obtaining and/or providing information, providing information via speech generated or synthesized by the audio device 102, initiating Internet-based services on behalf of the user 106, and so forth.
The local audio device 102 and the voice command service 108 are configured to work together to receive command expressions from the user 106 and to respond to them. The command expressions may include local command expressions that are detected and implemented by the local device 102 independently of the voice command service 108. The command expressions may also include commands that are interpreted and implemented by, or in conjunction with, the remote voice command service 108.
The audio device 102 may have one or more microphones 110 and one or more audio speakers or transducers 112 to facilitate audio interactions with the user 106. The microphone 110 produces a microphone signal, also referred to as an input audio signal, that represents audio from the environment 104, including sounds or expressions uttered by the user 106.
In some cases, the microphone 110 may comprise a microphone array, used in conjunction with audio beamforming techniques to produce an input audio signal that is focused in a selectable direction. Similarly, multiple directional microphones 110 may be used to produce audio signals corresponding to multiple available directions.
The audio device 102 includes operational logic, which in many cases may comprise a processor 114 and memory 116. The processor 114 may include multiple processors and/or a processor having multiple cores. The processor 114 may also contain or comprise a digital signal processor for processing audio signals.
The memory 116 may contain applications and programs in the form of computer-executable instructions that are executed by the processor 114 to perform acts or actions that implement the desired functionality of the audio device 102, including the functionality specifically described below. The memory 116 may be a type of computer-readable storage medium and may include volatile and nonvolatile memory. Thus, the memory 116 may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or other memory technology.
The audio device 102 may include a plurality of applications, services, and/or functions 118, referred to collectively below as functional components 118, which are executable by the processor 114 to provide services and functionality. The applications and other functional components 118 may include media playback services such as a music player. Other services or operations performed or provided by the applications and other functional components 118 may include, as examples, requesting and consuming entertainment (e.g., games, finding and playing music, movies, or other content), personal management (e.g., calendaring, note taking), online shopping, financial transactions, database inquiries, person-to-person voice communications, and so forth.
In some embodiments, the functional components 118 may be pre-installed on the audio device 102 and may implement core functionality of the audio device 102. In other embodiments, one or more of the applications or other functional components 118 may be installed by the user 106, or otherwise installed after the user 106 has initialized the audio device 102, and may implement additional or customized functionality as desired by the user 106.
The processor 114 may be configured with audio processing functions or components 120 to process input audio signals generated by the microphone 110 and/or output audio signals provided to the speaker 112. As an example, the audio processing components 120 may implement acoustic echo cancellation to reduce audio echo generated by acoustic coupling between the microphone 110 and the speaker 112. The audio processing components 120 may also implement noise reduction to reduce noise in received audio signals, such as elements of the input audio signal other than user speech. In certain embodiments, the audio processing components 120 may include one or more audio beamformers that are responsive to the multiple microphones 110, generating an audio signal focused in a direction from which user speech has been detected.
The audio device 102 may also be configured to implement one or more expression detectors or speech recognition components 122, which may be used to detect the trigger expression in speech captured by the microphone 110. The term "trigger expression" is used herein to indicate a word, phrase, or other utterance used to signal to the audio device 102 that subsequent user speech is intended by the user to be interpreted as a command.
The one or more speech recognition components 122 may also be used to detect commands or command expressions in the speech captured by the microphone 110. The term "command expression" is used herein to indicate a word, phrase, or other utterance corresponding to, or associated with, a function to be performed by the audio device 102 or by a service or other device accessible to the audio device 102 (such as the voice command service 108). For example, the words "stop," "pause," and "hang up" may serve as command expressions. The "stop" and "pause" command expressions may indicate that a media playback activity should be interrupted. The "hang up" command expression may indicate that a current person-to-person communication should be terminated. Other command expressions corresponding to different functions may also be used. Command expressions may include conversational instructions, such as "find a nearby Italian restaurant."
Command expressions may include local command expressions that are to be interpreted by the audio device 102 independently of the voice command service 108. In general, local command expressions are relatively short expressions, such as single words or short phrases, that can be easily detected by the audio device 102. Local command expressions may correspond to device functions for which a relatively low response latency is desired, such as media control or media playback control functions. The services of the voice command service 108 may be used for other command expressions for which larger response latencies are acceptable. Command expressions implemented by the voice command service will be referred to herein as remote command expressions.
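The local/remote split described above can be illustrated with a short sketch. This is a hypothetical illustration, not the patent's implementation; the expression-to-function mapping and all names are invented:

```python
# Hypothetical table mapping local command expressions to low-latency device
# functions; anything not in the table is deferred to the remote service.
LOCAL_COMMANDS = {
    "stop": "interrupt_media_playback",
    "pause": "interrupt_media_playback",
    "hang up": "end_current_call",
}

def route(expression):
    """Return ("local", function_name) for a local command expression,
    or ("remote", expression) for speech the remote service should interpret."""
    function = LOCAL_COMMANDS.get(expression.lower().strip())
    if function is not None:
        return ("local", function)
    return ("remote", expression)
```

Short, fixed expressions such as "stop" resolve locally, while a conversational instruction such as "find a nearby Italian restaurant" falls through to the remote service.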
In some cases, the speech recognition components 122 may be implemented using automatic speech recognition (ASR) techniques. For example, large-vocabulary speech recognition techniques may be used for keyword detection, and the output of the speech recognition may be monitored for occurrences of the keywords. As an example, speech recognition may use hidden Markov models and Gaussian mixture models to recognize speech input and to provide a continuous word stream corresponding to the speech input. The word stream may then be monitored to detect one or more specified words or expressions.
Alternatively, the speech recognition components 122 may be implemented by one or more keyword spotters. A keyword spotter is a functional component or algorithm that evaluates an audio signal to detect the presence of one or more predefined words or expressions in the audio signal. In general, rather than attempting to recognize a large vocabulary, a keyword spotter uses simplified ASR techniques to detect a specific word or a limited number of words. For example, a keyword spotter may provide a notification when a specified word is detected in a speech signal, rather than providing a textual or word-based output. A keyword spotter using these techniques may compare utterances against words based on hidden Markov models (HMMs), which represent words as sequences of states. In general, an utterance is analyzed by comparing a model of the utterance with a keyword model and with a background model. Comparing the model of the utterance with the keyword model yields a score representing the likelihood that the utterance corresponds to the keyword. Comparing the model of the utterance with the background model yields a score representing the likelihood that the utterance corresponds to a generic word other than the keyword. The two scores can be compared to determine whether the keyword was spoken.
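The keyword-versus-background comparison just described amounts to a log-likelihood ratio test. The sketch below uses illustrative per-frame scores as stand-ins for what Viterbi decoding of the HMMs would actually produce; the threshold value and function names are assumptions:

```python
def log_likelihood_ratio(keyword_frame_logliks, background_frame_logliks):
    """Total log-likelihood of the utterance under the keyword HMM minus its
    log-likelihood under the background (generic-speech) model."""
    return sum(keyword_frame_logliks) - sum(background_frame_logliks)

def keyword_detected(keyword_frame_logliks, background_frame_logliks, threshold=5.0):
    """Declare the keyword spoken when the keyword model explains the audio
    better than the background model by at least `threshold`."""
    ratio = log_likelihood_ratio(keyword_frame_logliks, background_frame_logliks)
    return ratio >= threshold
```

For example, frame scores of [-10, -12, -9] under the keyword model against [-15, -16, -14] under the background model give a ratio of 14, exceeding the assumed threshold, so the keyword is declared spoken.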
The audio device 102 may also include a control function 124, referred to herein as a controller or control logic, configured to interact with the other components of the audio device 102 to implement the logical functionality of the audio device 102.
The control logic 124, the audio processing components 120, the speech recognition components 122, and the functional components 118 may comprise executable instructions, programs, and/or program modules stored in the memory 116 and executed by the processor 114.
The voice command service 108 may in some cases be part of a network-accessible computing platform that is maintained and accessible via a network 126 such as the Internet. Such network-accessible computing platforms may be referred to using terms such as "on-demand computing," "software as a service (SaaS)," "platform computing," "network-accessible platform," "cloud services," "data centers," and so forth.
The audio device 102 and/or the voice command service 108 may be communicatively coupled to the network 126 via wired technologies (e.g., wires, universal serial bus (USB), fiber optic cable), wireless technologies (e.g., radio frequency (RF), cellular, mobile telephone networks, satellite, Bluetooth), or other connection technologies. The network 126 is representative of any type of communication network, including data and/or voice networks, and may be implemented using wired infrastructure (e.g., coaxial cable, fiber optic cable), wireless infrastructure (e.g., RF, cellular, microwave, satellite, Bluetooth, etc.), and/or other connection technologies.
Although the audio device 102 is described herein as a speech-controlled or voice-based interface device, the techniques described herein may be implemented in conjunction with various different types of devices, such as telecommunications devices and components, hands-free devices, entertainment devices, media playback devices, and so forth.
The voice command service 108 generally provides functionality for receiving an audio stream from the audio device 102, recognizing speech in the audio stream, determining user intent from the recognized speech, and performing an action or service in response to the user intent. The provided action may in some cases be performed in conjunction with the audio device 102, in which case the voice command service 108 may return a response to the audio device 102 indicating a command to be executed by the audio device 102.
The voice command service 108 includes operational logic, which in many cases may comprise one or more servers, computers, and/or processors 128. The voice command service 108 may also have memory 130 containing applications and programs in the form of instructions that are executed by the processor 128 to perform acts or actions that implement the desired functionality of the voice command service, including the functionality specifically described herein. The memory 130 may be a type of computer-readable storage medium and may include volatile and nonvolatile memory. Thus, the memory 130 may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or other memory technology.
Among other logical and physical components not specifically shown, the voice command service 108 may include speech recognition components 132. The speech recognition components 132 may include automatic speech recognition (ASR) functionality that recognizes human speech in an audio signal.
The voice command service 108 may also include a natural language understanding (NLU) component 134 that determines user intent based on the recognized speech.
The voice command service 108 may also include a command interpreter and action dispatcher 136 (referred to below as a command interpreter 136) that determines functions or commands corresponding to user intents. In some cases, a command may correspond to a function that is to be performed at least in part by the audio device 102, in which case the command interpreter 136 may provide a response to the audio device 102 indicating a command for implementing such a function. Examples of commands or functions that may be performed by the audio device in response to instructions from the command interpreter 136 include playing music or other media, increasing or decreasing the volume of the speaker 112, generating audible speech through the speaker 112, initiating certain types of communications with users of similar devices, and so forth.
Note that the voice command service 108 may also be responsive to speech recognized from received audio by performing functions that involve entities or devices not shown in Fig. 1. For example, the voice command service 108 may interact with other network services to obtain information or services on behalf of the user 106. Furthermore, the voice command service 108 may itself have various elements and functionality that respond to speech uttered by the user 106.
In operation, the microphone 110 of the audio device 102 captures or receives audio containing speech of the user 106. The audio is processed by the audio processing components 120, and the processed audio is received by the speech recognition components 122. The speech recognition components 122 analyze the audio to detect an occurrence of the trigger expression in speech contained in the audio. Upon detecting the trigger expression, the controller 124 begins sending or streaming the received audio to the voice command service 108 along with a request that the voice command service 108 recognize and interpret the user speech and initiate functions corresponding to any interpreted intents.
In parallel with sending the audio to the voice command service 108, the speech recognition components 122 continue to analyze the received audio to detect occurrences of local command expressions in the user speech. Upon detecting a local command expression, the controller 124 initiates or performs the device function corresponding to the local command expression. For example, in response to the local command expression "stop," the controller 124 may initiate a function that stops media playback. In initiating or performing a function, the controller 124 may interact with one or more of the functional components 118.
Meanwhile, in response to receiving the audio, the voice command service 108 concurrently analyzes the audio to recognize speech, determines user intent, and determines a service-identified function to be implemented in response to the user intent. However, after locally detecting and implementing a local command expression, the audio device 102 may take actions to cancel, rescind, or invalidate any service-identified functions that might eventually be initiated by the voice command service 108. For example, the audio device 102 may cancel its previous request by sending a cancellation message to the voice command service 108 and/or by ceasing to stream audio to the voice command service 108. As another example, the audio device may ignore or discard any responses or service-identified commands received from the voice command service 108 in response to the earlier request. In some cases, the audio device may notify the voice command service 108 of the action performed locally in response to the local command expression, and the voice command service 108 may then modify its behavior based on this information. For example, the voice command service 108 may forgo actions it might otherwise have performed in response to speech recognized in the received audio.
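The parallel local/remote flow and its cancellation behavior can be sketched as follows. This is a minimal, hypothetical model with invented class and message names; real audio streaming and recognition are reduced to word-level stubs:

```python
class RemoteServiceStub:
    """Stand-in for the voice command service 108: records the streamed audio
    and whether the session has been canceled."""
    def __init__(self):
        self.streamed = []
        self.canceled = False

    def stream(self, word):
        self.streamed.append(word)

    def cancel(self):
        self.canceled = True


class LocalController:
    """Models the controller 124 after the trigger expression is detected:
    audio is streamed remotely while also being scanned for local commands."""
    LOCAL_COMMANDS = {"stop": "stop_media_playback", "pause": "pause_media_playback"}

    def __init__(self, remote):
        self.remote = remote
        self.session_canceled = False
        self.executed = []

    def on_word(self, word):
        self.remote.stream(word)                  # keep streaming to the service
        function = self.LOCAL_COMMANDS.get(word)
        if function and not self.session_canceled:
            self.executed.append(function)        # execute the local function now
            self.session_canceled = True
            self.remote.cancel()                  # cancel the earlier request

    def on_remote_response(self, command):
        # Late service-identified commands for a canceled session are discarded.
        if not self.session_canceled:
            self.executed.append(command)
```

Feeding the words "please stop" streams both words, executes the stop function locally, and cancels the remote session, so a late service-identified response for the same utterance is ignored rather than executed twice.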
Fig. 2 illustrates an example method 200 that may be performed by the audio device 102 in conjunction with the voice command service 108 to recognize and respond to user speech. The method 200 is described in the context of the system 100 of Fig. 1, although the method 200 may also be performed in other environments and may be implemented in different ways.
The actions on the left side of Fig. 2 are performed at or by the local audio device 102. The actions on the right side of Fig. 2 are performed at or by the remote voice command service 108.
An action 202 comprises receiving an audio signal captured by or in conjunction with the microphone 110. The audio signal contains or represents audio from the environment 104 and may contain user speech. The audio signal may be an analog electrical signal or may comprise a digital signal, such as a digital audio stream.
An action 204 comprises detecting an occurrence of the trigger expression in the received audio and/or user speech. This action may be performed by the speech recognition components 122 described above, which in some embodiments may comprise a keyword spotter. If the trigger expression is not detected, the action 204 is repeated so as to continuously monitor for occurrences of the trigger expression. The remaining actions shown in Fig. 2 are performed in response to detecting the trigger expression.
If the trigger expression is detected in the action 204, an action 206 is performed, comprising subsequently sending the received audio to the voice command service 108 along with a service request 208 asking the voice command service 108 to recognize speech in the audio and to implement functions corresponding to the recognized speech. Functions initiated in this manner by the voice command service 108 are referred to herein as service-identified functions, and in some cases may be performed in conjunction with the audio device 102. For example, a function may be initiated by sending a command to the audio device 102.
The sending 206 may comprise streaming or otherwise transmitting to the voice command service 108, after detection of the trigger expression, a digital audio stream 210 that represents or contains the audio received from the microphone 110. In certain embodiments, the action 206 may comprise opening or initiating a communication session between the audio device 102 and the voice command service 108. Specifically, the request 208 may be used to establish a communication session with the voice command service 108 for recognizing speech, understanding intent, and determining an action or function to be performed in response to the user speech. The request 208 may be followed or accompanied by the streamed audio 210. In some cases, the audio stream 210 provided to the voice command service 108 may include portions of the received audio beginning at a point in time just before the trigger expression was spoken.
The communication session may be associated with a communication or session identifier (ID) that identifies the communication session established between the audio device 102 and the voice command service 108. The session ID may be used in, or included with, future communications relating to the particular user utterance or audio stream. In some cases, the session ID may be generated by the audio device 102 and provided to the voice command service 108 in the request 208. Alternatively, the session ID may be generated by the voice command service 108 and provided by the voice command service 108 in an acknowledgment of the request 208. The term "request(ID)" is used herein to indicate a request having a particular session ID. Responses from the voice command service 108 relating to the same session, request, or audio stream may be indicated by the term "response(ID)."
In certain embodiments, each communication session and corresponding session ID may correspond to a single user utterance. For example, the audio device 102 may establish a session upon detecting the trigger expression. The audio device 102 may thereafter continue streaming portions of the audio to the voice command service 108 as part of the same session until the user utterance ends. The voice command service 108 may provide responses to the audio device 102 through the session, using the same session ID. A response may in some cases indicate a command to be performed in response to speech recognized by the voice command service 108 in the received audio 210. The communication session may remain open until the audio device 102 receives a response from the voice command service 108 or until the audio device 102 cancels the request.
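The request(ID)/response(ID) pairing described above can be sketched as below. The message fields and function names are invented for illustration; in this sketch the device generates the ID, though as noted the service may do so instead:

```python
import uuid

def new_request():
    """Open a per-utterance session with a fresh session ID (the request 208)."""
    return {"session_id": str(uuid.uuid4()), "type": "recognize"}

def accept_response(response, open_session_ids):
    """A response(ID) is acted on only while its session is still open;
    responses for canceled or unknown sessions are ignored."""
    return response.get("session_id") in open_session_ids
```

Because the session is scoped to one utterance, closing or canceling it gives the device a single place to discard any late service-identified commands for that utterance.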
In action 212,208 and audio stream 210 are asked in voice command service 108 reception.As response, voice command services 108 execution actions 214: use speech recognition and the natural language understanding assembly 132 and 134 of voice command service 108, voice in the audio frequency that identification receives and determining such as the user view of the phonetic representation by institute's identification.The action 214 performed by command interpreter 136 includes the function identifying and starting service identification to fulfil determined user view.The function that service identifies can be serviced 108 by voice command in some cases and perform independent of audio frequency apparatus 102.In other cases, the recognizable function that will be performed by audio frequency apparatus 102 of voice command service 108, and can send, to audio frequency apparatus 102, the corresponding order performed for audio frequency apparatus 102.
In parallel with the actions performed by the speech command service 108, the local audio device 102 performs additional actions to determine whether the user has uttered a local command expression, and to perform a corresponding local function in response to any such uttered local command expression. Specifically, an action 218, performed in response to detecting the trigger expression in the action 204, comprises analyzing the audio received in the action 202 to detect an occurrence in the received speech of a local command expression following or immediately following the trigger expression. This action may be performed by the speech recognition component 122 of the audio device 102 as described above, which in some embodiments may comprise a keyword spotter.
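As a toy model of the action 218, the sketch below scans a word-level transcript for a local command expression immediately following the trigger expression. A real keyword spotter operates on audio features rather than text, and the trigger word and expression set here are invented for illustration only.

```python
TRIGGER = "computer"          # hypothetical trigger expression
LOCAL_EXPRESSIONS = {"stop", "pause"}  # hypothetical local commands


def spot_local_command(transcript):
    """Return the local command expression found immediately after the
    trigger expression in a transcript, or None if there is none."""
    words = transcript.lower().split()
    for i, word in enumerate(words[:-1]):
        if word == TRIGGER and words[i + 1] in LOCAL_EXPRESSIONS:
            return words[i + 1]
    return None
```

An utterance such as "Computer, stop" would be handled locally, while "Computer, play some jazz" would yield no local match and be left to the remote service.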
In response to detecting the local command expression in the action 218, an action 220 is performed of immediately initiating the device function associated with the local command expression. For example, the local command expression "stop" may be associated with a function of stopping media playback.
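The association between local command expressions and device functions can be modeled as a simple dispatch table. This is an illustrative sketch; only the "stop" mapping comes from the text, and the other entries, the function names, and the `FakeDevice` class are assumptions.

```python
# Hypothetical table binding local command expressions to device functions.
LOCAL_COMMANDS = {
    "stop": "stop_media_playback",   # example given in the text
    "louder": "increase_volume",     # illustrative assumption
    "quieter": "decrease_volume",    # illustrative assumption
}


def handle_local_command(expression, device):
    """Immediately initiate the device function bound to a detected
    local command expression; return True when one was handled."""
    function_name = LOCAL_COMMANDS.get(expression)
    if function_name is None:
        return False
    getattr(device, function_name)()
    return True


class FakeDevice:
    """Stand-in device that records which functions were initiated."""

    def __init__(self):
        self.calls = []

    def stop_media_playback(self):
        self.calls.append("stop_media_playback")

    def increase_volume(self):
        self.calls.append("increase_volume")

    def decrease_volume(self):
        self.calls.append("decrease_volume")
```

Unrecognized expressions fall through unhandled, which leaves them to the remote speech command service.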
In addition, in response to detecting the local command expression in the action 218, the audio device 102 performs an action 222 of halting or canceling the request 208 to the speech command service 108. This action may include canceling or rescinding implementation of the service-identified function that might otherwise be implemented by the speech command service 108 in response to the received request 208 and accompanying audio 210.
In some implementations, the action 222 may comprise sending an explicit notification or command to the speech command service 108, requesting that the speech command service 108 cancel the service request 208 along with any other recognition activities, and/or cancel implementation of any service-identified function that may otherwise have been initiated in response to the recognized speech. Alternatively, the audio device 102 may simply notify the speech command service 108 of the locally recognized local command expression and of any function performed locally in response, and the speech command service 108 may respond by canceling the service request 208 or by performing other actions as appropriate.
In some implementations, the speech command service 108 may implement the service-identified function by identifying a command to be executed by the audio device 102. In response to receiving a notification that the service request 208 is to be canceled, the speech command service 108 may refrain from sending the command to the audio device 102. Alternatively, the speech command service may be allowed to complete its processing and to send the command to the audio device 102, and the audio device 102 may at that time ignore the command or decline to execute it.
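The "let the service finish, then ignore" variant can be sketched as a device-side gate keyed on session IDs. This is an assumption-laden illustration, not the patent's implementation: the device records which requests it has canceled locally and silently drops any command that later arrives for one of them.

```python
class CommandGate:
    """Device-side filter for late-arriving service commands.

    Illustrative sketch: commands tied to a locally canceled request
    are discarded; all other commands are executed.
    """

    def __init__(self):
        self.canceled = set()   # session IDs canceled locally
        self.executed = []      # commands actually carried out

    def cancel_locally(self, session_id):
        # Called when a local command expression was detected and the
        # corresponding request is no longer wanted.
        self.canceled.add(session_id)

    def on_remote_command(self, session_id, command):
        # Drop the command if its request was canceled; execute it
        # otherwise. Returns True only when the command was executed.
        if session_id in self.canceled:
            return False
        self.executed.append(command)
        return True
```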
In some implementations, the speech command service may be configured to notify the audio device 102 before initiating a service-identified function, and may delay implementation of the service-identified function until receiving permission from the audio device 102. In this case, the audio device 102 may be configured to deny such permission when a local command expression has been locally recognized.
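The ask-first variant amounts to a permission check inserted before the service implements its function. A minimal sketch, assuming invented names throughout:

```python
def grant_permission(local_command_detected):
    """Device-side policy: withhold permission once a local command
    expression has been recognized, grant it otherwise."""
    return not local_command_detected


class PermissionGatedService:
    """Service-side sketch: implementation of a service-identified
    function is delayed until the device grants permission."""

    def __init__(self, ask_device):
        self.ask_device = ask_device  # callback into the device
        self.implemented = []

    def try_implement(self, function_name, local_command_detected):
        # Ask the device before acting; skip the function on denial.
        if self.ask_device(local_command_detected):
            self.implemented.append(function_name)
            return True
        return False
```

The cost of this safety is an extra round trip per function, which motivates the latency discussion below.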
The various approaches described above may be used when different amounts of command latency are tolerable. For example, waiting for communications from the speech command service may introduce relatively high latencies, which may be unacceptable in some situations, although such communications prior to implementing a function may prevent duplicate or unintended actions. Immediately implementing a locally recognized command expression and subsequently ignoring commands from the speech command service, or subsequently canceling requests to the speech command service, may be more appropriate for situations in which low latencies are desired.
Note that the actions of the speech command service 108 shown in FIG. 2 are performed in parallel and asynchronously with the actions 218, 220, and 222 of the audio device 102. In some implementations, it is assumed that the audio device 102 can detect and act upon local command expressions relatively quickly, so that it can perform the action 222 of canceling the request 208 and the subsequent processing by the speech command service 108 before the service-identified function of the action 216 has been completed or performed.
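The race between the fast local path and the slower remote path can be modeled with cooperative tasks. This is an illustrative model only: the sleeps stand in for spotting latency and for network plus recognition latency, the local spotter is assumed to always match, and none of the names come from the patent.

```python
import asyncio


async def remote_round_trip(delay, command="stop"):
    # Stand-in for streaming audio to the speech command service and
    # awaiting its response; delay models network + recognition time.
    await asyncio.sleep(delay)
    return command


async def handle_utterance(local_detect_delay, remote_delay):
    """Run the remote request and the local spotter concurrently.
    If the local path finishes first, cancel the pending remote
    request before its service-identified function is performed."""
    remote = asyncio.create_task(remote_round_trip(remote_delay))
    await asyncio.sleep(local_detect_delay)  # local spotting finishes
    local_hit = True  # assume the spotter matched a local command
    if local_hit and not remote.done():
        remote.cancel()  # analogous to action 222
        return "local"
    return await remote  # remote already finished; use its command


# Fast local spotting (10 ms) beats the slow remote path (200 ms).
result = asyncio.run(handle_utterance(0.01, 0.2))
```

Reversing the delays shows the other outcome: when the remote response arrives before local spotting completes, its command is used instead.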
FIG. 3 illustrates an example method 300 in which the speech command service 108 returns a command to the audio device 102, and in which the audio device 102 is configured to ignore or decline to execute the command when a local command expression has been detected and acted upon by the audio device 102. The initial actions are similar or identical to those described above. Actions performed by the audio device 102 are shown on the left, and actions performed by the speech command service 108 are shown on the right.
An action 302 comprises receiving an audio signal containing user speech. An action 304 comprises analyzing the audio signal to detect a trigger expression in the user speech. The subsequent actions shown in FIG. 3 are performed in response to detecting the trigger expression.
An action 306 comprises sending a request 308 and audio 310 to the speech command service 108. An action 312 comprises receiving the request 308 and the audio 310 at the speech command service 108. An action 314 comprises recognizing the user speech and determining a user intent based on the recognized user speech.
In response to the determined user intent, the speech command service 108 performs an action 316 of sending a command 318 to the audio device 102, to be executed by the audio device 102 in order to implement a service-identified function corresponding to the recognized user intent. For example, the command may comprise a "stop" command, indicating that the audio device 102 should stop the playback of music.
An action 320, performed by the audio device 102, comprises receiving and executing the command. The action 320 is shown in a dashed box to indicate that it is performed conditionally, depending on whether the audio device 102 has detected and acted upon a local command expression. Specifically, the action 320 is not performed if the audio device 102 has detected a local command expression.
In parallel with the actions performed by the speech command service 108, the audio device 102 performs an action 322 of analyzing the received audio to detect an occurrence in the received user speech of a local command expression following or immediately following the trigger expression. In response to detecting the local command expression, an action 324 is performed of immediately initiating the local device function associated with the local command expression.
In addition, in response to detecting the local command expression in the action 322, the audio device 102 performs an action 326 of declining to execute the received command 318. More specifically, any command received from the speech command service 108 in response to the request 308 is discarded or ignored. Responses and commands corresponding to the request 308 may be identified by the session ID associated with the responses.
If no local command expression is detected in the action 322, the audio device performs the action 320 of executing the command 318 received from the speech command service 108.
FIG. 4 illustrates an example method 400 in which the audio device 102 is configured to actively cancel a request to the speech command service 108 after locally detecting a local command expression. The initial actions are similar or identical to those described above. Actions performed by the audio device 102 are shown on the left, and actions performed by the speech command service 108 are shown on the right.
An action 402 comprises receiving an audio signal containing user speech. An action 404 comprises analyzing the audio signal to detect a trigger expression in the user speech. The subsequent actions shown in FIG. 4 are performed in response to detecting the trigger expression.
An action 406 comprises sending a request 408 and audio 410 to the speech command service 108. An action 412 comprises receiving the request 408 and the audio 410 at the speech command service 108. An action 414 comprises recognizing the user speech and determining a user intent based on the recognized user speech.
An action 416 comprises determining whether the request 408 has been canceled by the audio device 102. For example, the audio device 102 may send a cancellation message, or may terminate the current communication session, in order to cancel the request. If the request has been canceled by the audio device 102, the speech command service takes no further action. If the request has not been canceled, an action 418 is performed, comprising sending a command 420 to the audio device 102, to be executed by the audio device 102 in order to implement a service-identified function corresponding to the recognized user intent.
An action 422, performed by the audio device 102, comprises receiving and executing the command. The action 422 is shown in a dashed box to indicate that it is performed conditionally, depending on whether the speech command service 108 has sent, and the audio device 102 has received, the command, which in turn depends on whether the audio device 102 has canceled the request 408.
In parallel with the actions performed by the speech command service 108, the audio device 102 performs an action 424 of analyzing the received audio to detect an occurrence in the received user speech of a local command expression following or immediately following the trigger expression. In response to detecting the local command expression, an action 426 is performed of immediately initiating the local device function associated with the local command expression.
In addition, in response to detecting the local command expression in the action 424, the audio device 102 performs an action 428 of requesting the speech command service 108 to cancel the request 408 and/or to cancel implementation of any service-identified function that might otherwise be performed in response to speech recognized by the speech command service 108 in the audio received from the audio device 102. This action may involve communicating with the speech command service 108, such as by sending a cancellation notification or request.
In some cases, communications from the speech command service 108 may include a response or notification indicating a pending implementation, by the speech command service, of a service-identified function. In response to receiving such a notification, the audio device 102 may respond by requesting cancellation of the pending implementation. Alternatively, the audio device 102 may cancel implementation of any function that might otherwise have been performed in response to detecting the local command expression, and may instead indicate to the speech command service 108 that it should proceed with the pending implementation of the function.
If no local command expression is detected in the action 424, the audio device 102 performs the action 422 of executing the command 420 received from the speech command service 108. The action 422 may occur asynchronously, whenever the command 420 is received from the speech command service.
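The service-side half of method 400, the cancellation check of the action 416 gating the command send of the action 418, can be sketched as follows. This is an illustrative model under assumed names; the patent does not specify how cancellations are recorded.

```python
class SpeechCommandServiceSketch:
    """Service-side sketch of actions 416/418: before sending the
    command, check whether the device has already canceled the
    request (action 428 on the device side)."""

    def __init__(self):
        self.canceled_requests = set()
        self.sent = []  # (request_id, command) pairs actually sent

    def cancel_request(self, request_id):
        # Invoked when the device's cancellation notification arrives.
        self.canceled_requests.add(request_id)

    def finish_recognition(self, request_id, command):
        # Action 416: canceled requests get no further action.
        if request_id in self.canceled_requests:
            return None
        # Action 418: otherwise send the command to the device.
        self.sent.append((request_id, command))
        return command
```

If the device's local spotter fires first and `cancel_request` runs before recognition completes, the command is never sent; otherwise it is delivered as usual.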
The embodiments described above may be implemented programmatically, such as with computers, processors, digital signal processors, analog processors, and so forth. In other embodiments, however, one or more of the components, functions, or elements may be implemented using special-purpose or dedicated circuits, including analog circuits and/or digital logic circuits. The term "component," as used herein, is intended to include any hardware, software, logic, or combination of the foregoing that is used to implement the functionality attributed to the component.
Although the subject matter has been described in language specific to structural features, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features described. Rather, the specific features are disclosed as illustrative forms of implementing the claims.
Clauses:
1. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed, cause one or more processors to perform acts comprising:
receiving audio containing user speech;
detecting a trigger expression in the user speech;
in response to detecting the trigger expression in the user speech:
streaming the received audio to a remote speech command service; and
analyzing the received audio to detect a local command expression subsequent to the trigger expression in the user speech, wherein the local command expression is associated with a device function;
initiating the device function in response to detecting the local command expression subsequent to the trigger expression in the user speech;
receiving a response from the remote speech command service, wherein the response indicates a command to be performed in response to speech recognized by the remote speech command service in the streamed audio;
if the local command expression subsequent to the trigger expression is not detected in the user speech, executing the command indicated by the response; and
if the local command expression subsequent to the trigger expression is detected in the user speech, declining to execute the command indicated by the response.
2. The one or more computer-readable media of clause 1, wherein the streaming is associated with a communication identifier, and wherein the response indicates the communication identifier.
3. The one or more computer-readable media of clause 1, wherein the device function comprises a media control function.
4. The one or more computer-readable media of clause 1, the acts further comprising stopping the streaming of the received audio in response to detecting the local command expression.
5. A method, comprising:
receiving audio containing user speech;
detecting a trigger expression in the user speech;
in response to detecting the trigger expression in the user speech:
sending the received audio to a speech command service to recognize speech in the received audio and to implement a first function corresponding to the recognized speech; and
analyzing the received audio to detect a local command expression subsequent to the trigger expression in the received audio, wherein the local command expression is associated with a second function;
in response to detecting the local command expression subsequent to the trigger expression in the received audio:
initiating the second function; and
canceling implementation of the first function.
6. The method of clause 5, wherein canceling implementation of the first function comprises requesting the speech command service to cancel implementation of the first function.
7. The method of clause 5, further comprising receiving a communication from the speech command service indicating a pending implementation of the first function;
wherein canceling implementation of the first function comprises requesting the speech command service to cancel the pending implementation of the first function.
8. The method of clause 5, further comprising receiving a command corresponding to the first function from the speech command service, wherein canceling implementation of the first function comprises declining to execute the command received from the speech command service.
9. The method of clause 5, further comprising notifying the speech command service that the second function has been initiated.
10. The method of clause 5, wherein canceling implementation of the first function comprises notifying the speech command service that the second function has been initiated.
11. The method of clause 5, wherein the second function comprises a media control function.
12. The method of clause 5, further comprising:
establishing a communication session with the speech command service in response to detecting the trigger expression in the audio;
wherein canceling implementation of the first function comprises terminating the communication session.
13. The method of clause 5, further comprising:
associating an identifier with the received audio; and
receiving a response from the speech command service, wherein the response indicates the identifier and a command corresponding to the first function;
wherein canceling implementation of the first function comprises declining to execute the command.
14. A system, comprising:
one or more speech recognition components configured to recognize user speech in received audio, to detect a trigger expression in the user speech, and to detect a local command expression in the user speech;
control logic configured to perform actions in response to the one or more speech recognition components detecting the trigger expression in the user speech, the actions comprising:
sending the audio to a speech command service to recognize speech in the audio and to implement a first function corresponding to the recognized speech; and
in response to the one or more speech recognition components detecting the local command expression in the user speech: (a) identifying a second function corresponding to the local command expression and (b) canceling implementation of at least one of the first function and the second function.
15. The system of clause 14, wherein the one or more speech recognition components comprise one or more keyword spotters.
16. The system of clause 14, wherein canceling implementation of at least one of the first function and the second function comprises requesting the speech command service to cancel implementation of the first function.
17. The system of clause 14, wherein canceling implementation of at least one of the first function and the second function comprises ignoring a command received from the speech command service.
18. The system of clause 14, wherein the second function comprises a media control function.
19. The system of clause 14, the actions further comprising stopping the sending of the audio in response to detecting the local command expression in the user speech.
20. The system of clause 14, wherein canceling implementation of at least one of the first function and the second function comprises notifying the speech command service that the second function has been initiated.

Claims (15)

1. A device storing computer-executable instructions that, when executed, cause one or more processors of the device to perform acts comprising:
receiving audio containing user speech;
detecting a trigger expression in the user speech;
in response to detecting the trigger expression in the user speech:
streaming the received audio to a remote speech command service; and
analyzing the received audio to detect a local command expression subsequent to the trigger expression in the user speech, wherein the local command expression is associated with a device function;
initiating the device function in response to detecting the local command expression subsequent to the trigger expression in the user speech;
receiving a response from the remote speech command service, wherein the response indicates a command to be performed in response to speech recognized by the remote speech command service in the streamed audio;
if the local command expression subsequent to the trigger expression is not detected in the user speech, executing the command indicated by the response; and
if the local command expression subsequent to the trigger expression is detected in the user speech, declining to execute the command indicated by the response.
2. The device of claim 1, wherein the streaming is associated with a communication identifier, and wherein the response indicates the communication identifier.
3. The device of claim 1, wherein the device function comprises a media control function.
4. The device of claim 1, the acts further comprising stopping the streaming of the received audio in response to detecting the local command expression.
5. A method, comprising:
receiving audio containing user speech;
detecting a trigger expression in the user speech;
in response to detecting the trigger expression in the user speech:
sending the received audio to a speech command service to recognize speech in the received audio and to implement a first function corresponding to the recognized speech; and
analyzing the received audio to detect a local command expression subsequent to the trigger expression in the received audio, wherein the local command expression is associated with a second function;
in response to detecting the local command expression subsequent to the trigger expression in the received audio:
initiating the second function; and
canceling implementation of the first function.
6. The method of claim 5, wherein canceling implementation of the first function comprises requesting the speech command service to cancel implementation of the first function.
7. The method of claim 5, further comprising receiving a communication from the speech command service indicating a pending implementation of the first function;
wherein canceling implementation of the first function comprises requesting the speech command service to cancel the pending implementation of the first function.
8. The method of claim 5, further comprising receiving a command corresponding to the first function from the speech command service, wherein canceling implementation of the first function comprises declining to execute the command received from the speech command service.
9. The method of claim 5, further comprising notifying the speech command service that the second function has been initiated.
10. The method of claim 5, further comprising:
associating an identifier with the received audio; and
receiving a response from the speech command service, wherein the response indicates the identifier and a command corresponding to the first function;
wherein canceling implementation of the first function comprises declining to execute the command.
11. A system, comprising:
one or more speech recognition components configured to recognize user speech in received audio, to detect a trigger expression in the user speech, and to detect a local command expression in the user speech;
control logic configured to perform actions in response to the one or more speech recognition components detecting the trigger expression in the user speech, the actions comprising:
sending the audio to a speech command service to recognize speech in the audio and to implement a first function corresponding to the recognized speech; and
in response to the one or more speech recognition components detecting the local command expression in the user speech: (a) identifying a second function corresponding to the local command expression and (b) canceling implementation of at least one of the first function and the second function.
12. The system of claim 11, wherein canceling implementation of the at least one of the first function and the second function comprises requesting the speech command service to cancel implementation of the first function.
13. The system of claim 11, wherein canceling implementation of the at least one of the first function and the second function comprises ignoring a command received from the speech command service.
14. The system of claim 11, the actions further comprising stopping the sending of the audio in response to detecting the local command expression in the user speech.
15. The system of claim 11, wherein canceling implementation of the at least one of the first function and the second function comprises notifying the speech command service that the second function has been initiated.
CN201480050711.8A 2013-09-20 2014-09-09 Local and remote speech processing Pending CN105793923A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201314033302A 2013-09-20 2013-09-20
US14/033,302 2013-09-20
PCT/US2014/054700 WO2015041892A1 (en) 2013-09-20 2014-09-09 Local and remote speech processing

Publications (1)

Publication Number Publication Date
CN105793923A true CN105793923A (en) 2016-07-20

Family

ID=52689281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201480050711.8A Pending CN105793923A (en) 2013-09-20 2014-09-09 Local and remote speech processing

Country Status (4)

Country Link
EP (1) EP3047481A4 (en)
JP (1) JP2016531375A (en)
CN (1) CN105793923A (en)
WO (1) WO2015041892A1 (en)

US10956666B2 (en) 2015-11-09 2021-03-23 Apple Inc. Unconventional virtual assistant interactions
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US11227589B2 (en) 2016-06-06 2022-01-18 Apple Inc. Intelligent list reading
US10049663B2 (en) 2016-06-08 2018-08-14 Apple Inc. Intelligent automated assistant for media exploration
DK179309B1 (en) 2016-06-09 2018-04-23 Apple Inc Intelligent automated assistant in a home environment
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
US10474753B2 (en) 2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US11281993B2 (en) 2016-12-05 2022-03-22 Apple Inc. Model and ensemble compression for metric learning
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US11204787B2 (en) 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
DK201770383A1 (en) 2017-05-09 2018-12-14 Apple Inc. User interface for correcting recognition errors
US10417266B2 (en) 2017-05-09 2019-09-17 Apple Inc. Context-aware ranking of intelligent response suggestions
DK201770439A1 (en) 2017-05-11 2018-12-13 Apple Inc. Offline personal assistant
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
US10395654B2 (en) 2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network
DK180048B1 (en) 2017-05-11 2020-02-04 Apple Inc. MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION
DK201770427A1 (en) 2017-05-12 2018-12-20 Apple Inc. Low-latency intelligent automated assistant
DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. USER-SPECIFIC Acoustic Models
US11301477B2 (en) 2017-05-12 2022-04-12 Apple Inc. Feedback analysis of a digital assistant
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT
DK201770432A1 (en) 2017-05-15 2018-12-21 Apple Inc. Hierarchical belief states for digital assistants
DK201770431A1 (en) 2017-05-15 2018-12-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
US20180336892A1 (en) 2017-05-16 2018-11-22 Apple Inc. Detecting a trigger of a digital assistant
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
US10403278B2 (en) 2017-05-16 2019-09-03 Apple Inc. Methods and systems for phonetic matching in digital assistant services
DK179560B1 (en) 2017-05-16 2019-02-18 Apple Inc. Far-field extension for digital assistant services
US10657328B2 (en) 2017-06-02 2020-05-19 Apple Inc. Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling
US10599377B2 (en) 2017-07-11 2020-03-24 Roku, Inc. Controlling visual indicators in an audio responsive electronic device, and capturing and providing audio using an API, by native and non-native computing devices and services
SG11201901441QA (en) * 2017-08-02 2019-03-28 Panasonic Intellectual Property Management Co Ltd Information processing apparatus, speech recognition system, and information processing method
US10455322B2 (en) 2017-08-18 2019-10-22 Roku, Inc. Remote control with presence sensor
US11062710B2 (en) 2017-08-28 2021-07-13 Roku, Inc. Local and cloud speech recognition
US10777197B2 (en) 2017-08-28 2020-09-15 Roku, Inc. Audio responsive device with play/stop and tell me something buttons
US11062702B2 (en) 2017-08-28 2021-07-13 Roku, Inc. Media system with multiple digital assistants
US10515637B1 (en) 2017-09-19 2019-12-24 Amazon Technologies, Inc. Dynamic speech processing
US10445429B2 (en) 2017-09-21 2019-10-15 Apple Inc. Natural language understanding using vocabularies with compressed serialized tries
US10755051B2 (en) 2017-09-29 2020-08-25 Apple Inc. Rule-based natural language processing
US10636424B2 (en) 2017-11-30 2020-04-28 Apple Inc. Multi-turn canned dialog
US10713007B2 (en) * 2017-12-12 2020-07-14 Amazon Technologies, Inc. Architecture for a hub configured to control a second device while a connection to a remote system is unavailable
CN111629658B (en) * 2017-12-22 2023-09-15 瑞思迈传感器技术有限公司 Apparatus, system, and method for motion sensing
US10733982B2 (en) 2018-01-08 2020-08-04 Apple Inc. Multi-directional dialog
US10733375B2 (en) 2018-01-31 2020-08-04 Apple Inc. Knowledge-based framework for improving natural language understanding
US11145298B2 (en) 2018-02-13 2021-10-12 Roku, Inc. Trigger word detection with multiple digital assistants
US10789959B2 (en) 2018-03-02 2020-09-29 Apple Inc. Training speaker recognition models for digital assistants
US10592604B2 (en) 2018-03-12 2020-03-17 Apple Inc. Inverse text normalization for automatic speech recognition
US10984799B2 (en) * 2018-03-23 2021-04-20 Amazon Technologies, Inc. Hybrid speech interface device
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10909331B2 (en) 2018-03-30 2021-02-02 Apple Inc. Implicit identification of translation payload with neural machine translation
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US10984780B2 (en) 2018-05-21 2021-04-20 Apple Inc. Global semantic word embeddings using bi-directional recurrent neural networks
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
DK179822B1 (en) 2018-06-01 2019-07-12 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US11386266B2 (en) 2018-06-01 2022-07-12 Apple Inc. Text correction
DK180639B1 (en) 2018-06-01 2021-11-04 Apple Inc DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT
DK201870355A1 (en) 2018-06-01 2019-12-16 Apple Inc. Virtual assistant operation in multi-device environments
US10496705B1 (en) 2018-06-03 2019-12-03 Apple Inc. Accelerated task performance
US11373645B1 (en) * 2018-06-18 2022-06-28 Amazon Technologies, Inc. Updating personalized data on a speech interface device
JP7000268B2 (en) 2018-07-18 2022-01-19 Toshiba Corp Information processing apparatus, information processing method, and program
US11010561B2 (en) 2018-09-27 2021-05-18 Apple Inc. Sentiment prediction from textual data
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
US11170166B2 (en) 2018-09-28 2021-11-09 Apple Inc. Neural typographical error modeling via generative adversarial networks
US10839159B2 (en) 2018-09-28 2020-11-17 Apple Inc. Named entity normalization in a spoken dialog system
US11475898B2 (en) 2018-10-26 2022-10-18 Apple Inc. Low-latency multi-speaker speech recognition
WO2020096218A1 (en) * 2018-11-05 2020-05-14 Samsung Electronics Co., Ltd. Electronic device and operation method thereof
US10885912B2 (en) * 2018-11-13 2021-01-05 Motorola Solutions, Inc. Methods and systems for providing a corrected voice command
US11638059B2 (en) 2019-01-04 2023-04-25 Apple Inc. Content playback on multiple devices
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
DK201970509A1 (en) 2019-05-06 2021-01-15 Apple Inc Spoken notifications
US11475884B2 (en) 2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11423908B2 (en) 2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
US11496600B2 (en) 2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models
DK201970510A1 (en) 2019-05-31 2021-02-11 Apple Inc Voice identification in digital assistant systems
DK180129B1 (en) 2019-05-31 2020-06-02 Apple Inc. User activity shortcut suggestions
US11289073B2 (en) 2019-05-31 2022-03-29 Apple Inc. Device text to speech
US11227599B2 (en) 2019-06-01 2022-01-18 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11360641B2 (en) 2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information
US11488406B2 (en) 2019-09-25 2022-11-01 Apple Inc. Text detection using global geometry estimators
JP7451033B2 (en) 2020-03-06 2024-03-18 Alpine Electronics Inc Data processing system
US11043220B1 (en) 2020-05-11 2021-06-22 Apple Inc. Digital assistant hardware abstraction
US11061543B1 (en) 2020-05-11 2021-07-13 Apple Inc. Providing relevant data items based on context
US11490204B2 (en) 2020-07-20 2022-11-01 Apple Inc. Multi-device audio adjustment coordination
US11438683B2 (en) 2020-07-21 2022-09-06 Apple Inc. User identification using headphones
US20230013916A1 (en) * 2021-07-15 2023-01-19 Arris Enterprises Llc Command services manager for secure sharing of commands to registered agents

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1652561A (en) * 2004-02-03 2005-08-10 Samsung Electronics Co Ltd Call processing system and method in a voice and data integrated switching system
CN1728750A (en) * 2004-07-27 2006-02-01 Deng Liwen Method of packet voice communication
US20060109783A1 (en) * 2002-08-16 2006-05-25 Carl Schoeneberger High availability VoIP subsystem
CN1947392A (en) * 2004-02-23 2007-04-11 Nokia Corp Methods, apparatus and computer program products for dispatching and prioritizing communication of generic-recipient messages to recipients
US20070258418A1 (en) * 2006-05-03 2007-11-08 Sprint Spectrum L.P. Method and system for controlling streaming of media to wireless communication devices
CN101246687A (en) * 2008-03-20 2008-08-20 Beihang University Intelligent voice interaction system and method thereof
US20080240370A1 (en) * 2007-04-02 2008-10-02 Microsoft Corporation Testing acoustic echo cancellation and interference in VoIP telephones
US20120179469A1 (en) * 2011-01-07 2012-07-12 Nuance Communications, Inc. Configurable speech recognition system using multiple recognizers
CN102792294A (en) * 2009-11-10 2012-11-21 VoiceBox Technologies Corp System and method for hybrid processing in a natural language voice service environment
JP2013064777A (en) * 2011-09-15 2013-04-11 Ntt Docomo Inc Terminal device, voice recognition program, voice recognition method and voice recognition system

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58208799A (en) * 1982-05-28 1983-12-05 Toyota Motor Corp Voice recognition system for vehicle
EP1088299A2 (en) * 1999-03-26 2001-04-04 Scansoft, Inc. Client-server speech recognition
JP2001005492A (en) * 1999-06-21 2001-01-12 Matsushita Electric Ind Co Ltd Voice recognizing method and voice recognition device
JP4483428B2 (en) * 2004-06-25 2010-06-16 NEC Corp Speech recognition/synthesis system, synchronization control method, synchronization control program, and synchronization control apparatus
JP5380777B2 (en) * 2007-02-21 2014-01-08 Yamaha Corp Audio conferencing equipment
JP4925906B2 (en) * 2007-04-26 2012-05-09 Hitachi Ltd Control device, information providing method, and information providing program
US8364481B2 (en) * 2008-07-02 2013-01-29 Google Inc. Speech recognition with parallel recognition tasks
US8019608B2 (en) * 2008-08-29 2011-09-13 Multimodal Technologies, Inc. Distributed speech recognition using one way communication
US8676904B2 (en) * 2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
JP5244663B2 (en) * 2009-03-18 2013-07-24 Kddi株式会社 Speech recognition processing method and system for inputting text by speech
US20130085753A1 (en) * 2011-09-30 2013-04-04 Google Inc. Hybrid Client/Server Speech Recognition In A Mobile Device
US9620122B2 (en) * 2011-12-08 2017-04-11 Lenovo (Singapore) Pte. Ltd Hybrid speech recognition

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107146618A (en) * 2017-06-16 2017-09-08 Beijing Unisound Information Technology Co Ltd Speech processing method and device
JP2019050554A (en) 2017-07-05 2019-03-28 Baidu Online Network Technology (Beijing) Co Ltd Method and apparatus for providing voice service
CN108320749A (en) * 2018-03-14 2018-07-24 Baidu Online Network Technology (Beijing) Co Ltd Far-field voice control device and far-field voice control system
CN112334976A (en) * 2018-06-27 2021-02-05 Google LLC Presenting responses to a spoken utterance of a user using a local text response mapping

Also Published As

Publication number Publication date
EP3047481A4 (en) 2017-03-01
WO2015041892A1 (en) 2015-03-26
EP3047481A1 (en) 2016-07-27
JP2016531375A (en) 2016-10-06

Similar Documents

Publication Publication Date Title
CN105793923A (en) Local and remote speech processing
US11922095B2 (en) Device selection for providing a response
US11875820B1 (en) Context driven device arbitration
US9672812B1 (en) Qualifying trigger expressions in speech-based systems
JP6314219B2 (en) Detection of self-generated wake expressions
US11138977B1 (en) Determining device groups
US9293134B1 (en) Source-specific speech interactions
WO2019046026A1 (en) Context-based device arbitration
CN112201246B (en) Intelligent control method and device based on voice, electronic equipment and storage medium
US9799329B1 (en) Removing recurring environmental sounds
US9792901B1 (en) Multiple-source speech dialog input
US11862153B1 (en) System for recognizing and responding to environmental noises
KR20190096308A (en) electronic device
JP2023553867A (en) User utterance profile management
KR20210042523A (en) An electronic apparatus and Method for controlling the electronic apparatus thereof
US10923122B1 (en) Pausing automatic speech recognition
CN114999496A (en) Audio transmission method, control equipment and terminal equipment
US20240079007A1 (en) System and method for detecting a wakeup command for a voice assistant
US20220261218A1 (en) Electronic device including speaker and microphone and method for operating the same
KR20220118109A (en) Electronic device including speker and michrophone and method for thereof
CN117795597A (en) Joint acoustic echo cancellation, speech enhancement and voice separation for automatic speech recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20160720)