WO2019234392A1 - Appareil et procédé - Google Patents

Appareil et procédé

Info

Publication number
WO2019234392A1
Authority
WO
WIPO (PCT)
Prior art keywords
preset
voice command
automated assistant
perform
communicating
Prior art date
Application number
PCT/GB2019/051477
Other languages
English (en)
Inventor
Martin Harrison
Original Assignee
Pure International Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pure International Limited
Publication of WO2019234392A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present invention relates to an apparatus for communicating a voice command to perform an operation.
  • the present invention relates to an apparatus for communicating a voice command to an automated assistant for processing the voice command to perform the operation.
  • Apparatus are known for communicating voice commands to automated assistants stored on computing equipment.
  • An automated assistant may process the voice command and send a response to the apparatus to perform a function or operation associated with the voice command.
  • the function may include communicating information to a user through a sound output, presenting information on a screen of the apparatus, or it may include configuration of the apparatus to perform an operation at a later date. Examples of voice commands include “What is the time?”, “What is the nearest restaurant?”, “What is the weather?”, “Set a 9am alarm”, and “Play some music”.
  • the automated system retrieves the required information or performs the necessary operation, often by utilising computing equipment in communication with the automated assistant to do so.
  • a known apparatus may have a microphone for receiving a voice command, and a processor for processing and communicating the voice command to an automated assistant that may be at a remote server.
  • such an apparatus is configured to continuously process audio received at the microphone and process the audio to identify a ‘wake command’, such as “Wake up”, which signals to the processor that a voice command is about to be spoken.
  • the processor receives the voice command and communicates the voice command to the automated assistant in an audio format, e.g. as a digital audio file.
  • the automated assistant may process the voice command through the use of a speech recognition engine and perform an operation required by the voice command.
  • Known automated assistants may output an audible phrase at the device to confirm the voice command has been processed and often repeat a portion of the voice command. For example, if the voice command was “Set my alarm for tomorrow at 7am”, the automated assistant will set the alarm and cause the apparatus to output, e.g. “OK, I have set your alarm for 7am tomorrow”.
  • Automated assistants may also perform more complicated tasks through control of other devices which may be in communication with the automated assistant.
  • an automated assistant may be able to communicate with an online service, such as an online taxi service.
  • Automated assistants can be used to access such online services but require the voice command to be spoken in a specific way to summon the online service.
  • a user may be required to utter the voice command “Ask Online Taxi Service to order a car to take me to work and ask that the taxi come to my house to pick me up”. If the voice command is not spoken in the correct format, then the automated assistant will return an error message to the user.
  • Such operations can be cumbersome for a user to learn and often a user is not aware that such operations are even possible.
  • Apparatus suitable for use with automated assistants include electronic devices and portable electronic devices.
  • Examples include handheld computers, cellular telephones and hybrid devices such as smart speakers.
  • a smart speaker may have a processor, memory, input devices, e.g. buttons, touchscreens, and microphones, output devices, e.g. displays and a speaker, and a network interface to permit communication with an automated assistant over a communications network, e.g. the Internet.
  • a smart speaker may include an automated assistant module stored in its memory that is executed by the processor and permits the smart speaker to interact with the automated assistant to perform operations based on voice commands. The smart speaker can thus perform relatively smart operations through interaction with the automated assistant.
  • According to an aspect of the present invention, there is provided an apparatus for communicating a preset voice command, including: a processor; a network interface for communication with computing equipment over a communications network; and a non-transitory computer readable storage medium storing a preset voice command and computer executable instructions which, when executed by the processor, perform the step of: in response to a preset device being operated to perform a preset operation at the apparatus, communicating the preset voice command over the network interface to an automated assistant on the computing equipment for processing the preset voice command to perform the preset operation.
  • the apparatus may include the preset device for operation by a user to perform the preset operation at the apparatus, and/or the apparatus may communicate with an electronic device which includes the preset device for the user to operate, wherein the electronic device is configured to communicate to the apparatus that the preset device has been operated to perform the preset operation at the apparatus.
  • the computer executable instructions may include, in response to the preset device being operated, creating or determining identifier information associated with the preset voice command, and communicating said identifier information to the automated assistant.
  • the computer executable instructions may include, in response to the preset device being operated, receiving preset variable information from one or more input devices and communicating the preset variable information to the automated assistant for use in performing the preset voice command.
  • the preset variable information may be added to the preset voice command before the preset voice command is communicated to the automated assistant or processed by the automated assistant.
  • the computer executable instructions may include: receiving, over the network interface, a response to the preset voice command from the automated assistant and, when identifier information received with the response matches identifier information associated with the preset voice command, performing a function associated with that identifier information, wherein the function includes modifying the response and the processor processes the modified response in order to perform the preset operation.
  • Optionally modifying the response includes inhibiting a sound output from being outputted and / or inhibiting a visual output from being outputted.
  • Optionally the apparatus includes a plurality of preset devices, the non-transitory computer readable medium includes a plurality of preset voice commands, and the computer executable instructions include, in response to one of the plurality of preset devices being operated, communicating a corresponding one of the plurality of preset voice commands over the network interface to the automated assistant to perform the corresponding preset operation.
  • the preset voice command and/or the preset variable information includes a text string.
  • the computer executable instructions include instructions for speech synthesising the text string and communicating the speech synthesised text string to the automated assistant.
  • the instructions for speech synthesising the text string include communicating the text string to the computing equipment for processing the text string, and receiving the speech synthesised text string therefrom before communicating the speech synthesised text string to the automated assistant.
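  • Purely as an illustrative sketch (not part of the disclosure), the speech synthesising step could be modelled as below; the function names are assumptions, and the local and remote synthesis calls are stand-ins for whichever text-to-speech facility the apparatus or the computing equipment actually provides.

```python
from dataclasses import dataclass


@dataclass
class SynthesisedCommand:
    """A preset voice command rendered as digital audio (e.g. PCM bytes)."""
    text: str
    audio: bytes


def synthesise_locally(text: str) -> bytes:
    # Hypothetical stand-in for an on-device text-to-speech engine
    # (the speech synthesiser module); returns digital audio bytes.
    return f"<local-tts:{text}>".encode("utf-8")


def synthesise_remotely(text: str) -> bytes:
    # Hypothetical stand-in for delegating synthesis to the computing
    # equipment over the network interface and receiving audio back.
    return f"<remote-tts:{text}>".encode("utf-8")


def synthesise_preset_command(text: str, prefer_remote: bool = False) -> SynthesisedCommand:
    """Convert a stored text-string preset voice command into audio,
    either on the apparatus or via the remote computing equipment."""
    audio = synthesise_remotely(text) if prefer_remote else synthesise_locally(text)
    return SynthesisedCommand(text=text, audio=audio)


if __name__ == "__main__":
    cmd = synthesise_preset_command("What is the weather forecast for today?")
    print(len(cmd.audio), "bytes of synthesised audio")
```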
  • the apparatus may receive a new preset voice command for performing a new preset operation and store the new preset voice command in the non-transitory computer readable medium so that the preset device or the one of the plurality of preset devices is configured to perform the new preset function.
  • the new preset voice command may be received from the or an electronic device, or computing equipment in communication with the apparatus;
  • the apparatus may include a user input device operable to configure the preset device or one of the plurality of preset devices to perform the new preset operation.
  • Optionally receiving and storing the new preset voice command on operation of the user input device includes: receiving speech including the new preset voice command through a microphone of the apparatus, processing the speech, and storing the processed speech as the new preset voice command.
  • processing the speech includes recognising words in the speech to provide a text string
  • Optionally receiving and storing the new preset voice command on operation of the user input device includes: receiving the new preset voice command in the form of a text string, e.g. from the or an electronic device in communication with the apparatus, and storing the text string.
  • Optionally one or more or all of the preset device(s) and/or the user input device(s) are buttons provided on the apparatus operable by a user pressing the required button.
  • According to another aspect of the present invention, there is provided an apparatus for communicating a voice command, including: a network interface for communication with computing equipment over a communications path; and a non-transitory computer readable storage medium storing a voice command in the form of a text string for performing an operation, and computer executable instructions which, when executed, in response to a voice command device being operated, speech synthesise the voice command and communicate the speech synthesised voice command over the network interface to an automated assistant for processing to perform the operation.
  • According to another aspect of the present invention, there is provided a method for using an apparatus to communicate a preset voice command, wherein the apparatus can communicate with an automated assistant over a network interface, the method including: in response to a preset device being operated, communicating the preset voice command over the network interface to the automated assistant for processing the preset voice command to perform a preset operation, the preset voice command being for performing the preset operation.
  • operating the preset device includes one or more of a button press, interacting with a graphical user interface and / or operating a touchscreen.
  • Optionally the method includes receiving a preset voice command and storing the preset voice command in a non-transitory computer readable medium of the apparatus.
  • Optionally storing a preset voice command includes storing the preset voice command as a text string.
  • the computer executable instructions include the preset voice command being speech synthesised and the speech synthesised preset voice command being communicated to the automated assistant in order to perform the preset operation.
  • a non-transitory computer readable medium including instructions executable on a processor of an apparatus for communicating a preset voice command to an automated assistant, wherein the instructions include:
  • in response to a preset device being operated, receiving a preset voice command for performing a preset operation from the apparatus; and communicating the preset voice command over a network interface of the apparatus to the automated assistant for processing the preset voice command to perform the preset operation.
  • Optionally the instructions include, in response to a voice command device being operated, synthesising the voice command and communicating the synthesised voice command over the network interface to an automated assistant for processing the synthesised voice command to perform the operation.
  • the method includes the apparatus communicating the voice command to the automated assistant to perform an operation at the apparatus
  • the voice command is a preset voice command.
  • the electronic device(s) may be a cellular telephone, tablet and / or a portable electronic device.
  • the apparatus includes a speaker for outputting sound.
  • the apparatus is configured as a portable device or a smart speaker.
  • Figure 1 is a schematic diagram of an illustrative system environment in which an apparatus in accordance with embodiments of the present invention may be used;
  • Figure 2 is a perspective view of an apparatus in accordance with embodiments of the present invention;
  • Figure 3 is a schematic diagram showing various constituent elements of an apparatus in accordance with embodiments of the present invention;
  • Figure 4 is a diagram showing a process carried out in accordance with embodiments of the present invention;
  • Figure 5 is a diagram showing a process carried out in accordance with embodiments of the present invention;
  • Figures 6a-6e show screens provided on a display of an apparatus in accordance with embodiments of the present invention;
  • Figure 7 is a diagram showing a process carried out in accordance with embodiments of the present invention; and
  • Figure 8 is a diagram showing a process carried out in accordance with embodiments of the present invention.
  • an apparatus 10 in accordance with an embodiment of the present invention is shown schematically in an illustrative system environment in which the apparatus 10 may be operated.
  • a user may operate the apparatus 10 to communicate with an automated assistant 12 and / or various services 14 through a communications network 16.
  • the apparatus 10 is shown in schematic form in figure 2.
  • the apparatus 10 is in the form of a smart speaker.
  • the apparatus 10 may have other forms.
  • it may be a portable device, portable computer or cellular telephone.
  • Apparatus 10 may have a housing 19 including a number of components.
  • Apparatus 10 may have a display 20 such as, for example, a liquid crystal display.
  • the display 20 may include a touchscreen for a user to input information or commands to a processor 22 of the apparatus 10.
  • Apparatus 10 includes memory 24.
  • Memory 24 may include a non-transitory computer readable storage medium and may be any physical media that may be accessible to the processor for executing instructions and / or extracting data therefrom.
  • the memory may include random access memory (RAM), and/or any form of non-volatile memory, e.g. read-only memory (“ROM”), electrically erasable ROM (e.g. EEPROM, Flash).
  • the memory 24 includes instructions that are stored in non- volatile memory thereof for execution by the processor 22 to control the various components of the apparatus 10 at a low-level, e.g. hardware control. Such instructions are sometimes referred to as firmware in the art.
  • the memory 24 includes one or more modules of instructions that are executable by the processor to perform a set of tasks.
  • Apparatus 10 may include a speaker 26 for outputting sounds.
  • Apparatus 10 may include a microphone 28 for receiving sound inputs.
  • Sound inputs could include control inputs, e.g. voice commands, or any form of sound that is receivable by the microphone 28 for other uses, e.g. carrying out particular operations at the apparatus based on the received sound input.
  • the memory 24 may include a codec module 30 operatively coupled to the microphone 28 and speaker 26 for coding and / or decoding audio between, for example, analogue and digital formats, compressing and / or decompressing audio.
  • the microphone 28 may receive a voice command as a sound input, i.e. a user’s speech
  • the codec module 30 may encode the voice command into a digital format for transfer to other components of the apparatus 10, e.g. for storage in the memory 24, processing by the processor and / or communicating the same to the automated assistant 12.
  • the apparatus 10 may communicate information to a user through the speaker 26 by the codec module 30 decoding information from a digital format into an analogue format for output by the speaker 26. In this way, the user may interact with and/or operate the apparatus 10 through speech.
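  • As an illustration only (the sample rate and 16-bit mono PCM format are assumptions, and the described apparatus is not limited to any particular codec), part of what a codec module might do can be sketched with Python's standard wave module:

```python
import io
import wave


def encode_pcm_to_wav(pcm_samples: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container (an illustrative
    stand-in for part of what a codec module might do before audio is
    stored in memory or communicated to the automated assistant)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_samples)
    return buffer.getvalue()


def decode_wav_to_pcm(wav_bytes: bytes) -> bytes:
    """Extract the raw PCM frames from a WAV container, e.g. before
    playback through the speaker."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.readframes(wav.getnframes())


if __name__ == "__main__":
    silence = b"\x00\x00" * 16000          # one second of silence
    wav_data = encode_pcm_to_wav(silence)
    assert decode_wav_to_pcm(wav_data) == silence
```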
  • Apparatus 10 may include a plurality of preset devices 18a, 18b, 18c and 18d which are operable to perform respective preset operations.
  • the preset devices 18a-d are buttons that are operated by a user pressing them.
  • the preset devices 18a-d may include a touchscreen or graphical user interface on / through which an icon or graphic is displayed and a user touches / selects the relevant icon or graphic to operate the respective preset device and perform that preset operation.
  • the preset devices 18a-d may take a number of other forms that permit the user to perform a preset operation.
  • the apparatus 10 may be arranged so that the preset devices 18a-d may be operated by a user as preset configuration devices in a configuration mode to configure the preset devices 18a-d.
  • when a preset device 18a-d is operated in its preset operation mode, e.g. by a short button press, the processor 22 determines that the associated preset operation should be performed.
  • when a preset device, e.g. preset device 18a, is operated in the configuration mode, e.g. by a longer button press, the processor 22 determines that the preset device 18a is to be operated as a preset configuration device and the processor executes instructions to configure a new preset operation for the preset device 18a as will be explained.
  • the apparatus 10 may be able to communicate with one or more electronic devices 15 which each include preset devices and / or user input devices for the user to operate to perform and / or configure preset operations at the apparatus 10.
  • the electronic devices may include cellular telephones, tablets, or similar computing equipment.
  • the electronic devices may include remote controls, or any devices that have the necessary means to communicate or provide information for communication with the apparatus 10 and/ or configure voice commands for the apparatus 10.
  • the apparatus 10 may include preset devices 18a-d and/or user input devices of its own, and may also communicate with electronic devices 15 to perform / configure preset operations at the apparatus 10.
  • the apparatus 10 may not have its own preset devices 18a-d and/or separate user input devices, and, instead, only communicate with electronic devices 15 that include preset devices 18a-d and/or user input devices.
  • the preset devices 18a-d and/or user input devices when present on electronic devices 15 may have the same or similar form as those provided on the apparatus 10, e.g. physical buttons, a touchscreen or graphical user interface on / through which an icon or graphic is displayed and a user touches / selects the relevant icon or graphic to operate the respective preset device and perform that preset operation.
  • the electronic devices 15 may also include a microphone for receiving voice commands that may be transmitted to the apparatus 10 in order to store a preset voice command.
  • the electronic devices 15 may communicate via a cloud based platform with the apparatus 10.
  • the cloud based platform may receive information from one or more of the electronic devices 15 and then communicate to the apparatus 10 that a particular preset device has been operated.
  • the cloud based platform may process the information received from the electronic device(s) before communicating to the apparatus 10 that a preset device 18a-d has been operated.
  • a cloud based platform may have a preset algorithm which is triggered on operation of the preset device 18a-d and communicates information to the apparatus 10 that an associated preset operation should be performed.
  • apparatus 10 may include a number of input and/or output devices. These may be provided as part of the apparatus 10, e.g. in the form of a display or touchscreen, buttons, or lights. The apparatus 10 may also provide connectors for connecting audio devices and/or electronic devices, e.g. a cellular telephone, portable electronic device, or other user input interface device which may include buttons, touchpads or any other suitable interfaces for controlling and/or providing inputs to the apparatus 10.
  • Apparatus 10 includes a network interface 32 for communication with computing equipment 36 and / or electronic devices 15 over the communications network.
  • the network interface 32 may permit communication over a link 34 provided through wired technologies, for example, wires and/or USB etc., or through wireless technologies, for example, radio frequency, cellular, Bluetooth etc.
  • the network interface 32 may include wireless communication devices having one or more antennas to provide local or remote network links. Examples of local network links include Wi-Fi and Bluetooth. Examples of remote network links include cellular telephone bands and data service bands such as 3G, 4G and 5G.
  • the network interface permits data to be transmitted over link 34 between the apparatus 10 and computing equipment 36.
  • the network interface 32 may permit communication with electronic devices 15, for example.
  • the computing equipment 36 may include a remote computer server.
  • the remote computer server may have its own processor, storage memory, input- output devices and network interface over which it may receive and transmit data from or to devices connected to the computing equipment 36.
  • the computing equipment 36 may include the automated assistant 12 in the form of software.
  • the automated assistant 12 may process voice commands received from one or more devices, and output responses to the voice commands for receipt by the one or more devices.
  • the computing equipment 36 may include a plurality of computing resources, e.g. a plurality of servers, connected by the communications network to form a network-accessible computing platform that forms a system of processors, storage, software, data retrieval / transmission, which can be accessed through a network such as the Internet.
  • the automated assistant 12 may interact with and / or be operated through such a platform.
  • Such platforms are known as “cloud services”, “data centres” and the like.
  • the automated assistant 12 includes a set of instructions, i.e. software, which is executed by the computing equipment 36.
  • the automated assistant 12 when processing voice commands received from the apparatus 10, may communicate with services 14 provided by other computing equipment over the communications network 16 in order to produce a suitable response to each voice command.
  • the automated assistant 12 may include a speech recognition engine module for extracting information from a voice command provided to the automated assistant 12.
  • the apparatus 10 may also include an automated assistant module 38 stored in its memory and executable by the processor 22.
  • the automated assistant module 38 may be for communicating or otherwise interacting with the automated assistant 12 and/or performing operations at the apparatus 10 as instructed by the automated assistant 12.
  • Memory 24 may include a speech recognition engine module 40 for extracting information, i.e. words, from a voice command provided, for example, in a digital audio format thereto.
  • Memory 24 may include a preset module 42 for interaction with the automated assistant module 38 through an Application Programming Interface (API) of the automated assistant module 38.
  • Memory 24 may include a speech synthesiser module 46 for synthesising a text string into a digital sound format.
  • the preset module 42 includes instructions for interacting with the automated assistant module 38 to perform certain steps in accordance with embodiments of the present invention as will be explained.
  • the memory 24 includes a plurality of preset voice commands 18a’-d’ which are associated with respective preset devices 18a, 18b, 18c, 18d.
  • the preset voice commands 18a’-18d’ may be stored in a variety of formats.
  • a preset voice command may be stored in a coded digital audio format such as MP3, MP4, WAV or the like.
  • the preset voice command may be stored as a text string, e.g. alphanumeric characters. Examples of preset voice commands in text string format may include “What is the time?” or “What is the weather forecast for today?”.
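  • A minimal sketch (the data structure and device identifiers are assumptions, not the actual firmware layout) of how preset voice commands 18a’-d’ might be held in memory, keyed by their preset devices, in either text-string or digital-audio form:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PresetVoiceCommand:
    """A stored preset voice command: a text string, a digital audio
    recording, or a combination of both (as described above)."""
    text: Optional[str] = None       # e.g. "What is the time?"
    audio: Optional[bytes] = None    # e.g. MP3/WAV bytes of a spoken command

    def is_text(self) -> bool:
        return self.text is not None and self.audio is None


# Hypothetical preset table: one entry per preset device 18a-18d.
PRESETS: Dict[str, PresetVoiceCommand] = {
    "18a": PresetVoiceCommand(text="Set an alarm for 7am tomorrow"),
    "18b": PresetVoiceCommand(text="What is the weather forecast for today?"),
    "18c": PresetVoiceCommand(text="What is the time?"),
    "18d": PresetVoiceCommand(audio=b"<recorded spoken command>"),
}


if __name__ == "__main__":
    for device, command in PRESETS.items():
        kind = "text string" if command.is_text() else "digital audio"
        print(device, "->", kind)
```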
  • the term“preset operation” is used to denote an operation that has been set so that the operation associated with the preset is performed each time an associated preset device is operated, i.e. when a preset device is operated the same preset operation is performed each time.
  • the preset voice commands that are associated with the preset devices are voice commands for performing an operation which is performed each time the preset device is operated.
  • General operation of the apparatus 10 includes the automated assistant module 38 being executed by the processor 22.
  • the automated assistant module 38 receives sound inputs from the microphone 28 and employs its speech recognition engine module to identify from the sound inputs whether a voice command has been spoken by a user. This may include the automated assistant module 38 identifying whether a wake command has been spoken, e.g. “Wake”.
  • the automated assistant module 38 captures the subsequent words spoken after the wake command and communicates them as a voice command to the automated assistant 12.
  • the automated assistant module 38 may create identifier information for the voice command and communicate the same to the automated assistant 12.
  • the automated assistant 12 then processes the voice command before returning a response in the form of instructions to perform an operation at the apparatus 10 together with any associated identifier information.
  • the automated assistant module 38 may send the sound inputs from the microphone 28 to the automated assistant 12 to identify whether a wake command has been spoken. This is because the automated assistant 12 may utilise greater processing power and more complex algorithms for this purpose. The automated assistant 12 then returns the information that a wake command has been spoken to the automated assistant module 38 which captures the associated voice command.
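  • As a simplified, assumed sketch of the general operation just described (the wake command detection and command capture are stand-ins, not the apparatus's actual firmware):

```python
WAKE_WORD = "wake"   # illustrative wake command


def contains_wake_command(transcript: str) -> bool:
    # Stand-in for the speech recognition engine module checking sound
    # inputs for the wake command.
    return WAKE_WORD in transcript.lower().split()


def extract_voice_command(transcript: str) -> str:
    """Capture the words spoken after the wake command so they can be
    communicated to the automated assistant as a voice command."""
    words = transcript.split()
    lowered = [word.lower() for word in words]
    index = lowered.index(WAKE_WORD)
    return " ".join(words[index + 1:])


if __name__ == "__main__":
    heard = "Wake what is the weather forecast"
    if contains_wake_command(heard):
        print(extract_voice_command(heard))   # -> "what is the weather forecast"
```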
  • the preset devices 18a-d each include a button.
  • a user may press the button for a short duration for the respective preset device to perform a preset operation.
  • a user may press the button for a longer duration to configure the associated preset device 18a so that the preset device 18a may be operated to perform a new preset operation.
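  • Illustratively (the press-duration threshold and function names are assumptions), the distinction between a short press performing the preset operation and a longer press entering the configuration mode could be expressed as follows:

```python
LONG_PRESS_SECONDS = 1.5   # assumed threshold; not specified in the disclosure


def perform_preset(device_id: str) -> str:
    # Stand-in for the preset module looking up and communicating the
    # stored preset voice command for this preset device.
    return f"performing preset operation for {device_id}"


def enter_configuration_mode(device_id: str) -> str:
    # Stand-in for prompting the user and capturing a new preset voice command.
    return f"configuring a new preset operation for {device_id}"


def on_preset_button_released(device_id: str, press_duration: float) -> str:
    """Dispatch a preset button press: a short press performs the preset
    operation, a longer press operates the device as a preset
    configuration device."""
    if press_duration >= LONG_PRESS_SECONDS:
        return enter_configuration_mode(device_id)
    return perform_preset(device_id)


if __name__ == "__main__":
    print(on_preset_button_released("18a", 0.3))   # short press
    print(on_preset_button_released("18a", 2.0))   # long press
```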
  • alternative forms of input devices and configurations thereof may be employed to perform preset operations and / or to configure preset operations.
  • those electronic devices 15 may include a button or other input devices and configurations.
  • this shows a process for which the preset device 18a is operated to perform a preset operation.
  • the preset voice command 18a’ “set an alarm for 7am tomorrow” is utilised.
  • the preset voice command 18a’ may be for a different operation.
  • a user operates preset device 18a in its preset operation mode by pressing its button for a short duration.
  • the preset module 42 receives an input signal generated by the preset device 18a and identifies it as denoting that a preset operation is to be carried out. The preset module 42 then identifies the relevant preset voice command 18a’ for preset device 18a stored in the memory 24 and processes the preset voice command 18a’.
  • the preset voice command 18a’ may include a text string, and processing of the preset voice command 18a’ may include employing the speech synthesiser module 46 to convert the text string into a digital audio format.
  • the preset voice command 18a’ includes a combination of a digital audio portion and a text string portion
  • processing of the preset voice command 18a’ may include employing the speech synthesiser module 46 and determining how the portions should be combined to provide the preset voice command 18a’ in digital audio format.
  • the preset module 42 identifies the relevant preset voice command 18a’ as a text string“set an alarm for 7am tomorrow” stored in the memory 24.
  • the preset module 42 then employs the speech synthesiser module 46 to speech synthesise the text string into a speech synthesised text string, i.e. to produce the text string as synthesised speech in a digital audio format.
  • the preset module 42 transfers the preset voice command 18a’ in its speech synthesised form to the automated assistant module 38 on the apparatus 10.
  • the automated assistant module 38 creates identifier information for the preset voice command 18a’ for communication over the network interface to the automated assistant 12 at the computing equipment 36 together with the preset voice command 18a’.
  • the identifier information created may be the text string “ALARM” and the preset module 42 determines that an associated function for this identifier information is to inhibit any audio sound from being outputted by the speaker 26 when processing any response from the automated assistant 12 to the preset voice command 18a’.
  • the preset module 42 may create an associated function, or select an associated function (e.g. from a set of associated functions in the memory 24) based on words that it recognises (through employment of the speech recognition module 40) from the preset voice command 18a’. For example, if it recognises the word “alarm”, the associated function may be to silence any response to the preset voice command 18a’ that is received from the automated assistant 12. Other examples of preset voice commands for which the associated function is to silence the responses may include “good night”, which is for turning off all devices in the house, or “turn off lights” (because the user can see that the lights have been turned off).
  • the preset module 42 may similarly recognise words from the preset voice command 18a’ at step 200 or step 300 and, if they match a pre-determined word stored in the memory or otherwise accessible to the preset module 42 (e.g. by communication with remote computing equipment over the Internet), the preset module 42 operates so as to capture the identifier information created by the automated assistant module 38 and to carry out an associated function when a response is received with the same identifier information.
  • pre-determined words may include “not” and “disturb”, or “goodnight”, for which the associated function is to silence any response received to a preset voice command 18a’ including these words.
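  • One assumed way of expressing the selection of identifier information and an associated function from recognised words is sketched below; the keyword list, identifier strings and response shape are illustrative only:

```python
from typing import Callable, Dict, Optional

# Assumed mapping from pre-determined words to identifier information;
# the real set of words and identifiers is not prescribed by the disclosure.
KEYWORD_IDENTIFIERS: Dict[str, str] = {
    "alarm": "ALARM",
    "goodnight": "SILENT",
    "not disturb": "SILENT",
    "turn off lights": "SILENT",
}


def silence_response(response: dict) -> dict:
    """Associated function: inhibit the sound output in the response."""
    modified = dict(response)
    modified["spoken_confirmation"] = None
    return modified


ASSOCIATED_FUNCTIONS: Dict[str, Callable[[dict], dict]] = {
    "ALARM": silence_response,
    "SILENT": silence_response,
}


def create_identifier(preset_command_text: str) -> Optional[str]:
    """Create identifier information for a preset voice command by matching
    pre-determined words (a stand-in for using the speech recognition
    module on the command text)."""
    lowered = preset_command_text.lower()
    for keyword, identifier in KEYWORD_IDENTIFIERS.items():
        if keyword in lowered:
            return identifier
    return None


if __name__ == "__main__":
    print(create_identifier("Set an alarm for 7am tomorrow"))   # -> "ALARM"
    print(create_identifier("What is the time?"))               # -> None
```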
  • the automated assistant module 38 communicates the preset voice command 18a’ and identifier information to the automated assistant 12 over the network interface 32.
  • the automated assistant 12 processes the preset voice command 18a’ to produce a response 18a” thereto.
  • the automated assistant 12 then communicates the response 18a” to the apparatus 10 together with the identifier information.
  • the response 18a” includes instructions for the automated assistant module 38 to produce an alarm sound at 7am the next day and includes instructions for the module to output a spoken phrase “Your alarm has been set for 7am tomorrow” from the speaker 26 to indicate the alarm has been set.
  • the instructions may include the automated assistant module 38 requesting an alarm tone or music track stored at computing equipment 36 and subsequently outputting from the speaker 26 the alarm tone or music track received from the computing equipment 36.
  • the preset module 42 receives the response 18a” and identifier information.
  • the preset module 42 determines whether the identifier information matches the identifier information stored at step 300.
  • the preset module 42 stores the identifier information in the memory 24 and determines whether or not an associated function should be performed at the apparatus 10.
  • the associated function may include modifying the response so that at least a portion of the instructions thereof may be carried out differently, if carried out at all. If it does, the preset module 42 then performs the associated function at the apparatus 10. If the identifier information does not match, it carries out the response instructions as they are.
  • the preset module 42 determines a match between the identifier information received (“ALARM”) and the identifier information stored in the memory. As such, the preset module 42 carries out the associated function, i.e. it inhibits the phrase “Your alarm has been set for 7am tomorrow” from being outputted by the speaker 26.
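  • A minimal sketch (the response fields are assumptions, not the actual protocol between the apparatus and the automated assistant) of handling a response whose identifier information matches the stored identifier, so that the spoken confirmation is inhibited while the rest of the response is still carried out:

```python
from typing import Optional


def handle_response(response: dict, stored_identifier: Optional[str]) -> dict:
    """Carry out a response from the automated assistant, modifying it
    when its identifier information matches the identifier stored when
    the preset voice command was communicated."""
    if stored_identifier is not None and response.get("identifier") == stored_identifier:
        # Associated function: suppress the audible confirmation.
        response = dict(response, spoken_confirmation=None)
    return response


if __name__ == "__main__":
    response = {
        "identifier": "ALARM",
        "action": "set_alarm",
        "time": "07:00 tomorrow",
        "spoken_confirmation": "Your alarm has been set for 7am tomorrow",
    }
    handled = handle_response(response, stored_identifier="ALARM")
    print(handled["action"], "->", handled["spoken_confirmation"])  # set_alarm -> None
```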
  • the operation described above is carried out in the same way with the exception that the step 100 is performed at the relevant electronic device 15 and includes the electronic device 15 communicating instructions to apparatus 10 that preset device 18a has been operated so its associated preset operation should be performed. The remaining steps are then performed in the same way.
  • Advantages associated with one or more of the described embodiments may be readily appreciated by the skilled person.
  • the user may operate the apparatus 10 to perform an operation using the automated assistant 12 through an input device without having to speak an audible voice command.
  • the apparatus 10 may perform an additional function when a response to a preset voice command is received.
  • This may, for example, permit the response to be performed at the apparatus silently, without a sound (e.g. a spoken phrase from the automated assistant confirming the response has been performed) being produced. This may be advantageous for users operating the apparatus at night time when people may be asleep. Similarly, the response may be performed without activating a display of the apparatus or producing any visual output (e.g. illuminating a light or display).
  • this shows a process for which the preset device 18a is operated to configure the preset device 18a to carry out a new, i.e. different, preset operation.
  • the preset device 18a will be configured to request a weather forecast.
  • the user presses a button of the preset device 18a for a long period.
  • the preset module 42 determines that the preset device is being operated in a configuration mode, i.e. as a preset configuration device, to configure a preset operation for the preset device 18a.
  • the preset module 42 causes the speaker 26 to output a sound to prompt the user to speak a preset voice command for performing the required preset operation and operates the microphone 28 to receive it.
  • the preset module 42 may prompt the user in other ways, e.g. by flashing a light or providing a visual prompt on a display of the apparatus.
  • the user speaks the phrase, “What is the weather forecast?”. In embodiments, the user may then release the button once the phrase has been spoken to indicate the end of the voice command.
  • the preset module 42 may automatically determine the end of the voice command by sensing when the sound input to the microphone 28 has ceased.
  • the codec module 30 receives the phrase and converts it into a digital format and the preset module 42 stores it in the memory 24 as the preset voice command 18a’ for the preset device 18a. Any previously stored preset voice command in the memory 24 for the preset device 18a is thereby replaced.
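  • The configuration flow just described might, as an assumed sketch, be structured as below; the prompt, recording and storage calls are stubs standing in for the speaker, microphone, codec module and memory 24:

```python
from typing import Dict

PRESET_COMMANDS: Dict[str, bytes] = {}   # in-memory stand-in for memory 24


def prompt_user() -> None:
    # Stand-in for outputting a sound (or flashing a light / showing a
    # visual prompt) asking the user to speak the new preset voice command.
    print("Please speak the new preset voice command now.")


def record_from_microphone() -> bytes:
    # Stand-in for the microphone and codec module producing the spoken
    # command in a digital audio format.
    return b"<digital audio: 'What is the weather forecast?'>"


def configure_new_preset(device_id: str) -> None:
    """Configure a preset device with a new preset voice command,
    replacing any previously stored command for that device."""
    prompt_user()
    PRESET_COMMANDS[device_id] = record_from_microphone()


if __name__ == "__main__":
    configure_new_preset("18a")
    print(sorted(PRESET_COMMANDS))
```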
  • the apparatus 10 may receive preset voice commands from electronic devices such as cellular telephones or tablets as part of the process of configuring preset devices 18a-d and store them in memory 24 in embodiments.
  • the electronic devices may be the same as the electronic devices 15 or different therefrom in other embodiments.
  • the user may be able to transfer voice commands from a cellular telephone to the apparatus 10 through input devices of the cellular telephone such as its microphone to receive spoken voice commands, and/or its graphical user interface to receive voice commands in the form of text strings.
  • portions, or all, of one or more of the steps 1000 to 1300 may be carried out at the electronic device before the preset voice command is stored in memory 24.
  • a user may receive, e.g. download, preset voice commands to the apparatus from other devices or the Internet.
  • the user may be able to download preset voice commands for controlling other devices, or preset voice commands for requesting services connected to the automated assistant 12 without having to learn the required format of the preset voice command for controlling them.
  • Certain preset voice commands may be for operations that may include a large number of preset variables that need to be defined to perform those operations. Configuring a preset device to carry out such operations requires all of those preset variables to be captured.
  • a preset configuration mode includes operating one or more input devices, e.g. selecting a list of options presented on a touchscreen or display, so that a user may enter the necessary preset variables without requiring the user to speak and / or making it relatively straightforward compared to speaking the required voice command to perform such an operation.
  • the preset variables are then stored in the memory 24 in an appropriate format so that they can be combined to provide a preset voice command which is communicated to the automated assistant when the preset device is operated to perform the preset operation.
  • a preset device 18a may be configured to perform a preset operation for which a preset voice command 18a’ is stored in the memory 24 and, prior to the preset device 18a being operated to perform the preset operation, the user may be required to operate one or more input devices of the apparatus 10 to provide preset variable information which is communicated to the automated assistant together with the preset voice command 18a’. This may include the preset variable information being added to or combined with the preset voice command 18a’ and the resultant preset voice command 18a’ being communicated to the automated assistant 12 in order to perform the preset operation.
  • a preset device 18a may have an associated preset voice command 18a’ that is for carrying out an operation to set an alarm.
  • the preset voice command 18a’ stored in the memory 24 may include some words as text strings, e.g. “Set an alarm for”, and the preset module 42, when operating the preset voice command 18a’, operates one or more user input devices so that a user inputs the necessary preset variable information.
  • the preset module 42 may capture the entered preset variable information as text strings.
  • the preset module 42 then completes the preset voice command 18a’ by adding the preset variable information to the preset voice command 18a’ to form a single text string that can be communicated to the automated assistant to complete the preset operation.
  • such a preset voice command may be generated as part of a user input device, e.g. the display 20, being operated by the preset module to provide a series of options on the display which can be selected by a user through a touchscreen or graphical user interface, or other input devices such as buttons.
  • a series of screens that may be displayed as part of this operation is shown in figures 6a-d.
  • Figure 6a shows a screen for turning an alarm on.
  • Figure 6b shows a screen for setting the time.
  • Figure 6c shows a screen for setting the alarm sound.
  • Figure 6d shows a screen for setting the repeat setting.
  • Figure 6e shows a screen for setting the volume.
  • the selections or inputs made by the user may generate corresponding text strings that can be combined or otherwise stored in the memory 24 as a preset voice command that is synthesised into speech by the speech synthesiser module 46 when the corresponding preset device is operated to perform the preset operation. It will be appreciated that storing complicated commands as text strings requires less memory space compared to them being stored as digital audio.
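  • As an assumed illustration of how selections like those in figures 6a-6e could generate text-string fragments that are combined into a single preset voice command (the exact wording of the generated string is not prescribed):

```python
from dataclasses import dataclass


@dataclass
class AlarmSelections:
    """User selections captured from screens like those in figures 6a-6e."""
    enabled: bool = True
    time: str = "7am"
    sound: str = "Radio 2"
    repeat: str = "weekdays"
    volume: int = 10


def build_alarm_preset_command(selections: AlarmSelections) -> str:
    """Combine the captured preset variable information into one text
    string that can later be speech synthesised and communicated to the
    automated assistant."""
    if not selections.enabled:
        return "Cancel my alarm"
    return (
        f"Set an alarm for {selections.time} on {selections.repeat} "
        f"and play {selections.sound} at volume {selections.volume}"
    )


if __name__ == "__main__":
    print(build_alarm_preset_command(AlarmSelections(time="9am")))
```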
  • figure 7 illustrates, for certain embodiments, how a process is carried out at the apparatus 10 and the automated assistant 12 on the computing equipment 36 when a preset device 18a is operated.
  • the rectangles in the process correspond to actions carried out at the apparatus and the circles in the process correspond to actions carried out at the computing equipment.
  • a preset device 18a is operated.
  • the preset module 42 sends the corresponding preset voice command 18a’ from the memory 24 to the automated assistant module 38. This may involve sending a digital recording of a spoken voice command if the preset voice command 18a’ was stored in a digital audio format. Alternatively, the preset module 42 may transfer a text string to the speech synthesiser module 46 to provide the preset voice command 18a’ as a spoken command to the automated assistant module 38 if the preset voice command 18a’ was originally stored as a text string in the memory 24. The automated assistant module 38 then communicates the preset voice command 18a’, in the format received from the preset module 42, to the automated assistant 12 together with identifier information. The preset module 42 determines the identifier information and stores this information in the memory 24.
  • the automated assistant 12 receives and processes the preset voice command 18a’ and identifier information.
  • the automated assistant 12 generates a response thereto and communicates the response and said identifier information to the automated assistant module 38.
  • the preset module 42 receives the response and identifier information from the automated assistant module 38. The preset module 42 determines whether the identifier information matches the identifier information generated at step 1020. If it does, the preset module 42 determines whether this identifier information requires a modified response to be performed. For example, it may determine that the response should be carried out without any sound being outputted from the apparatus 10. If not, then the response is performed without any modification, i.e. in the original form provided by the automated assistant 12. In embodiments, the preset module 42 may obtain or create identifier information in other ways and / or determine, in other ways, whether a modified response should be carried out.
  • figure 8 illustrates, for certain embodiments, operation of the apparatus 10 and its interaction with the automated assistant 12 on the computing equipment when a preset device 18a is operated to set an alarm with reference to other voice commands that may be received by the apparatus 10.
  • the rectangles correspond to actions carried out at the apparatus 10 and the circles correspond to actions carried out at the computing equipment 36.
  • a user operates a preset device 18a to perform a preset operation which sets an alarm.
  • the preset module 42 is configured to operate one or more input devices of the apparatus 10, e.g. by presenting them on a touchscreen of the display 20, and prompts the user to operate the input devices to enter the necessary preset variable information, which includes the variables time, day, radio station and volume required to set the alarm.
  • the user enters the following values for the variables - 9am, weekday, Radio 2, Volume 10 - through the input devices.
  • the preset variable information received by the preset module 42 is in the form of a text string and the preset module 42 generates the text string “set an alarm for 9am on weekdays and play radio 2 at volume 10” as the preset voice command 18a’.
  • the speech synthesiser module then synthesises the preset voice command 18a’ to provide it in a digital audio format.
  • the preset module 42 transfers the preset voice command 18a’ in its speech synthesised form to the automated assistant module 38.
  • the automated assistant module 38 communicates the preset voice command 18a’ as a voice command to the automated assistant 12 and generates identifier information which is received or determined by the preset module 42.
  • the automated assistant receives the preset voice command 18a’ and processes it.
  • the automated assistant 12 generates a response and communicates the response to the automated assistant module 38 together with the identifier information.
  • the preset module 42 checks for any responses it has received from the automated assistant module 38 and determines whether or not they match the identifier information at step 4000. For example, an intermediate response may have been received at the apparatus 10 that is unrelated to the preset operation being performed. If there is no match, the response is carried out as it is without any modification thereto or any additional function being performed over and above the response.
  • If the identifier information matches at step 7000, step 8000a is performed.
  • At step 8000a, the preset module 42 recognises that the preset voice command 18a’ is for an alarm and that an associated function is to be performed, which is that the response should be modified so that no audio is outputted. Instead, the preset module 42 may generate confirmation that the alarm has been set in a non-audio format, e.g. by displaying a message or symbol on the display, or by turning on a light emitting device to provide a visual indication to the user. If there is no match at step 7000, step 8000b is performed. The response is performed in unmodified form and the preset module 42 returns to step 7000.
  • the preset module 42 may include a list of identifier information that is used by the automated assistant module 38 / automated assistant 12 when carrying out certain voice commands.
  • the preset module 42 may include an associated set of functions that should be performed for each piece of identifier information. During operation, the preset module 42 may simply determine the identifier information that is associated with a preset voice command, checks it against the list and performs the associated function when a response to the preset voice command 18a’ is received.
  • certain preset devices may have a fixed preset operation and preset voice command that cannot be changed or reconfigured by a user. For example, preset operations such as providing the time or weather forecast may be fixed. Associated preset voice commands may be stored in the apparatus firmware and may only be changed through an update to the firmware. In such embodiments, one or more of the other preset devices may be configured to perform desired preset voice commands set by the user, i.e. the preset devices can be customised. In certain embodiments, the preset module 42 may include a function whereby a user may lock such customised preset voice commands to prevent them being inadvertently replaced by the user.
  • the preset module 42 may be able to recognise a particular user from the user’s voice when the user configures the preset voice command for a preset device and store user identification information as part of the preset voice command. In embodiments, this functionality may be provided by the automated assistant module 38 or automated assistant 12. In this instance, when the preset device performs the preset operation, the preset module 42 may send the preset voice command including the user identification information so that the automated assistant can personalise the response to the user. For example, if the preset operation is to play the user’s favourite songs, the automated assistant can retrieve the user’s favourite songs using the user’s automated assistant account information or usage.
  • Another example may be a preset operation for an audiobook so that the particular user’s audio book is resumed when the preset device is operated.
  • Another example may be a preset operation for ascertaining traffic news for a user’s commute, and the automated assistant, when processing the associated preset voice command “How’s my commute?”, will know the user’s commute information based on the user’s account information.
  • Preset operations carried out by a preset device may be configured so that the preset operations are performed automatically at a predefined time or based on a particular event.
  • the preset operation “How’s my commute?” may be performed automatically when the apparatus 10 is operated at a set time, e.g. to dismiss an alarm, or operated in some other way within the time period 6-7am.
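  • A minimal sketch (the time window, event name and function names are assumptions) of performing a preset operation automatically when the apparatus is operated within a predefined period:

```python
from datetime import datetime, time

COMMUTE_WINDOW = (time(6, 0), time(7, 0))   # assumed 6-7am window


def should_run_commute_preset(event: str, now: datetime) -> bool:
    """Decide whether the "How's my commute?" preset operation should be
    performed automatically, e.g. when an alarm is dismissed during the
    configured time window."""
    start, end = COMMUTE_WINDOW
    return event == "alarm_dismissed" and start <= now.time() <= end


if __name__ == "__main__":
    print(should_run_commute_preset("alarm_dismissed", datetime(2019, 5, 30, 6, 30)))  # True
    print(should_run_commute_preset("alarm_dismissed", datetime(2019, 5, 30, 9, 0)))   # False
```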
  • a preset device may be configured or created by the preset module identifying frequent voice commands, i.e. favourite voice commands, that are spoken by the user to operate the apparatus 10.
  • the preset module may, based on the frequency of use of a particular voice command, set a preset device, or create a preset device, for that voice command to be performed as a preset operation. For example, by creating an icon or graphic on a touchscreen of a display that may be selected to perform the preset operation.
  • the user may be presented with an option to configure or create a preset device with the suggested voice command or it may be automatically configured or created.
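  • Illustratively (the usage threshold and the source of the command history are assumptions), frequently spoken voice commands could be identified and suggested as new presets as follows:

```python
from collections import Counter
from typing import List, Optional

SUGGESTION_THRESHOLD = 5   # assumed minimum number of uses before suggesting


def suggest_preset(command_history: List[str]) -> Optional[str]:
    """Return the most frequently spoken voice command as a candidate
    preset operation, or None if nothing is used often enough."""
    if not command_history:
        return None
    counts = Counter(cmd.strip().lower() for cmd in command_history)
    command, uses = counts.most_common(1)[0]
    return command if uses >= SUGGESTION_THRESHOLD else None


if __name__ == "__main__":
    history = ["Play some music"] * 6 + ["What is the time?"] * 2
    print(suggest_preset(history))   # -> "play some music"
```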
  • apparatus 10 may be configured to store a voice command in the form of a text string which may not be a preset voice command and in response to a voice command device being operated to perform an operation, the apparatus 10 may speech synthesise the voice command and communicate the speech synthesised voice command over the network interface to the automated assistant for processing the speech synthesised voice command to perform the operation.
  • voice command device(s) may take a similar or same form / configuration as the preset devices 18a-d as described in relation to other embodiments but it is configured to cause an operation to occur that is not limited to a preset operation.
  • Although modules or steps have been described as being stored / operated at the apparatus 10 or at the computing equipment 36, it will be appreciated that the modules or steps may be stored / operated differently.
  • any one of the modules may include steps that are processed by the module at the apparatus 10 in combination with computing equipment remote from the apparatus 10 via the Internet to perform those steps.
  • The term “module” simply refers to a set of instructions used to carry out one or more operations. Any of the described operations in relation to a module is simply a set of instructions that, in embodiments, may be utilised in isolation, or in combination with the other operations of that module, or indeed other modules in a manner that will be appreciated by the skilled person.

Abstract

The present invention relates to an apparatus for communicating a preset voice command, including: a processor; a network interface for communication with computing equipment over a communications network; a non-transitory computer readable storage medium storing: a preset voice command; and computer executable instructions which, when executed by the processor, perform the following steps: in response to a preset device being operated to perform a preset operation at the apparatus, communicating the preset voice command over the network interface to an automated assistant on the computing equipment for processing the preset voice command to perform the preset operation.
PCT/GB2019/051477 2018-06-08 2019-05-30 Appareil et procédé WO2019234392A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1809440.9A GB2574471A (en) 2018-06-08 2018-06-08 An apparatus and method
GB1809440.9 2018-06-08

Publications (1)

Publication Number Publication Date
WO2019234392A1 true WO2019234392A1 (fr) 2019-12-12

Family

ID=62975728

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2019/051477 WO2019234392A1 (fr) 2018-06-08 2019-05-30 Appareil et procédé

Country Status (2)

Country Link
GB (1) GB2574471A (fr)
WO (1) WO2019234392A1 (fr)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9576575B2 (en) * 2014-10-27 2017-02-21 Toyota Motor Engineering & Manufacturing North America, Inc. Providing voice recognition shortcuts based on user verbal input

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150134761A1 (en) * 2013-11-14 2015-05-14 Qualcomm Incorporated Mechanisms to route iot notifications according to user activity and/or proximity detection
EP3010015A1 (fr) * 2014-10-14 2016-04-20 Samsung Electronics Co., Ltd. Dispositif électronique et procédé pour interactions parlées
US20180039478A1 (en) * 2016-08-02 2018-02-08 Google Inc. Voice interaction services
US20180091913A1 (en) * 2016-09-27 2018-03-29 Sonos, Inc. Audio Playback Settings for Voice Interaction
US20180096684A1 (en) * 2016-10-05 2018-04-05 Gentex Corporation Vehicle-based remote control system and method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11275553B2 (en) * 2018-12-07 2022-03-15 Google Llc Conditionally assigning various automated assistant function(s) to interaction with a peripheral assistant control device
US11893309B2 (en) 2018-12-07 2024-02-06 Google Llc Conditionally assigning various automated assistant function(s) to interaction with a peripheral assistant control device
WO2022217590A1 (fr) * 2021-04-16 2022-10-20 深圳传音控股股份有限公司 Procédé d'invite vocale, terminal, et support de stockage

Also Published As

Publication number Publication date
GB2574471A (en) 2019-12-11
GB201809440D0 (en) 2018-07-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19730449

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19730449

Country of ref document: EP

Kind code of ref document: A1