US9711135B2

Electronic devices and methods for compensating for environmental noise in text-to-speech applications

Abstract

A method by an electronic device for compensating for environmental noise in text-to-speech (TTS) speech output includes: measuring environmental noise using a microphone signal; determining sound characteristics of the measured environmental noise; dynamically predicting expected future sound characteristics of the environmental noise based on the determined sound characteristics of the measured environmental noise; receiving a text input at a TTS engine at the device, with the TTS engine configured to convert the text input into a speech output signal; determining text characteristics of the text input at the TTS engine; and at the TTS engine, dynamically adapting the speech output signal based on the determined text characteristics of the text input and the predicted expected future sound characteristics of the environmental noise.

Classifications

G10L13/08

Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Landscapes

Engineering & Computer Science

Computational Linguistics

US9711135B2

United States

Inventor: Ola Thorn
Current Assignee: Sony Corp

2013

2013-12-17

Application filed by Sony Corp, Sony Mobile Communications Inc

2014-08-13

Assigned to SONY CORPORATION

2016-04-27

Assigned to Sony Mobile Communications Inc.

2016-09-22

Publication of US20160275936A1

2017-07-18

Application granted

2017-07-18

Publication of US9711135B2

2019-03-25

Assigned to SONY CORPORATION

Status

Active

2034-11-11

Adjusted expiration

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application is a 35 U.S.C. §371 national stage application of PCT International Application No. PCT/JP2013/084395, filed on 17 Dec. 2013, the disclosure and content of which is incorporated by reference herein as if set forth in its entirety.

BACKGROUND

Text-to-speech (TTS) capabilities are being increasingly incorporated into a wide variety of electronic devices. For example, TTS is used to read articles, emails, text messages and the like and to communicate with intelligent personal assistant applications on electronic devices such as computers (e.g., portable computers) and telephones. TTS is also used for navigation with such devices as well as dedicated global positioning system (GPS) devices.

The environment in which TTS is used may be noisy due to any number of factors such as background conversation, music, passing trains and so forth. One solution is to detect the ambient or environmental noise and increase the volume of the TTS output in response to increased environmental noise. However, there is an upper limit to volume control which may not be sufficient for certain environmental noise. Prolonged periods with high volume can also be annoying for the user.

SUMMARY

According to a first aspect, embodiments of the invention are directed to a method by an electronic device for compensating for environmental noise in text-to-speech (TTS) speech output. The method includes: measuring environmental noise using a microphone signal; determining sound characteristics of the measured environmental noise; dynamically predicting expected future sound characteristics of the environmental noise based on the determined sound characteristics of the measured environmental noise; receiving a text input at a TTS engine at the device, with the TTS engine configured to convert the text input into a speech output signal; determining text characteristics of the text input at the TTS engine; and, at the TTS engine, dynamically adapting the speech output signal based on the determined text characteristics of the text input and the predicted expected future sound characteristics of the environmental noise. The speech output signal is dynamically adapted by varying the pace of the speech output and/or varying the pitch of the speech output.

Dynamically predicting expected future sound characteristics of the environmental noise may include characterizing a time-varying pattern of the expected future sound characteristics of the environmental noise based on a time-varying pattern observed in determined sound characteristics of previously occurring environmental noise. In some embodiments, characterizing a time-varying pattern of the expected future sound characteristics of the environmental noise includes characterizing expected differences over time in the volume of the environmental noise based on differences over time in volume of previously occurring environmental noise.

In some embodiments, characterizing a time-varying pattern of the expected future sound characteristics of the environmental noise includes predicting timing of gaps in the environmental noise where the environmental noise volume has less than a threshold amplitude based on differences over time in volume of previously occurring environmental noise. The speech output signal may be dynamically adapted by increasing the pace of the speech output such that the speech output is fully carried out within one of the predicted gaps. The speech output signal may be dynamically adapted by interrupting the speech output at an interrupted location of the speech output signal at the end of one of the predicted gaps and resuming the speech output from the interrupted location of the speech output signal at a subsequent predicted gap.

In some embodiments, characterizing a time-varying pattern of the expected future sound characteristics of the environmental noise includes characterizing expected differences over time in the frequency spectrum of the environmental noise based on differences over time in the frequency spectrum of the previously occurring environmental noise. The speech output signal may be dynamically adapted by varying the pitch of the speech output to compensate for the expected differences over time in the frequency spectrum of the environmental noise.

In some embodiments, dynamically predicting expected future sound characteristics of the environmental noise based on the determined sound characteristics of the measured environmental noise includes generating values of a time-varying control curve representing an expected pattern of the environmental noise. The speech output signal may be dynamically adapted by dynamically adapting the speech output signal using values of the time-varying control curve.

In some embodiments, the method includes operating the device in a learning mode to determine sound characteristics of the measured environmental noise.

The method may include determining a location of the device, and dynamically predicting expected future sound characteristics of the environmental noise may be based on previously known sound characteristics of environmental noise at the location.

In some embodiments, characterizing a time-varying pattern of the expected future sound characteristics of the environmental noise comprises characterizing expected differences over time in the direction of the environmental noise based on differences over time in direction of previously occurring environmental noise. Dynamically adapting the speech output signal may include panning the speech output to compensate for the expected differences over time in the direction of the environmental noise.

According to a second aspect, embodiments of the invention are directed to a method by an electronic device for compensating for environmental noise in text-to-speech (TTS) speech output. The method includes: measuring environmental noise using a microphone signal; determining identifying characteristics of the measured environmental noise; and determining whether the identifying characteristics of the measured environmental noise match those corresponding to an identified environmental noise source that is stored in an environmental noise library at the device. If it is determined that the identifying characteristics of the measured environmental noise match those corresponding to an identified environmental noise source that is stored in the environmental noise library, the method includes: receiving a text input at a TTS engine at the device, with the TTS engine configured to convert the text input into a speech output signal; and dynamically adapting the speech output signal based on a time-varying pattern of sound characteristics of the identified environmental noise source, with the time-varying pattern being stored in the environmental noise library. The speech output signal is dynamically adapted by varying the pace of the speech output and/or varying the pitch of the speech output. In some embodiments, if it is determined that the identifying characteristics of the measured environmental noise do not match those corresponding to an identified environmental noise source that is stored in the environmental noise library, the method includes operating the device in a learning mode including: learning identifying characteristics of the measured environmental noise; learning a time-varying pattern of sound characteristics of the environmental noise; and storing the identifying characteristics and the time-varying pattern of sound characteristics as an identified environmental noise source in the environmental noise library.

The identified environmental noise source may be audio from a media file. The identified environmental noise source may be one or more persons in a conversation. The identified environmental noise source may be a conversation type.

According to a third aspect, embodiments of the invention are directed to an electronic device configured to compensate for environmental noise in TTS speech output. The device includes: a microphone configured to measure environmental noise; an environmental noise analysis circuit; and a TTS processor. The environmental noise analysis circuit is configured to: determine sound characteristics of environmental noise measured by the microphone; and predict future expected sound characteristics of the environmental noise based on the determined sound characteristics of the measured environmental noise. The TTS processor is configured to: receive a text input and convert the text input into a speech output signal; determine text characteristics of the text input; and adapt the speech output signal based on the determined text characteristics of the text input and the predicted expected sound characteristics of the environmental noise. The TTS processor is configured to adapt the speech output signal by varying the pace of the speech output and/or varying the pitch of the speech output.

In some embodiments, the environmental noise analysis circuit is configured to characterize a time-varying pattern of the expected future sound characteristics of the environmental noise based on a time-varying pattern observed in determined sound characteristics of previously occurring environmental noise to predict future expected sound characteristics of the environmental noise. In some embodiments, the environmental noise analysis circuit is configured to predict timing of gaps in the environmental noise where the environmental noise volume has less than a threshold amplitude based on differences over time in volume of previously occurring environmental noise. The TTS processor may be configured to adapt the speech output signal by increasing the pace of the speech output such that the speech output is fully carried out within one of the predicted gaps. The TTS processor may be configured to adapt the speech output signal by interrupting the speech output at an interrupted location of the speech output signal at the end of one of the predicted gaps and resuming the speech output from the interrupted location of the speech output signal at a subsequent predicted gap.

The environmental noise analysis circuit may be configured to characterize expected differences over time in the frequency spectrum of the environmental noise based on differences over time in the frequency spectrum of the previously occurring environmental noise and the TTS processor may be configured to vary the pitch of the speech output to compensate for the expected differences over time in the frequency spectrum of the environmental noise.

According to a fourth aspect, embodiments of the invention are directed to a method by an electronic device for compensating for environmental noise in text-to-speech (TTS) speech output. The method includes: determining one or more identifying location characteristics associated with a location of the device; and determining whether the identifying location characteristics match those corresponding to an identified location that is stored in a location library at the device. If the identifying location characteristics match those corresponding to an identified location that is stored in the location library at the device, the method includes: receiving a text input at a TTS engine at the device, the TTS engine configured to convert the text input into a speech output signal; and dynamically adapting the speech output signal based on a time-varying pattern of sound characteristics of environmental noise at the identified location, the time-varying pattern being stored in the location library. If the identifying location characteristics do not match those corresponding to an identified location that is stored in the location library at the device, the method includes operating the device in a learning mode including: learning one or more identifying location characteristics associated with the location of the device; measuring environmental noise using a microphone signal; learning a time-varying pattern of sound characteristics of the environmental noise; and storing the one or more identifying location characteristics and time-varying pattern of sound characteristics as an identified location in the location library. In some embodiments, the identifying location characteristics are associated with a GPS location or signal. In some embodiments, the identifying location characteristics are associated with a WiFi network or an identification thereof.

It is noted that any one or more aspects or features described with respect to one embodiment may be incorporated in a different embodiment although not specifically described relative thereto. That is, all embodiments and/or features of any embodiment can be combined in any way and/or combination. Applicant reserves the right to change any originally filed claim or file any new claim accordingly, including the right to be able to amend any originally filed claim to depend from and/or incorporate any feature of any other claim although not originally claimed in that manner. These and other objects and/or aspects of the present invention are explained in detail in the specification set forth below.

Further features, advantages and details of the present invention will be appreciated by those of ordinary skill in the art from a reading of the figures and the detailed description of the preferred embodiments that follow, such description being merely illustrative of the present invention.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a block diagram of an electronic device according to some embodiments of the present invention.

FIG. 2 is a chart illustrating an exemplary time-varying pattern of predicted sound characteristics of environmental noise using the device of FIG. 1.

FIG. 3 is a block diagram of a communication terminal and exemplary components thereof according to some embodiments of the present invention.

FIG. 4 is a flowchart of methods and operations that may be carried out by the device of FIG. 1 for compensating for environmental noise in text-to-speech (TTS) speech output.

FIG. 5 is a flowchart of further methods and operations that may be carried out by the device of FIG. 1 for compensating for environmental noise in text-to-speech (TTS) speech output.

FIG. 6 is a flowchart of further methods and operations that may be carried out by the device of FIG. 1 for compensating for environmental noise in text-to-speech (TTS) speech output.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

Various embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings. However, this invention should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will convey the scope of the invention to those skilled in the art.

It will be understood that, as used herein, the term “comprising” or “comprises” is open-ended, and includes one or more stated elements, steps and/or functions without precluding one or more unstated elements, steps and/or functions. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms “and/or” and “/” include any and all combinations of one or more of the associated listed items. In the drawings, the size and relative sizes of regions may be exaggerated for clarity. Like numbers refer to like elements throughout.

Some embodiments may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.). Consequently, as used herein, the term “signal” may take the form of a continuous waveform and/or discrete value(s), such as digital value(s) in a memory or register. Furthermore, various embodiments may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. Accordingly, as used herein, the terms “circuit” and “controller” and “processor” may take the form of digital circuitry, such as computer-readable program code executed by an instruction processing device(s) (e.g., general purpose microprocessor and/or digital signal microprocessor), and/or analog circuitry. The operations that are described below with regard to the figures can therefore be at least partially implemented as computer-readable program code executed by a computer (e.g., microprocessor).

Embodiments are described below with reference to block diagrams and operational flow charts. It is to be understood that the functions/acts noted in the blocks may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.

Generally speaking, embodiments of the invention are directed to devices and methods for compensating for ambient, background or environmental noise in text-to-speech (TTS) applications. As is known to those of skill in the art, TTS is a type of speech synthesis wherein textual input is converted into speech output. Different environmental noise sources can create different challenges with regard to the intelligibility of the TTS speech output.

An electronic device 100 according to some embodiments of the invention is illustrated in FIG. 1. The electronic device 100 includes a text-to-speech (TTS) engine or processor 102 that is configured to receive an input 104 and convert the input 104 to a speech output signal 106. The speech output signal 106 is received at one or more speakers or transducers 108 which are configured to output a speech output 110 (e.g., synthesized speech). The TTS engine 102 includes a text analysis module 112 and a TTS adaptation circuit or processor 114; these components will be described in greater detail below.

In some embodiments, the input 104 is a text input 104 (e.g., a text input signal) such that the speech output signal 106 and the speech output 110 are based on text in the text input 104. Although the input 104 is generally referenced as a text input 104 below, it will be appreciated that the input 104 can take other forms (e.g., the input may be a TTS input 104). For example, the speech output signal 106 and the speech output 110 may be used as a substitute for a user's voice. The speech output signal 106 and the speech output 110 may be translated speech and the input 104 may be pre-translated speech (e.g., speech input by the user in a different language). The input 104 may also be input initiated by, for example, a person using assistive input technology. The input 104 may be signals from electrodes or other sensors in a subvocal recognition system. The input 104 may be signals that are generated from eye movement, mouth/tongue movement and the like. The input 104 may be signals from a brain computer or a brain-computer interface (e.g., neural signals).

A user of the device 100 listens to the speech output 110 in TTS applications. However, environmental noise (e.g., ambient or background noise) may adversely affect the intelligibility of the speech output 110.

The device 100 may include at least one environmental noise sensor 116 such as a microphone 116 that measures or senses the environmental noise. The microphone 116 provides a microphone signal 118 to an environmental noise analysis circuit or processor 120. The environmental noise analysis processor 120 may include an audio analysis module 122 that is configured to perform a real time audio analysis of environmental noise measured by the microphone 116. The audio analysis module 122 determines sound characteristics including the amplitude (volume) and frequency levels associated with the captured environmental noise.
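By way of a non-limiting illustration, the determination of amplitude and frequency levels by the audio analysis module 122 might be sketched as follows. The function names, the naive discrete Fourier transform, the sample rate and the test tone are all assumptions made for this example and are not taken from the disclosure; a practical implementation would likely use an optimized FFT.

```python
import cmath
import math

def rms_amplitude(samples):
    """Root-mean-square level of one frame of microphone samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n

# Hypothetical frame: a 200 Hz tone of amplitude 0.5 sampled at 1600 Hz.
SAMPLE_RATE = 1600
frame = [0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(160)]
level = rms_amplitude(frame)
peak_hz = dominant_frequency(frame, SAMPLE_RATE)
```

Running the two functions on successive frames of the microphone signal 118 would yield one (level, frequency) pair per frame, which is the kind of time series the prediction steps described below could consume.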

In some embodiments, the microphone 116 may be located in another device 101. For example, the microphone 116 could be in a car, a headset or another electronic device. The device 101 may communicate (e.g., wirelessly) with the device 100 and provide the microphone signal 118 to the device 100.

An environmental noise prediction module 124 is configured to predict expected future sound characteristics of the environmental noise based on the sound characteristics determined by the audio analysis module 122. In some embodiments, the environmental noise prediction module 124 characterizes a time-varying pattern of the expected future sound characteristics of the environmental noise based on a time-varying pattern observed in determined sound characteristics of previously occurring environmental noise. In some embodiments, the environmental noise prediction module 124 generates values of a time-varying control curve representing an expected pattern of the environmental noise. The control curve may be constantly or dynamically reconfigured based on the measured environmental noise.
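One possible way to generate values of such a control curve, offered only as a sketch, is to assume that the noise level is roughly periodic, estimate the period from the observed history, and repeat the most recent cycle. The periodicity assumption, the function names `estimate_period` and `predict_control_curve`, and the sample values are hypothetical and are not prescribed by the disclosure.

```python
def estimate_period(levels, min_lag=2):
    """Lag (in frames) at which the history best matches a shifted copy of itself."""
    best_lag, best_err = min_lag, float("inf")
    for lag in range(min_lag, len(levels) // 2 + 1):
        pairs = zip(levels[lag:], levels[:-lag])
        err = sum(abs(a - b) for a, b in pairs) / (len(levels) - lag)
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag

def predict_control_curve(levels, horizon):
    """Extend the observed levels by repeating the last estimated cycle."""
    period = estimate_period(levels)
    last_cycle = levels[-period:]
    return [last_cycle[i % period] for i in range(horizon)]

# Hypothetical per-frame noise levels with a four-frame cycle.
history = [0.1, 0.1, 0.8, 0.9, 0.1, 0.1, 0.8, 0.9, 0.1, 0.1, 0.8, 0.9]
curve = predict_control_curve(history, 6)
```

Calling `predict_control_curve` again each time a new frame is measured would correspond to the curve being dynamically reconfigured based on the measured environmental noise.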

FIG. 2 illustrates an exemplary control curve representing an expected pattern of environmental noise. The primary source of the environmental noise in FIG. 2 may be a passing train, for example. The curve shown in FIG. 2 may represent predicted sound characteristics by the environmental noise prediction module 124 after analyzing a relatively short sample of measured environmental noise (e.g., up to point “A” in time). The environmental noise prediction module 124 may recognize the gradually increasing amplitude or volume of the noise as well as the gradually increasing frequency as the train approaches. In some embodiments, and as described in more detail below, the environmental noise analysis processor 120 may also recognize identifying characteristics of the environmental noise source (e.g., a train) to assist in generating the predicted noise pattern.

As seen in FIG. 2, the environmental noise prediction module 124 may characterize expected differences over time in the amplitude (volume) of the environmental noise based on differences over time in volume of previously occurring environmental noise. The environmental noise prediction module 124 may predict timing of gaps in the environmental noise where the environmental noise has less than a threshold amplitude or volume based on differences over time in volume of previously occurring environmental noise. The environmental noise prediction module 124 may also characterize expected differences over time in the frequency spectrum of the environmental noise based on differences over time in the frequency spectrum of the previously occurring environmental noise.
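As an illustrative sketch, predicting the timing of gaps can be reduced to scanning a predicted amplitude curve for runs of frames that fall below the threshold amplitude. The function name `find_gaps`, the frame duration and the sample curve are assumptions made for this example.

```python
def find_gaps(curve, threshold, frame_seconds):
    """Return (start_s, end_s) spans where the predicted level stays below threshold."""
    gaps, start = [], None
    for i, level in enumerate(curve):
        if level < threshold and start is None:
            start = i  # a quiet span begins at this frame
        elif level >= threshold and start is not None:
            gaps.append((start * frame_seconds, i * frame_seconds))
            start = None
    if start is not None:
        # The curve ends while still quiet; close the final gap at the horizon.
        gaps.append((start * frame_seconds, len(curve) * frame_seconds))
    return gaps

# Hypothetical predicted levels, one value per second, for a passing train.
predicted = [0.2, 0.3, 0.4, 0.7, 0.9, 0.8, 0.5, 0.3, 0.2, 0.1]
gaps = find_gaps(predicted, threshold=0.6, frame_seconds=1.0)
```

With these values the sketch reports a first gap before the train becomes loud and a second gap after it has passed.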

Referring again to FIG. 1, the text analysis module 112 is configured to analyze the text input 104 and determine characteristics thereof. For example, the text analysis module 112 may determine how many words are in the text input, the minimum amount of time needed to output the text input as an intelligible speech output, whether any words need emphasis and/or extra output time (e.g., an obscure word) and so forth.
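The text characteristics mentioned above might be computed along the following lines. This is a minimal sketch: the maximum intelligible speaking rate, the use of word length as a stand-in for an "obscure" word, and the extra time allotted per emphasized word are all hypothetical parameters chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class TextCharacteristics:
    word_count: int
    min_seconds: float
    emphasis_words: list

def analyze_text(text, max_words_per_second=4.0, long_word_len=10,
                 extra_seconds_per_emphasis=0.25):
    """Count words, flag long words for emphasis, and estimate minimum output time."""
    words = text.split()
    stripped = [w.strip(".,;:!?") for w in words]
    # Assumption: word length approximates obscurity; a real system might use
    # a frequency dictionary instead.
    emphasis = [w for w in stripped if len(w) >= long_word_len]
    min_seconds = (len(words) / max_words_per_second
                   + extra_seconds_per_emphasis * len(emphasis))
    return TextCharacteristics(len(words), min_seconds, emphasis)

info = analyze_text("Turn left at the roundabout, then continue past the observatory.")
```

The resulting minimum duration can then be compared against the length of a predicted gap when deciding how to pace the speech output.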

The TTS adaptation processor 114 is configured to dynamically adapt the speech output signal 106 based on the determined characteristics of the text input and the predicted expected future sound characteristics of the environmental noise. The TTS adaptation processor 114 has a pace adjustment module 126 configured to vary the pace of the speech output 110. The TTS adaptation processor 114 has a pitch adjustment module 128 configured to vary the pitch of the speech output 110. The pitch adjustment module 128 may include an electronic filter. The TTS adaptation processor 114 may also have a volume adjustment module 130 configured to vary the amplitude or volume of the speech output 110 (or portions thereof for emphasis). The volume adjustment module 130 may include an amplifier.

The pace adjustment module 126 may increase the pace of the speech output signal 106 such that the speech output 110 is carried out within one or more of the predicted gaps where the environmental noise has less than a threshold amplitude. The pace adjustment module 126 may be configured to adapt the speech output signal 106 by increasing the pace of the speech output 110 such that the speech output 110 is fully carried out within one of the predicted gaps. The pace adjustment module 126 may be configured to adapt the speech output signal 106 by interrupting the speech output 110 at an interrupted location of the speech output signal 106 (or the text corresponding thereto) at the end of one of the predicted gaps and resuming the speech output 110 from the interrupted location of the speech output signal 106 (or the text corresponding thereto) at a subsequent predicted gap.

Using the passing train example of FIG. 2, the pace adjustment module 126 may increase the pace of the speech output signal 106 such that the speech output 110 corresponding to the text input 104 is fully carried out in the first gap, e.g., as the train is approaching but before it creates sufficient background noise to make the speech output 110 unintelligible. The pace adjustment module 126 may also interrupt the speech output 110 as the predicted sound amplitude reaches the threshold amplitude. The pace adjustment module 126 may then resume the speech output 110 at the beginning of the second gap, e.g., after the train has passed and is a sufficient distance from the device to allow intelligible speech output.
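The two pacing behaviors described above, namely speeding up so that the output fits in one gap and otherwise interrupting at the end of a gap and resuming at the next, can be sketched as a simple scheduler. The function name `plan_speech`, the words-per-second limits and the sample sentence are assumptions for this example only.

```python
def plan_speech(words, gaps, normal_wps=2.5, max_wps=4.0):
    """Schedule words into predicted quiet gaps.

    Returns (plan, leftover), where plan is a list of
    (start_seconds, words_per_second, words_chunk) tuples.
    """
    plan, remaining = [], list(words)
    for start, end in gaps:
        if not remaining:
            break
        duration = end - start
        needed_wps = len(remaining) / duration
        if needed_wps <= max_wps:
            # Everything left fits in this gap, speeding up only if necessary.
            plan.append((start, max(normal_wps, needed_wps), remaining))
            remaining = []
        else:
            # Speak as much as fits at the maximum pace, then interrupt and
            # resume from the same location in the next gap.
            fit = int(duration * max_wps)
            if fit > 0:
                plan.append((start, max_wps, remaining[:fit]))
                remaining = remaining[fit:]
    return plan, remaining

sentence = ("In two hundred meters turn left then keep right "
            "at the fork ahead please").split()
plan, leftover = plan_speech(sentence, [(0.0, 3.0), (6.0, 10.0)])
```

In this hypothetical run the fourteen-word sentence does not fit in the three-second first gap even at the maximum pace, so twelve words are spoken before the interruption and the last two are spoken at the normal pace once the second gap begins.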

The pitch adjustment module 128 may vary the pitch of the speech output 110 to compensate for the predicted differences over time in the frequency spectrum of the environmental noise. As illustrated in FIG. 1, the environmental noise analysis processor 120 can provide the determined sound characteristics of the measured sound to the TTS adaptation processor 114. In some embodiments, the environmental noise analysis processor 120 provides the determined frequencies of the measured environmental noise to the TTS adaptation processor 114 or the pitch adjustment module 128 such that the pitch of the speech output 110 can be varied in real time to compensate for changes in frequencies of the environmental noise and to “cut through” the environmental noise.
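One simple way to realize such pitch variation, given here purely as an assumed sketch, is to keep the pitch of the speech output at least a fixed musical distance away from the dominant frequency of the noise. The separation of three semitones, the cap on the shift and the function name `choose_pitch_hz` are hypothetical and are not specified by the disclosure, which also contemplates an electronic filter.

```python
import math

def choose_pitch_hz(base_pitch_hz, noise_peak_hz,
                    min_separation_semitones=3.0, max_shift_semitones=6.0):
    """Pick a speech pitch at least min_separation semitones from the noise peak."""
    # Signed distance of the speech pitch from the noise peak, in semitones.
    distance = 12.0 * math.log2(base_pitch_hz / noise_peak_hz)
    if abs(distance) >= min_separation_semitones:
        return base_pitch_hz  # already clear of the noise peak
    direction = 1.0 if distance >= 0 else -1.0
    shift = min(min_separation_semitones - abs(distance), max_shift_semitones)
    return base_pitch_hz * 2.0 ** (direction * shift / 12.0)

# Hypothetical noise peaks over time and the pitch chosen for each.
pitches = [choose_pitch_hz(200.0, peak) for peak in (400.0, 200.0, 210.0)]
```

Evaluating the function against each new predicted or measured noise frequency lets the pitch track changes in the noise spectrum over time.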

In some embodiments, the at least one environmental noise sensor 116 may detect the direction or location of the environmental noise or sound. For example, the at least one environmental noise sensor 116 may include a plurality of microphones and/or a gyroscope and/or other suitable device(s) to detect the location of the noise (e.g., the direction from which the noise is emanating). The at least one speaker 108 may comprise a multi-directional speaker or a plurality of speakers that are configured for panning the speech output 110. Therefore, in some embodiments, the TTS adaptation processor 114 may be configured to pan the speech output 110 and/or control 3D audio aspects of the speech output 110 in response to directional sound characteristics of the measured environmental noise. In some embodiments, the TTS adaptation processor 114 may be configured to characterize (predict) a time-varying pattern of the expected future sound characteristics of the environmental noise including characterizing expected differences over time in the direction or location of the environmental noise based on differences over time in direction or location of previously occurring environmental noise. In some embodiments, the TTS adaptation processor 114 may dynamically adapt the speech output signal 106 by controlling 3D audio output and/or panning the audio output that makes up the speech output 110 (e.g., dynamically adapt the direction of the speech output 110). In some embodiments, the TTS adaptation processor 114 may adapt the direction of the speech output 110 in addition to adapting the pace, pitch and/or volume of the speech output 110.
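For a two-speaker arrangement, panning the speech output away from the direction of the noise could be sketched with a constant-power pan law as shown below. The azimuth convention (negative to the left, positive to the right, in degrees), the choice to mirror the noise direction, and the function name are assumptions made for this illustration.

```python
import math

def pan_gains_away_from_noise(noise_azimuth_deg):
    """Return (left_gain, right_gain) that place speech opposite the noise."""
    # Mirror the noise direction and clamp to the range of the speaker pair.
    pan = max(-90.0, min(90.0, -noise_azimuth_deg))
    # Constant-power law: the squared gains always sum to one.
    theta = (pan + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)

# Hypothetical case: noise arriving from the far right.
left, right = pan_gains_away_from_noise(90.0)
```

Feeding a predicted sequence of noise directions through the function would give a time-varying pair of gains with which to scale the left and right channels of the speech output.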

It will be appreciated that devices and methods according to the invention may further change the intonation, dialect, voice, voice gender, choice of words, lengths of sentences, and so forth to further improve the intelligibility of the speech output 110.

In some embodiments, the environmental noise analysis processor 120 may recognize identifying sound characteristics of the environmental noise. The environmental noise analysis processor 120 may include an environmental noise recognition module 132 that interacts with an environmental noise library 134. The environmental noise recognition module 132 may determine identifying characteristics of the measured environmental noise and query the library 134 to determine whether the identifying characteristics match those corresponding to a pre-identified environmental noise source or type that is stored in the library 134. The environmental noise library 134 may include a time-varying pattern of sound characteristics for each environmental noise source stored therein.
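The library query might, for instance, be a nearest-match search over stored identifying characteristics. In the sketch below the identifying characteristics are reduced to a hypothetical two-value fingerprint (average level and dominant frequency), and the library contents, distance measure and match threshold are all invented for illustration.

```python
import math

# Hypothetical library: name -> (fingerprint, stored time-varying level pattern).
NOISE_LIBRARY = {
    "passing_train": ((0.8, 120.0), [0.2, 0.5, 0.9, 0.5, 0.2]),
    "airplane_cabin": ((0.6, 300.0), [0.6, 0.6, 0.6, 0.6, 0.6]),
}

def match_noise_source(fingerprint, library, max_distance=0.2):
    """Return the name of the closest stored source, or None if nothing is close."""
    best_name, best_dist = None, float("inf")
    for name, (ref, _pattern) in library.items():
        # Relative differences keep level and frequency on a comparable scale.
        dist = math.sqrt(sum(((a - b) / b) ** 2 for a, b in zip(fingerprint, ref)))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None

source = match_noise_source((0.78, 125.0), NOISE_LIBRARY)
unknown = match_noise_source((0.1, 2000.0), NOISE_LIBRARY)
```

When a match is found, the stored time-varying pattern for that source can be used in place of, or in addition to, a pattern predicted from the live microphone signal.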

Exemplary environmental noise sources include an airplane cabin, a train station (or a passing train), a song, audio from a movie or television show, participant(s) in a conversation and types of conversations.

For example, a song may have gaps in the audio amplitude (e.g., where the amplitude is less than a threshold) and the TTS speech output can be timed accordingly. Also, the environmental noise in an airplane cabin may have a relatively constant amplitude (e.g., from the engine) but with an oscillating frequency and the pitch of the TTS speech output can be varied accordingly. A participant in a conversation may speak with a known voice cadence and the pace and/or pitch of the TTS speech output can be varied based on the known voice cadence. In addition, the type of conversation may be determined based on identifying sound characteristics (e.g., the name of a politician or a sports team). It may be known that conversations involving politics or sports may be heated and/or fast-paced, and the TTS speech output may be paused until the conversation has ceased, for example.

The device 100, and more particularly the environmental noise analysis circuit 120, may be operated in a learning mode. In some embodiments, the device 100 operates in a learning mode for a certain amount of time to “learn” characteristics of the environmental noise such that a time-varying pattern of sound characteristics can be established for a particular environment. For example, the device 100 may operate in a learning mode for a number of seconds or minutes to learn time-varying patterns in the environmental noise, such as regular or irregular intervals of relatively high amplitude sound and/or timing of gaps as well as differences over time in the frequency spectrum. The device 100 may then be operated in a TTS mode and the time-varying pattern of sound characteristics can be used to adapt the speech output by adapting the pace, pitch and/or volume of the speech output to provide intelligible speech output in the learned environment. The device 100 may enter the learning mode automatically (e.g., upon receiving a TTS text input but prior to converting the input to a speech output). Alternatively, a user may manually place the device 100 in the learning mode, e.g., using a user input interface.
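One way a learning mode might establish a time-varying pattern is to estimate how often the noise repeats. The sketch below uses a plain autocorrelation over an amplitude envelope; autocorrelation is an assumption made here for illustration, since the disclosure does not name a particular analysis technique, and `estimate_period` is a hypothetical helper.

```python
from typing import List


def estimate_period(envelope: List[float], min_lag: int = 1) -> int:
    """Return the lag, in frames, at which the envelope best matches itself."""
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [x - mean for x in envelope]
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, n // 2 + 1):
        # Average product of the signal with a copy of itself shifted by `lag`.
        score = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        score /= (n - lag)
        if score > best_score:  # strict ">" keeps the shortest of equally good lags
            best_lag, best_score = lag, score
    return best_lag


# Made-up learning-mode capture: three loud frames, two quiet frames, repeated.
learned = [5.0, 5.0, 5.0, 0.0, 0.0] * 4
period_frames = estimate_period(learned)
print(period_frames)  # 5
```

With a known frame duration, the returned lag converts directly to seconds, which could then be used to anticipate when the next quiet interval is expected.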

In other embodiments, the device 100 may automatically enter the learning mode when identifying characteristics of the environmental noise are not recognized or are not present in the environmental noise library 134. As described above, the environmental noise recognition module 132 may make a determination as to whether identifying characteristics of the measured environmental noise match those of an identified environmental noise source that is stored in the environmental noise library 134. If the identifying characteristics of the measured environmental noise do not match those of an identified environmental noise source that is stored in the environmental noise library 134, the device 100 may switch to operate in a learning mode to learn identifying characteristics of the environmental noise that is being measured and learn a time-varying pattern of sound characteristics of the environmental noise. The identifying characteristics and the time-varying pattern of sound characteristics may then be stored in the environmental noise library 134 for later use.

In some embodiments, the environmental noise analysis processor 120 includes a location recognition module 136 and a location library 138. The location recognition module 136 may determine one or more identifying location characteristics associated with the location of the device 100 (e.g., GPS coordinates) and query the location library 138 to determine whether the identifying location characteristics match those corresponding to an identified location that is stored in the location library 138. The location library 138 may include a time-varying pattern of sound characteristics associated with the environmental noise at the identified location. For example, the location may be in or near a factory or office building and the time-varying pattern may include sound characteristics associated with environmental noise from sources such as machines, ventilation systems and the like at that location.

In some embodiments, the one or more identifying location characteristics associated with the location of the device 100 are associated with GPS coordinates or location (or an alternative or competing system such as GLONASS, Galileo, WiFi Positioning and the like). In some embodiments, the one or more identifying location characteristics associated with the location of the device 100 are associated with a WiFi network, such as a WiFi access point (e.g., router), a WiFi hot spot, a WLAN network and the like. The one or more identifying location characteristics associated with the location of the device 100 may be associated with a Bluetooth beacon. The one or more identifying location characteristics associated with the location of the device 100 may be associated with an NFC signal (e.g., a known NFC signal that is associated with a location). It will be appreciated that other network or location technologies can be used to provide the identifying location characteristics.
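As a rough illustration of matching identifying location characteristics against a location library, the sketch below accepts either a WiFi identifier or GPS coordinates. The `KnownLocation` record, the 50 m match radius, the coordinates, and the identifier string are all assumptions chosen for the example; the disclosure does not prescribe a matching rule.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class KnownLocation:
    name: str
    lat: float
    lon: float
    wifi_ids: Set[str] = field(default_factory=set)  # e.g. access point identifiers


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def match_location(library: List[KnownLocation], lat: Optional[float] = None,
                   lon: Optional[float] = None, wifi_id: Optional[str] = None,
                   radius_m: float = 50.0) -> Optional[KnownLocation]:
    """Return the first stored location matched by WiFi identifier or GPS proximity."""
    for loc in library:
        if wifi_id is not None and wifi_id in loc.wifi_ids:
            return loc
        if lat is not None and lon is not None:
            if haversine_m(lat, lon, loc.lat, loc.lon) <= radius_m:
                return loc
    return None  # no match: the device could switch to a learning mode


# Illustrative library entry (coordinates and identifier are made up).
library = [KnownLocation("factory", 55.7047, 13.1910, {"aa:bb:cc:dd:ee:ff"})]
print(match_location(library, wifi_id="aa:bb:cc:dd:ee:ff").name)  # factory
print(match_location(library, lat=55.8, lon=13.1910))             # None
```

A Bluetooth beacon or NFC identifier could be handled the same way as the WiFi identifier, by adding another set of known identifiers to the record.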

If the identifying characteristics of the location of the device 100 match those corresponding to an identified location that is stored in the location library 138 at the device 100, a text input may be received at the TTS engine 102 at the device 100. The speech output signal may be adapted based on the time-varying pattern of sound characteristics of environmental noise at the identified location that is stored in the location library 138. The device 100 may be operated in a learning mode if the one or more identifying location characteristics associated with the location of the device 100 do not match those corresponding to an identified location that is stored in the location library 138 at the device 100. The device 100 may switch (e.g., automatically switch) to operate in a learning mode to learn one or more identifying location characteristics associated with the location of the device 100, and may learn a time-varying pattern of sound characteristics at the location. The identifying location characteristics and the time-varying pattern of sound characteristics may then be stored in the location library 138 as an identified location for later use.

In some embodiments, the electronic device 100 takes the form of a wireless communications terminal. FIG. 3 is a diagram of a terminal 200 that includes a terminal housing 200 h. The illustrated terminal 200 includes a display 232 and a user input interface 234 (e.g., a keypad or touchscreen). The illustrated terminal further includes a general application controller 202, a wireless communication protocol controller 204, a cellular transceiver 206, a WLAN transceiver 208 (e.g., compliant with one or more of the IEEE 802.11a-g standards), and/or a Bluetooth transceiver 210.

The cellular transceiver 206 can be configured to communicate bi-directionally according to one or more cellular standards, such as Long Term Evolution (LTE), enhanced data rates for General Packet Radio Service (GPRS) evolution (EDGE), code division multiple access (CDMA), wideband-CDMA, CDMA2000, and/or Universal Mobile Telecommunications System (UMTS) frequency bands. The terminal 200 may thereby be configured to communicate across a wireless air interface with a cellular transceiver base station and with another terminal via the WLAN transceiver 208 and/or the Bluetooth transceiver 210.

As illustrated in FIG. 3, the terminal 200 may include the components described above in connection with the device 100, including the microphone 116, the environmental noise analysis circuit 120, the TTS engine 102 and/or the loudspeaker 108. One or more of the illustrated components may be omitted from the terminal 200 and/or rearranged within the terminal 200, and it is contemplated that additional components (e.g., one or more additional controllers or processors) may be included.

Exemplary operations according to embodiments of the invention are illustrated in FIGS. 4-6. Referring to FIG. 4, a method 400 by an electronic device for compensating for environmental noise in TTS speech output includes measuring environmental noise using a microphone signal (e.g., at the device) (Block 402). Sound characteristics of the measured environmental noise are determined (Block 404). Expected future sound characteristics of the environmental noise are predicted based on the determined sound characteristics of the measured environmental noise (Block 406). A text input is received at a TTS engine at the device, and the TTS engine determines text characteristics of the text input (Block 408). The TTS engine is configured to convert the text input into a speech output signal. The speech output signal is adapted at the TTS engine based on the determined text characteristics of the text input and the predicted expected future sound characteristics of the environmental noise (Block 410). The speech output signal may be adapted by varying the pace of the speech output and/or varying the pitch of the speech output.
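The pace adaptation of Block 410 can be sketched as a small planner that combines a text characteristic (word count) with predicted quiet gaps. Everything named here is hypothetical: `plan_speech`, the nominal rate of 2 words per second, and the 1.5x ceiling standing in for the fastest pace that remains intelligible. Pitch adaptation is not covered by this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start_s: float    # when to start speaking
    words: List[str]  # words spoken in this segment
    rate: float       # pace multiplier relative to the nominal rate


def plan_speech(text: str, gaps: List[Tuple[float, float]],
                nominal_wps: float = 2.0, max_rate: float = 1.5) -> List[Segment]:
    """Fit the text into predicted (start_s, end_s) gaps, speeding up or splitting."""
    words = text.split()
    plan: List[Segment] = []
    i = 0  # index of the next word still to be spoken
    for start, end in gaps:
        if i >= len(words):
            break
        gap_len = end - start
        nominal_time = (len(words) - i) / nominal_wps
        if nominal_time <= gap_len:
            # The rest fits at the normal pace.
            plan.append(Segment(start, words[i:], 1.0))
            i = len(words)
        elif nominal_time / max_rate <= gap_len:
            # Speed up just enough to finish inside this gap.
            plan.append(Segment(start, words[i:], nominal_time / gap_len))
            i = len(words)
        else:
            # Speak what fits at the maximum pace, then resume at the next gap.
            n = int(gap_len * nominal_wps * max_rate)
            if n > 0:
                plan.append(Segment(start, words[i:i + n], max_rate))
                i += n
    return plan


text = "your train departs from platform four in ten minutes"
for seg in plan_speech(text, gaps=[(0.0, 2.0), (5.0, 9.0)]):
    print(seg.start_s, seg.rate, " ".join(seg.words))
# 0.0 1.5 your train departs from platform four
# 5.0 1.0 in ten minutes
```

Splitting on word boundaries is a simplification made here; a real engine would more likely interrupt at phrase or sentence boundaries.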

Referring to FIG. 5, a method 500 by an electronic device for compensating for environmental noise in TTS speech output includes measuring environmental noise using a microphone signal (e.g., at the device) (Block 502). Identifying characteristics of the measured environmental noise are determined (Block 504). A determination is made as to whether the identifying characteristics of the measured environmental noise match those corresponding to an identified environmental noise source that is stored in an environmental noise library at the device (Block 506).

When the identifying characteristics of the measured environmental noise match those corresponding to an identified environmental noise source that is stored in the environmental noise library, a text input is received at a TTS engine at the device (Block 508). The TTS engine is configured to convert the text input into a speech output signal. The speech output signal is adapted based on a time-varying pattern of sound characteristics of the identified environmental noise (Block 510). The time-varying pattern is stored in the environmental noise library. The speech output signal may be adapted by varying the pace of the speech output and/or varying the pitch of the speech output.

When the identifying characteristics of the measured environmental noise do not match those corresponding to an identified environmental noise source that is stored in the environmental noise library, the device is operated in a learning mode including learning identifying characteristics of the measured environmental noise (Block 512) and learning a time-varying pattern of sound characteristics of the environmental noise (Block 514). The identifying characteristics and the time-varying pattern of sound characteristics are stored in the environmental noise library (Block 516).
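The match-or-learn branching of method 500 might be sketched as follows. The representation is assumed for illustration only: identifying characteristics are modeled as a short numeric vector compared by Euclidean distance, and the time-varying pattern as a list of amplitudes. The class `NoiseLibrary`, the distance threshold, and the sample values are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Fingerprint = List[float]  # assumed form of the identifying characteristics
Pattern = List[float]      # assumed form of the time-varying pattern


class NoiseLibrary:
    def __init__(self, max_distance: float = 0.25):
        self.max_distance = max_distance
        self.entries: Dict[str, Tuple[Fingerprint, Pattern]] = {}

    def match(self, fp: Fingerprint) -> Optional[str]:
        """Block 506: name of the closest stored source within max_distance."""
        best_name, best_d = None, self.max_distance
        for name, (stored_fp, _) in self.entries.items():
            d = math.dist(fp, stored_fp)
            if d <= best_d:
                best_name, best_d = name, d
        return best_name

    def store(self, name: str, fp: Fingerprint, pattern: Pattern) -> None:
        """Block 516: keep the learned characteristics for later use."""
        self.entries[name] = (list(fp), list(pattern))


def handle_noise(library: NoiseLibrary, fp: Fingerprint,
                 observed: Pattern, new_name: str) -> Tuple[str, Pattern]:
    name = library.match(fp)
    if name is not None:
        # Blocks 508-510: adapt the TTS output using the stored pattern.
        return "tts", library.entries[name][1]
    # Blocks 512-516: learn and store the new source.
    library.store(new_name, fp, observed)
    return "learning", observed


lib = NoiseLibrary()
first = handle_noise(lib, [0.8, 0.1, 0.1], [1.0, 0.2, 1.0], "train station")
second = handle_noise(lib, [0.75, 0.15, 0.1], [0.5], "unused name")
print(first[0], second[0])  # learning tts
```

On the second call the fingerprint is close enough to the stored one, so the pattern learned earlier is returned instead of the newly observed one.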

Referring to FIG. 6, a method 600 by an electronic device for compensating for environmental noise in TTS speech output includes determining one or more identifying location characteristics associated with a location of the device (Block 602). A determination is made as to whether the identifying location characteristics match those corresponding to an identified location that is stored in a location library at the device (Block 604).

If the identifying location characteristics match those corresponding to an identified location that is stored in the location library at the device, a text input is received at a TTS engine at the device (Block 606). The TTS engine is configured to convert the text input into a speech output signal. The speech output signal is adapted based on a time-varying pattern of sound characteristics of environmental noise at the identified location (Block 608). The time-varying pattern is stored in the location library.

If it is determined that the identifying location characteristics do not match those corresponding to an identified location that is stored in the location library, the device is operated in a learning mode including learning one or more identifying location characteristics corresponding to the location of the device (Block 610), measuring the environmental noise (Block 612), and learning a time-varying pattern of sound characteristics of the environmental noise at the location of the device (Block 614). The one or more identifying location characteristics and the time-varying pattern of sound characteristics are stored as an identified location in the location library (Block 616).

The operations described above with reference to FIGS. 4-6 may be carried out by the device 100 described above. It will be appreciated that additional operations are contemplated, including those described above in connection with the described devices and components.

Many alterations and modifications may be made by those having ordinary skill in the art, given the benefit of the present disclosure, without departing from the spirit and scope of the invention. Therefore, it must be understood that the illustrated embodiments have been set forth only for the purposes of example, and that they should not be taken as limiting the invention as defined by the following claims. The following claims, therefore, are to be read to include not only the combination of elements which are literally set forth but all equivalent elements for performing substantially the same function in substantially the same way to obtain substantially the same result. The claims are thus to be understood to include what is specifically illustrated and described above, what is conceptually equivalent, and also what incorporates the essential idea of the invention.

Claims (17)

That which is claimed is:

1. A method by an electronic device for compensating for environmental noise in text-to-speech (TTS) speech output, the method comprising:

measuring environmental noise using a microphone signal;

determining sound characteristics of the measured environmental noise;

dynamically predicting expected future sound characteristics of the environmental noise based on the determined sound characteristics of the measured environmental noise, wherein dynamically predicting expected future sound characteristics of the environmental noise comprises characterizing a time-varying pattern of the expected future sound characteristics of the environmental noise based on a time-varying pattern observed in determined sound characteristics of previously occurring environmental noise;

receiving a text input at a TTS engine at the device, the TTS engine configured to convert the text input into a speech output signal;

determining text characteristics of the text input at the TTS engine; and

at the TTS engine, dynamically adapting the speech output signal based on the determined text characteristics of the text input and the predicted expected future sound characteristics of the environmental noise, wherein dynamically adapting the speech output signal comprises varying the pace of the speech output and/or varying the pitch of the speech output;

wherein:

dynamically predicting expected future sound characteristics of the environmental noise based on the determined sound characteristics of the measured environmental noise comprises generating values of a time-varying control curve representing an expected pattern of the environmental noise; and

dynamically adapting the speech output signal comprises dynamically adapting the speech output signal using values of the time-varying control curve.

2. The method of claim 1 wherein characterizing a time-varying pattern of the expected future sound characteristics of the environmental noise comprises characterizing expected differences over time in the volume of the environmental noise based on differences over time in volume of previously occurring environmental noise.

3. The method of claim 1 wherein characterizing a time-varying pattern of the expected future sound characteristics of the environmental noise comprises predicting timing of gaps in the environmental noise where the environmental noise volume has less than a threshold amplitude based on differences over time in volume of previously occurring environmental noise.

4. The method of claim 3 wherein dynamically adapting the speech output signal comprises increasing the pace of the speech output such that the speech output is fully carried out within one of the predicted gaps.

5. The method of claim 3 wherein dynamically adapting the speech output signal comprises interrupting the speech output at an interrupted location of the speech output signal at the end of one of the predicted gaps and resuming the speech output from the interrupted location of the speech output signal at a subsequent predicted gap.

6. The method of claim 1 wherein characterizing a time-varying pattern of the expected future sound characteristics of the environmental noise comprises characterizing expected differences over time in the frequency spectrum of the environmental noise based on differences over time in the frequency spectrum of the previously occurring environmental noise.

7. The method of claim 1 wherein dynamically adapting the speech output signal comprises varying the pitch of the speech output to compensate for the expected differences over time in the frequency spectrum of the environmental noise.

8. The method of claim 1 wherein the text characteristics of the text input comprise a length of the text and a minimum amount of time needed for intelligible speech output.

9. The method of claim 1 further comprising operating the device in a learning mode to determine sound characteristics of the measured environmental noise.

10. The method of claim 1 further comprising determining a location of the device, wherein dynamically predicting expected future sound characteristics of the environmental noise is based on previously known sound characteristics of environmental noise at the location.

11. The method of claim 1 wherein:

characterizing a time-varying pattern of the expected future sound characteristics of the environmental noise comprises characterizing expected differences over time in the direction of the environmental noise based on differences over time in direction of previously occurring environmental noise; and

dynamically adapting the speech output signal comprises panning the speech output to compensate for the expected differences over time in the direction of the environmental noise.

12. An electronic device configured to compensate for environmental noise in TTS speech output, the device comprising:

a microphone configured to measure environmental noise;

an environmental noise analysis circuit configured to:

determine sound characteristics of environmental noise measured by the microphone;

predict future expected sound characteristics of the environmental noise based on the determined sound characteristics of the measured environmental noise;

characterize a time-varying pattern of the expected future sound characteristics of the environmental noise based on a time-varying pattern observed in determined sound characteristics of previously occurring environmental noise to predict future expected sound characteristics of the environmental noise; and

predict timing of gaps in the environmental noise where the environmental noise volume has less than a threshold amplitude based on differences over time in volume of previously occurring environmental noise; and

a TTS processor configured to:

receive a text input and convert the text input into a speech output signal;

determine text characteristics of the text input; and

adapt the speech output signal based on the determined text characteristics of the text input and the predicted expected sound characteristics of the environmental noise, wherein the TTS processor is configured to adapt the speech output signal by varying the pace of the speech output and/or varying the pitch of the speech output within one of the predicted gaps.

13. The device of claim 12 wherein:

the environmental noise analysis circuit is configured to characterize expected differences over time in the frequency spectrum of the environmental noise based on differences over time in the frequency spectrum of the previously occurring environmental noise; and

the TTS processor is configured to vary the pitch of the speech output to compensate for the expected differences over time in the frequency spectrum of the environmental noise.

14. The device of claim 12 wherein the environmental noise analysis circuit is configured to operate in a learning mode to determine sound characteristics of the measured environmental noise.

15. A method by an electronic device for compensating for environmental noise in text-to-speech (TTS) speech output, the method comprising:

determining one or more identifying location characteristics associated with a location of the device;

determining whether the identifying location characteristics match those corresponding to an identified location that is stored in a location library at the device;

if the identifying location characteristics match those corresponding to an identified location that is stored in the location library at the device:

receiving a text input at a TTS engine at the device, the TTS engine configured to convert the text input into a speech output signal; and

dynamically adapting the speech output signal based on a time-varying pattern of sound characteristics of environmental noise at the identified location, the time-varying pattern being stored in the location library; and

if the identifying location characteristics do not match those corresponding to an identified location that is stored in the location library at the device, operating the device in a learning mode including:

learning one or more identifying location characteristics associated with the location of the device;

measuring environmental noise using a microphone signal;

learning a time-varying pattern of sound characteristics of the environmental noise; and

storing the one or more identifying location characteristics of the location and the time-varying pattern of sound characteristics as an identified location in the location library;

wherein (i) the identifying location characteristics are associated with a GPS location or GPS signal and/or (ii) the identifying location characteristics are associated with a WiFi network or an identification thereof.

16. The device of claim 12 wherein the TTS processor is configured to dynamically adapt the speech output signal by increasing the pace of the speech output such that the speech output is fully carried out within one of the predicted gaps.

17. The device of claim 12 wherein the TTS processor is configured to dynamically adapt the speech output signal by interrupting the speech output at an interrupted location of the speech output signal at the end of one of the predicted gaps and resuming the speech output from the interrupted location of the speech output signal at a subsequent predicted gap.

Patent Citations (4)

Publication number | Priority date | Publication date | Assignee | Title
US20020128838A1 | 2001-03-08 | 2002-09-12 | Peter Veprek | Run time synthesizer adaptation to improve intelligibility of synthesized speech
US20030061049A1 * | 2001-08-30 | 2003-03-27 | Clarity, Llc | Synthesized speech intelligibility enhancement through environment awareness
US20120296654A1 * | 2011-05-20 | 2012-11-22 | James Hendrickson | Systems and methods for dynamically improving user intelligibility of synthesized speech in a work environment
US8571871B1 * | 2012-10-02 | 2013-10-29 | Google Inc. | Methods and systems for adaptation of synthetic speech in an environment

* Cited by examiner, † Cited by third party

Non-Patent Citations (2)

Title

International Preliminary Report on Patentability for corresponding PCT application No. PCT/JP2013/084395 mailed Jun. 30, 2016, 7 pages.

International Search Report and Written Opinion for PCT/JP2013/084395 mailed May 28, 2014.

* Cited by examiner, † Cited by third party

Cited By (115)

Publication number Priority date Publication date Assignee Title

Family To Family Citations

US8677377B2

2005-09-08 2014-03-18 Apple Inc. Method and apparatus for building an intelligent automated assistant

US9318108B2

2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant

US8977255B2

2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation

US8676904B2

2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities

US10255566B2

2011-06-03 2019-04-09 Apple Inc. Generating and processing task items that represent tasks to perform

US10276170B2

2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant

US8682667B2

2010-02-25 2014-03-25 Apple Inc. User profiling for selecting user specific voice input processing information

US9262612B2

2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication

US10417037B2

2012-05-15 2019-09-17 Apple Inc. Systems and methods for integrating third party services with a digital assistant

DE212014000045U1

2013-02-07 2015-09-24 Apple Inc. Voice trigger for a digital assistant

US10652394B2

2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail

US10748529B1

2013-03-15 2020-08-18 Apple Inc. Voice activated device for use with a voice-based digital assistant

JP6259911B2

2013-06-09 2018-01-10 アップルインコーポレイテッド Apparatus, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant

US10176167B2

2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs

AU2014306221B2

2013-08-06 2017-04-06 Apple Inc. Auto-activating smart responses based on activities from remote devices

US10296160B2

2013-12-06 2019-05-21 Apple Inc. Method for extracting salient dialog usage from live data

US9715875B2

2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases

US9633004B2

2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts

WO2015184186A1

2014-05-30 2015-12-03 Apple Inc. Multi-command single utterance input method

US9430463B2

2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing

US10170123B2

2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation

US9338493B2

2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions

US9668121B2

2014-09-30 2017-05-30 Apple Inc. Social reminders

US10074360B2

2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition

US10127911B2

2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques

US10152299B2

2015-03-06 2018-12-11 Apple Inc. Reducing response latency of intelligent automated assistants

US9886953B2

2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation

US9721566B2

2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers

US10460227B2

2015-05-15 2019-10-29 Apple Inc. Virtual assistant in a communication session

US10200824B2

2015-05-27 2019-02-05 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on a touch-sensitive device

US10083688B2

2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance

US9578173B2

2015-06-05 2017-02-21 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session

US9640169B2

* 2015-06-25 2017-05-02 Bose Corporation Arraying speakers for a uniform driver field

US20160378747A1

2015-06-29 2016-12-29 Apple Inc. Virtual assistant for media playback

US10671428B2

2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant

US10331312B2

2015-09-08 2019-06-25 Apple Inc. Intelligent automated assistant in a media environment

US10740384B2

2015-09-08 2020-08-11 Apple Inc. Intelligent automated assistant for media search and playback

US10747498B2

2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant

US9847764B2

2015-09-11 2017-12-19 Blackberry Limited Generating adaptive notification

RU2632424C2

2015-09-29 2017-10-04 Общество С Ограниченной Ответственностью "Яндекс" Method and server for speech synthesis in text

US11587559B2

2015-09-30 2023-02-21 Apple Inc. Intelligent device identification

US10691473B2

2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment

US10956666B2

2015-11-09 2021-03-23 Apple Inc. Unconventional virtual assistant interactions

US10223066B2

2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices

US11227589B2

2016-06-06 2022-01-18 Apple Inc. Intelligent list reading

US12223282B2

2016-06-09 2025-02-11 Apple Inc. Intelligent automated assistant in a home environment

US10586535B2

2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment

DK179415B1

2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control

US12197817B2

2016-06-11 2025-01-14 Apple Inc. Intelligent device arbitration and control

DK201670540A1

2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant

US10474753B2

2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks

US10074359B2

* 2016-11-01 2018-09-11 Google Llc Dynamic text-to-speech provisioning

US11204787B2

2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant

JP6646001B2

* 2017-03-22 2020-02-14 株式会社東芝 Audio processing device, audio processing method and program

JP2018159759A

* 2017-03-22 2018-10-11 株式会社東芝 Voice processor, voice processing method and program

DK201770383A1

2017-05-09 2018-12-14 Apple Inc. User interface for correcting recognition errors

US10417266B2

2017-05-09 2019-09-17 Apple Inc. Context-aware ranking of intelligent response suggestions

DK180048B1

2017-05-11 2020-02-04 Apple Inc. MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION

US10395654B2

2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network

US10726832B2

2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information

DK179745B1

2017-05-12 2019-05-01 Apple Inc. SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT

DK201770429A1

2017-05-12 2018-12-14 Apple Inc. Low-latency intelligent automated assistant

DK179496B1

2017-05-12 2019-01-15 Apple Inc. USER-SPECIFIC Acoustic Models

US11301477B2

2017-05-12 2022-04-12 Apple Inc. Feedback analysis of a digital assistant

DK201770411A1

2017-05-15 2018-12-20 Apple Inc. Multi-modal interfaces

US10395659B2

* 2017-05-16 2019-08-27 Apple Inc. Providing an auditory-based interface of a digital assistant

US20180336275A1

2017-05-16 2018-11-22 Apple Inc. Intelligent automated assistant for media exploration

US10403278B2

2017-05-16 2019-09-03 Apple Inc. Methods and systems for phonetic matching in digital assistant services

US20180336892A1

2017-05-16 2018-11-22 Apple Inc. Detecting a trigger of a digital assistant

DK179549B1

2017-05-16 2019-02-12 Apple Inc. Far-field extension for digital assistant services

US10311144B2

2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation

US20180350344A1

2017-05-30 2018-12-06 Motorola Solutions, Inc System, device, and method for an electronic digital assistant having a context driven natural language vocabulary

CN107909993A

* 2017-11-27 2018-04-13 安徽经邦软件技术有限公司 A kind of intelligent sound report preparing system

US11164591B2

* 2017-12-18 2021-11-02 Huawei Technologies Co., Ltd. Speech enhancement method and apparatus

CN107895579B

* 2018-01-02 2021-08-17 联想(北京)有限公司 Voice recognition method and system

US10733375B2

2018-01-31 2020-08-04 Apple Inc. Knowledge-based framework for improving natural language understanding

US10592604B2

2018-03-12 2020-03-17 Apple Inc. Inverse text normalization for automatic speech recognition

US10818288B2

2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction

US10928918B2

2018-05-07 2021-02-23 Apple Inc. Raise to speak

US11145294B2

2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences

US10984780B2

2018-05-21 2021-04-20 Apple Inc. Global semantic word embeddings using bi-directional recurrent neural networks

DK179822B1

2018-06-01 2019-07-12 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device

US11386266B2

2018-06-01 2022-07-12 Apple Inc. Text correction

US10892996B2

2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination

DK180639B1

2018-06-01 2021-11-04 Apple Inc. Disability of attention-attentive virtual assistant

DK201870355A1

2018-06-01 2019-12-16 Apple Inc. Virtual assistant operation in multi-device environments

US10496705B1

2018-06-03 2019-12-03 Apple Inc. Accelerated task performance

KR102544250B1

* 2018-07-03 2023-06-16 Samsung Electronics Co., Ltd. Method and device for outputting sound

US11010561B2

2018-09-27 2021-05-18 Apple Inc. Sentiment prediction from textual data

US11170166B2

2018-09-28 2021-11-09 Apple Inc. Neural typographical error modeling via generative adversarial networks

US10839159B2

2018-09-28 2020-11-17 Apple Inc. Named entity normalization in a spoken dialog system

US11462215B2

2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands

US11475898B2

2018-10-26 2022-10-18 Apple Inc. Low-latency multi-speaker speech recognition

US11638059B2

2019-01-04 2023-04-25 Apple Inc. Content playback on multiple devices

US11348573B2

2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems

US11475884B2

2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined

DK201970509A1

2019-05-06 2021-01-15 Apple Inc. Spoken notifications

US11423908B2

2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests

US11307752B2

2019-05-06 2022-04-19 Apple Inc. User configurable task triggers

US11140099B2

2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions

DK180129B1

2019-05-31 2020-06-02 Apple Inc. User activity shortcut suggestions

DK201970511A1

2019-05-31 2021-02-15 Apple Inc. Voice identification in digital assistant systems

US11496600B2

2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models

US11289073B2

2019-05-31 2022-03-29 Apple Inc. Device text to speech

US11468890B2

2019-06-01 2022-10-11 Apple Inc. Methods and user interfaces for voice-based control of electronic devices

US11360641B2

2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information

US11484271B2

* 2019-08-20 2022-11-01 West Affum Holdings Dac Alert presentation based on ancillary device conditions

US11488406B2

2019-09-25 2022-11-01 Apple Inc. Text detection using global geometry estimators

US11425523B2

* 2020-04-10 2022-08-23 Facebook Technologies, Llc Systems and methods for audio adjustment

US11810578B2

2020-05-11 2023-11-07 Apple Inc. Device arbitration for digital assistant-based intercom systems

US11183193B1

2020-05-11 2021-11-23 Apple Inc. Digital assistant hardware abstraction

US11061543B1

2020-05-11 2021-07-13 Apple Inc. Providing relevant data items based on context

US11755276B2

2020-05-12 2023-09-12 Apple Inc. Reducing description length based on confidence

US11490204B2

2020-07-20 2022-11-01 Apple Inc. Multi-device audio adjustment coordination

US11438683B2

2020-07-21 2022-09-06 Apple Inc. User identification using headphones

* Cited by examiner, † Cited by third party, ‡ Family to family citation

Priority And Related Applications

Applications Claiming Priority (1)

Application Filing date Title

PCT/JP2013/084395

2013-12-17 Electronic devices and methods for compensating for environmental noise in text-to-speech applications

Legal Events

Date Code Title Description

2014-08-13 AS Assignment

Owner name: SONY CORPORATION, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:THORN, OLA;REEL/FRAME:033525/0742

Effective date: 20140807

2016-04-27 AS Assignment

Owner name: SONY MOBILE COMMUNICATIONS INC., JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SONY CORPORATION;REEL/FRAME:038542/0224

Effective date: 20160414

2017-06-28 STCF Information on status: patent grant

Free format text: PATENTED CASE

2018-03-20 CC Certificate of correction

2019-03-25 AS Assignment

Owner name: SONY CORPORATION, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SONY MOBILE COMMUNICATIONS, INC.;REEL/FRAME:048691/0134

Effective date: 20190325

2020-09-24 MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

2024-12-20 MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

Concepts

Name Image Sections Count Query match

environmental effect

title,claims,abstract,description 259 0.000

method

title,claims,abstract,description 37 0.000

spectrum

claims,description 16 0.000

panning

claims,description 4 0.000

Data provided by IFI CLAIMS Patent Services