JP2019532349A - Generating audio using neural networks - Google Patents
Generating audio using neural networks
- Publication number
- JP2019532349A (application number JP2019522236A)
- Authority
- JP
- Japan
- Prior art keywords
- neural network
- audio
- layer
- time step
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/311—Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Electrically Operated Instructional Devices (AREA)
- Complex Calculations (AREA)
- Stereophonic System (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
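102 Neural network input
110 Convolutional subnetwork
120 Output layer
140 Audio sample
142 Current audio sequence, current output sequence
144 Alternative representation
146 Score distribution, output layer
152 Output sequence of audio data, audio sequence
200 Visualization
202 Current input sequence
204 Dilated causal convolutional layer, layer
206 Dilated causal convolutional layer, layer
208 Dilated causal convolutional layer, layer
210 Dilated causal convolutional layer, layer
212 Output
300 Architecture
302 Causal convolutional layer, layer input
304 Dilated causal convolutional layer, layer
306 Input, layer input
308 Dilated causal convolutional layer
310 Nonlinear function
312 Gating function
314 Multiplication
316 1×1 convolution
318 Skip output
320 Final output
322 Sum
324 Nonlinearity
326 1×1 convolution
328 Nonlinearity
330 1×1 convolution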
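The reference numerals 304 to 330 suggest a gated layer with skip connections. The following is a minimal single-channel sketch of one possible wiring, not the patented implementation. The choice of tanh for the nonlinear function (310), a sigmoid for the gating function (312), ReLU for the nonlinearities (324, 328), the filter width of 2, and all weight values are assumptions made for illustration, loosely following the WaveNet paper cited in this record. All function names are hypothetical.

```python
import math

def dilated_causal_conv(x, w_past, w_now, dilation):
    """Width-2 causal convolution: y[t] = w_past * x[t - dilation] + w_now * x[t].

    Positions before the start of the sequence count as zero, so y[t]
    never depends on any later input x[t + 1], x[t + 2], ...
    """
    return [
        w_now * x[t] + (w_past * x[t - dilation] if t >= dilation else 0.0)
        for t in range(len(x))
    ]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def relu(v):
    return max(0.0, v)

def gated_layer(layer_input, dilation, filter_w, gate_w, w_1x1=1.0):
    """One layer (304): returns (final output 320, skip output 318)."""
    filt = dilated_causal_conv(layer_input, filter_w[0], filter_w[1], dilation)  # 308 -> 310
    gate = dilated_causal_conv(layer_input, gate_w[0], gate_w[1], dilation)      # 308 -> 312
    gated = [math.tanh(f) * sigmoid(g) for f, g in zip(filt, gate)]              # 314
    skip_output = [w_1x1 * v for v in gated]        # 316: a 1x1 conv is a scalar here
    final_output = [x + s for x, s in zip(layer_input, skip_output)]  # residual sum
    return final_output, skip_output

def run_stack(x, dilations=(1, 2, 4, 8)):
    """Stack of layers (204-210), then sum (322) and two ReLU + 1x1 stages (324-330)."""
    hidden, skips = x, []
    for d in dilations:
        hidden, skip = gated_layer(hidden, d, (0.5, 0.5), (0.5, 0.5))
        skips.append(skip)
    summed = [sum(vals) for vals in zip(*skips)]
    return [1.0 * relu(1.0 * relu(v)) for v in summed]  # unit 1x1 weights (assumed)

if __name__ == "__main__":
    print(run_stack([0.1, 0.2, 0.3, 0.4, 0.5, 0.6]))
```

Doubling the dilation at each layer lets the receptive field grow quickly with depth while every output position still depends only on current and earlier inputs.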
Claims (37)
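wherein the neural network system is configured to generate an output sequence of audio data that comprises a respective audio sample at each of a plurality of time steps,
wherein the neural network system comprises:
a convolutional subnetwork comprising one or more audio-processing convolutional neural network layers, wherein the convolutional subnetwork is configured to, for each of the plurality of time steps:
receive a current sequence of audio data that comprises the respective audio sample at each time step that precedes the time step in the output sequence, and
process the current sequence of audio data to generate an alternative representation for the time step; and
an output layer, wherein the output layer is configured to, for each of the plurality of time steps:
receive the alternative representation for the time step, and
process the alternative representation for the time step to generate an output that defines a score distribution over a plurality of possible audio samples for the time step.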
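As an illustration of the output layer in this claim, the sketch below maps an alternative representation for one time step to a score distribution over the possible audio samples. The claim does not say how the scores are computed. The linear projection followed by a softmax, the function name `output_layer`, and the weights used in the example are all assumptions.

```python
import math

def output_layer(representation, weights, biases):
    """Map one time step's representation to scores over possible audio samples.

    weights holds one row per possible audio sample value; the softmax
    (an assumed choice) makes the scores non-negative and sum to 1.
    """
    logits = [
        sum(w * r for w, r in zip(row, representation)) + b
        for row, b in zip(weights, biases)
    ]
    top = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    representation = [1.0, 0.0, -1.0]            # hypothetical 3-dimensional representation
    weights = [                                  # 4 possible sample values (hypothetical)
        [0.2, 0.0, 0.1],
        [1.0, 0.5, -1.0],
        [-0.3, 0.3, 0.3],
        [0.0, 0.0, 0.0],
    ]
    biases = [0.0, 0.0, 0.0, 0.0]
    print(output_layer(representation, weights, biases))
```

In this example the second row has the largest dot product with the representation, so the second possible sample value receives the highest score.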
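The neural network system of claim 1, further comprising a subsystem configured to, for each of the plurality of time steps:
select an audio sample at the time step in the output sequence in accordance with the score distribution for the time step.
The neural network system of claim 2, comprising:
sampling from the score distribution.
The neural network system of claim 2, comprising:
selecting the audio sample having the highest score according to the score distribution.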
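The two selection strategies named in these claims, sampling from the score distribution and taking the highest-scoring audio sample, can be sketched as follows. This is a minimal illustration that assumes the scores are non-negative weights indexed by possible sample value. The function names are hypothetical.

```python
import random

def select_by_sampling(scores, rng=random):
    """Draw an index with probability proportional to its score."""
    return rng.choices(range(len(scores)), weights=scores, k=1)[0]

def select_highest(scores):
    """Return the index of the highest score (the first one on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)

if __name__ == "__main__":
    scores = [0.1, 0.7, 0.2]
    print(select_highest(scores))                        # index 1 has the highest score
    print(select_by_sampling(scores, random.Random(0)))  # a random index, seeded for repeatability
```

Sampling keeps variation in the generated audio, while taking the highest score is deterministic for a given distribution.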
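wherein the method comprises, for each of the plurality of time steps:
providing a current sequence of audio data as input to a convolutional subnetwork comprising one or more audio-processing convolutional neural network layers,
wherein the current sequence comprises the respective audio sample at each time step that precedes the time step in the output sequence, and
wherein the convolutional subnetwork is configured to, for each of the plurality of time steps:
receive the current sequence of audio data, and
process the current sequence of audio data to generate an alternative representation for the time step; and
providing the alternative representation for the time step as input to an output layer, wherein the output layer is configured to, for each of the plurality of time steps:
receive the alternative representation for the time step, and
process the alternative representation for the time step to generate an output that defines a score distribution over a plurality of possible audio samples for the time step.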
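The method of claim 20, further comprising selecting an audio sample at the time step in the output sequence in accordance with the score distribution for the time step.
The method of claim 21, comprising sampling from the score distribution.
The method of claim 21, comprising selecting the audio sample having the highest score according to the score distribution.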
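Taken together, the method claims describe an autoregressive loop: at each time step the samples generated so far form the current sequence, a score distribution is computed from it, and one audio sample is selected and appended. The sketch below shows only that control flow. `predict_scores` stands in for the convolutional subnetwork plus output layer and `select` for the selection step. Both are hypothetical callables, and the toy predictor in the example is not a trained model.

```python
def generate(num_steps, predict_scores, select):
    """Generate an output sequence one audio sample per time step."""
    output_sequence = []
    for _ in range(num_steps):
        # The current sequence holds only samples from preceding time steps.
        scores = predict_scores(list(output_sequence))
        output_sequence.append(select(scores))
    return output_sequence

def toy_predict_scores(current_sequence, num_values=4):
    """Toy predictor: puts most of the score on (last sample + 1) mod num_values."""
    target = (current_sequence[-1] + 1) % num_values if current_sequence else 0
    return [0.9 if v == target else 0.1 / (num_values - 1) for v in range(num_values)]

def select_highest(scores):
    """Pick the index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)

if __name__ == "__main__":
    print(generate(6, toy_predict_scores, select_highest))  # [0, 1, 2, 3, 0, 1]
```

Because each step consumes the samples chosen at earlier steps, generation is sequential even though the convolutional layers themselves could process a whole known sequence in one pass.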
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019150456A JP6891236B2 (ja) | 2016-09-06 | 2019-08-20 | Generating audio using neural networks |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662384115P | 2016-09-06 | 2016-09-06 | |
US62/384,115 | 2016-09-06 | ||
PCT/US2017/050320 WO2018048934A1 (en) | 2016-09-06 | 2017-09-06 | Generating audio using neural networks |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019150456A Division JP6891236B2 (ja) | 2016-09-06 | 2019-08-20 | Generating audio using neural networks |
Publications (2)
Publication Number | Publication Date |
---|---|
JP6577159B1 (ja) | 2019-09-18 |
JP2019532349A (ja) | 2019-11-07 |
Family
ID=60022154
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019522236A Active JP6577159B1 (ja) | 2016-09-06 | 2017-09-06 | Generating audio using neural networks |
JP2019150456A Active JP6891236B2 (ja) | 2016-09-06 | 2019-08-20 | Generating audio using neural networks |
JP2021087708A Active JP7213913B2 (ja) | 2016-09-06 | 2021-05-25 | Generating audio using neural networks |
Family Applications After (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019150456A Active JP6891236B2 (ja) | 2016-09-06 | 2019-08-20 | Generating audio using neural networks |
JP2021087708A Active JP7213913B2 (ja) | 2016-09-06 | 2021-05-25 | Generating audio using neural networks |
Country Status (9)
Country | Link |
---|---|
US (5) | US10304477B2 (ja) |
EP (2) | EP3497629B1 (ja) |
JP (3) | JP6577159B1 (ja) |
KR (1) | KR102353284B1 (ja) |
CN (2) | CN112289342B (ja) |
AU (1) | AU2017324937B2 (ja) |
BR (1) | BR112019004524B1 (ja) |
CA (2) | CA3155320A1 (ja) |
WO (1) | WO2018048934A1 (ja) |
Families Citing this family (83)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9965247B2 (en) | 2016-02-22 | 2018-05-08 | Sonos, Inc. | Voice controlled media playback system based on user profile |
US9811314B2 (en) | 2016-02-22 | 2017-11-07 | Sonos, Inc. | Metadata exchange involving a networked playback system and a networked microphone system |
US10095470B2 (en) | 2016-02-22 | 2018-10-09 | Sonos, Inc. | Audio response playback |
US9772817B2 (en) | 2016-02-22 | 2017-09-26 | Sonos, Inc. | Room-corrected voice detection |
US10264030B2 (en) | 2016-02-22 | 2019-04-16 | Sonos, Inc. | Networked microphone device control |
US9947316B2 (en) | 2016-02-22 | 2018-04-17 | Sonos, Inc. | Voice control of a media playback system |
US9978390B2 (en) | 2016-06-09 | 2018-05-22 | Sonos, Inc. | Dynamic player selection for audio signal processing |
US10134399B2 (en) | 2016-07-15 | 2018-11-20 | Sonos, Inc. | Contextualization of voice inputs |
US10115400B2 (en) | 2016-08-05 | 2018-10-30 | Sonos, Inc. | Multiple voice services |
EP3497629B1 (en) | 2016-09-06 | 2020-11-04 | Deepmind Technologies Limited | Generating audio using neural networks |
US11080591B2 (en) * | 2016-09-06 | 2021-08-03 | Deepmind Technologies Limited | Processing sequences using convolutional neural networks |
WO2018048945A1 (en) * | 2016-09-06 | 2018-03-15 | Deepmind Technologies Limited | Processing sequences using convolutional neural networks |
US9942678B1 (en) | 2016-09-27 | 2018-04-10 | Sonos, Inc. | Audio playback settings for voice interaction |
US10181323B2 (en) | 2016-10-19 | 2019-01-15 | Sonos, Inc. | Arbitration-based voice recognition |
CN110023963B (zh) | 2016-10-26 | 2023-05-30 | DeepMind Technologies Limited | Processing text sequences using neural networks |
EP3745394B1 (en) * | 2017-03-29 | 2023-05-10 | Google LLC | End-to-end text-to-speech conversion |
US10475449B2 (en) | 2017-08-07 | 2019-11-12 | Sonos, Inc. | Wake-word detection suppression |
KR102410820B1 (ko) * | 2017-08-14 | 2022-06-20 | Samsung Electronics Co., Ltd. | Method and apparatus for recognition using a neural network, and method and apparatus for training the neural network |
US10048930B1 (en) | 2017-09-08 | 2018-08-14 | Sonos, Inc. | Dynamic computation of system response volume |
US10446165B2 (en) | 2017-09-27 | 2019-10-15 | Sonos, Inc. | Robust short-time fourier transform acoustic echo cancellation during audio playback |
US10482868B2 (en) | 2017-09-28 | 2019-11-19 | Sonos, Inc. | Multi-channel acoustic echo cancellation |
US10621981B2 (en) | 2017-09-28 | 2020-04-14 | Sonos, Inc. | Tone interference cancellation |
US10466962B2 (en) | 2017-09-29 | 2019-11-05 | Sonos, Inc. | Media playback system with voice assistance |
US10880650B2 (en) | 2017-12-10 | 2020-12-29 | Sonos, Inc. | Network microphone devices with automatic do not disturb actuation capabilities |
JP7082357B2 (ja) * | 2018-01-11 | 2022-06-08 | Neosapience, Inc. | Text-to-speech synthesis method using machine learning, apparatus, and computer-readable storage medium |
WO2019152722A1 (en) | 2018-01-31 | 2019-08-08 | Sonos, Inc. | Device designation of playback and network microphone device arrangements |
US11175880B2 (en) | 2018-05-10 | 2021-11-16 | Sonos, Inc. | Systems and methods for voice-assisted media content selection |
US10959029B2 (en) | 2018-05-25 | 2021-03-23 | Sonos, Inc. | Determining and adapting to changes in microphone performance of playback devices |
CA3103470A1 (en) | 2018-06-12 | 2019-12-19 | Intergraph Corporation | Artificial intelligence applications for computer-aided dispatch systems |
US10681460B2 (en) | 2018-06-28 | 2020-06-09 | Sonos, Inc. | Systems and methods for associating playback devices with voice assistant services |
US10971170B2 (en) * | 2018-08-08 | 2021-04-06 | Google Llc | Synthesizing speech from text using neural networks |
US10461710B1 (en) | 2018-08-28 | 2019-10-29 | Sonos, Inc. | Media playback system with maximum volume setting |
US11076035B2 (en) | 2018-08-28 | 2021-07-27 | Sonos, Inc. | Do not disturb feature for audio notifications |
US10587430B1 (en) | 2018-09-14 | 2020-03-10 | Sonos, Inc. | Networked devices, systems, and methods for associating playback devices based on sound codes |
US11024331B2 (en) | 2018-09-21 | 2021-06-01 | Sonos, Inc. | Voice detection optimization using sound metadata |
US10811015B2 (en) | 2018-09-25 | 2020-10-20 | Sonos, Inc. | Voice detection optimization based on selected voice assistant service |
CN112789625A (zh) | 2018-09-27 | 2021-05-11 | DeepMind Technologies Limited | Committed information rate variational autoencoders |
US11100923B2 (en) | 2018-09-28 | 2021-08-24 | Sonos, Inc. | Systems and methods for selective wake word detection using neural network models |
US11899519B2 (en) | 2018-10-23 | 2024-02-13 | Sonos, Inc. | Multiple stage network microphone device with reduced power consumption and processing load |
US10977872B2 (en) | 2018-10-31 | 2021-04-13 | Sony Interactive Entertainment Inc. | Graphical style modification for video games using machine learning |
US11636673B2 (en) | 2018-10-31 | 2023-04-25 | Sony Interactive Entertainment Inc. | Scene annotation using machine learning |
US11375293B2 (en) | 2018-10-31 | 2022-06-28 | Sony Interactive Entertainment Inc. | Textual annotation of acoustic effects |
US10854109B2 (en) | 2018-10-31 | 2020-12-01 | Sony Interactive Entertainment Inc. | Color accommodation for on-demand accessibility |
EP3654249A1 (en) * | 2018-11-15 | 2020-05-20 | Snips | Dilated convolutions and gating for efficient keyword spotting |
US11024321B2 (en) | 2018-11-30 | 2021-06-01 | Google Llc | Speech coding using auto-regressive generative neural networks |
US11183183B2 (en) | 2018-12-07 | 2021-11-23 | Sonos, Inc. | Systems and methods of operating media playback systems having multiple voice assistant services |
US11132989B2 (en) | 2018-12-13 | 2021-09-28 | Sonos, Inc. | Networked microphone devices, systems, and methods of localized arbitration |
CN109771944B (zh) * | 2018-12-19 | 2022-07-12 | Wuhan Xishan Yichuang Culture Co., Ltd. | Game sound effect generation method, apparatus, device, and storage medium |
US10602268B1 (en) | 2018-12-20 | 2020-03-24 | Sonos, Inc. | Optimization of network microphone devices using noise classification |
JP7192882B2 (ja) * | 2018-12-26 | 2022-12-20 | Nippon Telegraph and Telephone Corporation | Speech rhythm conversion device, model learning device, methods therefor, and program |
US11315556B2 (en) | 2019-02-08 | 2022-04-26 | Sonos, Inc. | Devices, systems, and methods for distributed voice processing by transmitting sound data associated with a wake word to an appropriate device for identification |
US10867604B2 (en) | 2019-02-08 | 2020-12-15 | Sonos, Inc. | Devices, systems, and methods for distributed voice processing |
US11587552B2 (en) | 2019-04-30 | 2023-02-21 | Sutherland Global Services Inc. | Real time key conversational metrics prediction and notability |
US11120794B2 (en) | 2019-05-03 | 2021-09-14 | Sonos, Inc. | Voice assistant persistence across multiple network microphone devices |
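CN110136731B (zh) * | 2019-05-13 | 2021-12-24 | Tianjin University | End-to-end blind enhancement method for bone-conducted speech using a dilated causal convolution generative adversarial network |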
US11200894B2 (en) | 2019-06-12 | 2021-12-14 | Sonos, Inc. | Network microphone device with command keyword eventing |
US10586540B1 (en) | 2019-06-12 | 2020-03-10 | Sonos, Inc. | Network microphone device with command keyword conditioning |
US11361756B2 (en) | 2019-06-12 | 2022-06-14 | Sonos, Inc. | Conditional wake word eventing based on environment |
US10871943B1 (en) | 2019-07-31 | 2020-12-22 | Sonos, Inc. | Noise classification for event detection |
US11138969B2 (en) | 2019-07-31 | 2021-10-05 | Sonos, Inc. | Locally distributed keyword detection |
US11138975B2 (en) | 2019-07-31 | 2021-10-05 | Sonos, Inc. | Locally distributed keyword detection |
CN110728991B (zh) * | 2019-09-06 | 2022-03-01 | Nanjing Institute of Technology | Improved recording device identification algorithm |
WO2021075994A1 (en) | 2019-10-16 | 2021-04-22 | Saudi Arabian Oil Company | Determination of elastic properties of a geological formation using machine learning applied to data acquired while drilling |
US11189286B2 (en) | 2019-10-22 | 2021-11-30 | Sonos, Inc. | VAS toggle based on device orientation |
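KR20210048310A (ko) | 2019-10-23 | 2021-05-03 | Samsung Electronics Co., Ltd. | Electronic device and control method thereof |
KR102556096B1 (ko) * | 2019-11-29 | 2023-07-18 | Electronics and Telecommunications Research Institute | Apparatus and method for encoding/decoding an audio signal using information of a previous frame |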
US11200900B2 (en) | 2019-12-20 | 2021-12-14 | Sonos, Inc. | Offline voice control |
US11373095B2 (en) * | 2019-12-23 | 2022-06-28 | Jens C. Jenkins | Machine learning multiple features of depicted item |
US11562740B2 (en) | 2020-01-07 | 2023-01-24 | Sonos, Inc. | Voice verification for media playback |
US11308958B2 (en) | 2020-02-07 | 2022-04-19 | Sonos, Inc. | Localized wakeword verification |
US20210312258A1 (en) * | 2020-04-01 | 2021-10-07 | Sony Corporation | Computing temporal convolution networks in real time |
US20210350788A1 (en) * | 2020-05-06 | 2021-11-11 | Samsung Electronics Co., Ltd. | Electronic device for generating speech signal corresponding to at least one text and operating method of the electronic device |
US11308962B2 (en) | 2020-05-20 | 2022-04-19 | Sonos, Inc. | Input detection windowing |
US11482224B2 (en) | 2020-05-20 | 2022-10-25 | Sonos, Inc. | Command keywords with input detection windowing |
US11727919B2 (en) | 2020-05-20 | 2023-08-15 | Sonos, Inc. | Memory allocation for keyword spotting engines |
EP3719711A3 (en) | 2020-07-30 | 2021-03-03 | Institutul Roman De Stiinta Si Tehnologie | Method of detecting anomalous data, machine computing unit, computer program |
US11698771B2 (en) | 2020-08-25 | 2023-07-11 | Sonos, Inc. | Vocal guidance engines for playback devices |
US11984123B2 (en) | 2020-11-12 | 2024-05-14 | Sonos, Inc. | Network device interaction by range |
US11796714B2 (en) | 2020-12-10 | 2023-10-24 | Saudi Arabian Oil Company | Determination of mechanical properties of a geological formation using deep learning applied to data acquired while drilling |
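CN113724683B (zh) * | 2021-07-23 | 2024-03-22 | Alibaba DAMO Academy (Hangzhou) Technology Co., Ltd. | Audio generation method, computer device, and computer-readable storage medium |
WO2023177145A1 (ko) * | 2022-03-16 | 2023-09-21 | Samsung Electronics Co., Ltd. | Electronic device and method for controlling the electronic device |
WO2023219292A1 (ko) * | 2022-05-09 | 2023-11-16 | Samsung Electronics Co., Ltd. | Audio processing method and apparatus for scene classification |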
EP4293662A1 (en) * | 2022-06-17 | 2023-12-20 | Samsung Electronics Co., Ltd. | Method and system for personalising machine learning models |
Family Cites Families (70)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US2810457A (en) * | 1953-04-10 | 1957-10-22 | Gen Motors Corp | Lubricator |
JPH0450121Y2 (ja) | 1986-04-30 | 1992-11-26 | ||
JP2522400B2 (ja) * | 1989-08-10 | 1996-08-07 | Yamaha Corporation | Musical tone waveform generation method |
US5377302A (en) | 1992-09-01 | 1994-12-27 | Monowave Corporation L.P. | System for recognizing speech |
AU675389B2 (en) * | 1994-04-28 | 1997-01-30 | Motorola, Inc. | A method and apparatus for converting text into audible signals using a neural network |
US6357176B2 (en) * | 1997-03-19 | 2002-03-19 | Mississippi State University | Soilless sod |
JPH10333699A (ja) * | 1997-06-05 | 1998-12-18 | Fujitsu Ltd | Speech recognition and speech synthesis device |
US5913194A (en) * | 1997-07-14 | 1999-06-15 | Motorola, Inc. | Method, device and system for using statistical information to reduce computation and memory requirements of a neural network based speech synthesis system |
JPH11282484A (ja) * | 1998-03-27 | 1999-10-15 | Victor Co Of Japan Ltd | Speech synthesis device |
US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
DE10018134A1 (de) * | 2000-04-12 | 2001-10-18 | Siemens Ag | Method and device for determining prosodic markers |
JP2002123280A (ja) * | 2000-10-16 | 2002-04-26 | Seiko Epson Corp | Speech synthesis method, speech synthesis device, and recording medium storing a speech synthesis processing program |
US7062437B2 (en) * | 2001-02-13 | 2006-06-13 | International Business Machines Corporation | Audio renderings for expressing non-audio nuances |
US20060064177A1 (en) | 2004-09-17 | 2006-03-23 | Nokia Corporation | System and method for measuring confusion among words in an adaptive speech recognition system |
US7747070B2 (en) * | 2005-08-31 | 2010-06-29 | Microsoft Corporation | Training convolutional neural networks on graphics processing units |
KR100832556B1 (ko) * | 2006-09-22 | 2008-05-26 | Korea Power Voice Co., Ltd. | Speech recognition method for a robust far-field speech recognition system |
US8504361B2 (en) * | 2008-02-07 | 2013-08-06 | Nec Laboratories America, Inc. | Deep neural networks and methods for using same |
CA2724753A1 (en) * | 2008-05-30 | 2009-12-03 | Nokia Corporation | Method, apparatus and computer program product for providing improved speech synthesis |
FR2950713A1 (fr) | 2009-09-29 | 2011-04-01 | Movea Sa | System and method for recognizing gestures |
TWI413104B (zh) * | 2010-12-22 | 2013-10-21 | Ind Tech Res Inst | Controllable prosody re-estimation system and method, and computer program product |
CN102651217A (zh) * | 2011-02-25 | 2012-08-29 | Toshiba Corporation | Method and device for synthesizing speech, and acoustic model training method for speech synthesis |
EP2565667A1 (en) | 2011-08-31 | 2013-03-06 | Friedrich-Alexander-Universität Erlangen-Nürnberg | Direction of arrival estimation using watermarked audio signals and microphone arrays |
US8527276B1 (en) * | 2012-10-25 | 2013-09-03 | Google Inc. | Speech synthesis using deep neural networks |
US9230550B2 (en) * | 2013-01-10 | 2016-01-05 | Sensory, Incorporated | Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination |
US9147154B2 (en) | 2013-03-13 | 2015-09-29 | Google Inc. | Classifying resources using a deep network |
US9141906B2 (en) * | 2013-03-13 | 2015-09-22 | Google Inc. | Scoring concept terms using a deep network |
US9190053B2 (en) | 2013-03-25 | 2015-11-17 | The Governing Council Of The University Of Toronto | System and method for applying a convolutional neural network to speech recognition |
US20150032449A1 (en) * | 2013-07-26 | 2015-01-29 | Nuance Communications, Inc. | Method and Apparatus for Using Convolutional Neural Networks in Speech Recognition |
CN104681034A (zh) * | 2013-11-27 | 2015-06-03 | Dolby Laboratories Licensing Corporation | Audio signal processing |
US9953634B1 (en) | 2013-12-17 | 2018-04-24 | Knowles Electronics, Llc | Passive training for automatic speech recognition |
US10275704B2 (en) | 2014-06-06 | 2019-04-30 | Google Llc | Generating representations of input sequences using neural networks |
US10181098B2 (en) | 2014-06-06 | 2019-01-15 | Google Llc | Generating representations of input sequences using neural networks |
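KR102332729B1 (ko) | 2014-07-28 | 2021-11-30 | Samsung Electronics Co., Ltd. | Speech recognition method and apparatus based on pronunciation similarity, and speech recognition engine generation method and apparatus |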
US9821340B2 (en) * | 2014-07-28 | 2017-11-21 | Kolo Medical Ltd. | High displacement ultrasonic transducer |
US20160035344A1 (en) * | 2014-08-04 | 2016-02-04 | Google Inc. | Identifying the language of a spoken utterance |
CN110110843B (zh) | 2014-08-29 | 2020-09-25 | Google LLC | Method and system for processing images |
JP6814146B2 (ja) * | 2014-09-25 | 2021-01-13 | Sunhouse Technologies, Inc. | Systems and methods for capturing and interpreting audio |
US10783900B2 (en) * | 2014-10-03 | 2020-09-22 | Google Llc | Convolutional, long short-term memory, fully connected deep neural networks |
US9824684B2 (en) | 2014-11-13 | 2017-11-21 | Microsoft Technology Licensing, Llc | Prediction-based sequence recognition |
US9542927B2 (en) * | 2014-11-13 | 2017-01-10 | Google Inc. | Method and system for building text-to-speech voice from diverse recordings |
US9607217B2 (en) * | 2014-12-22 | 2017-03-28 | Yahoo! Inc. | Generating preference indices for image content |
US11080587B2 (en) * | 2015-02-06 | 2021-08-03 | Deepmind Technologies Limited | Recurrent neural networks for data item generation |
US10403269B2 (en) * | 2015-03-27 | 2019-09-03 | Google Llc | Processing audio waveforms |
US20160343366A1 (en) * | 2015-05-19 | 2016-11-24 | Google Inc. | Speech synthesis model selection |
US9595002B2 (en) | 2015-05-29 | 2017-03-14 | Sas Institute Inc. | Normalizing electronic communications using a vector having a repeating substring as input for a neural network |
CN105096939B (zh) * | 2015-07-08 | 2017-07-25 | Baidu Online Network Technology (Beijing) Co., Ltd. | Voice wake-up method and device |
US9786270B2 (en) | 2015-07-09 | 2017-10-10 | Google Inc. | Generating acoustic models |
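CN106375231B (zh) * | 2015-07-22 | 2019-11-05 | Huawei Technologies Co., Ltd. | Traffic switching method, device, and system |
KR102413692B1 (ko) | 2015-07-24 | 2022-06-27 | Samsung Electronics Co., Ltd. | Apparatus and method for calculating acoustic scores for speech recognition, speech recognition apparatus and method, and electronic device |
CN105068998B (zh) | 2015-07-29 | 2017-12-15 | Baidu Online Network Technology (Beijing) Co., Ltd. | Translation method and device based on a neural network model |
CN105321525B (zh) * | 2015-09-30 | 2019-02-22 | Beijing University of Posts and Telecommunications | System and method for reducing VoIP communication resource overhead |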
US10733979B2 (en) | 2015-10-09 | 2020-08-04 | Google Llc | Latency constraints for acoustic modeling |
US10395118B2 (en) | 2015-10-29 | 2019-08-27 | Baidu Usa Llc | Systems and methods for video paragraph captioning using hierarchical recurrent neural networks |
WO2017083695A1 (en) * | 2015-11-12 | 2017-05-18 | Google Inc. | Generating target sequences from input sequences using partial conditioning |
US10319374B2 (en) | 2015-11-25 | 2019-06-11 | Baidu USA, LLC | Deployed end-to-end speech recognition |
CN105513591B (zh) * | 2015-12-21 | 2019-09-03 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and device for speech recognition using an LSTM recurrent neural network model |
US10402700B2 (en) | 2016-01-25 | 2019-09-03 | Deepmind Technologies Limited | Generating images using neural networks |
CN108780519B (zh) * | 2016-03-11 | 2022-09-02 | Magic Leap, Inc. | Structure learning in convolutional neural networks |
US10460747B2 (en) | 2016-05-10 | 2019-10-29 | Google Llc | Frequency based audio analysis using neural networks |
US9972314B2 (en) | 2016-06-01 | 2018-05-15 | Microsoft Technology Licensing, Llc | No loss-optimization for weighted transducer |
US11373672B2 (en) | 2016-06-14 | 2022-06-28 | The Trustees Of Columbia University In The City Of New York | Systems and methods for speech separation and neural decoding of attentional selection in multi-speaker environments |
US9984683B2 (en) | 2016-07-22 | 2018-05-29 | Google Llc | Automatic speech recognition using multi-dimensional models |
WO2018048945A1 (en) | 2016-09-06 | 2018-03-15 | Deepmind Technologies Limited | Processing sequences using convolutional neural networks |
EP3497629B1 (en) * | 2016-09-06 | 2020-11-04 | Deepmind Technologies Limited | Generating audio using neural networks |
US11080591B2 (en) | 2016-09-06 | 2021-08-03 | Deepmind Technologies Limited | Processing sequences using convolutional neural networks |
CN110023963B (zh) | 2016-10-26 | 2023-05-30 | DeepMind Technologies Limited | Processing text sequences using neural networks |
US10049106B2 (en) | 2017-01-18 | 2018-08-14 | Xerox Corporation | Natural language generation through character-based recurrent neural networks with finite-state prior knowledge |
US11934935B2 (en) | 2017-05-20 | 2024-03-19 | Deepmind Technologies Limited | Feedforward generative neural networks |
US10726858B2 (en) | 2018-06-22 | 2020-07-28 | Intel Corporation | Neural network for speech denoising trained with deep feature losses |
US10971170B2 (en) | 2018-08-08 | 2021-04-06 | Google Llc | Synthesizing speech from text using neural networks |
2017
- 2017-09-06 EP: application EP17780543.9A, publication EP3497629B1, status Active
- 2017-09-06 CA: application CA3155320A, publication CA3155320A1, status Pending
- 2017-09-06 CA: application CA3036067A, publication CA3036067C, status Active
- 2017-09-06 WO: application PCT/US2017/050320, publication WO2018048934A1, status unknown
- 2017-09-06 BR: application BR112019004524-4A, publication BR112019004524B1, status IP Right Grant
- 2017-09-06 AU: application AU2017324937A, publication AU2017324937B2, status Active
- 2017-09-06 JP: application JP2019522236A, publication JP6577159B1, status Active
- 2017-09-06 KR: application KR1020197009838A, publication KR102353284B1, status IP Right Grant
- 2017-09-06 EP: application EP20195353.6A, publication EP3822863B1, status Active
- 2017-09-06 CN: application CN202011082855.5A, publication CN112289342B, status Active
- 2017-09-06 CN: application CN201780065523.6A, publication CN109891434B, status Active
2018
- 2018-07-09 US: application US16/030,742, publication US10304477B2, status Active
2019
- 2019-04-22 US: application US16/390,549, publication US10803884B2, status Active
- 2019-08-20 JP: application JP2019150456A, publication JP6891236B2, status Active
2020
- 2020-09-14 US: application US17/020,348, publication US11386914B2, status Active
2021
- 2021-05-25 JP: application JP2021087708A, publication JP7213913B2, status Active
2022
- 2022-06-13 US: application US17/838,985, publication US11869530B2, status Active
2023
- 2023-11-27 US: application US18/519,986, publication US20240135955A1, status Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH09146576A (ja) * | 1995-10-31 | 1997-06-06 | Natl Sci Council | Prosody synthesis device based on an artificial neural network for text-to-speech |
CA2810457A1 (en) * | 2013-03-25 | 2014-09-25 | Gerald Bradley PENN | System and method for applying a convolutional neural network to speech recognition |
Non-Patent Citations (1)
Title |
---|
AARON VAN DEN OORD ET AL.: "WaveNet: A Generative Model for Raw Audio", ARXIV PREPRINT, JPN6019027821, 19 September 2016 (2016-09-19), US, pages 1 - 15, ISSN: 0004080090 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
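JP2019045856A (ja) * | 2017-08-31 | 2019-03-22 | National Institute of Information and Communications Technology | Audio data learning device, audio data inference device, and program |
JP7209275B2 (ja) | 2017-08-31 | 2023-01-20 | National Institute of Information and Communications Technology | Audio data learning device, audio data inference device, and program |
JP2022528016A (ja) * | 2019-05-23 | 2022-06-07 | Google LLC | Variational embedding capacity in expressive end-to-end speech synthesis |
JP7108147B2 (ja) | 2019-05-23 | 2022-07-27 | Google LLC | Variational embedding capacity in expressive end-to-end speech synthesis |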
Also Published As
Publication number | Publication date |
---|---|
CN109891434B (zh) | 2020-10-30 |
EP3497629A1 (en) | 2019-06-19 |
CN109891434A (zh) | 2019-06-14 |
US11386914B2 (en) | 2022-07-12 |
JP2021152664A (ja) | 2021-09-30 |
KR20190042730A (ko) | 2019-04-24 |
US20220319533A1 (en) | 2022-10-06 |
US20240135955A1 (en) | 2024-04-25 |
BR112019004524A2 (pt) | 2019-05-28 |
JP6891236B2 (ja) | 2021-06-18 |
JP7213913B2 (ja) | 2023-01-27 |
EP3822863A1 (en) | 2021-05-19 |
AU2017324937A1 (en) | 2019-03-28 |
AU2017324937B2 (en) | 2019-12-19 |
BR112019004524B1 (pt) | 2023-11-07 |
EP3822863B1 (en) | 2022-11-02 |
CA3036067C (en) | 2023-08-01 |
KR102353284B1 (ko) | 2022-01-19 |
CN112289342B (zh) | 2024-03-19 |
US20180322891A1 (en) | 2018-11-08 |
JP2020003809A (ja) | 2020-01-09 |
WO2018048934A1 (en) | 2018-03-15 |
CA3036067A1 (en) | 2018-03-15 |
US20190251987A1 (en) | 2019-08-15 |
US10304477B2 (en) | 2019-05-28 |
JP6577159B1 (ja) | 2019-09-18 |
US11869530B2 (en) | 2024-01-09 |
US20200411032A1 (en) | 2020-12-31 |
US10803884B2 (en) | 2020-10-13 |
CA3155320A1 (en) | 2018-03-15 |
EP3497629B1 (en) | 2020-11-04 |
CN112289342A (zh) | 2021-01-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6577159B1 (ja) | Generating audio using neural networks | |
JP6750121B2 (ja) | Processing sequences using convolutional neural networks | |
US11948066B2 (en) | Processing sequences using convolutional neural networks | |
CN111699497A (zh) | Fast decoding in sequence models using discrete latent variables | |
EP4150616A1 (en) | End-to-end speech waveform generation through data density gradient estimation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
2019-04-23 | A521 | Request for written amendment filed | JAPANESE INTERMEDIATE CODE: A523 |
2019-04-23 | A621 | Written request for application examination | JAPANESE INTERMEDIATE CODE: A621 |
2019-04-23 | A871 | Explanation of circumstances concerning accelerated examination | JAPANESE INTERMEDIATE CODE: A871 |
2019-07-12 | A975 | Report on accelerated examination | JAPANESE INTERMEDIATE CODE: A971005 |
 | TRDD | Decision of grant or rejection written | |
2019-07-22 | A01 | Written decision to grant a patent or to grant a registration (utility model) | JAPANESE INTERMEDIATE CODE: A01 |
2019-08-21 | A61 | First payment of annual fees (during grant procedure) | JAPANESE INTERMEDIATE CODE: A61 |
 | R150 | Certificate of patent or registration of utility model | Ref document number: 6577159; Country of ref document: JP; JAPANESE INTERMEDIATE CODE: R150 |
 | R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
 | R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |