JP7171911B2 - Generation of interactive audio tracks from visual content - Google Patents

Generation of interactive audio tracks from visual content

Info

Publication number
JP7171911B2
JP7171911B2 JP2021520598A JP2021520598A JP7171911B2 JP 7171911 B2 JP7171911 B2 JP 7171911B2 JP 2021520598 A JP2021520598 A JP 2021520598A JP 2021520598 A JP2021520598 A JP 2021520598A JP 7171911 B2 JP7171911 B2 JP 7171911B2
Authority
JP
Japan
Prior art keywords
audio
digital
data processing
processing system
digital component
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2021520598A
Other languages
English (en)
Japanese (ja)
Other versions
JP2022540263A (ja)
Inventor
Matthew Sharifi
Victor Carbune
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC
Publication of JP2022540263A
Priority to JP2022176503A
Application granted
Publication of JP7171911B2
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4398 Processing of audio elementary streams involving reformatting operations of audio signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/083 Recognition networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional [3D] objects
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4882 Data services, e.g. news ticker for displaying messages, e.g. warnings, reminders
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 Teaching, or communicating with, the blind, deaf or mute
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)
  • Television Signal Processing For Recording (AREA)
  • Document Processing Apparatus (AREA)
JP2021520598A 2020-06-09 2020-06-09 Generation of interactive audio tracks from visual content Active JP7171911B2 (ja)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022176503A JP7525575B2 (ja) 2020-06-09 2022-11-02 Generation of interactive audio tracks from visual content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2020/036749 WO2021251953A1 (en) 2020-06-09 2020-06-09 Generation of interactive audio tracks from visual content

Related Child Applications (1)

Application Number Title Priority Date Filing Date
JP2022176503A Division JP7525575B2 (ja) 2020-06-09 2022-11-02 Generation of interactive audio tracks from visual content

Publications (2)

Publication Number Publication Date
JP2022540263A (ja) 2022-09-15
JP7171911B2 (ja) 2022-11-15

Family

ID=71465407

Family Applications (2)

Application Number Title Priority Date Filing Date
JP2021520598A Active JP7171911B2 (ja) 2020-06-09 2020-06-09 Generation of interactive audio tracks from visual content
JP2022176503A Active JP7525575B2 (ja) 2020-06-09 2022-11-02 Generation of interactive audio tracks from visual content

Family Applications After (1)

Application Number Title Priority Date Filing Date
JP2022176503A Active JP7525575B2 (ja) 2020-06-09 2022-11-02 Generation of interactive audio tracks from visual content

Country Status (6)

Country Link
US (2) US12230252B2
EP (2) EP4478338B1
JP (2) JP7171911B2
KR (1) KR102765838B1
CN (2) CN114080817B
WO (1) WO2021251953A1

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11010428B2 (en) * 2018-01-16 2021-05-18 Google Llc Systems, methods, and apparatuses for providing assistant deep links to effectuate third-party dialog session transfers
CN111753553B (zh) * 2020-07-06 2022-07-05 Beijing Century TAL Education Technology Co., Ltd. Sentence type recognition method, apparatus, electronic device and storage medium
US12045717B2 (en) * 2020-12-09 2024-07-23 International Business Machines Corporation Automatic creation of difficult annotated data leveraging cues
US20240153488A1 (en) * 2021-03-17 2024-05-09 Pioneer Corporation Sound output control device, sound output control method, and sound output control program
US20230005486A1 (en) * 2021-07-02 2023-01-05 Pindrop Security, Inc. Speaker embedding conversion for backward and cross-channel compatability
US20230098356A1 (en) * 2021-09-30 2023-03-30 Meta Platforms, Inc. Systems and methods for identifying candidate videos for audio experiences
KR20230056923A (ko) * 2021-10-21 2023-04-28 주식회사 캐스트유 Keyword generation method for sound sources
US12230278B1 (en) * 2022-02-22 2025-02-18 Amazon Technologies, Inc. Output of visual supplemental content
EP4420751A1 (en) * 2023-02-23 2024-08-28 Sony Interactive Entertainment Inc. Generating a musical score for a video game
JP7497502B1 (ja) 2023-08-14 2024-06-10 COLOPL, Inc. Program and system
US20250118287A1 (en) * 2023-10-06 2025-04-10 Google Llc Sonifying Visual Content For Vision-Impaired Users
US12604062B2 (en) * 2024-04-03 2026-04-14 Nec Corporation Of America Enhancing media consumption experience through generative AI-powered interactive companions
US12561348B2 (en) * 2024-05-29 2026-02-24 Microsoft Technology Licensing, Llc Semantic-tree-based AI content management platform

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002328949A (ja) 2001-04-27 2002-11-15 Hitachi Ltd Digital content viewing method and system
JP2013517739A (ja) 2010-01-20 2013-05-16 Microsoft Corporation Communication sessions among devices and interfaces with mixed capabilities
WO2019216969A1 (en) 2018-05-07 2019-11-14 Google Llc Synchronizing access controls between computing devices

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6876968B2 (en) * 2001-03-08 2005-04-05 Matsushita Electric Industrial Co., Ltd. Run time synthesizer adaptation to improve intelligibility of synthesized speech
US20060229877A1 (en) * 2005-04-06 2006-10-12 Jilei Tian Memory usage in a text-to-speech system
US8996376B2 (en) * 2008-04-05 2015-03-31 Apple Inc. Intelligent text-to-speech conversion
US20110047163A1 (en) * 2009-08-24 2011-02-24 Google Inc. Relevance-Based Image Selection
US8862985B2 (en) * 2012-06-08 2014-10-14 Freedom Scientific, Inc. Screen reader with customizable web page output
US9536528B2 (en) * 2012-07-03 2017-01-03 Google Inc. Determining hotword suitability
CN104795066A (zh) * 2014-01-17 2015-07-22 NTT Docomo, Inc. Speech recognition method and apparatus
US20160048561A1 (en) * 2014-08-15 2016-02-18 Chacha Search, Inc. Method, system, and computer readable storage for podcasting and video training in an information search system
US20160379638A1 (en) * 2015-06-26 2016-12-29 Amazon Technologies, Inc. Input speech quality matching
US9792835B2 (en) * 2016-02-05 2017-10-17 Microsoft Technology Licensing, Llc Proxemic interfaces for exploring imagery
US10049670B2 (en) * 2016-06-06 2018-08-14 Google Llc Providing voice action discoverability example for trigger term
CN107516511B (zh) * 2016-06-13 2021-05-25 Microsoft Technology Licensing, LLC Text-to-speech learning system for intent recognition and emotion
US10141006B1 (en) * 2016-06-27 2018-11-27 Amazon Technologies, Inc. Artificial intelligence system for improving accessibility of digitized speech
US20180082679A1 (en) * 2016-09-18 2018-03-22 Newvoicemedia, Ltd. Optimal human-machine conversations using emotion-enhanced natural speech using hierarchical neural networks and reinforcement learning
US10311856B2 (en) 2016-10-03 2019-06-04 Google Llc Synthesized voice selection for computational agents
US10387488B2 (en) 2016-12-07 2019-08-20 AT&T Intellectual Property I, L.P. User configurable radio
US11722571B1 (en) * 2016-12-20 2023-08-08 Amazon Technologies, Inc. Recipient device presence activity monitoring for a communications session
US10559309B2 (en) * 2016-12-22 2020-02-11 Google Llc Collaborative voice controlled devices
US10708313B2 (en) * 2016-12-30 2020-07-07 Google Llc Multimodal transmission of packetized data
KR102622356B1 (ko) * 2017-04-20 2024-01-08 Google LLC Multi-user authentication on a device
EP4060476B1 (en) * 2017-06-13 2025-08-06 Google LLC Establishment of audio-based network sessions with non-registered resources
CN108615526B (zh) * 2018-05-08 2020-07-07 Tencent Technology (Shenzhen) Co., Ltd. Method, apparatus, terminal and storage medium for detecting keywords in a speech signal
US11094324B2 (en) * 2019-05-14 2021-08-17 Motorola Mobility Llc Accumulative multi-cue activation of domain-specific automatic speech recognition engine
JP7191792B2 (ja) * 2019-08-23 2022-12-19 Toshiba Corporation Information processing device, information processing method and program
US11361759B2 (en) * 2019-11-18 2022-06-14 Streamingo Solutions Private Limited Methods and systems for automatic generation and convergence of keywords and/or keyphrases from a media

Also Published As

Publication number Publication date
US20220157300A1 (en) 2022-05-19
JP2022540263A (ja) 2022-09-15
EP4478338A2 (en) 2024-12-18
CN118714395A (zh) 2024-09-27
CN114080817A (zh) 2022-02-22
KR20230021556A (ko) 2023-02-14
JP7525575B2 (ja) 2024-07-30
WO2021251953A1 (en) 2021-12-16
JP2023024987A (ja) 2023-02-21
CN114080817B (zh) 2024-07-02
KR102765838B1 (ko) 2025-02-11
EP3948516B1 (en) 2024-11-13
EP4478338A3 (en) 2025-02-19
US20250061892A1 (en) 2025-02-20
US12230252B2 (en) 2025-02-18
EP4478338B1 (en) 2026-04-01
EP3948516A1 (en) 2022-02-09

Similar Documents

Publication Publication Date Title
JP7171911B2 (ja) Generation of interactive audio tracks from visual content
US11929069B2 (en) Proactive incorporation of unsolicited content into human-to-computer dialogs
EP3669283B1 (en) Network source identification via audio signals
US12106084B2 (en) Debugging applications for delivery via an application delivery server
US12380160B2 (en) Responding to queries with voice recordings
US20240427581A1 (en) Debugging applications for delivery via an application delivery server
US11385990B2 (en) Debugging applications for delivery via an application delivery server
EP4143674B1 (en) Bit vector-based content matching for third-party digital assistant actions
US12561366B2 (en) Real-time micro-profile generation using a dynamic tree structure

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20210614

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20221003

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20221102

R150 Certificate of patent or registration of utility model

Ref document number: 7171911

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250