EP4136857A4 - AI-assisted sound effect generation for silent video
Info
- Publication number
- EP4136857A4
- Authority
- EP
- European Patent Office
- Prior art keywords
- sound effect
- effect generation
- silent video
- assisted
- assisted sound
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
- G06F16/638—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/848,512 US11381888B2 (en) | 2020-04-14 | 2020-04-14 | AI-assisted sound effect generation for silent video |
PCT/US2021/026554 WO2021211368A1 (en) | 2020-04-14 | 2021-04-09 | AI-assisted sound effect generation for silent video |
Publications (2)
Publication Number | Publication Date |
---|---|
EP4136857A1 (en) | 2023-02-22 |
EP4136857A4 (en) | 2024-04-24 |
Family
ID=78007346
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21787592.1A Pending EP4136857A4 (en) | 2020-04-14 | 2021-04-09 | AI-assisted sound effect generation for silent video |
Country Status (5)
Country | Link |
---|---|
US (1) | US11381888B2 (en) |
EP (1) | EP4136857A4 (en) |
JP (1) | JP2023521866A (en) |
CN (1) | CN115428469A (en) |
WO (1) | WO2021211368A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111461235B (en) * | 2020-03-31 | 2021-07-16 | 合肥工业大学 | Audio and video data processing method and system, electronic equipment and storage medium |
US11386302B2 (en) | 2020-04-13 | 2022-07-12 | Google Llc | Systems and methods for contrastive learning of visual representations |
US11694084B2 (en) * | 2020-04-14 | 2023-07-04 | Sony Interactive Entertainment Inc. | Self-supervised AI-assisted sound effect recommendation for silent video |
US11615312B2 (en) | 2020-04-14 | 2023-03-28 | Sony Interactive Entertainment Inc. | Self-supervised AI-assisted sound effect generation for silent video using multimodal clustering |
CN114648982B (en) * | 2022-05-24 | 2022-07-26 | 四川大学 | Controller voice recognition method and device based on comparison learning |
CN114822512B (en) * | 2022-06-29 | 2022-09-02 | 腾讯科技(深圳)有限公司 | Audio data processing method and device, electronic equipment and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9094636B1 (en) * | 2005-07-14 | 2015-07-28 | Zaxcom, Inc. | Systems and methods for remotely controlling local audio devices in a virtual wireless multitrack recording system |
US8654250B2 (en) * | 2010-03-30 | 2014-02-18 | Sony Corporation | Deriving visual rhythm from video signals |
US9373320B1 (en) * | 2013-08-21 | 2016-06-21 | Google Inc. | Systems and methods facilitating selective removal of content from a mixed audio recording |
US10459995B2 (en) * | 2016-12-22 | 2019-10-29 | Shutterstock, Inc. | Search engine for processing image search queries in multiple languages |
CN108922551B (en) * | 2017-05-16 | 2021-02-05 | 博通集成电路(上海)股份有限公司 | Circuit and method for compensating lost frame |
US11276419B2 (en) * | 2019-07-30 | 2022-03-15 | International Business Machines Corporation | Synchronized sound generation from videos |
- 2020-04-14: US application US16/848,512, publication US11381888B2, status Active
- 2021-04-09: WO application PCT/US2021/026554, publication WO2021211368A1, status unknown
- 2021-04-09: CN application CN202180028673.6A, publication CN115428469A, status Pending
- 2021-04-09: JP application JP2022562558A, publication JP2023521866A, status Pending
- 2021-04-09: EP application EP21787592.1A, publication EP4136857A4, status Pending
Non-Patent Citations (4)
Title |
---|
DONGHUO ZENG ET AL: "Deep Triplet Neural Networks with Cluster-CCA for Audio-Visual Cross-modal Retrieval", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 10 August 2019 (2019-08-10), XP081459707 * |
HAO ZHU ET AL: "Deep Audio-Visual Learning: A Survey", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 14 January 2020 (2020-01-14), XP081578387 * |
HONG SUNGEUN ET AL: "CBVMR: Content-Based Video-Music Retrieval Using Soft Intra-Modal Structure Constraint", PROCEEDINGS OF THE 2018 ACM ON INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 5 June 2018 (2018-06-05), New York, NY, USA, pages 353 - 361, XP055908308, ISBN: 978-1-4503-5046-4, Retrieved from the Internet <URL:https://arxiv.org/pdf/1704.06761.pdf> DOI: 10.1145/3206025.3206046 * |
See also references of WO2021211368A1 * |
Also Published As
Publication number | Publication date |
---|---|
US11381888B2 (en) | 2022-07-05 |
WO2021211368A1 (en) | 2021-10-21 |
US20210321172A1 (en) | 2021-10-14 |
EP4136857A1 (en) | 2023-02-22 |
CN115428469A (en) | 2022-12-02 |
JP2023521866A (en) | 2023-05-25 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
EP4136857A4 (en) | AI-assisted sound effect generation for silent video | |
EP4139626A4 (en) | Sound suppressor | |
GB2600600B (en) | Synchronized sound generation from videos | |
CA200774S (en) | Microphone | |
CA215592S (en) | Microphone | |
CA200776S (en) | Microphone | |
EP4228284A4 (en) | Sound outputting apparatus | |
CA206866S (en) | Speaker | |
GB202211297D0 (en) | Generating synchronized sound from videos | |
GB202003141D0 (en) | Sound field microphones | |
CA200556S (en) | Speaker microphone | |
CA201343S (en) | Speaker | |
GB2591222B (en) | Sound reproduction | |
GB202020825D0 (en) | Audio synchronisation | |
GB202309656D0 (en) | Device for generating sound | |
GB202308194D0 (en) | Device for generating sound | |
AU2023901210A0 (en) | Generating Sound | |
GB202317432D0 (en) | Audio signal generation | |
GB202315797D0 (en) | Sound apparatus | |
EP4136178A4 (en) | Sound deadener composition | |
CA207678S (en) | Speaker | |
CA222743S (en) | Speaker | |
GB2597844B (en) | Speaker | |
CA204917S (en) | Speaker | |
CA198264S (en) | Speaker |
Legal Events
Code | Title | Description |
---|---|---|
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
17P | Request for examination filed | Effective date: 20221011 |
AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
DAV | Request for validation of the European patent (deleted) | |
DAX | Request for extension of the European patent (deleted) | |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R079 Free format text: PREVIOUS MAIN CLASS: H04N0021854000 Ipc: G06F0016630000 |
A4 | Supplementary search report drawn up and despatched | Effective date: 20240327 |
RIC1 | Information provided on IPC code assigned before grant | Ipc: G11B 27/28 20060101ALI20240321BHEP Ipc: H04N 21/845 20110101ALI20240321BHEP Ipc: H04N 21/44 20110101ALI20240321BHEP Ipc: H04N 21/439 20110101ALI20240321BHEP Ipc: G11B 27/031 20060101ALI20240321BHEP Ipc: G10L 13/027 20130101ALI20240321BHEP Ipc: G06V 10/82 20220101ALI20240321BHEP Ipc: G06N 3/084 20230101ALI20240321BHEP Ipc: G06N 3/045 20230101ALI20240321BHEP Ipc: G06N 3/044 20230101ALI20240321BHEP Ipc: H04N 21/854 20110101ALI20240321BHEP Ipc: G06F 16/63 20190101AFI20240321BHEP |