EP4136857A4 - AI-assisted sound effect generation for silent video

Info

Publication number
EP4136857A4
Authority
EP
European Patent Office
Prior art keywords
sound effect
effect generation
silent video
assisted
assisted sound
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP21787592.1A
Other languages
German (de)
French (fr)
Other versions
EP4136857A1 (en)
Inventor
Sudha Krishnamurthy
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Interactive Entertainment Inc
Original Assignee
Sony Interactive Entertainment Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-04-14
Filing date
2021-04-09
Publication date
2024-04-24
Application filed by Sony Interactive Entertainment Inc
Publication of EP4136857A1
Publication of EP4136857A4
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63 Querying
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63 Querying
    • G06F16/638 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/027 Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
EP21787592.1A 2020-04-14 2021-04-09 AI-assisted sound effect generation for silent video Pending EP4136857A4 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16/848,512 US11381888B2 (en) 2020-04-14 2020-04-14 AI-assisted sound effect generation for silent video
PCT/US2021/026554 WO2021211368A1 (en) 2020-04-14 2021-04-09 AI-assisted sound effect generation for silent video

Publications (2)

Publication Number Publication Date
EP4136857A1 (en) 2023-02-22
EP4136857A4 (en) 2024-04-24

Family

ID=78007346

Family Applications (1)

Application Number Priority Date Filing Date Title
EP21787592.1A Pending EP4136857A4 (en) 2020-04-14 2021-04-09 AI-assisted sound effect generation for silent video

Country Status (5)

Country Link
US (1) US11381888B2 (en)
EP (1) EP4136857A4 (en)
JP (1) JP2023521866A (en)
CN (1) CN115428469A (en)
WO (1) WO2021211368A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461235B (en) * 2020-03-31 2021-07-16 合肥工业大学 Audio and video data processing method and system, electronic equipment and storage medium
US11386302B2 (en) 2020-04-13 2022-07-12 Google Llc Systems and methods for contrastive learning of visual representations
US11694084B2 (en) * 2020-04-14 2023-07-04 Sony Interactive Entertainment Inc. Self-supervised AI-assisted sound effect recommendation for silent video
US11615312B2 (en) 2020-04-14 2023-03-28 Sony Interactive Entertainment Inc. Self-supervised AI-assisted sound effect generation for silent video using multimodal clustering
CN114648982B (en) * 2022-05-24 2022-07-26 四川大学 Controller voice recognition method and device based on comparison learning
CN114822512B (en) * 2022-06-29 2022-09-02 腾讯科技(深圳)有限公司 Audio data processing method and device, electronic equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9094636B1 (en) * 2005-07-14 2015-07-28 Zaxcom, Inc. Systems and methods for remotely controlling local audio devices in a virtual wireless multitrack recording system
US8654250B2 (en) * 2010-03-30 2014-02-18 Sony Corporation Deriving visual rhythm from video signals
US9373320B1 (en) * 2013-08-21 2016-06-21 Google Inc. Systems and methods facilitating selective removal of content from a mixed audio recording
US10459995B2 (en) * 2016-12-22 2019-10-29 Shutterstock, Inc. Search engine for processing image search queries in multiple languages
CN108922551B (en) * 2017-05-16 2021-02-05 博通集成电路(上海)股份有限公司 Circuit and method for compensating lost frame
US11276419B2 (en) * 2019-07-30 2022-03-15 International Business Machines Corporation Synchronized sound generation from videos

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DONGHUO ZENG ET AL: "Deep Triplet Neural Networks with Cluster-CCA for Audio-Visual Cross-modal Retrieval", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 10 August 2019 (2019-08-10), XP081459707 *
HAO ZHU ET AL: "Deep Audio-Visual Learning: A Survey", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 14 January 2020 (2020-01-14), XP081578387 *
HONG SUNGEUN ET AL: "CBVMR: Content-Based Video-Music Retrieval Using Soft Intra-Modal Structure Constraint", PROCEEDINGS OF THE 2018 ACM ON INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 5 June 2018 (2018-06-05), New York, NY, USA, pages 353 - 361, XP055908308, ISBN: 978-1-4503-5046-4, Retrieved from the Internet <URL:https://arxiv.org/pdf/1704.06761.pdf> DOI: 10.1145/3206025.3206046 *
See also references of WO2021211368A1 *

Also Published As

Publication number Publication date
US11381888B2 (en) 2022-07-05
WO2021211368A1 (en) 2021-10-21
US20210321172A1 (en) 2021-10-14
EP4136857A1 (en) 2023-02-22
CN115428469A (en) 2022-12-02
JP2023521866A (en) 2023-05-25

Similar Documents

Publication Publication Date Title
EP4136857A4 (en) AI-assisted sound effect generation for silent video
EP4139626A4 (en) Sound suppressor
GB2600600B (en) Synchronized sound generation from videos
CA200774S (en) Microphone
CA215592S (en) Microphone
CA200776S (en) Microphone
EP4228284A4 (en) Sound outputting apparatus
CA206866S (en) Speaker
GB202211297D0 (en) Generating synchronized sound from videos
GB202003141D0 (en) Sound field microphones
CA200556S (en) Speaker microphone
CA201343S (en) Speaker
GB2591222B (en) Sound reproduction
GB202020825D0 (en) Audio synchronisation
GB202309656D0 (en) Device for generating sound
GB202308194D0 (en) Device for generating sound
AU2023901210A0 (en) Generating Sound
GB202317432D0 (en) Audio signal generation
GB202315797D0 (en) Sound apparatus
EP4136178A4 (en) Sound deadener composition
CA207678S (en) Speaker
CA222743S (en) Speaker
GB2597844B (en) Speaker
CA204917S (en) Speaker
CA198264S (en) Speaker

Legal Events

Date Code Title Description
STAA Information on the status of an EP patent application or granted EP patent
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase
Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an EP patent application or granted EP patent
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed
Effective date: 20221011

AK Designated contracting states
Kind code of ref document: A1
Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the European patent (deleted)

DAX Request for extension of the European patent (deleted)

REG Reference to a national code
Ref country code: DE
Ref legal event code: R079
Free format text: PREVIOUS MAIN CLASS: H04N0021854000
Ipc: G06F0016630000

A4 Supplementary search report drawn up and despatched
Effective date: 20240327

RIC1 Information provided on IPC code assigned before grant
Ipc: G11B 27/28 20060101ALI20240321BHEP
Ipc: H04N 21/845 20110101ALI20240321BHEP
Ipc: H04N 21/44 20110101ALI20240321BHEP
Ipc: H04N 21/439 20110101ALI20240321BHEP
Ipc: G11B 27/031 20060101ALI20240321BHEP
Ipc: G10L 13/027 20130101ALI20240321BHEP
Ipc: G06V 10/82 20220101ALI20240321BHEP
Ipc: G06N 3/084 20230101ALI20240321BHEP
Ipc: G06N 3/045 20230101ALI20240321BHEP
Ipc: G06N 3/044 20230101ALI20240321BHEP
Ipc: H04N 21/854 20110101ALI20240321BHEP
Ipc: G06F 16/63 20190101AFI20240321BHEP