MX2022002921A - Systems and methods for correlating speech and lip movement - Google Patents

Systems and methods for correlating speech and lip movement

Info

Publication number
MX2022002921A
Authority
MX
Mexico
Prior art keywords
speaker
lip movement
speech
audio content
media file
Prior art date
Application number
MX2022002921A
Other languages
English (en)
Inventor
Yadong Wang
Shilpa Jois Rao
Original Assignee
Netflix Inc
Priority date
2020-06-24
Filing date
2021-06-22
Publication date
2022-04-06
Application filed by Netflix Inc
Publication of MX2022002921A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L15/25 Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)

Abstract

The disclosed computer-implemented method includes analyzing, by a speech detection system, a media file to detect lip movement of a speaker who is visually rendered in the media content of the media file. The method additionally includes identifying, by the speech detection system, audio content within the media file, and improving the accuracy of a temporal correlation of the speech detection system. The method may involve correlating the lip movement of the speaker with the audio content and determining, based on the correlation between the lip movement of the speaker and the audio content, that the audio content comprises speech from the speaker. The method may further involve recording, based on the determination that the audio content comprises speech from the speaker, the temporal correlation between the speech and the lip movement of the speaker as metadata of the media file. Various other methods, systems, and computer-readable media are also disclosed.
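As a rough illustration of the correlate, determine, and record steps named in the abstract, the following Python sketch scores how closely an audio signal tracks lip movement and emits a metadata record when the score is high enough. Everything specific here is an assumption for illustration and is not taken from the patent: the per-frame `lip_opening` and `audio_energy` inputs, the use of Pearson correlation as the measure, the `threshold` value, and the metadata field names.

```python
import json
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def correlate_speech_and_lips(lip_opening, audio_energy, fps, threshold=0.5):
    """Return a metadata record if the audio tracks the lip movement, else None.

    Assumption: one lip-opening value and one audio-energy value per video frame.
    """
    score = pearson(lip_opening, audio_energy)
    if score < threshold:
        return None  # audio is not attributed to the on-screen speaker
    return {
        "start_s": 0.0,
        "end_s": len(lip_opening) / fps,
        "correlation": round(score, 3),
        "audio_is_speaker_speech": True,
    }


# Hypothetical per-frame measurements for a short clip (illustrative values only).
lip_opening = [0.0, 0.2, 0.8, 0.9, 0.3, 0.0, 0.1, 0.7]
audio_energy = [0.1, 0.3, 0.9, 1.0, 0.4, 0.1, 0.2, 0.8]

record = correlate_speech_and_lips(lip_opening, audio_energy, fps=4)
print(json.dumps(record))
```

In this sketch a record is produced only when the two signals rise and fall together; a clip with a motionless mouth or with audio that moves opposite to the lips yields `None`, so no metadata would be written for it.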
MX2022002921A (es) | Priority date: 2020-06-24 | Filing date: 2021-06-22 | Systems and methods for correlating speech and lip movement

Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
US16/911,247 US20210407510A1 (en) | 2020-06-24 | 2020-06-24 | Systems and methods for correlating speech and lip movement
PCT/US2021/038515 WO2021262737A1 (en) | 2020-06-24 | 2021-06-22 | Systems and methods for correlating speech and lip movement

Publications (1)

Publication Number Publication Date
MX2022002921A (es) 2022-04-06

Family

ID=77022202

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
MX2022002921A MX2022002921A (es) | 2020-06-24 | 2021-06-22 | Systems and methods for correlating speech and lip movement

Country Status (7)

Country Link
US (1) US20210407510A1 (es)
EP (1) EP4022608A1 (es)
AU (1) AU2021297802B2 (es)
BR (1) BR112022026466A2 (es)
CA (1) CA3146707A1 (es)
MX (1) MX2022002921A (es)
WO (1) WO2021262737A1 (es)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11538461B1 (en) * | 2021-03-18 | 2022-12-27 | Amazon Technologies, Inc. | Language agnostic missing subtitle detection
CN113345472B (zh) * | 2021-05-08 | 2022-03-25 | 北京百度网讯科技有限公司 | Voice endpoint detection method and apparatus, electronic device, and storage medium
CN113448533B (zh) * | 2021-06-11 | 2023-10-31 | 阿波罗智联(北京)科技有限公司 | Method and apparatus for generating reminder audio, electronic device, and storage medium
US20230125543A1 (en) * | 2021-10-26 | 2023-04-27 | International Business Machines Corporation | Generating audio files based on user generated scripts and voice components
GB2615095A (en) * | 2022-01-27 | 2023-08-02 | Sony Interactive Entertainment Europe Ltd | System and method for controlling audio
CN114420124B (zh) * | 2022-03-31 | 2022-06-24 | 北京妙医佳健康科技集团有限公司 | A speech recognition method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US3743391A (en) * | 1971-05-14 | 1973-07-03 | D White | System for dubbing fresh sound tracks on motion picture films
US7343289B2 (en) * | 2003-06-25 | 2008-03-11 | Microsoft Corp. | System and method for audio/video speaker detection
JP2008310382A (ja) * | 2007-06-12 | 2008-12-25 | Omron Corp | Lip reading device and method, information processing device and method, detection device and method, program, data structure, and recording medium
US8626498B2 (en) * | 2010-02-24 | 2014-01-07 | Qualcomm Incorporated | Voice activity detection based on plural voice activity detectors
WO2020081872A1 (en) * | 2018-10-18 | 2020-04-23 | Warner Bros. Entertainment Inc. | Characterizing content for audio-video dubbing and other transformations

Also Published As

Publication number Publication date
CA3146707A1 (en) 2021-12-30
AU2021297802B2 (en) 2023-03-16
AU2021297802A1 (en) 2022-03-03
BR112022026466A2 (pt) 2023-01-31
EP4022608A1 (en) 2022-07-06
US20210407510A1 (en) 2021-12-30
WO2021262737A1 (en) 2021-12-30

Similar Documents

Publication Publication Date Title
MX2022002921A (es) Systems and methods for correlating speech and lip movement
EP4328905A3 (en) Recorded media hotword trigger suppression
CN103632682B (zh) A method for audio feature detection
Yella et al. Overlapping speech detection using long-term conversational features for speaker diarization in meeting room conversations
GB201017876D0 (en) Database systems and methods
GB2526929A (en) Captioning using socially derived acoustic profiles
Venter et al. Automatic detection of African elephant (Loxodonta africana) infrasonic vocalisations from recordings
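ATE526663T1 (de) Method and device for processing an audio signal
CN102087704A (zh) Information processing apparatus, information processing method, and program
BR112022004158A2 (pt) Systems and methods for audio signal generation
CN105913849A (zh) A speaker segmentation method based on event detection
CN108538312B (zh) Method for automatic localization of digital audio tampering points based on the Bayesian information criterion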
Wray et al. Crowdsource a little to label a lot: labeling a speech corpus of dialectal Arabic.
ATE421748T1 (de) Method and arrangement for speech recognition
CN105706167B (zh) Voiced speech detection method and apparatus
JP2011504034A5 (es)
KR20160013592A (ko) Speaker separation system and method using voice feature vectors
TW200620923A (en) Method and system for guard interval size detection
KR101808810B1 (ko) Method and apparatus for detecting speech/non-speech sections
Staudacher et al. Fast fundamental frequency determination via adaptive autocorrelation
Pfeiffer Pause concepts for audio segmentation at different semantic levels
Campr et al. Audio-video speaker diarization for unsupervised speaker and face model creation
WO2021087243A1 (en) Automatic geological formations tops picking using dynamic time warping (dtw)
FI20175862A1 (fi) System for determining a sound source
Eaton et al. Detection of clipping in coded speech signals