MX2022002921A - Systems and methods for correlating speech and lip movement - Google Patents
Systems and methods for correlating speech and lip movement
- Publication number
- MX2022002921A
- Authority
- MX
- Mexico
- Prior art keywords
- speaker
- lip movement
- speech
- audio content
- media file
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Management Or Editing Of Information On Record Carriers (AREA)
Abstract
The present invention relates to a computer-implemented method that includes analyzing, by a speech detection system, a media file to detect lip movement of a speaker who is visually presented in the media content of the media file. The method further includes identifying, by the speech detection system, audio content within the media file and improving the accuracy of a temporal correlation of the speech detection system. The method may involve correlating the lip movement of the speaker with the audio content and determining, based on that correlation, that the audio content comprises speech of the speaker. The method may further involve recording, based on the determination that the audio content comprises speech of the speaker, the temporal correlation between the speech and the lip movement of the speaker as metadata of the media file. Various other methods, systems, and computer-readable media are also described.
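The abstract does not say how lip movement and audio are correlated, so the following is only a minimal sketch under stated assumptions, not the patented method. It assumes the video has already been reduced to a per-frame mouth-opening value (for example, a lip-landmark distance) and the audio to a per-frame energy value at the same frame rate. It swaps in a plain Pearson correlation searched over a small window of frame offsets. The threshold of 0.5, the `max_lag` window, and the metadata field names (`speaker_id`, `offset_frames`, `offset_seconds`, `correlation`) are all hypothetical.

```python
import json
import math
from typing import Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def best_lag(lips: Sequence[float], audio: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Try frame offsets in [-max_lag, max_lag]; a positive lag means the audio trails the lips."""
    if len(lips) != len(audio):
        raise ValueError("lip and audio tracks must have the same number of frames")
    n = len(lips)
    best = (0, -1.0)
    for lag in range(-max_lag, max_lag + 1):
        overlap = n - abs(lag)
        if overlap < 2:  # too few shared frames to correlate
            continue
        if lag >= 0:
            a, b = lips[:overlap], audio[lag:]
        else:
            a, b = lips[-lag:], audio[:overlap]
        r = pearson(a, b)
        if r > best[1]:
            best = (lag, r)
    return best


def correlate_speech(
    lips: Sequence[float],
    audio: Sequence[float],
    fps: float,
    speaker_id: str,
    max_lag: int = 5,
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Optional[dict]:
    """Return a metadata record if the audio tracks this speaker's lips, else None."""
    lag, r = best_lag(lips, audio, max_lag)
    if r < threshold:
        return None  # audio does not appear to be this speaker's speech
    return {
        "speaker_id": speaker_id,
        "offset_frames": lag,
        "offset_seconds": lag / fps,
        "correlation": round(r, 3),
    }


if __name__ == "__main__":
    lips = [0, 0, 1, 3, 1, 0, 0, 2, 4, 2, 0, 0]  # hypothetical mouth opening per frame
    audio = [0, 0] + lips[:-2]                    # same envelope, delayed by 2 frames
    record = correlate_speech(lips, audio, fps=25.0, speaker_id="speaker-1")
    print(json.dumps(record))  # e.g. stored as sidecar metadata for the media file
```

Searching over offsets, rather than correlating only at zero lag, is what lets the sketch report a temporal correlation (here, how many frames the audio trails the lips) alongside the yes/no speech decision. A real system would likely use learned audio-visual features instead of raw energy.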
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/911,247 US20210407510A1 (en) | 2020-06-24 | 2020-06-24 | Systems and methods for correlating speech and lip movement |
PCT/US2021/038515 WO2021262737A1 (en) | 2020-06-24 | 2021-06-22 | Systems and methods for correlating speech and lip movement |
Publications (1)
Publication Number | Publication Date |
---|---|
MX2022002921A (es) | 2022-04-06 |
Family
ID=77022202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
MX2022002921A (es) | Systems and methods for correlating speech and lip movement | 2020-06-24 | 2021-06-22 |
Country Status (7)
Country | Link |
---|---|
US (1) | US20210407510A1 (en) |
EP (1) | EP4022608A1 (en) |
AU (1) | AU2021297802B2 (en) |
BR (1) | BR112022026466A2 (pt) |
CA (1) | CA3146707A1 (en) |
MX (1) | MX2022002921A (es) |
WO (1) | WO2021262737A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11538461B1 (en) * | 2021-03-18 | 2022-12-27 | Amazon Technologies, Inc. | Language agnostic missing subtitle detection |
CN113345472B (zh) * | 2021-05-08 | 2022-03-25 | 北京百度网讯科技有限公司 | Voice endpoint detection method, apparatus, electronic device, and storage medium |
CN113448533B (zh) * | 2021-06-11 | 2023-10-31 | 阿波罗智联(北京)科技有限公司 | Method, apparatus, electronic device, and storage medium for generating reminder audio |
US20230125543A1 (en) * | 2021-10-26 | 2023-04-27 | International Business Machines Corporation | Generating audio files based on user generated scripts and voice components |
GB2615095A (en) * | 2022-01-27 | 2023-08-02 | Sony Interactive Entertainment Europe Ltd | System and method for controlling audio |
CN114420124B (zh) * | 2022-03-31 | 2022-06-24 | 北京妙医佳健康科技集团有限公司 | A speech recognition method |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3743391A (en) * | 1971-05-14 | 1973-07-03 | D White | System for dubbing fresh sound tracks on motion picture films |
US7343289B2 (en) * | 2003-06-25 | 2008-03-11 | Microsoft Corp. | System and method for audio/video speaker detection |
JP2008310382A (ja) * | 2007-06-12 | 2008-12-25 | Omron Corp | Lip-reading device and method, information processing device and method, detection device and method, program, data structure, and recording medium |
US8626498B2 (en) * | 2010-02-24 | 2014-01-07 | Qualcomm Incorporated | Voice activity detection based on plural voice activity detectors |
WO2020081872A1 (en) * | 2018-10-18 | 2020-04-23 | Warner Bros. Entertainment Inc. | Characterizing content for audio-video dubbing and other transformations |
2020
- 2020-06-24 US US16/911,247 patent/US20210407510A1/en active Pending
2021
- 2021-06-22 EP EP21745500.5A patent/EP4022608A1/en active Pending
- 2021-06-22 MX MX2022002921A patent/MX2022002921A/es unknown
- 2021-06-22 WO PCT/US2021/038515 patent/WO2021262737A1/en unknown
- 2021-06-22 CA CA3146707A patent/CA3146707A1/en active Pending
- 2021-06-22 AU AU2021297802A patent/AU2021297802B2/en active Active
- 2021-06-22 BR BR112022026466A patent/BR112022026466A2/pt unknown
Also Published As
Publication number | Publication date |
---|---|
CA3146707A1 (en) | 2021-12-30 |
AU2021297802B2 (en) | 2023-03-16 |
AU2021297802A1 (en) | 2022-03-03 |
BR112022026466A2 (pt) | 2023-01-31 |
EP4022608A1 (en) | 2022-07-06 |
US20210407510A1 (en) | 2021-12-30 |
WO2021262737A1 (en) | 2021-12-30 |
Similar Documents
Publication | Title |
---|---|
MX2022002921A (es) | Systems and methods for correlating speech and lip movement |
EP4328905A3 (en) | Recorded media hotword trigger suppression |
CN103632682B (zh) | A method for audio feature detection |
Yella et al. | Overlapping speech detection using long-term conversational features for speaker diarization in meeting room conversations |
GB201017876D0 (en) | Database systems and methods |
GB2526929A (en) | Captioning using socially derived acoustic profiles |
Venter et al. | Automatic detection of African elephant (Loxodonta africana) infrasonic vocalisations from recordings |
ATE526663T1 (de) | Method and device for processing an audio signal |
CN102087704A (zh) | Information processing apparatus, information processing method, and program |
BR112022004158A2 (pt) | Systems and methods for audio signal generation |
CN105913849A (zh) | A speaker segmentation method based on event detection |
CN108538312B (zh) | Method for automatically locating digital audio tampering points based on the Bayesian information criterion |
Wray et al. | Crowdsource a little to label a lot: labeling a speech corpus of dialectal Arabic |
ATE421748T1 (de) | Method and arrangement for speech recognition |
CN105706167B (zh) | Voiced speech detection method and apparatus |
JP2011504034A5 (es) | |
KR20160013592A (ko) | Speaker separation system and method using speech feature vectors |
TW200620923A (en) | Method and system for guard interval size detection |
KR101808810B1 (ko) | Method and apparatus for detecting voiced/unvoiced intervals |
Staudacher et al. | Fast fundamental frequency determination via adaptive autocorrelation |
Pfeiffer | Pause concepts for audio segmentation at different semantic levels |
Campr et al. | Audio-video speaker diarization for unsupervised speaker and face model creation |
WO2021087243A1 (en) | Automatic geological formations tops picking using dynamic time warping (DTW) |
FI20175862A1 (fi) | System for determining a sound source |
Eaton et al. | Detection of clipping in coded speech signals |