BR112022001300A2 - Speech processing device, speech processing method, and recording medium - Google Patents

Speech processing device, speech processing method, and recording medium

Info

Publication number
BR112022001300A2
Authority
BR
Brazil
Prior art keywords
utterance
speech processing
speaker
processing device
utterance data
Prior art date
Application number
BR112022001300A
Other languages
English (en)
Inventor
Kazuyuki Sasaki
Original Assignee
Nec Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nec Corp
Publication of BR112022001300A2

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/57 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L15/25 Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

Speech processing device, speech processing method, and recording medium. The present invention relates to a speech processing device capable, for example, of performing processing that takes into account the influence of noise on speech produced by a person's utterance. A speaker extraction unit extracts a speaker region from an image. A first utterance data generation unit generates, based on the shape of the speaker's lips, first utterance data indicating the content of the speaker's utterance. A second utterance data generation unit generates, based on a speech signal corresponding to the speaker's utterance, second utterance data indicating the content of the speaker's utterance. A comparison unit compares the first utterance data and the second utterance data with each other.
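
The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say how utterance data are represented or compared, so the sketch assumes both are sequences of phoneme-like symbols and scores their agreement with Python's standard `difflib.SequenceMatcher`. The function names `compare_utterances` and `is_consistent` and the 0.8 threshold are hypothetical, and the lip-reading and speech-recognition stages are assumed to have already produced the two sequences.

```python
from difflib import SequenceMatcher
from typing import Sequence


def compare_utterances(first: Sequence[str], second: Sequence[str]) -> float:
    """Return a similarity score in [0, 1] between two symbol sequences.

    `first` stands in for the first utterance data (derived from lip shape)
    and `second` for the second utterance data (derived from the speech
    signal). Representing both as symbol sequences is an assumption.
    """
    matcher = SequenceMatcher(None, list(first), list(second), autojunk=False)
    return matcher.ratio()


def is_consistent(first: Sequence[str], second: Sequence[str],
                  threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: treat the two data as agreeing when the
    similarity reaches the (arbitrarily chosen) threshold."""
    return compare_utterances(first, second) >= threshold


if __name__ == "__main__":
    # Illustrative inputs only; real systems would obtain these from
    # a lip-reading model and a speech recognizer respectively.
    from_lips = ["k", "o", "n", "i", "ch", "i", "w", "a"]
    from_audio = ["k", "o", "m", "i", "ch", "i", "w", "a"]
    score = compare_utterances(from_lips, from_audio)
    print(f"similarity={score:.3f}, consistent={is_consistent(from_lips, from_audio)}")
```

A low score in such a scheme could indicate that the audio channel was disturbed by noise, which is the kind of influence the abstract says the device takes into account; how the actual device uses the comparison result is not stated in this record.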
BR112022001300A 2019-08-02 2020-07-29 Speech processing device, speech processing method, and recording medium BR112022001300A2 (pt)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019142951 2019-08-02
PCT/JP2020/028955 WO2021024869A1 (ja) 2019-08-02 2020-07-29 Speech processing device, speech processing method, and recording medium

Publications (1)

Publication Number Publication Date
BR112022001300A2 (pt) 2022-03-22

Family

ID=74503621

Family Applications (1)

Application Number Title Priority Date Filing Date
BR112022001300A BR112022001300A2 (pt) 2019-08-02 2020-07-29 Speech processing device, speech processing method, and recording medium

Country Status (6)

Country Link
US (1) US20220262363A1 (pt)
EP (1) EP4009629A4 (pt)
JP (1) JP7347511B2 (pt)
CN (1) CN114175147A (pt)
BR (1) BR112022001300A2 (pt)
WO (1) WO2021024869A1 (pt)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116110373B (zh) * 2023-04-12 2023-06-09 深圳市声菲特科技技术有限公司 Voice data acquisition method and related apparatus for an intelligent conference system

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS59182687A (ja) * 1983-04-01 1984-10-17 Nippon Telegr & Teleph Corp <Ntt> Still image communication conference system
US5528728A (en) * 1993-07-12 1996-06-18 Kabushiki Kaisha Meidensha Speaker independent speech recognition system and method using neural network and DTW matching technique
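JPH08187368A (ja) * 1994-05-13 1996-07-23 Matsushita Electric Ind Co Ltd Game device, input device, voice selection device, voice recognition device, and voice response device
JP2004024863A (ja) * 1994-05-13 2004-01-29 Matsushita Electric Ind Co Ltd Lip recognition device and occurrence section recognition device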
AU2001296459A1 (en) * 2000-10-02 2002-04-15 Clarity, L.L.C. Audio visual speech processing
US7257538B2 (en) * 2002-10-07 2007-08-14 Intel Corporation Generating animation from visual and audio input
US20050047664A1 (en) * 2003-08-27 2005-03-03 Nefian Ara Victor Identifying a speaker using markov models
JP5257680B2 (ja) * 2006-03-30 2013-08-07 本田技研工業株式会社 Speech recognition device
JP4462339B2 (ja) * 2007-12-07 2010-05-12 ソニー株式会社 Information processing device, information processing method, and computer program
US8798311B2 (en) * 2009-01-23 2014-08-05 Eldon Technology Limited Scrolling display of electronic program guide utilizing images of user lip movements
JP2010262424A (ja) * 2009-05-01 2010-11-18 Nikon Corp In-vehicle camera system
JP2011013731A (ja) 2009-06-30 2011-01-20 Sony Corp Information processing device, information processing method, and program
JP2011186351A (ja) * 2010-03-11 2011-09-22 Sony Corp Information processing device, information processing method, and program
JP5849761B2 (ja) * 2012-02-22 2016-02-03 日本電気株式会社 Speech recognition system, speech recognition method, and speech recognition program
MX2018002176A (es) 2015-08-20 2018-06-15 Sol Gel Tech Ltd Compositions for topical applications comprising benzoyl peroxide and adapalene.
US9940932B2 (en) * 2016-03-02 2018-04-10 Wipro Limited System and method for speech-to-text conversion
JP2018091954A (ja) * 2016-12-01 2018-06-14 オリンパス株式会社 Speech recognition device and speech recognition method
US11456005B2 (en) * 2017-11-22 2022-09-27 Google Llc Audio-visual speech separation
JP7081164B2 (ja) * 2018-01-17 2022-06-07 株式会社Jvcケンウッド Display control device, communication device, display control method, and communication method
US20190371318A1 (en) * 2018-02-15 2019-12-05 DMAI, Inc. System and method for adaptive detection of spoken language via multiple speech models

Also Published As

Publication number Publication date
JP7347511B2 (ja) 2023-09-20
EP4009629A1 (en) 2022-06-08
WO2021024869A1 (ja) 2021-02-11
US20220262363A1 (en) 2022-08-18
EP4009629A4 (en) 2022-09-21
CN114175147A (zh) 2022-03-11
JPWO2021024869A1 (pt) 2021-02-11

Similar Documents

Publication Publication Date Title
Wang et al. Secure your voice: An oral airflow-based continuous liveness detection for voice assistants
KR102493289B1 (ko) Hotword suppression
Vannoni et al. Individual acoustic variation in fallow deer (Dama dama) common and harsh groans: a source‐filter theory perspective
Yella et al. Overlapping speech detection using long-term conversational features for speaker diarization in meeting room conversations
Stoeger et al. Visualizing sound emission of elephant vocalizations: evidence for two rumble production types
BR112021024196A2 (pt) Systems and methods for machine learning of voice attributes
KR20200118894A (ko) Automated voice translation dubbing for pre-recorded videos
Pan et al. Selective listening by synchronizing speech with lips
SG179092A1 (en) Multifunction multimedia device
DE60326743D1 (de) Fingerprint extraction
US20130262117A1 (en) Spoken dialog system using prominence
US8571873B2 (en) Systems and methods for reconstruction of a smooth speech signal from a stuttered speech signal
JP2016062357A (ja) Speech translation device, method, and program
US20120078625A1 (en) Waveform analysis of speech
BR112022001300A2 (pt) Speech processing device, speech processing method, and recording medium
Li et al. Learning normality is enough: a software-based mitigation against inaudible voice attacks
Fecher The 'Audio-Visual Face Cover Corpus': Investigations into audio-visual speech and speaker recognition when the speaker's face is occluded by facewear.
US11523186B2 (en) Automated audio mapping using an artificial neural network
Jaroslavceva et al. Robot Ego‐Noise Suppression with Labanotation‐Template Subtraction
Li et al. Cross-Domain Audio Deepfake Detection: Dataset and Analysis
BRPI0406956A (pt) Pitch quantization for distributed speech recognition
Heckmann Steps towards more natural human-machine interaction via audio-visual word prominence detection
Abdo et al. Building Audio-Visual Phonetically Annotated Arabic Corpus for Expressive Text to Speech.
Kunka et al. Multimodal English corpus for automatic speech recognition
Kirby et al. Elicitation context does not drive F0 lowering following voiced stops: Evidence from French and Italian