BR112023019971A2 - Adaptive visual speech recognition - Google Patents
Adaptive visual speech recognition
- Publication number
- BR112023019971A2
- Authority
- BR
- Brazil
- Prior art keywords
- speech recognition
- video
- visual speech
- speaker
- adaptive visual
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- General Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Signal Processing (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Image Analysis (AREA)
- Fittings On The Vehicle Exterior For Carrying Loads, And Devices For Holding Or Mounting Articles (AREA)
Abstract
Adaptive visual speech recognition. Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data using an adaptive visual speech recognition model. One of the methods includes receiving a video that includes a plurality of video frames depicting a first speaker; obtaining a first embedding characterizing the first speaker; and processing a first input comprising (i) the video and (ii) the first embedding using a visual speech recognition neural network having a plurality of parameters, wherein the visual speech recognition neural network is configured to process the video and the first embedding in accordance with trained values of the parameters to generate a speech recognition output that defines a sequence of one or more words being spoken by the first speaker in the video.
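The data flow described in the abstract (video frames plus a speaker embedding in, a word sequence out) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the per-frame feature vectors, the concatenation used to combine each frame with the embedding, the single linear layer whose hand-set weights stand in for the "trained values of the parameters", and the greedy blank-collapsing decoder. The names `condition_on_speaker`, `linear`, and `recognize` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def condition_on_speaker(frames: Sequence[Vector],
                         speaker_embedding: Vector) -> List[Vector]:
    """Append the speaker embedding to every per-frame feature vector.

    Concatenation is an assumed way of combining the two inputs.
    """
    return [list(frame) + list(speaker_embedding) for frame in frames]


def linear(x: Vector, weights: Sequence[Vector], bias: Vector) -> Vector:
    """One linear layer: a score per vocabulary entry for one frame."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def recognize(frames: Sequence[Vector], speaker_embedding: Vector,
              weights: Sequence[Vector], bias: Vector,
              vocab: Sequence[str], blank: int = 0) -> List[str]:
    """Map (video, embedding) to a word sequence with a toy model.

    Decoding is greedy: take the best-scoring entry per frame, then
    drop blanks and consecutive repeats (an assumed decoding scheme).
    """
    words: List[str] = []
    previous = blank
    for x in condition_on_speaker(frames, speaker_embedding):
        scores = linear(x, weights, bias)
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != blank and best != previous:
            words.append(vocab[best])
        previous = best
    return words


# Hypothetical inputs: four frames with 2 features each, a 1-dim embedding.
vocab = ["<blank>", "hello", "world"]
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
speaker_embedding = [0.5]
# Hand-set stand-ins for trained parameters (one row per vocabulary entry).
weights = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
bias = [0.0, 0.0, 0.0]

print(recognize(frames, speaker_embedding, weights, bias, vocab))
```

In this sketch the speaker embedding enters the model as extra input features on every frame, so the same parameters can produce different outputs for different speakers without being retrained. A real system would use a learned video encoder and decoder in place of the single linear layer.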
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GR20210100402 | 2021-06-18 | ||
PCT/EP2022/066419 WO2022263570A1 (en) | 2021-06-18 | 2022-06-15 | Adaptive visual speech recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
BR112023019971A2 (pt) | 2023-11-21 |
Family
ID=82385592
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
BR112023019971A BR112023019971A2 (pt) | 2021-06-18 | 2022-06-15 | Adaptive visual speech recognition |
Country Status (9)
Country | Link |
---|---|
US (1) | US20240265911A1 (pt) |
EP (1) | EP4288960A1 (pt) |
JP (1) | JP2024520985A (pt) |
KR (1) | KR102663654B1 (pt) |
CN (1) | CN117121099A (pt) |
AU (1) | AU2022292104B2 (pt) |
BR (1) | BR112023019971A2 (pt) |
CA (1) | CA3214170A1 (pt) |
WO (1) | WO2022263570A1 (pt) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20240095449A1 (en) * | 2022-09-16 | 2024-03-21 | Verizon Patent And Licensing Inc. | Systems and methods for adjusting a transcript based on output from a machine learning model |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101382504B1 (ko) | 2007-05-21 | 2014-04-07 | Samsung Electronics Co., Ltd. | Apparatus and method for generating macro |
US8407057B2 (en) * | 2009-01-21 | 2013-03-26 | Nuance Communications, Inc. | Machine, system and method for user-guided teaching and modifying of voice commands and actions executed by a conversational learning system |
KR102301880B1 (ko) * | 2014-10-14 | 2021-09-14 | Samsung Electronics Co., Ltd. | Electronic device and voice dialogue method thereof |
US10474883B2 (en) * | 2016-11-08 | 2019-11-12 | Nec Corporation | Siamese reconstruction convolutional neural network for pose-invariant face recognition |
US11919531B2 (en) * | 2018-01-31 | 2024-03-05 | Direct Current Capital LLC | Method for customizing motion characteristics of an autonomous vehicle for a user |
EP3766065A1 (en) * | 2018-05-18 | 2021-01-20 | Deepmind Technologies Limited | Visual speech recognition by phoneme prediction |
WO2020048358A1 (en) * | 2018-09-04 | 2020-03-12 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method, system, and computer-readable medium for recognizing speech using depth information |
US10846522B2 (en) * | 2018-10-16 | 2020-11-24 | Google Llc | Speaking classification using audio-visual data |
US20210065712A1 (en) * | 2019-08-31 | 2021-03-04 | Soundhound, Inc. | Automotive visual speech recognition |
JP7442631B2 (ja) * | 2019-10-18 | 2024-03-04 | Google LLC | End-to-end multi-speaker audio-visual automatic speech recognition |
CN111723758B (zh) * | 2020-06-28 | 2023-10-31 | Tencent Technology (Shenzhen) Co., Ltd. | Video information processing method and apparatus, electronic device, and storage medium |
-
2022
- 2022-06-15 BR BR112023019971A patent/BR112023019971A2/pt unknown
- 2022-06-15 US US18/571,553 patent/US20240265911A1/en active Pending
- 2022-06-15 KR KR1020237032681A patent/KR102663654B1/ko active IP Right Grant
- 2022-06-15 EP EP22737416.2A patent/EP4288960A1/en active Pending
- 2022-06-15 AU AU2022292104A patent/AU2022292104B2/en active Active
- 2022-06-15 WO PCT/EP2022/066419 patent/WO2022263570A1/en active Application Filing
- 2022-06-15 JP JP2023560142A patent/JP2024520985A/ja active Pending
- 2022-06-15 CN CN202280025056.5A patent/CN117121099A/zh active Pending
- 2022-06-15 CA CA3214170A patent/CA3214170A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
AU2022292104A1 (en) | 2023-09-21 |
KR102663654B1 (ko) | 2024-05-10 |
WO2022263570A1 (en) | 2022-12-22 |
US20240265911A1 (en) | 2024-08-08 |
CN117121099A (zh) | 2023-11-24 |
AU2022292104B2 (en) | 2024-08-01 |
CA3214170A1 (en) | 2022-12-22 |
EP4288960A1 (en) | 2023-12-13 |
KR20230141932A (ko) | 2023-10-10 |
JP2024520985A (ja) | 2024-05-28 |