BR112021020151A2 - Dialog detector - Google Patents
Dialog detector
Info
- Publication number
- BR112021020151A2
- Authority
- BR
- Brazil
- Prior art keywords
- context
- audio
- frame
- frames
- current frame
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/197—Probabilistic grammars, e.g. word n-grams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/81—Detection of presence or absence of voice signals for discriminating voice from music
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S1/00—Two-channel systems
- H04S1/007—Two-channel systems in which the audio signals are in digital form
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/03—Aspects of down-mixing multi-channel audio to configurations with lower numbers of playback channels, e.g. 7.1 -> 5.1
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Signal Processing (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Auxiliary Devices For Music (AREA)
- Image Analysis (AREA)
Abstract
Dialog detector. The present invention relates to a method of extracting audio features in a dialog detector in response to an input audio signal. The method comprises dividing the input audio signal into a plurality of frames; extracting frame audio features from each frame; determining a set of context windows, each context window including a number of frames surrounding a current frame; deriving, for each context window, a relevant context audio feature for the current frame based on the frame audio features of the frames in the respective context window; and concatenating each context audio feature to form a combined feature vector representing the current frame. Context windows of different lengths can improve response speed and robustness.
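The steps named in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the abstract does not say which frame features or which context statistics are used, so the RMS energy and zero-crossing rate features, the mean and standard deviation statistics, the `(look_back, look_ahead)` window format, and all function names below are hypothetical choices.

```python
import math
from statistics import fmean, pstdev


def split_into_frames(signal, frame_len):
    """Divide the signal into consecutive non-overlapping frames.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def frame_features(frame):
    """Assumed per-frame features: RMS energy and zero-crossing rate."""
    rms = math.sqrt(fmean(x * x for x in frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zcr = crossings / max(len(frame) - 1, 1)
    return [rms, zcr]


def context_feature(feats, center, look_back, look_ahead):
    """Summarise one context window around the current frame.

    Returns the mean and population standard deviation of every frame
    feature over the window (assumed statistics). The window is clipped
    at the start and end of the signal.
    """
    lo = max(0, center - look_back)
    hi = min(len(feats), center + look_ahead + 1)
    window = feats[lo:hi]
    out = []
    for dim in range(len(window[0])):
        column = [f[dim] for f in window]
        out.extend([fmean(column), pstdev(column)])
    return out


def combined_feature_vector(feats, center, context_windows):
    """Concatenate the context features of all windows for one frame."""
    vector = []
    for look_back, look_ahead in context_windows:
        vector.extend(context_feature(feats, center, look_back, look_ahead))
    return vector


if __name__ == "__main__":
    # Synthetic input: a 5-cycle sine over 1000 samples.
    signal = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]
    frames = split_into_frames(signal, frame_len=50)
    feats = [frame_features(f) for f in frames]
    # Hypothetical short, medium and long (look-back only) windows.
    windows = [(2, 2), (5, 5), (10, 0)]
    vec = combined_feature_vector(feats, center=10, context_windows=windows)
    print(len(vec))  # 3 windows x 2 features x 2 statistics = 12
```

Using several window lengths at once reflects the trade-off stated in the abstract: a short window reacts quickly to changes around the current frame, while a longer one gives a steadier estimate.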
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2019083173 | 2019-04-18 | ||
US201962840839P | 2019-04-30 | 2019-04-30 | |
EP19192553 | 2019-08-20 | ||
PCT/US2020/028001 WO2020214541A1 (en) | 2019-04-18 | 2020-04-13 | A dialog detector |
Publications (1)
Publication Number | Publication Date |
---|---|
BR112021020151A2 (pt) | 2021-12-14 |
Family
ID=70480833
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
BR112021020151A BR112021020151A2 (pt) | 2019-04-18 | 2020-04-13 | Dialog detector |
Country Status (7)
Country | Link |
---|---|
US (1) | US20220199074A1 (pt) |
EP (1) | EP3956890B1 (pt) |
JP (1) | JP2022529437A (pt) |
KR (1) | KR20210154807A (pt) |
CN (1) | CN113748461A (pt) |
BR (1) | BR112021020151A2 (pt) |
WO (1) | WO2020214541A1 (pt) |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6785645B2 (en) * | 2001-11-29 | 2004-08-31 | Microsoft Corporation | Real-time speech and music classifier |
AUPS270902A0 (en) * | 2002-05-31 | 2002-06-20 | Canon Kabushiki Kaisha | Robust detection and classification of objects in audio using limited training data |
KR100883656B1 (ko) * | 2006-12-28 | 2009-02-18 | Samsung Electronics Co., Ltd. | Method and apparatus for classifying audio signals, and method and apparatus for encoding/decoding audio signals using the same |
US9196249B1 (en) * | 2009-07-02 | 2015-11-24 | Alon Konchitsky | Method for identifying speech and music components of an analyzed audio signal |
US9401153B2 (en) * | 2012-10-15 | 2016-07-26 | Digimarc Corporation | Multi-mode audio recognition and auxiliary data encoding and decoding |
CN104885151B (zh) * | 2012-12-21 | 2017-12-22 | Dolby Laboratories Licensing Corporation | Object clustering for rendering object-based audio content based on perceptual criteria |
US9767791B2 (en) * | 2013-05-21 | 2017-09-19 | Speech Morphing Systems, Inc. | Method and apparatus for exemplary segment classification |
EP4379714A2 (en) * | 2013-09-12 | 2024-06-05 | Dolby Laboratories Licensing Corporation | Loudness adjustment for downmixed audio content |
US10181322B2 (en) * | 2013-12-20 | 2019-01-15 | Microsoft Technology Licensing, Llc | Multi-user, multi-domain dialog system |
US9620105B2 (en) * | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
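KR102413692B1 (ko) * | 2015-07-24 | 2022-06-27 | Samsung Electronics Co., Ltd. | Apparatus and method for calculating an acoustic score for speech recognition, speech recognition apparatus and method, and electronic device |
CN117351966A (zh) * | 2016-09-28 | 2024-01-05 | Huawei Technologies Co., Ltd. | Method, apparatus and system for processing multi-channel audio signals |
CN109215667B (zh) * | 2017-06-29 | 2020-12-22 | Huawei Technologies Co., Ltd. | Delay estimation method and apparatus |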
-
2020
- 2020-04-13 KR KR1020217032867A patent/KR20210154807A/ko unknown
- 2020-04-13 JP JP2021561019A patent/JP2022529437A/ja active Pending
- 2020-04-13 US US17/604,379 patent/US20220199074A1/en active Pending
- 2020-04-13 CN CN202080029059.7A patent/CN113748461A/zh active Pending
- 2020-04-13 WO PCT/US2020/028001 patent/WO2020214541A1/en active Application Filing
- 2020-04-13 BR BR112021020151A patent/BR112021020151A2/pt unknown
- 2020-04-13 EP EP20723256.2A patent/EP3956890B1/en active Active
Also Published As
Publication number | Publication date |
---|---|
EP3956890A1 (en) | 2022-02-23 |
JP2022529437A (ja) | 2022-06-22 |
CN113748461A (zh) | 2021-12-03 |
EP3956890B1 (en) | 2024-02-21 |
WO2020214541A1 (en) | 2020-10-22 |
US20220199074A1 (en) | 2022-06-23 |
KR20210154807A (ko) | 2021-12-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EA201791569A1 (ru) | | Method and system for determining the status of a malignant tumor |
SG10201909133YA (en) | | Systems and methods for matching and scoring sameness |
MX2016004674A (es) | | System and method for determining the execution sequence of a plurality of tasks |
BR112018073635A2 (pt) | | Identity authentication method and identity authentication apparatus |
BR112019002756A2 (pt) | | Resource selection method, apparatus and device |
MX361846B (es) | | Method and apparatus for area identification |
BR112015022002A2 (pt) | | Apparatus for determining vital sign information of a subject, and method for determining vital sign information of a subject |
GB2511696A (en) | | Methods, systems, and computer readable media for reducing the impact of false downlink control information (DCI) detection in long term evolution (LTE) |
MY159100A (en) | | Apparatus, system and method for detecting and preventing malicious scripts using code pattern-based static analysis and api flow-based dynamic analysis |
BR112015006774A2 (pt) | | Control channel detection method and user equipment |
BR112017020635A2 (pt) | | Motion information derivation mode determination in video coding |
BR112018010073A2 (pt) | | Head tracking for parametric binaural output system and method |
BR112015018597A2 (pt) | | Method and device for audio recognition |
MX359781B (es) | | Method and device for hiding privacy information |
BR112014030215A2 (pt) | | Methods and apparatus for determining ratings information for online media presentations |
EP2713314A3 (en) | | Image processing device and image processing method |
EP3182409A3 (en) | | Determining the inter-channel time difference of a multi-channel audio signal |
BR112018008857A2 (pt) | | Electronic device, method of scanning a channel in an electronic device, and non-transitory computer-readable storage medium |
GB2507215A (en) | | System and method for determining interpersonal relationship influence information using textual content from interpersonal interactions |
MX365897B (es) | | Method and apparatus for determining similarity, and terminal |
BR112018010161A8 (pt) | | System and method for evaluating a detector in an imaging device |
BR112021026664A2 (pt) | | Automated video cropping using relative importance of identified objects |
BR112014028679A2 (pt) | | Page phase time |
BR112015000879A2 (pt) | | System and method for migration velocity modeling |
BR112019005938A8 (pt) | | Method and system for real-time content delivery |