JP7757398B2 - User voice activity detection using dynamic classifier - Google Patents

User voice activity detection using dynamic classifier

Info

Publication number
JP7757398B2
Authority
JP
Japan
Prior art keywords
audio data
microphone
processors
dynamic classifier
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2023520368A
Other languages
English (en)
Japanese (ja)
Other versions
JP2023545981A5
JP2023545981A (ja)
Inventor
Shahbazi Mirzahasanloo, Taher
Alves, Rogerio Guedes
Visser, Erik
Kim, Lae-Hoon
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc
Publication of JP2023545981A
Publication of JP2023545981A5
Application granted
Publication of JP7757398B2
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3206 Monitoring of events, devices or parameters that trigger a change in power modality
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/3287 Power saving characterised by the action undertaken by switching off individual functional units in the computer system
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/12 Speech classification or search using dynamic programming techniques, e.g. dynamic time warping [DTW]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/02 Casings; Cabinets; Supports therefor; Mountings therein
    • H04R1/028 Casings; Cabinets; Supports therefor; Mountings therein associated with devices performing functions other than acoustics, e.g. electric candles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers
    • H04R3/005 Circuits for transducers for combining the signals of two or more microphones
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L2025/783 Detection of presence or absence of voice signals based on threshold decision
    • G10L2025/786 Adaptive threshold
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
    • H04R2420/00 Details of connection covered by H04R, not provided for in its groups
    • H04R2420/07 Applications of wireless loudspeakers or wireless microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Power Sources (AREA)
  • Telephone Function (AREA)
  • User Interface Of Digital Computer (AREA)
  • Traffic Control Systems (AREA)
  • Emergency Alarm Devices (AREA)
JP2023520368A 2020-10-08 2021-09-17 User voice activity detection using dynamic classifier Active JP7757398B2 (ja)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US202063089507P 2020-10-08 2020-10-08
US63/089,507 2020-10-08
US17/308,593 US11783809B2 (en) 2020-10-08 2021-05-05 User voice activity detection using dynamic classifier
US17/308,593 2021-05-05
PCT/US2021/071503 WO2022076963A1 (en) 2020-10-08 2021-09-17 User voice activity detection using dynamic classifier

Publications (3)

Publication Number Publication Date
JP2023545981A (ja) 2023-11-01
JP2023545981A5 2024-08-29
JP7757398B2 (ja) 2025-10-21

Family

ID=81079407

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2023520368A Active JP7757398B2 (ja) 2020-10-08 2021-09-17 User voice activity detection using dynamic classifier

Country Status (7)

Country Link
US (1) US11783809B2
EP (1) EP4226371B1
JP (1) JP7757398B2
KR (1) KR20230084154A
CN (1) CN116249952A
BR (1) BR112023005828A2
WO (1) WO2022076963A1

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11776550B2 (en) * 2021-03-09 2023-10-03 Qualcomm Incorporated Device operation based on dynamic classifier
CN116312568B (zh) * 2022-09-07 2026-02-24 南京龙垣信息科技有限公司 Voice activity detection method, apparatus, computer device, and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
US20160267908A1 (en) 2015-03-12 2016-09-15 Sony Corporation Low-power voice command detector
US20160360326A1 (en) 2015-06-02 2016-12-08 Oticon A/S Peer to peer hearing system
US20180146307A1 (en) 2016-11-24 2018-05-24 Oticon A/S Hearing device comprising an own voice detector
US20180350394A1 (en) 2017-05-31 2018-12-06 Bose Corporation Voice activity detection for communication headset
JP2020102836A (ja) 2018-12-20 2020-07-02 GN Hearing A/S Hearing device using own-voice detection and related method
US20200294508A1 (en) 2019-03-13 2020-09-17 Oticon A/S Hearing device or system comprising a user identification unit

Family Cites Families (27)

Publication number Priority date Publication date Assignee Title
US20080153537A1 (en) * 2006-12-21 2008-06-26 Charbel Khawand Dynamically learning a user's response via user-preferred audio settings in response to different noise environments
US8194882B2 (en) * 2008-02-29 2012-06-05 Audience, Inc. System and method for providing single microphone noise suppression fallback
WO2009151578A2 (en) * 2008-06-09 2009-12-17 The Board Of Trustees Of The University Of Illinois Method and apparatus for blind signal recovery in noisy, reverberant environments
US9210503B2 (en) * 2009-12-02 2015-12-08 Audience, Inc. Audio zoom
WO2011133924A1 (en) * 2010-04-22 2011-10-27 Qualcomm Incorporated Voice activity detection
US20110288860A1 (en) 2010-05-20 2011-11-24 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for processing of speech signals using head-mounted microphone pair
US8898058B2 (en) * 2010-10-25 2014-11-25 Qualcomm Incorporated Systems, methods, and apparatus for voice activity detection
US9147397B2 (en) * 2013-10-29 2015-09-29 Knowles Electronics, Llc VAD detection apparatus and method of operating the same
KR20150105847A (ko) * 2014-03-10 2015-09-18 삼성전기주식회사 Method and apparatus for detecting speech segments
CN107293287B (zh) * 2014-03-12 2021-10-26 华为技术有限公司 Method and apparatus for detecting an audio signal
WO2016116160A1 (en) * 2015-01-22 2016-07-28 Sonova Ag Hearing assistance system
US10186277B2 (en) * 2015-03-19 2019-01-22 Intel Corporation Microphone array speech enhancement
US10475471B2 (en) * 2016-10-11 2019-11-12 Cirrus Logic, Inc. Detection of acoustic impulse events in voice applications using a neural network
US10242696B2 (en) * 2016-10-11 2019-03-26 Cirrus Logic, Inc. Detection of acoustic impulse events in voice applications
US9843861B1 (en) * 2016-11-09 2017-12-12 Bose Corporation Controlling wind noise in a bilateral microphone array
US9930447B1 (en) * 2016-11-09 2018-03-27 Bose Corporation Dual-use bilateral microphone array
US10499139B2 (en) * 2017-03-20 2019-12-03 Bose Corporation Audio signal processing for noise reduction
US10264354B1 (en) * 2017-09-25 2019-04-16 Cirrus Logic, Inc. Spatial cues from broadside detection
US10096328B1 (en) * 2017-10-06 2018-10-09 Intel Corporation Beamformer system for tracking of speech and noise in a dynamic environment
EP3721429A2 (en) * 2017-12-07 2020-10-14 HED Technologies Sarl Voice aware audio system and method
US10885907B2 (en) * 2018-02-14 2021-01-05 Cirrus Logic, Inc. Noise reduction system and method for audio device with multiple microphones
US11094316B2 (en) * 2018-05-04 2021-08-17 Qualcomm Incorporated Audio analytics for natural language processing
US11062727B2 (en) * 2018-06-13 2021-07-13 Ceva D.S.P Ltd. System and method for voice activity detection
EP3675517B1 (en) * 2018-12-31 2021-10-20 GN Audio A/S Microphone apparatus and headset
US10964314B2 (en) * 2019-03-22 2021-03-30 Cirrus Logic, Inc. System and method for optimized noise reduction in the presence of speech distortion using adaptive microphone array
US11328740B2 (en) * 2019-08-07 2022-05-10 Magic Leap, Inc. Voice onset detection
US11917384B2 (en) * 2020-03-27 2024-02-27 Magic Leap, Inc. Method of waking a device using spoken voice commands

Also Published As

Publication number Publication date
US11783809B2 (en) 2023-10-10
US20220115007A1 (en) 2022-04-14
BR112023005828A2 (pt) 2023-05-02
EP4226371B1 (en) 2025-08-20
KR20230084154A (ko) 2023-06-12
EP4226371A1 (en) 2023-08-16
WO2022076963A1 (en) 2022-04-14
CN116249952A (zh) 2023-06-09
JP2023545981A (ja) 2023-11-01
EP4226371C0 (en) 2025-08-20

Similar Documents

Publication Publication Date Title
US11694710B2 (en) Multi-stream target-speech detection and channel fusion
US11823679B2 (en) Method and system of audio false keyphrase rejection using speaker recognition
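CN111699528B (zh) Electronic device and method for executing functions of the electronic device
JP7498560B2 (ja) Systems and methods
CN112074900A (zh) Audio analytics for natural language processing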
US11776550B2 (en) Device operation based on dynamic classifier
US11437021B2 (en) Processing audio signals
US11805360B2 (en) Noise suppression using tandem networks
CN110556103A (zh) Audio signal processing method, apparatus, system, device, and storage medium
EP3274988A1 (en) Controlling electronic device based on direction of speech
JP2018533051A (ja) Collaborative audio processing
JP7757398B2 (ja) User voice activity detection using dynamic classifier
US12142288B2 (en) Acoustic aware voice user interface
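CN115331672B (zh) Device control method, apparatus, electronic device, and storage medium
CN119317959A (zh) Representation learning using informed masking for speech and other audio applications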
US20240419731A1 (en) Knowledge-based audio scene graph
EP4728513A1 (en) Knowledge-based audio scene graph
CN121281521A (zh) Method, apparatus, system, and server for recognizing voice commands during voice interaction

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20240821

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20240821

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20250527

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20250815

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20250909

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20251008

R150 Certificate of patent or registration of utility model

Ref document number: 7757398

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150