KR20210003307A - End-to-end speaker recognition using deep neural network - Google Patents
End-to-end speaker recognition using deep neural network
- Publication number
- KR20210003307A (application number KR1020207037861A)
- Authority
- KR
- South Korea
- Prior art keywords
- computer
- speaker
- neural network
- built
- speech sample
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/08—Use of distortion metrics or a particular distance between probe pattern and reference templates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Business, Economics & Management (AREA)
- Game Theory and Decision Science (AREA)
- Image Analysis (AREA)
- Telephonic Communication Services (AREA)
- Image Processing (AREA)
Abstract
Description
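FIG. 2A illustrates the general structure of a deep neural network having a triplet network architecture, for use in training, according to an exemplary embodiment of the present invention.
FIG. 2B illustrates the general structure of a deep neural network having a triplet network architecture, for use in enrollment and testing for a particular user, according to an exemplary embodiment of the present invention.
FIG. 3A illustrates a specific example of the structure of a deep neural network having a triplet network architecture, designed to receive pre-processed speech samples, for use in training.
FIG. 3B illustrates a specific example of the structure of a deep neural network architecture for use in enrollment and testing for a particular user.
FIG. 3C illustrates another specific example of the structure of a deep neural network having a triplet network architecture, designed to process raw speech samples, for use in training.
FIG. 4 illustrates a flowchart of a general process for speaker recognition according to an exemplary embodiment of the present invention.
FIG. 5 illustrates a flowchart of a process that uses a deep neural network with a triplet network architecture to perform speaker recognition, according to an exemplary embodiment of the present invention.
FIG. 6 is a flowchart of a process for training a deep neural network having a triplet network architecture, according to an exemplary embodiment of the present invention.
FIG. 7 illustrates an example of the structure of a deep neural network having a triplet network architecture, designed to incorporate a softmax function and pre-trained specifically to perform speaker identification.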
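FIG. 7 pairs the network with a softmax function and pre-trains it for speaker identification. The figure itself is not reproduced in this text, so the following is only a rough sketch: assuming NumPy, a softmax output layer over a closed set of training speakers, scored with mean cross-entropy, might look like this. The function names, argument shapes, and the use of cross-entropy are illustrative assumptions, not details taken from the patent.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def speaker_id_loss(embeddings, weights, bias, speaker_labels) -> float:
    """Mean cross-entropy of a softmax layer over a closed set of speakers.

    embeddings:     (batch, dim) outputs of a feed-forward network (hypothetical)
    weights:        (dim, num_speakers) softmax-layer weights (hypothetical)
    bias:           (num_speakers,) softmax-layer bias
    speaker_labels: (batch,) integer index of the true speaker of each row
    """
    probs = softmax(embeddings @ weights + bias)
    # Probability assigned to the correct speaker for each row of the batch.
    picked = probs[np.arange(len(speaker_labels)), speaker_labels]
    return float(-np.mean(np.log(picked)))
```

With all-zero weights every speaker is equally likely, so the loss equals log(N) for N speakers. Pre-training would lower this value by gradient descent; reusing the resulting weights is consistent with the initialization option recited in claim 20.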
Claims (20)
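- 1. A computer-implemented method, comprising:
receiving, by a computer, a recognition speech sample;
executing, by the computer, a neural network on the recognition speech sample to generate a voiceprint of the recognition speech sample, wherein the neural network is part of a triplet neural architecture trained using a dual set of positive speech samples and a cohort set of negative speech samples;
comparing, by the computer, the generated voiceprint with at least one stored voiceprint; and
performing, by the computer, speaker recognition on the recognition speech sample based on the comparison.
- 2. The computer-implemented method of claim 1, wherein performing the speaker recognition comprises:
comparing, by the computer, the generated voiceprint with at least one stored voiceprint, wherein the at least one stored voiceprint is associated with a speaker to be verified.
- 3. The computer-implemented method of claim 1, wherein performing the speaker recognition comprises:
comparing, by the computer, the generated voiceprint with at least one stored voiceprint, wherein the at least one stored voiceprint is associated with a closed set of known speakers.
- 4. The computer-implemented method of claim 3, wherein the closed set of known speakers is a blacklist associated with telephone fraud.
- 5. The computer-implemented method of claim 1, further comprising pre-processing, by the computer, the recognition speech sample before executing the neural network.
- 6. The computer-implemented method of claim 5, wherein pre-processing the recognition speech sample comprises:
partitioning, by the computer, the recognition speech sample into windows of a predetermined duration with a predetermined window shift; and
extracting, by the computer, from each window, features to be input to the neural network.
- 7. The computer-implemented method of claim 1, wherein performing the speaker recognition comprises at least one of:
identifying, by the computer, a speaker associated with the recognition speech sample; and
verifying, by the computer, the speaker associated with the recognition speech sample.
- 8. A system, comprising: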
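a non-transitory storage medium storing a plurality of computer program instructions; and
a processor electrically coupled to the non-transitory storage medium and configured to execute the plurality of computer program instructions to:
receive a recognition speech sample;
deploy a neural network on the recognition speech sample to generate a voiceprint of the recognition speech sample, wherein the neural network is part of a triplet neural architecture trained using a dual set of positive speech samples and a cohort set of negative speech samples;
compare the generated voiceprint with at least one stored voiceprint; and
perform speaker recognition on the recognition speech sample based on the comparison.
- 9. The system of claim 8, wherein the processor is further configured to execute the computer program instructions to compare the generated voiceprint with the at least one stored voiceprint,
wherein the at least one stored voiceprint is associated with a speaker to be verified.
- 10. The system of claim 8, wherein the processor is further configured to execute the computer program instructions to compare the generated voiceprint with the at least one stored voiceprint,
wherein the at least one stored voiceprint is associated with a closed set of known speakers.
- 11. The system of claim 10, wherein the closed set of known speakers is a blacklist associated with telephone fraud.
- 12. The system of claim 8, wherein the processor is further configured to execute the computer program instructions to pre-process the recognition speech sample before executing the neural network.
- 13. The system of claim 12, wherein, to pre-process the recognition speech sample,
the processor is further configured to execute the computer program instructions to partition the recognition speech sample into windows of a predetermined duration with a predetermined window shift, and to extract, from each window, features to be input to the neural network.
- 14. The system of claim 8, wherein the speaker recognition comprises at least one of:
a process of identifying a speaker associated with the recognition speech sample; and
a process of verifying the speaker associated with the recognition speech sample.
- 15. A computer-implemented method, comprising: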
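feeding, by a computer, a first set of one or more positive speech samples attributed to a speaker into a first feed-forward neural network to generate a first embedding vector;
feeding, by the computer, a second set of one or more positive speech samples attributed to the speaker into a second feed-forward neural network to generate a second embedding vector;
feeding, by the computer, a cohort set of negative speech samples not attributed to the speaker into a third feed-forward neural network to generate a set of embedding vectors;
calculating, by the computer, a loss function based on the first embedding vector, the second embedding vector, and the set of embedding vectors; and
back-propagating, by the computer, the loss function to modify one or more connection weights in each of the first, second, and third feed-forward neural networks.
- 16. The computer-implemented method of claim 15, wherein the loss function is based on:
a positive distance corresponding to a similarity between the first and second embedding vectors; and
a negative distance corresponding to a similarity between the first embedding vector and the embedding vector, in the set of embedding vectors, that is most similar to the first embedding vector.
- 17. The computer-implemented method of claim 16, wherein the similarity between the first and second embedding vectors and the similarity between the first embedding vector and the most similar embedding vector in the set of embedding vectors are each based on a cosine measure.
- 18. The computer-implemented method of claim 15, wherein the loss function is based on an equal error rate (EER) metric associated with the similarity between the first and second embedding vectors and the similarity between the first embedding vector and the most similar embedding vector in the set of embedding vectors.
- 19. The computer-implemented method of claim 15, further comprising initializing, by the computer, at least one of the first, second, and third feed-forward neural networks using random connection weights.
- 20. The computer-implemented method of claim 15, further comprising initializing, by the computer, at least one of the first, second, and third feed-forward neural networks using connection weights of a deep neural network trained to perform speaker identification on a closed set of speakers.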
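Claims 6 and 13 recite splitting the recognition speech sample into windows of a predetermined duration with a predetermined shift, then extracting features from each window. A minimal NumPy sketch follows; the 20 ms window, 10 ms shift, Hamming taper, and log power spectrum are assumptions chosen for illustration, since the claims fix neither the durations nor the feature type.

```python
import numpy as np


def frame_signal(samples, sample_rate, window_ms=20.0, shift_ms=10.0):
    """Split a 1-D signal into overlapping windows of fixed duration and shift.

    window_ms and shift_ms are hypothetical defaults, not values from the claims.
    Returns an array of shape (n_frames, samples_per_window).
    """
    samples = np.asarray(samples, dtype=float)
    win = int(round(sample_rate * window_ms / 1000.0))
    hop = int(round(sample_rate * shift_ms / 1000.0))
    if len(samples) < win:
        return np.empty((0, win))
    n_frames = 1 + (len(samples) - win) // hop
    # Row i holds the sample indices i*hop ... i*hop + win - 1.
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return samples[idx]


def log_power_features(frames):
    """Per-window log power spectrum, used here as a stand-in feature type."""
    tapered = frames * np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(tapered, axis=1)) ** 2
    return np.log(power + 1e-10)  # small floor avoids log(0) on silent frames
```

At 8 kHz, one second of audio yields 99 windows of 160 samples, and each window yields 81 spectral bins. A real front end would more likely use filterbank or cepstral features, but the windowing step is the same.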
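Claims 15 to 17 describe a loss built from a positive similarity (first vs. second embedding vector) and a negative similarity (first embedding vector vs. the most similar cohort embedding), both measured with a cosine measure. The claims do not give the exact formula, so the hinge form with a margin below is an assumption. This sketch only evaluates the loss for a single triplet and leaves back-propagation to an autodiff framework.

```python
import numpy as np


def cosine(a, b) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def cohort_triplet_loss(anchor, positive, cohort, margin=0.1):
    """Hinge-style loss from positive and hardest-negative cosine similarities.

    anchor, positive: (dim,) embeddings of two samples from the same speaker
    cohort:           (n, dim) embeddings of samples NOT from that speaker
    margin:           hypothetical hyperparameter, not specified in the claims
    Returns (loss, s_pos, s_neg).
    """
    s_pos = cosine(anchor, positive)
    # Hardest negative: the cohort embedding most similar to the anchor.
    s_neg = max(cosine(anchor, c) for c in cohort)
    # Zero loss once the positive beats the hardest negative by the margin.
    loss = max(0.0, s_neg - s_pos + margin)
    return loss, s_pos, s_neg
```

Using only the most similar cohort embedding, rather than an average over the cohort, mirrors the wording of claim 16 and focuses each update on the negative sample that is hardest to separate.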
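Claims 1 to 4 and 7 compare a generated voiceprint with stored voiceprints, either to verify one claimed speaker or to identify a speaker within a closed set such as a telephone-fraud blacklist. The following sketch assumes cosine scoring and a single fixed decision threshold of 0.7; both choices and the helper names are hypothetical and are not stated in the claims.

```python
import numpy as np


def cosine(a, b) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify(voiceprint, enrolled, threshold=0.7) -> bool:
    """Verification: accept the claimed identity if the score reaches the threshold."""
    return cosine(voiceprint, enrolled) >= threshold


def identify(voiceprint, known, threshold=0.7):
    """Closed-set identification against a dict of {speaker_id: stored voiceprint}.

    Returns the best-scoring speaker_id, or None if no score reaches the threshold.
    """
    best_id, best_score = None, -1.0
    for speaker_id, stored in known.items():
        score = cosine(voiceprint, stored)
        if score > best_score:
            best_id, best_score = speaker_id, score
    return best_id if best_score >= threshold else None
```

For a blacklist, a non-None result from `identify` would flag the caller as matching a known fraudster; `None` means the voiceprint is not close enough to any stored entry.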
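Claim 18 bases the loss on an equal error rate (EER) metric but does not say how the EER is computed or made trainable. For reference only, this sketch shows one common way to estimate EER from genuine (target) and impostor (non-target) similarity scores by sweeping every observed score as a candidate threshold. It is not presented as the patent's training procedure.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores) -> float:
    """Approximate EER: the error rate where false accepts equal false rejects.

    A trial is accepted when its score is >= the threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for t in np.unique(np.concatenate([target, nontarget])):
        frr = np.mean(target < t)      # genuine trials wrongly rejected
        far = np.mean(nontarget >= t)  # impostor trials wrongly accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)
```

Perfectly separated scores give an EER of 0. Because this threshold sweep is not differentiable, using EER directly inside a back-propagated loss would require some smoothed approximation, which the claims leave unspecified.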
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/262,748 US9824692B1 (en) | 2016-09-12 | 2016-09-12 | End-to-end speaker recognition using deep neural network |
PCT/US2017/050927 WO2018049313A1 (en) | 2016-09-12 | 2017-09-11 | End-to-end speaker recognition using deep neural network |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020207002634A Division KR102198835B1 (ko) | 2016-09-12 | 2017-09-11 | End-to-end speaker recognition using deep neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20210003307A true KR20210003307A (ko) | 2021-01-11 |
KR102239129B1 KR102239129B1 (ko) | 2021-04-09 |
Family ID: 59955660
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020197010208A KR102072782B1 (ko) | 2016-09-12 | 2017-09-11 | End-to-end speaker recognition using deep neural network |
KR1020207037861A KR102239129B1 (ko) | 2016-09-12 | 2017-09-11 | End-to-end speaker recognition using deep neural network |
KR1020207002634A KR102198835B1 (ko) | 2016-09-12 | 2017-09-11 | End-to-end speaker recognition using deep neural network |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020197010208A KR102072782B1 (ko) | 2016-09-12 | 2017-09-11 | End-to-end speaker recognition using deep neural network |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020207002634A KR102198835B1 (ko) | 2016-09-12 | 2017-09-11 | End-to-end speaker recognition using deep neural network |
Country Status (8)
Country | Link |
---|---|
US (5) | US9824692B1 (ko) |
EP (1) | EP3501025B1 (ko) |
JP (2) | JP7173974B2 (ko) |
KR (3) | KR102072782B1 (ko) |
AU (3) | AU2017322591B2 (ko) |
CA (3) | CA3075049C (ko) |
ES (1) | ES2883326T3 (ko) |
WO (1) | WO2018049313A1 (ko) |
Family Cites Families (102)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS62231993A (ja) | 1986-03-25 | 1987-10-12 | インタ−ナシヨナル ビジネス マシ−ンズ コ−ポレ−シヨン | 音声認識方法 |
CA1311059C (en) | 1986-03-25 | 1992-12-01 | Bruce Allen Dautrich | Speaker-trained speech recognizer having the capability of detecting confusingly similar vocabulary words |
US4817156A (en) | 1987-08-10 | 1989-03-28 | International Business Machines Corporation | Rapidly training a speech recognizer to a subsequent speaker given training data of a reference speaker |
US5072452A (en) | 1987-10-30 | 1991-12-10 | International Business Machines Corporation | Automatic determination of labels and Markov word models in a speech recognition system |
US5461697A (en) * | 1988-11-17 | 1995-10-24 | Sekisui Kagaku Kogyo Kabushiki Kaisha | Speaker recognition system using neural network |
JP2524472B2 (ja) | 1992-09-21 | 1996-08-14 | インターナショナル・ビジネス・マシーンズ・コーポレイション | 電話回線利用の音声認識システムを訓練する方法 |
US5867562A (en) | 1996-04-17 | 1999-02-02 | Scherer; Gordon F. | Call processing system with call screening |
US6975708B1 (en) | 1996-04-17 | 2005-12-13 | Convergys Cmg Utah, Inc. | Call processing system with call screening |
US5835890A (en) | 1996-08-02 | 1998-11-10 | Nippon Telegraph And Telephone Corporation | Method for speaker adaptation of speech models recognition scheme using the method and recording medium having the speech recognition method recorded thereon |
WO1998014934A1 (en) | 1996-10-02 | 1998-04-09 | Sri International | Method and system for automatic text-independent grading of pronunciation for language instruction |
AU5359498A (en) | 1996-11-22 | 1998-06-10 | T-Netix, Inc. | Subword-based speaker verification using multiple classifier fusion, with channel, fusion, model, and threshold adaptation |
JP2991144B2 (ja) | 1997-01-29 | 1999-12-20 | 日本電気株式会社 | 話者認識装置 |
US5995927A (en) | 1997-03-14 | 1999-11-30 | Lucent Technologies Inc. | Method for performing stochastic matching for use in speaker verification |
US6519561B1 (en) | 1997-11-03 | 2003-02-11 | T-Netix, Inc. | Model adaptation of neural tree networks and other fused models for speaker verification |
US6009392A (en) | 1998-01-15 | 1999-12-28 | International Business Machines Corporation | Training speech recognition by matching audio segment frequency of occurrence with frequency of words and letter combinations in a corpus |
ATE235733T1 (de) | 1998-05-11 | 2003-04-15 | Siemens Ag | Anordnung und verfahren zur erkennung eines vorgegebenen wortschatzes in gesprochener sprache durch einen rechner |
US6141644A (en) | 1998-09-04 | 2000-10-31 | Matsushita Electric Industrial Co., Ltd. | Speaker verification and speaker identification based on eigenvoices |
US6411930B1 (en) | 1998-11-18 | 2002-06-25 | Lucent Technologies Inc. | Discriminative gaussian mixture models for speaker verification |
CN1148720C (zh) | 1999-03-11 | 2004-05-05 | 英国电讯有限公司 | 说话者识别 |
US6463413B1 (en) | 1999-04-20 | 2002-10-08 | Matsushita Electrical Industrial Co., Ltd. | Speech recognition training for small hardware devices |
KR100307623B1 (ko) | 1999-10-21 | 2001-11-02 | 윤종용 | 엠.에이.피 화자 적응 조건에서 파라미터의 분별적 추정 방법 및 장치 및 이를 각각 포함한 음성 인식 방법 및 장치 |
US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US7318032B1 (en) | 2000-06-13 | 2008-01-08 | International Business Machines Corporation | Speaker recognition method based on structured speaker modeling and a “Pickmax” scoring technique |
DE10047724A1 (de) | 2000-09-27 | 2002-04-11 | Philips Corp Intellectual Pty | Method for determining an eigenspace for representing a plurality of training speakers |
DE10047723A1 (de) | 2000-09-27 | 2002-04-11 | Philips Corp Intellectual Pty | Method for determining an eigenspace for representing a plurality of training speakers |
EP1197949B1 (en) | 2000-10-10 | 2004-01-07 | Sony International (Europe) GmbH | Avoiding online speaker over-adaptation in speech recognition |
US7209881B2 (en) | 2001-12-20 | 2007-04-24 | Matsushita Electric Industrial Co., Ltd. | Preparing acoustic models by sufficient statistics and noise-superimposed speech data |
US7457745B2 (en) | 2002-12-03 | 2008-11-25 | Hrl Laboratories, Llc | Method and apparatus for fast on-line automatic speaker/environment adaptation for speech/speaker recognition in the presence of changing environments |
US7184539B2 (en) | 2003-04-29 | 2007-02-27 | International Business Machines Corporation | Automated call center transcription services |
US20050039056A1 (en) | 2003-07-24 | 2005-02-17 | Amit Bagga | Method and apparatus for authenticating a user using three party question protocol |
US7328154B2 (en) | 2003-08-13 | 2008-02-05 | Matsushita Electric Industrial Co., Ltd. | Bubble splitting for compact acoustic modeling |
US7447633B2 (en) | 2004-11-22 | 2008-11-04 | International Business Machines Corporation | Method and apparatus for training a text independent speaker recognition system using speech data with text labels |
US20120253805A1 (en) | 2005-04-21 | 2012-10-04 | Anthony Rajakumar | Systems, methods, and media for determining fraud risk from audio signals |
EP1889255A1 (en) | 2005-05-24 | 2008-02-20 | Loquendo S.p.A. | Automatic text-independent, language-independent speaker voice-print creation and speaker recognition |
US7539616B2 (en) | 2006-02-20 | 2009-05-26 | Microsoft Corporation | Speaker authentication using adapted background models |
US8099288B2 (en) | 2007-02-12 | 2012-01-17 | Microsoft Corp. | Text-dependent speaker verification |
KR101756834B1 (ko) | 2008-07-14 | 2017-07-12 | 삼성전자주식회사 | Method and apparatus for encoding and decoding audio/speech signals |
US8886663B2 (en) | 2008-09-20 | 2014-11-11 | Securus Technologies, Inc. | Multi-party conversation analyzer and logger |
EP2182512A1 (en) | 2008-10-29 | 2010-05-05 | BRITISH TELECOMMUNICATIONS public limited company | Speaker verification |
US8442824B2 (en) | 2008-11-26 | 2013-05-14 | Nuance Communications, Inc. | Device, system, and method of liveness detection utilizing voice biometrics |
EP2221805B1 (en) * | 2009-02-20 | 2014-06-25 | Nuance Communications, Inc. | Method for automated training of a plurality of artificial neural networks |
US8463606B2 (en) | 2009-07-13 | 2013-06-11 | Genesys Telecommunications Laboratories, Inc. | System for analyzing interactions and reporting analytic results to human-operated and system interfaces in real time |
US8160877B1 (en) | 2009-08-06 | 2012-04-17 | Narus, Inc. | Hierarchical real-time speaker recognition for biometric VoIP verification and targeting |
US8554562B2 (en) | 2009-11-15 | 2013-10-08 | Nuance Communications, Inc. | Method and system for speaker diarization |
US9558755B1 (en) | 2010-05-20 | 2017-01-31 | Knowles Electronics, Llc | Noise suppression assisted automatic speech recognition |
TWI403304B (zh) | 2010-08-27 | 2013-08-01 | Ind Tech Res Inst | Method and portable device for detecting language ability |
US8484023B2 (en) | 2010-09-24 | 2013-07-09 | Nuance Communications, Inc. | Sparse representation features for speech recognition |
US8484024B2 (en) | 2011-02-24 | 2013-07-09 | Nuance Communications, Inc. | Phonetic features for speech recognition |
US20130080165A1 (en) | 2011-09-24 | 2013-03-28 | Microsoft Corporation | Model Based Online Normalization of Feature Distribution for Noise Robust Speech Recognition |
US9042867B2 (en) | 2012-02-24 | 2015-05-26 | Agnitio S.L. | System and method for speaker recognition on mobile devices |
US8781093B1 (en) | 2012-04-18 | 2014-07-15 | Google Inc. | Reputation based message analysis |
US20130300939A1 (en) | 2012-05-11 | 2013-11-14 | Cisco Technology, Inc. | System and method for joint speaker and scene recognition in a video/audio processing environment |
US8972312B2 (en) * | 2012-05-29 | 2015-03-03 | Nuance Communications, Inc. | Methods and apparatus for performing transformation techniques for data clustering and/or classification |
US9262640B2 (en) | 2012-08-17 | 2016-02-16 | Charles Fadel | Controlling access to resources based on affinity planes and sectors |
US9368116B2 (en) | 2012-09-07 | 2016-06-14 | Verint Systems Ltd. | Speaker separation in diarization |
EP2713367B1 (en) | 2012-09-28 | 2016-11-09 | Agnitio, S.L. | Speaker recognition |
US9837078B2 (en) * | 2012-11-09 | 2017-12-05 | Mattersight Corporation | Methods and apparatus for identifying fraudulent callers |
US9633652B2 (en) | 2012-11-30 | 2017-04-25 | Stmicroelectronics Asia Pacific Pte Ltd. | Methods, systems, and circuits for speaker dependent voice recognition with a single lexicon |
US9230550B2 (en) * | 2013-01-10 | 2016-01-05 | Sensory, Incorporated | Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination |
US9406298B2 (en) | 2013-02-07 | 2016-08-02 | Nuance Communications, Inc. | Method and apparatus for efficient i-vector extraction |
US9454958B2 (en) | 2013-03-07 | 2016-09-27 | Microsoft Technology Licensing, Llc | Exploiting heterogeneous data in deep neural network-based speech recognition systems |
US20140337017A1 (en) | 2013-05-09 | 2014-11-13 | Mitsubishi Electric Research Laboratories, Inc. | Method for Converting Speech Using Sparsity Constraints |
US9460722B2 (en) | 2013-07-17 | 2016-10-04 | Verint Systems Ltd. | Blind diarization of recorded calls with arbitrary number of speakers |
US9984706B2 (en) | 2013-08-01 | 2018-05-29 | Verint Systems Ltd. | Voice activity detection using a soft decision mechanism |
US20160293167A1 (en) * | 2013-10-10 | 2016-10-06 | Google Inc. | Speaker recognition using neural networks |
US9336781B2 (en) | 2013-10-17 | 2016-05-10 | Sri International | Content-aware speaker recognition |
US9232063B2 (en) | 2013-10-31 | 2016-01-05 | Verint Systems Inc. | Call flow and discourse analysis |
US9620145B2 (en) | 2013-11-01 | 2017-04-11 | Google Inc. | Context-dependent state tying using a neural network |
US9514753B2 (en) | 2013-11-04 | 2016-12-06 | Google Inc. | Speaker identification using hash-based indexing |
US9665823B2 (en) | 2013-12-06 | 2017-05-30 | International Business Machines Corporation | Method and system for joint training of hybrid neural networks for acoustic modeling in automatic speech recognition |
EP3373176B1 (en) | 2014-01-17 | 2020-01-01 | Cirrus Logic International Semiconductor Limited | Tamper-resistant element for use in speaker recognition |
US9401143B2 (en) | 2014-03-24 | 2016-07-26 | Google Inc. | Cluster specific speech model |
US9685174B2 (en) | 2014-05-02 | 2017-06-20 | The Regents Of The University Of Michigan | Mood monitoring of bipolar disorder using speech analysis |
US9792899B2 (en) | 2014-07-15 | 2017-10-17 | International Business Machines Corporation | Dataset shift compensation in machine learning |
US9978013B2 (en) * | 2014-07-16 | 2018-05-22 | Deep Learning Analytics, LLC | Systems and methods for recognizing objects in radar imagery |
US9373330B2 (en) | 2014-08-07 | 2016-06-21 | Nuance Communications, Inc. | Fast speaker recognition scoring using I-vector posteriors and probabilistic linear discriminant analysis |
KR101844932B1 (ko) | 2014-09-16 | 2018-04-03 | 한국전자통신연구원 | Deep neural network-based speech recognition apparatus integrating a signal processing algorithm, and training method thereof |
US9318107B1 (en) * | 2014-10-09 | 2016-04-19 | Google Inc. | Hotword detection on multiple devices |
US9418656B2 (en) * | 2014-10-29 | 2016-08-16 | Google Inc. | Multi-stage hotword detection |
US20160180214A1 (en) * | 2014-12-19 | 2016-06-23 | Google Inc. | Sharp discrepancy learning |
US9875742B2 (en) | 2015-01-26 | 2018-01-23 | Verint Systems Ltd. | Word-level blind diarization of recorded calls with arbitrary number of speakers |
US10580401B2 (en) * | 2015-01-27 | 2020-03-03 | Google Llc | Sub-matrix input for neural network layers |
KR101988222B1 (ko) | 2015-02-12 | 2019-06-13 | 한국전자통신연구원 | Apparatus and method for large-vocabulary continuous speech recognition |
US9666183B2 (en) | 2015-03-27 | 2017-05-30 | Qualcomm Incorporated | Deep neural net based filter prediction for audio event classification and extraction |
US9978374B2 (en) * | 2015-09-04 | 2018-05-22 | Google Llc | Neural networks for speaker verification |
US10056076B2 (en) | 2015-09-06 | 2018-08-21 | International Business Machines Corporation | Covariance matrix estimation with structural-based priors for speech processing |
KR102423302B1 (ko) | 2015-10-06 | 2022-07-19 | 삼성전자주식회사 | Apparatus and method for calculating acoustic scores in speech recognition, and apparatus and method for training an acoustic model |
EP3506613A1 (en) | 2015-10-14 | 2019-07-03 | Pindrop Security, Inc. | Call detail record analysis to identify fraudulent activity and fraud detection in interactive voice response systems |
US9818431B2 (en) * | 2015-12-21 | 2017-11-14 | Microsoft Technology Licensing, LLC | Multi-speaker speech separation |
US9584946B1 (en) | 2016-06-10 | 2017-02-28 | Philip Scott Lyren | Audio diarization system that segments audio input |
US9824692B1 (en) | 2016-09-12 | 2017-11-21 | Pindrop Security, Inc. | End-to-end speaker recognition using deep neural network |
WO2018053531A1 (en) | 2016-09-19 | 2018-03-22 | Pindrop Security, Inc. | Dimensionality reduction of baum-welch statistics for speaker recognition |
US10325601B2 (en) | 2016-09-19 | 2019-06-18 | Pindrop Security, Inc. | Speaker recognition in the call center |
CA3179080A1 (en) | 2016-09-19 | 2018-03-22 | Pindrop Security, Inc. | Channel-compensated low-level features for speaker recognition |
CA3195323A1 (en) | 2016-11-01 | 2018-05-01 | Transaction Network Services, Inc. | Systems and methods for automatically conducting risk assessments for telephony communications |
US10205825B2 (en) | 2017-02-28 | 2019-02-12 | At&T Intellectual Property I, L.P. | System and method for processing an automated call based on preferences and conditions |
US10623581B2 (en) | 2017-07-25 | 2020-04-14 | Vail Systems, Inc. | Adaptive, multi-modal fraud detection system |
US10506088B1 (en) | 2017-09-25 | 2019-12-10 | Amazon Technologies, Inc. | Phone number verification |
US10887452B2 (en) | 2018-10-25 | 2021-01-05 | Verint Americas Inc. | System architecture for fraud detection |
US10554821B1 (en) | 2018-11-09 | 2020-02-04 | Noble Systems Corporation | Identifying and processing neighbor spoofed telephone calls in a VoIP-based telecommunications network |
US10477013B1 (en) | 2018-11-19 | 2019-11-12 | Successful Cultures, Inc | Systems and methods for providing caller identification over a public switched telephone network |
US10375238B1 (en) | 2019-04-15 | 2019-08-06 | Republic Wireless, Inc. | Anti-spoofing techniques for outbound telephone calls |

2016
- 2016-09-12 US US15/262,748 patent/US9824692B1/en active Active

2017
- 2017-09-11 AU AU2017322591A patent/AU2017322591B2/en active Active
- 2017-09-11 KR KR1020197010208A patent/KR102072782B1/ko active IP Right Grant
- 2017-09-11 WO PCT/US2017/050927 patent/WO2018049313A1/en unknown
- 2017-09-11 ES ES17772184T patent/ES2883326T3/es active Active
- 2017-09-11 CA CA3075049A patent/CA3075049C/en active Active
- 2017-09-11 JP JP2019535198A patent/JP7173974B2/ja active Active
- 2017-09-11 EP EP17772184.2A patent/EP3501025B1/en active Active
- 2017-09-11 KR KR1020207037861A patent/KR102239129B1/ko active IP Right Grant
- 2017-09-11 CA CA3096378A patent/CA3096378A1/en active Pending
- 2017-09-11 CA CA3036533A patent/CA3036533C/en active Active
- 2017-09-11 KR KR1020207002634A patent/KR102198835B1/ko active IP Right Grant
- 2017-11-20 US US15/818,231 patent/US10381009B2/en active Active

2019
- 2019-08-08 US US16/536,293 patent/US11468901B2/en active Active

2021
- 2021-12-17 AU AU2021286422A patent/AU2021286422B2/en active Active

2022
- 2022-06-29 JP JP2022104204A patent/JP7619983B2/ja active Active
- 2022-10-10 US US17/963,091 patent/US20230037232A1/en active Pending

2023
- 2023-11-06 AU AU2023263421A patent/AU2023263421A1/en active Pending

2024
- 2024-01-25 US US18/422,523 patent/US20240249728A1/en active Pending

Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20040028790A (ko) * | 2001-06-19 | 2004-04-03 | 세큐리복스 리미티드 | Speaker recognition system |
US20140214417A1 (en) * | 2013-01-28 | 2014-07-31 | Tencent Technology (Shenzhen) Company Limited | Method and device for voiceprint recognition |
US20150127336A1 (en) * | 2013-11-04 | 2015-05-07 | Google Inc. | Speaker verification using neural networks |
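JP2015102806A (ja) * | 2013-11-27 | 2015-06-04 | 国立研究開発法人情報通信研究機構 | Method for adapting a statistical acoustic model, method for training an acoustic model suitable for statistical acoustic model adaptation, storage medium storing parameters for constructing a deep neural network, and computer program for adapting a statistical acoustic model |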
US20160098987A1 (en) * | 2014-10-02 | 2016-04-07 | Microsoft Technology Licensing, LLC | Neural network-based speech processing |
Also Published As
Publication number | Publication date |
---|---|
ES2883326T3 (es) | 2021-12-07 |
JP7619983B2 (ja) | 2025-01-22 |
KR102198835B1 (ko) | 2021-01-05 |
CA3075049C (en) | 2020-12-01 |
US20190392842A1 (en) | 2019-12-26 |
AU2023263421A1 (en) | 2023-11-23 |
CA3036533A1 (en) | 2018-03-15 |
AU2021286422B2 (en) | 2023-08-10 |
AU2021286422A1 (en) | 2022-01-20 |
KR20190075914A (ko) | 2019-07-01 |
US9824692B1 (en) | 2017-11-21 |
US20230037232A1 (en) | 2023-02-02 |
CA3096378A1 (en) | 2018-03-15 |
CA3036533C (en) | 2020-04-21 |
US10381009B2 (en) | 2019-08-13 |
US20180075849A1 (en) | 2018-03-15 |
AU2017322591A1 (en) | 2019-05-02 |
CA3075049A1 (en) | 2018-03-15 |
JP2022153376A (ja) | 2022-10-12 |
WO2018049313A1 (en) | 2018-03-15 |
KR102072782B1 (ko) | 2020-02-03 |
KR20200013089A (ko) | 2020-02-05 |
US11468901B2 (en) | 2022-10-11 |
JP7173974B2 (ja) | 2022-11-16 |
EP3501025A1 (en) | 2019-06-26 |
EP3501025B1 (en) | 2021-08-11 |
JP2019532354A (ja) | 2019-11-07 |
AU2017322591B2 (en) | 2021-10-21 |
KR102239129B1 (ko) | 2021-04-09 |
US20240249728A1 (en) | 2024-07-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102198835B1 (ko) | | End-to-end speaker recognition using deep neural network |
US10997980B2 (en) | | System and method for determining voice characteristics |
US10553218B2 (en) | | Dimensionality reduction of baum-welch statistics for speaker recognition |
US11727942B2 (en) | | Age compensation in biometric systems using time-interval, gender and age |
Hansen et al. | | Speaker recognition by machines and humans: A tutorial review |
Singh et al. | | Applications of speaker recognition |
KR100406307B1 (ko) | | Voice registration method and voice registration system, and voice recognition method and voice recognition system based thereon |
US10909991B2 (en) | | System for text-dependent speaker recognition and method thereof |
TW202213326A (zh) | | Generalized negative log-likelihood loss for speaker verification |
Li et al. | | Cost‐Sensitive Learning for Emotion Robust Speaker Recognition |
Georgescu et al. | | GMM-UBM modeling for speaker recognition on a Romanian large speech corpora |
KR20200066149A (ko) | | User authentication method and apparatus |
Ren et al. | | A hybrid GMM speaker verification system for mobile devices in variable environments |
Wadehra et al. | | Comparative Analysis Of Different Speaker Recognition Algorithms |
Pekcan | | Development of machine learning based speaker recognition system |
Legal Events
Code | Title | Description |
---|---|---|
A107 | Divisional application of patent | |
PA0104 | Divisional application for international application | Comment text: Divisional Application for International Patent; Patent event code: PA01041R01D; Patent event date: 20201229; Application number text: 1020207002634; Filing date: 20200128 |
PA0201 | Request for examination | |
PG1501 | Laying open of application | |
E701 | Decision to grant or registration of patent right | |
PE0701 | Decision of registration | Patent event code: PE07011S01D; Comment text: Decision to Grant Registration; Patent event date: 20210114 |
GRNT | Written decision to grant | |
PR0701 | Registration of establishment | Comment text: Registration of Establishment; Patent event date: 20210406; Patent event code: PR07011E01D |
PR1002 | Payment of registration fee | Payment date: 20210406; Start annual number: 1; End annual number: 3 |
PG1601 | Publication of registration | |
PR1001 | Payment of annual fee | Payment date: 20240404; Start annual number: 4; End annual number: 4 |