KR102008077B1 - Deployed end-to-end speech recognition - Google Patents
Deployed end-to-end speech recognition
- Publication number
- KR102008077B1
- Authority
- KR
- South Korea
- Prior art keywords
- training
- model
- layer
- speech
- regression
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/197—Probabilistic grammars, e.g. word n-grams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0635—Training updating or merging of old and new templates; Mean values; Weighting
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Machine Translation (AREA)
Abstract
Description
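Figure 1 ("FIG. 1") shows the architecture of an end-to-end deep learning model according to embodiments of the present disclosure.
FIG. 2 shows a method for training the deep learning model according to embodiments of the present disclosure.
FIG. 3 shows a sequence-wise batch normalization method according to embodiments of the present disclosure.
FIG. 4 graphically shows the training curves of two models, one trained with batch normalization and one trained without it, according to embodiments of the present disclosure.
FIG. 5 shows a method for training an RNN model using a curriculum learning strategy according to embodiments of the present disclosure.
FIG. 6 shows a method for training an RNN model using bi-graphemes segmentation for output transcription according to embodiments of the present disclosure.
FIG. 7 shows a row convolution architecture with a future context size of 2 according to embodiments of the present disclosure.
FIG. 8 shows a method for audio transcription with a unidirectional RNN model according to embodiments of the present disclosure.
FIG. 9 shows a method for training a speech transcription model that applies to multiple languages according to embodiments of the present disclosure.
FIG. 10 shows a scaling comparison of two networks according to embodiments of the present disclosure.
FIG. 11 shows the forward and backward passes for a GPU implementation of Connectionist Temporal Classification (CTC) according to embodiments of the present disclosure.
FIG. 12 shows a method for a GPU implementation of the CTC loss function according to embodiments of the present disclosure.
FIG. 13 shows a data acquisition method for speech transcription training according to embodiments of the present disclosure.
FIG. 14 shows the probability that a request is processed in a batch of a given size according to embodiments of the present disclosure.
FIG. 15 shows the median and 98th-percentile latencies as a function of server load according to embodiments of the present disclosure.
FIG. 16 shows a comparison of kernels according to embodiments of the present disclosure.
FIG. 17 shows a schematic diagram of a training node according to embodiments of the present disclosure, where PLX indicates a PCI switch and the dotted box includes all devices connected by the same PCI root complex.
FIG. 18 shows a simplified block diagram of a computing system according to embodiments of the present disclosure.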
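The drawing list mentions a row convolution architecture with a future context size of 2 (FIG. 7), and the claims describe obtaining the row convolution activation from recurrent-layer information at the current time step and at least one future time step. The sketch below is only an illustration of that idea under stated assumptions: each output unit at step t is taken to be a weighted sum of the same unit's recurrent activations at steps t through t + 2, and steps past the end of the utterance are treated as zeros. The function name `row_convolution`, the array shapes, and the zero padding are hypothetical choices, not details taken from the patent text.

```python
import numpy as np


def row_convolution(h, w):
    """Illustrative row convolution over recurrent-layer activations.

    h: array of shape (T, d), one row of d hidden-unit activations per time step.
    w: array of shape (d, tau + 1), one small filter per hidden unit, where
       tau is the future context size (tau = 2 matches FIG. 7).
    Returns r of shape (T, d) with r[t, i] = sum_j w[i, j] * h[t + j, i].
    Time steps beyond the end of the sequence are assumed to be zero.
    """
    T, d = h.shape
    context = w.shape[1]  # tau + 1 steps: the current step plus tau future steps
    # Append tau rows of zeros so that every window of length tau + 1 is defined.
    padded = np.vstack([h, np.zeros((context - 1, d), dtype=h.dtype)])
    r = np.zeros((T, d), dtype=float)
    for j in range(context):
        # padded[j:j + T] is the sequence shifted j steps into the future;
        # w[:, j] broadcasts across time, so each unit uses its own weight.
        r += padded[j:j + T] * w[:, j]
    return r


if __name__ == "__main__":
    hidden = np.arange(8, dtype=float).reshape(4, 2)  # T = 4 steps, d = 2 units
    weights = np.ones((2, 3))                         # future context size tau = 2
    print(row_convolution(hidden, weights))
```

Because each output depends only on a fixed, small number of future steps, a layer of this kind can sit above forward-only recurrent layers and still be evaluated in a streaming fashion once tau further frames have arrived.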
Claims (20)
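- A computer-implemented method for transcribing speech audio, the method comprising:
receiving input audio from a user, the input audio comprising a plurality of utterances;
obtaining a set of spectrogram frames for each utterance of the plurality of utterances;
inputting the set of spectrogram frames into a recurrent neural network (RNN) model, wherein the RNN model comprises one or more convolution layers and one or more recurrent layers, the RNN model further comprises a row convolution layer positioned above the one or more recurrent layers, the row convolution layer is a unidirectional and forward-only layer, the RNN model is pre-trained using a plurality of minibatches of training utterance sequences sampled from a training data set, and the plurality of minibatches are batch-normalized during training to normalize pre-activations in at least one of the one or more recurrent layers;
obtaining an activation of the row convolution layer using information from the recurrent layer at a current time step and at least one future time step;
feeding the obtained activation of the row convolution layer to a classifier for an output prediction corresponding to the current time step;
obtaining probability outputs of one or more predicted characters from the RNN model; and
performing a search using the probability outputs, constrained by a language model, to find the most probable transcription of each utterance, wherein the language model interprets a string of characters determined from the probability outputs of the predicted characters as a word or a plurality of words.
- The method of claim 1, wherein the batch normalization comprises:
computing, for each hidden unit in the one or more convolution layers and the one or more recurrent layers, a mean and a variance of the pre-activations over the length of each training utterance sequence in each minibatch.
- Deleted
- Deleted
- Deleted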
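- The method of claim 1, wherein the predicted characters are English alphabet characters or Chinese characters.
- The method of claim 1, wherein the input audio is normalized so that the total power of the input audio is consistent with a set of training samples used to pre-train the RNN model.
- The method of claim 1, wherein a beam search is performed in the language model so as to consider only characters whose cumulative probability is at least a threshold.
- The method of claim 1, wherein, in obtaining the set of spectrogram frames, subsampling of the utterance is implemented by taking strides with a step size of a predetermined number of time slices.
- The method of claim 9, wherein the predicted characters from the RNN model comprise alternate labels selected from whole words, syllables, and non-overlapping n-grams at the word level.
- A computer-readable recording medium comprising one or more sequences of instructions which, when executed by one or more microprocessors, cause steps to be performed comprising:
receiving input audio, the input audio comprising a plurality of utterances;
obtaining a set of spectrogram frames for each utterance of the plurality of utterances;
inputting the set of spectrogram frames into a neural network, wherein the neural network comprises one or more convolution layers and one or more recurrent layers, the neural network further comprises a row convolution layer positioned above the one or more recurrent layers, the row convolution layer is a unidirectional and forward-only layer, the neural network is pre-trained using a plurality of minibatches of training utterance sequences sampled from a training data set, and the plurality of minibatches are normalized during training to normalize pre-activations in at least one of the one or more convolution layers;
obtaining an activation of the row convolution layer using information from the recurrent layer at a current time step and at least one future time step;
feeding the obtained activation of the row convolution layer to a classifier for an output prediction corresponding to the current time step;
obtaining probability outputs of one or more predicted characters from the pre-trained neural network; and
performing a beam search using the probability outputs, constrained by a language model, to find the most probable transcription of each utterance, wherein the language model interprets a string of characters determined from the predicted characters as a word or a plurality of words.
- The computer-readable recording medium of claim 11, wherein the steps further comprise:
implementing subsampling of the utterance, in obtaining the set of spectrogram frames, by taking strides with a step size of a predetermined number of time slices.
- The computer-readable recording medium of claim 11, wherein the one or more predicted characters comprise non-overlapping bigrams enriched from the English alphabet.
- The computer-readable recording medium of claim 11, wherein the steps further comprise:
normalizing the input audio using statistical information from the training data set.
- The computer-readable recording medium of claim 11, wherein the pre-trained neural network is trained with a Connectionist Temporal Classification (CTC) loss function using a training set.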
- Deleted
- Deleted
- Deleted
- Deleted
- Deleted
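The batch normalization claim above describes computing, for each hidden unit, the mean and variance of the pre-activations over the length of the training utterance sequences in a minibatch. A minimal NumPy sketch of that computation follows. It is an illustration under assumptions rather than the patented implementation: the function name `sequence_wise_batch_norm`, the `(batch, time, hidden)` layout, the use of a `lengths` mask to skip padded frames, and the learnable scale `gamma` and shift `beta` are all hypothetical choices borrowed from standard batch normalization practice.

```python
import numpy as np


def sequence_wise_batch_norm(x, lengths, gamma, beta, eps=1e-5):
    """Normalize pre-activations using statistics pooled over batch and time.

    x:       array of shape (B, T, H), zero-padded pre-activations.
    lengths: sequence of B integers, the valid length of each utterance.
    gamma:   array of shape (H,), per-unit scale (assumed learnable).
    beta:    array of shape (H,), per-unit shift (assumed learnable).
    Returns an array of shape (B, T, H); padded positions are set to zero.
    """
    B, T, H = x.shape
    # mask[b, t, 0] is True where frame t of utterance b is real data.
    mask = (np.arange(T)[None, :] < np.asarray(lengths)[:, None])[..., None]
    count = mask.sum()  # total number of valid frames in the minibatch
    # One mean and one variance per hidden unit, over all valid frames.
    mean = (x * mask).sum(axis=(0, 1)) / count
    var = (((x - mean) ** 2) * mask).sum(axis=(0, 1)) / count
    y = gamma * (x - mean) / np.sqrt(var + eps) + beta
    return y * mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pre_activations = rng.normal(size=(2, 3, 4))  # B = 2, T = 3, H = 4
    out = sequence_wise_batch_norm(
        pre_activations, [3, 2], np.ones(4), np.zeros(4)
    )
    print(out.shape)
```

Pooling the statistics over both the batch and time axes gives each hidden unit a single mean and variance per minibatch, which is what distinguishes this sequence-wise variant from normalizing each time step separately.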
Applications Claiming Priority (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562260206P | 2015-11-25 | 2015-11-25 | |
US62/260,206 | 2015-11-25 | | |
US15/358,102 US10332509B2 (en) | 2015-11-25 | 2016-11-21 | End-to-end speech recognition |
US15/358,083 | 2016-11-21 | | |
US15/358,102 | 2016-11-21 | | |
US15/358,083 US10319374B2 (en) | 2015-11-25 | 2016-11-21 | Deployed end-to-end speech recognition |
PCT/US2016/063641 WO2017091751A1 (en) | 2015-11-25 | 2016-11-23 | Deployed end-to-end speech recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20170106445A (ko) | 2017-09-20 |
KR102008077B1 (ko) | 2019-08-06 |
Family
ID=58721011
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020177023177A Active KR102033230B1 (ko) | End-to-end speech recognition | 2015-11-25 | 2016-11-23 |
KR1020177023173A Active KR102008077B1 (ko) | Deployed end-to-end speech recognition | 2015-11-25 | 2016-11-23 |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020177023177A Active KR102033230B1 (ko) | End-to-end speech recognition | 2015-11-25 | 2016-11-23 |
Country Status (6)
Country | Link |
---|---|
US (2) | US10332509B2 (ko) |
EP (2) | EP3245652B1 (ko) |
JP (2) | JP6629872B2 (ko) |
KR (2) | KR102033230B1 (ko) |
CN (2) | CN107408384B (ko) |
WO (2) | WO2017091763A1 (ko) |
Families Citing this family (297)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US8515052B2 (en) | 2007-12-17 | 2013-08-20 | Wai Wu | Parallel signal processing system and method |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US9634855B2 (en) | 2010-05-13 | 2017-04-25 | Alexander Poltorak | Electronic personal interactive device that determines topics of interest using a conversational agent |
KR102103057B1 (ko) | 2013-02-07 | 2020-04-21 | 애플 인크. | Voice trigger for a digital assistant |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US10460227B2 (en) | 2015-05-15 | 2019-10-29 | Apple Inc. | Virtual assistant in a communication session |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10229672B1 (en) * | 2015-12-31 | 2019-03-12 | Google Llc | Training acoustic models using connectionist temporal classification |
JP6706326B2 (ja) * | 2016-02-03 | 2020-06-03 | グーグル エルエルシー | Compression of recurrent neural network models |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
US12197817B2 (en) | 2016-06-11 | 2025-01-14 | Apple Inc. | Intelligent device arbitration and control |
CN106251859B (zh) * | 2016-07-22 | 2019-05-31 | 百度在线网络技术(北京)有限公司 | Speech recognition processing method and apparatus |
US9984683B2 (en) * | 2016-07-22 | 2018-05-29 | Google Llc | Automatic speech recognition using multi-dimensional models |
US11080591B2 (en) * | 2016-09-06 | 2021-08-03 | Deepmind Technologies Limited | Processing sequences using convolutional neural networks |
KR102392094B1 (ko) * | 2016-09-06 | 2022-04-28 | 딥마인드 테크놀로지스 리미티드 | Sequence processing using convolutional neural networks |
AU2017324937B2 (en) | 2016-09-06 | 2019-12-19 | Deepmind Technologies Limited | Generating audio using neural networks |
US10224058B2 (en) | 2016-09-07 | 2019-03-05 | Google Llc | Enhanced multi-channel acoustic models |
CN109891437A (zh) * | 2016-10-03 | 2019-06-14 | 谷歌有限责任公司 | Processing text sequences using neural networks |
JP6756916B2 (ja) | 2016-10-26 | 2020-09-16 | ディープマインド テクノロジーズ リミテッド | Processing text sequences using neural networks |
US10529320B2 (en) * | 2016-12-21 | 2020-01-07 | Google Llc | Complex evolution recurrent neural networks |
US10140980B2 (en) * | 2016-12-21 | 2018-11-27 | Google LCC | Complex linear projection for acoustic modeling |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
KR101882906B1 (ko) * | 2017-01-17 | 2018-07-27 | 경북대학교 산학협력단 | Apparatus and method for generating an abstractive summary of multi-paragraph text, and recording medium for performing the method |
US10049106B2 (en) * | 2017-01-18 | 2018-08-14 | Xerox Corporation | Natural language generation through character-based recurrent neural networks with finite-state prior knowledge |
US11907858B2 (en) * | 2017-02-06 | 2024-02-20 | Yahoo Assets Llc | Entity disambiguation |
US11087213B2 (en) * | 2017-02-10 | 2021-08-10 | Synaptics Incorporated | Binary and multi-class classification systems and methods using one spike connectionist temporal classification |
US10762891B2 (en) * | 2017-02-10 | 2020-09-01 | Synaptics Incorporated | Binary and multi-class classification systems and methods using connectionist temporal classification |
US11080600B2 (en) * | 2017-02-10 | 2021-08-03 | Synaptics Incorporated | Recurrent neural network based acoustic event classification using complement rule |
US11853884B2 (en) * | 2017-02-10 | 2023-12-26 | Synaptics Incorporated | Many or one detection classification systems and methods |
US10762417B2 (en) * | 2017-02-10 | 2020-09-01 | Synaptics Incorporated | Efficient connectionist temporal classification for binary classification |
US11100932B2 (en) * | 2017-02-10 | 2021-08-24 | Synaptics Incorporated | Robust start-end point detection algorithm using neural network |
US10872598B2 (en) * | 2017-02-24 | 2020-12-22 | Baidu Usa Llc | Systems and methods for real-time neural text-to-speech |
US10373610B2 (en) * | 2017-02-24 | 2019-08-06 | Baidu Usa Llc | Systems and methods for automatic unit selection and target decomposition for sequence labelling |
US10657955B2 (en) * | 2017-02-24 | 2020-05-19 | Baidu Usa Llc | Systems and methods for principled bias reduction in production speech models |
US10878837B1 (en) * | 2017-03-01 | 2020-12-29 | Snap Inc. | Acoustic neural network scene detection |
US10762427B2 (en) * | 2017-03-01 | 2020-09-01 | Synaptics Incorporated | Connectionist temporal classification using segmented labeled sequence data |
US10540961B2 (en) * | 2017-03-13 | 2020-01-21 | Baidu Usa Llc | Convolutional recurrent neural networks for small-footprint keyword spotting |
US11017291B2 (en) * | 2017-04-28 | 2021-05-25 | Intel Corporation | Training with adaptive runtime and precision profiling |
US11410024B2 (en) * | 2017-04-28 | 2022-08-09 | Intel Corporation | Tool for facilitating efficiency in machine learning |
US10467052B2 (en) * | 2017-05-01 | 2019-11-05 | Red Hat, Inc. | Cluster topology aware container scheduling for efficient data transfer |
US20180330718A1 (en) * | 2017-05-11 | 2018-11-15 | Mitsubishi Electric Research Laboratories, Inc. | System and Method for End-to-End speech recognition |
KR20180124381A (ko) * | 2017-05-11 | 2018-11-21 | 현대자동차주식회사 | System and method for determining the state of a driver |
DK180048B1 (en) | 2017-05-11 | 2020-02-04 | Apple Inc. | MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
DK201770428A1 (en) | 2017-05-12 | 2019-02-18 | Apple Inc. | LOW-LATENCY INTELLIGENT AUTOMATED ASSISTANT |
DK201770411A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Multi-modal interfaces |
US10303715B2 (en) | 2017-05-16 | 2019-05-28 | Apple Inc. | Intelligent automated assistant for media exploration |
DK179549B1 (en) | 2017-05-16 | 2019-02-12 | Apple Inc. | FAR-FIELD EXTENSION FOR DIGITAL ASSISTANT SERVICES |
US10896669B2 (en) | 2017-05-19 | 2021-01-19 | Baidu Usa Llc | Systems and methods for multi-speaker neural text-to-speech |
CN107240396B (zh) * | 2017-06-16 | 2023-01-17 | 百度在线网络技术(北京)有限公司 | Speaker adaptation method, apparatus, device, and storage medium |
EP3422518B1 (en) * | 2017-06-28 | 2020-06-17 | Siemens Aktiengesellschaft | A method for recognizing contingencies in a power supply network |
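KR102483643B1 (ko) * | 2017-08-14 | 2023-01-02 | 삼성전자주식회사 | Method and apparatus for training a model, and recognition method and apparatus using the neural network |
KR102410820B1 (ko) * | 2017-08-14 | 2022-06-20 | 삼성전자주식회사 | Recognition method and apparatus using a neural network, and method and apparatus for training the neural network |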
US10706840B2 (en) * | 2017-08-18 | 2020-07-07 | Google Llc | Encoder-decoder models for sequence to sequence mapping |
US11694066B2 (en) | 2017-10-17 | 2023-07-04 | Xilinx, Inc. | Machine learning runtime library for neural network acceleration |
US10872596B2 (en) | 2017-10-19 | 2020-12-22 | Baidu Usa Llc | Systems and methods for parallel wave generation in end-to-end text-to-speech |
US11017761B2 (en) | 2017-10-19 | 2021-05-25 | Baidu Usa Llc | Parallel neural text-to-speech |
US10796686B2 (en) | 2017-10-19 | 2020-10-06 | Baidu Usa Llc | Systems and methods for neural text-to-speech using convolutional sequence learning |
CN107680597B (zh) * | 2017-10-23 | 2019-07-09 | 平安科技(深圳)有限公司 | Speech recognition method, apparatus, device, and computer-readable storage medium |
US11556775B2 (en) | 2017-10-24 | 2023-01-17 | Baidu Usa Llc | Systems and methods for trace norm regularization and faster inference for embedded models |
US20190130896A1 (en) * | 2017-10-26 | 2019-05-02 | Salesforce.Com, Inc. | Regularization Techniques for End-To-End Speech Recognition |
US11250314B2 (en) * | 2017-10-27 | 2022-02-15 | Cognizant Technology Solutions U.S. Corporation | Beyond shared hierarchies: deep multitask learning through soft layer ordering |
US10573295B2 (en) * | 2017-10-27 | 2020-02-25 | Salesforce.Com, Inc. | End-to-end speech recognition with policy learning |
US11562287B2 (en) | 2017-10-27 | 2023-01-24 | Salesforce.Com, Inc. | Hierarchical and interpretable skill acquisition in multi-task reinforcement learning |
US10535001B2 (en) * | 2017-11-06 | 2020-01-14 | International Business Machines Corporation | Reducing problem complexity when analyzing 3-D images |
JP7213241B2 (ja) | 2017-11-14 | 2023-01-26 | マジック リープ, インコーポレイテッド | Meta-learning for multi-task learning for neural networks |
US11537439B1 (en) * | 2017-11-22 | 2022-12-27 | Amazon Technologies, Inc. | Intelligent compute resource selection for machine learning training jobs |
US11977958B2 (en) | 2017-11-22 | 2024-05-07 | Amazon Technologies, Inc. | Network-accessible machine learning model training and hosting system |
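CN108334889B (zh) | 2017-11-30 | 2020-04-03 | 腾讯科技(深圳)有限公司 | Abstract description generation method and apparatus, and abstract description model training method and apparatus |
CN108171117B (zh) * | 2017-12-05 | 2019-05-21 | 南京南瑞信息通信科技有限公司 | Electric power artificial intelligence visual analysis system based on multi-core heterogeneous parallel computing |
CN107945791B (zh) * | 2017-12-05 | 2021-07-20 | 华南理工大学 | Speech recognition method based on deep learning object detection |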
US10847137B1 (en) * | 2017-12-12 | 2020-11-24 | Amazon Technologies, Inc. | Trigger word detection using neural network waveform processing |
KR102462426B1 (ko) | 2017-12-14 | 2022-11-03 | 삼성전자주식회사 | Electronic device for analyzing the meaning of an utterance and operating method thereof |
US10672388B2 (en) * | 2017-12-15 | 2020-06-02 | Mitsubishi Electric Research Laboratories, Inc. | Method and apparatus for open-vocabulary end-to-end speech recognition |
US10593321B2 (en) * | 2017-12-15 | 2020-03-17 | Mitsubishi Electric Research Laboratories, Inc. | Method and apparatus for multi-lingual end-to-end speech recognition |
US11443178B2 (en) | 2017-12-15 | 2022-09-13 | Interntional Business Machines Corporation | Deep neural network hardening framework |
CN108364662B (zh) * | 2017-12-29 | 2021-01-05 | 中国科学院自动化研究所 | Speech emotion recognition method and system based on paired discrimination tasks |
CN108229659A (zh) * | 2017-12-29 | 2018-06-29 | 陕西科技大学 | Piano single-key sound recognition method based on deep learning |
FR3076378B1 (fr) * | 2017-12-29 | 2020-05-29 | Bull Sas | Method for training a neural network for recognizing a sequence of characters, and associated recognition method |
CN108089958B (zh) * | 2017-12-29 | 2021-06-08 | 珠海市君天电子科技有限公司 | GPU test method, terminal device, and computer-readable storage medium |
KR102089076B1 (ko) * | 2018-01-11 | 2020-03-13 | 중앙대학교 산학협력단 | Deep learning method using BCSC and apparatus therefor |
CN108256474A (zh) * | 2018-01-17 | 2018-07-06 | 百度在线网络技术(北京)有限公司 | Method and apparatus for identifying dishes |
CN108417202B (zh) * | 2018-01-19 | 2020-09-01 | 苏州思必驰信息科技有限公司 | Speech recognition method and system |
CN108417201B (zh) * | 2018-01-19 | 2020-11-06 | 苏州思必驰信息科技有限公司 | Single-channel multi-speaker identity recognition method and system |
CN108491836B (zh) * | 2018-01-25 | 2020-11-24 | 华南理工大学 | Method for holistic recognition of Chinese text in natural scene images |
US10657426B2 (en) * | 2018-01-25 | 2020-05-19 | Samsung Electronics Co., Ltd. | Accelerating long short-term memory networks via selective pruning |
US11182694B2 (en) | 2018-02-02 | 2021-11-23 | Samsung Electronics Co., Ltd. | Data path for GPU machine learning training with key value SSD |
US11527308B2 (en) | 2018-02-06 | 2022-12-13 | Cognizant Technology Solutions U.S. Corporation | Enhanced optimization with composite objectives and novelty-diversity selection |
US12033079B2 (en) | 2018-02-08 | 2024-07-09 | Cognizant Technology Solutions U.S. Corporation | System and method for pseudo-task augmentation in deep multitask learning |
US10776581B2 (en) * | 2018-02-09 | 2020-09-15 | Salesforce.Com, Inc. | Multitask learning as question answering |
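TWI659411B (zh) * | 2018-03-01 | 2019-05-11 | 大陸商芋頭科技(杭州)有限公司 | Multilingual mixed speech recognition method |
CN108564954B (zh) * | 2018-03-19 | 2020-01-10 | 平安科技(深圳)有限公司 | Deep neural network model, electronic device, identity verification method, and storage medium |
KR102473447B1 (ko) | 2018-03-22 | 2022-12-05 | 삼성전자주식회사 | Electronic device for modulating a user's voice using an artificial intelligence model and control method thereof |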
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US20190318229A1 (en) * | 2018-04-12 | 2019-10-17 | Advanced Micro Devices, Inc. | Method and system for hardware mapping inference pipelines |
CN108538311B (zh) * | 2018-04-13 | 2020-09-15 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio classification method, apparatus, and computer-readable storage medium |
US10672414B2 (en) * | 2018-04-13 | 2020-06-02 | Microsoft Technology Licensing, Llc | Systems, methods, and computer-readable media for improved real-time audio processing |
WO2019209569A1 (en) * | 2018-04-23 | 2019-10-31 | Google Llc | Speaker diarization using an end-to-end model |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
JP2021523410A (ja) | 2018-05-10 | 2021-09-02 | ザ ボード オブ トラスティーズ オブ ザ レランド スタンフォード ジュニア ユニバーシティー | Systems and methods for activation functions for photonic neural networks |
JP2021523461A (ja) * | 2018-05-10 | 2021-09-02 | ザ ボード オブ トラスティーズ オブ ザ レランド スタンフォード ジュニア ユニバーシティー | Training of photonic neural networks through in situ backpropagation |
US11086937B2 (en) * | 2018-05-11 | 2021-08-10 | The Regents Of The University Of California | Speech based structured querying |
KR102018346B1 (ko) * | 2018-05-11 | 2019-10-14 | 국방과학연구소 | Method and system for classifying acoustic signals |
US11138471B2 (en) * | 2018-05-18 | 2021-10-05 | Google Llc | Augmentation of audiographic images for improved machine learning |
US11462209B2 (en) * | 2018-05-18 | 2022-10-04 | Baidu Usa Llc | Spectrogram to waveform synthesis using convolutional networks |
DK201870355A1 (en) | 2018-06-01 | 2019-12-16 | Apple Inc. | VIRTUAL ASSISTANT OPERATION IN MULTI-DEVICE ENVIRONMENTS |
DK180639B1 (en) | 2018-06-01 | 2021-11-04 | Apple Inc | DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT |
DE112019001959T5 (de) * | 2018-06-21 | 2021-01-21 | International Business Machines Corporation | Segmenting irregular shapes in images using deep region growing |
CN108984535B (zh) * | 2018-06-25 | 2022-04-05 | 腾讯科技(深圳)有限公司 | Sentence translation method, translation model training method, device, and storage medium |
US11361456B2 (en) * | 2018-06-29 | 2022-06-14 | Baidu Usa Llc | Systems and methods for depth estimation via affinity learned with convolutional spatial propagation networks |
CN109147766B (zh) * | 2018-07-06 | 2020-08-18 | 北京爱医声科技有限公司 | Speech recognition method and system based on an end-to-end deep learning model |
CN118737132A (zh) | 2018-07-13 | 2024-10-01 | 谷歌有限责任公司 | End-to-end streaming keyword spotting |
US11335333B2 (en) | 2018-07-20 | 2022-05-17 | Google Llc | Speech recognition with sequence-to-sequence models |
CN110752973B (zh) * | 2018-07-24 | 2020-12-25 | Tcl科技集团股份有限公司 | Control method and apparatus for a terminal device, and terminal device |
CN108962230B (zh) * | 2018-07-27 | 2019-04-23 | 重庆因普乐科技有限公司 | Memristor-based speech recognition method |
US10720151B2 (en) * | 2018-07-27 | 2020-07-21 | Deepgram, Inc. | End-to-end neural networks for speech recognition and classification |
JP7209330B2 (ja) * | 2018-07-30 | 2023-01-20 | 国立研究開発法人情報通信研究機構 | Classifier, trained model, and learning method |
US11107463B2 (en) | 2018-08-01 | 2021-08-31 | Google Llc | Minimum word error rate training for attention-based sequence-to-sequence models |
CN110825665B (zh) * | 2018-08-10 | 2021-11-05 | 昆仑芯(北京)科技有限公司 | Data acquisition unit and data acquisition method applied to a controller |
US10650812B2 (en) * | 2018-08-13 | 2020-05-12 | Bank Of America Corporation | Deterministic multi-length sliding window protocol for contiguous string entity |
CN109003601A (zh) * | 2018-08-31 | 2018-12-14 | 北京工商大学 | Cross-lingual end-to-end speech recognition method for the low-resource Tujia language |
WO2020048358A1 (en) * | 2018-09-04 | 2020-03-12 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method, system, and computer-readable medium for recognizing speech using depth information |
US10963721B2 (en) | 2018-09-10 | 2021-03-30 | Sony Corporation | License plate number recognition based on three dimensional beam search |
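CN109271926B (zh) * | 2018-09-14 | 2021-09-10 | 西安电子科技大学 | Intelligent radiation source identification method based on a GRU deep convolutional network |
CN109215662B (zh) * | 2018-09-18 | 2023-06-20 | 平安科技(深圳)有限公司 | End-to-end speech recognition method, electronic device, and computer-readable storage medium |
JP7043373B2 (ja) * | 2018-09-18 | 2022-03-29 | ヤフー株式会社 | Information processing device, information processing method, and program |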
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US10672382B2 (en) * | 2018-10-15 | 2020-06-02 | Tencent America LLC | Input-feeding architecture for attention based end-to-end speech recognition |
US10891951B2 (en) * | 2018-10-17 | 2021-01-12 | Ford Global Technologies, Llc | Vehicle language processing |
EP3640856A1 (en) * | 2018-10-19 | 2020-04-22 | Fujitsu Limited | A method, apparatus and computer program to carry out a training procedure in a convolutional neural network |
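KR20200045128A (ko) * | 2018-10-22 | 2020-05-04 | 삼성전자주식회사 | Model training method and apparatus, and data recognition method |
CN109447253B (zh) * | 2018-10-26 | 2021-04-27 | 杭州比智科技有限公司 | Video memory allocation method, apparatus, computing device, and computer storage medium |
CN112970063B (zh) | 2018-10-29 | 2024-10-18 | 杜比国际公司 | Method and apparatus for rate-quality scalable coding with generative models |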
US11494612B2 (en) | 2018-10-31 | 2022-11-08 | Sony Interactive Entertainment Inc. | Systems and methods for domain adaptation in neural networks using domain classifier |
US11640519B2 (en) * | 2018-10-31 | 2023-05-02 | Sony Interactive Entertainment Inc. | Systems and methods for domain adaptation in neural networks using cross-domain batch normalization |
US12282845B2 (en) | 2018-11-01 | 2025-04-22 | Cognizant Technology Solutions US Corp. | Multiobjective coevolution of deep neural network architectures |
US11526759B2 (en) * | 2018-11-05 | 2022-12-13 | International Business Machines Corporation | Large model support in deep learning |
CN109523994A (zh) * | 2018-11-13 | 2019-03-26 | 四川大学 | Multi-task speech classification method based on a capsule neural network |
CN109492233B (zh) * | 2018-11-14 | 2023-10-17 | 北京捷通华声科技股份有限公司 | Machine translation method and apparatus |
US11250838B2 (en) * | 2018-11-16 | 2022-02-15 | Deepmind Technologies Limited | Cross-modal sequence distillation |
US11238845B2 (en) | 2018-11-21 | 2022-02-01 | Google Llc | Multi-dialect and multilingual speech recognition |
US11736363B2 (en) * | 2018-11-30 | 2023-08-22 | Disney Enterprises, Inc. | Techniques for analyzing a network and increasing network availability |
US10388272B1 (en) | 2018-12-04 | 2019-08-20 | Sorenson Ip Holdings, Llc | Training speech recognition systems using word sequences |
US11170761B2 (en) | 2018-12-04 | 2021-11-09 | Sorenson Ip Holdings, Llc | Training of speech recognition systems |
US10573312B1 (en) | 2018-12-04 | 2020-02-25 | Sorenson Ip Holdings, Llc | Transcription generation from multiple speech recognition systems |
US11017778B1 (en) | 2018-12-04 | 2021-05-25 | Sorenson Ip Holdings, Llc | Switching between speech recognition systems |
KR102681637B1 (ko) | 2018-12-13 | 2024-07-05 | 현대자동차주식회사 | Artificial intelligence apparatus and preprocessing method for noise data for identifying the source of problem noise |
WO2020139588A1 (en) | 2018-12-24 | 2020-07-02 | Dts, Inc. | Room acoustics simulation using deep learning image analysis |
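JP7206898B2 (ja) * | 2018-12-25 | 2023-01-18 | 富士通株式会社 | Learning device, learning method, and learning program |
CN111369978B (zh) * | 2018-12-26 | 2024-05-17 | 北京搜狗科技发展有限公司 | Data processing method and apparatus, and apparatus for data processing |
KR102744417B1 (ko) | 2018-12-28 | 2024-12-19 | 한국전자통신연구원 | Method and apparatus for determining a loss function for an audio signal |
CN111429889B (zh) * | 2019-01-08 | 2023-04-28 | 百度在线网络技术(北京)有限公司 | Method, apparatus, device, and computer-readable storage medium for real-time speech recognition based on truncated attention |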
US11322136B2 (en) | 2019-01-09 | 2022-05-03 | Samsung Electronics Co., Ltd. | System and method for multi-spoken language detection |
US10740571B1 (en) * | 2019-01-23 | 2020-08-11 | Google Llc | Generating neural network outputs using insertion operations |
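CN109783822B (zh) * | 2019-01-24 | 2023-04-18 | 中国—东盟信息港股份有限公司 | Data sample recognition system and method based on verification codes |
CN111489742B (zh) * | 2019-01-28 | 2023-06-27 | 北京猎户星空科技有限公司 | Acoustic model training method, speech recognition method, apparatus, and electronic device |
KR102691895B1 (ko) | 2019-01-29 | 2024-08-06 | 삼성전자주식회사 | Server providing an accelerated computing environment and control method thereof |
CN110517666B (zh) * | 2019-01-29 | 2021-03-02 | 腾讯科技(深圳)有限公司 | Audio recognition method, system, machine device, and computer-readable medium |
KR102592585B1 (ko) * | 2019-02-01 | 2023-10-23 | 한국전자통신연구원 | Method and apparatus for building a translation model |
JP7028203B2 (ja) * | 2019-02-07 | 2022-03-02 | 日本電信電話株式会社 | Speech recognition device, speech recognition method, and program |
JP7218601B2 (ja) * | 2019-02-12 | 2023-02-07 | 日本電信電話株式会社 | Training data acquisition device, model training device, methods thereof, and program |
CN110059813B (zh) | 2019-02-13 | 2021-04-06 | 创新先进技术有限公司 | Method, apparatus, and device for updating a convolutional neural network using a GPU cluster |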
US10861441B2 (en) | 2019-02-14 | 2020-12-08 | Tencent America LLC | Large margin training for attention-based end-to-end speech recognition |
US11037547B2 (en) * | 2019-02-14 | 2021-06-15 | Tencent America LLC | Token-wise training for attention based end-to-end speech recognition |
US11481639B2 (en) | 2019-02-26 | 2022-10-25 | Cognizant Technology Solutions U.S. Corporation | Enhanced optimization with composite objectives and novelty pulsation |
CA3129731A1 (en) * | 2019-03-13 | 2020-09-17 | Elliot Meyerson | System and method for implementing modular universal reparameterization for deep multi-task learning across diverse domains |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
CN111709513B (zh) * | 2019-03-18 | 2023-06-09 | 百度在线网络技术(北京)有限公司 | Training system, method, and electronic device for long short-term memory (LSTM) networks |
WO2020198520A1 (en) | 2019-03-27 | 2020-10-01 | Cognizant Technology Solutions U.S. Corporation | Process and system including an optimization engine with evolutionary surrogate-assisted prescriptions |
US11182457B2 (en) | 2019-03-28 | 2021-11-23 | International Business Machines Corporation | Matrix-factorization based gradient compression |
US11011156B2 (en) * | 2019-04-11 | 2021-05-18 | International Business Machines Corporation | Training data modification for training model |
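CN109887497B (zh) * | 2019-04-12 | 2021-01-29 | 北京百度网讯科技有限公司 | Modeling method, apparatus, and device for speech recognition |
CN110033760B (zh) * | 2019-04-15 | 2021-01-29 | 北京百度网讯科技有限公司 | Modeling method, apparatus, and device for speech recognition |
CN113841195B (zh) * | 2019-04-16 | 2023-12-22 | 谷歌有限责任公司 | Joint endpointing and automatic speech recognition |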
US11676006B2 (en) * | 2019-04-16 | 2023-06-13 | Microsoft Technology Licensing, Llc | Universal acoustic modeling using neural mixture models |
US10997967B2 (en) * | 2019-04-18 | 2021-05-04 | Honeywell International Inc. | Methods and systems for cockpit speech recognition acoustic model training with multi-level corpus data augmentation |
US11468879B2 (en) * | 2019-04-29 | 2022-10-11 | Tencent America LLC | Duration informed attention network for text-to-speech analysis |
US20200349425A1 (en) * | 2019-04-30 | 2020-11-05 | Fujitsu Limited | Training time reduction in automatic data augmentation |
KR102754124B1 (ko) * | 2019-05-03 | 2025-01-14 | 구글 엘엘씨 | End-to-end automatic speech recognition on numeric sequences |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
CN110211565B (zh) * | 2019-05-06 | 2023-04-04 | 平安科技(深圳)有限公司 | Dialect recognition method, apparatus, and computer-readable storage medium |
CN114097026A (zh) * | 2019-05-06 | 2022-02-25 | 谷歌有限责任公司 | Contextual biasing for speech recognition |
DK201970509A1 (en) | 2019-05-06 | 2021-01-15 | Apple Inc | Spoken notifications |
KR102460676B1 (ko) | 2019-05-07 | 2022-10-31 | 한국전자통신연구원 | Apparatus and method for speech processing using a densely connected hybrid neural network |
WO2020225772A1 (en) * | 2019-05-07 | 2020-11-12 | Imagia Cybernetics Inc. | Method and system for initializing a neural network |
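CN110222578B (zh) * | 2019-05-08 | 2022-12-27 | 腾讯科技(深圳)有限公司 | Method and apparatus for adversarial testing of an image captioning system |
CN110085249B (zh) * | 2019-05-09 | 2021-03-16 | 南京工程学院 | Single-channel speech enhancement method based on an attention-gated recurrent neural network |
JP7229847B2 (ja) * | 2019-05-13 | 2023-02-28 | 株式会社日立製作所 | Dialogue device, dialogue method, and dialogue computer program |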
US11481609B2 (en) * | 2019-05-13 | 2022-10-25 | Google Llc | Computationally efficient expressive output layers for neural networks |
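CN113924619A (zh) * | 2019-05-28 | 2022-01-11 | 谷歌有限责任公司 | Large-scale multilingual speech recognition with a streaming end-to-end model |
CN112017676B (zh) * | 2019-05-31 | 2024-07-16 | 京东科技控股股份有限公司 | Audio processing method, apparatus, and computer-readable storage medium |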
US11289073B2 (en) * | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11468890B2 (en) | 2019-06-01 | 2022-10-11 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US10716089B1 (en) * | 2019-06-03 | 2020-07-14 | Mapsted Corp. | Deployment of trained neural network based RSS fingerprint dataset |
WO2020247489A1 (en) * | 2019-06-04 | 2020-12-10 | Google Llc | Two-pass end to end speech recognition |
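CN110189766B (zh) * | 2019-06-14 | 2021-04-06 | 西南科技大学 | Neural network-based speech style transfer method |
KR20220054704A (ko) | 2019-06-19 | 2022-05-03 | 구글 엘엘씨 | Contextual biasing for speech recognition |
CN110299132B (zh) * | 2019-06-26 | 2021-11-02 | 京东数字科技控股有限公司 | Spoken digit recognition method and device |
CN110288682B (zh) | 2019-06-28 | 2023-09-26 | 北京百度网讯科技有限公司 | Method and apparatus for controlling mouth shape changes of a three-dimensional virtual portrait |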
WO2021010562A1 (en) | 2019-07-15 | 2021-01-21 | Samsung Electronics Co., Ltd. | Electronic apparatus and controlling method thereof |
KR20210008788A (ko) | 2019-07-15 | 2021-01-25 | 삼성전자주식회사 | Electronic apparatus and controlling method thereof |
US11244673B2 (en) * | 2019-07-19 | 2022-02-08 | Microsoft Technology Licensing, LLC | Streaming contextual unidirectional models |
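KR102824645B1 (ko) * | 2019-07-31 | 2025-06-24 | 삼성전자주식회사 | Decoding method and apparatus in an artificial neural network for speech recognition |
CN110473554B (zh) * | 2019-08-08 | 2022-01-25 | Oppo广东移动通信有限公司 | Audio verification method, apparatus, storage medium, and electronic device |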
WO2021029643A1 (en) | 2019-08-13 | 2021-02-18 | Samsung Electronics Co., Ltd. | System and method for modifying speech recognition result |
CN114223029A (zh) | 2019-08-13 | 2022-03-22 | 三星电子株式会社 | Server that supports speech recognition by a device, and operation method of the server |
EP3980991B1 (en) | 2019-08-13 | 2024-01-03 | Samsung Electronics Co., Ltd. | System and method for recognizing user's speech |
CN110459209B (zh) * | 2019-08-20 | 2021-05-28 | 深圳追一科技有限公司 | Speech recognition method, apparatus, device, and storage medium |
US11151979B2 (en) | 2019-08-23 | 2021-10-19 | Tencent America LLC | Duration informed attention network (DURIAN) for audio-visual synthesis |
US11158303B2 (en) * | 2019-08-27 | 2021-10-26 | International Business Machines Corporation | Soft-forgetting for connectionist temporal classification based automatic speech recognition |
US11551675B2 (en) | 2019-09-03 | 2023-01-10 | Samsung Electronics Co., Ltd. | Electronic device and method for controlling the electronic device thereof |
CN110459208B (zh) * | 2019-09-09 | 2022-01-11 | 中科极限元(杭州)智能科技股份有限公司 | Sequence-to-sequence speech recognition model training method based on knowledge transfer |
CN110600020B (zh) * | 2019-09-12 | 2022-05-17 | 上海依图信息技术有限公司 | Gradient transmission method and device |
US11302309B2 (en) * | 2019-09-13 | 2022-04-12 | International Business Machines Corporation | Aligning spike timing of models for machine learning |
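CN110807365B (zh) * | 2019-09-29 | 2022-02-11 | 浙江大学 | Underwater target recognition method based on fusion of GRU and one-dimensional CNN neural networks |
CN112738634B (zh) * | 2019-10-14 | 2022-08-02 | 北京字节跳动网络技术有限公司 | Video file generation method, apparatus, terminal, and storage medium |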
US11681911B2 (en) * | 2019-10-15 | 2023-06-20 | Naver Corporation | Method and system for training neural sequence-to-sequence models by incorporating global features |
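CN110704197B (zh) | 2019-10-17 | 2022-12-09 | 北京小米移动软件有限公司 | Method, apparatus, and medium for handling memory access overhead |
CN110875035A (zh) * | 2019-10-24 | 2020-03-10 | 广州多益网络股份有限公司 | Novel multi-task joint speech recognition training architecture and method |
KR102203786B1 (ko) * | 2019-11-14 | 2021-01-15 | 오로라월드 주식회사 | Method and system for providing an interaction service using a smart toy |
CN110930979B (zh) * | 2019-11-29 | 2020-10-30 | 百度在线网络技术(北京)有限公司 | Speech recognition model training method, apparatus, and electronic device |
CN111312228A (zh) * | 2019-12-09 | 2020-06-19 | 中国南方电网有限责任公司 | End-to-end voice navigation method for power utility customer service |
CN111048082B (zh) * | 2019-12-12 | 2022-09-06 | 中国电子科技集团公司第二十八研究所 | Improved end-to-end speech recognition method |
CN113077785B (zh) * | 2019-12-17 | 2022-07-12 | 中国科学院声学研究所 | End-to-end method and system for recognizing speech content in multilingual continuous speech streams |
CN111079945B (zh) | 2019-12-18 | 2021-02-05 | 北京百度网讯科技有限公司 | Training method and apparatus for an end-to-end model |
CN111145729B (zh) * | 2019-12-23 | 2022-10-28 | 厦门快商通科技股份有限公司 | Speech recognition model training method, system, mobile terminal, and storage medium |
CN111063336A (zh) * | 2019-12-30 | 2020-04-24 | 天津中科智能识别产业技术研究院有限公司 | End-to-end speech recognition system based on deep learning |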
US11183178B2 (en) | 2020-01-13 | 2021-11-23 | Microsoft Technology Licensing, Llc | Adaptive batching to reduce recognition latency |
CN111382581B (zh) * | 2020-01-21 | 2023-05-19 | 沈阳雅译网络技术有限公司 | One-shot pruning compression method in machine translation |
US11615779B2 (en) * | 2020-01-28 | 2023-03-28 | Google Llc | Language-agnostic multilingual modeling using effective script normalization |
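CN111292727B (zh) * | 2020-02-03 | 2023-03-24 | 北京声智科技有限公司 | Speech recognition method and electronic device |
CN111428750A (zh) * | 2020-02-20 | 2020-07-17 | 商汤国际私人有限公司 | Text recognition model training and text recognition method, apparatus, and medium |
CN111210807B (zh) * | 2020-02-21 | 2023-03-31 | 厦门快商通科技股份有限公司 | Speech recognition model training method, system, mobile terminal, and storage medium |
CN111397870B (zh) * | 2020-03-08 | 2021-05-14 | 中国地质大学(武汉) | Mechanical fault prediction method based on a diversified ensemble convolutional neural network |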
US11747902B2 (en) | 2020-03-11 | 2023-09-05 | Apple Inc. | Machine learning configurations modeled using contextual categorical labels for biosignals |
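CN111246026A (zh) * | 2020-03-11 | 2020-06-05 | 兰州飞天网景信息产业有限公司 | Recording processing method based on convolutional neural networks and connectionist temporal classification |
CN111415667B (zh) * | 2020-03-25 | 2024-04-23 | 中科极限元(杭州)智能科技股份有限公司 | Streaming end-to-end speech recognition model training and decoding method |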
US12217156B2 (en) * | 2020-04-01 | 2025-02-04 | Sony Group Corporation | Computing temporal convolution networks in real time |
US12136411B2 (en) | 2020-04-03 | 2024-11-05 | International Business Machines Corporation | Training of model for processing sequence data |
US12099934B2 (en) * | 2020-04-07 | 2024-09-24 | Cognizant Technology Solutions U.S. Corporation | Framework for interactive exploration, evaluation, and improvement of AI-generated solutions |
US12020693B2 (en) * | 2020-04-29 | 2024-06-25 | Samsung Electronics Co., Ltd. | System and method for out-of-vocabulary phrase support in automatic speech recognition |
US12301635B2 (en) | 2020-05-11 | 2025-05-13 | Apple Inc. | Digital assistant hardware abstraction |
US11061543B1 (en) | 2020-05-11 | 2021-07-13 | Apple Inc. | Providing relevant data items based on context |
US11796794B2 (en) | 2020-05-12 | 2023-10-24 | The Board Of Trustees Of The Leland Stanford Junior University | Multi-objective, robust constraints enforced global topology optimizer for optical devices |
US20210358490A1 (en) * | 2020-05-18 | 2021-11-18 | Nvidia Corporation | End of speech detection using one or more neural networks |
CN111798828B (zh) * | 2020-05-29 | 2023-02-14 | 厦门快商通科技股份有限公司 | Synthesized audio detection method, system, mobile terminal, and storage medium |
US11775841B2 (en) | 2020-06-15 | 2023-10-03 | Cognizant Technology Solutions U.S. Corporation | Process and system including explainable prescriptions through surrogate-assisted evolution |
US11646009B1 (en) * | 2020-06-16 | 2023-05-09 | Amazon Technologies, Inc. | Autonomously motile device with noise suppression |
US11490204B2 (en) | 2020-07-20 | 2022-11-01 | Apple Inc. | Multi-device audio adjustment coordination |
US11438683B2 (en) | 2020-07-21 | 2022-09-06 | Apple Inc. | User identification using headphones |
US11875797B2 (en) | 2020-07-23 | 2024-01-16 | Pozotron Inc. | Systems and methods for scripted audio production |
CN111816169B (zh) * | 2020-07-23 | 2022-05-13 | 思必驰科技股份有限公司 | Method and apparatus for training a mixed Chinese-English speech recognition model |
KR102462932B1 (ko) * | 2020-08-03 | 2022-11-04 | 주식회사 딥브레인에이아이 | Apparatus and method for text preprocessing |
US11488604B2 (en) | 2020-08-19 | 2022-11-01 | Sorenson Ip Holdings, Llc | Transcription of audio |
KR102409873B1 (ko) * | 2020-09-02 | 2022-06-16 | 네이버 주식회사 | Method and system for training a speech recognition model using augmented consistency regularization |
US20220101112A1 (en) * | 2020-09-25 | 2022-03-31 | Nvidia Corporation | Neural network training using robust temporal ensembling |
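CN112188004B (zh) * | 2020-09-28 | 2022-04-05 | 精灵科技有限公司 | Machine learning-based obstacle call detection system and control method thereof |
CN112233655B (zh) * | 2020-09-28 | 2024-07-16 | 上海声瀚信息科技有限公司 | Neural network training method for improving speech command word recognition performance |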
US11380307B2 (en) * | 2020-09-30 | 2022-07-05 | Tencent America LLC | All deep learning minimum variance distortionless response beamformer for speech separation and enhancement |
US11798534B2 (en) * | 2020-10-02 | 2023-10-24 | Salesforce.Com, Inc. | Systems and methods for a multilingual speech recognition framework |
WO2022076029A1 (en) * | 2020-10-05 | 2022-04-14 | Google Llc | Transformer transducer: one model unifying streaming and non-streaming speech recognition |
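KR102429656B1 (ko) * | 2020-10-08 | 2022-08-08 | 서울대학교산학협력단 | Method and system for extracting speaker embeddings with a speech recognizer-based pooling technique for speaker recognition, and recording medium therefor |
CN112259080B (zh) * | 2020-10-20 | 2021-06-22 | 北京讯众通信技术股份有限公司 | Speech recognition method based on a neural network model |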
US12093802B2 (en) | 2020-10-20 | 2024-09-17 | International Business Machines Corporation | Gated unit for a gated recurrent neural network |
US11593560B2 (en) * | 2020-10-21 | 2023-02-28 | Beijing Wodong Tianjun Information Technology Co., Ltd. | System and method for relation extraction with adaptive thresholding and localized context pooling |
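CN112466282B (zh) * | 2020-10-22 | 2023-11-28 | 北京仿真中心 | Speech recognition system and method for the aerospace domain |
CN112420024B (zh) * | 2020-10-23 | 2022-09-09 | 四川大学 | Fully end-to-end method and apparatus for mixed Chinese-English air traffic control speech recognition |
CN112329836B (zh) * | 2020-11-02 | 2024-12-27 | 成都网安科技发展有限公司 | Deep learning-based text classification method, apparatus, server, and storage medium |
CN112614484B (zh) * | 2020-11-23 | 2022-05-20 | 北京百度网讯科技有限公司 | Feature information mining method, apparatus, and electronic device |
CN112669852B (zh) * | 2020-12-15 | 2023-01-31 | 北京百度网讯科技有限公司 | Memory allocation method, apparatus, and electronic device |
CN112786017B (zh) * | 2020-12-25 | 2024-04-09 | 北京猿力未来科技有限公司 | Training method and apparatus for a speech rate detection model, and speech rate detection method and apparatus |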
US11790906B2 (en) * | 2021-01-25 | 2023-10-17 | Google Llc | Resolving unique personal identifiers during corresponding conversations between a voice bot and a human |
US11817117B2 (en) * | 2021-01-29 | 2023-11-14 | Nvidia Corporation | Speaker adaptive end of speech detection for conversational AI applications |
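KR20230141828A (ko) * | 2021-02-04 | 2023-10-10 | 딥마인드 테크놀로지스 리미티드 | Neural networks with adaptive gradient clipping |
CN115188389B (zh) * | 2021-04-06 | 2024-04-05 | 京东科技控股股份有限公司 | Neural network-based end-to-end speech enhancement method and apparatus |
CN113421574B (zh) * | 2021-06-18 | 2024-05-24 | 腾讯音乐娱乐科技(深圳)有限公司 | Training method for an audio feature extraction model, audio recognition method, and related device |
CN113535510B (zh) * | 2021-06-24 | 2024-01-26 | 北京理工大学 | Adaptive sampling model optimization method for data collection in large-scale data centers |
CN113327600B (zh) * | 2021-06-30 | 2024-07-23 | 北京有竹居网络技术有限公司 | Training method, apparatus, and device for a speech recognition model |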
US12112200B2 (en) | 2021-09-13 | 2024-10-08 | International Business Machines Corporation | Pipeline parallel computing using extended memory |
WO2023055410A1 (en) | 2021-09-30 | 2023-04-06 | Google Llc | Contrastive siamese network for semi-supervised speech recognition |
US12347149B2 (en) | 2021-12-13 | 2025-07-01 | Tencent America LLC | System, method, and computer program for content adaptive online training for multiple blocks in neural image compression |
CN114548501B (zh) * | 2022-01-14 | 2024-06-18 | 北京全路通信信号研究设计院集团有限公司 | Balance checking method, system, and device |
CN114842829B (zh) * | 2022-03-29 | 2025-03-28 | 北京理工大学 | Text-driven speech synthesis method that suppresses outliers in speech elements |
US12136413B1 (en) * | 2022-03-31 | 2024-11-05 | Amazon Technologies, Inc. | Domain-specific parameter pre-fixes for tuning automatic speech recognition |
US11978436B2 (en) | 2022-06-03 | 2024-05-07 | Apple Inc. | Application vocabulary integration with a digital assistant |
CN114743554A (zh) * | 2022-06-09 | 2022-07-12 | 武汉工商学院 | Internet of Things-based smart home interaction method and apparatus |
KR102547001B1 (ko) * | 2022-06-28 | 2023-06-23 | 주식회사 액션파워 | Error detection method using a top-down approach |
US20240339123A1 (en) * | 2023-04-06 | 2024-10-10 | Samsung Electronics Co., Ltd. | System and method for keyword spotting in noisy environments |
Family Cites Families (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5790754A (en) | 1994-10-21 | 1998-08-04 | Sensory Circuits, Inc. | Speech recognition apparatus for consumer electronic applications |
US5749066A (en) | 1995-04-24 | 1998-05-05 | Ericsson Messaging Systems Inc. | Method and apparatus for developing a neural network for phoneme recognition |
JP2996926B2 (ja) | 1997-03-11 | 2000-01-11 | 株式会社エイ・ティ・アール音声翻訳通信研究所 | Posterior probability calculation device for phoneme symbols and speech recognition device |
US6292772B1 (en) * | 1998-12-01 | 2001-09-18 | Justsystem Corporation | Method for identifying the language of individual words |
AUPQ439299A0 (en) * | 1999-12-01 | 1999-12-23 | Silverbrook Research Pty Ltd | Interface system |
US7035802B1 (en) * | 2000-07-31 | 2006-04-25 | Matsushita Electric Industrial Co., Ltd. | Recognition system using lexical trees |
US7219085B2 (en) * | 2003-12-09 | 2007-05-15 | Microsoft Corporation | System and method for accelerating and optimizing the processing of machine learning techniques using a graphics processing unit |
US20060031069A1 (en) | 2004-08-03 | 2006-02-09 | Sony Corporation | System and method for performing a grapheme-to-phoneme conversion |
GB0507036D0 (en) | 2005-04-07 | 2005-05-11 | Ibm | Method and system for language identification |
WO2009027980A1 (en) * | 2007-08-28 | 2009-03-05 | Yissum Research Development Company Of The Hebrew University Of Jerusalem | Method, device and system for speech recognition |
JP4869268B2 (ja) * | 2008-03-04 | 2012-02-08 | 日本放送協会 | Acoustic model training device and program |
US8332212B2 (en) * | 2008-06-18 | 2012-12-11 | Cogi, Inc. | Method and system for efficient pacing of speech for transcription |
US8781833B2 (en) | 2008-07-17 | 2014-07-15 | Nuance Communications, Inc. | Speech recognition semantic classification training |
US8886531B2 (en) | 2010-01-13 | 2014-11-11 | Rovi Technologies Corporation | Apparatus and method for generating an audio fingerprint and using a two-stage query |
US20130317755A1 (en) * | 2012-05-04 | 2013-11-28 | New York University | Methods, computer-accessible medium, and systems for score-driven whole-genome shotgun sequence assembly |
US10354650B2 (en) * | 2012-06-26 | 2019-07-16 | Google Llc | Recognizing speech with mixed speech recognition models to generate transcriptions |
US8831957B2 (en) * | 2012-08-01 | 2014-09-09 | Google Inc. | Speech recognition models based on location indicia |
CN102760436B (zh) * | 2012-08-09 | 2014-06-11 | 河南省烟草公司开封市公司 | Speech lexicon screening method |
US9177550B2 (en) | 2013-03-06 | 2015-11-03 | Microsoft Technology Licensing, Llc | Conservatively adapting a deep neural network in a recognition system |
US9153231B1 (en) * | 2013-03-15 | 2015-10-06 | Amazon Technologies, Inc. | Adaptive neural network speech recognition models |
US9418650B2 (en) | 2013-09-25 | 2016-08-16 | Verizon Patent And Licensing Inc. | Training speech recognition using captions |
CN103591637B (zh) | 2013-11-19 | 2015-12-02 | 长春工业大学 | Operation regulation method for a secondary network of central heating |
US9189708B2 (en) | 2013-12-31 | 2015-11-17 | Google Inc. | Pruning and label selection in hidden markov model-based OCR |
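CN103870863B (zh) * | 2014-03-14 | 2016-08-31 | 华中科技大学 | Method for preparing a holographic anti-counterfeiting label with a hidden two-dimensional code image, and identification device thereof |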
US9390712B2 (en) | 2014-03-24 | 2016-07-12 | Microsoft Technology Licensing, Llc. | Mixed speech recognition |
US20150309987A1 (en) | 2014-04-29 | 2015-10-29 | Google Inc. | Classification of Offensive Words |
CN104035751B (zh) * | 2014-06-20 | 2016-10-12 | 深圳市腾讯计算机系统有限公司 | Data parallel processing method and apparatus based on multiple graphics processors |
US10540957B2 (en) * | 2014-12-15 | 2020-01-21 | Baidu Usa Llc | Systems and methods for speech transcription |
US10733979B2 (en) * | 2015-10-09 | 2020-08-04 | Google Llc | Latency constraints for acoustic modeling |
2016
- 2016-11-21, US, US15/358,102 (US10332509B2): Active
- 2016-11-21, US, US15/358,083 (US10319374B2): Active
- 2016-11-23, EP, EP16869294.5A (EP3245652B1): Active
- 2016-11-23, JP, JP2017544352A (JP6629872B2): Active
- 2016-11-23, WO, PCT/US2016/063661 (WO2017091763A1): Application Filing
- 2016-11-23, EP, EP16869302.6A (EP3245597B1): Active
- 2016-11-23, CN, CN201680010871.9A (CN107408384B): Active
- 2016-11-23, WO, PCT/US2016/063641 (WO2017091751A1): Application Filing
- 2016-11-23, CN, CN201680010873.8A (CN107408111B): Active
- 2016-11-23, JP, JP2017544340A (JP6661654B2): Active
- 2016-11-23, KR, KR1020177023177A (KR102033230B1): Active
- 2016-11-23, KR, KR1020177023173A (KR102008077B1): Active
Non-Patent Citations (3)
Title |
---|
Awni Hannun et al., ‘Deep speech: Scaling up end-to-end speech recognition’, Cornell University Library, pp. 1~12, December 2014.* |
Sergey Ioffe et al., ‘Batch normalization: Accelerating deep network training by reducing internal covariate shift’, Cornell University Library, pp.1~11, March 2015.* |
Tara N. Sainath et al., ‘Convolutional long short-term memory, fully connected deep neural networks’, ICASSP 2015, pp.4580~4584, April 2015.* |
Also Published As
Publication number | Publication date |
---|---|
KR20170107015A (ko) | 2017-09-22 |
KR102033230B1 (ko) | 2019-10-16 |
CN107408111B (zh) | 2021-03-30 |
US20170148431A1 (en) | 2017-05-25 |
KR20170106445A (ko) | 2017-09-20 |
CN107408384B (zh) | 2020-11-27 |
EP3245652A1 (en) | 2017-11-22 |
CN107408111A (zh) | 2017-11-28 |
JP2018513398A (ja) | 2018-05-24 |
WO2017091763A1 (en) | 2017-06-01 |
CN107408384A (zh) | 2017-11-28 |
US20170148433A1 (en) | 2017-05-25 |
EP3245597A4 (en) | 2018-05-30 |
US10332509B2 (en) | 2019-06-25 |
US10319374B2 (en) | 2019-06-11 |
JP6629872B2 (ja) | 2020-01-15 |
JP2018513399A (ja) | 2018-05-24 |
EP3245597A1 (en) | 2017-11-22 |
EP3245597B1 (en) | 2020-08-26 |
EP3245652A4 (en) | 2018-05-30 |
WO2017091751A1 (en) | 2017-06-01 |
EP3245652B1 (en) | 2019-07-10 |
JP6661654B2 (ja) | 2020-03-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102008077B1 (ko) | Deployed end-to-end speech recognition | |
US11620986B2 (en) | Cold fusing sequence-to-sequence models with language models | |
KR101991733B1 (ko) | Systems and methods for speech transcription | |
US10019438B2 (en) | External word embedding neural network language models | |
Hannun et al. | Deep speech: Scaling up end-to-end speech recognition | |
Sundermeyer et al. | Comparison of feedforward and recurrent neural network language models | |
Huang et al. | SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition | |
Chen et al. | Efficient training and evaluation of recurrent neural network language models for automatic speech recognition | |
Arısoy et al. | Converting neural network language models into back-off language models for efficient decoding in automatic speech recognition | |
Jiang et al. | A further study of unsupervised pretraining for transformer based speech recognition | |
Scanzio et al. | Parallel implementation of artificial neural network training for speech recognition | |
Enarvi et al. | Automatic speech recognition with very large conversational finnish and estonian vocabularies | |
JP2022037862A (ja) | Method, system, and computer-readable recording medium for distilling end-to-end spoken language understanding knowledge using a text-based pretrained model | |
Suyanto et al. | End-to-end speech recognition models for a low-resourced indonesian language | |
Kipyatkova | Improving Russian LVCSR using deep neural networks for acoustic and language modeling | |
Buthpitiya et al. | A parallel implementation of viterbi training for acoustic models using graphics processing units | |
Karkada et al. | Training Speech Recognition Models on HPC Infrastructure | |
Tarján et al. | On the effectiveness of neural text generation based data augmentation for recognition of morphologically rich speech | |
Zhang | Research on Modeling of Speech Recognition Based on Deep Learning | |
Ravishankar | Efficient algorithms for speech recognition | |
Chen | Cued rnnlm toolkit | |
Pinto Rivero | Acceleration of automatic speech recognition for low-power devices | |
da Fonseca Esteves | Transfer Learning for Low-Resource Automatic Speech Recognition | |
박진환 | End-to-End Neural Network-based Speech Recognition for Mobile and Embedded Devices | |
Chen | CUED RNNLM Toolkit v1.0 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
2017-08-18 | PA0105 | International application | Patent event code: PA01051R01D; Comment text: International Patent Application |
| A201 | Request for examination | |
2017-08-21 | PA0201 | Request for examination | Patent event code: PA02012R01D; Comment text: Request for Examination of Application |
| PG1501 | Laying open of application | |
| E902 | Notification of reason for refusal | |
2018-11-29 | PE0902 | Notice of grounds for rejection | Patent event code: PE09021S01D; Comment text: Notification of reason for refusal |
| E701 | Decision to grant or registration of patent right | |
2019-07-19 | PE0701 | Decision of registration | Patent event code: PE07011S01D; Comment text: Decision to Grant Registration |
| GRNT | Written decision to grant | |
2019-07-31 | PR0701 | Registration of establishment | Patent event code: PR07011E01D; Comment text: Registration of Establishment |
2019-07-31 | PR1002 | Payment of registration fee | Start annual number: 1; End annual number: 3 |
| PG1601 | Publication of registration | |
2022-07-04 | PR1001 | Payment of annual fee | Start annual number: 4; End annual number: 4 |
2023-06-28 | PR1001 | Payment of annual fee | Start annual number: 5; End annual number: 5 |
2024-07-02 | PR1001 | Payment of annual fee | Start annual number: 6; End annual number: 6 |
2025-06-02 | PR1001 | Payment of annual fee | Start annual number: 7; End annual number: 7 |