KR20170107015A - End-to-end speech recognition - Google Patents
End-to-end speech recognition
- Publication number: KR20170107015A
- Application number: KR1020177023177A
- Authority: KR (South Korea)
- Prior art keywords: speech, training, model, layer, ctc
- Prior art date
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G06N3/0445—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/197—Probabilistic grammars, e.g. word n-grams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0635—Training updating or merging of old and new templates; Mean values; Weighting
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Machine Translation (AREA)
Abstract
Description
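Figure 1 ("FIG. 1") depicts the architecture of an end-to-end deep learning model according to embodiments of the present disclosure.
FIG. 2 depicts a method for training the deep learning model according to embodiments of the present disclosure.
FIG. 3 depicts a method of sequence-wise batch normalization according to embodiments of the present disclosure.
FIG. 4 graphically depicts training curves of two models, one trained with batch normalization and one trained without it, according to embodiments of the present disclosure.
FIG. 5 depicts a method for training an RNN model using a curriculum learning strategy according to embodiments of the present disclosure.
FIG. 6 depicts a method for training an RNN model using bi-graphemes segmentation for output transcription according to embodiments of the present disclosure.
FIG. 7 depicts a row convolution architecture with a future context size of 2 according to embodiments of the present disclosure.
FIG. 8 depicts a method for audio transcription with a unidirectional RNN model according to embodiments of the present disclosure.
FIG. 9 depicts a method for training a speech transcription model applicable to multiple languages according to embodiments of the present disclosure.
FIG. 10 depicts a scaling comparison of two networks according to embodiments of the present disclosure.
FIG. 11 depicts the forward and backward passes for a GPU implementation of Connectionist Temporal Classification (CTC) according to embodiments of the present disclosure.
FIG. 12 depicts a method for a GPU implementation of the CTC loss function according to embodiments of the present disclosure.
FIG. 13 depicts a method of data acquisition for speech transcription training according to embodiments of the present disclosure.
FIG. 14 depicts the probability that a request is processed in a batch of a given size according to embodiments of the present disclosure.
FIG. 15 depicts the median and 98th-percentile latencies as a function of server load according to embodiments of the present disclosure.
FIG. 16 depicts a comparison of kernels according to embodiments of the present disclosure.
FIG. 17 depicts a schematic diagram of a training node according to embodiments of the present disclosure, where PLX indicates a PCI switch and the dotted box includes all devices connected by the same PCI root complex.
FIG. 18 depicts a simplified block diagram of a computing system according to embodiments of the present disclosure.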
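The sequence-wise batch normalization named for FIG. 3 is characterized in the claims as computing, for each hidden unit, a mean and variance over the length of the utterance sequences in a mini-batch. The following is a minimal NumPy sketch of that idea, not the disclosed implementation. The function name, the `(batch, time, hidden)` layout, the padding mask, and the learnable `gamma`/`beta` parameters are all assumptions made for illustration.

```python
import numpy as np

def sequence_wise_batch_norm(pre_act, lengths, gamma, beta, eps=1e-5):
    """Normalize recurrent-layer pre-activations over batch and time.

    pre_act: (batch, time, hidden) zero-padded pre-activations (assumed layout).
    lengths: number of valid frames per utterance (assumed input).
    gamma, beta: per-hidden-unit scale and shift (standard batch-norm parameters).
    """
    batch, time, hidden = pre_act.shape
    # mask[b, t] is 1.0 for real frames and 0.0 for padding.
    mask = (np.arange(time)[None, :] < np.asarray(lengths)[:, None]).astype(pre_act.dtype)
    mask3 = mask[:, :, None]
    count = mask.sum()
    # One mean and one variance per hidden unit, pooled over every valid
    # frame of every utterance in the mini-batch.
    mean = (pre_act * mask3).sum(axis=(0, 1)) / count
    var = (((pre_act - mean) ** 2) * mask3).sum(axis=(0, 1)) / count
    normed = (pre_act - mean) / np.sqrt(var + eps)
    # Zero the padded frames again so they do not leak into later layers.
    return (gamma * normed + beta) * mask3
```

Masking the padded frames is a design choice of this sketch: without it, the zero padding of short utterances would bias the pooled statistics.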
Claims (20)
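- A computer-implemented method for training a transcription model for speech transcription, the method comprising:
for each utterance of a set of utterances:
obtaining a set of spectrogram frames from each utterance having an associated ground-truth label, the utterance and the associated ground-truth label being sampled from a training set comprising a plurality of mini-batches;
outputting, from the transcription model, a predicted character or character probabilities for the utterance, the transcription model comprising one or more convolution layers and one or more recurrent layers, batch normalization being applied to one or more of the plurality of mini-batches to normalize pre-activations in at least one of the one or more recurrent layers;
computing a loss to measure an error in prediction of a character for the utterance given the associated ground-truth label;
computing a derivative of the loss with respect to parameters of the transcription model; and
updating the transcription model through back-propagation using the derivative.
- The method of claim 1,
wherein the batch normalization is also implemented in the one or more convolution layers.
- The method of claim 2,
wherein the normalization comprises:
computing, for each hidden unit of each layer to be batch-normalized, a mean and a variance over the length of the utterance sequences in a mini-batch.
- The method of claim 1,
wherein, in obtaining the set of spectrogram frames, subsampling of the utterance is implemented by taking strides with a step size of a predetermined number of time slices.
- The method of claim 4,
wherein the predicted characters from the transcription model comprise alternate labels enriched from the English alphabet.
- The method of claim 5,
wherein the alternate labels are selected from whole words, syllables, and non-overlapping n-grams.
- The method of claim 6,
wherein the non-overlapping n-grams are non-overlapping bigrams at the word level.
- The method of claim 7,
wherein any unigram labels in the output predicted characters are transformed into bigram labels through an isomorphism.
- The method of claim 1, further comprising:
in a first training epoch, iterating through the training set in increasing order of the length of the longest utterance in each mini-batch; and
after the first training epoch, reverting the plurality of mini-batches to a random order for additional transcription output training.
- The method of claim 1,
wherein the training set is generated from raw audio clips and raw transcriptions through a data acquisition pipeline.
- The method of claim 10,
wherein generating the training set comprises:
aligning the raw audio clips and the raw transcriptions;
segmenting the aligned audio clips and the corresponding transcriptions whenever the audio encounters a series of consecutive blank labels; and
filtering the segmented audio clips and the corresponding transcriptions by removing erroneous examples.
- A computer-implemented method for training a recurrent neural network (RNN) model for speech transcription, the method comprising:
receiving, at a first layer of the RNN model, a set of spectrogram frames for each utterance of a plurality of utterances, the plurality of utterances and associated labels being sampled from a training set;
applying, in one or more convolution layers of the RNN model, convolutions in at least one of the frequency domain and the time domain to the set of spectrogram frames;
predicting one or more characters through one or more recurrent layers of the RNN model, batch normalization being implemented to normalize pre-activations of at least one of the one or more recurrent layers;
obtaining a probability distribution over the predicted characters at an output layer of the RNN model;
implementing a Connectionist Temporal Classification (CTC) loss function to measure an error in prediction of a character for the utterance given the associated ground-truth label, the implementation of the CTC loss function comprising element-wise addition of a forward matrix and a backward matrix generated during a forward pass and a backward pass of the CTC loss function, respectively, all elements in each column of the forward matrix being computed for use in the implementation of the CTC loss function;
computing a derivative of the loss with respect to parameters of the RNN model; and
updating the RNN model through back-propagation using the derivative.
- The method of claim 12,
wherein the batch normalization comprises:
computing, for the one or more recurrent layers, a mean and a variance over the length of each utterance.
- The method of claim 12,
wherein the CTC loss function is implemented in log probability space.
- The method of claim 12,
wherein the implementation of the CTC loss function is a graphics processing unit (GPU) based implementation.
- The method of claim 15,
wherein the CTC loss function algorithm comprises one or more of:
(a) for gradient computation, taking the element-wise addition of the forward matrix and the backward matrix and performing a key-value reduction using the predicted character as the key;
(b) mapping the forward pass and the backward pass to corresponding compute kernels; and
(c) performing a key-value sort, the keys being the characters in the utterance label and the values being the indices of each character in the utterance.
- A non-transitory computer-readable medium or media comprising one or more sequences of instructions which,
when executed by one or more microprocessors, cause steps to be performed comprising:
receiving a plurality of batches of utterance sequences, each utterance sequence and its associated label being sampled from a training set;
outputting a probability distribution over predicted characters corresponding to the utterance sequences to a Connectionist Temporal Classification (CTC) layer; and
implementing a CTC loss function algorithm for speech transcription training, the implementation comprising element-wise addition of a forward matrix and a backward matrix generated during a forward pass and a backward pass of the CTC loss function, respectively, all elements in each column of the forward matrix being computed for use in the implementation of the CTC loss function.
- The non-transitory computer-readable medium or media of claim 17,
wherein the sequences of instructions, when executed by one or more microprocessors, further cause:
mapping each utterance sequence in the plurality of batches of utterance sequences to a compute thread block.
- The non-transitory computer-readable medium or media of claim 18,
wherein the rows of the forward matrix and the backward matrix are processed in parallel by the compute thread block, and the columns of the forward matrix and the backward matrix are processed sequentially by the compute thread block.
- The non-transitory computer-readable medium or media of claim 17,
wherein the sequences of instructions, when executed by one or more microprocessors, further cause:
mapping the forward pass and the backward pass to a forward compute kernel and a backward compute kernel, respectively.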
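The CTC claims above refer to a forward matrix and a backward matrix, computed in log probability space and combined by element-wise addition. A small CPU sketch in NumPy may make that relationship concrete. It follows the standard CTC forward-backward recursions and is not the GPU implementation the claims describe. The function name, the choice of index 0 for the blank label, and the `(rows = extended-label positions, columns = time steps)` layout are assumptions for illustration.

```python
import numpy as np

def ctc_forward_backward(log_probs, label, blank=0):
    """Log-space CTC forward (alpha) and backward (beta) matrices.

    log_probs: (T, V) per-frame log-probabilities (for example, a log-softmax).
    label: list of character indices without blanks.
    """
    T = log_probs.shape[0]
    # Extended label: a blank before, between, and after the characters.
    ext = [blank]
    for c in label:
        ext += [c, blank]
    S = len(ext)
    emit = log_probs[:, ext].T              # emit[s, t] = log_probs[t, ext[s]]
    alpha = np.full((S, T), -np.inf)
    beta = np.full((S, T), -np.inf)
    alpha[:2, 0] = emit[:2, 0]              # start in the first blank or first character
    beta[-2:, -1] = emit[-2:, -1]           # end in the last character or last blank

    # Columns (time steps) are filled sequentially; every row of a column is computed.
    for t in range(1, T):
        for s in range(S):
            terms = [alpha[s, t - 1]]
            if s >= 1:
                terms.append(alpha[s - 1, t - 1])
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2, t - 1])
            alpha[s, t] = np.logaddexp.reduce(terms) + emit[s, t]
    for t in range(T - 2, -1, -1):
        for s in range(S):
            terms = [beta[s, t + 1]]
            if s + 1 < S:
                terms.append(beta[s + 1, t + 1])
            if s + 2 < S and ext[s + 2] != blank and ext[s + 2] != ext[s]:
                terms.append(beta[s + 2, t + 1])
            beta[s, t] = np.logaddexp.reduce(terms) + emit[s, t]

    log_likelihood = np.logaddexp.reduce(alpha[-2:, -1])
    # Element-wise addition in log space; the emission is counted in both
    # alpha and beta, so it is subtracted once.
    joint = alpha + beta - emit
    per_step = np.logaddexp.reduce(joint, axis=0)
    return alpha, beta, log_likelihood, per_step
```

In this sketch every entry of `per_step` equals `log_likelihood`, which is a convenient consistency check on the two matrices. The CTC loss for the utterance is `-log_likelihood`.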
Applications Claiming Priority (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562260206P | 2015-11-25 | 2015-11-25 | |
US62/260,206 | 2015-11-25 | ||
US15/358,083 | 2016-11-21 | ||
US15/358,102 US10332509B2 (en) | 2015-11-25 | 2016-11-21 | End-to-end speech recognition |
US15/358,102 | 2016-11-21 | ||
US15/358,083 US10319374B2 (en) | 2015-11-25 | 2016-11-21 | Deployed end-to-end speech recognition |
PCT/US2016/063661 WO2017091763A1 (en) | 2015-11-25 | 2016-11-23 | End-to-end speech recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20170107015A (ko) | 2017-09-22 |
KR102033230B1 (ko) | 2019-10-16 |
Family
ID=58721011
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020177023177A Active KR102033230B1 (ko) | 2015-11-25 | 2016-11-23 | End-to-end speech recognition |
KR1020177023173A Active KR102008077B1 (ko) | 2015-11-25 | 2016-11-23 | Deployed end-to-end speech recognition |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020177023173A Active KR102008077B1 (ko) | 2015-11-25 | 2016-11-23 | Deployed end-to-end speech recognition |
Country Status (6)
Country | Link |
---|---|
US (2) | US10332509B2 (ko) |
EP (2) | EP3245652B1 (ko) |
JP (2) | JP6661654B2 (ko) |
KR (2) | KR102033230B1 (ko) |
CN (2) | CN107408111B (ko) |
WO (2) | WO2017091751A1 (ko) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
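KR20180084580A (ko) * | 2017-01-17 | 2018-07-25 | Kyungpook National University Industry-Academic Cooperation Foundation | Apparatus and method for generating abstractive summaries of multi-paragraph text, and recording medium for performing the method |
KR20190085755A (ko) * | 2018-01-11 | 2019-07-19 | Chung-Ang University Industry-Academic Cooperation Foundation | Deep learning method using BCSC and apparatus therefor |
KR20200095789A (ko) * | 2019-02-01 | 2020-08-11 | Electronics and Telecommunications Research Institute | Method and apparatus for building a translation model |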
Families Citing this family (294)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US8515052B2 (en) | 2007-12-17 | 2013-08-20 | Wai Wu | Parallel signal processing system and method |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US9634855B2 (en) | 2010-05-13 | 2017-04-25 | Alexander Poltorak | Electronic personal interactive device that determines topics of interest using a conversational agent |
CN104969289B (zh) | 2013-02-07 | 2021-05-28 | Apple Inc. | Voice trigger for a digital assistant |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US10460227B2 (en) | 2015-05-15 | 2019-10-29 | Apple Inc. | Virtual assistant in a communication session |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10229672B1 (en) * | 2015-12-31 | 2019-03-12 | Google Llc | Training acoustic models using connectionist temporal classification |
KR102100977B1 (ko) * | 2016-02-03 | 2020-04-14 | Google Llc | Compressed recurrent neural network models |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
US12197817B2 (en) | 2016-06-11 | 2025-01-14 | Apple Inc. | Intelligent device arbitration and control |
CN106251859B (zh) * | 2016-07-22 | 2019-05-31 | Baidu Online Network Technology (Beijing) Co., Ltd. | Speech recognition processing method and apparatus |
US9984683B2 (en) * | 2016-07-22 | 2018-05-29 | Google Llc | Automatic speech recognition using multi-dimensional models |
CA3036067C (en) | 2016-09-06 | 2023-08-01 | Deepmind Technologies Limited | Generating audio using neural networks |
US11080591B2 (en) * | 2016-09-06 | 2021-08-03 | Deepmind Technologies Limited | Processing sequences using convolutional neural networks |
JP6750121B2 (ja) * | 2016-09-06 | 2020-09-02 | Deepmind Technologies Limited | Processing sequences using convolutional neural networks |
US10224058B2 (en) | 2016-09-07 | 2019-03-05 | Google Llc | Enhanced multi-channel acoustic models |
EP3520036B1 (en) * | 2016-10-03 | 2020-07-29 | Google LLC | Processing text sequences using neural networks |
WO2018081089A1 (en) | 2016-10-26 | 2018-05-03 | Deepmind Technologies Limited | Processing text sequences using neural networks |
US10529320B2 (en) * | 2016-12-21 | 2020-01-07 | Google Llc | Complex evolution recurrent neural networks |
US10140980B2 (en) * | 2016-12-21 | 2018-11-27 | Google Llc | Complex linear projection for acoustic modeling |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
US10049106B2 (en) * | 2017-01-18 | 2018-08-14 | Xerox Corporation | Natural language generation through character-based recurrent neural networks with finite-state prior knowledge |
US11907858B2 (en) * | 2017-02-06 | 2024-02-20 | Yahoo Assets Llc | Entity disambiguation |
US11087213B2 (en) * | 2017-02-10 | 2021-08-10 | Synaptics Incorporated | Binary and multi-class classification systems and methods using one spike connectionist temporal classification |
US11080600B2 (en) * | 2017-02-10 | 2021-08-03 | Synaptics Incorporated | Recurrent neural network based acoustic event classification using complement rule |
US10762417B2 (en) * | 2017-02-10 | 2020-09-01 | Synaptics Incorporated | Efficient connectionist temporal classification for binary classification |
US11100932B2 (en) * | 2017-02-10 | 2021-08-24 | Synaptics Incorporated | Robust start-end point detection algorithm using neural network |
US10762891B2 (en) * | 2017-02-10 | 2020-09-01 | Synaptics Incorporated | Binary and multi-class classification systems and methods using connectionist temporal classification |
US11853884B2 (en) * | 2017-02-10 | 2023-12-26 | Synaptics Incorporated | Many or one detection classification systems and methods |
US10657955B2 (en) * | 2017-02-24 | 2020-05-19 | Baidu Usa Llc | Systems and methods for principled bias reduction in production speech models |
US10872598B2 (en) * | 2017-02-24 | 2020-12-22 | Baidu Usa Llc | Systems and methods for real-time neural text-to-speech |
US10373610B2 (en) * | 2017-02-24 | 2019-08-06 | Baidu Usa Llc | Systems and methods for automatic unit selection and target decomposition for sequence labelling |
US10762427B2 (en) * | 2017-03-01 | 2020-09-01 | Synaptics Incorporated | Connectionist temporal classification using segmented labeled sequence data |
US10878837B1 (en) * | 2017-03-01 | 2020-12-29 | Snap Inc. | Acoustic neural network scene detection |
US10540961B2 (en) * | 2017-03-13 | 2020-01-21 | Baidu Usa Llc | Convolutional recurrent neural networks for small-footprint keyword spotting |
US11017291B2 (en) * | 2017-04-28 | 2021-05-25 | Intel Corporation | Training with adaptive runtime and precision profiling |
US11410024B2 (en) * | 2017-04-28 | 2022-08-09 | Intel Corporation | Tool for facilitating efficiency in machine learning |
US10467052B2 (en) * | 2017-05-01 | 2019-11-05 | Red Hat, Inc. | Cluster topology aware container scheduling for efficient data transfer |
KR20180124381A (ko) * | 2017-05-11 | 2018-11-21 | Hyundai Motor Company | System and method for determining the state of a driver |
DK180048B1 (en) | 2017-05-11 | 2020-02-04 | Apple Inc. | MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION |
US20180330718A1 (en) * | 2017-05-11 | 2018-11-15 | Mitsubishi Electric Research Laboratories, Inc. | System and Method for End-to-End speech recognition |
DK201770428A1 (en) | 2017-05-12 | 2019-02-18 | Apple Inc. | LOW-LATENCY INTELLIGENT AUTOMATED ASSISTANT |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
DK201770411A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Multi-modal interfaces |
US20180336275A1 (en) | 2017-05-16 | 2018-11-22 | Apple Inc. | Intelligent automated assistant for media exploration |
DK179560B1 (en) | 2017-05-16 | 2019-02-18 | Apple Inc. | FAR-FIELD EXTENSION FOR DIGITAL ASSISTANT SERVICES |
US10896669B2 (en) | 2017-05-19 | 2021-01-19 | Baidu Usa Llc | Systems and methods for multi-speaker neural text-to-speech |
CN107240396B (zh) * | 2017-06-16 | 2023-01-17 | Baidu Online Network Technology (Beijing) Co., Ltd. | Speaker adaptation method, apparatus, device, and storage medium |
EP3422518B1 (en) * | 2017-06-28 | 2020-06-17 | Siemens Aktiengesellschaft | A method for recognizing contingencies in a power supply network |
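KR102483643B1 (ko) * | 2017-08-14 | 2023-01-02 | Samsung Electronics Co., Ltd. | Method and apparatus for training a model, and recognition method and apparatus using the neural network |
KR102410820B1 (ko) * | 2017-08-14 | 2022-06-20 | Samsung Electronics Co., Ltd. | Recognition method and apparatus using a neural network, and method and apparatus for training the neural network |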
US10706840B2 (en) * | 2017-08-18 | 2020-07-07 | Google Llc | Encoder-decoder models for sequence to sequence mapping |
US11694066B2 (en) * | 2017-10-17 | 2023-07-04 | Xilinx, Inc. | Machine learning runtime library for neural network acceleration |
US10872596B2 (en) | 2017-10-19 | 2020-12-22 | Baidu Usa Llc | Systems and methods for parallel wave generation in end-to-end text-to-speech |
US10796686B2 (en) | 2017-10-19 | 2020-10-06 | Baidu Usa Llc | Systems and methods for neural text-to-speech using convolutional sequence learning |
US11017761B2 (en) | 2017-10-19 | 2021-05-25 | Baidu Usa Llc | Parallel neural text-to-speech |
CN107680597B (zh) * | 2017-10-23 | 2019-07-09 | Ping An Technology (Shenzhen) Co., Ltd. | Speech recognition method, apparatus, device, and computer-readable storage medium |
US11556775B2 (en) | 2017-10-24 | 2023-01-17 | Baidu Usa Llc | Systems and methods for trace norm regularization and faster inference for embedded models |
US20190130896A1 (en) * | 2017-10-26 | 2019-05-02 | Salesforce.Com, Inc. | Regularization Techniques for End-To-End Speech Recognition |
US10573295B2 (en) * | 2017-10-27 | 2020-02-25 | Salesforce.Com, Inc. | End-to-end speech recognition with policy learning |
US11562287B2 (en) | 2017-10-27 | 2023-01-24 | Salesforce.Com, Inc. | Hierarchical and interpretable skill acquisition in multi-task reinforcement learning |
US11250314B2 (en) * | 2017-10-27 | 2022-02-15 | Cognizant Technology Solutions U.S. Corporation | Beyond shared hierarchies: deep multitask learning through soft layer ordering |
US10535001B2 (en) * | 2017-11-06 | 2020-01-14 | International Business Machines Corporation | Reducing problem complexity when analyzing 3-D images |
IL274424B2 (en) * | 2017-11-14 | 2024-07-01 | Magic Leap Inc | Meta-learning for multi-task learning for neural networks |
US11537439B1 (en) * | 2017-11-22 | 2022-12-27 | Amazon Technologies, Inc. | Intelligent compute resource selection for machine learning training jobs |
US11977958B2 (en) | 2017-11-22 | 2024-05-07 | Amazon Technologies, Inc. | Network-accessible machine learning model training and hosting system |
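CN110598779B (zh) * | 2017-11-30 | 2022-04-08 | 腾讯科技(深圳)有限公司 | Summary description generation method, apparatus, computer device, and storage medium |
CN108171117B (zh) * | 2017-12-05 | 2019-05-21 | 南京南瑞信息通信科技有限公司 | Electric power artificial intelligence visual analysis system based on multi-core heterogeneous parallel computing |
CN107945791B (zh) * | 2017-12-05 | 2021-07-20 | 华南理工大学 | Speech recognition method based on deep learning object detection |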
US10847137B1 (en) * | 2017-12-12 | 2020-11-24 | Amazon Technologies, Inc. | Trigger word detection using neural network waveform processing |
KR102462426B1 (ko) * | 2017-12-14 | 2022-11-03 | 삼성전자주식회사 | Electronic device for analyzing the meaning of an utterance and operating method thereof |
US10593321B2 (en) * | 2017-12-15 | 2020-03-17 | Mitsubishi Electric Research Laboratories, Inc. | Method and apparatus for multi-lingual end-to-end speech recognition |
US11443178B2 (en) | 2017-12-15 | 2022-09-13 | International Business Machines Corporation | Deep neural network hardening framework |
US10672388B2 (en) * | 2017-12-15 | 2020-06-02 | Mitsubishi Electric Research Laboratories, Inc. | Method and apparatus for open-vocabulary end-to-end speech recognition |
CN108229659A (zh) * | 2017-12-29 | 2018-06-29 | 陕西科技大学 | Deep learning based method for recognizing single piano key notes |
CN108364662B (zh) * | 2017-12-29 | 2021-01-05 | 中国科学院自动化研究所 | Speech emotion recognition method and system based on pairwise discrimination tasks |
CN108089958B (zh) * | 2017-12-29 | 2021-06-08 | 珠海市君天电子科技有限公司 | GPU testing method, terminal device, and computer-readable storage medium |
FR3076378B1 (fr) * | 2017-12-29 | 2020-05-29 | Bull Sas | Method for training a neural network for recognizing a character sequence, and associated recognition method |
CN108256474A (zh) * | 2018-01-17 | 2018-07-06 | 百度在线网络技术(北京)有限公司 | Method and apparatus for recognizing dishes |
CN108417202B (zh) * | 2018-01-19 | 2020-09-01 | 苏州思必驰信息科技有限公司 | Speech recognition method and system |
CN108417201B (zh) * | 2018-01-19 | 2020-11-06 | 苏州思必驰信息科技有限公司 | Single-channel multi-speaker identification method and system |
CN108491836B (zh) * | 2018-01-25 | 2020-11-24 | 华南理工大学 | Method for holistic recognition of Chinese text in natural scene images |
US10657426B2 (en) * | 2018-01-25 | 2020-05-19 | Samsung Electronics Co., Ltd. | Accelerating long short-term memory networks via selective pruning |
US11182694B2 (en) | 2018-02-02 | 2021-11-23 | Samsung Electronics Co., Ltd. | Data path for GPU machine learning training with key value SSD |
US11527308B2 (en) | 2018-02-06 | 2022-12-13 | Cognizant Technology Solutions U.S. Corporation | Enhanced optimization with composite objectives and novelty-diversity selection |
US12033079B2 (en) | 2018-02-08 | 2024-07-09 | Cognizant Technology Solutions U.S. Corporation | System and method for pseudo-task augmentation in deep multitask learning |
US11501076B2 (en) * | 2018-02-09 | 2022-11-15 | Salesforce.Com, Inc. | Multitask learning as question answering |
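TWI659411B (zh) * | 2018-03-01 | 2019-05-11 | 大陸商芋頭科技(杭州)有限公司 | Multilingual mixed speech recognition method |
CN108564954B (zh) * | 2018-03-19 | 2020-01-10 | 平安科技(深圳)有限公司 | Deep neural network model, electronic device, identity verification method, and storage medium |
KR102473447B1 (ko) | 2018-03-22 | 2022-12-05 | 삼성전자주식회사 | Electronic device for modulating a user's voice using an artificial intelligence model and control method thereof |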
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US20190318229A1 (en) * | 2018-04-12 | 2019-10-17 | Advanced Micro Devices, Inc. | Method and system for hardware mapping inference pipelines |
US10672414B2 (en) * | 2018-04-13 | 2020-06-02 | Microsoft Technology Licensing, Llc | Systems, methods, and computer-readable media for improved real-time audio processing |
CN108538311B (zh) * | 2018-04-13 | 2020-09-15 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio classification method, apparatus, and computer-readable storage medium |
US11545157B2 (en) * | 2018-04-23 | 2023-01-03 | Google Llc | Speaker diarization using an end-to-end model |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
EP3791332A4 (en) * | 2018-05-10 | 2022-03-09 | The Board of Trustees of the Leland Stanford Junior University | TRAINING PHOTONIC NEURAL NETWORKS BY IN SITU BACK PROPAGATION |
WO2019217836A1 (en) | 2018-05-10 | 2019-11-14 | The Board Of Trustees Of The Leland Stanford Junior University | Systems and methods for activation functions for photonic neural networks |
KR102018346B1 (ko) * | 2018-05-11 | 2019-10-14 | 국방과학연구소 | Method and system for classifying acoustic signals |
US11086937B2 (en) * | 2018-05-11 | 2021-08-10 | The Regents Of The University Of California | Speech based structured querying |
US11462209B2 (en) * | 2018-05-18 | 2022-10-04 | Baidu Usa Llc | Spectrogram to waveform synthesis using convolutional networks |
US11138471B2 (en) * | 2018-05-18 | 2021-10-05 | Google Llc | Augmentation of audiographic images for improved machine learning |
DK180639B1 (en) | 2018-06-01 | 2021-11-04 | Apple Inc | DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT |
DK201870355A1 (en) | 2018-06-01 | 2019-12-16 | Apple Inc. | VIRTUAL ASSISTANT OPERATION IN MULTI-DEVICE ENVIRONMENTS |
WO2019243910A1 (en) * | 2018-06-21 | 2019-12-26 | International Business Machines Corporation | Segmenting irregular shapes in images using deep region growing |
CN108984535B (zh) * | 2018-06-25 | 2022-04-05 | 腾讯科技(深圳)有限公司 | Sentence translation method, translation model training method, device, and storage medium |
EP3815043A4 (en) * | 2018-06-29 | 2022-01-26 | Baidu.com Times Technology (Beijing) Co., Ltd. | SYSTEMS AND METHODS FOR DEPTH ESTIMATION BY AFFINITY LEARNED WITH FOLDING SPATIAL PROPAGATION NETWORKS |
CN109147766B (zh) * | 2018-07-06 | 2020-08-18 | 北京爱医声科技有限公司 | Speech recognition method and system based on an end-to-end deep learning model |
EP4471670A3 (en) | 2018-07-13 | 2024-12-25 | Google LLC | End-to-end streaming keyword spotting |
US11335333B2 (en) | 2018-07-20 | 2022-05-17 | Google Llc | Speech recognition with sequence-to-sequence models |
CN110752973B (zh) * | 2018-07-24 | 2020-12-25 | Tcl科技集团股份有限公司 | Control method and apparatus for a terminal device, and terminal device |
US10720151B2 (en) | 2018-07-27 | 2020-07-21 | Deepgram, Inc. | End-to-end neural networks for speech recognition and classification |
CN108962230B (zh) * | 2018-07-27 | 2019-04-23 | 重庆因普乐科技有限公司 | Memristor-based speech recognition method |
JP7209330B2 (ja) * | 2018-07-30 | 2023-01-20 | 国立研究開発法人情報通信研究機構 | Classifier, trained model, and learning method |
US11107463B2 (en) | 2018-08-01 | 2021-08-31 | Google Llc | Minimum word error rate training for attention-based sequence-to-sequence models |
CN110825665B (zh) * | 2018-08-10 | 2021-11-05 | 昆仑芯(北京)科技有限公司 | Data acquisition unit and data acquisition method applied to a controller |
US10650812B2 (en) * | 2018-08-13 | 2020-05-12 | Bank Of America Corporation | Deterministic multi-length sliding window protocol for contiguous string entity |
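CN109003601A (zh) * | 2018-08-31 | 2018-12-14 | 北京工商大学 | Cross-lingual end-to-end speech recognition method for the low-resource Tujia language |
CN112639964B (zh) * | 2018-09-04 | 2024-07-26 | Oppo广东移动通信有限公司 | Method, system, and computer-readable medium for recognizing speech using depth information |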
US10963721B2 (en) | 2018-09-10 | 2021-03-30 | Sony Corporation | License plate number recognition based on three dimensional beam search |
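CN109271926B (zh) * | 2018-09-14 | 2021-09-10 | 西安电子科技大学 | Intelligent emitter identification method based on a GRU deep convolutional network |
CN109215662B (zh) * | 2018-09-18 | 2023-06-20 | 平安科技(深圳)有限公司 | End-to-end speech recognition method, electronic device, and computer-readable storage medium |
JP7043373B2 (ja) * | 2018-09-18 | 2022-03-29 | ヤフー株式会社 | Information processing device, information processing method, and program |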
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US10672382B2 (en) * | 2018-10-15 | 2020-06-02 | Tencent America LLC | Input-feeding architecture for attention based end-to-end speech recognition |
US10891951B2 (en) * | 2018-10-17 | 2021-01-12 | Ford Global Technologies, Llc | Vehicle language processing |
EP3640856A1 (en) | 2018-10-19 | 2020-04-22 | Fujitsu Limited | A method, apparatus and computer program to carry out a training procedure in a convolutional neural network |
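KR20200045128A (ko) * | 2018-10-22 | 2020-05-04 | 삼성전자주식회사 | Model training method and apparatus, and data recognition method |
CN109447253B (zh) * | 2018-10-26 | 2021-04-27 | 杭州比智科技有限公司 | Video memory allocation method, apparatus, computing device, and computer storage medium |
CN112970063B (zh) | 2018-10-29 | 2024-10-18 | 杜比国际公司 | Method and apparatus for rate-quality scalable coding with generative models |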
US11640519B2 (en) * | 2018-10-31 | 2023-05-02 | Sony Interactive Entertainment Inc. | Systems and methods for domain adaptation in neural networks using cross-domain batch normalization |
US11494612B2 (en) | 2018-10-31 | 2022-11-08 | Sony Interactive Entertainment Inc. | Systems and methods for domain adaptation in neural networks using domain classifier |
US12282845B2 (en) | 2018-11-01 | 2025-04-22 | Cognizant Technology Solutions US Corp. | Multiobjective coevolution of deep neural network architectures |
US11526759B2 (en) * | 2018-11-05 | 2022-12-13 | International Business Machines Corporation | Large model support in deep learning |
CN109523994A (zh) * | 2018-11-13 | 2019-03-26 | 四川大学 | Multi-task speech classification method based on capsule neural networks |
CN109492233B (zh) * | 2018-11-14 | 2023-10-17 | 北京捷通华声科技股份有限公司 | Machine translation method and apparatus |
US11250838B2 (en) * | 2018-11-16 | 2022-02-15 | Deepmind Technologies Limited | Cross-modal sequence distillation |
US11238845B2 (en) | 2018-11-21 | 2022-02-01 | Google Llc | Multi-dialect and multilingual speech recognition |
US11736363B2 (en) * | 2018-11-30 | 2023-08-22 | Disney Enterprises, Inc. | Techniques for analyzing a network and increasing network availability |
US10573312B1 (en) | 2018-12-04 | 2020-02-25 | Sorenson Ip Holdings, Llc | Transcription generation from multiple speech recognition systems |
US11017778B1 (en) | 2018-12-04 | 2021-05-25 | Sorenson Ip Holdings, Llc | Switching between speech recognition systems |
US10388272B1 (en) | 2018-12-04 | 2019-08-20 | Sorenson Ip Holdings, Llc | Training speech recognition systems using word sequences |
US11170761B2 (en) | 2018-12-04 | 2021-11-09 | Sorenson Ip Holdings, Llc | Training of speech recognition systems |
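KR102681637B1 (ko) | 2018-12-13 | 2024-07-05 | 현대자동차주식회사 | Artificial intelligence apparatus and preprocessing method of noise data for identifying the source of problematic noise |
KR102804488B1 (ko) | 2018-12-24 | 2025-05-07 | 디티에스, 인코포레이티드 | Room acoustics simulation using deep learning image analysis |
JP7206898B2 (ja) * | 2018-12-25 | 2023-01-18 | 富士通株式会社 | Learning device, learning method, and learning program |
CN111369978B (zh) * | 2018-12-26 | 2024-05-17 | 北京搜狗科技发展有限公司 | Data processing method and apparatus, and apparatus for data processing |
KR102744417B1 (ko) | 2018-12-28 | 2024-12-19 | 한국전자통신연구원 | Method and apparatus for determining a loss function for audio signals |
CN111429889B (zh) * | 2019-01-08 | 2023-04-28 | 百度在线网络技术(北京)有限公司 | Method, apparatus, device, and computer-readable storage medium for real-time speech recognition based on truncated attention |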
US11322136B2 (en) * | 2019-01-09 | 2022-05-03 | Samsung Electronics Co., Ltd. | System and method for multi-spoken language detection |
EP3899807A1 (en) | 2019-01-23 | 2021-10-27 | Google LLC | Generating neural network outputs using insertion operations |
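CN109783822B (zh) * | 2019-01-24 | 2023-04-18 | 中国—东盟信息港股份有限公司 | Verification code based data sample recognition system and method thereof |
CN111489742B (zh) * | 2019-01-28 | 2023-06-27 | 北京猎户星空科技有限公司 | Acoustic model training method, speech recognition method, apparatus, and electronic device |
CN109859743B (zh) * | 2019-01-29 | 2023-12-08 | 腾讯科技(深圳)有限公司 | Audio recognition method, system, and machine device |
KR102691895B1 (ko) | 2019-01-29 | 2024-08-06 | 삼성전자주식회사 | Server providing an accelerated computing environment and control method |
JP7028203B2 (ja) * | 2019-02-07 | 2022-03-02 | 日本電信電話株式会社 | Speech recognition device, speech recognition method, and program |
JP7218601B2 (ja) * | 2019-02-12 | 2023-02-07 | 日本電信電話株式会社 | Training data acquisition device, model training device, methods thereof, and program |
CN110059813B (zh) * | 2019-02-13 | 2021-04-06 | 创新先进技术有限公司 | Method, apparatus, and device for updating a convolutional neural network using a GPU cluster |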
US10861441B2 (en) | 2019-02-14 | 2020-12-08 | Tencent America LLC | Large margin training for attention-based end-to-end speech recognition |
US11037547B2 (en) | 2019-02-14 | 2021-06-15 | Tencent America LLC | Token-wise training for attention based end-to-end speech recognition |
US11481639B2 (en) | 2019-02-26 | 2022-10-25 | Cognizant Technology Solutions U.S. Corporation | Enhanced optimization with composite objectives and novelty pulsation |
CA3129731A1 (en) * | 2019-03-13 | 2020-09-17 | Elliot Meyerson | System and method for implementing modular universal reparameterization for deep multi-task learning across diverse domains |
CN111709513B (zh) * | 2019-03-18 | 2023-06-09 | 百度在线网络技术(北京)有限公司 | Training system, method, and electronic device for long short-term memory (LSTM) networks |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US11783195B2 (en) | 2019-03-27 | 2023-10-10 | Cognizant Technology Solutions U.S. Corporation | Process and system including an optimization engine with evolutionary surrogate-assisted prescriptions |
US11182457B2 (en) | 2019-03-28 | 2021-11-23 | International Business Machines Corporation | Matrix-factorization based gradient compression |
US11011156B2 (en) * | 2019-04-11 | 2021-05-18 | International Business Machines Corporation | Training data modification for training model |
CN109887497B (zh) * | 2019-04-12 | 2021-01-29 | 北京百度网讯科技有限公司 | Modeling method, apparatus, and device for speech recognition |
CN110033760B (zh) | 2019-04-15 | 2021-01-29 | 北京百度网讯科技有限公司 | Modeling method, apparatus, and device for speech recognition |
US11676006B2 (en) | 2019-04-16 | 2023-06-13 | Microsoft Technology Licensing, Llc | Universal acoustic modeling using neural mixture models |
CN113841195B (zh) * | 2019-04-16 | 2023-12-22 | 谷歌有限责任公司 | Joint endpointing and automatic speech recognition |
US10997967B2 (en) | 2019-04-18 | 2021-05-04 | Honeywell International Inc. | Methods and systems for cockpit speech recognition acoustic model training with multi-level corpus data augmentation |
US11468879B2 (en) * | 2019-04-29 | 2022-10-11 | Tencent America LLC | Duration informed attention network for text-to-speech analysis |
US20200349425A1 (en) * | 2019-04-30 | 2020-11-05 | Fujitsu Limited | Training time reduction in automatic data augmentation |
CN113811946B (zh) * | 2019-05-03 | 2024-07-16 | 谷歌有限责任公司 | End-to-end automatic speech recognition of digit sequences |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
KR20210150497A (ko) * | 2019-05-06 | 2021-12-10 | 구글 엘엘씨 | Contextual biasing for speech recognition |
CN110211565B (zh) * | 2019-05-06 | 2023-04-04 | 平安科技(深圳)有限公司 | Dialect recognition method, apparatus, and computer-readable storage medium |
DK201970509A1 (en) | 2019-05-06 | 2021-01-15 | Apple Inc | Spoken notifications |
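CN113795850A (zh) * | 2019-05-07 | 2021-12-14 | 映佳控制公司 | Method and system for initializing a neural network |
KR102460676B1 (ko) | 2019-05-07 | 2022-10-31 | 한국전자통신연구원 | Speech processing apparatus and method using a densely connected hybrid neural network |
CN110222578B (zh) * | 2019-05-08 | 2022-12-27 | 腾讯科技(深圳)有限公司 | Method and apparatus for adversarial testing of an image captioning system |
CN110085249B (zh) * | 2019-05-09 | 2021-03-16 | 南京工程学院 | Single-channel speech enhancement method using an attention-gated recurrent neural network |
JP7229847B2 (ja) * | 2019-05-13 | 2023-02-28 | 株式会社日立製作所 | Dialogue device, dialogue method, and dialogue computer program |
CN111832699A (zh) * | 2019-05-13 | 2020-10-27 | 谷歌有限责任公司 | Computationally efficient expressive output layers for neural networks |
KR20220007160A (ko) * | 2019-05-28 | 2022-01-18 | 구글 엘엘씨 | Large-scale multilingual speech recognition with a streaming end-to-end model |
CN112017676B (zh) * | 2019-05-31 | 2024-07-16 | 京东科技控股股份有限公司 | Audio processing method, apparatus, and computer-readable storage medium |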
US11289073B2 (en) * | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11468890B2 (en) | 2019-06-01 | 2022-10-11 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US10716089B1 (en) * | 2019-06-03 | 2020-07-14 | Mapsted Corp. | Deployment of trained neural network based RSS fingerprint dataset |
JP7566789B2 (ja) * | 2019-06-04 | 2024-10-15 | グーグル エルエルシー | Two-pass end-to-end speech recognition |
CN110189766B (zh) * | 2019-06-14 | 2021-04-06 | 西南科技大学 | Neural network based speech style transfer method |
WO2020256838A1 (en) | 2019-06-19 | 2020-12-24 | Google Llc | Contextual biasing for speech recognition |
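CN110299132B (zh) * | 2019-06-26 | 2021-11-02 | 京东数字科技控股有限公司 | Spoken digit recognition method and apparatus |
CN110288682B (zh) | 2019-06-28 | 2023-09-26 | 北京百度网讯科技有限公司 | Method and apparatus for controlling mouth shape changes of a three-dimensional virtual portrait |
KR20210008788A (ko) | 2019-07-15 | 2021-01-25 | 삼성전자주식회사 | Electronic apparatus and control method thereof |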
WO2021010562A1 (en) | 2019-07-15 | 2021-01-21 | Samsung Electronics Co., Ltd. | Electronic apparatus and controlling method thereof |
US11244673B2 (en) * | 2019-07-19 | 2022-02-08 | Microsoft Technology Licensing, LLC | Streaming contextual unidirectional models |
KR102824645B1 (ko) | 2019-07-31 | 2025-06-24 | 삼성전자주식회사 | Decoding method and apparatus in an artificial neural network for speech recognition |
CN110473554B (zh) * | 2019-08-08 | 2022-01-25 | Oppo广东移动通信有限公司 | Audio verification method, apparatus, storage medium, and electronic device |
US11532310B2 (en) | 2019-08-13 | 2022-12-20 | Samsung Electronics Co., Ltd. | System and method for recognizing user's speech |
WO2021029643A1 (en) | 2019-08-13 | 2021-02-18 | Samsung Electronics Co., Ltd. | System and method for modifying speech recognition result |
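CN114223029A (zh) | 2019-08-13 | 2022-03-22 | 三星电子株式会社 | Server that supports speech recognition of a device, and operation method of the server |
CN110459209B (zh) * | 2019-08-20 | 2021-05-28 | 深圳追一科技有限公司 | Speech recognition method, apparatus, device, and storage medium |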
US11151979B2 (en) | 2019-08-23 | 2021-10-19 | Tencent America LLC | Duration informed attention network (DURIAN) for audio-visual synthesis |
US11158303B2 (en) * | 2019-08-27 | 2021-10-26 | International Business Machines Corporation | Soft-forgetting for connectionist temporal classification based automatic speech recognition |
US11551675B2 (en) | 2019-09-03 | 2023-01-10 | Samsung Electronics Co., Ltd. | Electronic device and method for controlling the electronic device thereof |
CN110459208B (zh) * | 2019-09-09 | 2022-01-11 | 中科极限元(杭州)智能科技股份有限公司 | Sequence-to-sequence speech recognition model training method based on knowledge transfer |
CN110600020B (zh) * | 2019-09-12 | 2022-05-17 | 上海依图信息技术有限公司 | Gradient transmission method and apparatus |
US11302309B2 (en) * | 2019-09-13 | 2022-04-12 | International Business Machines Corporation | Aligning spike timing of models for machine learning |
CN110807365B (zh) * | 2019-09-29 | 2022-02-11 | 浙江大学 | Underwater target recognition method based on fusion of GRU and one-dimensional CNN neural networks |
CN112738634B (zh) * | 2019-10-14 | 2022-08-02 | 北京字节跳动网络技术有限公司 | Video file generation method, apparatus, terminal, and storage medium |
US11681911B2 (en) * | 2019-10-15 | 2023-06-20 | Naver Corporation | Method and system for training neural sequence-to-sequence models by incorporating global features |
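CN110704197B (zh) | 2019-10-17 | 2022-12-09 | 北京小米移动软件有限公司 | Method, apparatus, and medium for handling memory access overhead |
CN110875035A (zh) * | 2019-10-24 | 2020-03-10 | 广州多益网络股份有限公司 | Novel multi-task joint speech recognition training architecture and method |
KR102203786B1 (ko) * | 2019-11-14 | 2021-01-15 | 오로라월드 주식회사 | Method and system for providing an interaction service using a smart toy |
CN110930979B (zh) * | 2019-11-29 | 2020-10-30 | 百度在线网络技术(北京)有限公司 | Speech recognition model training method, apparatus, and electronic device |
CN111312228A (zh) * | 2019-12-09 | 2020-06-19 | 中国南方电网有限责任公司 | End-to-end voice navigation method for electric power company customer service |
CN111048082B (zh) * | 2019-12-12 | 2022-09-06 | 中国电子科技集团公司第二十八研究所 | Improved end-to-end speech recognition method |
CN113077785B (zh) * | 2019-12-17 | 2022-07-12 | 中国科学院声学研究所 | End-to-end method and system for recognizing speech content in multilingual continuous speech streams |
CN111079945B (zh) * | 2019-12-18 | 2021-02-05 | 北京百度网讯科技有限公司 | Training method and apparatus for an end-to-end model |
CN111145729B (zh) * | 2019-12-23 | 2022-10-28 | 厦门快商通科技股份有限公司 | Speech recognition model training method, system, mobile terminal, and storage medium |
CN111063336A (zh) * | 2019-12-30 | 2020-04-24 | 天津中科智能识别产业技术研究院有限公司 | End-to-end speech recognition system based on deep learning |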
US11183178B2 (en) | 2020-01-13 | 2021-11-23 | Microsoft Technology Licensing, Llc | Adaptive batching to reduce recognition latency |
CN111382581B (zh) * | 2020-01-21 | 2023-05-19 | 沈阳雅译网络技术有限公司 | One-shot pruning compression method in machine translation |
EP4361897A3 (en) * | 2020-01-28 | 2024-07-17 | Google Llc | Language-agnostic multilingual modeling using effective script normalization |
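CN111292727B (zh) * | 2020-02-03 | 2023-03-24 | 北京声智科技有限公司 | Speech recognition method and electronic device |
CN111428750A (zh) * | 2020-02-20 | 2020-07-17 | 商汤国际私人有限公司 | Text recognition model training and text recognition method, apparatus, and medium |
CN111210807B (zh) * | 2020-02-21 | 2023-03-31 | 厦门快商通科技股份有限公司 | Speech recognition model training method, system, mobile terminal, and storage medium |
CN111397870B (zh) * | 2020-03-08 | 2021-05-14 | 中国地质大学(武汉) | Mechanical fault prediction method based on a diversified ensemble convolutional neural network |
CN111246026A (zh) * | 2020-03-11 | 2020-06-05 | 兰州飞天网景信息产业有限公司 | Recording processing method based on convolutional neural networks and connectionist temporal classification |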
US11747902B2 (en) | 2020-03-11 | 2023-09-05 | Apple Inc. | Machine learning configurations modeled using contextual categorical labels for biosignals |
CN111415667B (zh) * | 2020-03-25 | 2024-04-23 | 中科极限元(杭州)智能科技股份有限公司 | Streaming end-to-end speech recognition model training and decoding method |
US12217156B2 (en) * | 2020-04-01 | 2025-02-04 | Sony Group Corporation | Computing temporal convolution networks in real time |
US12136411B2 (en) | 2020-04-03 | 2024-11-05 | International Business Machines Corporation | Training of model for processing sequence data |
US12099934B2 (en) * | 2020-04-07 | 2024-09-24 | Cognizant Technology Solutions U.S. Corporation | Framework for interactive exploration, evaluation, and improvement of AI-generated solutions |
US12020693B2 (en) | 2020-04-29 | 2024-06-25 | Samsung Electronics Co., Ltd. | System and method for out-of-vocabulary phrase support in automatic speech recognition |
US11061543B1 (en) | 2020-05-11 | 2021-07-13 | Apple Inc. | Providing relevant data items based on context |
US12301635B2 (en) | 2020-05-11 | 2025-05-13 | Apple Inc. | Digital assistant hardware abstraction |
US11796794B2 (en) | 2020-05-12 | 2023-10-24 | The Board Of Trustees Of The Leland Stanford Junior University | Multi-objective, robust constraints enforced global topology optimizer for optical devices |
US20210358490A1 (en) * | 2020-05-18 | 2021-11-18 | Nvidia Corporation | End of speech detection using one or more neural networks |
CN111798828B (zh) * | 2020-05-29 | 2023-02-14 | 厦门快商通科技股份有限公司 | Synthetic audio detection method, system, mobile terminal, and storage medium |
US11775841B2 (en) | 2020-06-15 | 2023-10-03 | Cognizant Technology Solutions U.S. Corporation | Process and system including explainable prescriptions through surrogate-assisted evolution |
US11646009B1 (en) * | 2020-06-16 | 2023-05-09 | Amazon Technologies, Inc. | Autonomously motile device with noise suppression |
US11490204B2 (en) | 2020-07-20 | 2022-11-01 | Apple Inc. | Multi-device audio adjustment coordination |
US11438683B2 (en) | 2020-07-21 | 2022-09-06 | Apple Inc. | User identification using headphones |
US11875797B2 (en) * | 2020-07-23 | 2024-01-16 | Pozotron Inc. | Systems and methods for scripted audio production |
CN111816169B (zh) * | 2020-07-23 | 2022-05-13 | 思必驰科技股份有限公司 | Method and apparatus for training a mixed Chinese-English speech recognition model |
KR102462932B1 (ko) * | 2020-08-03 | 2022-11-04 | 주식회사 딥브레인에이아이 | Text preprocessing apparatus and method |
US11488604B2 (en) | 2020-08-19 | 2022-11-01 | Sorenson Ip Holdings, Llc | Transcription of audio |
KR102409873B1 (ko) * | 2020-09-02 | 2022-06-16 | 네이버 주식회사 | Method and system for training a speech recognition model using augmented consistency regularization |
US20220101112A1 (en) * | 2020-09-25 | 2022-03-31 | Nvidia Corporation | Neural network training using robust temporal ensembling |
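CN112188004B (zh) * | 2020-09-28 | 2022-04-05 | 精灵科技有限公司 | Machine learning based faulty call detection system and control method thereof |
CN112233655B (zh) * | 2020-09-28 | 2024-07-16 | 上海声瀚信息科技有限公司 | Neural network training method for improving voice command word recognition performance |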
US11380307B2 (en) * | 2020-09-30 | 2022-07-05 | Tencent America LLC | All deep learning minimum variance distortionless response beamformer for speech separation and enhancement |
US11798534B2 (en) * | 2020-10-02 | 2023-10-24 | Salesforce.Com, Inc. | Systems and methods for a multilingual speech recognition framework |
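CN116250038A (zh) * | 2020-10-05 | 2023-06-09 | 谷歌有限责任公司 | Transformer transducer: one model unifying streaming and non-streaming speech recognition |
KR102429656B1 (ko) * | 2020-10-08 | 2022-08-08 | 서울대학교산학협력단 | Method and system for extracting speaker embeddings with a speech recognizer based pooling technique for speaker recognition, and recording medium therefor |
CN112259080B (zh) * | 2020-10-20 | 2021-06-22 | 北京讯众通信技术股份有限公司 | Speech recognition method based on a neural network model |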
US12093802B2 (en) | 2020-10-20 | 2024-09-17 | International Business Machines Corporation | Gated unit for a gated recurrent neural network |
US11593560B2 (en) * | 2020-10-21 | 2023-02-28 | Beijing Wodong Tianjun Information Technology Co., Ltd. | System and method for relation extraction with adaptive thresholding and localized context pooling |
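CN112466282B (zh) * | 2020-10-22 | 2023-11-28 | 北京仿真中心 | Speech recognition system and method for the aerospace domain |
CN112420024B (zh) * | 2020-10-23 | 2022-09-09 | 四川大学 | Fully end-to-end method and apparatus for mixed Chinese-English air traffic control speech recognition |
CN112329836B (zh) * | 2020-11-02 | 2024-12-27 | 成都网安科技发展有限公司 | Deep learning based text classification method, apparatus, server, and storage medium |
CN112614484B (zh) | 2020-11-23 | 2022-05-20 | 北京百度网讯科技有限公司 | Feature information mining method, apparatus, and electronic device |
CN112669852B (zh) * | 2020-12-15 | 2023-01-31 | 北京百度网讯科技有限公司 | Memory allocation method, apparatus, and electronic device |
CN112786017B (zh) * | 2020-12-25 | 2024-04-09 | 北京猿力未来科技有限公司 | Training method and apparatus for a speech rate detection model, and speech rate detection method and apparatus |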
US11790906B2 (en) * | 2021-01-25 | 2023-10-17 | Google Llc | Resolving unique personal identifiers during corresponding conversations between a voice bot and a human |
US11817117B2 (en) * | 2021-01-29 | 2023-11-14 | Nvidia Corporation | Speaker adaptive end of speech detection for conversational AI applications |
CA3207420A1 (en) * | 2021-02-04 | 2022-08-11 | Andrew Brock | Neural networks with adaptive gradient clipping |
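CN115188389B (zh) * | 2021-04-06 | 2024-04-05 | 京东科技控股股份有限公司 | Neural network based end-to-end speech enhancement method and apparatus |
CN113421574B (zh) * | 2021-06-18 | 2024-05-24 | 腾讯音乐娱乐科技(深圳)有限公司 | Training method for an audio feature extraction model, audio recognition method, and related device |
CN113535510B (zh) * | 2021-06-24 | 2024-01-26 | 北京理工大学 | Adaptive sampling model optimization method for data collection in large-scale data centers |
CN113327600B (zh) * | 2021-06-30 | 2024-07-23 | 北京有竹居网络技术有限公司 | Training method, apparatus, and device for a speech recognition model |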
US12112200B2 (en) | 2021-09-13 | 2024-10-08 | International Business Machines Corporation | Pipeline parallel computing using extended memory |
WO2023055410A1 (en) | 2021-09-30 | 2023-04-06 | Google Llc | Contrastive siamese network for semi-supervised speech recognition |
US12347149B2 (en) | 2021-12-13 | 2025-07-01 | Tencent America LLC | System, method, and computer program for content adaptive online training for multiple blocks in neural image compression |
CN114548501B (zh) * | 2022-01-14 | 2024-06-18 | 北京全路通信信号研究设计院集团有限公司 | Balance checking method, system, and device |
CN114842829B (zh) * | 2022-03-29 | 2025-03-28 | 北京理工大学 | Text-driven speech synthesis method that suppresses outliers in speech elements |
US12136413B1 (en) * | 2022-03-31 | 2024-11-05 | Amazon Technologies, Inc. | Domain-specific parameter pre-fixes for tuning automatic speech recognition |
US11978436B2 (en) | 2022-06-03 | 2024-05-07 | Apple Inc. | Application vocabulary integration with a digital assistant |
CN114743554A (zh) * | 2022-06-09 | 2022-07-12 | 武汉工商学院 | Internet of Things based smart home interaction method and apparatus |
KR102547001B1 (ko) * | 2022-06-28 | 2023-06-23 | 주식회사 액션파워 | Error detection method using a top-down approach |
US20240339123A1 (en) * | 2023-04-06 | 2024-10-10 | Samsung Electronics Co., Ltd. | System and method for keyword spotting in noisy environments |
Family Cites Families (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5790754A (en) | 1994-10-21 | 1998-08-04 | Sensory Circuits, Inc. | Speech recognition apparatus for consumer electronic applications |
US5749066A (en) | 1995-04-24 | 1998-05-05 | Ericsson Messaging Systems Inc. | Method and apparatus for developing a neural network for phoneme recognition |
JP2996926B2 (ja) | 1997-03-11 | 2000-01-11 | 株式会社エイ・ティ・アール音声翻訳通信研究所 | Posterior probability calculation device for phoneme symbols and speech recognition device |
US6292772B1 (en) * | 1998-12-01 | 2001-09-18 | Justsystem Corporation | Method for identifying the language of individual words |
AUPQ439299A0 (en) * | 1999-12-01 | 1999-12-23 | Silverbrook Research Pty Ltd | Interface system |
US7035802B1 (en) | 2000-07-31 | 2006-04-25 | Matsushita Electric Industrial Co., Ltd. | Recognition system using lexical trees |
US7219085B2 (en) * | 2003-12-09 | 2007-05-15 | Microsoft Corporation | System and method for accelerating and optimizing the processing of machine learning techniques using a graphics processing unit |
US20060031069A1 (en) * | 2004-08-03 | 2006-02-09 | Sony Corporation | System and method for performing a grapheme-to-phoneme conversion |
GB0507036D0 (en) | 2005-04-07 | 2005-05-11 | Ibm | Method and system for language identification |
WO2009027980A1 (en) * | 2007-08-28 | 2009-03-05 | Yissum Research Development Company Of The Hebrew University Of Jerusalem | Method, device and system for speech recognition |
JP4869268B2 (ja) * | 2008-03-04 | 2012-02-08 | 日本放送協会 | Acoustic model learning device and program |
US8332212B2 (en) * | 2008-06-18 | 2012-12-11 | Cogi, Inc. | Method and system for efficient pacing of speech for transcription |
US8781833B2 (en) | 2008-07-17 | 2014-07-15 | Nuance Communications, Inc. | Speech recognition semantic classification training |
US8886531B2 (en) | 2010-01-13 | 2014-11-11 | Rovi Technologies Corporation | Apparatus and method for generating an audio fingerprint and using a two-stage query |
US20130317755A1 (en) * | 2012-05-04 | 2013-11-28 | New York University | Methods, computer-accessible medium, and systems for score-driven whole-genome shotgun sequence assembly |
US10354650B2 (en) * | 2012-06-26 | 2019-07-16 | Google Llc | Recognizing speech with mixed speech recognition models to generate transcriptions |
US8831957B2 (en) * | 2012-08-01 | 2014-09-09 | Google Inc. | Speech recognition models based on location indicia |
CN102760436B (zh) * | 2012-08-09 | 2014-06-11 | 河南省烟草公司开封市公司 | Speech lexicon screening method |
US9177550B2 (en) | 2013-03-06 | 2015-11-03 | Microsoft Technology Licensing, Llc | Conservatively adapting a deep neural network in a recognition system |
US9153231B1 (en) * | 2013-03-15 | 2015-10-06 | Amazon Technologies, Inc. | Adaptive neural network speech recognition models |
US9418650B2 (en) | 2013-09-25 | 2016-08-16 | Verizon Patent And Licensing Inc. | Training speech recognition using captions |
CN103591637B (zh) | 2013-11-19 | 2015-12-02 | 长春工业大学 | Method for regulating operation of a district heating secondary network |
US9189708B2 (en) | 2013-12-31 | 2015-11-17 | Google Inc. | Pruning and label selection in hidden markov model-based OCR |
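CN103870863B (zh) * | 2014-03-14 | 2016-08-31 | 华中科技大学 | Method for preparing a holographic anti-counterfeiting label with a hidden two-dimensional code image, and recognition device therefor |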
US9390712B2 (en) | 2014-03-24 | 2016-07-12 | Microsoft Technology Licensing, Llc. | Mixed speech recognition |
US20150309987A1 (en) | 2014-04-29 | 2015-10-29 | Google Inc. | Classification of Offensive Words |
CN104035751B (zh) | 2014-06-20 | 2016-10-12 | 深圳市腾讯计算机系统有限公司 | Data parallel processing method and apparatus based on multiple graphics processing units |
US10540957B2 (en) * | 2014-12-15 | 2020-01-21 | Baidu Usa Llc | Systems and methods for speech transcription |
US10733979B2 (en) * | 2015-10-09 | 2020-08-04 | Google Llc | Latency constraints for acoustic modeling |
-
2016
- 2016-11-21 US US15/358,102 patent/US10332509B2/en active Active
- 2016-11-21 US US15/358,083 patent/US10319374B2/en active Active
- 2016-11-23 CN CN201680010873.8A patent/CN107408111B/zh active Active
- 2016-11-23 CN CN201680010871.9A patent/CN107408384B/zh active Active
- 2016-11-23 JP JP2017544340A patent/JP6661654B2/ja active Active
- 2016-11-23 KR KR1020177023177A patent/KR102033230B1/ko active Active
- 2016-11-23 KR KR1020177023173A patent/KR102008077B1/ko active Active
- 2016-11-23 EP EP16869294.5A patent/EP3245652B1/en active Active
- 2016-11-23 WO PCT/US2016/063641 patent/WO2017091751A1/en active Application Filing
- 2016-11-23 EP EP16869302.6A patent/EP3245597B1/en active Active
- 2016-11-23 WO PCT/US2016/063661 patent/WO2017091763A1/en active Application Filing
- 2016-11-23 JP JP2017544352A patent/JP6629872B2/ja active Active
Non-Patent Citations (3)
Title |
---|
Awni Hannun et al., ‘Deep speech: Scaling up end-to-end speech recognition’, Cornell University Library, pp. 1~12, December 2014.* * |
Sergey Ioffe et al., ‘Batch normalization: Accelerating deep network training by reducing internal covariate shift’, Cornell University Library, pp.1~11, March 2015.* * |
Tara N. Sainath et al., ‘Convolutional long short-term memory, fully connected deep neural networks’, ICASSP 2015, pp.4580~4584, April 2015.* * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
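KR20180084580A (ko) * | 2017-01-17 | 2018-07-25 | 경북대학교 산학협력단 | Apparatus and method for generating an abstractive summary of multi-paragraph text, and recording medium for performing the method |
KR20190085755A (ko) * | 2018-01-11 | 2019-07-19 | 중앙대학교 산학협력단 | Deep learning method using BCSC and apparatus therefor |
KR20200095789A (ko) * | 2019-02-01 | 2020-08-11 | 한국전자통신연구원 | Method and apparatus for building a translation model |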
Also Published As
Publication number | Publication date |
---|---|
KR102033230B1 (ko) | 2019-10-16 |
US20170148433A1 (en) | 2017-05-25 |
US10332509B2 (en) | 2019-06-25 |
WO2017091763A1 (en) | 2017-06-01 |
EP3245597B1 (en) | 2020-08-26 |
EP3245652A4 (en) | 2018-05-30 |
KR102008077B1 (ko) | 2019-08-06 |
JP2018513399A (ja) | 2018-05-24 |
EP3245597A1 (en) | 2017-11-22 |
EP3245597A4 (en) | 2018-05-30 |
WO2017091751A1 (en) | 2017-06-01 |
US10319374B2 (en) | 2019-06-11 |
JP6629872B2 (ja) | 2020-01-15 |
US20170148431A1 (en) | 2017-05-25 |
CN107408111B (zh) | 2021-03-30 |
EP3245652B1 (en) | 2019-07-10 |
CN107408384B (zh) | 2020-11-27 |
CN107408111A (zh) | 2017-11-28 |
EP3245652A1 (en) | 2017-11-22 |
KR20170106445A (ko) | 2017-09-20 |
JP6661654B2 (ja) | 2020-03-11 |
JP2018513398A (ja) | 2018-05-24 |
CN107408384A (zh) | 2017-11-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102033230B1 (ko) | End-to-end speech recognition | |
US11620986B2 (en) | Cold fusing sequence-to-sequence models with language models | |
Sundermeyer et al. | Comparison of feedforward and recurrent neural network language models | |
CN107077842B (zh) | Systems and methods for speech transcription | |
US10949736B2 (en) | Flexible neural network accelerator and methods therefor | |
Huang et al. | SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition | |
KR101970041B1 (ko) | Hybrid GPU/CPU data processing method | |
Abdelhamid et al. | End-to-end arabic speech recognition: A review | |
Price | Energy-scalable speech recognition circuits | |
Tarján et al. | Investigation on N-gram approximated RNNLMs for recognition of morphologically rich speech | |
CN119895437A (zh) | Method and apparatus for compressing a generative pre-trained language model via quantization | |
US20220319501A1 (en) | Stochastic future context for speech processing | |
Buthpitiya et al. | A parallel implementation of viterbi training for acoustic models using graphics processing units | |
You et al. | Memory access optimized VLSI for 5000-word continuous speech recognition | |
Chetupalli et al. | Context dependent RNNLM for automatic transcription of conversations | |
Liu et al. | Speech recognition systems on the Cell Broadband Engine processor | |
Dua et al. | Cepstral and acoustic ternary pattern based hybrid feature extraction approach for end-to-end bangla speech recognition | |
Karkada et al. | Training Speech Recognition Models on HPC Infrastructure | |
Zhang | Research on Modeling of Speech Recognition Based on Deep Learning | |
Sung et al. | Exploration of on-device end-to-end acoustic modeling with neural networks | |
Pinto Rivero | Acceleration of automatic speech recognition for low-power devices | |
Chen | Cued rnnlm toolkit | |
Liu et al. | Cross Languages One-Versus-All Speech |
Legal Events
Code | Title | Description |
---|---|---|
PA0105 | International application | Patent event date: 20170818; Patent event code: PA01051R01D; Comment text: International Patent Application |
A201 | Request for examination | |
PA0201 | Request for examination | Patent event code: PA02012R01D; Patent event date: 20170821; Comment text: Request for Examination of Application |
PG1501 | Laying open of application | |
E902 | Notification of reason for refusal | |
PE0902 | Notice of grounds for rejection | Comment text: Notification of reason for refusal; Patent event date: 20181129; Patent event code: PE09021S01D |
E902 | Notification of reason for refusal | |
PE0902 | Notice of grounds for rejection | Comment text: Notification of reason for refusal; Patent event date: 20190723; Patent event code: PE09021S01D |
E701 | Decision to grant or registration of patent right | |
PE0701 | Decision of registration | Patent event code: PE07011S01D; Comment text: Decision to Grant Registration; Patent event date: 20190919 |
GRNT | Written decision to grant | |
PR0701 | Registration of establishment | Comment text: Registration of Establishment; Patent event date: 20191010; Patent event code: PR07011E01D |
PR1002 | Payment of registration fee | Payment date: 20191010; End annual number: 3; Start annual number: 1 |
PG1601 | Publication of registration | |
PR1001 | Payment of annual fee | Payment date: 20220916; Start annual number: 4; End annual number: 4 |
PR1001 | Payment of annual fee | Payment date: 20230918; Start annual number: 5; End annual number: 5 |
PR1001 | Payment of annual fee | Payment date: 20240930; Start annual number: 6; End annual number: 6 |