KR20130068196A - Apparatus and method for clustering using confusion matrix of voice recognition error - Google Patents
- Publication number: KR20130068196A
- Application number: KR1020110134836A
- Authority
- KR
- South Korea
- Prior art keywords
- error
- acoustic model
- clustering
- matrix
- speech recognition
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
Abstract
The present invention relates to a clustering apparatus using a speech recognition error confusion matrix, comprising: an acoustic model generator for receiving training data and generating an acoustic model; a speech recognition unit for performing speech recognition based on the generated acoustic model and the received test and user voice data; an error confusion matrix constructing unit configured to form a confusion matrix from error pairs extracted by comparing the speech recognition result with transcription data; a high frequency error pair extractor which extracts error pairs having a high occurrence frequency from the confusion matrix; and a state clustering unit configured to state-cluster the acoustic model based on the result of the high frequency error pair extractor.
Description
The present invention relates to a clustering apparatus and method using a speech recognition error confusion matrix, and more particularly, to an apparatus and method that improve the discriminative power of an acoustic model by extracting error pairs that occur frequently in recognition results and clustering the acoustic model on that basis.
Speech recognition systems are based on the correlation between speech and its characterization in an acoustic space, and this characterization is typically obtained from training data. Training data may be obtained from multiple speakers to construct a speaker-independent system, or from a single speaker to construct a speaker-dependent system.
The speaker-independent system averages statistics over several speakers, so its recognition performance for any particular speaker is relatively poor.
On the other hand, the speaker-dependent system achieves better recognition performance for its specific speaker than the speaker-independent system, but has the disadvantage that a large amount of training data must be obtained from the speaker using the system.
Recognizing speech regardless of the speaker can be said to be the ultimate goal of speech recognition, and speaker adaptation is one way to approach this goal. A speaker adaptation system is an intermediate form between the speaker-independent and speaker-dependent systems: it provides speaker-independent operation by default but improves performance for a specific speaker through a prepared adaptation process. That is, a speaker-independent system is first built from training data produced by several speakers, and a system adapted to a new speaker is then built using a small amount of training data from the newly registered speaker.
In general, the speaker adaptation system composes classes using a phonological knowledge base and the clustering characteristics of the acoustic model space. However, this approach assumes that phonemes with similar articulation are located in similar regions of the acoustic model space, an assumption with no supporting mathematical or logical basis. In addition, there is a problem when the clustering of the models differs before and after speaker adaptation.
That is, when clustering uses only the distribution of each model in the acoustic model space of the speaker-independent model before adaptation, models belonging to one cluster may move to another cluster after speaker adaptation. As a result, these shifted models are adapted with an incorrect transformation matrix.
The present invention has been devised to solve the above problems. An object of the present invention is to provide a clustering apparatus and method using a speech recognition error confusion matrix that can improve the discriminative power of an acoustic model by extracting error pairs that occur frequently in recognition results and clustering the acoustic model on that basis.
Another object of the present invention is to provide a clustering apparatus and method using a speech recognition error confusion matrix that can maximize speech recognition performance by improving discrimination between high frequency error pairs.
In order to achieve the above objects, according to an embodiment of the present invention, a clustering apparatus using a speech recognition error confusion matrix includes: an acoustic model generator for generating an acoustic model from received training voice data; a speech recognition unit for performing speech recognition based on the generated acoustic model and the received test and user voice data, and outputting the result; an error confusion matrix constructing unit configured to form a confusion matrix from error pairs extracted by comparing the speech recognition result with transcription data; a high frequency error pair extractor which extracts error pairs having a high occurrence frequency from the confusion matrix; and a state clustering unit configured to state-cluster the acoustic model based on the result of the high frequency error pair extractor.
According to the present invention having the above configuration, the clustering apparatus and method using the speech recognition error confusion matrix extract error pairs that occur frequently in speech recognition and cluster the acoustic model on that basis, thereby improving the discriminative power and reliability of the acoustic model.
Therefore, the present invention has the effect of maximizing speech recognition performance by improving discrimination between high frequency error pairs.
FIG. 1 is a schematic diagram illustrating a clustering apparatus using a speech recognition error confusion matrix according to an embodiment of the present invention.
FIG. 2 is a schematic diagram illustrating another configuration of a clustering apparatus using a speech recognition error confusion matrix according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a clustering method using a speech recognition error confusion matrix according to an embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings, so that those skilled in the art may easily carry out the technical idea of the present invention. In adding reference numerals to the elements of the drawings, the same elements are denoted by the same reference numerals wherever possible, even when they appear in different drawings. In the following description, detailed descriptions of known functions and configurations are omitted where they would obscure the subject matter of the present invention.
Hereinafter, a clustering apparatus using a speech recognition error confusion matrix and a method thereof according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 is a schematic diagram illustrating a clustering apparatus using a speech recognition error confusion matrix, and FIG. 2 is a schematic diagram illustrating another configuration of the clustering apparatus using a speech recognition error confusion matrix.
Referring to FIGS. 1 and 2, the clustering apparatus using the speech recognition error confusion matrix is largely composed of an acoustic model generator, a speech recognition unit, an error confusion matrix constructing unit, a high frequency error pair extractor, and a state clustering unit.
The acoustic model generator receives training voice data and generates an acoustic model.
The speech recognition unit performs speech recognition based on the generated acoustic model and the received test and user voice data, and outputs the result.
That is, the speech recognition unit receives the generated acoustic model together with a pronunciation dictionary, a language model, and the input voice data, performs speech recognition, and outputs the result.
The error confusion matrix constructing unit forms a confusion matrix from error pairs extracted by comparing the speech recognition result with the transcription data.
Here, the confusion matrix is a matrix representing the relationship between actual and predicted classes; each entry records how a stimulus or recognition target was answered. Since the confusion matrix is a tool commonly used in analysis for efficient control and management, it is not described in detail in the present invention.
More specifically, the error confusion matrix constructing unit lists the speech recognition results in triphone units.
The error confusion matrix constructing unit then extracts error pairs from the listed words, and the extracted error pairs are arranged in the form of a confusion matrix. At this time, the confusion matrix is composed only of misrecognized results, that is, of errors in the recognition output.
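By way of illustration (the following sketch is not part of the patent text), the construction step described above can be expressed as counting each misrecognized (reference, hypothesis) pair; the triphone labels used here are invented placeholders.

```python
from collections import Counter

def build_confusion_matrix(error_pairs):
    """Count how often each (reference, hypothesis) pair occurs.

    Correctly recognized pairs are skipped, matching the note that
    the matrix is built solely from erroneous results.
    """
    matrix = Counter()
    for ref, hyp in error_pairs:
        if ref != hyp:  # keep errors only
            matrix[(ref, hyp)] += 1
    return matrix

# Hypothetical triphone-level pairs gathered from several utterances.
pairs = [("s-a+n", "s-e+n"), ("s-a+n", "s-e+n"), ("k-o+m", "g-o+m"),
         ("s-a+n", "s-a+n")]  # last pair is correct and is ignored
cm = build_confusion_matrix(pairs)
print(cm[("s-a+n", "s-e+n")])  # → 2
```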
The high frequency error pair extractor extracts error pairs having a high occurrence frequency from the confusion matrix.
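The extraction step can be sketched as selecting pairs whose count exceeds a threshold; the threshold value below is an illustrative assumption, as the patent only states that pairs with high frequency are selected.

```python
from collections import Counter

def extract_high_frequency_pairs(confusion_matrix, min_count=2):
    """Return error pairs whose count meets a frequency threshold,
    ordered from most to least frequent."""
    return [pair for pair, count in confusion_matrix.most_common()
            if count >= min_count]

# Hypothetical confusion matrix with one dominant error pair.
cm = Counter({("s-a+n", "s-e+n"): 5, ("k-o+m", "g-o+m"): 1})
print(extract_high_frequency_pairs(cm))  # → [('s-a+n', 's-e+n')]
```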
The state clustering unit state-clusters the acoustic model based on the extraction result of the high frequency error pair extractor, thereby improving the discriminative power of the acoustic model.
Hereinafter, a clustering method using a speech recognition error confusion matrix will be described in detail with reference to FIG. 3. FIG. 3 is a flowchart illustrating the clustering method using a speech recognition error confusion matrix.
Referring to FIG. 3, the clustering apparatus using the speech recognition error confusion matrix first receives training voice data and generates an acoustic model (S301).
Next, speech recognition is performed based on the generated acoustic model together with a pronunciation dictionary, a language model, and test data received from the outside, and the result is output (S302).
Next, a confusion matrix is formed from error pairs obtained by comparing the speech recognition result with the transcription data (S303). That is, the speech recognition result is listed in triphone units, error pairs are extracted from the listed words, and the extracted error pairs form a confusion matrix.
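The comparison of the recognition result with the transcription in step S303 can be sketched with a simple sequence alignment; difflib stands in for a phone-level edit-distance alignment here, and the triphone labels are invented placeholders.

```python
from difflib import SequenceMatcher

def error_pairs(reference, hypothesis):
    """Align the transcription and the recognition result, and return
    the substituted (reference, hypothesis) unit pairs."""
    pairs = []
    matcher = SequenceMatcher(a=reference, b=hypothesis, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Collect one-to-one substitutions as error pairs.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            pairs.extend(zip(reference[i1:i2], hypothesis[j1:j2]))
    return pairs

ref = ["s-a+n", "a-n+k", "n-k+o"]  # transcription, in triphone units
hyp = ["s-a+n", "a-m+k", "n-k+o"]  # recognition result
print(error_pairs(ref, hyp))  # → [('a-n+k', 'a-m+k')]
```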
Next, error pairs having a high extraction frequency are extracted from the confusion matrix (S304).
Next, the acoustic model is state-clustered based on the high frequency error pair extraction result (S305).
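The patent does not specify the clustering algorithm of step S305. As a toy sketch under an assumed greedy scheme, one way to use the extracted pairs is to forbid the members of a high frequency error pair from sharing a cluster, so that their models remain separately adaptable; the `similar` predicate and all state labels below are hypothetical.

```python
def cluster_states(states, similar, forbidden_pairs):
    """Greedy clustering that never merges states of a confusable pair.

    `similar` is a hypothetical acoustic-similarity predicate, and
    `forbidden_pairs` holds the high frequency error pairs, which are
    kept in distinct clusters to preserve discrimination.
    """
    forbidden = {frozenset(p) for p in forbidden_pairs}
    clusters = []
    for state in states:
        placed = False
        for cluster in clusters:
            if all(similar(state, member) and
                   frozenset((state, member)) not in forbidden
                   for member in cluster):
                cluster.append(state)
                placed = True
                break
        if not placed:
            clusters.append([state])
    return clusters

# 'a' and 'e' are acoustically close but frequently confused, so they
# end up in separate clusters despite their similarity.
sim = lambda x, y: {x, y} <= {"a", "e"} or x == y
out = cluster_states(["a", "e", "o"], sim, [("a", "e")])
print(out)  # → [['a'], ['e'], ['o']]
```

Without the forbidden pair, the same call would merge "a" and "e" into one cluster, which is exactly the loss of discrimination the extraction step is meant to prevent.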
As described above, the clustering apparatus and method using the speech recognition error confusion matrix of the present invention can improve the discriminative power and reliability of the acoustic model by extracting error pairs that occur frequently in recognition results and clustering the acoustic model on that basis. Therefore, the present invention can maximize speech recognition performance by improving discrimination between high frequency error pairs.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those skilled in the art that many alternatives, modifications, and variations may be made without departing from the scope of the appended claims.
100:
110: speech recognition unit 120: error confusion matrix constructing unit
130: high frequency error pair extractor
Claims (1)
A speech recognition unit for performing speech recognition based on a generated acoustic model and received test and user voice data;
An error confusion matrix constructing unit configured to form a confusion matrix from error pairs extracted by comparing the speech recognition result with transcription data;
A high frequency error pair extractor which extracts error pairs having a high occurrence frequency from the confusion matrix; And
A state clustering unit configured to state-cluster the acoustic model based on a result of the high frequency error pair extractor;
A clustering apparatus using a speech recognition error confusion matrix, comprising the foregoing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020110134836A KR20130068196A (en) | 2011-12-14 | 2011-12-14 | Apparatus and method for clustering using confusion matrix of voice recognition error |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020110134836A KR20130068196A (en) | 2011-12-14 | 2011-12-14 | Apparatus and method for clustering using confusion matrix of voice recognition error |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20130068196A true KR20130068196A (en) | 2013-06-26 |
Family
ID=48863870
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020110134836A KR20130068196A (en) | 2011-12-14 | 2011-12-14 | Apparatus and method for clustering using confusion matrix of voice recognition error |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20130068196A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101579544B1 (en) * | 2014-09-04 | 2015-12-23 | 에스케이 텔레콤주식회사 | Apparatus and Method for Calculating Similarity of Natural Language |
-
2011
- 2011-12-14 KR KR1020110134836A patent/KR20130068196A/en not_active Application Discontinuation
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WITN | Withdrawal due to no request for examination |