KR20150054445A - Sound recognition device - Google Patents
Sound recognition device
- Publication number
- KR20150054445A (application KR1020130136890A)
- Authority
- KR
- South Korea
- Prior art keywords
- voice
- user
- ambiguity
- speech
- speech recognition
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
Abstract
Description
More particularly, the present invention relates to a speech recognition apparatus that can improve recognition performance for user speech in settings such as presentation/conference recording, call center recording, and medical/legal services.
Generally, a speech recognition apparatus uses a Hidden Markov Model (HMM) as its speech recognition method. A Viterbi search is performed during recognition: the most likely candidate word is determined by comparing the features of the currently input speech against HMMs constructed in advance by training on the candidate words to be recognized.
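The Viterbi search mentioned above can be illustrated with a minimal sketch. This is not the patent's implementation — it is a generic log-domain Viterbi decoder over a small HMM, with all model parameters (`pi`, `A`, `B`) being illustrative assumptions.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state path for an observation sequence.

    obs: observation indices; pi: initial log-probabilities;
    A: state-transition log-probabilities; B: emission log-probabilities.
    """
    T, N = len(obs), len(pi)
    delta = np.full((T, N), -np.inf)   # best log-prob of a path ending in each state
    psi = np.zeros((T, N), dtype=int)  # back-pointers
    delta[0] = pi + B[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] + A[:, j]
            psi[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[psi[t, j]] + B[j, obs[t]]
    # backtrace the best path from the best final state
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(np.max(delta[-1]))
```

In a speech recognizer the states would correspond to phoneme sub-states and the emissions to acoustic feature frames; here both are reduced to toy discrete symbols.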
The HMM models speech at the level of basic phoneme units. In other words, most Korean speech recognition engine vendors recognize words and sentences by matching the phonemes extracted from the input speech against the phoneme models held in the engine's databases.
An HMM is a doubly stochastic model that estimates an unobservable process through another, observable process. Using an HMM in speech recognition therefore means modeling the minimum phoneme unit of speech and constructing the speech recognition apparatus from those models.
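The idea of assembling words from recognized phoneme sequences can be sketched with a pronunciation lexicon lookup. The mini-lexicon below is a hypothetical example, not data from the patent; real engines use large dictionaries and probabilistic matching rather than exact equality.

```python
# Hypothetical mini-lexicon mapping words to phoneme sequences.
LEXICON = {
    "sound": ["s", "aw", "n", "d"],
    "sun":   ["s", "ah", "n"],
    "son":   ["s", "ah", "n"],
}

def words_for_phonemes(phonemes):
    """Return every word whose pronunciation matches the phoneme list."""
    return sorted(w for w, pron in LEXICON.items() if pron == phonemes)
```

Note that homophones ("sun"/"son") map to the same phoneme string, which is exactly the kind of ambiguity the language-model and context stages must resolve.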
In a speech recognition apparatus, read (scripted) speech and natural speech differ markedly in acoustic space, both within the same speaker and between individual speakers, and natural speech contains disfluencies such as truncation and stuttering. These characteristics appear in most natural-speech interfaces to which dictation technology is applied — presentation/meeting minutes, call center recording, medical/legal services — as opposed to information delivery by trained speakers such as broadcast news announcers. Because acoustic-space variation in natural speech differs greatly by domain and speaker, and even for the same speaker and domain by situation, speaker-adaptation techniques or acoustic models trained on general-purpose data face inherent limits.
It is an object of the present invention to provide a speech recognition apparatus that can improve recognition performance for user speech in settings such as presentation/meeting minutes, call center recording, and medical/legal services.
The speech recognition apparatus according to the embodiment includes: a voice database storing user read-aloud speech data, corresponding to the user's speech uttered from a script designed in consideration of phoneme distribution, and user natural-language speech data accumulated from previously input user speech; an ambiguity application module that compares the user read-aloud speech data with the user natural-language speech data, extracts a context-specific ambiguity weight for the ambiguity of each phoneme, and applies those weights to a preset acoustic model; and a decoding module that, when an uttered user speech is input, performs speech recognition based on the acoustic model to which the context-specific ambiguity weights have been applied, and that, for any phoneme or phoneme interval in the uttered user speech whose ambiguity weight is equal to or greater than a set weight, performs speech recognition based on preset context information and classifier parameters.
The voice database according to the embodiment may include a first database in which the user read-aloud speech data is stored in advance, and a second database in which the user natural-language speech data is stored, classified by situation.
The ambiguity extracting unit may determine the degree of variation in the acoustic space for each phoneme included in the user read-aloud speech data and the user natural-language speech data, and estimate a context-specific ambiguity weight for the ambiguity of each phoneme.
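One plausible reading of this step can be sketched as follows. The patent does not specify the variation measure, so the sketch below is an assumption: it scores each phoneme by the mean shift between read-aloud and natural-speech feature distributions plus the natural-speech spread, then normalizes so a single threshold can be applied.

```python
import numpy as np

def ambiguity_weights(read_feats, natural_feats):
    """Per-phoneme ambiguity weight from acoustic variation (illustrative).

    read_feats / natural_feats: dict mapping phoneme -> list of feature
    vectors from read-aloud and natural speech respectively. A larger
    shift between the two distributions, plus a wider natural-speech
    spread, yields a larger weight.
    """
    weights = {}
    for ph in read_feats:
        r = np.asarray(read_feats[ph], dtype=float)
        n = np.asarray(natural_feats[ph], dtype=float)
        mean_shift = float(np.linalg.norm(r.mean(axis=0) - n.mean(axis=0)))
        spread = float(n.std(axis=0).mean())
        weights[ph] = mean_shift + spread
    # Normalize to [0, 1] so one threshold applies uniformly.
    top = max(weights.values()) or 1.0
    return {ph: w / top for ph, w in weights.items()}
```

In practice the features would be frame-level acoustic vectors (e.g. MFCCs) pooled per phoneme, and the divergence measure could be replaced by KL divergence between Gaussians without changing the overall scheme.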
The decoding module according to the exemplary embodiment may include a first decoding module that performs speech recognition on the uttered user speech based on the acoustic model while excluding phonemes whose context-specific ambiguity weight is equal to or greater than the set weight, and a second decoding module that recognizes the phonemes at or above the set weight using the context information and the classifier parameters.
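The two-pass scheme described above can be sketched in a few lines. This is a schematic reading, not the patent's implementation: the `context_classify` callback stands in for the second decoding module's context-and-classifier stage, and all names here are hypothetical.

```python
def two_pass_decode(phonemes, weights, threshold, context_classify):
    """Two-pass decoding sketch (hypothetical interfaces).

    Pass 1 keeps phonemes whose ambiguity weight is below the threshold
    (the acoustic-model result); pass 2 re-resolves the high-ambiguity
    phonemes with a context-aware classifier supplied by the caller,
    giving it the neighboring phonemes as context.
    """
    decoded = []
    for i, ph in enumerate(phonemes):
        if weights.get(ph, 0.0) < threshold:
            decoded.append(ph)  # pass 1: trust the acoustic model
        else:
            left = phonemes[i - 1] if i > 0 else None
            right = phonemes[i + 1] if i + 1 < len(phonemes) else None
            decoded.append(context_classify(ph, left, right))  # pass 2
    return decoded
```

The design point is that only the ambiguous spans pay the cost of the heavier context-based classifier, while stable phonemes go straight through.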
The speech recognition apparatus according to the embodiment has the advantage that it improves the acoustic model preset for recognizing the user's voice, based on the accumulated user speech data, so that reliability and accuracy in recognizing the user's voice can be secured.
FIG. 1 is a control block diagram showing a control configuration of a speech recognition apparatus according to an embodiment.
The following merely illustrates the principles of the invention. Those skilled in the art will therefore be able to devise various apparatuses which, although not explicitly described or shown herein, embody the principles of the invention and fall within its concept and scope. All conditional terms and examples recited in this specification are, in principle, intended solely to aid understanding of the inventive concept, and are not to be construed as limiting the invention to the embodiments and conditions specifically recited.
The detailed description, as well as the principles, aspects, and embodiments of the invention and the specific examples thereof, is intended to cover their structural and functional equivalents. Such equivalents include not only currently known equivalents but also equivalents developed in the future, that is, any element devised to perform the same function regardless of structure.
Thus, for example, the block diagrams herein should be understood to illustrate exemplary conceptual aspects embodying the principles of the invention. Similarly, all flowcharts, state transition diagrams, pseudo code, and the like represent various processes that may be substantially embodied on a computer-readable medium and executed by a computer or processor, whether or not such a computer or processor is explicitly shown.
The functions of the various elements shown in the figures, including functional blocks depicted as a processor or similar concept, may be provided by dedicated hardware or by hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, a single shared processor, or a plurality of individual processors, some of which may be shared.
Also, explicit use of terms such as processor, control, or similar concepts should not be interpreted as referring exclusively to hardware capable of running software, and should be understood to implicitly include, without limitation, digital signal processor (DSP) hardware, read-only memory (ROM) for storing software, random access memory (RAM), and non-volatile memory. Other well-known hardware may also be included.
In the claims of the present specification, an element expressed as a means for performing a function described in the detailed description encompasses, for example, any combination of circuit elements performing that function, or software in any form including firmware or microcode, coupled with appropriate circuitry for executing that software so as to perform the function. Since the functions provided by the variously enumerated means are combined in the manner the claims require, any means capable of providing those functions should be regarded as equivalent to the means identified in the claims.
BRIEF DESCRIPTION OF THE DRAWINGS The above and other objects, features, and advantages of the present invention will become more apparent from the following detailed description of preferred embodiments taken in conjunction with the accompanying drawings. In the following description, detailed descriptions of known technologies related to the present invention are omitted where it is determined that they would unnecessarily obscure the gist of the invention.
FIG. 1 is a control block diagram showing a control configuration of a speech recognition apparatus according to an embodiment.
Referring to FIG. 1, the voice recognition apparatus 300 may include a voice database (DB) 310, an ambiguity application module 320, and first and second decoding modules.
The voice database 310 stores the user read-aloud speech data and the user natural-language speech data.
Here, the first database stores in advance the user read-aloud speech data corresponding to the user's speech uttered from the script designed in consideration of phoneme distribution.
That is, the second database stores the user natural-language speech data, accumulated from previously input user speech and classified by situation.
The ambiguity applying module 320 determines the degree of variation in the acoustic space for each phoneme included in the user read-aloud speech data and the user natural-language speech data, estimates a context-specific ambiguity weight for the ambiguity of each phoneme, and applies the weights to the preset acoustic model.
The decoding module performs speech recognition based on the acoustic model to which the context-specific ambiguity weights have been applied.
That is, when the user speech v20 is input, the first decoding module performs speech recognition based on the acoustic model while excluding phonemes whose context-specific ambiguity weight is equal to or greater than the set weight.
The second decoding module then recognizes the excluded phonemes using the set context information and classifier parameters.
The context information and the classifier parameters may be extracted from a separate repository and classifier and stored in advance.
It will be understood by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the appended claims. The embodiments disclosed herein are therefore intended to illustrate, not limit, the technical idea of the present invention, and the scope of that idea is not limited by these embodiments. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within the range of their equivalents should be construed as falling within the scope of the present invention.
Claims (1)
A speech recognition apparatus comprising: a voice database in which user read-aloud speech data, corresponding to a user's speech uttered from a script considering phoneme distribution, and user natural-language speech data accumulated from previously input user speech are stored; an ambiguity applying module configured to compare the user read-aloud speech data with the user natural-language speech data to extract a context-specific ambiguity weight for the ambiguity of each phoneme, and to apply the ambiguity weights to a preset acoustic model; and
a decoding module configured to perform, when an uttered user speech is input, speech recognition based on the acoustic model to which the context-specific ambiguity weights have been applied by the ambiguity applying module, and to perform, for a phoneme or phoneme interval whose ambiguity weight is equal to or greater than a set weight, speech recognition based on set context information and classifier parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020130136890A KR20150054445A (en) | 2013-11-12 | 2013-11-12 | Sound recognition device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020130136890A KR20150054445A (en) | 2013-11-12 | 2013-11-12 | Sound recognition device |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20150054445A (en) | 2015-05-20 |
Family
ID=53390593
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020130136890A KR20150054445A (en) | 2013-11-12 | 2013-11-12 | Sound recognition device |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20150054445A (en) |
- 2013-11-12: Application filed as KR1020130136890A; published as KR20150054445A (en); status: not active (Application Discontinuation)
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020091123A1 (en) * | 2018-11-02 | 2020-05-07 | 주식회사 시스트란인터내셔널 | Method and device for providing context-based voice recognition service |
CN111883113A (en) * | 2020-07-30 | 2020-11-03 | 云知声智能科技股份有限公司 | Voice recognition method and device |
CN111883113B (en) * | 2020-07-30 | 2024-01-30 | 云知声智能科技股份有限公司 | Voice recognition method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10930270B2 (en) | Processing audio waveforms | |
US9299347B1 (en) | Speech recognition using associative mapping | |
US9311915B2 (en) | Context-based speech recognition | |
US9202462B2 (en) | Key phrase detection | |
US8972260B2 (en) | Speech recognition using multiple language models | |
US9552811B2 (en) | Speech recognition system and a method of using dynamic bayesian network models | |
US20150269931A1 (en) | Cluster specific speech model | |
JP6812843B2 (en) | Computer program for voice recognition, voice recognition device and voice recognition method | |
CN106875936B (en) | Voice recognition method and device | |
US10089978B2 (en) | Detecting customers with low speech recognition accuracy by investigating consistency of conversation in call-center | |
US9799325B1 (en) | Methods and systems for identifying keywords in speech signal | |
WO2014183373A1 (en) | Systems and methods for voice identification | |
CN112420026A (en) | Optimized keyword retrieval system | |
KR20170007107A (en) | Speech Recognition System and Method | |
JP7191792B2 (en) | Information processing device, information processing method and program | |
US9959887B2 (en) | Multi-pass speech activity detection strategy to improve automatic speech recognition | |
CN111640423B (en) | Word boundary estimation method and device and electronic equipment | |
KR20150054445A (en) | Sound recognition device | |
CN112397053B (en) | Voice recognition method and device, electronic equipment and readable storage medium | |
US11328713B1 (en) | On-device contextual understanding | |
WO2022226782A1 (en) | Keyword spotting method based on neural network | |
CN113658593B (en) | Wake-up realization method and device based on voice recognition | |
Kalantari et al. | Incorporating visual information for spoken term detection | |
Kanrar | i Vector used in Speaker Identification by Dimension Compactness | |
CN117037801A (en) | Method for detecting speech wheel and identifying speaker in real teaching environment based on multiple modes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WITN | Withdrawal due to no request for examination |