KR20140077780A

KR20140077780A - Apparatus for adapting language model scale using signal-to-noise ratio

Info

Publication number: KR20140077780A
Application number: KR1020120146911A
Authority: KR
Inventors: 정훈; 전형배; 박전규; 오유리; 강점자; 이윤근
Original assignee: 한국전자통신연구원
Priority date: 2012-12-14
Filing date: 2012-12-14
Publication date: 2014-06-24
Also published as: KR102020782B1

Abstract

The present invention relates to a speech recognition system and, more particularly, to an apparatus for adapting the language model scale to improve recognition performance in such a system. According to the present invention, for a speech signal with a low signal-to-noise ratio, greater weight is given to the discriminating power of the language model, which improves recognition performance in noisy environments.

Description

[0001] The present invention relates to a language model scale adaptation apparatus using a signal-to-noise ratio

The present invention relates to a speech recognition system and, more particularly, to a language model scale adaptation apparatus for improving speech recognition performance in a speech recognition system.

Speech recognition technology has become relatively common and is used in a variety of applications. However, now that isolated-word speech recognition has been commercialized, users increasingly demand speech recognition products with more advanced capabilities.

That is, there is a need for keyword spotting techniques that can recognize a target word even when other words precede or follow it, and for continuous speech recognition techniques that can recognize natural sentences.

However, continuous speech recognition has not yet reached the level that users expect.

In other words, beyond the performance of the acoustic model, there is the problem of how good a language model can be applied.

In most cases, the language model is built from text data, and a text corpus is assembled to obtain a wide variety of such data.

For example, a general-purpose application such as dictation uses newspaper articles, novels, and other materials available on the Internet. However, the performance of a language model built from such data is limited.

In particular, if a language model is not sufficient for a particular application, the performance expected by the user becomes difficult to obtain.

The most ideal method is to obtain textual data suitable for the application field, but this is difficult in reality.

Many efforts have been made to overcome these problems. Language model adaptation can also be seen as one of these efforts.

Meanwhile, acoustic models and language models produce probabilities in different ranges because they are modeled differently, and the language model scale serves to compensate for this difference.

In general, the optimal language model scale is found experimentally: for a given evaluation corpus and system, the value that gives the best trade-off between speed and recognition performance is used.
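The experimental tuning described above can be sketched as a simple grid search. This is a minimal illustration and not a procedure taken from this document: the toy n-best lists, the candidate scale values, and the helper names `pick_best` and `tune_lm_scale` are all hypothetical, a sentence-level error count stands in for a real word error rate, and decoding speed is ignored.

```python
# Hypothetical sketch: choose a language model scale by grid search on a
# small evaluation set. Each utterance has a reference transcript and an
# n-best list of (hypothesis, acoustic log-prob, language model log-prob).

def pick_best(nbest, lm_scale):
    """Return the hypothesis maximizing acoustic + lm_scale * language score."""
    return max(nbest, key=lambda h: h[1] + lm_scale * h[2])[0]

def tune_lm_scale(eval_set, candidates):
    """Return the first candidate scale with the fewest sentence errors."""
    def errors(scale):
        return sum(pick_best(nbest, scale) != ref for ref, nbest in eval_set)
    return min(candidates, key=errors)

# Toy data, made up purely for illustration.
eval_set = [
    ("turn on the light", [("turn on the light", -120.0, -8.0),
                           ("turn on the lied", -118.0, -15.0)]),
    ("call home", [("call home", -60.0, -4.0),
                   ("cold hum", -59.0, -12.0)]),
]
best_scale = tune_lm_scale(eval_set, [0.0, 0.5, 1.0, 2.0])
```

With a scale of 0.0 the acoustically closer but implausible hypotheses win in this toy set. Any positive candidate fixes both utterances, so the search returns the first such value.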

In general, when the signal-to-noise ratio is good, the acoustic model discriminates well between competing hypotheses, but when the signal-to-noise ratio is bad, the discriminating power of the acoustic model deteriorates.

However, there is a problem that the probability value or the discriminating power of the language model is maintained irrespective of the quality of the input signal.

1. Korean Patent Publication No. 10-2012-0066530

The present invention has been proposed in order to solve the problems described in the background art. In order to maintain a stable recognition performance even in a noisy environment, the language model scale is adjusted according to the degree of noise of an input signal.

However, the probability value or discriminating power of the language model is maintained irrespective of the quality of the input signal.

Therefore, when the signal-to-noise ratio is good, the probability value of the acoustic model is weighted more; otherwise, the probability value of the language model is weighted more, so that the language model scale is adjusted to make greater use of the discriminating power of the language model in a noisy environment. The present invention thus provides a language model scale adaptation apparatus using a signal-to-noise ratio that improves recognition performance in noisy environments.

To solve the problems raised in the background art, the present invention provides a language model scale adaptation apparatus using a signal-to-noise ratio that weights the probability value of the acoustic model more when the signal-to-noise ratio is good and weights the probability value of the language model more when it is not, thereby adjusting the language model scale so that the discriminating power of the language model is used more and recognition performance in noisy environments is improved.

In a speech recognition system, the language model scale adaptation apparatus using a signal-to-noise ratio adjusts the language model scale by assigning different weights to the probability value of the acoustic model based on the signal-to-noise ratio.

Another embodiment of the present invention provides a language model scale adaptation method using a signal-to-noise ratio, comprising: a speech signal input step of inputting a speech signal; an end point detecting step of detecting an end point of the input speech signal; a signal-to-noise ratio measurement step of measuring the signal-to-noise ratio (SNR) of the speech signal once the end point is detected; a language model scale adaptation step of adapting the language model scale according to the measured signal-to-noise ratio, by weighting the probability value of the acoustic model more when the signal-to-noise ratio is good and weighting the probability value of the language model more when it is not; a search space generation step of generating a search space for the speech signal according to the adapted language model scale; and a decoding step of decoding the speech signal in the search space to generate a final speech recognition result.

According to the present invention, recognition performance in noisy environments is improved by giving more weight to the discriminating power of the language model for a speech signal with a low signal-to-noise ratio.

That is, when the signal-to-noise ratio is good, the probability value of the acoustic model is weighted more, and when it is not good, the probability value of the language model is weighted more. The language model scale is thus adjusted so that the discriminating power of the language model is used more in a noisy environment, and recognition performance in that environment can be improved.

FIG. 1 is a block diagram of a language model scale adaptation apparatus using a signal-to-noise ratio according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a language model scale adaptation process using a signal-to-noise ratio according to an exemplary embodiment of the present invention.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It is to be understood, however, that the invention is not to be limited to the specific embodiments, but includes all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Like reference numerals are used for similar elements in describing each drawing.

The terms first, second, etc. may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another.

For example, without departing from the scope of the present invention, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component. The term "and / or" includes any combination of a plurality of related listed items or any of a plurality of related listed items.

Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

Terms such as those defined in commonly used dictionaries are to be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in the present application.

Hereinafter, a language model scale adaptation apparatus using a signal-to-noise ratio according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a language model scale adaptation apparatus using a signal-to-noise ratio according to an embodiment of the present invention. Referring to FIG. 1, the language model scale adaptation apparatus comprises: an end point detector 100 for detecting an end point of an input speech signal; a signal-to-noise ratio measuring unit 110 for measuring the signal-to-noise ratio (SNR) of the speech signal once the end point is detected; a language model scale adaptation unit 120 for adapting the language model scale according to the measured signal-to-noise ratio, weighting the probability value of the acoustic model more when the signal-to-noise ratio is good and weighting the probability value of the language model more when it is not; a search space generation unit 130 for generating a search space according to the language model scale adapted by the language model scale adaptation unit 120; and a decoding unit 140 for decoding the speech signal in the search space to generate a final speech recognition result.

Generally, a probability-based speech recognition system finds the word sequence W that has the maximum a posteriori probability for an input speech signal X, as shown in Equation (1).

$$\hat{W} = \arg\max_{W} P(X \mid W)\, P(W)^{\alpha} \qquad (1)$$

Here, $P(X \mid W)$ is the acoustic model, $P(W)$ is the language model, and $\alpha$ is called the language model scale.
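Taking Equation (1) in the common product form, which is an assumption about its exact notation, the same criterion can be rewritten in the log domain, where decoders usually operate. The logarithm is monotonic, so the maximizing word sequence is unchanged:

```latex
\hat{W} = \arg\max_{W} P(X \mid W)\, P(W)^{\alpha}
        = \arg\max_{W} \bigl[ \log P(X \mid W) + \alpha \log P(W) \bigr]
```

Written this way, α is a multiplier on the language model log-probability. A larger α lets the language model's preferences outweigh acoustic score differences between competing word sequences.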

The acoustic model is the probability that each word or phoneme will generate a specific speech signal, and the language model is the probability of occurrence for successive words.

The acoustic model and the language model have different ranges of probabilities due to differences in modeling methods, and the language model scale plays a role of correcting the differences.

In an embodiment of the present invention, the language model scale is adapted based on the signal-to-noise ratio. As shown in Equation (2), the language model scale is a function of the time t and the signal-to-noise ratio.

Here, SNR(t) is the signal-to-noise ratio in time frame t, α is the optimal language model scale obtained through experiments, and β is a weighting factor obtained through experiments. The sigmoid function is defined by the following equation.
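The exact form of Equation (2) cannot be recovered from this text, so the following is only a minimal sketch under an assumed form, α(t) = α · (1 + β · sigmoid(−(SNR(t) − snr_mid) / snr_width)). It was chosen because it uses the quantities named above and raises the scale as the signal-to-noise ratio falls. The sigmoid is taken to be the standard logistic function 1 / (1 + e^(−x)). The centering and width parameters `snr_mid` and `snr_width`, their default values, and the function names are hypothetical.

```python
import math

def sigmoid(x):
    """Standard logistic function 1 / (1 + exp(-x)), computed stably."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def adaptive_lm_scale(snr_db, alpha, beta, snr_mid=15.0, snr_width=5.0):
    """Assumed form, not the patent's Equation (2):
    alpha * (1 + beta * sigmoid(-(snr_db - snr_mid) / snr_width)).

    alpha: base language model scale tuned by experiment
    beta:  weighting factor tuned by experiment
    snr_mid, snr_width: hypothetical centering and width in dB
    """
    return alpha * (1.0 + beta * sigmoid(-(snr_db - snr_mid) / snr_width))

# Scale at a noisy, a mid-range, and a clean signal-to-noise ratio.
scales = {snr: adaptive_lm_scale(snr, alpha=10.0, beta=0.5)
          for snr in (0.0, 15.0, 40.0)}
```

Under this assumed form the scale stays close to α for clean speech and approaches α · (1 + β) as the signal becomes noisier, so the language model contributes more when the acoustic scores are less reliable.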

FIG. 2 is a flowchart illustrating a language model scale adaptation process using a signal-to-noise ratio according to an embodiment of the present invention.

Referring to FIG. 2, the language model scale adaptation process includes: a speech signal input step (S200) of inputting a speech signal; an end point detection step (S210) of detecting an end point of the input speech signal; a signal-to-noise ratio measuring step (S220) of measuring the signal-to-noise ratio (SNR) of the speech signal once the end point is detected; a language model scale adaptation step (S230) of adapting the language model scale according to the measured signal-to-noise ratio, weighting the probability value of the acoustic model more when the signal-to-noise ratio is good and weighting the probability value of the language model more when it is not; a search space generation step of generating a search space for the speech signal according to the adapted language model scale; and a decoding step (S250) of decoding the speech signal in the search space to generate a final speech recognition result.
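The flow of FIG. 2 can be sketched end to end as follows. This is an illustrative toy and not the implementation described in this document: every function name, the energy-threshold end point detector, the SNR estimate, the sigmoid-based scale with its `snr_mid` and `snr_width` parameters, and the example values are assumptions, and the search space is reduced to a short list of pre-scored candidates.

```python
import math

def detect_endpoints(energies, threshold):
    """S210 (assumed): return (start, end) of frames whose energy exceeds threshold."""
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        raise ValueError("no speech detected")
    return active[0], active[-1] + 1

def measure_snr_db(energies, start, end):
    """S220 (assumed): mean in-speech energy over mean out-of-speech energy, in dB."""
    speech = energies[start:end]
    noise = energies[:start] + energies[end:]
    noise_power = max(sum(noise) / max(len(noise), 1), 1e-12)
    speech_power = max(sum(speech) / len(speech), 1e-12)
    return 10.0 * math.log10(speech_power / noise_power)

def adapt_lm_scale(snr_db, alpha, beta, snr_mid=15.0, snr_width=5.0):
    """S230 (assumed form): raise the scale as the SNR falls."""
    x = max(min((snr_db - snr_mid) / snr_width, 50.0), -50.0)
    return alpha * (1.0 + beta / (1.0 + math.exp(x)))  # beta * sigmoid(-x)

def recognize(energies, candidates, alpha, beta, threshold):
    """Run the steps in order and return (words, snr_db, lm_scale)."""
    start, end = detect_endpoints(energies, threshold)
    snr_db = measure_snr_db(energies, start, end)
    lm_scale = adapt_lm_scale(snr_db, alpha, beta)
    # Search space generation, reduced here to rescoring a candidate list
    # of (words, acoustic log-prob, language model log-prob).
    scored = [(am + lm_scale * lm, words) for words, am, lm in candidates]
    # S250: pick the best-scoring candidate.
    return max(scored)[1], snr_db, lm_scale

# Toy input: per-frame energies (list of floats) and two candidates.
candidates = [("play the next song", -20.0, -6.0),
              ("play the necks song", -14.0, -14.0)]
clean = recognize([1.0, 1.0, 1000.0, 1000.0, 1.0], candidates, 0.6, 1.0, 500.0)
noisy = recognize([100.0, 100.0, 1000.0, 1000.0, 100.0], candidates, 0.6, 1.0, 500.0)
```

In this toy setup the same candidate scores are reused for both inputs so that only the scale changes. The noisier input gets a larger language model scale, and the linguistically more likely candidate wins there.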

In particular, the language model scale adaptation method using a signal-to-noise ratio according to an embodiment of the present invention may be implemented in the form of program instructions that can be executed by various computer means and recorded in a computer-readable storage medium.

The computer-readable storage medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the medium may be those specially designed and constructed for the present invention or may be available to those skilled in the art of computer software.

Examples of computer-readable storage media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory.

The medium may be a transmission medium such as an optical or metal line, a wave guide, or the like, including a carrier wave for transmitting a signal designating a program command, a data structure, or the like.

Examples of program instructions include machine language code such as those produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

In addition, one embodiment of the present invention may be implemented in hardware, software, or a combination thereof. In a hardware implementation, it may be implemented with a digital signal processor (DSP), a programmable logic device (PLD), a field programmable gate array (FPGA), a processor, a controller, a microprocessor, other electronic units designed to perform the above-described functions, or a combination thereof.

In a software implementation, it may be implemented as a module that performs the functions described above. The software may be stored in a memory unit and executed by a processor. The memory unit or processor may employ various means well known to those skilled in the art.

100: End point detector
110: Signal-to-noise ratio measuring unit
120: Language model scale adaptation unit
130: Search space generation unit
140: Decoding unit

Claims

A language model scale adaptation apparatus using a signal-to-noise ratio in a speech recognition system,
wherein the language model scale is adjusted by assigning different weights to the probability value of the acoustic model based on the signal-to-noise ratio.