WO2005059896A1

WO2005059896A1 - Method for optimizing speech recognition processes

Info

Publication number: WO2005059896A1
Application number: PCT/EP2004/013910
Authority: WO
Inventors: Wolfgang Tschirk
Original assignee: Siemens Ag Österreich
Priority date: 2003-12-16
Filing date: 2004-12-07
Publication date: 2005-06-30
Also published as: AT414283B; EP1695336A1; ATA20252003A

Abstract

The invention relates to a method for optimizing speech recognition processes, wherein in every recognition process a probability of hit hypothesis is determined for every word of the universal set (V) of words detected by the speech recognition process. A first subset (S) comprising a vocabulary permissible in the present situation for the recognition process is selected from the universal set. A second subset (E) of words is selected which comprises the vocabulary of the first subset and additional randomly chosen words of the universal set. The hypotheses put forward with respect to the words of the second subset are ranked in terms of the determined probability of hit and the most probable hit is determined on the basis of a predetermined number (H) of first-ranked hypotheses.

Description


Method for optimizing speech recognition processes

Technical field

The invention relates to a method for optimizing speech recognition processes in which, in each recognition process, a hit-probability hypothesis is determined for each word of the total set of words recognized by the speech recognition process, and in which a first subset is selected from the total set, comprising a vocabulary permissible for this recognition process in the current situation.

State of the art

When using automatic speech recognition systems, for example to convert spoken commands into electrical control commands, the user is faced with the problem that the recognition process will deliver incorrect results with a certain probability. These erroneous results include: confusing commands, incorrectly rejecting commands, and incorrectly accepting spurious signals as commands.

The probabilities of these errors depend on one another: a low false rejection rate usually means a high false acceptance rate and often also a higher confusion rate; conversely, the requirement for a low false acceptance rate leads to a higher false rejection rate.

Depending on the application, the relationship between the types of errors mentioned should therefore be optimized. For example, for control tasks in a noisy environment there is a requirement that only commands of the user lead to a control command and that the loud ambient noise is reliably rejected. Here, in the interest of a low false acceptance probability, a higher false rejection rate is also accepted, whereas in other applications, in which the comfort of the user is in the foreground, the false rejection rate should be low and a higher false acceptance rate is accepted in return.

Presentation of the invention

The invention is based on the object of specifying a method with which the properties of a speech recognition method with regard to its types of errors can be optimized in relation to the application.

According to the invention, this object is achieved with a method of the type mentioned in the introduction, in which a second subset of words is selected which contains the vocabulary of the first subset and additional randomly selected words of the total set, and in which the hypotheses formed for the words of the second subset are ranked according to the determined hit probability and the most likely hit is determined from a predetermined number of first-ranked hypotheses.

The invention enables the optimized use of a speech recognition system with a constant recognition rate. By appropriately selecting the second subset and the number of first-ranked hypotheses, the ratio of the above-mentioned types of errors can be adapted to any situation.

Advantageous embodiments of the invention result from the subclaims. It is particularly advantageous if the size of the second subset and the number of first-ranked hypotheses from which the most likely hit is determined are determined for each recognition process by means of optimization methods.

It is also advantageous if a separate optimization criterion is chosen for each recognition process.

It is also advantageous if one of the words of each set corresponds not to a command but to the entirety of the possible interference signals.

Brief description of the drawing

The invention is explained in more detail with reference to a figure which represents the essential formulas of the mathematical foundations of the method according to the invention.

Implementation of the invention

According to the prior art, in a method for optimizing speech recognition processes, a hit-probability hypothesis is determined in each recognition process for every word of the total set V of words recognized by the speech recognition process, whose number is V and which is supplemented by an ambient noise pattern to form a set V₀. The most likely result, the hit, is then determined either from the total set of hypotheses or from a first subset S₀ of these hypotheses, which contains S words and an ambient noise pattern, as described for example in W. Tschirk, "Neural Net Speech Recognizers: Voice Remote Control Devices for Disabled People," e & i Artificial Intelligence 7/8/2001, pp. 367-370, 2001.

For example, in a speech recognition system used to control the lighting, the heating and the telephone in an apartment, once a first command has selected "heating", only the words "WÄRMER" (warmer) or "KÄLTER" (colder) are accepted in the next step, but not, for example, the words "HELLER" (brighter) or "DUNKLER" (darker), which do not result in a meaningful control command in this situation.

In this situation, therefore, the words "WÄRMER" and "KÄLTER" together with the "ambient noise" pattern form the first subset S₀ of hit-probability hypotheses.

According to the invention, a second subset E of E words is now selected, which comprises the vocabulary of the first subset and additional randomly selected words of the total set V, and which is supplemented by the "ambient noise" pattern to form E₀.

The hypotheses formed in the recognition process for the words of the second subset E₀ are ranked according to the hit probability determined, and the most likely hit is determined from a predetermined number H of the first-ranked hypotheses.
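The selection and ranking steps described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the method as claimed: the text does not specify how the hit is derived from the H first-ranked hypotheses, so the rule used here (scan the top H hypotheses in rank order, reject on ambient noise, accept the first word that belongs to the first subset, otherwise reject) is an assumption. The names `NOISE`, `build_second_subset` and `decide` are hypothetical.

```python
import random

NOISE = "<ambient noise>"  # hypothetical label for the ambient-noise pattern


def build_second_subset(first_subset, total_set, size_e, rng):
    """Return E: every word of S plus randomly drawn words of V, with |E| = size_e."""
    first, total = set(first_subset), set(total_set)
    if not first <= total:
        raise ValueError("the first subset must be contained in the total set")
    if not len(first) <= size_e <= len(total):
        raise ValueError("size_e must lie between |S| and |V|")
    extras = sorted(total - first)  # sorted so that a seeded rng is reproducible
    return first | set(rng.sample(extras, size_e - len(first)))


def decide(scores, first_subset, second_subset, top_h):
    """Rank the hypotheses for E plus noise and derive a hit from the top_h best.

    Assumed decision rule (not taken from the source): walk through the
    top_h hypotheses in rank order; ambient noise means rejection, the first
    word that belongs to S is the hit, words outside S are skipped.
    Returns the accepted word, or None for a rejection.
    """
    candidates = set(second_subset) | {NOISE}
    # Highest hit probability first; ties are broken alphabetically.
    ranked = sorted(candidates, key=lambda w: (-scores[w], w))
    for hypothesis in ranked[:top_h]:
        if hypothesis == NOISE:
            return None
        if hypothesis in first_subset:
            return hypothesis
    return None
```

Under this assumed rule, the additional words of E that are not in S act as decoys: an input that scores best on a decoy is rejected when H = 1, whereas a larger H lets a lower-ranked permissible word through.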

By a suitable choice of the number E of words of the second subset E, or of the supplemented second subset E₀, and of the predetermined number H of first-ranked hypotheses, the properties of the speech recognition method with regard to the ratio of false rejection rate R, false acceptance rate A and confusion rate C can now be adapted to the respective situation.

For example, in situations which are known to be noisy, the false acceptance rate can be chosen to be particularly low.

The mathematical limits for the choice of the second subset E₀, its size E and the predetermined number H of first-ranked hypotheses are defined in formulas 0, 1 and 2 of the figure.

An advantageous optimization process is now explained in more detail below. For this purpose, the values to be optimized for the size E of the second subset E₀ and the predetermined number H of first-ranked hypotheses are represented as functions of the false rejection rate R, the false acceptance rate A and the confusion rate C.

For this purpose, the properties of the speech recognition method in the narrower sense are determined using a test vocabulary with a specific number T of words. As a result, test values c_T, r_T and a_T for the confusion rate, the false rejection rate and the false acceptance rate are obtained.
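As a rough illustration of how such test values could be measured, the following Python sketch counts the three error types over a list of labelled test trials. The exact definitions used in the figure are not reproduced in the text, so the normalisation chosen here is an assumption: confusions and false rejections are counted relative to the word utterances, false acceptances relative to the ambient-noise inputs. `Trial` and `measure_rates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Trial:
    spoken: Optional[str]      # word actually uttered, or None for ambient noise
    recognized: Optional[str]  # word output by the recognizer, or None for a rejection


def measure_rates(trials):
    """Return (c_T, r_T, a_T): confusion, false rejection and false acceptance rate.

    Assumed definitions (not taken from the source):
      c_T = word utterances recognized as a different word / word utterances
      r_T = word utterances rejected                        / word utterances
      a_T = noise inputs accepted as some word              / noise inputs
    """
    word_trials = [t for t in trials if t.spoken is not None]
    noise_trials = [t for t in trials if t.spoken is None]
    if not word_trials or not noise_trials:
        raise ValueError("need at least one word trial and one noise trial")
    confused = sum(
        1 for t in word_trials
        if t.recognized is not None and t.recognized != t.spoken
    )
    rejected = sum(1 for t in word_trials if t.recognized is None)
    accepted = sum(1 for t in noise_trials if t.recognized is not None)
    return (
        confused / len(word_trials),
        rejected / len(word_trials),
        accepted / len(noise_trials),
    )
```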

From this, a characteristic constant triple U = (u₁, u₂, u₃) can be derived for the speech recognition process, which describes the speech recognition process independently of the size of the vocabulary to be recognized.

The relationships between the constant triple U = (u₁, u₂, u₃) and the test values c_T, r_T and a_T for the confusion rate, the false rejection rate and the false acceptance rate are shown in equations 3, 4 and 5.

With this, the statements shown in equations 6 to 10 can be made for a given hypothesis about an analyzed feature pattern, i.e. a word or ambient noise:

If the analyzed feature pattern represents a word, the probability that the speech recognition system supplies, at the l-th position in the ranking of the hypotheses, a wrong hypothesis (i.e. misclassifies the pattern as another word or as ambient noise) that

- corresponds to a word belonging to the first subset S is p_l^S (Eq. 6),

- corresponds to a word belonging to the second subset E but not to the first subset S is p_l^(E\S) (Eq. 7),

- corresponds not to a word but to ambient noise is p_l^G (Eq. 8).

If the analyzed feature pattern represents ambient noise, the probability that the speech recognition system incorrectly gives a word as the hypothesis at the l-th position in the ranking is

- q_l^S (Eq. 9) for a word from the first subset S, and

- q_l^(E\S) (Eq. 10) for a word which belongs to the second subset E but not to the first subset S.

From these probabilities, the values for the false rejection rate R, the false acceptance rate A and the confusion rate C can now be determined in accordance with equations 11, 12 and 13, and the optimal values can be determined in accordance with equation 14 over all combinations of second-subset size E and predetermined number of hypotheses H that are permissible under equations 1 and 2.
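The search over the permissible combinations of E and H can be pictured as a simple exhaustive search, sketched below in Python. Equations 1, 2 and 11 to 14 are contained in the figure and are not reproduced in the text, so they appear here only as placeholders: the callable `rates` stands in for equations 11 to 13, the callable `criterion` for the optimization criterion of equation 14, and the default ranges S ≤ E ≤ V and 1 ≤ H ≤ E + 1, together with the optional `is_permissible` filter, are assumed stand-ins for the limits of equations 1 and 2. All names are hypothetical.

```python
def optimize_e_h(size_s, size_v, rates, criterion, is_permissible=None):
    """Exhaustively search for the (E, H) pair that minimises a criterion.

    size_s, size_v  sizes S and V of the first subset and of the total set
    rates           callable (e, h) -> (R, A, C); placeholder for Eqs. 11-13
    criterion       callable (R, A, C) -> cost;   placeholder for Eq. 14
    is_permissible  optional callable (e, h) -> bool; placeholder for the
                    limits of Eqs. 1 and 2, which are not reproduced here
    Returns (best_e, best_h, best_cost).
    """
    best = None
    for e in range(size_s, size_v + 1):   # assumed range: S <= E <= V
        for h in range(1, e + 2):         # assumed range: E words plus the noise pattern
            if is_permissible is not None and not is_permissible(e, h):
                continue
            cost = criterion(*rates(e, h))
            if best is None or cost < best[0]:
                best = (cost, e, h)
    if best is None:
        raise ValueError("no permissible combination of E and H")
    return best[1], best[2], best[0]
```

A separate `criterion` can be passed for each recognition process, for example one that weights the false acceptance rate A heavily in situations known to be noisy and the false rejection rate R heavily where user comfort matters most.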

Claims


1. A method for optimizing speech recognition processes, wherein, in each recognition process, a hit-probability hypothesis is determined for each word of the total set (V) of words recognized by the speech recognition process, and wherein a first subset (S) is selected from the total set, which comprises a vocabulary permissible in the current situation for this recognition process, characterized in that a second subset (E) of words is selected which comprises the vocabulary of the first subset and additional randomly selected words of the total set, and in that the hypotheses formed for the words of the second subset are ranked according to the hit probability determined and the most likely hit is determined from a predetermined number (H) of the first-ranked hypotheses.

2. The method according to claim 1, characterized in that the size of the second subset (E) and the number (H) of first-ranked hypotheses from which the most likely hit is determined are determined for each recognition process by means of optimization methods (14).

3. The method according to claim 2, characterized in that a separate optimization criterion is selected for each recognition process.

4. The method according to one of claims 1 to 3, characterized in that one of the words of each set corresponds not to a command but to the entirety of the possible interference signals.