WO2018029071A1 - Audio signature for speech command spotting - Google Patents

Audio signature for speech command spotting

Info

Publication number
WO2018029071A1
WO2018029071A1 (PCT/EP2017/069649)
Authority: WO (WIPO, PCT)
Prior art keywords: speech signal, speech, HFD, command, UBM
Application number: PCT/EP2017/069649
Other languages: French (fr)
Inventor: Sacha Vrazic
Original Assignee: Imra Europe S.A.S
Application filed by Imra Europe S.A.S
Publication of WO2018029071A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G10L15/08: Speech classification or search
    • G10L15/14: Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/088: Word spotting
    • G10L17/00: Speaker identification or verification
    • G10L17/04: Training, enrolment or model building
    • G10L17/22: Interactive procedures; Man-machine interfaces
    • G10L17/24: Interactive procedures; Man-machine interfaces, the user being prompted to utter a password or a predefined phrase


Abstract

From a speech signal uttered by a user, for each of a number of time frames T of the speech signal, N Higuchi fractal dimension (HFD) parameters are extracted as a feature vector, using multi-scale HFD, and a feature space is formed from the feature vector and the number of time frames T for each scale of the multi-scale HFD (30). Feature spaces formed for each of a plurality of speech signals are concatenated, a universal background model (UBM) is estimated from the concatenated feature spaces (40), and a user and command dependent Gaussian mixture model (GMM) is estimated for each of the plurality of speech signals using the estimated UBM, thereby estimating GMMs each corresponding to one of the plurality of speech signals (50).

Description

AUDIO SIGNATURE FOR SPEECH COMMAND SPOTTING
DESCRIPTION BACKGROUND OF THE INVENTION Field of the invention
The present invention relates to detecting an audio signature in speech utterances for speech command spotting.
Related background Art
Voice communication is a natural and simple way of communicating between people. However, despite considerable improvement of speech recognition engines, making a machine understand spoken instructions is still challenging. Speech recognition engines work well only in the absence of noise and reverberation. Furthermore, they are language and vocabulary dependent, since the vocabulary is trained (or pre-trained) on large numbers of occurrences of the same phonemes.
One application of speech recognition, but not limited thereto, is speech command spotting for vehicles. Speech commands can be given inside the vehicle to control equipment such as windows, air conditioning, turn indicators (winkers), wipers, etc.
Speech commands can also be given from outside the vehicle, for example when the user reaches his car in the parking space with his hands carrying shopping bags; then, by simply uttering "open", the door on the user's side opens.
Most prior art systems implementing speech recognition or speech spotting use MFCCs (Mel Frequency Cepstral Coefficients), or extensions thereof, as features, together with different types of models based on HMMs (Hidden Markov Models), GMMs (Gaussian Mixture Models), etc.
The problem with these systems is that they require training on words (in reality, units smaller than a syllable) that are repeated many times by numerous speakers. Therefore, the systems are language and vocabulary dependent.
As an example, in vehicles, it is already possible to give voice commands to control the navigation or multimedia system. However, the list of
commands is pre-defined by the manufacturer and cannot be chosen by the vehicle user.
There are also some possibilities to enter a speech reference that is not pre-defined, for example when assigning a voice label to a phone directory entry. However, in general, the performance of such systems is poor. More advanced systems, even commercial ones, require a given sentence to be repeated several times and still do not provide a high recognition rate. The following meanings apply for the abbreviations used in this specification:
GMM Gaussian Mixture Model
HFD Higuchi Fractal Dimension
HMM Hidden Markov Model
MAP Maximum A Posteriori
MFCC Mel Frequency Cepstral Coefficient
UBM Universal Background Model
VAD Voice Activity Detector
SUMMARY OF THE INVENTION At least one embodiment of the present invention aims at overcoming the above drawbacks and has the object of providing a speech spotting system that enables identification of an uttered speech command and of the speaker without any previous training on a large database, in which the speech command can be language independent and does not have to be part of an existing vocabulary.
According to aspects of the present invention, this is achieved by methods, apparatuses and a computer program product as defined in the appended claims.
According to at least one embodiment of the invention, it is possible for a given speaker to define a voice command that is language and vocabulary independent. The command may comprise speech, humming, singing, etc. The command can be registered with only one utterance.
According to an embodiment of the invention, the Higuchi fractal dimension is used followed by probabilistic discrimination. According to an embodiment of the invention, the Higuchi fractal dimension is applied in a multi-scale way in combination with a probabilistic modeling that enables assigning, as a signature, the couple speaker (i.e. user) and command, as well as identifying the command and the user robustly. In the following the invention will be described by way of embodiments thereof with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS Fig. 1 shows a schematic block diagram illustrating processing in a registration mode according to an embodiment of the invention. Fig. 2 shows a schematic block diagram illustrating feature computation processing in a registration mode according to embodiments of the invention. Fig. 3 shows a flowchart illustrating a probabilistic modeling processing according to an embodiment of the invention.
Fig. 4 shows a diagram illustrating an example of user and command dependent GMM models according to an embodiment of the invention.
Fig. 5 shows a schematic block diagram illustrating a command and user detection processing in an action mode according to an embodiment of the invention. Fig. 6 shows a diagram illustrating results of the command and user detection processing according to an embodiment of the invention.
Figs. 7A and 7B show diagrams illustrating results of a command and user detection processing according to comparative examples.
Fig. 8 shows a schematic block diagram illustrating a configuration of a control unit in which examples of embodiments of the invention are implementable. DESCRIPTION OF THE EMBODIMENTS
Embodiments of the invention relate to functions that are in the digital domain. However, there is an analog part to condition (amplify and low-pass filter) microphone signals and convert them to digital signals. This part is out of the scope of this application. A speech spotting system according to at least one embodiment of the invention comprises two operation modes, i.e. a "registration" mode and an "action" mode. First, the registration mode will be described. Registration Mode
In the registration mode, a speech signal representing a command uttered by a user as a label to a defined action is registered in the speech spotting system. Referring to Fig. 1, first a speech utterance of the user is acquired by a microphone or microphone array 10 (for example, a one microphone or multi-microphone in-vehicle setting, which is out of the scope of this application). The speech utterance is amplified, low-pass filtered and digitized. Then, in a pre-processing block 20, which is out of scope of this application, noise and interferences for each situation (in-vehicle or out-of-vehicle application) are removed, and a digital audio signal is output from the pre-processing block 20.
A feature extraction block 30 of an embodiment of the invention, which receives the digital audio signal, comprises an estimation according to Higuchi Fractal Dimension (HFD) in a multi-scale way. "Multi-scale" means that the fractal dimension is computed for different (multiple) scales and all these scale dependent fractal dimensions (i.e. HFD parameters) are gathered. The HFD can be used alone or in combination with other features such as Mel-Frequency Cepstral Coefficients (MFCC).
Fig. 2 illustrates details of the feature extraction block 30. First, the digital audio signal is subjected to framing in a framing block 31, in which frames of, for example, 32 ms are overlapped by 50%. A voice activity detector (VAD) 32 applies an algorithm to the framed digital audio signal; the algorithm detects speech presence in the digital audio signal and segments a speech signal corresponding to a command, i.e. finds the start and end of the speech signal. As a command can last several seconds, the speech signal after segmentation is a matrix of time samples, corresponding to speech frames contained in the command. The speech frames are also referred to as time frames of the command. In other words, each column of the matrix contains time samples
corresponding to a given time frame of the command. This matrix is also referred to as speech command matrix. The speech signal, i.e. the speech command matrix, is output from the VAD 32.
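The following Python sketch illustrates one possible implementation of the framing block 31 and a simple energy-based VAD standing in for block 32. The 32 ms frame length and 50% overlap come from the description above; the energy-threshold rule and all function names are illustrative assumptions rather than the patented algorithm.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=32, overlap=0.5):
    """Split a digital audio signal into overlapping frames (block 31).

    Returns a [W x T] matrix whose columns are time frames,
    with W = frame length in samples and T = number of frames.
    """
    W = int(fs * frame_ms / 1000)
    hop = int(W * (1.0 - overlap))
    n_frames = 1 + max(0, (len(x) - W) // hop)
    return np.stack([x[i * hop:i * hop + W] for i in range(n_frames)], axis=1)

def simple_vad(frames, energy_factor=2.0):
    """Very simple energy-based VAD (stand-in for block 32).

    Keeps the contiguous run of frames whose energy exceeds a threshold
    derived from the quietest frames; the result is the 'speech command
    matrix' described in the text.
    """
    energy = np.mean(frames ** 2, axis=0)
    noise_floor = np.percentile(energy, 10)   # assumed noise estimate
    active = energy > energy_factor * noise_floor
    if not active.any():
        return frames[:, :0]
    start = np.argmax(active)
    end = len(active) - np.argmax(active[::-1])
    return frames[:, start:end]

# Usage with a hypothetical 16 kHz signal:
# fs, x = 16000, np.random.randn(16000)
# command_matrix = simple_vad(frame_signal(x, fs))
```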
Then, from the speech signal, a feature space is computed. As mentioned above, according to an embodiment of the invention, it is possible to compute the feature space using only Higuchi fractal dimension block 34 as illustrated in the upper branch of Fig. 2. Alternatively, according to another embodiment of the invention, for computing the feature space the Higuchi fractal dimension block 34 is used together with Mel-frequency cepstral coefficients block 33 as illustrated in the lower branch of Fig. 2.
In the following, processing performed in HFD block 34 will be described.
First, from the speech signal output from the VAD 32, each column of the speech command matrix is processed independently, and from each column, a vector $X_k^m$ of samples (a time-series) is created as given by equation (1).
$$X_k^m = \left\{ x(m),\; x(m+k),\; x(m+2k),\; \dots,\; x\!\left(m + \left\lfloor \frac{W-m}{k} \right\rfloor k \right) \right\} \qquad (1)$$
where k is the time interval, m is the initial time in the dimension computation, and W is the frame size in samples. The adjustment of these parameters defines the number of time-series that are obtained.
Then, the length $L_{m,k}$ of each time-series is computed as given by equation (2):
$$L_{m,k} = \frac{1}{k}\left[\left(\sum_{i=1}^{\lfloor (W-m)/k \rfloor} \left| x(m+ik) - x\bigl(m+(i-1)k\bigr) \right|\right) \frac{W-1}{\left\lfloor \frac{W-m}{k} \right\rfloor k}\right] \qquad (2)$$
The average $L_k$ of the lengths is computed as given by equation (3):
$$L_k = \frac{1}{k} \sum_{m=1}^{k} L_{m,k} \qquad (3)$$
Then, the slope of the line passing through the points $\{\log(1), \log(1/2), \dots, \log(1/m)\}$ on the x-axis and the corresponding points $\log(L_k)$ on the y-axis is computed. The slope is the HFD parameter.
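As a concrete illustration of equations (1) to (3), the following Python sketch computes the HFD of one time frame (one column of the speech command matrix). It follows Higuchi's standard algorithm; treating each scale value as the largest time interval considered in the fit, the function and parameter names, and the least-squares fit via numpy.polyfit are implementation assumptions rather than details taken from the patent.

```python
import numpy as np

def higuchi_fd(frame, k_max):
    """Higuchi fractal dimension of one speech frame (equations (1)-(3)).

    frame : 1-D array of W time samples (one column of the command matrix)
    k_max : largest time interval k used in the log-log fit
    """
    W = len(frame)
    log_inv_k, log_L = [], []
    for k in range(1, k_max + 1):
        L_mk = []
        for m in range(1, k + 1):
            n = (W - m) // k                       # number of steps in X_k^m
            if n < 1:
                continue
            idx = m - 1 + np.arange(n + 1) * k     # samples x(m), x(m+k), ..., eq. (1)
            diffs = np.abs(np.diff(frame[idx]))
            # length of the series, eq. (2), with Higuchi's normalisation
            L_mk.append((diffs.sum() * (W - 1) / (n * k)) / k)
        log_L.append(np.log(np.mean(L_mk)))        # average over m, eq. (3)
        log_inv_k.append(np.log(1.0 / k))
    # HFD = slope of log(L_k) versus log(1/k)
    slope, _ = np.polyfit(log_inv_k, log_L, 1)
    return slope

def multiscale_hfd(frame, scales=(3, 10, 50)):
    """N HFD parameters for one frame, one per chosen scale (assumed scales)."""
    return np.array([higuchi_fd(frame, k) for k in scales])
```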
With the above processing, and for all chosen scales, N HFD parameters are computed for each time frame as a feature vector of length N, which can also be referred to as a "command feature vector". The dimension of the command feature space matrix is [N x T] in the upper branch of Fig. 2, or [(N + M) x T] in the lower branch of Fig. 2, in which, in addition to the N HFD parameters, M parameters according to the MFCC block 33 are computed. T corresponds to the number of time frames of the command. For achieving multi-scale HFD, different values of m are used in the above equations, for example m=3, m=10, and m=50. In case three different values are applied for m, three feature spaces are calculated for the command.
As shown in Fig. 1, the feature space computed in block 30 is input into a universal background model (UBM) estimation block 40, which defines a kind of boundary for the GMM models. According to an embodiment of the invention, the UBM is a user and command independent GMM model. The UBM acts as a prior model and there are many ways to compute it; the most efficient (in terms of model quality) is the Expectation-Maximization approach. The UBM estimated in block 40 is input into block 50, in which a user and command dependent GMM is computed from the UBM using e.g. the Maximum A Posteriori (MAP) approach. For example, the number of Gaussian mixtures is 16, which is the same as for the UBM estimation. The models estimated in blocks 40 and 50 are stored in a user/command model database 60. The database 60 further stores the calculated feature spaces.
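A minimal sketch of the UBM estimation (block 40) and the MAP adaptation to a user/command dependent GMM (block 50), assuming scikit-learn's GaussianMixture for the EM training and a relevance-factor MAP adaptation of the component means only. The relevance factor, the diagonal covariances and the re-use of the UBM covariances are assumptions; the patent does not specify which GMM parameters are adapted.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(concatenated_features, n_mix=16):
    """Block 40: user/command independent GMM trained with EM.

    concatenated_features : array [n_frames_total, feature_dim], i.e. all
    stored feature spaces stacked frame-wise.
    """
    ubm = GaussianMixture(n_components=n_mix, covariance_type='diag', max_iter=200)
    ubm.fit(concatenated_features)
    return ubm

def map_adapt_means(ubm, features, relevance=16.0):
    """Block 50: user/command dependent GMM via MAP adaptation of the UBM means."""
    post = ubm.predict_proba(features)            # responsibilities [n_frames, n_mix]
    n_k = post.sum(axis=0)                        # soft counts per mixture
    f_k = post.T @ features                       # first-order statistics [n_mix, dim]
    e_k = f_k / np.maximum(n_k[:, None], 1e-10)   # data-dependent mean estimates
    alpha = (n_k / (n_k + relevance))[:, None]    # adaptation coefficients
    gmm = GaussianMixture(n_components=ubm.n_components, covariance_type='diag')
    gmm.weights_ = ubm.weights_                   # weights and covariances kept from the UBM
    gmm.covariances_ = ubm.covariances_
    gmm.precisions_cholesky_ = ubm.precisions_cholesky_
    gmm.means_ = alpha * e_k + (1.0 - alpha) * ubm.means_
    return gmm
```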
It is to be noted that every time a new command is registered by a user, i.e. a speech utterance is input by the user using the microphone or microphone array 10 shown in Fig. 1, both the UBM and the GMMs have to be re-estimated. The UBM is estimated on all feature spaces, calculated from each of the plurality of speech signals uttered by the plurality of users, that are stored in the database 60.
Fig. 3 shows a procedure for user and command model estimation according to an embodiment of the invention. In case the registration mode is operated for the first time, the database 60 of user/command models and user/command feature spaces is empty (YES in step S20). Then, from the currently computed feature space extracted from the first speech signal uttered by a user, a UBM is estimated in step S22 and a GMM for the first speech signal (first user/command) is computed in step S23.
In case a second speech signal (a second command) has to be registered, a feature space calculated from this second speech signal and the feature space calculated from the first speech signal (the first command) are used together to estimate the UBM. In other words, in step S21 the feature spaces are concatenated, and in step S22 the UBM is calculated using the concatenated feature spaces. Then, using the UBM, by repeating step S23, a GMM for the first speech signal is re-estimated and a GMM for the second speech signal is estimated. As the second speech signal represents the last user/command (last feature space) in the database 60, in step S24 the process ends after the estimation of the GMM for the second speech signal. Assuming that the number of users/commands (i.e. commands uttered by users) already registered is S, then when registering user/command S+1, all S feature spaces and the current one are used to estimate the UBM in step S22. Then, the S+1 user/command GMMs are (re-)estimated in step S23.
It is to be noted that every time a new command is registered in the speech spotting system, all final user/command models must be re-estimated. In simple terms, this is because the boundaries between models are re-estimated, as a consequence of the UBM-GMM approach.
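The registration procedure of Fig. 3 (steps S20 to S24) can then be sketched as follows, representing the database 60 as a plain list of feature spaces and re-using the helper functions sketched above; all names are illustrative assumptions.

```python
import numpy as np

def register_command(new_feature_space, stored_feature_spaces, n_mix=16):
    """Steps S20-S24: add a feature space and re-estimate the UBM and all GMMs.

    new_feature_space     : [n_frames, dim] features of the new user/command
    stored_feature_spaces : list of previously registered feature spaces
    Returns the updated list, the re-estimated UBM and one GMM per command.
    """
    stored_feature_spaces = stored_feature_spaces + [new_feature_space]  # database 60
    concatenated = np.vstack(stored_feature_spaces)                      # step S21
    ubm = train_ubm(concatenated, n_mix=n_mix)                           # step S22
    gmms = [map_adapt_means(ubm, fs) for fs in stored_feature_spaces]    # step S23 (re-)estimation
    return stored_feature_spaces, ubm, gmms
```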
Fig. 4 shows a two-dimensional representation of three user/command GMMs estimated according to an embodiment of the invention. For graphical representation purposes, only two dimensions of the GMMs are represented; the GMMs in fact have many more dimensions.
The straight lines in Fig. 4 represent the boundaries between models, which are important in the discrimination (decision) of which speech signal was uttered (i.e. which command was uttered by which user). Therefore, each model sits in a kind of cluster.
According to an embodiment of the invention, the computed user/command dependent GMMs, the UBM and all feature spaces are kept in the database 60. As explained earlier, the feature spaces of all registered commands (and not only their GMMs) must also be kept, because they are needed in the re-estimation procedure when a command is added or removed. It is noted that if a command is removed, the same re-estimation procedure as performed for adding a new command applies, to estimate new GMMs on all remaining commands.
Action Mode In the following, the action mode of the speech spotting system according to an embodiment of the invention will be described. In the action mode, an uttered speech signal is evaluated in order to find whether a command (i.e. a couple of user and command) registered in the speech spotting system in the registration mode matches the uttered speech signal.
According to an embodiment of the invention, the registered commands are detected in a speech flow (continuous speech). According to another embodiment of the invention, the registered commands are detected from a short-time speech segment.
Fig. 5 illustrates processing in the action mode according to an embodiment of the invention. The uttered speech signal (also referred to as trial uttered command) is input via a microphone or microphone array 41 which may be the same as the microphone or microphone array 10 of Fig. 1.
In Fig. 5, the pre-processing block 20 and the feature extraction block 36 are similar to blocks 20 and 30 used in the registration mode, except for the VAD in block 36, which is slightly different in order to segment the commands in the speech flow, rather than in a time-limited recording.
In blocks 44 and 45, the log-likelihood is computed for both the UBM and the GMMs using the feature space from the trial uttered command. The final log-likelihood LL is given by the average difference between the UBM and GMM log-likelihoods.
If the final LL is below a predetermined threshold, then no command (none of the registered commands uttered by a given user) is detected. In other words, in block 46 it is decided that the trial uttered command is not a registered command and user. Otherwise, the highest final LL provides the most probable detected couple of command and user, which is the output information from block 46. It may happen that the same command is uttered by multiple users. Such a case is not a problem, as the user will be discriminated in block 46.
According to an embodiment of the invention, in block 46, final log-likelihoods are calculated by computing an average difference between the log-likelihood for the UBM and the log-likelihoods for the GMMs. Further, in block 46, a registered command uttered by a registered user is detected based on a final log-likelihood of the calculated final log-likelihoods if the final log-likelihood exceeds a predetermined threshold. Finally, in block 46, the registered command and the registered user are decided based on the maximum log-likelihood of the final log-likelihoods exceeding the predetermined threshold.
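The scoring and decision of blocks 44 to 46 could look as follows, again a sketch under the assumption that each model exposes per-frame log-likelihoods via score_samples (as in the sketches above); the threshold value is an arbitrary placeholder to be tuned.

```python
import numpy as np

def detect_command(trial_features, ubm, gmms, threshold=0.5):
    """Blocks 44-46: score a trial utterance against all registered couples.

    trial_features : [n_frames, dim] feature space of the trial uttered command
    gmms           : list of user/command dependent GMMs (one per registered couple)
    Returns the index of the detected user/command couple, or None.
    """
    ubm_ll = ubm.score_samples(trial_features)                 # per-frame UBM log-likelihood
    final_ll = [np.mean(g.score_samples(trial_features) - ubm_ll) for g in gmms]
    best = int(np.argmax(final_ll))
    if final_ll[best] < threshold:                             # no registered command detected
        return None
    return best                                                # most probable couple (user, command)
```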
Fig. 6 shows a confusion matrix illustrating the result obtained in block 46 for five different registered users (i.e. speakers) and three registered commands for each registered user. Hence, there are 15 registered couples of user and command.
Each registered user utters each registered command 24 times. The x-axis represents the target, i.e. what must be detected, and the y-axis is the output from block 46. The number of correct detections is given on the diagonal of the confusion matrix. On the x-axis, indices 1 to 3 correspond to the three commands uttered by user 1, indices 4 to 6 correspond to the three commands uttered by user 2, indices 7 to 9 correspond to the three commands uttered by user 3, indices 10 to 12 correspond to the three commands uttered by user 4, and indices 13 to 15 correspond to the three commands uttered by user 5. The same applies for the y-axis.
When the number on the diagonal is equal to 24, it means that every time the command was uttered, the user and command were correctly recognized. When the number is below 24, it means that there are some errors, and it is possible to derive information about them. For example, in the case shown in Fig. 6, when user 2 uttered command 3, one misdetection out of 24 trials occurred (number 23 on the diagonal), and by checking the column it can be seen that this one misdetection was detected as user 4/command 2.
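The recognition rate reported in the result table can be recomputed from such a confusion matrix as the fraction of trials falling on the diagonal; a minimal numpy sketch (the matrix itself is hypothetical):

```python
import numpy as np

def recognition_rate(confusion):
    """Fraction of correctly detected (user, command) couples.

    confusion[i, j] counts trials whose target was couple j and whose
    output from block 46 was couple i.
    """
    return np.trace(confusion) / confusion.sum()

# Example: 15 couples with 24 utterances each; a perfect system would give
# a diagonal of 24s and a recognition rate of 1.0.
```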
The result table shown at the bottom right corner of Fig. 6 indicates an excellent recognition rate of 98.1% for the couples of user and command.
According to embodiments of the invention, Higuchi's Fractal Dimension is applied as a key feature element in a multi-scale approach, combined with the UBM/GMM estimation procedure, for uniquely modeling the user/command as an audio signature that can be used alone or in combination with other features. In the following, the results illustrated in Fig. 6 are compared with results achieved by a first conventional speech spotting system using features extracted from a speech signal using a fractal dimension (different from Higuchi's Fractal Dimension) followed by a simple discrimination, and by a second conventional speech spotting system using the fractal dimension features together with features derived from the entropy of the speech signal.
Fig. 7A shows the results obtained from the first conventional speech spotting system, and Fig. 7B shows the results obtained from the second conventional speech spotting system, for five different registered users (i.e. speakers) and three registered commands for each registered user, applying the same conditions and data as in the embodiment of the invention whose result is illustrated in Fig. 6. Hence, there are 15 couples of user and command. Each registered user utters each registered command 24 times. The x-axis represents the target, i.e. what must be detected, and the y-axis is the output from block 46. The number of correct detections is given on the diagonal of the confusion matrix. On the x-axis, indices 1 to 3 correspond to the three commands uttered by user 1, indices 4 to 6 correspond to the three commands uttered by user 2, indices 7 to 9 correspond to the three commands uttered by user 3, indices 10 to 12 correspond to the three commands uttered by user 4, and indices 13 to 15 correspond to the three commands uttered by user 5. The same applies for the y-axis.
The number of correct detections is given on the diagonal of the confusion matrices, and it should be equal to 24, as there are 24 repetitions of each command.
By using only the fractal dimension features, the recognition rate is low at 10.6%, as illustrated at the bottom right corner in Fig. 7A. When adding the second features (entropy), the results are improved but remain low at 14.2%, as illustrated at the bottom right corner in Fig. 7B.
Fig. 8 shows a schematic block diagram illustrating a configuration of a control unit in which at least some of the above described embodiments of the invention are implementable. The control unit comprises processing resources (processing circuitry), memory resources (memory circuitry) and interfaces. The microphone or microphone array 10, 41 may be
implemented by the interfaces, and at least some of the processing in blocks 20, 30, 36, 40, 44, 45, 46, 50 and 60 and steps S20 to S24 may be realized by the processing resources (processing circuitry) and memory resources (memory circuitry) of the control unit.
In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software (computer readable instructions embodied on a computer readable medium), logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
It is to be understood that the above description is illustrative of the invention and is not to be construed as limiting the invention. Various modifications and applications may occur to those skilled in the art without departing from the true spirit and scope of the invention as defined by the appended claims.

Claims

CLAIMS :
1. A method of registering commands uttered by users, the method comprising:
acquiring a plurality of speech signals, each of the plurality of speech signals corresponding to a command of a plurality of commands, which is uttered by a user of a plurality of users;
for each of the plurality of speech signals, extracting, for each of a number of time frames T of the speech signal, N Higuchi fractal dimension (HFD) parameters, as a feature vector, from the speech signal using multi-scale HFD, and forming a feature space from the feature vector and the number of time frames T of the speech signal for each scale of the multi-scale HFD, N and T being integers equal to or greater than one, thereby forming feature spaces each corresponding to one of the plurality of speech signals;
concatenating the feature spaces;
estimating a universal background model (UBM) from the concatenated feature spaces; and
estimating a user and command dependent Gaussian mixture model (GMM) for each of the plurality of speech signals using the estimated UBM, thereby estimating GMMs each corresponding to one of the plurality of speech signals.
2. The method of claim 1, comprising:
holding the estimated GMMs, the UBM and the feature spaces in a database.
3. The method of claim 1 or 2, comprising:
extracting the speech signal from a digital audio signal.
4. The method of any one of claims 1 to 3, comprising: for each of the plurality of speech signals, extracting, for each time frame of the speech signal, M Mel-frequency cepstral coefficients (MFCCs) from the speech signal, M being an integer equal to or greater than one, wherein the feature vector comprises the M MFCCs and the N HFD parameters.
5. A method of detecting registered commands uttered by registered users, the method comprising:
acquiring a speech signal;
extracting, for each of a number of time frames T of the speech signal, N Higuchi fractal dimension (HFD) parameters, as a feature vector, from the speech signal using multi-scale HFD, and forming a feature space from the feature vector and the number of time frames T of the speech signal for each scale of the multi-scale HFD, N and T being integers equal to or greater than one;
acquiring a universal background model (UBM) and at least one user and command dependent Gaussian mixture model (GMM);
calculating, using the feature space, a log-likelihood for the UBM and a log-likelihood for the at least one GMM;
calculating at least one final log-likelihood by computing an average difference between the log-likelihood for the UBM and the log-likelihood for the at least one GMM;
detecting, in the speech signal, a registered command uttered by a registered user if the at least one final log-likelihood exceeds a predetermined threshold; and
deciding the registered command and the registered user based on the maximum log-likelihood out of the at least one final log-likelihood exceeding the predetermined threshold.
6. The method of claim 5, wherein the UBM and the at least one GMM are estimated by: acquiring a plurality of speech signals for registration, each of the plurality of speech signals for registration corresponding to a command of a plurality of commands, which is uttered by a user of a plurality of users, for each of the plurality of speech signals for registration, extracting, for each of a number of time frames T of the speech signal for registration, N Higuchi fractal dimension (HFD) parameters, as a feature vector for registration, from the speech signal for registration using multi-scale HFD, and forming a feature space for registration from the feature vector for registration and the number of time frames T of the speech signal for registration for each scale of the multi-scale HFD, N and T being integers equal to or greater than one, thereby forming feature spaces for registration each corresponding to one of the plurality of speech signals for registration; concatenating the feature spaces for registration;
estimating the universal background model (UBM) from the concatenated feature spaces for registration; and
estimating a user and command dependent Gaussian mixture model (GMM) for each of the plurality of speech signals for registration using the estimated UBM, thereby estimating the at least one GMM.
7. The method of claim 5 or 6, comprising:
acquiring the speech signal from a digital audio signal representing continuous speech.
8. The method of any one of claims 5 to 7, comprising:
extracting, for each time frame of the speech signal, M Mel-frequency cepstral coefficients (MFCCs) from the speech signal, M being an integer equal to or greater than one,
wherein the feature vector comprises the M MFCCs and the N HFD parameters.
9. The method of claim 6, comprising: extracting, for each time frame of the speech signal, M Mel-frequency cepstral coefficients (MFCCs) from the speech signal, M being an integer equal to or greater than one,
wherein the feature vector comprises the M MFCCs and the N HFD parameters,
wherein the UBM and the at least one GMM are further estimated by: for each of the plurality of speech signals for registration, extracting, for each time frame of the speech signal for registration, M Mel-frequency cepstral coefficients (MFCCs) from the speech signal for registration, M being an integer equal to or greater than one,
wherein the feature vector for registration comprises the M MFCCs and the N HFD parameters.
10. A computer program product including a program for a processing device, comprising software code portions for performing the steps of any one of claims 1 to 9 when the program is run on the processing device.
11. The computer program product according to claim 10, wherein the computer program product comprises a computer-readable medium on which the software code portions are stored.
12. The computer program product according to claim 10, wherein the program is directly loadable into an internal memory of the processing device.
13. An apparatus for registering commands uttered by users, the apparatus comprising:
an extracting unit (30) configured to:
acquire a plurality of speech signals, each of the plurality of speech signals corresponding to a command of a plurality of commands, which is uttered by a user of a plurality of users, and
for each of the plurality of speech signals, extract, for each of a number of time frames T of the speech signal, N Higuchi fractal dimension (HFD) parameters, as a feature vector, from the speech signal using multi-scale HFD, and form a feature space from the feature vector and the number of time frames T of the speech signal for each scale of the multi-scale HFD, N and T being integers equal to or greater than one, thereby forming feature spaces each corresponding to one of the plurality of speech signals; and
an estimating unit (40, 50) configured to:
concatenate the feature spaces,
estimate a universal background model (UBM) from the concatenated feature spaces, and
estimate a user and command dependent Gaussian mixture model (GMM) for each of the plurality of speech signals using the estimated UBM, thereby estimating GMMs each corresponding to one of the plurality of speech signals.
14. The apparatus of claim 13, wherein the extracting unit is configured to, for each of the plurality of speech signals, extract, for each time frame of the speech signal, M Mel-frequency cepstral coefficients (MFCCs) from the speech signal, M being an integer equal to or greater than one,
wherein the feature vector comprises the M MFCCs and the N HFD parameters.
15. An apparatus for detecting registered commands uttered by registered users, the apparatus comprising:
an extraction unit (36) configured to:
acquire a speech signal, and
extract, for each of a number of time frames T of the speech signal, N Higuchi fractal dimension (HFD) parameters, as a feature vector, from the speech signal using multi-scale HFD, and form a feature space from the feature vector and the number of time frames T of the speech signal for each scale of the multi-scale HFD, N and T being integers equal to or greater than one;
a calculating unit (44, 45) configured to: acquire a universal background model (UBM) and at least one user and command dependent Gaussian mixture model (GMM), and
calculate, using the feature space, a log-likelihood for the UBM and a log-likelihood for the at least one GMM; and
a deciding unit (46) configured to:
calculate at least one final log-likelihood by computing an average difference between the log-likelihood for the UBM and the log-likelihood for the at least one GMM,
detect, in the speech signal, a registered command uttered by a registered user if the at least one final log-likelihood exceeds a predetermined threshold, and
decide the registered command and the registered user based on the maximum log-likelihood out of the at least one final log-likelihood exceeding the predetermined threshold.
PCT/EP2017/069649 2016-08-12 2017-08-03 Audio signature for speech command spotting WO2018029071A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102016115018.5A DE102016115018B4 (en) 2016-08-12 2016-08-12 Audio signature for voice command observation
DE102016115018.5 2016-08-12

Publications (1)

Publication Number Publication Date
WO2018029071A1 true WO2018029071A1 (en) 2018-02-15

Family

ID=59520913

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2017/069649 WO2018029071A1 (en) 2016-08-12 2017-08-03 Audio signature for speech command spotting

Country Status (2)

Country Link
DE (1) DE102016115018B4 (en)
WO (1) WO2018029071A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108766465A (en) * 2018-06-06 2018-11-06 华中师范大学 A kind of digital audio based on ENF universal background models distorts blind checking method
WO2019232826A1 (en) * 2018-06-06 2019-12-12 平安科技(深圳)有限公司 I-vector extraction method, speaker recognition method and apparatus, device, and medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140200890A1 (en) * 2012-11-30 2014-07-17 Stmicroelectronics Asia Pacific Pte Ltd. Methods, systems, and circuits for speaker dependent voice recognition with a single lexicon

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140200890A1 (en) * 2012-11-30 2014-07-17 Stmicroelectronics Asia Pacific Pte Ltd. Methods, systems, and circuits for speaker dependent voice recognition with a single lexicon

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DOUGLAS A. REYNOLDS ET AL: "Speaker Verification Using Adapted Gaussian Mixture Models", DIGITAL SIGNAL PROCESSING., vol. 10, no. 1-3, 1 January 2000 (2000-01-01), US, pages 19 - 41, XP055282688, ISSN: 1051-2004, DOI: 10.1006/dspr.1999.0361 *
FULUFHELO V NELWAMONDO ET AL: "Multi-scale Fractal Dimension for Speaker Identification System", PROCEEDINGS OF THE 8TH WSEAS INT. CONF. ON AUTOMATIC CONTROL, MODELING AND SIMULATION, 14 March 2006 (2006-03-14), Prague, Czech Republic, pages 81 - 86, XP055418472, Retrieved from the Internet <URL:https://s3.amazonaws.com/academia.edu.documents/39478723/Multi-scale_Fractal_Dimension_for_Speake20151027-13450-22nlgh.pdf?AWSAccessKeyId=AKIAIWOWYYGZ2Y53UL3A&Expires=1508847939&Signature=RTzFrFR8NJRlVSJQHeBMroowves=&response-content-disposition=inline; filename=Multi-scale_fractal_dimension_for_spe> [retrieved on 20171024] *
ZAKI MOHAMMADI ET AL: "Effectiveness of fractal dimension for ASR in low resource language", THE 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, IEEE, 12 September 2014 (2014-09-12), pages 464 - 468, XP032669148, DOI: 10.1109/ISCSLP.2014.6936645 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108766465A (en) * 2018-06-06 2018-11-06 华中师范大学 A kind of digital audio based on ENF universal background models distorts blind checking method
WO2019232826A1 (en) * 2018-06-06 2019-12-12 平安科技(深圳)有限公司 I-vector extraction method, speaker recognition method and apparatus, device, and medium
CN108766465B (en) * 2018-06-06 2020-07-28 华中师范大学 Digital audio tampering blind detection method based on ENF general background model

Also Published As

Publication number Publication date
DE102016115018A1 (en) 2018-02-15
DE102016115018B4 (en) 2018-10-11

Similar Documents

Publication Publication Date Title
KR101988222B1 (en) Apparatus and method for large vocabulary continuous speech recognition
CN105529026B (en) Speech recognition apparatus and speech recognition method
EP2189976B1 (en) Method for adapting a codebook for speech recognition
GB2580856A (en) International Patent Application For Method, apparatus and system for speaker verification
US10733986B2 (en) Apparatus, method for voice recognition, and non-transitory computer-readable storage medium
US20190279644A1 (en) Speech processing device, speech processing method, and recording medium
JP6464005B2 (en) Noise suppression speech recognition apparatus and program thereof
KR101893789B1 (en) Method for speech endpoint detection using normalizaion and apparatus thereof
JP2006171750A (en) Feature vector extracting method for speech recognition
JP4897040B2 (en) Acoustic model registration device, speaker recognition device, acoustic model registration method, and acoustic model registration processing program
JP3298858B2 (en) Partition-based similarity method for low-complexity speech recognizers
WO2018029071A1 (en) Audio signature for speech command spotting
JP4074543B2 (en) Audio processing apparatus, audio processing method, audio processing program, and program recording medium
JP6481939B2 (en) Speech recognition apparatus and speech recognition program
JP5342629B2 (en) Male and female voice identification method, male and female voice identification device, and program
TWI578307B (en) Acoustic mode learning device, acoustic mode learning method, sound recognition device, and sound recognition method
KR101023211B1 (en) Microphone array based speech recognition system and target speech extraction method of the system
JP3493849B2 (en) Voice recognition device
JP2003271190A (en) Method and device for eliminating noise, and voice recognizing device using the same
JP4325044B2 (en) Speech recognition system
Morales-Cordovilla et al. On the use of asymmetric windows for robust speech recognition
JP4244524B2 (en) Voice authentication apparatus, voice authentication method, and program
Rehr et al. Cepstral noise subtraction for robust automatic speech recognition
KR20100056859A (en) Voice recognition apparatus and method
CN107039046B (en) Voice sound effect mode detection method based on feature fusion

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17748480; Country of ref document: EP; Kind code of ref document: A1)
122 Ep: pct application non-entry in european phase (Ref document number: 17748480; Country of ref document: EP; Kind code of ref document: A1)