JP2002091477A - Voice recognition system, voice recognition device, acoustic model control server, language model control server, voice recognition method and computer readable recording medium which records voice recognition program - Google Patents

Voice recognition system, voice recognition device, acoustic model control server, language model control server, voice recognition method and computer readable recording medium which records voice recognition program

Info

Publication number
JP2002091477A
JP2002091477A (application JP2000280674A)
Authority
JP
Japan
Prior art keywords
model
language
acoustic
speech recognition
acoustic model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
JP2000280674A
Other languages
Japanese (ja)
Inventor
Jun Ishii
Yohei Okato
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corp
Priority to JP2000280674A
Publication of JP2002091477A
Application status: Abandoned

Abstract

PROBLEM TO BE SOLVED: To update acoustic and language models and improve recognition accuracy without imposing a large load on the user. SOLUTION: An acoustic model control server 20 obtains updated acoustic data 107 and constructs an acoustic model. A language model control server 30 obtains updated language data 114 and constructs a language model. These models are respectively transmitted to a voice recognition device 10. In the device 10, an acoustic model updating means 111 updates an acoustic model 102 with the transmitted acoustic model, and a language model updating means 118 updates a language model 103 with the transmitted language model.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a speech recognition system that updates, via a network, the acoustic model and the language model referred to during speech recognition so as to obtain a high recognition rate, and to a speech recognition apparatus, an acoustic model management server, a language model management server, a speech recognition method, and a computer-readable recording medium recording a speech recognition program.

[0002]

2. Description of the Related Art In speech recognition, a digitally input speech signal is usually converted, using signal processing techniques, into a time series of vectors that well represent the acoustic characteristics of the speech, and a matching process against speech models (an acoustic model and a language model) is performed.

The collation processing is the problem of finding the uttered word sequence W = [w1 w2 ... wk] (k is the number of words) from an acoustic feature vector time series A = [a1 a2 ... an] composed of n time frames. To estimate the word string with the highest recognition accuracy, it suffices to obtain the recognized word string W* that maximizes the appearance probability P(W|A). That is, W* = argmax_W P(W|A) (1). However, it is usually difficult to obtain P(W|A) directly. Therefore, using Bayes' theorem, P(W|A) is rewritten as in equation (2): P(W|A) = P(W)P(A|W) / P(A) (2)

Here, when determining the W that maximizes the left side, P(A), the denominator of the right side, does not depend on the recognition candidate W. Therefore, it suffices to find the W that maximizes the numerator of the right side, giving equation (3): W* = argmax_W P(W)P(A|W) (3). The probability model that gives P(W) is called the language model, and the probability model that gives P(A|W) is called the acoustic model.
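The decision rule of equation (3) can be sketched as follows. This is an illustrative toy, not the patent's implementation: the candidate strings and their log-probabilities are invented stand-ins for the outputs of trained language and acoustic models.

```python
import math

# Hypothetical candidate word strings with illustrative log-probabilities;
# real values would come from a trained language model P(W) and acoustic
# model P(A|W).
candidates = {
    # word string: (log P(W), log P(A|W))
    "recognize speech": (math.log(0.02), math.log(0.5)),
    "wreck a nice beach": (math.log(0.001), math.log(0.6)),
}

def best_hypothesis(cands):
    """Return W* = argmax_W P(W) * P(A|W), computed in log space."""
    return max(cands, key=lambda w: cands[w][0] + cands[w][1])

print(best_hypothesis(candidates))  # "recognize speech"
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which is the usual practice in recognizers.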

[0005] A typical modeling approach in speech recognition represents the acoustic model by a hidden Markov model and the language model by an (n-1)th-order Markov process over words, called an n-gram.

The details of these methods are described in, for example, "Fundamentals of Speech Recognition" by L. Rabiner and B.-H. Juang, Japanese edition supervised by Furui, NTT Advanced Technology, November 1995 (hereinafter Reference 1); "Probabilistic Language Models" by Kenji Kita, University of Tokyo Press (hereinafter Reference 2); and "Digital Signal Processing of Speech and Sound Information" co-authored by Kiyohiro Shikano, Satoshi Nakamura, and Shiro Ise, Shokodo, November 1997 (hereinafter Reference 3).

In these methods, the parameters constituting a model are statistically estimated from a large amount of data. In constructing the acoustic model, speech data such as words and sentences from many speakers is collected, and estimation is performed using statistical methods so that recognition accuracy, or an index well correlated with it, is improved. For example, the parameters of the hidden Markov model constituting the acoustic model are estimated with the Baum-Welch algorithm so that the likelihood of the acoustic model outputting the learning data is maximized. The method of estimating the acoustic model is described in detail in the second volume of Reference 1.

Similarly, in constructing a language model, the appearance probability of each utterance and of the words constituting it is calculated, in accordance with the structure of the language model, from texts such as newspapers and transcripts of conversations. For example, when n = 2 in the n-gram language model (called a bigram language model), P(W) is approximated as in equation (4). The parameters of the n-gram language model are estimated from the frequencies of adjacent n-word sequences in the learning text data. The method of estimating the language model is described in detail in Reference 2. P(w1...wk) = P(w1)P(w2|w1)...P(wk|w1...wk-1) ≈ P(w1)P(w2|w1)...P(wk|wk-1) (4)
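The frequency-based estimation of bigram parameters can be sketched as below. This is a minimal illustration assuming a tiny invented corpus; a real system would train on large text collections and apply smoothing, which is omitted here.

```python
from collections import Counter

def train_bigram(sentences):
    """Estimate P(w_i | w_{i-1}) by relative frequency over learning text,
    as described for the n-gram model with n = 2. No smoothing is applied."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words          # sentence-start marker as context
        unigrams.update(padded[:-1])      # count each context word
        bigrams.update(zip(padded[:-1], padded[1:]))  # adjacent word pairs
    return {(prev, w): c / unigrams[prev] for (prev, w), c in bigrams.items()}

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
probs = train_bigram(corpus)
print(probs[("the", "cat")])  # 0.5: "the" is followed by "cat" in 1 of 2 cases
```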

By using a large amount of text to statistically estimate the appearance probability of each word, a language model achieving higher recognition accuracy than non-statistical methods can be constructed. In Japanese, the definition of a word is ambiguous because text is not written with word separators; here, each unit obtained by dividing the text by some consistent means is treated as a word. Such a unit may be, for example, a linguistic unit such as a character, a morpheme, or a phrase, a text segmentation based on an entropy criterion, or a combination of these.

FIG. 12 is a block diagram showing the configuration of the conventional speech recognition device disclosed in the above-mentioned Reference 1. In FIG. 12, reference numeral 101 denotes a matching unit that receives a speech signal 100, performs speech recognition, and outputs a recognition result 104; reference numeral 102 denotes an acoustic model referred to during speech recognition; and reference numeral 103 denotes a language model referred to when the matching unit 101 performs speech recognition.

Next, the operation will be described. The matching means 101 receives a user's speech signal 100, refers to the acoustic model 102 and the language model 103, executes speech recognition, and outputs a recognition result 104. The acoustic model 102 represents a mapping between the time series of acoustic feature vectors obtained by signal processing of the waveform of the input speech signal 100 and the minimum symbolic units handled by the speech recognition device, typified by phonemes. The language model 103 describes the correspondence with recognition units longer than these symbols, such as words expressed as combinations of the symbols mapped by the acoustic model 102, together with word appearance information. The acoustic model 102 is used to calculate the probability that a model of a certain symbol outputs a vector time series, that is, to obtain the probability of the acoustic observation sequence of the speech, and the language model 103 is used to calculate the appearance probability of a certain word string.

FIG. 13 is a flowchart showing the procedure of the conventional speech recognition process, which receives a speech signal 100 and obtains a recognition result 104. In step ST1301, the input speech signal 100 is A/D converted into a digital signal. In step ST1302, the digitized speech signal is signal-processed at appropriate intervals and converted into a time series of acoustic feature vectors that well represent the properties of the speech.

In step ST1303, the acoustic feature vectors are collated with the acoustic model 102 by acoustic collation processing, and for each recognition candidate the probability of outputting the time series of acoustic feature vectors is obtained. In step ST1304, each recognition candidate is further collated with the language model 103 by language collation processing, and the result is multiplied by the appearance probability of the word string. Finally, in step ST1305, the most appropriate candidate is selected from the recognition candidates to obtain the recognition result 104. Usually, the most appropriate recognition result is the candidate judged to have the highest probability in the above collation.
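The flow of steps ST1301 through ST1305 can be sketched as follows. All functions here are stubs invented for illustration; the real front end and models are far more elaborate.

```python
# Sketch of the recognition flow of FIG. 13 (steps ST1301-ST1305), with the
# signal-processing front end and both models stubbed out as toy functions.

def ad_convert(analog):           # ST1301: A/D conversion (stub)
    return list(analog)

def extract_features(samples):    # ST1302: acoustic feature vector series (stub)
    return [[s] for s in samples]

def recognize(analog, candidates, acoustic_score, language_score):
    samples = ad_convert(analog)
    features = extract_features(samples)
    # ST1303/ST1304: combine acoustic and language probabilities per candidate
    scored = {w: acoustic_score(features, w) * language_score(w)
              for w in candidates}
    # ST1305: select the candidate with the highest probability
    return max(scored, key=scored.get)

result = recognize(
    [0.1, 0.2],
    candidates=["hello", "harrow"],
    acoustic_score=lambda f, w: 0.8 if w == "hello" else 0.3,
    language_score=lambda w: 0.6 if w == "hello" else 0.5,
)
print(result)  # "hello"
```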

When sufficient performance is not achieved with only an acoustic model and a language model constructed by the above method, a speech recognition device that can be customized by the user may improve recognition accuracy by adapting the acoustic model to the user, or by adding recognition target words that are difficult to recognize to a user dictionary.

First, the case where the acoustic model is adapted will be described. FIG. 14 is a block diagram showing the configuration of a conventional speech recognition device, disclosed in the above-mentioned Reference 1, provided with acoustic model adapting means for adapting the acoustic model to the speech signal 100. FIG. 14 differs from FIG. 12 in that an initial acoustic model 1003, acoustic model adapting means 1004, and an adapted acoustic model 1401 are provided to adapt the acoustic model to the input speech signal 100, and in that the matching means 101 refers to the adapted acoustic model 1401.

Next, the operation of the speech recognition apparatus shown in FIG. 14 will be described. The acoustic model adapting means 1004 adapts the initial acoustic model 1003, using for example a maximum a posteriori probability estimation method, from the adaptation speech (speech signal 100) collected before actual recognition, and obtains the adapted acoustic model 1401. A method of adapting the acoustic model is described in Chapter 7 of Reference 3. The matching unit 101 receives the speech signal 100, refers to the adapted acoustic model 1401 and the language model 103, performs speech recognition, and outputs the recognition result 104.
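The idea behind maximum a posteriori adaptation can be illustrated with a deliberately simplified example: MAP estimation of a single Gaussian mean, standing in for the full HMM parameter adaptation described in Reference 3. The function name and the prior weight tau are illustrative assumptions, not from the patent.

```python
def map_adapt_mean(prior_mean, adaptation_frames, tau=10.0):
    """Maximum a posteriori update of a Gaussian mean: interpolate between
    the initial-model mean and the sample mean of the adaptation speech.
    tau weights the prior; n observed frames pull the estimate toward
    the user's data, so more adaptation speech means stronger adaptation."""
    n = len(adaptation_frames)
    sample_mean = sum(adaptation_frames) / n
    return (tau * prior_mean + n * sample_mean) / (tau + n)

# Initial model says the feature mean is 0.0; the user's adaptation
# speech averages 2.0 over 10 frames, so the adapted mean lies halfway.
adapted = map_adapt_mean(0.0, [2.0] * 10, tau=10.0)
print(adapted)  # 1.0
```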

Next, the case where a user registers recognition target words in a dictionary will be described. FIG. 15 is a block diagram showing the configuration of a conventional speech recognition apparatus, disclosed in the above-mentioned Reference 1, to which a user dictionary has been added. FIG. 15 differs from FIG. 12 in that a user dictionary 601 is provided.

FIG. 16 is a diagram showing a configuration example of the user dictionary 601. The user dictionary 601 is a set of words registered by the user so that unregistered words and words that are difficult to recognize can be recognized better; it is a list of words with their written forms and readings.

Next, the operation of the speech recognition apparatus shown in FIG. 15 will be described. FIG. 17 is a flowchart showing the procedure of the conventional speech recognition process to which the user dictionary 601 has been added. FIG. 17 differs from FIG. 13 in that, in the language matching process of step ST1704, the user dictionary 601 is referred to in addition to the language model 103, so that the words in the user dictionary 601 are added to the recognition target words.

The words registered in the user dictionary 601 can be connected to an arbitrary word string with an appropriate connection probability. For example, in a bigram language model, in which the appearance condition of a word is determined only by the immediately preceding word, constant values are given to the probabilities P(user|wi) and P(wj|user) of a user-dictionary registered word "user" sandwiched between arbitrary words wi and wj, and the probability values are redistributed so that the probabilities of the entire language model sum to 1. As a result, the user can obtain recognition results that include the registered word.
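The redistribution step can be sketched for a single bigram context. This is a minimal illustration under the assumption that each context row P(.|wi) is renormalized independently; the function name and constant are invented for the example.

```python
def add_user_word(bigram_row, new_word, const_prob=0.01):
    """Insert a user-dictionary word into one row P(. | w_i) of a bigram
    model: give the new word a constant probability and rescale the
    existing probabilities so the row still sums to 1."""
    scale = 1.0 - const_prob
    row = {w: p * scale for w, p in bigram_row.items()}
    row[new_word] = const_prob
    return row

row = add_user_word({"cat": 0.7, "dog": 0.3}, "user", const_prob=0.1)
print(round(sum(row.values()), 10))  # 1.0: the row remains a distribution
```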

However, in the configurations of the speech recognition apparatus shown in FIGS. 14 and 15, the user must customize the acoustic model 102 and the language model 103 himself, by creating the adapted acoustic model 1401 from the speech signal 100 or by registering words in the user dictionary 601, which imposes a heavy burden on the user.

Even when words are registered in the user dictionary 601, the language model may contain words that had not appeared, or usages that were not taken into account, when the language model was constructed, so the assigned probabilities may be inappropriate. This, in turn, may cause a decrease in recognition accuracy.

Further, the acoustic model 102 and the language model 103 customized as described above are referred to only by the specific matching unit 101. For this reason, if another matching means 101 uses this customized acoustic model 102 and language model 103, the recognition accuracy decreases.

To address the user-dictionary part of the problems described above, language processing systems such as kana-kanji conversion and machine translation having a function of automatically updating the user dictionary 601 via a network have been proposed, for example in Japanese Patent Application Laid-Open No. H10-260960. However, that publication does not consider handling models for pattern recognition such as speech, and cannot be applied to models of pattern information such as the acoustic model 102.

Further, also in updating the language model 103, if the number of words registered by the user increases, insertion errors in which an inappropriate short word is inserted become likely, so the recognition accuracy tends to decrease. This decrease occurs because words additionally registered in the user dictionary 601 did not exist when the language model 103 was constructed, or because the usage environment has changed, so inappropriate appearance probabilities may be given to P(user|wi), P(wj|user), and the like. As a result, inappropriate recognition results are easily obtained and the recognition accuracy may fall. To prevent this, the appearance probabilities of words must be set appropriately, but it is generally difficult for the user himself to give reasonable values.

[0026]

Since the conventional speech recognition apparatus is configured as described above, customizing an acoustic model or a language model in order to improve recognition accuracy imposes a heavy burden on the user.

Further, when using a speech recognition device other than a speech recognition device customized for each user, there is a problem that the recognition accuracy is reduced.

Further, there is a problem that the acoustic model cannot be automatically updated by customization via a network.

Further, there is a problem that the recognition accuracy is apt to decrease when the number of registrations in the user dictionary increases.

The present invention has been made to solve the above problems. Its object is to obtain a speech recognition system, a speech recognition device, an acoustic model management server, a language model management server, a speech recognition method, and a computer-readable recording medium storing a speech recognition program in which a server connected to a network acquires the latest acoustic data or language data, constructs the latest acoustic model or language model, or an acoustic model or language model corresponding to the user, and updates the user-side acoustic model and language model via the network, thereby improving recognition accuracy without imposing a heavy burden on the user.

It is also an object of the present invention to obtain a speech recognition system, a speech recognition device, an acoustic model management server, a language model management server, a speech recognition method, and a computer-readable recording medium on which a speech recognition program is recorded, in which, by connecting via a network, any speech recognition apparatus can use a customized acoustic model or language model.

It is a further object of the present invention to obtain a speech recognition system, a speech recognition device, an acoustic model management server, a language model management server, a speech recognition method, and a computer-readable recording medium on which a speech recognition program is recorded, in which, by utilizing dictionaries and text obtained from the user and text collected semi-automatically, the recognition accuracy is not easily reduced even when the user dictionary becomes large.

[0033]

A speech recognition system according to the present invention comprises a speech recognition device that receives a speech signal, performs speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation sequence of speech, and outputs a recognition result; and an acoustic model management server connected to the speech recognition device via a network to acquire updated acoustic data and construct the acoustic model, wherein the acoustic model management server transmits the constructed acoustic model to the speech recognition device, and the speech recognition device updates the acoustic model referred to during speech recognition with the acoustic model transmitted by the acoustic model management server.

In the speech recognition system according to the present invention, the acoustic model management server acquires an ID specifying the acoustic model to be referred to when the speech recognition device recognizes speech, reads the updated acoustic data in accordance with the specific condition indicated by the acquired ID, constructs an acoustic model dependent on that specific condition, and transmits it to the speech recognition device.

A speech recognition system according to the present invention comprises a speech recognition device that receives a speech signal, performs speech recognition with reference to a language model for calculating the appearance probability of a word string, and outputs a recognition result; and a language model management server connected via a network to acquire updated language data and construct the language model, wherein the language model constructed by the language model management server is transmitted to the speech recognition device, and the speech recognition device updates the language model referred to during speech recognition with the language model transmitted by the language model management server.

In the speech recognition system according to the present invention, the language model management server acquires an ID specifying the language model to be referred to when the speech recognition device recognizes speech, reads the updated language data in accordance with the specific condition indicated by the acquired ID, constructs a language model dependent on that specific condition, and transmits it to the speech recognition device.

In the speech recognition system according to the present invention, the speech recognition device refers to a user dictionary in which words are registered during speech recognition, and the language model management server reads out the user dictionary via the network, refers to the updated language data and the read user dictionary, constructs a language model dependent on the user dictionary, and transmits it to the speech recognition device.

In the speech recognition system according to the present invention, the language model management server acquires text used by the user of the speech recognition device, refers to the updated language data and the acquired text, constructs a language model dependent on that text, and transmits it to the speech recognition device.

A speech recognition system according to the present invention comprises a speech recognition device that receives a speech signal, performs speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation sequence of speech, and outputs a recognition result; and an acoustic model management server connected to the speech recognition device via a network and holding an initial acoustic model before adaptation, wherein the speech recognition device acquires an ID specifying the acoustic model, obtains adaptation speech data from the input speech signal, and transmits the acquired ID and the adaptation speech data to the acoustic model management server via the network; the acoustic model management server adapts the initial acoustic model using the transmitted adaptation speech data, stores the adapted acoustic model in association with the transmitted ID, and, upon receiving an acoustic model update command, receives from the speech recognition device via the network the ID specifying the acoustic model, selects and reads out the stored adapted acoustic model corresponding to the received ID, and transmits it to the speech recognition device via the network; and the speech recognition device updates the acoustic model referred to during speech recognition with the acoustic model adapted by the acoustic model management server.
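The ID-keyed storage and selection of adapted models described above can be sketched as a small in-memory store. Class and method names are illustrative assumptions, and the "adaptation" is a trivial stand-in for the MAP procedure, not the patent's actual mechanism.

```python
# Toy sketch of the ID-keyed adapted-model store: the server adapts a model
# per user ID and later serves it back when an update command arrives.

class AdaptedModelStore:
    def __init__(self, initial_model):
        self.initial_model = initial_model
        self.adapted = {}                 # ID -> adapted acoustic model

    def adapt(self, user_id, adaptation_data):
        # Stand-in for MAP adaptation: average the initial-model mean
        # with the sample mean of the user's adaptation data.
        sample_mean = sum(adaptation_data) / len(adaptation_data)
        self.adapted[user_id] = {
            "mean": (self.initial_model["mean"] + sample_mean) / 2
        }

    def select(self, user_id):
        # On an update command, return the model stored for this ID,
        # falling back to the initial model if none was adapted.
        return self.adapted.get(user_id, self.initial_model)

store = AdaptedModelStore({"mean": 0.0})
store.adapt("user-42", [2.0, 2.0, 2.0])
print(store.select("user-42"))  # {'mean': 1.0}
```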

A speech recognition apparatus according to the present invention comprises an acoustic model for obtaining the probability of an acoustic observation sequence of speech; matching means for receiving a speech signal, performing speech recognition with reference to the acoustic model, and outputting a recognition result; and acoustic model updating means for receiving, from an acoustic model management server connected via a network, an acoustic model constructed based on updated acoustic data, and updating the acoustic model referred to by the matching means during speech recognition with the received acoustic model.

In the speech recognition apparatus according to the present invention, the acoustic model updating means receives, from the acoustic model management server connected via the network, an acoustic model constructed based on the updated acoustic data and dependent on the specific condition of the acoustic model referred to during speech recognition, and updates the acoustic model referred to by the matching means during speech recognition with the received acoustic model.

The speech recognition apparatus according to the present invention comprises a language model for obtaining the appearance probability of a word string; matching means for receiving a speech signal, performing speech recognition with reference to the language model, and outputting a recognition result; and language model updating means for receiving, from a language model management server connected via a network, a language model constructed from updated language data, and updating the language model referred to by the matching means during speech recognition with the received language model.

In the speech recognition apparatus according to the present invention, the language model updating means receives, from the language model management server connected via the network, a language model constructed based on the updated language data and dependent on the specific condition of the language model referred to during speech recognition, and updates the language model referred to by the matching means during speech recognition with the received language model.

[0044] The speech recognition apparatus according to the present invention comprises a user dictionary in which words referred to by the matching means during speech recognition are registered, and the language model updating means receives, from a language model management server connected via a network, a language model constructed based on updated language data and dependent on the user dictionary referred to by the matching means during speech recognition, and updates the language model referred to by the matching means during speech recognition with the received language model.

[0045] In the speech recognition apparatus according to the present invention, the language model updating means receives, from a language model management server connected via a network, a language model constructed from the updated language data and text used by the user performing speech recognition, and updates the language model referred to by the matching means during speech recognition with the received language model.

A speech recognition apparatus according to the present invention comprises an acoustic model for obtaining the probability of an acoustic observation value sequence of speech; matching means for receiving a speech signal, performing speech recognition with reference to the acoustic model, and outputting a recognition result; acoustic model ID acquiring means for acquiring an ID specifying the acoustic model; data acquisition means for obtaining adaptation speech data from the input speech signal and transmitting the ID acquired by the acoustic model ID acquiring means together with the acquired adaptation speech data to an acoustic model management server connected via a network; and acoustic model updating means for receiving an adapted acoustic model, adapted with the adaptation speech data corresponding to the ID, and updating the acoustic model referred to by the matching means during speech recognition with the received adapted acoustic model.

The acoustic model management server according to the present invention comprises acoustic data acquisition means for acquiring updated acoustic data; acoustic model construction means for receiving an external acoustic model update command, reading out the updated acoustic data acquired by the acoustic data acquisition means, and constructing an acoustic model for obtaining the probability of an acoustic observation sequence of speech; and acoustic model transmission means for transmitting the acoustic model constructed by the acoustic model construction means, via a network, to a speech recognition device that performs speech recognition.

The acoustic model management server according to the present invention comprises acoustic data acquisition means for acquiring updated acoustic data; updated acoustic model ID acquisition means for acquiring, in response to an external acoustic model update command, an ID specifying the acoustic model referred to during speech recognition by a speech recognition device connected via a network; specific acoustic data reading means for reading out the updated acoustic data acquired by the acoustic data acquisition means in accordance with the specific condition indicated by the ID acquired by the updated acoustic model ID acquisition means; specific acoustic model construction means for constructing an acoustic model dependent on the specific condition with reference to the updated acoustic data read by the specific acoustic data reading means; and acoustic model transmission means for transmitting the acoustic model constructed by the specific acoustic model construction means to the speech recognition device via the network.

The acoustic model management server according to the present invention comprises an initial acoustic model, before adaptation, for obtaining the probability of an acoustic observation sequence of speech; acoustic model adaptation means for receiving adaptation speech data transmitted from a speech recognition device connected via a network, together with an ID specifying the acoustic model referred to by the speech recognition device during speech recognition, adapting the initial acoustic model using the adaptation speech data, and storing the adapted acoustic model in adapted acoustic model storage means in association with the received ID; adapted acoustic model selection means for receiving an acoustic model update command from outside, receiving the ID from the speech recognition device via the network, and selecting and reading out the adapted acoustic model corresponding to the received ID from the adapted acoustic model storage means; and acoustic model transmission means for transmitting the adapted acoustic model read by the adapted acoustic model selection means to the speech recognition device via the network.

The language model management server according to the present invention comprises language data acquisition means for acquiring updated language data; language model construction means for receiving an external language model update command, reading out the updated language data acquired by the language data acquisition means, and constructing a language model for obtaining the appearance probability of a word string; and language model transmission means for transmitting the language model constructed by the language model construction means, via a network, to a speech recognition device that performs speech recognition.

The language model management server according to the present invention comprises language data acquisition means for acquiring updated language data; updated language model ID acquisition means for acquiring, in response to an external language model update command, an ID specifying the language model referred to during speech recognition by a speech recognition device connected via a network; specific language data reading means for reading out the updated language data acquired by the language data acquisition means in accordance with the specific condition indicated by the ID acquired by the updated language model ID acquisition means; specific language model construction means for constructing a language model dependent on the specific condition with reference to the updated language data read by the specific language data reading means; and language model transmission means for transmitting the language model constructed by the specific language model construction means to the speech recognition device via the network.

The language model management server according to the present invention comprises language data acquisition means for acquiring updated language data; user dictionary reading means for receiving a language model update command from outside and reading out, via a network, the user dictionary referred to during speech recognition by a speech recognition device connected to the network; user-dictionary-dependent language model construction means for reading the updated language data acquired by the language data acquisition means and constructing a language model dependent on the user dictionary read by the user dictionary reading means; and language model transmission means for transmitting the language model constructed by the user-dictionary-dependent language model construction means to the speech recognition device via the network.

The language model management server according to the present invention comprises:
Language data acquisition means for acquiring updated language data; user-used text acquisition means for receiving an external language model update command and acquiring text used by a user of a speech recognition device connected via a network; user-used-text dependent language model construction means for reading the updated language data acquired by the language data acquisition means and constructing a language model dependent on the text acquired by the user-used text acquisition means; and language model transmission means for transmitting the language model constructed by the user-used-text dependent language model construction means to the speech recognition device via a network.

A speech recognition method according to the present invention inputs a speech signal, performs speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation sequence of the speech, and outputs a recognition result, and comprises: a first step of acquiring updated acoustic data; a second step of receiving an acoustic model update command, reading out the updated acoustic data acquired in the first step, and constructing an acoustic model; a third step of transmitting the acoustic model constructed in the second step via a network; and a fourth step of receiving the acoustic model transmitted in the third step and updating, with the received acoustic model, the acoustic model referred to during the speech recognition.

A speech recognition method according to the present invention inputs a speech signal, performs speech recognition with reference to a language model for obtaining word string appearance probabilities, and outputs a recognition result, and comprises: a first step of acquiring updated language data; a second step of receiving a language model update command, reading the updated language data acquired in the first step, and constructing a language model; a third step of transmitting the language model constructed in the second step via a network; and a fourth step of receiving the language model transmitted in the third step and updating, with the received language model, the language model referred to during the speech recognition.

A speech recognition method according to the present invention inputs a speech signal, performs speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation sequence of the speech, and outputs a recognition result, and comprises: a first step of acquiring updated acoustic data; a second step of receiving an acoustic model update command and acquiring an ID specifying the acoustic model to be referred to during speech recognition; a third step of reading out, in accordance with the specific condition indicated by the ID acquired in the second step, the updated acoustic data acquired in the first step; a fourth step of referring to the updated acoustic data read out in the third step and constructing an acoustic model dependent on the specific condition; a fifth step of transmitting the acoustic model constructed in the fourth step via a network; and a sixth step of receiving the acoustic model transmitted in the fifth step and updating, with the received acoustic model, the acoustic model referred to during speech recognition.

A speech recognition method according to the present invention inputs a speech signal, performs speech recognition with reference to a language model for obtaining word string appearance probabilities, and outputs a recognition result, and comprises: a first step of acquiring updated language data; a second step of receiving a language model update command and acquiring an ID specifying the language model to be referred to during speech recognition; a third step of reading out, in accordance with the specific condition indicated by the ID acquired in the second step, the updated language data acquired in the first step; a fourth step of referring to the updated language data read out in the third step and constructing a language model dependent on the specific condition; a fifth step of transmitting the language model constructed in the fourth step via a network; and a sixth step of receiving the language model transmitted in the fifth step and updating, with the received language model, the language model referred to during speech recognition.

A speech recognition method according to the present invention inputs a speech signal, performs speech recognition with reference to a language model for obtaining word string appearance probabilities and a user dictionary in which words are registered, and outputs a recognition result, and comprises: a first step of acquiring updated language data; a second step of receiving a language model update command and reading out the user dictionary to be referred to during speech recognition; a third step of reading the updated language data acquired in the first step and constructing a language model dependent on the user dictionary read in the second step; a fourth step of transmitting the language model constructed in the third step via a network; and a fifth step of receiving the language model transmitted in the fourth step and updating, with the received language model, the language model referred to during speech recognition.

A speech recognition method according to the present invention inputs a speech signal, performs speech recognition with reference to a language model for obtaining word string appearance probabilities, and outputs a recognition result, and comprises: a first step of acquiring updated language data; a second step of receiving a language model update command and acquiring text used by the user performing speech recognition; a third step of reading the updated language data acquired in the first step and constructing a language model dependent on the text acquired in the second step; a fourth step of transmitting the language model constructed in the third step via a network; and a fifth step of receiving the language model transmitted in the fourth step and updating, with the received language model, the language model referred to during speech recognition.

A speech recognition method according to the present invention inputs a speech signal, performs speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation sequence of the speech, and outputs a recognition result, and comprises: a first step of acquiring an ID specifying the acoustic model; a second step of reading the ID acquired in the first step, acquiring adaptation speech data from the input speech signal, and transmitting the read ID and the acquired adaptation speech data via a network; a third step of adapting an initial, pre-adaptation acoustic model using the adaptation speech data transmitted in the second step, and storing the adapted acoustic model in association with the ID transmitted in the second step; a fourth step of receiving, via a network and in response to an acoustic model update command, the ID acquired in the first step, and selecting and reading out, from the adapted acoustic models stored in the third step, the adapted acoustic model corresponding to the received ID; a fifth step of transmitting the adapted acoustic model read out in the fourth step via a network; and a sixth step of receiving the adapted acoustic model transmitted in the fifth step and updating, with the received adapted acoustic model, the acoustic model referred to during speech recognition.

A computer-readable recording medium on which a speech recognition program according to the present invention is recorded causes a computer to realize: a matching function of inputting a speech signal, performing speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation sequence of the speech, and outputting a recognition result; an acoustic data acquisition function of acquiring updated acoustic data; an acoustic model construction function of reading, in response to an acoustic model update command, the updated acoustic data acquired by the acoustic data acquisition function and constructing an acoustic model; an acoustic model transmission function of transmitting the acoustic model constructed by the acoustic model construction function via a network; and an acoustic model updating function of receiving the acoustic model transmitted by the acoustic model transmission function and updating, with the received acoustic model, the acoustic model referred to by the matching function during speech recognition.

A computer-readable recording medium on which a speech recognition program according to the present invention is recorded causes a computer to realize: a matching function of inputting a speech signal, performing speech recognition with reference to a language model for obtaining word string appearance probabilities, and outputting a recognition result; a language data acquisition function of acquiring updated language data; a language model construction function of reading, in response to a language model update command, the updated language data acquired by the language data acquisition function and constructing a language model; a language model transmission function of transmitting the language model constructed by the language model construction function via a network; and a language model updating function of receiving the language model transmitted by the language model transmission function and updating, with the received language model, the language model referred to by the matching function during speech recognition.

A computer-readable recording medium on which a speech recognition program according to the present invention is recorded causes a computer to realize: a matching function of inputting a speech signal, performing speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation sequence of the speech, and outputting a recognition result; an acoustic data acquisition function of acquiring updated acoustic data; an updated acoustic model ID acquisition function of receiving an acoustic model update command and acquiring an ID specifying the acoustic model; a specific acoustic data reading function of reading the updated acoustic data acquired by the acoustic data acquisition function in accordance with the specific condition indicated by the ID acquired by the updated acoustic model ID acquisition function; a specific acoustic model construction function of referring to the updated acoustic data read by the specific acoustic data reading function and constructing an acoustic model dependent on the specific condition; an acoustic model transmission function of transmitting the acoustic model constructed by the specific acoustic model construction function via a network; and an acoustic model updating function of receiving the acoustic model transmitted by the acoustic model transmission function and updating, with the received acoustic model, the acoustic model referred to by the matching function during speech recognition.

A computer-readable recording medium on which a speech recognition program according to the present invention is recorded causes a computer to realize: a matching function of inputting a speech signal, performing speech recognition with reference to a language model for obtaining word string appearance probabilities, and outputting a recognition result; a language data acquisition function of acquiring updated language data; an updated language model ID acquisition function of acquiring an ID specifying the language model in response to a language model update command; a specific language data reading function of reading the updated language data acquired by the language data acquisition function in accordance with the specific condition indicated by the ID acquired by the updated language model ID acquisition function; a specific language model construction function of referring to the updated language data read by the specific language data reading function and constructing a language model dependent on the specific condition; a language model transmission function of transmitting the language model constructed by the specific language model construction function via a network; and a language model updating function of receiving the language model transmitted by the language model transmission function and updating, with the received language model, the language model referred to by the matching function during speech recognition.

A computer-readable recording medium on which a speech recognition program according to the present invention is recorded causes a computer to realize: a matching function of inputting a speech signal, performing speech recognition with reference to a language model for obtaining word string appearance probabilities and a user dictionary in which words are registered, and outputting a recognition result; a language data acquisition function of acquiring updated language data; a user dictionary reading function of receiving a language model update command and reading the user dictionary; a user dictionary dependent language model construction function of reading the updated language data acquired by the language data acquisition function and constructing a language model dependent on the user dictionary read by the user dictionary reading function; a language model transmission function of transmitting the language model constructed by the user dictionary dependent language model construction function via a network; and a language model updating function of receiving the language model transmitted by the language model transmission function and updating, with the received language model, the language model referred to by the matching function during speech recognition.

A computer-readable recording medium on which a speech recognition program according to the present invention is recorded causes a computer to realize: a matching function of inputting a speech signal, performing speech recognition with reference to a language model for obtaining word string appearance probabilities, and outputting a recognition result; a language data acquisition function of acquiring updated language data; a user-used text acquisition function of acquiring, in response to a language model update command, text used by a user who performs speech recognition; a user-used-text dependent language model construction function of reading the updated language data acquired by the language data acquisition function and constructing a language model dependent on the text acquired by the user-used text acquisition function; a language model transmission function of transmitting the language model constructed by the user-used-text dependent language model construction function via a network; and a language model updating function of receiving the language model transmitted by the language model transmission function and updating, with the received language model, the language model referred to by the matching function during speech recognition.

A computer-readable recording medium on which a speech recognition program according to the present invention is recorded causes a computer to realize: a matching function of inputting a speech signal, performing speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation sequence of the speech, and outputting a recognition result; an acoustic model ID acquisition function of acquiring an ID specifying the acoustic model; an adaptation speech acquisition function of reading the ID acquired by the acoustic model ID acquisition function, acquiring adaptation speech data from the input speech signal, and transmitting the read ID and the acquired adaptation speech data via a network; an acoustic model adaptation function of adapting an initial, pre-adaptation acoustic model using the adaptation speech data transmitted by the adaptation speech acquisition function, and storing the adapted acoustic model in association with the ID transmitted by the adaptation speech acquisition function; an adapted acoustic model selection function of receiving, via a network and in response to an acoustic model update command, the ID acquired by the acoustic model ID acquisition function, and selecting and reading out, from the adapted acoustic models stored by the acoustic model adaptation function, the adapted acoustic model corresponding to the received ID; an acoustic model transmission function of transmitting the adapted acoustic model read out by the adapted acoustic model selection function via a network; and an acoustic model updating function of receiving the adapted acoustic model transmitted by the acoustic model transmission function and updating, with the received adapted acoustic model, the acoustic model referred to during speech recognition.


DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS An embodiment of the present invention will be described below.
Embodiment 1.
FIG. 1 is a block diagram showing the configuration of a speech recognition system according to Embodiment 1 of the present invention. In the figure, reference numeral 10 denotes a speech recognition device that performs speech recognition, 20 denotes an acoustic model management server connected to a network, and 30 denotes a language model management server connected to the network. Here, a network generally means a communication path capable of carrying digital signals by wire or wirelessly.

In the speech recognition device 10, reference numeral 100 denotes an input speech signal; 101 denotes matching means for performing speech recognition of the speech signal 100; 102 denotes an acoustic model referred to by the matching means 101 during speech recognition; and 103 denotes a language model referred to during speech recognition. Reference numeral 111 denotes acoustic model updating means for updating the acoustic model 102 with an acoustic model transmitted via the network, and 118 denotes language model updating means for updating the language model 103 with a language model transmitted via the network.

In the acoustic model management server 20, reference numeral 105 denotes an acoustic model update command given from the outside; 106 denotes acoustic data acquisition means for acquiring updated acoustic data; 107 denotes the updated acoustic data acquired by the acoustic data acquisition means 106; 108 denotes acoustic model construction means which, in response to the acoustic model update command 105, reads out the updated acoustic data 107 and constructs an acoustic model by performing parameter estimation using a statistical method; 109 denotes acoustic model storage means for storing the constructed acoustic model; and 110 denotes acoustic model transmission means for transmitting the acoustic model stored in the acoustic model storage means 109 to the speech recognition device 10 via the network.

In the language model management server 30, reference numeral 112 denotes a language model update command given from the outside; 113 denotes language data acquisition means for acquiring updated language data; 114 denotes the updated language data acquired by the language data acquisition means 113; 115 denotes language model construction means which, in response to the language model update command 112, reads out the updated language data 114 and constructs a language model by estimating parameters using a statistical method; 116 denotes language model storage means for storing the constructed language model; and 117 denotes language model transmission means for transmitting the language model stored in the language model storage means 116 to the speech recognition device 10 via the network.

The feature of the present invention that differs from the prior art is as follows: in the acoustic model management server 20 and the language model management server 30, the acoustic data acquisition means 106 and the language data acquisition means 113 acquire the updated acoustic data 107 and the updated language data 114; the acoustic model construction means 108 and the language model construction means 115 construct the latest acoustic model and the latest language model; the constructed latest models are transmitted to the speech recognition device 10 via the network; and in the speech recognition device 10, the acoustic model updating means 111 and the language model updating means 118 update, with the latest acoustic model and the latest language model, the acoustic model 102 and the language model 103 referred to by the matching means 101.

Next, the operation will be described. In the acoustic model management server 20, the acoustic data acquisition means 106 operates synchronously or asynchronously with the acoustic model update command 105, continually downloads updated or distributed acoustic data, automatically or semi-automatically, and stores it as the updated acoustic data 107. The acoustic data to be acquired are, for example, audio data that is updated on the Internet or distributed by multimedia broadcasting, together with transcripts corresponding to that audio data; their location is determined using, for example, a broadcast program table, and the data is then downloaded.

The updated acoustic data 107 is an accumulation of acoustic data for acoustic model training acquired by the acoustic data acquisition means 106. In the above example, the updated acoustic data 107 consists of audio data and the transcripts corresponding to that audio data.

In response to the acoustic model update command 105, given at an appropriate timing such as a fixed time interval, the interval at which speech recognition processing is performed, or a user instruction given from an input device, the acoustic model construction means 108 refers to the updated acoustic data 107 and estimates the parameters of the acoustic model using a statistical method, for example with a vector quantization algorithm when only audio data is available, or with the Baum-Welch algorithm when transcripts corresponding to the audio data are available. The acoustic model is thus constructed so as to represent the training data well, and is stored in the acoustic model storage means 109.
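As an illustration of the vector-quantization route mentioned above, the following sketch trains a small codebook (a minimal set of acoustic model parameters) from feature vectors with a plain k-means loop. The feature data, codebook size, and function name are illustrative assumptions, not part of this specification:

```python
import numpy as np

def train_vq_codebook(features, codebook_size=2, iterations=10):
    """Estimate a VQ codebook from acoustic feature vectors via k-means."""
    # Farthest-point initialization: start from the first vector, then
    # repeatedly add the vector farthest from the codewords chosen so far.
    codebook = [features[0]]
    for _ in range(codebook_size - 1):
        d = np.min([np.linalg.norm(features - c, axis=1) for c in codebook], axis=0)
        codebook.append(features[d.argmax()])
    codebook = np.array(codebook)
    for _ in range(iterations):
        # Assign each feature vector to its nearest codeword.
        dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-estimate each codeword as the centroid of its assigned vectors.
        for k in range(codebook_size):
            members = features[labels == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    return codebook

# Example: synthetic 2-D "feature vectors" drawn around two centers.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
codebook = train_vq_codebook(data, codebook_size=2)
print(codebook.shape)  # (2, 2)
```

A real server would of course run this over accumulated speech features rather than synthetic data, and would use the Baum-Welch algorithm instead when transcripts are available.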

The acoustic model storage means 109 stores the acoustic model constructed by the acoustic model construction means 108 and outputs the acoustic model in response to a read request. The acoustic model transmission means 110 reads the acoustic model from the acoustic model storage means 109 and transmits it via the network to the acoustic model updating means 111 of the speech recognition device 10.

In the speech recognition device 10, the acoustic model updating means 111 updates the acoustic model 102, which the matching means 101 refers to during matching, with the acoustic model received via the network from the acoustic model transmission means 110 of the acoustic model management server 20.

In the language model management server 30, the language data acquisition means 113 operates synchronously or asynchronously with the language model update command 112, continually downloads updated or distributed language data, collects new language data to be used for constructing a language model, and stores it as the updated language data 114. The language data to be acquired are, for example, regularly distributed newspapers and mail magazines, texts retrievable from the Internet, and the texts of chats, mails, manuals, and the like.

The updated language data 114 is an accumulation of language data for language model training acquired by the language data acquisition means 113, and includes text data together with keyword information on the text contents acquired at the same time.

In response to the language model update command 112, given at an appropriate timing such as a fixed time interval, the interval at which speech recognition processing is performed, or a user instruction given from an input device, the language model construction means 115 reads the text data from the updated language data 114 and estimates the parameters of the language model using a statistical method, for example by obtaining n-gram statistics from the text data segmented into words. The language model is thus constructed so as to represent the training data well, and is stored in the language model storage means 116.
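The n-gram estimation described above can be sketched as a bigram count over word-segmented text. The toy corpus and function name are illustrative assumptions, not from this specification:

```python
from collections import Counter

def build_bigram_model(sentences):
    """Estimate bigram probabilities P(w2 | w1) from word-segmented
    sentences, with <s> and </s> marking sentence boundaries."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    # Maximum-likelihood estimates; a deployed system would also smooth.
    return {pair: count / unigrams[pair[0]] for pair, count in bigrams.items()}

corpus = [
    ["voice", "recognition", "system"],
    ["voice", "recognition", "device"],
]
model = build_bigram_model(corpus)
print(model[("voice", "recognition")])   # 1.0
print(model[("recognition", "system")])  # 0.5
```

Rebuilding the model simply means rerunning this estimation over the newly accumulated updated language data 114.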

The language model storage means 116 stores the language model constructed by the language model construction means 115 and outputs the language model in response to a read request. The language model transmission means 117 reads the language model from the language model storage means 116 and transmits it via the network to the language model updating means 118 of the speech recognition device 10.

In the speech recognition device 10, the language model updating means 118 updates the language model 103, which the matching means 101 refers to during matching, with the language model received via the network from the language model transmission means 117 of the language model management server 30.

FIG. 2 is a flowchart showing the process of updating the acoustic model 102 according to Embodiment 1 of the present invention.
In step ST201, acoustic model update timing determination means (not shown) determines an appropriate update timing from, for example, a user instruction, the time elapsed since the acoustic model was last updated, or monitoring of network usage, and transmits the acoustic model update command 105 to the acoustic model construction means 108 of the acoustic model management server 20. If the acoustic model construction means 108 has received the acoustic model update command 105, the process proceeds to step ST202; otherwise, the process ends.
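The timing decision of step ST201 can be sketched as a simple elapsed-time check; the daily interval and the function name are illustrative assumptions rather than anything prescribed by the specification:

```python
import time

UPDATE_INTERVAL_SEC = 24 * 60 * 60  # e.g. consider rebuilding the model daily

def should_send_update_command(last_update_time, now=None):
    """Decide whether to issue the acoustic model update command 105,
    based on the time elapsed since the acoustic model was last updated."""
    now = time.time() if now is None else now
    return now - last_update_time >= UPDATE_INTERVAL_SEC

# A model last updated 25 hours ago triggers an update; 1 hour ago does not.
now = 1_000_000_000
print(should_send_update_command(now - 25 * 3600, now))  # True
print(should_send_update_command(now - 1 * 3600, now))   # False
```

A fuller implementation would combine this check with the other triggers the text mentions, such as explicit user instructions and network-load monitoring.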

In step ST202, the acoustic model construction means 108 reads out the updated acoustic data 107 used for training. In step ST203, the acoustic model construction means 108 refers to the updated acoustic data 107, constructs an acoustic model by estimating its parameters using a statistical method, and stores the constructed acoustic model in the acoustic model storage means 109.

In step ST204, the acoustic model transmission means 110 reads the acoustic model from the acoustic model storage means 109 and transmits it via the network to the acoustic model updating means 111 of the speech recognition device 10. In step ST205, the acoustic model updating means 111 updates, with the received acoustic model, the acoustic model 102 referred to by the matching means 101.

Note that when the acoustic model update command 105 requests an update of the acoustic model 102, the version of the acoustic model 102 currently used by the speech recognition device 10 can be transmitted at the same time; the acoustic model transmission means 110 then transmits not the entire acoustic model stored in the acoustic model storage means 109 but only the difference information relative to the acoustic model 102 used by the speech recognition device 10, which reduces the transmitted data and the load on the network.
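The difference-only transmission suggested above can be sketched as a parameter-level diff keyed by version. Treating the model as a flat parameter dictionary, and handling only added or changed parameters (not deletions), are simplifying assumptions for illustration:

```python
def model_diff(old_model, new_model):
    """Return only the parameters that changed between two model versions,
    so the server need not transmit the whole model."""
    return {k: v for k, v in new_model.items() if old_model.get(k) != v}

def apply_diff(model, diff):
    """Apply the received difference information to the local model."""
    updated = dict(model)
    updated.update(diff)
    return updated

# The client reports it holds v1; the server sends only what changed in v2.
v1 = {"state_a": 0.10, "state_b": 0.90, "state_c": 0.50}
v2 = {"state_a": 0.10, "state_b": 0.85, "state_c": 0.50}
diff = model_diff(v1, v2)
print(diff)                        # {'state_b': 0.85}
print(apply_diff(v1, diff) == v2)  # True
```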

The same operation is also possible when the acoustic model construction means 108 constructs an updated acoustic model in advance and transmits it to the speech recognition device 10 upon the request of the acoustic model update command 105.

Further, the same processing is possible even when the speech recognition device 10 has a user dictionary.

Further, although this embodiment has been described for speech recognition, the present invention is equally applicable to any pattern recognition that uses a probability model representing the relationship between patterns and symbols and a probability model representing symbol occurrence.

Further, the storage format of the updated acoustic data 107 may be signal-processed features or precomputed frequency distributions, as long as the format can be used when constructing the acoustic model.

FIG. 3 is a flowchart showing the process of updating the language model 103 according to Embodiment 1 of the present invention.
In step ST301, language model update timing determination means (not shown) determines an appropriate update timing from, for example, a user instruction, the time elapsed since the language model was last updated, or monitoring of network usage, and transmits the language model update command 112 to the language model construction means 115 of the language model management server 30. If the language model construction means 115 has received the language model update command 112, the process proceeds to step ST302; otherwise, the process ends.

In step ST302, the language model construction means 115 reads out the updated language data 114 used for training. In step ST303, the language model construction means 115 refers to the updated language data 114, constructs a language model by estimating its parameters using a statistical method, and stores the constructed language model in the language model storage means 116.

In step ST304, the language model transmission means 117 reads the language model from the language model storage means 116 and transmits it via the network to the language model updating means 118 of the speech recognition device 10. In step ST305, the language model updating means 118 updates, with the received language model, the language model 103 referred to by the matching means 101.

Note that when the language model update command 112 requests an update of the language model, the version of the language model 103 currently used by the speech recognition device 10 can be transmitted at the same time; only the difference information relative to the language model 103 used by the speech recognition device 10 up to that point is then transmitted, rather than the entire language model stored in the language model storage means 116, which reduces the transmitted data and the load on the network.

The same operation is also possible when the language model construction means 115 constructs an updated language model in advance and transmits it upon the request of the language model update command 112.

Further, the same operation is possible even when the speech recognition device 10 has a user dictionary.

Further, although this embodiment has been described for speech recognition, the present invention is equally applicable to any pattern recognition that uses a probability model representing the relationship between patterns and symbols and a probability model representing symbol occurrence.

Further, the storage format of the updated language data 114 may be any format usable when constructing the language model: the text may be segmented into words in advance, or frequencies or probabilities may be precomputed for the words, word chains, and combinations of co-occurring words used in language model construction.
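Storing the updated language data 114 as precomputed word-chain frequencies, as suggested above, lets counts from successive acquisitions be merged cheaply before a model is built. The format and function name below are illustrative assumptions:

```python
from collections import Counter

def count_word_chains(words, n=2):
    """Precompute n-gram (word-chain) frequencies for one batch of
    word-segmented text, a storage format usable at model-construction time."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

# Counts from two separately acquired batches merge by simple addition,
# so each download can be reduced to counts before being accumulated.
batch1 = count_word_chains(["a", "b", "a", "b"])
batch2 = count_word_chains(["a", "b", "c"])
merged = batch1 + batch2
print(merged[("a", "b")])  # 3
```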

Although FIG. 1 of the first embodiment shows the case where an acoustic model is constructed in response to the acoustic model update command 105, the acoustic model construction means 108 may instead update the acoustic model whenever the acoustic data acquisition means 106 acquires acoustic data and store it in the acoustic model storage means 109, with the acoustic model transmission means 110 reading it out in response to the acoustic model update command 105.
The same applies to updating the language model.

Although FIG. 1 shows the case where both the acoustic model updating means 111 and the language model updating means 118 are provided, only one of the acoustic model updating means 111 and the language model updating means 118 may be provided.

Note that the speech recognition system according to the first embodiment can be recorded on a recording medium as a speech recognition program. In this case, the speech recognition program consists of software that provides, in the acoustic model management server 20, an acoustic data acquisition function performing the same processing as the acoustic data acquisition means 106, an acoustic model construction function performing the same processing as the acoustic model construction means 108, an acoustic model storage function performing the same processing as the acoustic model storage means 109, and an acoustic model transmission function performing the same processing as the acoustic model transmission means 110; in the language model management server 30, a language data acquisition function performing the same processing as the language data acquisition means 113, a language model construction function performing the same processing as the language model construction means 115, a language model storage function performing the same processing as the language model storage means 116, and a language model transmission function performing the same processing as the language model transmission means 117; and, in the speech recognition device 10, an acoustic model updating function performing the same processing as the acoustic model updating means 111, a language model updating function performing the same processing as the language model updating means 118, and a matching function performing the same processing as the matching means 101.

In the speech recognition program recorded on the recording medium, the software of the speech recognition device 10 and the software of the acoustic model management server 20 may be recorded on separate recording media or on a single recording medium; the software may also be transmitted from the speech recognition device 10 to the acoustic model management server 20, or vice versa. The same applies to the language model.

As described above, according to the first embodiment, the acoustic model management server 20 or the language model management server 30 connected to the network acquires the updated acoustic data 107 or the updated language data 114, constructs an up-to-date acoustic or language model, and updates the acoustic model 102 or the language model 103 of the user-side speech recognition device 10 via the network. This improves the recognition accuracy of speech recognition without imposing a heavy burden on the user.

Embodiment 2. FIG. 4 is a block diagram showing the configuration of a speech recognition system according to Embodiment 2 of the present invention. In the language model management server 30 of FIG. 4, reference numeral 401 denotes an updated language model ID acquisition means that acquires an ID specifying the language model 103 of the speech recognition device 10 to be updated; 402 denotes a specific language data reading means that reads the updated language data 114 in accordance with the specific condition indicated by the ID obtained by the updated language model ID acquisition means 401; and 403 denotes a specific language model construction means that constructs a language model corresponding to the specific condition by referring to the updated language data 114 read by the specific language data reading means 402. Units and models already described are given the same reference numerals, and their description is omitted.

The features of this embodiment that differ from the prior art are the updated language model ID acquisition means 401, the specific language data reading means 402, and the specific language model construction means 403, which together provide, via the network, the language model specified by the language model ID. Here, a specific language model is a language model trained to obtain higher performance by specializing it for a particular user, group, application, or the like.

Next, the operation will be described. In the language model management server 30, the updated language model ID acquisition means 401 acquires an ID used to select a specific language model 103 from among the plurality of updatable language models 103. This ID is, for example, the user ID of the user, or an ID representing the task targeted by the audio signal 100, and it uniquely determines the specific language model 103 to be updated.

The specific language data reading means 402 receives the updated language model ID acquired by the updated language model ID acquisition means 401 and reads the updated language data 114 in units of sentences or independent texts, attaching to each unit a flag that identifies whether it is a target specified by the language model ID, judged, for example, from keywords about the text content attached to the language data or from keywords contained in the language data itself. The flagged data is sent to the specific language model construction means 403.

The specific language model construction means 403 constructs a language model trained from the updated language data 114 so that recognition accuracy for the specific target is high, and stores the language model in the language model storage means 116.

To this end, the target language model 103 is first determined from the training language data, taking as a unit a sentence, a group of sentences, keywords, or the like. Then, for example, as described in "Study of Prior Task Adaptation for Dialogue Speech Recognition", Akinori Ito, Masaki Yoshida, IEICE Technical Report (SP96-81), 1996 (hereinafter, Document 4), a language model corresponding to the specific condition can be constructed by giving greater weight to closely related text data.

For example, to construct a specific language model specialized for sports-related topics, the flags obtained from the specific language data reading means 402 are referred to during language model training: frequency counts from text data on sports-related topics are multiplied by α, other articles are counted with a weight of one, and the probability model is estimated from the combined counts. Here, α is determined so as to minimize the entropy of the language model on data that belongs to the specific texts targeted by speech recognition but was not used for training.
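
The α-weighted counting above can be sketched as a toy unigram version. This is a hedged illustration under simplifying assumptions: only unigram counts, a fixed α rather than the entropy-minimizing value the text describes, and hypothetical function and variable names.

```python
from collections import Counter

def weighted_unigram_model(flagged_texts, alpha):
    """Estimate unigram probabilities, multiplying counts from texts
    flagged as topic-specific by alpha (alpha > 1 emphasises the topic).
    In practice alpha would be tuned to minimise held-out entropy."""
    counts = Counter()
    for words, is_topic in flagged_texts:
        weight = alpha if is_topic else 1.0
        for word in words:
            counts[word] += weight
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

data = [(["goal", "score"], True),    # flagged as sports-related
        (["budget", "score"], False)] # other article, counted once
model = weighted_unigram_model(data, alpha=3.0)
assert model["goal"] > model["budget"]  # topic words receive larger weight
```

With α = 3, "goal" and the topic occurrence of "score" are counted three times each, so the resulting model leans toward the sports vocabulary while still covering the general text.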

FIG. 5 is a flowchart showing the process of updating the language model 103 according to the second embodiment of the present invention. In step ST501, a language model update timing determination means (not shown) determines an appropriate update timing based on, for example, a user instruction, the time elapsed since the last update, or monitoring of network usage, and transmits the language model update command 112 to the updated language model ID acquisition means 401 of the language model management server 30. If the updated language model ID acquisition means 401 has received the language model update command 112, the process proceeds to step ST502; otherwise, the process ends.

In step ST502, at the update timing, the updated language model ID acquisition means 401 acquires the ID of the language model 103 to be updated, for example by means that identify the user or group in use or the target task, and sends it to the specific language data reading means 402. In step ST503, the specific language data reading means 402 reads the updated language data 114 in units of sentences or independent texts according to the specific language model ID, attaching a flag that indicates whether each unit is a target specified by the language model ID.

In step ST504, the specific language model construction means 403 estimates the language model from the updated language data 114 according to the learning algorithm and stores the estimated language model in the language model storage means 116. In step ST505, the language model transmission means 117 transmits the read language model via the network to the language model updating means 118 of the speech recognition device 10. In step ST506, the language model updating means 118 updates the language model 103 referred to by the matching means 101 with the received language model.

Note that as long as a language model matching the language model ID can be output, it is not necessary to construct the specific language model in advance. For example, only the training language data may be prepared according to the language model ID, and the language model may be constructed on request.

In this embodiment, the specific language model construction method of Document 4 has been described as an example, but any method that selects from among a plurality of language models 103 can be applied in the same way.

Further, by transmitting the version of the language model 103 used by the speech recognition device 10 together with the language model update command 112 at the time of the update request, only the difference information relative to the current language model 103, rather than the entire constructed specific language model, needs to be transmitted, reducing the load on the network.

Further, the same processing can be performed even when the speech recognition device 10 has a user dictionary 601.

Further, although this description has been given for speech recognition, it applies in the same way to any pattern recognition that consists of a probability model representing the relationship between patterns and symbols and a probability model representing the appearance of symbols.

In this embodiment, a specific language model is constructed and the language model 103 is updated with it; however, a specific acoustic model may be constructed instead and used to update the acoustic model 102. In that case, in FIG. 4, an acoustic model management server replaces the language model management server 30, an acoustic data acquisition means replaces the language data acquisition means 113, updated acoustic data replaces the updated language data 114, an updated acoustic model ID acquisition means replaces the updated language model ID acquisition means 401, a specific acoustic data reading means replaces the specific language data reading means 402, a specific acoustic model construction means replaces the specific language model construction means 403, an acoustic model storage means replaces the language model storage means 116, and an acoustic model transmission means replaces the language model transmission means 117. In the speech recognition device 10, an acoustic model updating means replaces the language model updating means 118 and updates the acoustic model 102.

Furthermore, the speech recognition system according to the second embodiment can be recorded on a recording medium as a speech recognition program. In this case, the speech recognition program consists of software that provides, in the language model management server 30, a language data acquisition function performing the same processing as the language data acquisition means 113, an updated language model ID acquisition function performing the same processing as the updated language model ID acquisition means 401, a specific language data reading function performing the same processing as the specific language data reading means 402, a specific language model construction function performing the same processing as the specific language model construction means 403, a language model storage function performing the same processing as the language model storage means 116, and a language model transmission function performing the same processing as the language model transmission means 117; and, in the speech recognition device 10, a language model updating function performing the same processing as the language model updating means 118 and a matching function performing the same processing as the matching means 101. The same applies when an acoustic model is targeted.

In the speech recognition program recorded on the recording medium, the software of the speech recognition device 10 and the software of the language model management server 30 may be recorded on separate recording media or on a single recording medium; the software may also be transmitted from the speech recognition device 10 to the language model management server 30, or vice versa. The same applies when an acoustic model is targeted.

As described above, according to the second embodiment, the language model management server 30 connected to the network acquires the updated language data 114 and the ID of the language model 103 of the user's speech recognition device 10, constructs an up-to-date language model specific to that user, and updates the language model 103 of the user's speech recognition device 10 via the network, so that the recognition accuracy of speech recognition can be improved without imposing a heavy burden on the user.

Further, according to the second embodiment, by updating a language model customized for a specific purpose via the network, a user who uses a plurality of different matching means 101 can use an appropriate language model 103 with every matching means 101, and thereby obtain high recognition accuracy.

Embodiment 3. FIG. 6 is a block diagram showing the configuration of a speech recognition system according to Embodiment 3 of the present invention. In the speech recognition device 10 of FIG. 6, reference numeral 601 denotes a user dictionary in which words referred to by the matching means 101 during matching are registered. In the language model management server 30, reference numeral 602 denotes a user dictionary reading means that, upon receiving the language model update command 112, reads out via the network the user dictionary 601 referred to by the matching means 101; and 603 denotes a user-dictionary-dependent language model construction means that constructs a language model dependent on the user dictionary 601 by referring to the user dictionary 601 read by the user dictionary reading means 602. Units and models already described are given the same reference numerals, and their description is omitted.

A feature of this embodiment different from the prior art is that a user dictionary reading means 602 and a user dictionary dependent language model construction means 603 are provided.

Next, the operation will be described. Upon receiving the language model update command 112, the user dictionary reading unit 602 of the language model management server 30 reads out the user dictionary 601 referenced by the matching unit 101 of the speech recognition device 10 via the network. The user dictionary-dependent language model construction means 603 constructs a language model that has been updated to the latest state and customized for the user, using the words registered in the user dictionary 601 and the updated language data 114.

A language model dependent on the user dictionary 601 is constructed, for example, by extracting from the updated language data 114 the texts containing words present in the user dictionary 601, treating these as the specific texts, and applying the method of Document 4 referred to in the second embodiment. In this way, words registered in the user dictionary 601 that were not in the original language model and lacked adequate statistics, but that appear in the updated texts, can be given appropriate language probabilities, and the recognition accuracy can be expected to improve.

FIG. 7 is a flowchart showing the process of updating the language model 103 according to the third embodiment of the present invention. In step ST701, the language model update timing determination means (not shown) determines an appropriate update timing based on, for example, a user instruction, the time elapsed since the last update, or monitoring of network usage, and transmits the language model update command 112 to the user dictionary reading means 602 of the language model management server 30. If the user dictionary reading means 602 has received the language model update command 112, the process proceeds to step ST702; otherwise, the process ends.

In step ST702, the user dictionary reading means 602 reads the user dictionary 601 referred to by the matching means 101 via the network. In step ST703, the user-dictionary-dependent language model construction means 603 reads the updated language data 114. In step ST704, the user-dictionary-dependent language model construction means 603 constructs a language model dependent on the user dictionary 601 from the user dictionary 601 and the updated language data 114, and stores it in the language model storage means 116.

At step ST705, the language model transmitting means 117 transmits the language model dependent on the user dictionary 601 read from the language model storing means 116 to the language model updating means 118 of the speech recognition apparatus 10 via the network. In step ST706, the language model updating unit 118 updates the language model 103 referred to by the matching unit 101 with the received language model.

Although this embodiment has been described for speech recognition, it can be applied in the same way to any pattern recognition that consists of a probability model representing the relationship between patterns and symbols and a probability model representing the appearance of symbols.

Further, the speech recognition system according to Embodiment 3 can be recorded on a recording medium as a speech recognition program. In this case, the speech recognition program consists of software that provides, in the language model management server 30, a language data acquisition function performing the same processing as the language data acquisition means 113, a user dictionary reading function performing the same processing as the user dictionary reading means 602, a user-dictionary-dependent language model construction function performing the same processing as the user-dictionary-dependent language model construction means 603, a language model storage function performing the same processing as the language model storage means 116, and a language model transmission function performing the same processing as the language model transmission means 117; and, in the speech recognition device 10, a language model updating function performing the same processing as the language model updating means 118 and a matching function performing the same processing as the matching means 101.

In the speech recognition program recorded on the recording medium, the software of the speech recognition device 10 and the software of the language model management server 30 may be recorded on separate recording media or on a single recording medium; the software may also be transmitted from the speech recognition device 10 to the language model management server 30, or vice versa.

As described above, according to the third embodiment, the language model management server 30 connected to the network acquires the updated language data 114, reads the user dictionary 601 of the user's speech recognition device 10, constructs an up-to-date language model that reflects the words registered in the user dictionary 601 in greater detail, and updates the language model 103 of the user's speech recognition device 10 via the network. Thus, even when the user dictionary becomes large, the recognition accuracy of speech recognition can be improved without imposing a heavy burden on the user.

Embodiment 4. FIG. 8 is a block diagram showing the configuration of a speech recognition system according to Embodiment 4 of the present invention. In the language model management server 30 of FIG. 8, reference numeral 801 denotes a user-used text acquisition means that acquires texts used by the user in response to the language model update command 112; 802 denotes a user-used text storage means that stores the texts acquired by the user-used text acquisition means 801; and 803 denotes a user-used-text-dependent language model construction means that constructs a text-dependent language model by referring to the updated language data 114 and the texts stored in the user-used text storage means 802. Units and models already described are given the same reference numerals, and their description is omitted.

The features of this embodiment that differ from the prior art are the user-used text acquisition means 801, the user-used text storage means 802, and the user-used-text-dependent language model construction means 803, which refer to the language data 114 updated to the latest state and construct a language model matched to the texts the user uses.

Next, the operation will be described. The user-used text acquisition means 801 of the language model management server 30, in response to the language model update command 112, reads text files the user has referred to or written, for example by scanning files or directories specified by the user in advance. The user-used text storage means 802 stores the texts collected by the user-used text acquisition means 801.

The user-used-text-dependent language model construction means 803 refers to the user-used texts and the updated language data 114 and constructs a language model so as to improve recognition accuracy. The language model is constructed from the user-used texts, for example, by treating them as the specific texts and applying the method of Document 4 referred to in the second embodiment, yielding a language model dependent on the user-used texts. Because a language model constructed in this way reflects the properties of the texts the user has referred to or published, it captures linguistic properties the user is likely to utter, and more accurate recognition results can be obtained.
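
The directory-scanning collection performed by the user-used text acquisition means 801 can be sketched as follows. This is a hypothetical simplification: it gathers the contents of plain `.txt` files under a user-specified root, ignoring other sources (browser history, OCR, and so on) that the text also mentions.

```python
import os
import tempfile

def collect_user_texts(root_dir, extensions=(".txt",)):
    """Scan a user-specified directory tree and collect the contents of
    text files the user has referred to or written (a simplified sketch
    of the user-used text acquisition means 801)."""
    texts = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8") as f:
                    texts.append(f.read())
    return texts

# Demo on a throwaway directory:
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("meeting at noon")
    with open(os.path.join(d, "data.bin"), "w", encoding="utf-8") as f:
        f.write("ignored")
    texts = collect_user_texts(d)
```

The collected texts would then be handed to the user-used text storage means 802 and treated as specific texts during language model training.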

FIG. 9 is a flowchart showing the process of updating the language model 103 according to the fourth embodiment of the present invention. In step ST901, the language model update timing determination means (not shown) determines an appropriate timing based on a user instruction, the time elapsed since the last update, monitoring of network usage, or the like, and transmits the language model update command 112 to the user-used text acquisition means 801 of the language model management server 30. If the user-used text acquisition means 801 has received the language model update command 112, the process proceeds to step ST902; otherwise, the process ends.

In step ST902, the user-used text acquisition means 801 reads the user-used texts and stores them in the user-used text storage means 802. In step ST903, the user-used-text-dependent language model construction means 803 reads the user-used texts and the updated language data 114. In step ST904, the user-used-text-dependent language model construction means 803 constructs a user-used-text-dependent language model from the user-used texts and the updated language data 114, and stores it in the language model storage means 116.

In step ST905, the language model transmission means 117 transmits the user-used-text-dependent language model read from the language model storage means 116 to the language model updating means 118 of the speech recognition device 10 via the network. In step ST906, the language model updating means 118 updates the language model 103 referred to by the matching means 101 with the received language model.

In this embodiment, the user-used texts are obtained by scanning specific directories or files; however, as long as texts can be collected, the user-used text acquisition means 801 may instead use user input such as speech recognition, a keyboard, a pen, or OCR, or texts viewed by the user with a browser or the like.

The user-used text storage means 802 stores the texts collected by the user-used text acquisition means 801; the same applies when the texts are divided into words by appropriate means according to the criteria of the user-used-text-dependent language model construction means 803 and stored as frequencies of words, word chains, combinations of co-occurring words, and the like.

Furthermore, although this embodiment has been described for speech recognition, the invention can be applied in the same way to any pattern recognition that consists of a probability model representing the relationship between patterns and symbols and a probability model representing the appearance of symbols.

Further, the speech recognition system according to Embodiment 4 can be recorded on a recording medium as a speech recognition program. In this case, the speech recognition program consists of software that provides, in the language model management server 30, a language data acquisition function performing the same processing as the language data acquisition means 113, a user-used text acquisition function performing the same processing as the user-used text acquisition means 801, a user-used text storage function performing the same processing as the user-used text storage means 802, a user-used-text-dependent language model construction function performing the same processing as the user-used-text-dependent language model construction means 803, a language model storage function performing the same processing as the language model storage means 116, and a language model transmission function performing the same processing as the language model transmission means 117; and, in the speech recognition device 10, a language model updating function performing the same processing as the language model updating means 118 and a matching function performing the same processing as the matching means 101.

In the speech recognition program recorded on the recording medium, the software of the speech recognition device 10 and the software of the language model management server 30 may be recorded on separate recording media or on a single recording medium; the software may also be transmitted from the speech recognition device 10 to the language model management server 30, or vice versa.

As described above, according to the fourth embodiment, the language model management server 30 connected to the network acquires the updated language data 114 and the texts used by the user, constructs an up-to-date language model dependent on the user-used texts, and updates the language model 103 of the user's speech recognition device 10 via the network, so that the recognition accuracy of speech recognition can be improved without imposing a heavy burden on the user.

Embodiment 5. FIG. 10 is a block diagram showing the configuration of a speech recognition system according to Embodiment 5 of the present invention. In the speech recognition device 10 of FIG. 10, reference numeral 1001 denotes an acoustic model ID acquisition means that acquires an ID identifying the acoustic model referred to by the matching means 101 during matching; and 1002 denotes an adaptation voice acquisition means that reads the ID acquired by the acoustic model ID acquisition means 1001, obtains adaptation voice data from the input audio signal, and transmits the read ID and the obtained adaptation voice data to the acoustic model management server 20 via the network.

In the acoustic model management server 20 of FIG. 10, reference numeral 1003 denotes an initial acoustic model before adaptation; 1004 denotes an acoustic model adaptation means that adapts the initial acoustic model 1003 using the adaptation voice data transmitted from the adaptation voice acquisition means 1002 of the speech recognition device 10 and stores the adapted acoustic model, in association with the transmitted ID, in an adapted acoustic model storage means 1005; and 1006 denotes an adapted acoustic model selection means that, upon receiving the acoustic model update command 105, receives via the network the ID acquired by the acoustic model ID acquisition means 1001 of the speech recognition device 10 and selects and reads out the adapted acoustic model corresponding to the received ID from the adapted acoustic model storage means 1005. Units and models already described are given the same reference numerals, and their description is omitted.

The features of this embodiment that differ from the prior art are that the speech recognition device 10 includes the acoustic model ID acquisition means 1001 and the adaptation voice acquisition means 1002, and that the acoustic model management server 20 includes the initial acoustic model 1003, the acoustic model adaptation means 1004, the adapted acoustic model storage means 1005, and the adapted acoustic model selection means 1006.

In this embodiment, the speech recognition device 10 acquires the ID of the acoustic model to be adapted together with adaptation voice data, and the acoustic model management server 20 constructs an acoustic model adapted according to that acoustic model ID and transmits it to the speech recognition device 10, where it is used to update the acoustic model 102. The user can therefore refer to the adapted acoustic model when using any matching means 101, and higher recognition accuracy can be obtained.

Next, the operation will be described. In the speech recognition device 10, the acoustic model ID acquisition means 1001 determines the acoustic model to be adapted; the ID is, for example, the user ID of the user of the speech recognition device 10. The adaptation voice acquisition means 1002 acquires adaptation voice data from the audio signal 100 before recognition use, reads the acoustic model ID from the acoustic model ID acquisition means 1001, and transmits the acoustic model ID and the acquired adaptation voice data to the acoustic model adaptation means 1004 of the acoustic model management server 20 connected via the network.

In the acoustic model management server 20, the initial acoustic model 1003 is the acoustic model before adaptation. The acoustic model adaptation means 1004 constructs an adapted acoustic model using the adaptation voice data received via the network and the initial acoustic model 1003, and stores the adapted acoustic model together with the acoustic model ID received via the network in the adapted acoustic model storage means 1005. For the adaptation of the acoustic model, for example, a maximum a posteriori probability estimation method is used.

The adapted acoustic model storage means 1005 stores each acoustic model ID together with the acoustic model adapted by the acoustic model adapting means 1004, and outputs the acoustic model having a designated acoustic model ID in response to a request from the adapted acoustic model selecting means. On receiving the acoustic model update command 105, the adapted acoustic model selecting means 1006 acquires an acoustic model ID from the acoustic model ID acquiring means 1001 and selects and reads the corresponding adapted acoustic model from the adapted acoustic model storage means 1005.
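The division of labor between the storage means 1005 and the selecting means 1006 can be sketched with a simple in-memory mapping; the class and method names below are illustrative stand-ins, not names from the patent:

```python
class AdaptedModelStore:
    """Stores adapted acoustic models keyed by acoustic model ID
    (e.g. a user ID), in the role of storage means 1005."""

    def __init__(self):
        self._models = {}

    def put(self, model_id, model):
        # Called once adaptation (means 1004) has produced a model
        self._models[model_id] = model

    def select(self, model_id):
        # Called in the role of selecting means 1006 on an update command;
        # returns None when no adapted model exists for this ID
        return self._models.get(model_id)
```

A real server would persist the models and handle concurrent requests, but the key point is simply that models are addressed by the same ID the device sent at adaptation time.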

FIG. 11 is a flowchart showing a process of updating the acoustic model 102 according to the fifth embodiment of the present invention. The processing is divided into an acoustic model adaptation stage in steps ST1101 to ST1107 and an acoustic model update stage in steps ST1108 to ST1112. In the adaptation stage, an acoustic model dependent on the acoustic model ID is constructed using the input speech data for adaptation.

In step ST1101, the acoustic model ID acquiring means 1001 of the speech recognition device 10 acquires identification information, such as a user name, identifying the acoustic model to be adapted. In step ST1102, the adaptation speech acquisition unit 1002 reads the acoustic model ID and acquires the adaptation speech data input from the speech signal 100. In step ST1103, the adaptation speech acquisition unit 1002 transmits an acoustic model adaptation request to the acoustic model management server 20 via the network and, at the same time, transmits the acoustic model ID and the adaptation speech data to the acoustic model adaptation unit 1004.

In step ST1104, the acoustic model adapting means 1004 receives the acoustic model adaptation request via the network and reads out the acoustic model ID and the adaptation speech data. In step ST1105, the acoustic model adapting means 1004 reads the initial acoustic model 1003. In step ST1106, the acoustic model adapting means 1004 adapts the initial acoustic model 1003 using the adaptation speech data received via the network. In step ST1107, the acoustic model adapting means 1004 stores the adapted acoustic model in the adapted acoustic model storage means 1005 so that it can be distinguished by its acoustic model ID.

In step ST1108, the acoustic model update timing determining means (not shown) determines an appropriate update timing based on a user instruction, the time elapsed since the last update, network usage, and the like, and transmits the acoustic model update command 105 to the adapted acoustic model selecting means 1006. The adapted acoustic model selecting means 1006 proceeds to step ST1109 if the acoustic model update command 105 has been received, and ends the process otherwise. In step ST1109, the adapted acoustic model selecting means 1006 reads the acoustic model ID to be adapted from the acoustic model ID acquiring means 1001 of the speech recognition device 10 via the network.
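Step ST1108's timing decision can be illustrated with a minimal sketch; the 24-hour threshold and the `network_busy` flag are assumptions, since the patent only names the factors (user instruction, elapsed time since the last update, network usage):

```python
import time

def should_issue_update_command(last_update_ts, user_requested=False,
                                min_interval_s=24 * 3600, network_busy=False):
    """Decide whether to emit the acoustic model update command 105."""
    if network_busy:
        return False          # avoid loading a congested network
    if user_requested:
        return True           # an explicit user instruction always wins
    # Otherwise update only when enough time has passed since the last update
    return time.time() - last_update_ts >= min_interval_s
```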

In step ST1110, the adapted acoustic model selecting means 1006 selects and reads the acoustic model specified by the acoustic model ID from the adapted acoustic model storage means 1005. In step ST1111, the acoustic model transmitting unit 110 transmits the read adapted acoustic model to the acoustic model updating unit 111 of the speech recognition device 10 via the network. In step ST1112, the acoustic model updating unit 111 updates the acoustic model 102 referred to by the matching unit 101 with the received adapted acoustic model.
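The two stages of FIG. 11 can be sketched end to end. The class and method names below are illustrative stand-ins for the numbered means, and the "adaptation" is a placeholder rather than real MAP training:

```python
class SpeechRecognizerClient:
    """Stands in for device 10: ID acquisition (1001), adaptation speech
    acquisition (1002) and acoustic model updating (111)."""

    def __init__(self, model_id, adaptation_speech):
        self.model_id = model_id
        self.adaptation_speech = adaptation_speech
        self.acoustic_model = "initial"     # model 102 used by matching

    def install(self, model):
        self.acoustic_model = model         # ST1112


class AcousticModelServer:
    """Stands in for server 20: adaptation (1004), storage (1005),
    selection (1006) and transmission (110)."""

    def __init__(self):
        self.store = {}

    def adapt(self, model_id, speech):      # ST1104-ST1107
        # Placeholder adaptation: record how much data shaped the model
        self.store[model_id] = f"adapted-for-{model_id}({len(speech)} frames)"

    def select(self, model_id):             # ST1109-ST1110
        return self.store[model_id]


def run_update(client, server):
    server.adapt(client.model_id, client.adaptation_speech)   # adaptation stage
    client.install(server.select(client.model_id))            # update stage
```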

In the speech recognition apparatus 10, as long as the acoustic model ID acquired by the acoustic model ID acquiring means 1001 identifies the acoustic model to be adapted, it may instead designate, for example, a line transfer characteristic, a background noise characteristic, or a reverberation characteristic.

In the speech recognition apparatus 10, the adaptation speech acquisition means 1002 acquires the user's adaptation speech data in advance, before the collation means 101 is used. Alternatively, by acquiring the speech data input during collation and feeding it to the adaptation speech acquisition means 1002, the acoustic model referred to at the next collation can also be adapted.

Further, in the speech recognition apparatus 10, the order of acquiring the adaptation speech data and acquiring the acoustic model ID may be reversed.

Further, in the speech recognition apparatus 10, the adaptation speech data held by the adaptation speech acquisition means 1002 is stored as a speech waveform. However, any representation usable for training acoustic models may be stored instead: a time series of acoustic feature vectors obtained by signal processing of the speech data, a code sequence obtained by referring the acoustic feature vectors to a vector quantization codebook, a frequency distribution obtained by statistically processing these, a probability distribution derived from the frequency distribution, and so on.
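As one example of the alternative storage forms listed above, a waveform can be reduced to a time series of acoustic feature vectors. The frame length, hop size, and bin count below are arbitrary illustrative choices, not values from the patent:

```python
import numpy as np

def frame_features(waveform, frame_len=400, hop=160, n_bins=16):
    """Turn a speech waveform into log magnitude-spectrum feature vectors,
    one per analysis frame (far more compact than the raw waveform)."""
    feats = []
    for start in range(0, len(waveform) - frame_len + 1, hop):
        frame = waveform[start:start + frame_len] * np.hamming(frame_len)
        spectrum = np.abs(np.fft.rfft(frame))[:n_bins]
        feats.append(np.log(spectrum + 1e-8))   # small floor avoids log(0)
    return np.array(feats)
```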

Further, in the acoustic model management server 20, means for storing the adaptation speech data received by the acoustic model adaptation means 1004 may be added. By retraining the initial acoustic model 1003 with the large amount of stored adaptation speech data, the training accuracy increases and an adapted acoustic model achieving more accurate recognition can be constructed.

Furthermore, although this embodiment targets speech recognition, it is equally applicable to any pattern recognition comprising a probability model representing the relationship between patterns and symbols and a probability model representing the appearance of symbols.

Furthermore, the speech recognition system according to the fifth embodiment can be recorded on a recording medium as a speech recognition program. In this case, for the speech recognition device 10, the software comprises an acoustic model ID acquisition function performing the same processing as the acoustic model ID acquisition unit 1001, an adaptation speech acquisition function performing the same processing as the adaptation speech acquisition unit 1002, an acoustic model updating function performing the same processing as the acoustic model updating unit 111, and a matching function performing the same processing as the matching unit 101. For the acoustic model management server 20, the software comprises an acoustic model adapting function performing the same processing as the acoustic model adapting means 1004, an adapted acoustic model storing function performing the same processing as the adapted acoustic model storing means 1005, an adapted acoustic model selecting function performing the same processing as the adapted acoustic model selecting means 1006, and an acoustic model transmitting function performing the same processing as the acoustic model transmitting means 110. Together these form the speech recognition program.

As for the speech recognition program recorded on the recording medium, the software of the speech recognition device 10 and the software of the acoustic model management server 20 may be recorded on separate recording media or on a single recording medium, and either the speech recognition device 10 or the acoustic model management server 20 may transmit the corresponding software to the other via the network.

As described above, according to the fifth embodiment, the acoustic model management server 20 connected to the network constructs an acoustic model adapted with adaptation speech data taken from the user's speech signal 100 and updates the acoustic model 102 of the user's speech recognition device 10 via the network, so that the recognition accuracy of speech recognition is improved without imposing a large burden on the user.

Also according to the fifth embodiment, by updating the acoustic model 102 via the network with the adapted acoustic model tailored to the user, an appropriate acoustic model 102 is used even when the user uses a plurality of different matching units 101, so that high recognition accuracy is obtained with every matching unit 101.


As described above, according to the present invention, the acoustic model management server acquires updated acoustic data and transmits the constructed acoustic model to the speech recognition device via the network, and the speech recognition device updates the acoustic model referred to during speech recognition with the acoustic model transmitted by the acoustic model management server; this has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.

According to the present invention, the acoustic model management server acquires an ID identifying the acoustic model to be referred to when the speech recognition device performs speech recognition, reads updated acoustic data corresponding to the specific condition indicated by the acquired ID, constructs an acoustic model depending on that specific condition, and transmits it to the speech recognition device. This improves the recognition accuracy of speech recognition without imposing a large burden on the user and, because an acoustic model customized for a specific application is transmitted via the network, has the effect that an appropriate acoustic model is used and high recognition accuracy is obtained even when the user uses a plurality of different speech recognition devices.

According to the present invention, the language model management server acquires updated language data and transmits the constructed language model to the speech recognition device via the network, and the speech recognition device updates the language model referred to during speech recognition with the language model transmitted by the language model management server; this has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.

According to the present invention, the language model management server acquires an ID identifying the language model to be referred to when the speech recognition device performs speech recognition, reads updated language data corresponding to the specific condition indicated by the acquired ID, constructs a language model depending on that specific condition, and transmits it to the speech recognition device. This improves the recognition accuracy of speech recognition without imposing a large burden on the user and, because a language model customized for a specific purpose is transmitted via the network, has the effect that an appropriate language model is used and high recognition accuracy is obtained even when the user uses a plurality of different speech recognition devices.

According to the present invention, the speech recognition device refers to a user dictionary in which words are registered at the time of speech recognition, and the language model management server reads the user dictionary via the network, constructs a language model dependent on the user dictionary by referring to the updated language data and the read user dictionary, and transmits it to the speech recognition device. This has the effect that, even if the user dictionary grows large, the recognition accuracy of speech recognition can be improved without imposing a heavy burden on the user.

According to the present invention, the language model management server acquires text used by the user of the speech recognition device, refers to the updated language data and the acquired text, constructs a language model dependent on the text, and transmits it to the speech recognition device; this has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.

According to the present invention, the speech recognition device acquires an ID identifying an acoustic model together with adaptation speech data from the input speech signal and transmits the acquired ID and adaptation speech data to the acoustic model management server via the network; the acoustic model management server adapts the initial acoustic model using the transmitted adaptation speech data and stores the adapted acoustic model in association with the transmitted ID; and, in response to an external acoustic model update command, the server receives the ID from the speech recognition device, reads the adapted acoustic model corresponding to the received ID from the stored adapted acoustic models, and transmits it to the speech recognition device via the network, where it replaces the acoustic model referred to during speech recognition. This improves the recognition accuracy of speech recognition without imposing a large burden on the user and, because the acoustic model is updated via the network with an adapted acoustic model tailored to the user, has the effect that an appropriate acoustic model is used and high recognition accuracy is obtained even when the user uses a plurality of different speech recognition devices.

According to the present invention, the speech recognition device is provided with acoustic model updating means that receives, from the acoustic model management server connected via the network, an acoustic model constructed from updated acoustic data, and updates the acoustic model referred to by the matching means during speech recognition with the received acoustic model; this has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.

According to the present invention, the acoustic model updating means receives, from the acoustic model management server connected via the network, an acoustic model constructed from updated acoustic data and dependent on a specific condition of the acoustic model referred to by the matching means during speech recognition, and updates that acoustic model with the received one. This improves the recognition accuracy of speech recognition without imposing a large burden on the user and, by receiving a customized acoustic model via the network, has the effect that an appropriate acoustic model is used and high recognition accuracy is obtained even when the user uses a plurality of different matching means.

According to the present invention, the speech recognition device is provided with language model updating means that receives, from the language model management server connected via the network, a language model constructed from updated language data, and updates the language model referred to by the matching means during speech recognition with the received language model; this has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.

According to the present invention, the language model updating means receives, from the language model management server connected via the network, a language model constructed from updated language data and dependent on a specific condition of the language model referred to by the matching means during speech recognition, and updates that language model with the received one. This improves the recognition accuracy of speech recognition without imposing a large burden on the user and, by receiving a language model customized for a specific application via the network, has the effect that an appropriate language model is used and high recognition accuracy is obtained even when the user uses a plurality of different matching means.

According to the present invention, the matching means has a user dictionary in which words to be referred to during speech recognition are registered, and the language model updating means receives, from the language model management server connected via the network, a language model constructed from updated language data and dependent on the user dictionary, and updates the language model referred to by the matching means during speech recognition with the received language model; this has the effect that, even when the user dictionary grows large, the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.

According to the present invention, the language model updating means receives, from the language model management server connected via the network, a language model constructed from updated language data and dependent on text used by the user performing speech recognition, and updates the language model referred to by the matching means during speech recognition with the received language model; this has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.

According to the present invention, there are provided: an acoustic model for obtaining the probability of an acoustic observation sequence of speech; matching means for inputting a speech signal, performing speech recognition with reference to the acoustic model, and outputting a recognition result; acoustic model ID acquiring means for acquiring an ID identifying an acoustic model; adaptation speech acquisition means for reading the acquired ID, acquiring adaptation speech data from the input speech signal, and transmitting the ID and the acquired adaptation speech data to an acoustic model management server connected via a network; and acoustic model updating means for receiving from the acoustic model management server an adapted acoustic model adapted with the adaptation speech data corresponding to the ID, and updating the acoustic model referred to by the matching means during speech recognition with the received adapted acoustic model. Thus the recognition accuracy of speech recognition can be improved without imposing a great burden on the user and, because the acoustic model is updated via the network with an adapted acoustic model tailored to the user, an appropriate acoustic model is used and high recognition accuracy is obtained even when the user uses a plurality of different matching means.

According to the present invention, there are provided acoustic data acquiring means for acquiring updated acoustic data, acoustic model construction means for reading the updated acoustic data in response to an external acoustic model update command and constructing an acoustic model for determining the probability of an acoustic observation sequence of speech, and acoustic model transmission means for transmitting the acoustic model constructed by the acoustic model construction means to a speech recognition device via a network; this has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.

According to the present invention, there are provided acoustic data acquiring means for acquiring updated acoustic data, updated acoustic model ID acquisition means for acquiring, in response to an external acoustic model update command, an ID identifying the acoustic model to be referred to during speech recognition, specific acoustic data reading means for reading the updated acoustic data corresponding to the specific condition indicated by the acquired ID, specific acoustic model construction means for referring to the read updated acoustic data and constructing an acoustic model dependent on the specific condition, and acoustic model transmitting means for transmitting the constructed acoustic model to the speech recognition device via the network. This improves the recognition accuracy of speech recognition without imposing a great burden on the user and, by transmitting an acoustic model customized for a specific application via the network, has the effect that an appropriate acoustic model is used and high recognition accuracy is obtained even when the user uses a plurality of different speech recognition devices.

According to the present invention, there are provided: an initial acoustic model, before adaptation, for determining the probability of an acoustic observation sequence of speech; acoustic model adapting means for receiving, from a speech recognition device connected via a network, adaptation speech data and an ID identifying the acoustic model that the speech recognition device refers to for speech recognition, adapting the initial acoustic model using the adaptation speech data, and storing the adapted acoustic model in adapted acoustic model storing means in association with the received ID; adapted acoustic model selecting means for receiving, in response to an external acoustic model update command, an ID from the speech recognition device via the network and selecting and reading the adapted acoustic model corresponding to the received ID from the adapted acoustic model storing means; and acoustic model transmitting means for transmitting the read adapted acoustic model to the speech recognition device via the network. This improves the recognition accuracy of speech recognition without imposing a large burden on the user and, because the acoustic model is updated via the network with an adapted acoustic model tailored to the user, has the effect that an appropriate acoustic model is used and high recognition accuracy is obtained even when the user uses a plurality of different speech recognition devices.

According to the present invention, there are provided language data acquiring means for acquiring updated language data, language model construction means for reading the updated language data in response to an external language model update command and constructing a language model for obtaining word string appearance probabilities, and language model transmission means for transmitting the language model constructed by the language model construction means to a speech recognition device via a network; this has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.

According to the present invention, there are provided language data acquiring means for acquiring updated language data, updated language model ID acquisition means for acquiring, in response to an external language model update command, an ID identifying the language model to be referred to during speech recognition, specific language data reading means for reading the updated language data corresponding to the specific condition indicated by the ID, specific language model construction means for referring to the read updated language data and constructing a language model dependent on the specific condition, and language model transmission means for transmitting the constructed language model to the speech recognition device via the network. This improves the recognition accuracy of speech recognition without imposing a heavy burden on the user and, by transmitting a language model customized for a specific application via the network, has the effect that an appropriate language model is used and high recognition accuracy is obtained even when the user uses a plurality of different speech recognition devices.

According to the present invention, there are provided language data acquiring means for acquiring updated language data, user dictionary reading means for reading, in response to an external language model update command, the user dictionary referred to for speech recognition by a speech recognition device connected via a network, user dictionary dependent language model building means for reading the updated language data and building a language model dependent on the user dictionary, and language model transmitting means for transmitting the built language model to the speech recognition device via the network; this has the effect that, even when the user dictionary grows large, the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.

According to the present invention, there are provided language data acquiring means for acquiring updated language data, user text acquiring means for acquiring, in response to an external language model update command, text used by the user, text-dependent language model construction means for reading the acquired language data and constructing a language model dependent on the acquired text, and language model transmission means for transmitting the constructed language model to a speech recognition device via a network; this has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.

According to the present invention, there are provided a first step of acquiring updated acoustic data, a second step of reading the updated acoustic data in response to an acoustic model update command and constructing an acoustic model, a third step of transmitting the constructed acoustic model via a network, and a fourth step of receiving the acoustic model and updating the acoustic model referred to during speech recognition with the received acoustic model; this has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.

According to the present invention, there are provided a first step of acquiring updated language data, a second step of reading the updated language data in response to a language model update command and constructing a language model, a third step of transmitting the constructed language model via a network, and a fourth step of receiving the language model via the network and updating the language model referred to during speech recognition with the received language model; this has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.

According to the present invention, there are provided a first step of acquiring updated acoustic data, a second step of acquiring, in response to an acoustic model update command, an ID identifying the acoustic model to be referred to during speech recognition, a third step of reading the updated acoustic data corresponding to the specific condition indicated by the ID, a fourth step of constructing an acoustic model dependent on the specific condition with reference to the updated acoustic data, a fifth step of transmitting the constructed acoustic model via a network, and a sixth step of receiving the acoustic model and updating the acoustic model referred to during speech recognition with the received acoustic model. This improves the recognition accuracy of speech recognition without imposing a large burden on the user and, by receiving a customized acoustic model via the network, has the effect that an appropriate acoustic model is used and high recognition accuracy is obtained even when the user uses a plurality of different speech recognition methods.

According to the present invention, there are provided a first step of acquiring updated language data, a second step of acquiring, in response to a language model update command, an ID identifying the language model to be referred to during speech recognition, a third step of reading the updated language data corresponding to the specific condition indicated by the acquired ID, a fourth step of constructing a language model dependent on the specific condition with reference to the updated language data, a fifth step of transmitting the constructed language model via a network, and a sixth step of receiving the language model and updating the language model referred to during speech recognition with the received language model. This improves the recognition accuracy of speech recognition without imposing a great burden on the user and, by transmitting a language model customized for a specific use via the network, has the effect that an appropriate language model is used and high recognition accuracy is obtained even when the user uses a plurality of different speech recognition methods.

According to the present invention, there are provided a first step of acquiring updated language data, a second step of receiving a language model update command and reading the user dictionary to be referred to during speech recognition, a third step of reading the updated language data and constructing a language model dependent on the user dictionary, a fourth step of transmitting the constructed language model via a network, and a fifth step of receiving the language model and updating the language model referred to during speech recognition with the received language model; this has the effect that, even when the user dictionary grows large, the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.

According to the present invention, there are provided a first step of acquiring updated language data, a second step of acquiring, in response to a language model update command, text used by the user, a third step of reading the updated language data and constructing a language model dependent on the text, a fourth step of transmitting the constructed language model via a network, and a fifth step of receiving the language model and updating the language model referred to during speech recognition with the received language model; this has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.

According to the present invention, there are provided: a first step of acquiring an ID identifying an acoustic model; a second step of reading the acquired ID, acquiring adaptation speech data from the input speech signal, and transmitting the read ID and the acquired adaptation speech data via a network; a third step of adapting the initial, pre-adaptation acoustic model using the transmitted adaptation speech data and storing the adapted acoustic model; a fourth step of receiving, via the network in response to an acoustic model update command, the ID acquired in the first step and selecting and reading the adapted acoustic model corresponding to the received ID from the adapted acoustic models stored in the third step; a fifth step of transmitting the adapted acoustic model read in the fourth step via the network; and a sixth step of receiving the transmitted adapted acoustic model and updating the acoustic model referred to during speech recognition with it. This improves the recognition accuracy of speech recognition without imposing a large burden on the user and, because the acoustic model is updated via the network with an adapted acoustic model tailored to the user, has the effect that an appropriate acoustic model is used and high recognition accuracy is obtained even when the user uses a plurality of different speech recognition methods.

According to the present invention, a recording medium on which a speech recognition program is recorded realizes an acoustic data acquisition function for acquiring updated acoustic data, an acoustic model construction function for reading the updated acoustic data in response to an acoustic model update command and constructing an acoustic model, an acoustic model transmission function for transmitting the constructed acoustic model via a network, and an acoustic model update function for receiving the acoustic model and updating the acoustic model referred to by the matching function during speech recognition with the received acoustic model; this has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.

According to the present invention, a recording medium records a speech recognition program that realizes: a language data acquisition function of acquiring updated language data; a language model construction function of, in response to a language model update command, reading the updated language data acquired by the language data acquisition function and constructing a language model; a language model transmission function of transmitting the language model constructed by the language model construction function via a network; and a language model update function of receiving the language model transmitted by the language model transmission function and updating the language model referred to by the matching function during speech recognition with the received language model. This has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.
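The corresponding language-model cycle can be sketched the same way. This toy example, with hypothetical names, rebuilds a word-bigram model from newly collected text on the server side and swaps it into the recognizer; it is a sketch of the data flow, not the patent's actual estimation method.

```python
# Toy illustration of rebuilding a bigram language model from updated text.
from collections import Counter

def build_language_model(updated_texts):
    """Server side: estimate bigram probabilities P(w2 | w1) from new text."""
    bigrams, unigrams = Counter(), Counter()
    for sentence in updated_texts:
        words = sentence.split()
        unigrams.update(words[:-1])            # count each word as a history
        bigrams.update(zip(words, words[1:]))  # count word pairs
    return {pair: count / unigrams[pair[0]] for pair, count in bigrams.items()}

class SpeechRecognizer:
    def __init__(self):
        self.language_model = {}

    def update_language_model(self, received_model):
        # The matching step consults this model for word-string probabilities.
        self.language_model = received_model

texts = ["turn the light on", "turn the radio on"]
lm = build_language_model(texts)
rec = SpeechRecognizer()
rec.update_language_model(lm)
print(rec.language_model[("turn", "the")])  # -> 1.0
```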

According to the present invention, a recording medium records a speech recognition program that realizes: an acoustic data acquisition function of acquiring updated acoustic data; an updated acoustic model ID acquisition function of acquiring, in response to an acoustic model update command, an ID specifying the acoustic model; a specific acoustic data reading function of reading the updated acoustic data corresponding to the specific condition indicated by the acquired ID; a specific acoustic model construction function of constructing an acoustic model dependent on the specific condition with reference to the read updated acoustic data; an acoustic model transmission function of transmitting the constructed acoustic model via a network; and an acoustic model update function of receiving the acoustic model and updating the acoustic model referred to by the matching function during speech recognition with the received acoustic model. This has the effect that the recognition accuracy of speech recognition can be improved without imposing a great burden on the user, and because an acoustic model customized to the specific condition is received via the network, an appropriate acoustic model is used and high recognition accuracy is obtained even when the user uses a plurality of matching functions.
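A hypothetical sketch of this ID-driven variant: the ID sent by the recognizer names a specific condition (for example a noise environment or speaker group), and the server builds a model only from the matching slice of updated acoustic data. The mapping table and data below are invented purely for illustration.

```python
# ID -> condition mapping and condition-partitioned data (all invented).
ID_TO_CONDITION = {"0001": "car-noise", "0002": "office"}

UPDATED_ACOUSTIC_DATA = {
    "car-noise": {"a": [[1.0], [3.0]]},
    "office":    {"a": [[5.0], [7.0]]},
}

def build_specific_acoustic_model(model_id):
    """Build an acoustic model dependent on the condition the ID indicates."""
    condition = ID_TO_CONDITION[model_id]      # specific condition from the ID
    data = UPDATED_ACOUSTIC_DATA[condition]    # read only the matching data
    return {phone: [sum(v[0] for v in vecs) / len(vecs)]
            for phone, vecs in data.items()}

# The recognizer reports ID "0001", so it receives a car-noise-dependent model.
model = build_specific_acoustic_model("0001")
print(model["a"])  # -> [2.0]
```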

According to the present invention, a recording medium records a speech recognition program that realizes: a language data acquisition function of acquiring updated language data; an updated language model ID acquisition function of acquiring, in response to a language model update command, an ID specifying the language model; a specific language data reading function of reading the updated language data corresponding to the specific condition indicated by the acquired ID; a specific language model construction function of constructing a language model dependent on the specific condition with reference to the read updated language data; a language model transmission function of transmitting the constructed language model via a network; and a language model update function of receiving the language model and updating the language model referred to by the matching function during speech recognition with the received language model. This has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user, and because a language model customized for the specific use is received via the network, an appropriate language model is used and high recognition accuracy is obtained even when the user uses a plurality of matching functions.

According to the present invention, a recording medium records a speech recognition program that realizes: a language data acquisition function of acquiring updated language data; a user dictionary reading function of reading, in response to a language model update command, the user dictionary; a user dictionary dependent language model construction function of reading the updated language data and constructing a language model dependent on the user dictionary; a language model transmission function of transmitting the constructed language model via a network; and a language model update function of receiving the language model and updating the language model referred to by the matching function during speech recognition with the received language model. This has the effect that even when the user dictionary grows large, the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.
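The user-dictionary-dependent construction can be illustrated as follows. The add-one treatment of registered words is an assumption made for the sketch, not the patent's method; the point shown is that words the user registered become scoreable by the rebuilt model.

```python
# Sketch: fold user-registered words into the vocabulary, then normalize.
def build_user_dictionary_lm(updated_word_counts, user_dictionary):
    """Give every user-registered word at least one count, then normalize."""
    counts = dict(updated_word_counts)
    for word in user_dictionary:               # e.g. personal names, jargon
        counts[word] = counts.get(word, 0) + 1
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

server_counts = {"hello": 3, "world": 1}       # from the updated language data
user_dict = ["mitsubishi"]                     # words the user registered
lm = build_user_dictionary_lm(server_counts, user_dict)
assert "mitsubishi" in lm                      # newly usable in recognition
print(round(lm["hello"], 2))  # -> 0.6
```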

According to the present invention, a recording medium records a speech recognition program that realizes: a language data acquisition function of acquiring updated language data; a user use text acquisition function of acquiring, in response to a language model update command, text used by the user; a user use text dependent language model construction function of reading the updated language data and constructing a language model dependent on the text; a language model transmission function of transmitting the constructed language model via a network; and a language model update function of receiving the language model and updating the language model referred to by the matching function during speech recognition with the received language model. This has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user.
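One plausible way to make the language model depend on the user's own text, sketched here with hypothetical names, is linear interpolation between a model estimated from the general updated data and one estimated from text the user actually produced. The interpolation weight is an assumption of the sketch.

```python
# Illustrative sketch: blend a general model with a user-text model.
from collections import Counter

def unigram_probs(texts):
    """Maximum-likelihood unigram probabilities from a list of sentences."""
    counts = Counter(w for t in texts for w in t.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def build_user_text_lm(general_texts, user_texts, weight=0.5):
    """Linear interpolation: P = (1-w)*P_general + w*P_user."""
    general, user = unigram_probs(general_texts), unigram_probs(user_texts)
    vocab = set(general) | set(user)
    return {w: (1 - weight) * general.get(w, 0.0) + weight * user.get(w, 0.0)
            for w in vocab}

lm = build_user_text_lm(["open the file"], ["compile the code"], weight=0.5)
print(round(lm["the"], 2))  # -> 0.33
```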

According to the present invention, a recording medium records a speech recognition program that realizes: an acoustic model ID acquisition function of acquiring an ID specifying the acoustic model; an adaptation speech acquisition function of reading the acquired ID, acquiring adaptation audio data from the input audio signal, and transmitting the read ID and the acquired adaptation audio data via a network; an acoustic model adaptation function of adapting the initial acoustic model before adaptation using the transmitted adaptation audio data and storing the adapted acoustic model in association with the transmitted ID; an adapted acoustic model selection function of receiving, via the network in response to an acoustic model update command, the ID acquired by the acoustic model ID acquisition function, and selecting and reading out the adapted acoustic model corresponding to the received ID from the adapted acoustic models stored by the acoustic model adaptation function; an acoustic model transmission function of transmitting the read adapted acoustic model via the network; and an acoustic model update function of receiving the adapted acoustic model and updating the acoustic model referred to by the matching function during speech recognition with the received adapted acoustic model. This has the effect that the recognition accuracy of speech recognition can be improved without imposing a large burden on the user, and because the acoustic model is updated via the network with a model adapted to the user, an appropriate acoustic model is used and high recognition accuracy is obtained even when the user uses a plurality of different matching functions.
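The per-user adaptation flow can be sketched as below. Adaptation is caricatured as shifting each model mean by the average of the user's adaptation features (real systems would use e.g. MLLR or MAP adaptation); what the sketch illustrates is the server-side storage keyed by ID, which lets any device presenting the same ID retrieve the same adapted model.

```python
# Sketch of per-user adaptation with ID-keyed storage (names hypothetical).
INITIAL_ACOUSTIC_MODEL = {"a": 0.0, "i": 0.0}  # initial model before adaptation
adapted_store = {}  # server-side storage: ID -> adapted acoustic model

def adapt(user_id, adaptation_features):
    """Adapt the initial model with the user's speech and store it by ID."""
    shift = sum(adaptation_features) / len(adaptation_features)
    adapted_store[user_id] = {ph: mean + shift
                              for ph, mean in INITIAL_ACOUSTIC_MODEL.items()}

def select_adapted_model(user_id):
    """On an update command, return the stored model matching the sent ID."""
    return adapted_store[user_id]

adapt("user-42", [1.0, 3.0])             # enrollment: ID + adaptation audio
model = select_adapted_model("user-42")  # later, from any device the user uses
print(model["a"])  # -> 2.0
```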

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a speech recognition device according to a first embodiment of the present invention.

FIG. 2 is a flowchart showing the procedure of an acoustic model update process according to the first embodiment of the present invention.

FIG. 3 is a flowchart showing the procedure of a language model update process according to the first embodiment of the present invention.

FIG. 4 is a block diagram showing the configuration of a speech recognition device according to a second embodiment of the present invention.

FIG. 5 is a flowchart showing the procedure of a language model update process according to the second embodiment of the present invention.

FIG. 6 is a block diagram showing the configuration of a speech recognition device according to a third embodiment of the present invention.

FIG. 7 is a flowchart showing the procedure of a language model update process according to the third embodiment of the present invention.

FIG. 8 is a block diagram showing the configuration of a speech recognition device according to a fourth embodiment of the present invention.

FIG. 9 is a flowchart showing the procedure of a language model update process according to the fourth embodiment of the present invention.

FIG. 10 is a block diagram showing the configuration of a speech recognition device according to a fifth embodiment of the present invention.

FIG. 11 is a flowchart showing the procedure of an acoustic model update process according to the fifth embodiment of the present invention.

FIG. 12 is a block diagram showing the configuration of a conventional speech recognition device.

FIG. 13 is a flowchart showing the procedure of a conventional speech recognition process.

FIG. 14 is a block diagram showing the configuration of a conventional speech recognition device.

FIG. 15 is a block diagram showing the configuration of a conventional speech recognition device.

FIG. 16 is a diagram showing a configuration example of the user dictionary 601.

FIG. 17 is a flowchart showing the procedure of a conventional speech recognition process.

[Explanation of symbols]

10 speech recognition device, 20 acoustic model management server, 30 language model management server, 100 voice signal, 101 matching means, 102 acoustic model, 103 language model, 104 recognition result, 105 acoustic model update command, 106 acoustic data acquisition means, 107 updated acoustic data, 108 acoustic model construction means, 109 acoustic model storage means, 110 acoustic model transmission means, 111 acoustic model update means, 112 language model update command, 113 language data acquisition means, 114 updated language data, 115 language model construction means, 116 language model storage means, 117 language model transmission means, 118 language model update means, 401 updated language model ID acquisition means, 402 specific language data reading means, 403 specific language model construction means, 601 user dictionary, 602 user dictionary reading means, 603 user dictionary dependent language model construction means, 801 user use text acquisition means, 802 user use text storage means, 803 user use text dependent language model construction means, 1001 acoustic model ID acquisition means, 1002 adaptation speech acquisition means, 1003 initial acoustic model, 1004 acoustic model adaptation means, 1005 adapted acoustic model storage means, 1006 adapted acoustic model selection means.

Claims (35)

[Claims]
1. A speech recognition system comprising: a speech recognition device that receives a speech signal, performs speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation value sequence of the speech, and outputs a recognition result; and an acoustic model management server, connected to the speech recognition device via a network, that acquires updated acoustic data, constructs the acoustic model, and transmits the constructed acoustic model to the speech recognition device, wherein the speech recognition device updates the acoustic model referred to during speech recognition with the acoustic model transmitted by the acoustic model management server.
2. The speech recognition system according to claim 1, wherein the acoustic model management server acquires an ID identifying the acoustic model referred to by the speech recognition device during speech recognition, reads out updated acoustic data corresponding to a specific condition indicated by the acquired ID, constructs an acoustic model dependent on the specific condition, and transmits it to the speech recognition device.
3. A speech recognition system comprising: a speech recognition device that receives a speech signal, performs speech recognition with reference to a language model for obtaining the appearance probability of a word string, and outputs a recognition result; and a language model management server, connected to the speech recognition device via a network, that acquires updated language data, constructs the language model, and transmits the constructed language model to the speech recognition device, wherein the speech recognition device updates the language model referred to during speech recognition with the language model transmitted by the language model management server.
4. The speech recognition system according to claim 3, wherein the language model management server acquires an ID identifying the language model referred to by the speech recognition device during speech recognition, reads out updated language data corresponding to a specific condition indicated by the acquired ID, constructs a language model dependent on the specific condition, and transmits it to the speech recognition device.
5. The speech recognition system according to claim 3, wherein the speech recognition device refers to a user dictionary in which words are registered during speech recognition, and the language model management server reads the user dictionary via the network, constructs a language model dependent on the user dictionary with reference to the updated language data and the read user dictionary, and transmits it to the speech recognition device.
6. The speech recognition system according to claim 3, wherein the language model management server acquires text used by the user of the speech recognition device, constructs a language model dependent on the text with reference to the updated language data and the acquired text, and transmits it to the speech recognition device.
7. A speech recognition system comprising: a speech recognition device that receives a speech signal, performs speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation value sequence of the speech, and outputs a recognition result; and an acoustic model management server, connected to the speech recognition device via a network, that holds an initial acoustic model before adaptation, wherein the speech recognition device acquires an ID specifying the acoustic model and adaptation audio data from the input audio signal, and transmits the acquired ID and adaptation audio data to the acoustic model management server via the network; the acoustic model management server adapts the initial acoustic model using the transmitted adaptation audio data, stores the adapted acoustic model in association with the transmitted ID, receives, in response to an external acoustic model update command, the ID specifying the acoustic model from the speech recognition device via the network, selects the adapted acoustic model corresponding to the received ID from the stored adapted acoustic models, and transmits it to the speech recognition device via the network; and the speech recognition device updates the acoustic model referred to during speech recognition with the transmitted adapted acoustic model.
8. A speech recognition device comprising: an acoustic model for obtaining the probability of an acoustic observation value sequence of speech; matching means for receiving a speech signal, performing speech recognition with reference to the acoustic model, and outputting a recognition result; and acoustic model update means for receiving, from an acoustic model management server connected via a network, an acoustic model constructed from updated acoustic data, and updating the acoustic model referred to by the matching means during speech recognition with the received acoustic model.
9. The speech recognition device according to claim 8, wherein the acoustic model update means receives, from the acoustic model management server connected via the network, an acoustic model constructed from updated acoustic data and dependent on a specific condition of the acoustic model referred to by the matching means during speech recognition, and updates the acoustic model referred to by the matching means during speech recognition with the received acoustic model.
10. A speech recognition device comprising: a language model for obtaining the appearance probability of a word string; matching means for receiving a speech signal, performing speech recognition with reference to the language model, and outputting a recognition result; and language model update means for receiving, from a language model management server connected via a network, a language model constructed from updated language data, and updating the language model referred to by the matching means during speech recognition with the received language model.
11. The speech recognition device according to claim 10, wherein the language model update means receives, from the language model management server connected via the network, a language model constructed from updated language data and dependent on a specific condition of the language model referred to by the matching means during speech recognition, and updates the language model referred to by the matching means during speech recognition with the received language model.
12. The speech recognition device according to claim 10, wherein the matching means comprises a user dictionary in which words referred to during speech recognition are registered, and the language model update means receives, from the language model management server connected via the network, a language model constructed from updated language data and dependent on the user dictionary referred to by the matching means during speech recognition, and updates the language model referred to by the matching means during speech recognition with the received language model.
13. The speech recognition device according to claim 10, wherein the language model update means receives, from the language model management server connected via the network, a language model constructed from updated language data and dependent on text used by the user performing speech recognition, and updates the language model referred to by the matching means during speech recognition with the received language model.
14. A speech recognition device comprising: an acoustic model for obtaining the probability of an acoustic observation value sequence of speech; matching means for receiving a speech signal, performing speech recognition with reference to the acoustic model, and outputting a recognition result; acoustic model ID acquisition means for acquiring an ID specifying the acoustic model; adaptation speech acquisition means for reading the ID acquired by the acoustic model ID acquisition means, acquiring adaptation audio data from the input audio signal, and transmitting the read ID and the acquired adaptation audio data to an acoustic model management server connected via a network; and acoustic model update means for receiving, from the acoustic model management server, an adapted acoustic model adapted with the adaptation audio data corresponding to the ID, and updating the acoustic model referred to by the matching means during speech recognition with the received adapted acoustic model.
15. An acoustic model management server comprising: acoustic data acquisition means for acquiring updated acoustic data; acoustic model construction means for, upon receiving an acoustic model update command from the outside, reading the updated acoustic data acquired by the acoustic data acquisition means and constructing an acoustic model for obtaining the probability of an acoustic observation value sequence of speech; and acoustic model transmission means for transmitting the acoustic model constructed by the acoustic model construction means, via a network, to a speech recognition device that performs speech recognition.
16. An acoustic model management server comprising: acoustic data acquisition means for acquiring updated acoustic data; updated acoustic model ID acquisition means for, upon receiving an acoustic model update command from the outside, acquiring an ID specifying the acoustic model referred to during speech recognition by a speech recognition device connected via a network; specific acoustic data reading means for reading the updated acoustic data acquired by the acoustic data acquisition means corresponding to a specific condition indicated by the ID acquired by the updated acoustic model ID acquisition means; specific acoustic model construction means for constructing an acoustic model dependent on the specific condition with reference to the updated acoustic data read by the specific acoustic data reading means; and acoustic model transmission means for transmitting the acoustic model constructed by the specific acoustic model construction means to the speech recognition device via the network.
17. An acoustic model management server comprising: an initial acoustic model, before adaptation, for obtaining the probability of an acoustic observation value sequence of speech; acoustic model adaptation means for receiving, from a speech recognition device connected via a network, adaptation speech data and an ID specifying the acoustic model referred to by the speech recognition device during speech recognition, adapting the initial acoustic model using the adaptation speech data, and storing the adapted acoustic model in adapted acoustic model storage means in association with the ID; adapted acoustic model selection means for, upon receiving an acoustic model update command from the outside, receiving the ID from the speech recognition device and selecting and reading out the adapted acoustic model corresponding to the ID from the adapted acoustic model storage means; and acoustic model transmission means for transmitting the adapted acoustic model read by the adapted acoustic model selection means to the speech recognition device via the network.
18. A language model management server comprising: language data acquisition means for acquiring updated language data; language model construction means for, upon receiving a language model update command from the outside, reading the updated language data acquired by the language data acquisition means and constructing a language model for obtaining the appearance probability of a word string; and language model transmission means for transmitting the language model constructed by the language model construction means, via a network, to a speech recognition device that performs speech recognition.
19. A language model management server comprising: language data acquisition means for acquiring updated language data; updated language model ID acquisition means for, upon receiving a language model update command from the outside, acquiring an ID specifying the language model referred to during speech recognition by a speech recognition device connected via a network; specific language data reading means for reading the updated language data acquired by the language data acquisition means corresponding to a specific condition indicated by the ID acquired by the updated language model ID acquisition means; specific language model construction means for constructing a language model dependent on the specific condition with reference to the updated language data read by the specific language data reading means; and language model transmission means for transmitting the language model constructed by the specific language model construction means to the speech recognition device via the network.
20. A language model management server comprising: language data acquisition means for acquiring updated language data; user dictionary reading means for, upon receiving a language model update command from the outside, reading a user dictionary referred to during speech recognition by a speech recognition device connected via a network; user dictionary dependent language model construction means for reading the updated language data acquired by the language data acquisition means and constructing a language model dependent on the user dictionary read by the user dictionary reading means; and language model transmission means for transmitting the language model constructed by the user dictionary dependent language model construction means to the speech recognition device via the network.
21. A language model management server comprising: language data acquisition means for acquiring updated language data; user use text acquisition means for, upon receiving a language model update command from the outside, acquiring text used by the user of a speech recognition device connected via a network; user use text dependent language model construction means for reading the updated language data acquired by the language data acquisition means and constructing a language model dependent on the text acquired by the user use text acquisition means; and language model transmission means for transmitting the language model constructed by the user use text dependent language model construction means to the speech recognition device via the network.
22. A speech recognition method of receiving a speech signal, performing speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation value sequence of the speech, and outputting a recognition result, comprising: a first step of acquiring updated acoustic data; a second step of, in response to an acoustic model update command, reading the updated acoustic data acquired in the first step and constructing an acoustic model; a third step of transmitting the acoustic model constructed in the second step via a network; and a fourth step of receiving the acoustic model transmitted in the third step and updating the acoustic model referred to during speech recognition with the received acoustic model.
23. A speech recognition method of receiving a speech signal, performing speech recognition with reference to a language model for obtaining the appearance probability of a word string, and outputting a recognition result, comprising: a first step of acquiring updated language data; a second step of, in response to a language model update command, reading the updated language data acquired in the first step and constructing a language model; a third step of transmitting the language model constructed in the second step via a network; and a fourth step of receiving the language model transmitted in the third step and updating the language model referred to during speech recognition with the received language model.
24. A speech recognition method of receiving a speech signal, performing speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation value sequence of the speech, and outputting a recognition result, comprising: a first step of acquiring updated acoustic data; a second step of, in response to an acoustic model update command, acquiring an ID specifying the acoustic model referred to during speech recognition; a third step of reading the updated acoustic data acquired in the first step corresponding to a specific condition indicated by the ID acquired in the second step; a fourth step of constructing an acoustic model dependent on the specific condition with reference to the updated acoustic data read in the third step; a fifth step of transmitting the acoustic model constructed in the fourth step via a network; and a sixth step of receiving the acoustic model transmitted in the fifth step and updating the acoustic model referred to during speech recognition with the received acoustic model.
25. A speech recognition method of receiving a speech signal, performing speech recognition with reference to a language model for obtaining the appearance probability of a word string, and outputting a recognition result, comprising: a first step of acquiring updated language data; a second step of, in response to a language model update command, acquiring an ID specifying the language model referred to during speech recognition; a third step of reading the updated language data acquired in the first step corresponding to a specific condition indicated by the ID acquired in the second step; a fourth step of constructing a language model dependent on the specific condition with reference to the updated language data read in the third step; a fifth step of transmitting the language model constructed in the fourth step via a network; and a sixth step of receiving the language model transmitted in the fifth step and updating the language model referred to during speech recognition with the received language model.
26. A speech recognition method of receiving a speech signal, performing speech recognition with reference to a language model for obtaining the appearance probability of a word string and to a user dictionary in which words are registered, and outputting a recognition result, comprising: a first step of acquiring updated language data; a second step of, in response to a language model update command, reading the user dictionary referred to during speech recognition; a third step of reading the updated language data acquired in the first step and constructing a language model dependent on the user dictionary read in the second step; a fourth step of transmitting the language model constructed in the third step via a network; and a fifth step of receiving the language model transmitted in the fourth step and updating the language model referred to during speech recognition with the received language model.
27. A speech recognition method of receiving a speech signal, performing speech recognition with reference to a language model for obtaining the appearance probability of a word string, and outputting a recognition result, comprising: a first step of acquiring updated language data; a second step of, in response to a language model update command, acquiring text used by the user performing speech recognition; a third step of reading the updated language data acquired in the first step and constructing a language model dependent on the text acquired in the second step; a fourth step of transmitting the language model constructed in the third step via a network; and a fifth step of receiving the language model transmitted in the fourth step and updating the language model referred to during speech recognition with the received language model.
28. A speech recognition method of receiving a speech signal, performing speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation value sequence of the speech, and outputting a recognition result, comprising: a first step of acquiring an ID specifying the acoustic model; a second step of reading the ID acquired in the first step, acquiring adaptation audio data from the input audio signal, and transmitting the read ID and the acquired adaptation audio data via a network; a third step of adapting the initial acoustic model before adaptation using the adaptation audio data transmitted in the second step and storing the adapted acoustic model in association with the ID transmitted in the second step; a fourth step of receiving, via the network in response to an acoustic model update command, the ID acquired in the first step, and selecting and reading out the adapted acoustic model corresponding to the received ID from the adapted acoustic models stored in the third step; a fifth step of transmitting the adapted acoustic model read out in the fourth step via the network; and a sixth step of receiving the adapted acoustic model transmitted in the fifth step and updating the acoustic model referred to during speech recognition with the received adapted acoustic model.
29. A computer-readable recording medium recording a speech recognition program for receiving a speech signal, performing speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation value sequence of the speech, and outputting a recognition result, the program realizing: an acoustic data acquisition function of acquiring updated acoustic data; an acoustic model construction function of, in response to an acoustic model update command, reading the updated acoustic data acquired by the acoustic data acquisition function and constructing an acoustic model; an acoustic model transmission function of transmitting the acoustic model constructed by the acoustic model construction function via a network; and an acoustic model update function of receiving the acoustic model transmitted by the acoustic model transmission function and updating the acoustic model referred to during speech recognition with the received acoustic model.
30. A computer-readable recording medium recording a speech recognition program for inputting a speech signal, performing speech recognition with reference to a language model that gives the appearance probability of a word string, and outputting a recognition result, the program realizing: a language data acquisition function of acquiring updated language data; a language model construction function of receiving a language model update command, reading the updated language data acquired by the language data acquisition function, and constructing a language model; a language model transmission function of transmitting the language model constructed by the language model construction function via a network; and a language model update function of receiving the language model transmitted by the language model transmission function and updating the language model referred to by a matching function during speech recognition with the received language model.
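The construct-transmit-update cycle of claims 29 and 30 can be sketched for the language-model case as follows. The bigram estimator and the `Recognizer` class are illustrative assumptions; the patent does not prescribe any particular model form:

```python
from collections import Counter

def build_language_model(updated_text_corpus):
    """Toy stand-in for the language model construction function:
    estimate word-bigram probabilities from newly acquired language data."""
    bigrams = Counter()
    unigrams = Counter()
    for sentence in updated_text_corpus:
        words = sentence.split()
        unigrams.update(words[:-1])           # history counts
        bigrams.update(zip(words, words[1:])) # (history, word) counts
    return {bg: c / unigrams[bg[0]] for bg, c in bigrams.items()}


class Recognizer:
    def __init__(self, language_model):
        # The model the matching function refers to during recognition.
        self.language_model = language_model

    def update_language_model(self, transmitted_model):
        # Update function: replace the referenced model with the received one.
        self.language_model = transmitted_model


corpus = ["the cat sat", "the cat ran"]
recognizer = Recognizer(language_model={})
recognizer.update_language_model(build_language_model(corpus))
print(recognizer.language_model[("the", "cat")])  # 1.0
```

Because the recognizer only swaps a reference in its update function, the heavy work of reading updated data and re-estimating probabilities stays on the server side, which is the load-shifting the abstract describes.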
31. A computer-readable recording medium recording a speech recognition program for inputting a speech signal, performing speech recognition with reference to an acoustic model that gives the probability of an acoustic observation sequence of the speech, and outputting a recognition result, the program realizing: an acoustic data acquisition function of acquiring updated acoustic data; an updated acoustic model ID acquisition function of acquiring, in response to an acoustic model update command, an ID specifying the acoustic model; a specific acoustic data reading function of reading the updated acoustic data acquired by the acoustic data acquisition function in accordance with the specific condition indicated by the ID acquired by the updated acoustic model ID acquisition function; a specific acoustic model construction function of constructing an acoustic model dependent on the specific condition with reference to the updated acoustic data read by the specific acoustic data reading function; an acoustic model transmission function of transmitting the acoustic model constructed by the specific acoustic model construction function via a network; and an acoustic model update function of receiving the acoustic model transmitted by the acoustic model transmission function and updating the acoustic model referred to by a matching function during speech recognition with the received acoustic model.
32. A computer-readable recording medium recording a speech recognition program for inputting a speech signal, performing speech recognition with reference to a language model that gives the appearance probability of a word string, and outputting a recognition result, the program realizing: a language data acquisition function of acquiring updated language data; an updated language model ID acquisition function of acquiring, in response to a language model update command, an ID specifying the language model; a specific language data reading function of reading the updated language data acquired by the language data acquisition function in accordance with the specific condition indicated by the ID acquired by the updated language model ID acquisition function; a specific language model construction function of constructing a language model dependent on the specific condition with reference to the updated language data read by the specific language data reading function; a language model transmission function of transmitting the language model constructed by the specific language model construction function via a network; and a language model update function of receiving the language model transmitted by the language model transmission function and updating the language model referred to by a matching function during speech recognition with the received language model.
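What distinguishes claims 31 and 32 from claims 29 and 30 is that the ID names a specific condition, and only the updated data matching that condition is read before constructing the model. A minimal sketch, assuming the condition is a domain tag attached to each data item (the tags, data, and function names are all hypothetical):

```python
# Updated language data annotated with a condition tag (assumed format).
UPDATED_LANGUAGE_DATA = [
    ("news", "stocks fell sharply"),
    ("navigation", "turn left ahead"),
    ("news", "markets rallied today"),
]

def read_specific_data(condition_id, data):
    # Specific-data reading function: keep only items matching the
    # condition indicated by the ID.
    return [text for tag, text in data if tag == condition_id]

def build_specific_model(condition_id, data):
    # Condition-dependent construction function: here a toy vocabulary
    # "model" built only from the selected subset.
    vocab = set()
    for text in read_specific_data(condition_id, data):
        vocab.update(text.split())
    return vocab

news_model = build_specific_model("news", UPDATED_LANGUAGE_DATA)
print("stocks" in news_model, "left" in news_model)  # True False
```

The filtering step is the mechanism that lets one server serve differently specialized models (per domain, per region, per device class) from a single pool of updated data.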
33. A computer-readable recording medium recording a speech recognition program for inputting a speech signal, performing speech recognition with reference to a language model that gives the appearance probability of a word string and to a user dictionary in which words are registered, and outputting a recognition result, the program realizing: a language data acquisition function of acquiring updated language data; a user dictionary reading function of reading the user dictionary in response to a language model update command; a user-dictionary-dependent language model construction function of reading the updated language data acquired by the language data acquisition function and constructing a language model dependent on the user dictionary read by the user dictionary reading function; a language model transmission function of transmitting the language model constructed by the user-dictionary-dependent language model construction function via a network; and a language model update function of receiving the language model transmitted by the language model transmission function and updating the language model referred to by a matching function during speech recognition with the received language model.
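One simple way to realize the user-dictionary dependence of claim 33 is to fold the registered words into the model's vocabulary with a small amount of probability mass. This sketch is an assumption about how such a construction function might look, not the patent's method; the `boost` pseudo-count and all names are illustrative:

```python
def build_user_dict_language_model(base_counts, user_dictionary, boost=1):
    """Build a unigram model that depends on the user dictionary by
    giving every registered word at least `boost` pseudo-counts, so the
    matching function can recognize it even if absent from the base data."""
    counts = dict(base_counts)
    for word in user_dictionary:
        counts[word] = counts.get(word, 0) + boost
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


base = {"hello": 3, "world": 1}
lm = build_user_dict_language_model(base, user_dictionary=["Okato"])
print(lm["Okato"])  # the user-registered word now has nonzero probability
```

The design point is that the user only maintains the dictionary on the terminal; the server combines it with updated language data, so the user never rebuilds a model by hand.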
34. A computer-readable recording medium recording a speech recognition program for inputting a speech signal, performing speech recognition with reference to a language model that gives the appearance probability of a word string, and outputting a recognition result, the program realizing: a language data acquisition function of acquiring updated language data; a user-used text acquisition function of acquiring, in response to a language model update command, text used by the user who performs speech recognition; a user-text-dependent language model construction function of reading the updated language data acquired by the language data acquisition function and constructing a language model dependent on the text acquired by the user-used text acquisition function; a language model transmission function of transmitting the language model constructed by the user-text-dependent language model construction function via a network; and a language model update function of receiving the language model transmitted by the language model transmission function and updating the language model referred to by a matching function during speech recognition with the received language model.
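The user-text dependence of claim 34 is commonly realized by interpolating estimates from the user's own text with a general model. The following sketch assumes unigram interpolation with a fixed mixing weight `lam`; both the weight and the model form are illustrative assumptions, not taken from the patent:

```python
from collections import Counter

def build_user_text_language_model(general_lm, user_text, lam=0.5):
    """Adapt a general unigram model toward text the user actually uses
    by linear interpolation: lam * P_user + (1 - lam) * P_general."""
    words = user_text.split()
    counts = Counter(words)
    user_lm = {w: c / len(words) for w, c in counts.items()}
    vocab = set(general_lm) | set(user_lm)
    return {w: lam * user_lm.get(w, 0.0) + (1 - lam) * general_lm.get(w, 0.0)
            for w in vocab}


general = {"meeting": 0.5, "lunch": 0.5}
adapted = build_user_text_language_model(general, "meeting agenda meeting notes")
print(adapted["meeting"] > adapted["lunch"])  # True
```

After adaptation, words frequent in the user's documents are favored over equally probable general words, which is exactly the precision improvement the abstract targets without burdening the user with model maintenance.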
35. A computer-readable recording medium recording a speech recognition program for inputting a speech signal, performing speech recognition with reference to an acoustic model that gives the probability of an acoustic observation sequence of the speech, and outputting a recognition result, the program realizing: an acoustic model ID acquisition function of acquiring an ID specifying the acoustic model; an adaptation speech acquisition function of reading the ID acquired by the acoustic model ID acquisition function, acquiring adaptation speech data from the input speech signal, and transmitting the read ID and the acquired adaptation speech data via a network; an acoustic model adaptation function of adapting an initial acoustic model using the adaptation speech data transmitted by the adaptation speech acquisition function, and storing the adapted acoustic model in association with the ID transmitted by the adaptation speech acquisition function; an adapted acoustic model selection function of receiving, via the network in response to an acoustic model update command, the ID acquired by the acoustic model ID acquisition function, and selecting and reading out the adapted acoustic model corresponding to the received ID from among the adapted acoustic models stored by the acoustic model adaptation function; an acoustic model transmission function of transmitting the adapted acoustic model read by the adapted acoustic model selection function via the network; and an acoustic model update function of receiving the adapted acoustic model transmitted by the acoustic model transmission function and updating the acoustic model referred to by a matching function during speech recognition with the received adapted acoustic model.
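The adaptation and selection functions of claim 35 can be sketched with a crude mean-shift update of Gaussian means, stored and retrieved by ID. The interpolation weight and all function names are assumptions for illustration; real systems would use a principled scheme such as MAP or MLLR adaptation:

```python
def adapt_means(initial_means, adaptation_frames, weight=0.5):
    """Toy adaptation function: pull each Gaussian mean of the initial
    acoustic model toward the mean of the speaker's adaptation frames."""
    frame_mean = sum(adaptation_frames) / len(adaptation_frames)
    return [(1 - weight) * m + weight * frame_mean for m in initial_means]


adapted_store = {}  # ID -> adapted acoustic model (server-side storage)

def acoustic_model_adaptation(model_id, initial_means, frames):
    # Adapt the initial model and file the result under the transmitted ID.
    adapted_store[model_id] = adapt_means(initial_means, frames)

def adapted_model_selection(model_id):
    # Selection function: read back the model stored under the received ID.
    return adapted_store[model_id]


acoustic_model_adaptation("speakerA", initial_means=[0.0, 2.0], frames=[4.0, 4.0])
print(adapted_model_selection("speakerA"))
```

Keeping the store keyed by ID means multiple terminals can share one adaptation server, and each terminal's update command retrieves only its own speaker-adapted model.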
JP2000280674A 2000-09-14 2000-09-14 Voice recognition system, voice recognition device, acoustic model control server, language model control server, voice recognition method and computer readable recording medium which records voice recognition program Abandoned JP2002091477A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2000280674A JP2002091477A (en) 2000-09-14 2000-09-14 Voice recognition system, voice recognition device, acoustic model control server, language model control server, voice recognition method and computer readable recording medium which records voice recognition program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2000280674A JP2002091477A (en) 2000-09-14 2000-09-14 Voice recognition system, voice recognition device, acoustic model control server, language model control server, voice recognition method and computer readable recording medium which records voice recognition program

Publications (1)

Publication Number Publication Date
JP2002091477A true JP2002091477A (en) 2002-03-27

Family

ID=18765461

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2000280674A Abandoned JP2002091477A (en) 2000-09-14 2000-09-14 Voice recognition system, voice recognition device, acoustic model control server, language model control server, voice recognition method and computer readable recording medium which records voice recognition program

Country Status (1)

Country Link
JP (1) JP2002091477A (en)

Cited By (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7133829B2 (en) 2001-10-31 2006-11-07 Dictaphone Corporation Dynamic insertion of a speech recognition engine within a distributed speech recognition system
US7146321B2 (en) 2001-10-31 2006-12-05 Dictaphone Corporation Distributed speech recognition system
US6785654B2 (en) 2001-11-30 2004-08-31 Dictaphone Corporation Distributed speech recognition system with speech recognition engines offering multiple functionalities
WO2003049080A1 (en) * 2001-11-30 2003-06-12 Dictaphone Corporation Distributed speech recognition system with speech recognition engines offering multiple fuctionalites
US7236931B2 (en) 2002-05-01 2007-06-26 Usb Ag, Stamford Branch Systems and methods for automatic acoustic speaker adaptation in computer-assisted transcription systems
US7292975B2 (en) 2002-05-01 2007-11-06 Nuance Communications, Inc. Systems and methods for evaluating speaker suitability for automatic speech recognition aided transcription
US10412439B2 (en) 2002-09-24 2019-09-10 Thomson Licensing PVR channel and PVR IPG information
US7698138B2 (en) 2003-01-15 2010-04-13 Panasonic Corporation Broadcast receiving method, broadcast receiving system, recording medium, and program
WO2005010868A1 (en) * 2003-07-29 2005-02-03 Mitsubishi Denki Kabushiki Kaisha Voice recognition system and its terminal and server
JP2013047809A * 2005-02-03 2013-03-07 Nuance Communications, Inc. Methods and apparatus for automatically extending the voice vocabulary of mobile communications devices
JP2008529101A (en) * 2005-02-03 2008-07-31 Voice Signal Technologies, Inc. Method and apparatus for automatically expanding the speech vocabulary of a mobile communication device
KR100845825B1 (en) * 2005-06-20 2008-07-14 서울산업대학교 산학협력단 recirculation device for infiltrated water in the reclaimed land utilizing waste tire chip
US8032372B1 (en) 2005-09-13 2011-10-04 Escription, Inc. Dictation selection
US8719021B2 (en) 2006-02-23 2014-05-06 Nec Corporation Speech recognition dictionary compilation assisting system, speech recognition dictionary compilation assisting method and speech recognition dictionary compilation assisting program
WO2007142102A1 (en) * 2006-05-31 2007-12-13 Nec Corporation Language model learning system, language model learning method, and language model learning program
US8831943B2 (en) 2006-05-31 2014-09-09 Nec Corporation Language model learning system, language model learning method, and language model learning program
JP5088701B2 (en) * 2006-05-31 2012-12-05 日本電気株式会社 Language model learning system, language model learning method, and language model learning program
US8108212B2 (en) 2007-03-13 2012-01-31 Nec Corporation Speech recognition method, speech recognition system, and server thereof
JP2009145435A * 2007-12-12 2009-07-02 O Chuhei System and method for providing a speaker-independent speech recognition engine used in a plurality of apparatuses to individual users via the Internet
JP2009294269A (en) * 2008-06-03 2009-12-17 Nec Corp Speech recognition system
JP2011064913A (en) * 2009-09-16 2011-03-31 Ntt Docomo Inc Telephone system, terminal device, voice model updating device, and voice model updating method
US9842591B2 (en) 2010-05-19 2017-12-12 Sanofi-Aventis Deutschland Gmbh Methods and systems for modifying operational data of an interaction process or of a process for determining an instruction
JP2016128924A (en) * 2010-05-19 2016-07-14 Sanofi-Aventis Deutschland GmbH Modification of operational data of an interaction process and/or of an instruction determination process
US9728184B2 (en) 2013-06-18 2017-08-08 Microsoft Technology Licensing, Llc Restructuring deep neural network acoustic models
US9311298B2 (en) 2013-06-21 2016-04-12 Microsoft Technology Licensing, Llc Building conversational understanding systems using a toolset
US10572602B2 (en) 2013-06-21 2020-02-25 Microsoft Technology Licensing, Llc Building conversational understanding systems using a toolset
US10304448B2 (en) 2013-06-21 2019-05-28 Microsoft Technology Licensing, Llc Environmentally aware dialog policies and response generation
US9589565B2 (en) 2013-06-21 2017-03-07 Microsoft Technology Licensing, Llc Environmentally aware dialog policies and response generation
US9697200B2 (en) 2013-06-21 2017-07-04 Microsoft Technology Licensing, Llc Building conversational understanding systems using a toolset
CN103578465A (en) * 2013-10-18 2014-02-12 威盛电子股份有限公司 Speech recognition method and electronic device
US9324321B2 (en) 2014-03-07 2016-04-26 Microsoft Technology Licensing, Llc Low-footprint adaptation and personalization for a deep neural network
US10497367B2 (en) 2014-03-27 2019-12-03 Microsoft Technology Licensing, Llc Flexible schema for language model customization
CN106133826B (en) * 2014-03-27 2019-12-17 微软技术许可有限责任公司 flexible schema for language model customization
WO2015148333A1 (en) * 2014-03-27 2015-10-01 Microsoft Technology Licensing, Llc Flexible schema for language model customization
JP2017515141A (en) * 2014-03-27 2017-06-08 マイクロソフト テクノロジー ライセンシング,エルエルシー Flexible schema for language model customization
CN106133826A (en) * 2014-03-27 2016-11-16 微软技术许可有限责任公司 For the self-defining flexible modes of language model
US9529794B2 (en) 2014-03-27 2016-12-27 Microsoft Technology Licensing, Llc Flexible schema for language model customization
US9614724B2 (en) 2014-04-21 2017-04-04 Microsoft Technology Licensing, Llc Session-based device configuration
US9520127B2 (en) 2014-04-29 2016-12-13 Microsoft Technology Licensing, Llc Shared hidden layer combination for speech recognition systems
US9384335B2 (en) 2014-05-12 2016-07-05 Microsoft Technology Licensing, Llc Content delivery prioritization in managed wireless distribution networks
US9384334B2 (en) 2014-05-12 2016-07-05 Microsoft Technology Licensing, Llc Content discovery in managed wireless distribution networks
US10111099B2 (en) 2014-05-12 2018-10-23 Microsoft Technology Licensing, Llc Distributing content in managed wireless distribution networks
US9430667B2 (en) 2014-05-12 2016-08-30 Microsoft Technology Licensing, Llc Managed wireless distribution network
US9874914B2 (en) 2014-05-19 2018-01-23 Microsoft Technology Licensing, Llc Power management contracts for accessory devices
US9367490B2 (en) 2014-06-13 2016-06-14 Microsoft Technology Licensing, Llc Reversible connector for accessory devices
US9477625B2 (en) 2014-06-13 2016-10-25 Microsoft Technology Licensing, Llc Reversible connector for accessory devices
US9717006B2 (en) 2014-06-23 2017-07-25 Microsoft Technology Licensing, Llc Device quarantine in a wireless network

Similar Documents

Publication Publication Date Title
US9514126B2 (en) Method and system for automatically detecting morphemes in a task classification system using lattices
US10410627B2 (en) Automatic language model update
CN106463113B (en) Predicting pronunciation in speech recognition
US8571861B2 (en) System and method for processing speech recognition
JP5162697B2 (en) Generation of unified task-dependent language model by information retrieval method
US8280733B2 (en) Automatic speech recognition learning using categorization and selective incorporation of user-initiated corrections
JP5598998B2 (en) Speech translation system, first terminal device, speech recognition server device, translation server device, and speech synthesis server device
Valtchev et al. MMIE training of large vocabulary recognition systems
US8793130B2 (en) Confidence measure generation for speech related searching
JP5212910B2 (en) Speech recognition apparatus, speech recognition method, and speech recognition program
JP4322815B2 (en) Speech recognition system and method
DE60005326T2 (en) Detection units with complementary language models
US6487534B1 (en) Distributed client-server speech recognition system
US7636657B2 (en) Method and apparatus for automatic grammar generation from data entries
US7197460B1 (en) System for handling frequently asked questions in a natural language dialog service
US7574358B2 (en) Natural language system and method based on unisolated performance metric
JP4141495B2 (en) Method and apparatus for speech recognition using optimized partial probability mixture sharing
ES2420559T3 Large-scale, user-independent and device-independent system for converting voice messages to text
JP3716870B2 (en) Speech recognition apparatus and speech recognition method
JP4180110B2 (en) Language recognition
US9292487B1 (en) Discriminative language model pruning
CN1296886C (en) Speech recognition system and method
Hori et al. Efficient WFST-based one-pass decoding with on-the-fly hypothesis rescoring in extremely large vocabulary continuous speech recognition
CN101305362B (en) Speech index pruning
EP1669980B1 (en) System and method for identifiying semantic intent from acoustic information

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20050707

A521 Written amendment

Effective date: 20071112

Free format text: JAPANESE INTERMEDIATE CODE: A821

RD04 Notification of resignation of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7424

Effective date: 20071112

RD02 Notification of acceptance of power of attorney

Effective date: 20071112

Free format text: JAPANESE INTERMEDIATE CODE: A7422

A977 Report on retrieval

Effective date: 20080204

Free format text: JAPANESE INTERMEDIATE CODE: A971007

A131 Notification of reasons for refusal

Effective date: 20080408

Free format text: JAPANESE INTERMEDIATE CODE: A131

A762 Written abandonment of application

Free format text: JAPANESE INTERMEDIATE CODE: A762

Effective date: 20080515