JPH11175087A - Character string matching method for word speech recognition - Google Patents

Character string matching method for word speech recognition

Info

Publication number
JPH11175087A
JPH11175087A JP9339586A JP33958697A
Authority
JP
Japan
Prior art keywords
word
phoneme
series
sequence
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP9339586A
Other languages
Japanese (ja)
Inventor
Tetsutada Sakurai (哲真 桜井)
Yoshio Nakadai (芳夫 中台)
Yoshitake Suzuki (義武 鈴木)
Shunei Kurokawa (俊英 黒川)
Yamato Sato (大和 佐藤)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Advanced Technology Corp
Nippon Telegraph and Telephone Corp
Original Assignee
NTT Advanced Technology Corp
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NTT Advanced Technology Corp, Nippon Telegraph and Telephone Corp filed Critical NTT Advanced Technology Corp
Priority to JP9339586A priority Critical patent/JPH11175087A/en
Publication of JPH11175087A publication Critical patent/JPH11175087A/en
Pending legal-status Critical Current

Abstract

PROBLEM TO BE SOLVED: To reduce the required storage capacity and amount of calculation by judging similarity through symbol-by-symbol matching, from the beginning of the word, between the phoneme series of each dictionary word and the phoneme series of the input speech.

SOLUTION: Words are registered in a word dictionary 8 using vowels, fricatives, and silent parts, so that each entry traces the vowels appearing in the word. The input speech is spectrum-analyzed and converted into a series of these three kinds of phonemes (12); silent parts at the beginning and end of the word are removed, and discontinuous symbols are replaced with the immediately preceding symbol to generate a symbol sequence (13, 14). Each dictionary series, e.g. 'a*uaei', is compared with an input series, e.g. 'a*u*aeie', character by character from the beginning of the word. When a match is obtained, e.g. between 'a' and 'a', the comparison advances to the next character; when a mismatch is obtained, e.g. between the 4th 'a' and '*', the penalty is increased by one and only the input series is advanced by one character, so that the 4th 'a' of the dictionary series and the 5th 'a' of the input series are compared next.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Field of the Invention

The present invention relates to a method of matching a dictionary against input speech when recognizing word speech on a computer or an even smaller system.

[0002]

Description of the Related Art

There are recognition methods that use phoneme information: a dictionary of phoneme series is provided, the speech input signal is analyzed frame by frame, and the similarity between each symbol of the per-frame phoneme series and the symbols of the word phoneme series in the dictionary is computed. There is also a recognition method called the SPLIT method, in which the number of phoneme standard patterns is increased to a number sufficient to represent spectral variation.

[0003]

The speech recognition method using phoneme information mentioned above requires standard phoneme patterns based on spectra. Because distances between spectra must be computed when obtaining the similarity between the input speech and the standard patterns, the amount of calculation is large. Word speech is then recognized from the similarity sums obtained by comparing these similarities against the word dictionary. As a result, few words are missed and the recognition rate is high, but the storage capacity and amount of calculation required for recognition also increase markedly. Moreover, since the word dictionary used in this method consists of sequences of phonemes, it is difficult to create a new dictionary without knowledge of phonemes.

[0004]

The SPLIT method can be regarded as a fusion of three conventional techniques: phoneme-based recognition, word-based recognition, and vector quantization as used in speech coding. The standard patterns of the SPLIT method are called pseudo-phoneme standard patterns; they are obtained, for each speaker, by clustering the short-time spectral patterns of a large number of speech samples. Because they are based only on the distribution of spectral patterns, they have no clear correspondence with phonemes, so orthography cannot be used to generate the word dictionary. The storage capacity and amount of calculation required for recognition are about one tenth of those needed when word-level standard patterns are used.

[0005]

Since the SPLIT method is normally used for speaker-dependent recognition, the matching of a vowel series against a word dictionary in phoneme-based word speech recognition is described here with reference to FIG. 4. The input speech from input terminal 1 passes through A/D converter 2 and is converted into a digital signal, after which it is spectrum-analyzed by analysis unit 3. Similarity calculator 4 compares the analyzed spectrum with the standard pattern 5 (a spectrum) of each phoneme to obtain a similarity to each phoneme. The similarity of each phoneme over time is stored in similarity matrix 6. Similarity-sum calculator 7 obtains similarity sums by DP matching against the phoneme series of each word in word dictionary 8.

[0006]

Based on these similarity sums, word decision unit 9 decides that the word with the highest similarity sum is the recognized word, and passes the result to output terminal 10.

[0007]

Problems to be Solved by the Invention

In the conventional methods described above, there are about forty phonemes alone, and applying DP matching against the word dictionary on top of that requires a considerable amount of calculation. In addition, a user must know phonemes in order to register new words in the dictionary.

[0008]

Means for Solving the Problems

According to the present invention, agreement between the phoneme symbols of each word's phoneme series in the word dictionary and those of the phoneme series of the input speech is checked one symbol at a time from the beginning of the word. On a mismatch, a penalty is increased by +1, and the current phoneme of the dictionary word's series is then compared against the next phoneme symbol in the input series. The smaller a word's penalty, the higher its similarity to the input is judged to be.

[0009]

In particular, each word is expressed with a total of seven symbols, comprising the vowels, a fricative, and a silent part; the input speech is likewise converted into a series of these seven symbols, and the above matching method is applied.

[0010]

Description of the Preferred Embodiments

FIG. 1A shows the processing procedure of a recognition apparatus to which an embodiment of the present invention is applied. Even where the processing differs from FIG. 4, blocks that perform equivalent processing in the overall flow carry the same reference numerals. The input speech from input terminal 1 passes through A/D converter 2 and is converted into a digital signal. It is then LPC-analyzed by analysis unit 3, extracting, for example, cepstra.

[0011]

Series generator (per frame) 12 obtains a time series of phonemes by comparison with the standard patterns (cepstrum vectors) 5. This embodiment expresses each word in terms of vowels, fricatives, and silence, and series generator 12 likewise converts the input speech into a series of vowels, fricatives, and silence. For example, when the word "cup" (カップ) is spoken, the phoneme time series shown in FIG. 2A is obtained: * denotes silence, a is the vowel of "ka", the geminate "ッ" is detected as silence, and u is the vowel of "pu". There is silence * at the beginning and end of the word, a discontinuous vowel o is detected between the a and the silent part, and a discontinuous vowel i between the silent part and "pu".

[0012]

Series processing (A) 13 deletes the silent parts * appearing at the beginning and end of the phoneme time series obtained above, and replaces each discontinuous phoneme symbol with the phoneme symbol immediately preceding it. In the example of FIG. 2A, as shown in FIG. 2B, the leading silence * and the trailing silence * are deleted, the discontinuous phoneme symbol o is replaced with the symbol a immediately before it, and the discontinuous phoneme symbol i is replaced with the symbol * immediately before it.

[0013]

Next, in this example, series processing (B) 14 replaces each run of identical phoneme symbols in the series produced by series processing (A) with a single instance of that symbol. In the example of FIG. 2, as shown in FIG. 2C, the eight consecutive phoneme symbols a become a single a, the eighteen consecutive silence symbols * become a single *, and the eight consecutive u become a single u.
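The two series-processing steps can be sketched as follows. This is an illustrative reconstruction from the text, not the patent's own code; the function name is invented, and treating a "discontinuous" symbol as a single frame that differs from both neighbours is an assumption.

```python
def normalize_sequence(frames):
    """Sketch of series processing (A) and (B) described above.

    frames: per-frame phoneme symbols from the series generator,
    e.g. for "cup": leading silence, a run of 'a', a stray 'o',
    a silent run, a stray 'i', a run of 'u', trailing silence.
    """
    symbols = list(frames)
    # (A) delete silent parts at the beginning and end of the word
    while symbols and symbols[0] == '*':
        symbols.pop(0)
    while symbols and symbols[-1] == '*':
        symbols.pop()
    # (A) replace a discontinuous symbol with the symbol immediately
    # before it ("discontinuous" taken here to mean a single frame
    # differing from both neighbours -- an assumption)
    for k in range(1, len(symbols) - 1):
        if symbols[k] != symbols[k - 1] and symbols[k] != symbols[k + 1]:
            symbols[k] = symbols[k - 1]
    # (B) collapse each run of identical symbols into one symbol
    out = []
    for s in symbols:
        if not out or out[-1] != s:
            out.append(s)
    return ''.join(out)

# The FIG. 2 example for "cup": the raw frame series reduces to 'a*u'
print(normalize_sequence('**aaaaaaaao******************iuuuuuuuu**'))  # prints a*u
```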

[0014]

Similarity-sum calculator 7 matches the series emerging from series processing (B) 14 against word dictionary 8. Based on the similarity sums obtained, word decision unit 9 selects the word considered correct and outputs it from output terminal 10. The similarity-sum calculator 7, i.e. the matching unit, is now described in detail. First, word dictionary 8 is written so as to emphasize only the vowel parts of each word. That is, among the symbols used in the dictionary, "a" stands for あ, "i" for い, "u" for う, "e" for え, "o" for お, "*" for a silent part, and "S" for a fricative. As examples of dictionary entries, as shown in FIG. 3, "cup" is written as the character string "a*u" and "gown" as "Cau". Although "b, d, g" and "m, n" are not used in actual recognition, consonants are here written generically as "C" for convenience; "game", for example, becomes the string "CeCu". Words are registered in dictionary 8 by considering what character strings are likely to result when many different people speak them.
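As an illustration of this notation, the FIG. 3 entries could be held in a simple lookup table. The mapping below is a hypothetical sketch using only the words named in the text; the variable name and the use of English keys are assumptions.

```python
# Hypothetical dictionary entries in the notation described above:
# vowels a/i/u/e/o, '*' for a silent part, 'S' for a fricative, and
# 'C' written informally for consonants, as in the text.
word_dictionary = {
    'cup':  'a*u',   # カップ
    'gown': 'Cau',   # ガウン
    'game': 'CeCu',  # ゲーム
}
```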

[0015]

Furthermore, similarity-sum calculator 7 does not perform conventional DP matching, but uses the simpler method that characterizes the present invention. As is clear from the word notation above, the series emerging from series processing (B) 14 has a form that can be matched directly against each word string in dictionary 8. The matching method of the invention is explained, as shown in FIG. 1B, using as an example the matching of the series "a*uaei" registered in word dictionary 8 against the input phoneme series "a*u*aeie" produced by series processing (B) 14.

[0016]

Step 1: The first symbol of the dictionary series, "a", is compared with the first symbol of the input series, "a". Since the symbols are identical, nothing is done and both series advance by one character.
Step 2: The second symbol of the dictionary series, "*", is compared with the second of the input series, "*"; since they are identical, both series again advance by one character.
Step 3: The third symbols of the two series, "u" and "u", are compared; they are identical, and both series advance by one character.

[0017]

Step 4: The fourth symbols of the two series, "a" and "*", are compared. They do not match, so the penalty is increased by +1 and only the input series advances by one character.
Step 5: The fourth symbol of the dictionary series, "a", is compared with the fifth of the input series, "a"; they match, and both series advance by one character.
Step 6: The fifth symbol of the dictionary series, "e", is compared with the sixth of the input series, "e"; they match, and both series advance by one character.

[0018]

Step 7: The sixth symbol of the dictionary series, "i", is compared with the seventh of the input series, "i"; on this match both series advance by one character.
Step 8: The dictionary series ends at its sixth symbol, while one character, "e", remains in the input series, so this mismatch increases the penalty by +1.
In this way the symbols of the two series are compared one by one from the beginning of the word: if they are the same, both series advance by one character; if they differ, the penalty is increased by +1 and only the input series advances by one character. This continues until the end of either series is reached. When one series ends, if uncompared symbols remain in the other series, their number is added to the penalty.
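Steps 1 to 8 amount to a single linear scan over the two series. A minimal sketch, under the assumption that leftover symbols on either side each count as +1 penalty (the function name is invented):

```python
def penalty_match(dict_seq, input_seq):
    """Compare two symbol series from the word head; on a mismatch,
    add +1 to the penalty and advance only the input series."""
    penalty = 0
    i = j = 0  # i indexes the dictionary series, j the input series
    while i < len(dict_seq) and j < len(input_seq):
        if dict_seq[i] == input_seq[j]:
            i += 1
            j += 1
        else:
            penalty += 1
            j += 1  # advance only the input series
    # when one series ends, any uncompared symbols remaining in the
    # other series are each added to the penalty
    penalty += (len(dict_seq) - i) + (len(input_seq) - j)
    return penalty

# The worked example: dictionary 'a*uaei' vs input 'a*u*aeie'
# (one mismatch at step 4, one leftover 'e' at step 8)
print(penalty_match('a*uaei', 'a*u*aeie'))  # prints 2
```

An exact repetition, e.g. `penalty_match('a*u', 'a*u')`, yields penalty 0, the best possible score.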

[0019]

Matching is performed in this way against every word in word dictionary 8, and words whose penalty is zero, or at or below a predetermined value, are output as recognition results. This character-string matching method is not limited to the word notation described above: the dictionary may instead use a notation such as that shown in FIG. 2B and be matched against the processed series from series processing (A) 13; alternatively, the word dictionary notation and the input phoneme series may both include consonants, with the matching method shown in FIG. 1B applied to these as well.

[0020]

Effects of the Invention

As described above, according to the present invention the matching method consists essentially of comparison and addition only, so the load on the CPU is small and even a small system can run it. When the series written in the word dictionary are limited to the seven kinds of symbols, as in the embodiment above, the dictionary is built simply by tracing the vowels that appear in each word, so it can be updated easily. Because the series are centered on vowels, no special knowledge of phonemes is required. Given basic knowledge, such as the devoicing of vowels or the fact that a slight "e" sound is mixed in during the transition from "a" to "i", the recognition rate can be raised further.

[0021]

A recognition experiment using this matching method, running on a single-chip DSP, was carried out with 30 subjects (15 male, 15 female). The recognition rate was 90.3% for the male subjects, 94.2% for the female subjects, and 92.3% overall.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a block diagram showing an example of the functional configuration of a word speech recognition apparatus to which the method of the invention is applied, and FIG. 1B is a diagram illustrating its matching method.

FIG. 2 is a diagram showing a processing example of series processing (A) 13 in FIG. 1A.

FIG. 3 is a diagram showing an example of the contents of word dictionary 8.

FIG. 4 is a block diagram showing the functional configuration of a conventional phoneme-based word speech recognition apparatus.

Continuation of the front page: (72) Inventor Yoshio Nakadai, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo, Nippon Telegraph and Telephone Corporation; (72) Inventor Yoshitake Suzuki, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo, Nippon Telegraph and Telephone Corporation; (72) Inventor Shunei Kurokawa, 1-1-3 Gotenyama, Musashino-shi, Tokyo, NTT Advanced Technology Corporation; (72) Inventor Yamato Sato, 1-1-3 Gotenyama, Musashino-shi, Tokyo, NTT Advanced Technology Corporation

Claims (3)

[Claims]

1. A character-string matching method for word speech recognition in which recognition is performed by matching a word dictionary expressed as series of phoneme symbols against the phoneme time series obtained by short-time analysis of input speech, wherein agreement between the phoneme symbols of the dictionary phoneme series and of the phoneme series of the input speech is checked one symbol at a time from the beginning of the word, and on a mismatch a penalty is increased by +1 while the current phoneme of the dictionary phoneme series is compared against the next phoneme symbol in the phoneme series of the input speech.

2. The character-string matching method for word speech recognition according to claim 1, wherein the input phoneme series and the word phoneme series of the word dictionary are each composed of one or more of vowels, a fricative, and a silent part.

3. The character-string matching method for word speech recognition according to claim 2, wherein the silent parts at the beginning and end of the input phoneme series are removed, the phoneme of each transient discontinuous part is replaced with the immediately preceding phoneme, and each part in which the same phoneme continues is replaced with that single phoneme, before said matching against each word of the word dictionary is performed.
JP9339586A 1997-12-10 1997-12-10 Character string matching method for word speech recognition Pending JPH11175087A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP9339586A JPH11175087A (en) 1997-12-10 1997-12-10 Character string matching method for word speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP9339586A JPH11175087A (en) 1997-12-10 1997-12-10 Character string matching method for word speech recognition

Publications (1)

Publication Number Publication Date
JPH11175087A true JPH11175087A (en) 1999-07-02

Family

ID=18328888

Family Applications (1)

Application Number Title Priority Date Filing Date
JP9339586A Pending JPH11175087A (en) 1997-12-10 1997-12-10 Character string matching method for word speech recognition

Country Status (1)

Country Link
JP (1) JPH11175087A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017191166A (en) * 2016-04-12 2017-10-19 富士通株式会社 Voice recognition device, voice recognition method and voice recognition program
US10733986B2 (en) 2016-04-12 2020-08-04 Fujitsu Limited Apparatus, method for voice recognition, and non-transitory computer-readable storage medium
CN111540361A (en) * 2020-03-26 2020-08-14 北京搜狗科技发展有限公司 Voice processing method, device and medium
CN111540361B (en) * 2020-03-26 2023-08-18 北京搜狗科技发展有限公司 Voice processing method, device and medium

Similar Documents

Publication Publication Date Title
CN110211565B (en) Dialect identification method and device and computer readable storage medium
CN108899009B (en) Chinese speech synthesis system based on phoneme
CN110570876B (en) Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium
KR20060049290A (en) Mixed-lingual text to speech
CN115485766A (en) Speech synthesis prosody using BERT models
JP2005208652A (en) Segmental tonal modeling for tonal language
JPH05265483A (en) Voice recognizing method for providing plural outputs
Anoop et al. Automatic speech recognition for Sanskrit
JP2022133392A (en) Speech synthesis method and device, electronic apparatus, and storage medium
Oo et al. Burmese speech corpus, finite-state text normalization and pronunciation grammars with an application to text-to-speech
CN113450758B (en) Speech synthesis method, apparatus, device and medium
CN114242093A (en) Voice tone conversion method and device, computer equipment and storage medium
CN108109610B (en) Simulated sounding method and simulated sounding system
CN109859746B (en) TTS-based voice recognition corpus generation method and system
Hanifa et al. Malay speech recognition for different ethnic speakers: an exploratory study
CN113539239B (en) Voice conversion method and device, storage medium and electronic equipment
Abujar et al. A comprehensive text analysis for Bengali TTS using unicode
JPH11175087A (en) Character string matching method for word speech recognition
Kurian et al. Connected digit speech recognition system for Malayalam language
Nga et al. A Survey of Vietnamese Automatic Speech Recognition
Barros et al. Maximum entropy motivated grapheme-to-phoneme, stress and syllable boundary prediction for Portuguese text-to-speech
JP2004021207A (en) Phoneme recognizing method, phoneme recognition system and phoneme recognizing program
Ferreiros et al. Improving continuous speech recognition in Spanish by phone-class semicontinuous HMMs with pausing and multiple pronunciations
CN110610721A (en) Detection system and method based on lyric singing accuracy
JPH11175086A (en) Preprocessing method for character string matching of word speech recognition