JP3591695B2 - Topic extraction method and its program recording medium - Google Patents


Info

Publication number
JP3591695B2
Authority
JP
Japan
Prior art keywords
word
likelihood
topic
relevance
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP04965898A
Other languages
Japanese (ja)
Other versions
JPH11249691A (en)
Inventor
克年 大附
昭一 松永
Current Assignee
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP04965898A priority Critical patent/JP3591695B2/en
Publication of JPH11249691A publication Critical patent/JPH11249691A/en
Application granted granted Critical
Publication of JP3591695B2 publication Critical patent/JP3591695B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Document Processing Apparatus (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Description

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a method for extracting a topic representing the content of continuously uttered speech, and a program recording medium therefor.
[0002]
[Prior art]
In topic extraction, continuous speech recognition is performed on continuously uttered speech, and combinations of words representing the content of the speech are extracted from the resulting word sequence. Words representing the content of the speech (topic words) are extracted on the basis of a probability model trained on a large amount of text data. See, for example,
Imai, Kobayashi, Ando, "Topic Extraction from Japanese News Using a Topic Mixture Model", The Acoustical Society of Japan, 1997 Fall Meeting, 3-1-7, pp. 99-100.
[0003]
Ohtsuki, Matsuoka, Matsunaga, Furui, "Topic Extraction Based on Continuous Speech Recognition for News Speech", The Acoustical Society of Japan, 1997 Fall Meeting, 3-1-8, pp. 101-102.
and Japanese Patent Application No. 9-160954.
[0004]
The extraction of topic words from continuously uttered speech is briefly described as follows. As shown in FIG. 2, the input speech undergoes continuous speech recognition with reference to an acoustic model and a language model (S1), and a word sequence is obtained as the recognition result. The relevance of each word in this word sequence to each topic word is determined with reference to a topic extraction model, and a relevance sequence is created for each topic word (S2). That is, as shown for example in FIG. 3A, the topic word model stores the relevances r111, r112, r113, ... between topic word t1 and words W11, W12, W13, ..., the relevances r221, r222, r223, ... between topic word t2 and words W21, W22, W23, ..., and similarly, for each topic word, its relevance to each word.
[0005]
When the word sequence resulting from continuous speech recognition is, for example, W1, W2, ..., Wn as shown in FIG. 3B, the topic word extraction model is consulted: the relevance between topic word t1 and word W1 is found to be r11, between t1 and W2 to be r12, ..., and between t1 and Wn to be r1n, giving a relevance sequence. Likewise, the relevances between topic word t2 and the words W1, W2, ..., Wn are found to be r21, r22, ..., r2n, giving another relevance sequence, and in the same way a relevance sequence is obtained for every topic word over each word of the word sequence.
[0006]
The sum of the relevances in the relevance sequence of each topic word t1, ..., tM is then computed, i.e. R1 = Σ(k=1..n) r1k for t1, R2 = Σ(k=1..n) r2k for t2, and so on, giving the relevances R1, R2, ... of the topic words t1, t2, ... to the word sequence W1, W2, ..., Wn (S3). The Q topic words (Q being an integer of 1 or more) with the largest relevances R1, R2, ... are output, in descending order, as the topic words for the input speech (S4).
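The baseline procedure above (summing each topic word's relevances over the recognized word sequence, then taking the top Q) can be sketched as follows; the function name and toy data are illustrative, not from the patent.

```python
# Minimal sketch of unweighted topic extraction: R(W, t) = sum of r_kj
# over all words in the sequence, then output the top-Q topic words.

def extract_topics(word_sequence, relevance, q=2):
    """relevance maps (topic_word, word) -> relevance score r_kj."""
    scores = {}
    for topic in {t for (t, _) in relevance}:
        # Sum the relevance of this topic word to every recognized word
        scores[topic] = sum(relevance.get((topic, w), 0.0) for w in word_sequence)
    # Return the Q topic words with the largest summed relevance
    return sorted(scores, key=scores.get, reverse=True)[:q]

# Toy model: two topic words, three recognized words
rel = {
    ("economy", "bank"): 0.9, ("economy", "rate"): 0.8, ("economy", "game"): 0.1,
    ("sports", "bank"): 0.1, ("sports", "rate"): 0.2, ("sports", "game"): 0.9,
}
print(extract_topics(["bank", "rate", "game"], rel, q=1))  # prints ['economy']
```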
[0007]
As described above, the conventional topic extraction method treats all recognized words with equal weight, even though the continuous speech recognition result contains errors.
[0008]
[Problems to be solved by the invention]
In the conventional topic extraction method based on continuous speech recognition, all words in the word sequence obtained as the speech recognition result are treated with equal weight. The continuous speech recognition result for broadcast news contains roughly 10% to 30% misrecognized words; since these erroneous words are treated with the same weight as correctly recognized words, they degrade topic extraction performance.
[0009]
SUMMARY OF THE INVENTION
An object of the present invention is to provide a topic extraction method with improved performance, in which the weight given to each word is varied according to the likelihood of that word in the speech recognition result, thereby reducing the influence of erroneous words on topic extraction.
[0010]
[Means for Solving the Problems]
According to the topic extraction method of the present invention, a word sequence, the acoustic likelihood of each word, and the linguistic likelihood of each word are obtained from the continuous speech recognition result for the input speech. The relevance between each topic word and each word in the word sequence is weighted by a reliability derived from the likelihood of that word, a relevance sequence is obtained for each topic word, and the topic words corresponding to the largest summed relevances are output as the topics of the input speech.
[0011]
BEST MODE FOR CARRYING OUT THE INVENTION
A topic extraction method according to an embodiment of the present invention using a weight coefficient based on the likelihood at the time of speech recognition will be described with reference to FIG.
First, the method of obtaining a reliability based on the likelihood of each word is described. The word likelihoods obtained during continuous speech recognition (S1) comprise an acoustic likelihood, expressing acoustic certainty, and a linguistic likelihood, expressing linguistic certainty. In a continuous speech recognition system using an acoustic model based on phoneme HMMs (Hidden Markov Models) (S2) and a word n-gram language model (S3), the acoustic likelihood of each word is the cumulative value of the likelihoods obtained at each analysis frame when the parameters of the word's speech interval are fed to a word model formed by concatenating phoneme HMMs. The linguistic likelihood is based on the probability that the word appears after the preceding word (for a word bigram) or after the two preceding words (for a word trigram). During recognition, these two likelihoods are added in the log domain with a weighting that balances the acoustic and language models, yielding the word likelihood, and the word sequence that maximizes the cumulative likelihood is output as the recognition result.
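The log-domain combination described above can be sketched minimally as follows; the function name and the scale constant are our assumptions for illustration, not values from the patent.

```python
import math

# Sketch: a word's recognition score as the sum, in the log domain, of its
# acoustic log-likelihood (accumulated over the word's analysis frames) and
# its language-model log-probability, scaled to balance the two models.
def word_score(acoustic_loglik, lm_prob, lm_weight=10.0):
    # acoustic_loglik: accumulated acoustic log-likelihood of the word
    # lm_prob: n-gram probability of the word given its context
    # lm_weight: language-model scale factor (hypothetical value)
    return acoustic_loglik + lm_weight * math.log(lm_prob)
```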
[0012]
Since the recognition result is the word sequence whose overall likelihood is maximal (S4), individual words within it may nevertheless show local drops in likelihood. Even when the word likelihood is high, the acoustic likelihood may be high while the linguistic likelihood is low, or vice versa. A word with a low acoustic or linguistic likelihood has low reliability and is likely to be an error.
[0013]
Since the acoustic likelihood (S5) is, as noted above, a value accumulated over the word's speech interval, it cannot be used to compare words of different interval lengths. The acoustic likelihood of a word is therefore normalized by the interval length (the number of analysis frames) (S6) to obtain the acoustic likelihood per analysis frame (normalized acoustic likelihood). Over the word sequences of the continuous speech recognition results for 142 sentences of broadcast news speech, the normalized acoustic likelihood had a maximum of 54.91 and a minimum of 12.76. Because this value is awkward to use directly as a reliability, it is converted into a weight coefficient sk using equation (1) (S7).
[0014]
sk = ((Pk − Lmin) / (Lmax − Lmin)) · (Smax − Smin) + Smin ... (1)
That is, a likelihood Pk lying between a maximum Lmax and a minimum Lmin is converted into a value lying between a maximum Smax and a minimum Smin.
The equation converting the normalized acoustic likelihood a into a weight coefficient wa with maximum 1 and minimum 0 is as follows.
[0015]
wa = ((a − 12.76) / (54.91 − 12.76)) · (1 − 0) + 0 ... (2)
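Equations (1) and (2) amount to a linear rescaling of the likelihood onto a target range. A direct transcription (function names are ours):

```python
def likelihood_to_weight(p, l_min, l_max, s_min=0.0, s_max=1.0):
    """Eq. (1): map a likelihood p from [l_min, l_max] linearly onto
    a weight in [s_min, s_max]."""
    return (p - l_min) / (l_max - l_min) * (s_max - s_min) + s_min

# Eq. (2), using the normalized acoustic likelihood range reported in the
# text (minimum 12.76, maximum 54.91 over the 142 news sentences):
def acoustic_weight(a):
    return likelihood_to_weight(a, 12.76, 54.91)
```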
Since the linguistic likelihood is a value based on the probability of word occurrence, no normalization by interval length is needed. The linguistic likelihood (S8) can likewise be converted with equation (1) into a weight coefficient with maximum 1 and minimum 0.
Next, the method of performing topic extraction using these weight coefficients is described. The topic extraction model S10 stores the relevance between each topic word and each word. Referring to this topic extraction model S10, the relevance of each topic word to the word sequence is, as described in the prior-art section, obtained as the sum R(W, tj) of the relevances rkj between topic word tj and each word wk in the word sequence W, as in the following equation.
[0016]
R(W, tj) = Σ(k=1..n) rkj ... (3)
where n is the number of words in the word sequence W. In conventional topic extraction, a plurality of topic words are extracted in descending order of their relevance to this word sequence.
In topic extraction according to the present invention using the weight coefficients sk based on the likelihood of each word, the relevance rkj of each word in the word sequence is weighted by sk, and the sum of the relevances is obtained as in the following equation (S9).
[0017]
R(W, tj) = Σ(k=1..n) sk · rkj ... (4)
Alternatively, the product of the acoustic weight coefficient ak and the linguistic weight coefficient bk can be used as the word weight coefficient, and the sum of the relevances obtained as:
R(W, tj) = Σ(k=1..n) (ak · bk) · rkj ... (5)
By performing topic extraction with the acoustic weight coefficients ak and the linguistic weight coefficients bk in this way, the relevance contributed by misrecognized words, whose acoustic or linguistic likelihood is low, is estimated low, and the degradation of topic extraction performance caused by recognition errors is suppressed.
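Equations (4) and (5) can be sketched directly; the function name and toy values below are ours, not from the patent.

```python
def weighted_relevance(words, topic, relevance, weights):
    """Eq. (4): R(W, t_j) = sum_k s_k * r_kj. For eq. (5), pass weights
    precomputed as the products a_k * b_k."""
    return sum(w_k * relevance.get((topic, word), 0.0)
               for w_k, word in zip(weights, words))

# Toy relevance table and a confidently recognized "bank" (weight 1.0)
# next to a doubtful "gain" (weight 0.2):
rel = {("economy", "bank"): 0.9, ("economy", "gain"): 0.5}
score = weighted_relevance(["bank", "gain"], "economy", rel, [1.0, 0.2])
```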
[0018]
Furthermore, by using the top several word-sequence candidates of the continuous speech recognition result as the input word sequence, topic extraction can give extra weight to correct words that appear stably across the candidates. As is clear from the above description, values computed from the linguistic and acoustic likelihoods were used as the reliability-based weighting of each word at recognition time; however, the likelihoods themselves may be used, as may values transformed by conversions other than the example above. Further, when the difference between the linguistic likelihood and the acoustic likelihood is large, the recognition likelihood of that word can be considered low; a weight that lowers the reliability as the difference between the values corresponding to these two likelihoods grows may therefore be applied to the relevance.
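One way to realize the difference-based weighting suggested above, lowering a word's reliability as the gap between its acoustic and linguistic weights grows, is sketched below; the penalty form and the factor alpha are our assumptions, not specified in the patent.

```python
def reliability(acoustic_w, language_w, alpha=0.5):
    # Base confidence: product of the two [0, 1] weight coefficients,
    # as in eq. (5); penalized when the two likelihoods disagree strongly.
    gap_penalty = 1.0 - alpha * abs(acoustic_w - language_w)
    return acoustic_w * language_w * gap_penalty
```

With alpha = 0, this reduces to the word weight ak · bk of equation (5); larger alpha punishes acoustic/linguistic disagreement more heavily.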
[0019]
[Effects of the Invention]
For the evaluation, the present invention was applied to the speech recognition results of a large-vocabulary (20,000-word) continuous speech recognition system for broadcast news speech. The topics assigned by hand by three subjects to transcriptions of the evaluation speech served as the reference. When 10 topic words were extracted, the precision (the proportion of correct topic words among the extracted topic words) was 62.4% without weight coefficients; applying the present invention improved it to 63.1% with acoustic weights, 66.9% with linguistic weights, and 68.3% with word weights (the product of the acoustic and linguistic weights).
[0020]
Performing topic extraction with word weights on the top 10 word sequences of the speech recognition result brought further improvement: in the 10-word extraction case, 55.6% of the topic extraction performance lost to recognition errors was recovered.
According to the present invention, using the acoustic and linguistic likelihood of each word in the recognized word sequence to weight the relevance between topic words and words makes it possible to extract topics accurately from a word sequence containing recognition errors.
[0021]
In other words, the reliability of each word can be taken into account; by reducing the weight coefficients of low-reliability words, more accurate topics can be extracted than with a topic extraction method that treats every word of the recognized word sequence with equal weight.
[Brief description of the drawings]
FIG. 1 is a flowchart showing a topic extraction method of the present invention.
FIG. 2 is a flowchart showing a conventional topic extraction method.
FIG. 3A is a diagram illustrating an example of a topic extraction model, and FIG. 3B is a diagram illustrating an example of a word sequence and the relevance sequences for the topic words.

Claims (8)

1. A topic extraction method using a topic extraction model in which a plurality of topic words and the relevance between each topic word and each general word are stored, for extracting topic words representing the content of a word sequence obtained as the continuous speech recognition result of input speech, the method comprising:
for each topic word in the topic extraction model, determining the relevance between that topic word and each word of the input word sequence by reference to the topic extraction model, to form a relevance sequence;
summing the relevances of each relevance sequence to obtain the relevance of each topic word to the word sequence; and
outputting the topic words corresponding to the Q largest relevances (Q being an integer of 1 or more), in descending order of relevance to the word sequence;
wherein the recognition likelihood of each word is obtained from both a value corresponding to the acoustic likelihood and a value corresponding to the linguistic likelihood of the word at recognition time;
a difference value between the value corresponding to the acoustic likelihood and the value corresponding to the linguistic likelihood of each word is obtained;
the reliability of each word is the recognition likelihood of the word weighted such that the larger the difference value for the word, the lower the recognition likelihood; and
the sum of the relevances is obtained by weighting the relevance between each topic word of the topic extraction model and each word of the word sequence by the reliability of the corresponding word.
2. The topic extraction method according to claim 1, wherein the acoustic likelihood and the linguistic likelihood at the time of continuous speech recognition are each converted so as to take a value within a specified range, yielding the value corresponding to the acoustic likelihood and the value corresponding to the linguistic likelihood.
3. The topic extraction method according to claim 2, wherein, when the acoustic or linguistic likelihood Pk at the time of continuous speech recognition takes values between a maximum Lmax and a minimum Lmin, the likelihood Pk is converted by equation (1) below into sk, taking a value between a maximum Smax and a minimum Smin, and sk is used as the value corresponding to the likelihood:
sk = ((Pk − Lmin) / (Lmax − Lmin)) · (Smax − Smin) + Smin ... (1)
4. The topic extraction method according to any one of claims 1 to 3, wherein the word sequences of a plurality of top candidates obtained during continuous speech recognition are used as the input word sequence.
5. A computer-readable recording medium on which is recorded a program for causing a computer to execute the steps of:
continuously recognizing input speech with reference to an acoustic model and a language model, and obtaining a word sequence as the recognition result;
obtaining the recognition likelihood of each word of the word sequence from both a value corresponding to the acoustic likelihood and a value corresponding to the linguistic likelihood of the word at recognition time;
obtaining a difference value between the value corresponding to the acoustic likelihood and the value corresponding to the linguistic likelihood of each word;
taking as the reliability of each word the recognition likelihood of the word weighted such that the larger the difference value for the word, the lower the recognition likelihood;
referring to a topic extraction model in which a plurality of topic words and the relevance between each topic word and each general word are stored, and, for each topic word, forming a relevance sequence in which the relevance to each word of the recognized word sequence is weighted by the reliability of the corresponding word; and
summing the relevances of each relevance sequence and outputting, in descending order, the topic words corresponding to the Q largest sums (Q being an integer of 1 or more) as the topic words of the input speech.
6. The recording medium according to claim 5, wherein the program further comprises the step of converting the acoustic likelihood and the linguistic likelihood at the time of continuous speech recognition so as to each take a value within a specified range, yielding the value corresponding to the acoustic likelihood and the value corresponding to the linguistic likelihood.
7. The recording medium according to claim 6, wherein the step of obtaining the value corresponding to the likelihood converts, when the acoustic or linguistic likelihood Pk at the time of continuous speech recognition takes values between a maximum Lmax and a minimum Lmin, the likelihood Pk by equation (1) below into sk, taking a value between a maximum Smax and a minimum Smin, and uses sk as the value corresponding to the likelihood:
sk = ((Pk − Lmin) / (Lmax − Lmin)) · (Smax − Smin) + Smin ... (1)
8. The recording medium according to any one of claims 5 to 7, wherein the word sequences of a plurality of top candidates obtained during the continuous speech recognition are used as the input word sequence.
JP04965898A 1998-03-02 1998-03-02 Topic extraction method and its program recording medium Expired - Fee Related JP3591695B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP04965898A JP3591695B2 (en) 1998-03-02 1998-03-02 Topic extraction method and its program recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP04965898A JP3591695B2 (en) 1998-03-02 1998-03-02 Topic extraction method and its program recording medium

Publications (2)

Publication Number Publication Date
JPH11249691A JPH11249691A (en) 1999-09-17
JP3591695B2 true JP3591695B2 (en) 2004-11-24

Family

ID=12837294

Family Applications (1)

Application Number Title Priority Date Filing Date
JP04965898A Expired - Fee Related JP3591695B2 (en) 1998-03-02 1998-03-02 Topic extraction method and its program recording medium

Country Status (1)

Country Link
JP (1) JP3591695B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7062966B2 (en) * 2018-01-19 2022-05-09 富士フイルムビジネスイノベーション株式会社 Voice analyzer, voice analysis system, and program

Also Published As

Publication number Publication date
JPH11249691A (en) 1999-09-17

Similar Documents

Publication Publication Date Title
US6385579B1 (en) Methods and apparatus for forming compound words for use in a continuous speech recognition system
US7254529B2 (en) Method and apparatus for distribution-based language model adaptation
EP1128361B1 (en) Language models for speech recognition
JP3933750B2 (en) Speech recognition method and apparatus using continuous density Hidden Markov model
US5884259A (en) Method and apparatus for a time-synchronous tree-based search strategy
US20060020461A1 (en) Speech processing apparatus, speech processing method, program, and recording medium
WO2004034378A1 (en) Language model creation/accumulation device, speech recognition device, language model creation method, and speech recognition method
JPWO2007142102A1 (en) Language model learning system, language model learning method, and language model learning program
Chen et al. Lightly supervised and data-driven approaches to mandarin broadcast news transcription
US20100324897A1 (en) Audio recognition device and audio recognition method
US6016470A (en) Rejection grammar using selected phonemes for speech recognition system
JP5274191B2 (en) Voice recognition device
JP3819896B2 (en) Speech recognition method, apparatus for implementing this method, program, and recording medium
JP3660512B2 (en) Voice recognition method, apparatus and program recording medium
Ogawa et al. Estimating speech recognition accuracy based on error type classification
JP2002358097A (en) Voice recognition device
JP3444108B2 (en) Voice recognition device
JP2886121B2 (en) Statistical language model generation device and speech recognition device
JP3591695B2 (en) Topic extraction method and its program recording medium
KR100480790B1 (en) Method and apparatus for continous speech recognition using bi-directional n-gram language model
JP2005275348A (en) Speech recognition method, device, program and recording medium for executing the method
Hwang et al. Building a highly accurate Mandarin speech recognizer
JP4741452B2 (en) Language model creation device, language model creation program, speech recognition device, and speech recognition program
JPH08241096A (en) Speech recognition method
CN117351944B (en) Speech recognition method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20040305

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20040330

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20040531

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20040727

RD01 Notification of change of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7426

Effective date: 20040819

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20040819

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080903

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090903

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100903

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110903

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120903

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130903

Year of fee payment: 9

LAPS Cancellation because of no payment of annual fees