JP3591695B2 - Topic extraction method and its program recording medium - Google Patents


Info

Publication number
JP3591695B2
Authority
JP
Japan
Prior art keywords
word
likelihood
topic
relevance
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP04965898A
Other languages
Japanese (ja)
Other versions
JPH11249691A (en)
Inventor
克年 大附
昭一 松永
Current Assignee
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP04965898A priority Critical patent/JP3591695B2/en
Publication of JPH11249691A publication Critical patent/JPH11249691A/en
Application granted granted Critical
Publication of JP3591695B2 publication Critical patent/JP3591695B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Document Processing Apparatus (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Description

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a method for extracting a topic representing the content of continuously uttered speech, and a program recording medium therefor.
[0002]
[Prior art]
In topic extraction, continuous speech recognition is performed on continuously uttered speech, and combinations of words representing the content of the speech are extracted from the resulting word sequence. Words representing the content of the speech (topic words) are extracted on the basis of a probability model trained on a large amount of text data. See, for example,
Imai, Kobayashi, Ando, "Topic Extraction from Japanese News Using a Topic Mixture Model", The Acoustical Society of Japan, 1997 Fall Meeting, 3-1-7, pp. 99-100.
[0003]
Ohtsuki, Matsuoka, Matsunaga, Furui, "Topic Extraction Based on Continuous Speech Recognition for News Speech", The Acoustical Society of Japan, 1997 Fall Meeting, 3-1-8, pp. 101-102.
and Japanese Patent Application No. 9-160954.
[0004]
The extraction of topic words from continuously uttered speech is briefly described as follows. As shown in FIG. 2, the input speech undergoes continuous speech recognition with reference to an acoustic model and a language model (S1), and a word sequence is obtained as the recognition result. The relevance of each word in this word sequence to each topic word is determined with reference to a topic extraction model, and a relevance sequence is created for each topic word (S2). That is, as shown for example in FIG. 3A, the topic word model stores the relevances r111, r112, r113, ... between topic word t1 and words W11, W12, W13, ..., the relevances r221, r222, r223, ... between topic word t2 and words W21, W22, W23, ..., and similarly, for each topic word, its relevance to each word.
[0005]
When the word sequence resulting from continuous speech recognition is, for example, W1, W2, ..., Wn as shown in FIG. 3B, the topic word extraction model is consulted: the relevance between topic word t1 and word W1 is found to be r11, between t1 and W2 to be r12, ..., and between t1 and Wn to be r1n, giving a relevance sequence. Likewise, the relevances between topic word t2 and the words W1, W2, ..., Wn are found to be r21, r22, ..., r2n, giving another relevance sequence, and in the same way a relevance sequence is obtained for every topic word over each word of the word sequence.
[0006]
The sum of the relevances in the relevance sequence of each topic word t1, ..., tM is then computed, i.e. R1 = Σ(k=1..n) r1k for t1, R2 = Σ(k=1..n) r2k for t2, and so on, giving the relevances R1, R2, ... of the topic words t1, t2, ... to the word sequence W1, W2, ..., Wn (S3). The Q topic words (Q being an integer of 1 or more) with the largest relevances R1, R2, ... are output, in descending order, as the topic words for the input speech (S4).
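The baseline procedure above (summing each topic word's relevances over the recognized word sequence, then taking the top Q) can be sketched as follows; the function name and toy data are illustrative, not from the patent.

```python
# Minimal sketch of unweighted topic extraction: R(W, t) = sum of r_kj
# over all words in the sequence, then output the top-Q topic words.

def extract_topics(word_sequence, relevance, q=2):
    """relevance maps (topic_word, word) -> relevance score r_kj."""
    scores = {}
    for topic in {t for (t, _) in relevance}:
        # Sum the relevance of this topic word to every recognized word
        scores[topic] = sum(relevance.get((topic, w), 0.0) for w in word_sequence)
    # Return the Q topic words with the largest summed relevance
    return sorted(scores, key=scores.get, reverse=True)[:q]

# Toy model: two topic words, three recognized words
rel = {
    ("economy", "bank"): 0.9, ("economy", "rate"): 0.8, ("economy", "game"): 0.1,
    ("sports", "bank"): 0.1, ("sports", "rate"): 0.2, ("sports", "game"): 0.9,
}
print(extract_topics(["bank", "rate", "game"], rel, q=1))  # prints ['economy']
```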
[0007]
As described above, the conventional topic extraction method treats all recognized words with equal weight, even though the continuous speech recognition result contains errors.
[0008]
[Problems to be solved by the invention]
In the conventional topic extraction method based on continuous speech recognition, all words in the word sequence obtained as the speech recognition result are treated with equal weight. The continuous speech recognition result for broadcast news contains roughly 10% to 30% misrecognized words; since these erroneous words are treated with the same weight as correctly recognized words, they degrade topic extraction performance.
[0009]
SUMMARY OF THE INVENTION
An object of the present invention is to provide a topic extraction method with improved performance, in which the weight given to each word is varied according to the likelihood of that word in the speech recognition result, thereby reducing the influence of erroneous words on topic extraction.
[0010]
[Means for Solving the Problems]
According to the topic extraction method of the present invention, a word sequence, the acoustic likelihood of each word, and the linguistic likelihood of each word are obtained from the continuous speech recognition result for the input speech. The relevance between each topic word and each word in the word sequence is weighted by a reliability derived from the likelihood of that word, a relevance sequence is obtained for each topic word, and the topic words corresponding to the largest summed relevances are output as the topics of the input speech.
[0011]
BEST MODE FOR CARRYING OUT THE INVENTION
A topic extraction method according to an embodiment of the present invention using a weight coefficient based on the likelihood at the time of speech recognition will be described with reference to FIG.
First, the method of obtaining a reliability based on the likelihood of each word is described. The word likelihoods obtained during continuous speech recognition (S1) comprise an acoustic likelihood, expressing acoustic certainty, and a linguistic likelihood, expressing linguistic certainty. In a continuous speech recognition system using an acoustic model based on phoneme HMMs (Hidden Markov Models) (S2) and a word n-gram language model (S3), the acoustic likelihood of each word is the cumulative value of the likelihoods obtained at each analysis frame when the parameters of the word's speech interval are fed to a word model formed by concatenating phoneme HMMs. The linguistic likelihood is based on the probability that the word appears after the preceding word (for a word bigram) or after the two preceding words (for a word trigram). During recognition, these two likelihoods are added in the log domain with a weighting that balances the acoustic and language models, yielding the word likelihood, and the word sequence that maximizes the cumulative likelihood is output as the recognition result.
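The log-domain combination described above can be sketched minimally as follows; the function name and the scale constant are our assumptions for illustration, not values from the patent.

```python
import math

# Sketch: a word's recognition score as the sum, in the log domain, of its
# acoustic log-likelihood (accumulated over the word's analysis frames) and
# its language-model log-probability, scaled to balance the two models.
def word_score(acoustic_loglik, lm_prob, lm_weight=10.0):
    # acoustic_loglik: accumulated acoustic log-likelihood of the word
    # lm_prob: n-gram probability of the word given its context
    # lm_weight: language-model scale factor (hypothetical value)
    return acoustic_loglik + lm_weight * math.log(lm_prob)
```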
[0012]
Since the recognition result is the word sequence whose overall likelihood is maximal (S4), individual words within it may nevertheless show local drops in likelihood. Even when the word likelihood is high, the acoustic likelihood may be high while the linguistic likelihood is low, or vice versa. A word with a low acoustic or linguistic likelihood has low reliability and is likely to be an error.
[0013]
Since the acoustic likelihood (S5) is, as noted above, a value accumulated over the word's speech interval, it cannot be used to compare words of different interval lengths. The acoustic likelihood of a word is therefore normalized by the interval length (the number of analysis frames) (S6) to obtain the acoustic likelihood per analysis frame (normalized acoustic likelihood). Over the word sequences of the continuous speech recognition results for 142 sentences of broadcast news speech, the normalized acoustic likelihood had a maximum of 54.91 and a minimum of 12.76. Because this value is awkward to use directly as a reliability, it is converted into a weight coefficient sk using equation (1) (S7).
[0014]
sk = ((Pk − Lmin) / (Lmax − Lmin)) · (Smax − Smin) + Smin ... (1)
That is, a likelihood Pk lying between a maximum Lmax and a minimum Lmin is converted into a value lying between a maximum Smax and a minimum Smin.
The equation converting the normalized acoustic likelihood a into a weight coefficient wa with maximum 1 and minimum 0 is as follows.
[0015]
wa = ((a − 12.76) / (54.91 − 12.76)) · (1 − 0) + 0 ... (2)
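Equations (1) and (2) amount to a linear rescaling of the likelihood onto a target range. A direct transcription (function names are ours):

```python
def likelihood_to_weight(p, l_min, l_max, s_min=0.0, s_max=1.0):
    """Eq. (1): map a likelihood p from [l_min, l_max] linearly onto
    a weight in [s_min, s_max]."""
    return (p - l_min) / (l_max - l_min) * (s_max - s_min) + s_min

# Eq. (2), using the normalized acoustic likelihood range reported in the
# text (minimum 12.76, maximum 54.91 over the 142 news sentences):
def acoustic_weight(a):
    return likelihood_to_weight(a, 12.76, 54.91)
```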
Since the linguistic likelihood is a value based on the probability of word occurrence, no normalization by interval length is needed. The linguistic likelihood (S8) can likewise be converted with equation (1) into a weight coefficient with maximum 1 and minimum 0.
Next, the method of performing topic extraction using these weight coefficients is described. The topic extraction model S10 stores the relevance between each topic word and each word. Referring to this topic extraction model S10, the relevance of each topic word to the word sequence is, as described in the prior-art section, obtained as the sum R(W, tj) of the relevances rkj between topic word tj and each word wk in the word sequence W, as in the following equation.
[0016]
R(W, tj) = Σ(k=1..n) rkj ... (3)
where n is the number of words in the word sequence W. In conventional topic extraction, a plurality of topic words are extracted in descending order of their relevance to this word sequence.
In topic extraction according to the present invention using the weight coefficients sk based on the likelihood of each word, the relevance rkj of each word in the word sequence is weighted by sk, and the sum of the relevances is obtained as in the following equation (S9).
[0017]
R(W, tj) = Σ(k=1..n) sk · rkj ... (4)
Alternatively, the product of the acoustic weight coefficient ak and the linguistic weight coefficient bk can be used as the word weight coefficient, and the sum of the relevances obtained as:
R(W, tj) = Σ(k=1..n) (ak · bk) · rkj ... (5)
By performing topic extraction with the acoustic weight coefficients ak and the linguistic weight coefficients bk in this way, the relevance contributed by misrecognized words, whose acoustic or linguistic likelihood is low, is estimated low, and the degradation of topic extraction performance caused by recognition errors is suppressed.
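Equations (4) and (5) can be sketched directly; the function name and toy values below are ours, not from the patent.

```python
def weighted_relevance(words, topic, relevance, weights):
    """Eq. (4): R(W, t_j) = sum_k s_k * r_kj. For eq. (5), pass weights
    precomputed as the products a_k * b_k."""
    return sum(w_k * relevance.get((topic, word), 0.0)
               for w_k, word in zip(weights, words))

# Toy relevance table and a confidently recognized "bank" (weight 1.0)
# next to a doubtful "gain" (weight 0.2):
rel = {("economy", "bank"): 0.9, ("economy", "gain"): 0.5}
score = weighted_relevance(["bank", "gain"], "economy", rel, [1.0, 0.2])
```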
[0018]
Furthermore, by using the top several word-sequence candidates of the continuous speech recognition result as the input word sequence, topic extraction can give extra weight to correct words that appear stably across the candidates. As is clear from the above description, values computed from the linguistic and acoustic likelihoods were used as the reliability-based weighting of each word at recognition time; however, the likelihoods themselves may be used, as may values transformed by conversions other than the example above. Further, when the difference between the linguistic likelihood and the acoustic likelihood is large, the recognition likelihood of that word can be considered low; a weight that lowers the reliability as the difference between the values corresponding to these two likelihoods grows may therefore be applied to the relevance.
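One way to realize the difference-based weighting suggested above, lowering a word's reliability as the gap between its acoustic and linguistic weights grows, is sketched below; the penalty form and the factor alpha are our assumptions, not specified in the patent.

```python
def reliability(acoustic_w, language_w, alpha=0.5):
    # Base confidence: product of the two [0, 1] weight coefficients,
    # as in eq. (5); penalized when the two likelihoods disagree strongly.
    gap_penalty = 1.0 - alpha * abs(acoustic_w - language_w)
    return acoustic_w * language_w * gap_penalty
```

With alpha = 0, this reduces to the word weight ak · bk of equation (5); larger alpha punishes acoustic/linguistic disagreement more heavily.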
[0019]
[Effects of the Invention]
For the evaluation, the present invention was applied to the speech recognition results of a large-vocabulary (20,000-word) continuous speech recognition system for broadcast news speech. The topics assigned by hand by three subjects to transcriptions of the evaluation speech served as the reference. When 10 topic words were extracted, the precision (the proportion of correct topic words among the extracted topic words) was 62.4% without weight coefficients; applying the present invention improved it to 63.1% with acoustic weights, 66.9% with linguistic weights, and 68.3% with word weights (the product of the acoustic and linguistic weights).
[0020]
Performing topic extraction with word weights on the top 10 word sequences of the speech recognition result brought further improvement: in the 10-word extraction case, 55.6% of the topic extraction performance lost to recognition errors was recovered.
According to the present invention, using the acoustic and linguistic likelihood of each word in the recognized word sequence to weight the relevance between topic words and words makes it possible to extract topics accurately from a word sequence containing recognition errors.
[0021]
In other words, the reliability of each word can be taken into account; by reducing the weight coefficients of low-reliability words, more accurate topics can be extracted than with a topic extraction method that treats every word of the recognized word sequence with equal weight.
[Brief description of the drawings]
FIG. 1 is a flowchart showing a topic extraction method of the present invention.
FIG. 2 is a flowchart showing a conventional topic extraction method.
FIG. 3A is a diagram illustrating an example of a topic extraction model, and FIG. 3B is a diagram illustrating an example of a word sequence and the relevance sequences for the topic words.

Claims (8)

1. A topic extraction method using a topic extraction model in which a plurality of topic words and the relevance between each topic word and each general word are stored, for extracting topic words representing the content of a word sequence obtained as the continuous speech recognition result of input speech, the method comprising:
for each topic word in the topic extraction model, determining the relevance between that topic word and each word of the input word sequence by reference to the topic extraction model, to form a relevance sequence;
summing the relevances of each relevance sequence to obtain the relevance of each topic word to the word sequence; and
outputting the topic words corresponding to the Q largest relevances (Q being an integer of 1 or more), in descending order of relevance to the word sequence;
wherein the recognition likelihood of each word is obtained from both a value corresponding to the acoustic likelihood and a value corresponding to the linguistic likelihood of the word at recognition time;
a difference value between the value corresponding to the acoustic likelihood and the value corresponding to the linguistic likelihood of each word is obtained;
the reliability of each word is the recognition likelihood of the word weighted such that the larger the difference value for the word, the lower the recognition likelihood; and
the sum of the relevances is obtained by weighting the relevance between each topic word of the topic extraction model and each word of the word sequence by the reliability of the corresponding word.
2. The topic extraction method according to claim 1, wherein the acoustic likelihood and the linguistic likelihood at the time of continuous speech recognition are each converted so as to take a value within a specified range, yielding the value corresponding to the acoustic likelihood and the value corresponding to the linguistic likelihood.
3. The topic extraction method according to claim 2, wherein, when the acoustic or linguistic likelihood Pk at the time of continuous speech recognition takes values between a maximum Lmax and a minimum Lmin, the likelihood Pk is converted by equation (1) below into sk, taking a value between a maximum Smax and a minimum Smin, and sk is used as the value corresponding to the likelihood:
sk = ((Pk − Lmin) / (Lmax − Lmin)) · (Smax − Smin) + Smin ... (1)
4. The topic extraction method according to any one of claims 1 to 3, wherein the word sequences of a plurality of top candidates obtained during continuous speech recognition are used as the input word sequence.
5. A computer-readable recording medium on which is recorded a program for causing a computer to execute the steps of:
continuously recognizing input speech with reference to an acoustic model and a language model, and obtaining a word sequence as the recognition result;
obtaining the recognition likelihood of each word of the word sequence from both a value corresponding to the acoustic likelihood and a value corresponding to the linguistic likelihood of the word at recognition time;
obtaining a difference value between the value corresponding to the acoustic likelihood and the value corresponding to the linguistic likelihood of each word;
taking as the reliability of each word the recognition likelihood of the word weighted such that the larger the difference value for the word, the lower the recognition likelihood;
referring to a topic extraction model in which a plurality of topic words and the relevance between each topic word and each general word are stored, and, for each topic word, forming a relevance sequence in which the relevance to each word of the recognized word sequence is weighted by the reliability of the corresponding word; and
summing the relevances of each relevance sequence and outputting, in descending order, the topic words corresponding to the Q largest sums (Q being an integer of 1 or more) as the topic words of the input speech.
6. The recording medium according to claim 5, wherein the program further comprises the step of converting the acoustic likelihood and the linguistic likelihood at the time of continuous speech recognition so as to each take a value within a specified range, yielding the value corresponding to the acoustic likelihood and the value corresponding to the linguistic likelihood.
7. The recording medium according to claim 6, wherein the step of obtaining the value corresponding to the likelihood converts, when the acoustic or linguistic likelihood Pk at the time of continuous speech recognition takes values between a maximum Lmax and a minimum Lmin, the likelihood Pk by equation (1) below into sk, taking a value between a maximum Smax and a minimum Smin, and uses sk as the value corresponding to the likelihood:
sk = ((Pk − Lmin) / (Lmax − Lmin)) · (Smax − Smin) + Smin ... (1)
8. The recording medium according to any one of claims 5 to 7, wherein the word sequences of a plurality of top candidates obtained during the continuous speech recognition are used as the input word sequence.
JP04965898A 1998-03-02 1998-03-02 Topic extraction method and its program recording medium Expired - Fee Related JP3591695B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP04965898A JP3591695B2 (en) 1998-03-02 1998-03-02 Topic extraction method and its program recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP04965898A JP3591695B2 (en) 1998-03-02 1998-03-02 Topic extraction method and its program recording medium

Publications (2)

Publication Number Publication Date
JPH11249691A JPH11249691A (en) 1999-09-17
JP3591695B2 true JP3591695B2 (en) 2004-11-24

Family

ID=12837294

Family Applications (1)

Application Number Title Priority Date Filing Date
JP04965898A Expired - Fee Related JP3591695B2 (en) 1998-03-02 1998-03-02 Topic extraction method and its program recording medium

Country Status (1)

Country Link
JP (1) JP3591695B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7062966B2 (en) * 2018-01-19 2022-05-09 富士フイルムビジネスイノベーション株式会社 Voice analyzer, voice analysis system, and program

Also Published As

Publication number Publication date
JPH11249691A (en) 1999-09-17

Similar Documents

Publication Publication Date Title
US6385579B1 (en) Methods and apparatus for forming compound words for use in a continuous speech recognition system
US7254529B2 (en) Method and apparatus for distribution-based language model adaptation
EP1128361B1 (en) Language models for speech recognition
JP3933750B2 (en) Speech recognition method and apparatus using continuous density Hidden Markov model
US5884259A (en) Method and apparatus for a time-synchronous tree-based search strategy
US20060020461A1 (en) Speech processing apparatus, speech processing method, program, and recording medium
WO2004034378A1 (en) Language model creation/accumulation device, speech recognition device, language model creation method, and speech recognition method
JPWO2007142102A1 (en) Language model learning system, language model learning method, and language model learning program
Chen et al. Lightly supervised and data-driven approaches to mandarin broadcast news transcription
US20100324897A1 (en) Audio recognition device and audio recognition method
US6016470A (en) Rejection grammar using selected phonemes for speech recognition system
JP5274191B2 (en) Voice recognition device
JP3819896B2 (en) Speech recognition method, apparatus for implementing this method, program, and recording medium
JP3660512B2 (en) Voice recognition method, apparatus and program recording medium
Ogawa et al. Estimating speech recognition accuracy based on error type classification
JP2002358097A (en) Voice recognition device
JP3444108B2 (en) Voice recognition device
JP2886121B2 (en) Statistical language model generation device and speech recognition device
JP3591695B2 (en) Topic extraction method and its program recording medium
KR100480790B1 (en) Method and apparatus for continous speech recognition using bi-directional n-gram language model
JP2005275348A (en) Speech recognition method, device, program and recording medium for executing the method
Hwang et al. Building a highly accurate Mandarin speech recognizer
JP4741452B2 (en) Language model creation device, language model creation program, speech recognition device, and speech recognition program
JPH08241096A (en) Speech recognition method
CN117351944B (en) Speech recognition method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20040305

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20040330

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20040531

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20040727

RD01 Notification of change of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7426

Effective date: 20040819

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20040819

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080903

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090903

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100903

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110903

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120903

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130903

Year of fee payment: 9

LAPS Cancellation because of no payment of annual fees