JP4878220B2

JP4878220B2 - Model learning method, information extraction method, model learning device, information extraction device, model learning program, information extraction program, and recording medium recording these programs

Info

Publication number: JP4878220B2
Application number: JP2006155970A
Authority: JP
Inventors: 克仁須藤; 元塚田; 秀樹磯崎
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-06-05
Filing date: 2006-06-05
Publication date: 2012-02-15
Anticipated expiration: 2026-06-05
Also published as: JP2007322984A

Description

The present invention relates to a technique for extracting information from data.

Conventionally, when linguistic information such as specific keywords, person names, or place names is extracted from speech data or image data (hereinafter, “original data”), a technique such as speech recognition or character recognition is first used to obtain the word sequence or character sequence estimated to be contained in the original data (hereinafter, “recognized word sequence”), and the information is then extracted from that sequence. Because the recognized word sequence may contain speech recognition or character recognition errors (hereinafter, “recognition errors”), information that should have been extracted may be lost, or erroneous information that is not in the original data may be extracted. It is therefore necessary to reduce the influence of recognition errors.

As a technique for information extraction using speech recognition, there is a method that predicts recognition errors by using a value called recognition confidence (hereinafter also simply “confidence”), which measures whether each recognized word in the recognition result is correct, and thereby reduces the effect of those errors on information extraction (see, for example, Non-Patent Document 1).

This technique uses a generative model similar to the well-known hidden Markov model to assign, to the recognized word sequence, labels (output labels) indicating the type of extraction target. The generative model selects the output label sequence N that maximizes the joint probability P(W, N) of the word sequence W and N. Because training data is generally insufficient to estimate the joint probability of entire sequences, an approximation is made in which the joint probability of the word w_i at position i and its corresponding output label n_i depends only on the immediately preceding I words and the immediately preceding J output labels.
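The approximation described above can be written out as follows. This is a sketch of one reading of the text, not a formula taken from Non-Patent Document 1; the product form and the index ranges are assumptions.

```latex
\hat{N} = \operatorname*{arg\,max}_{N} P(W, N), \qquad
P(W, N) \approx \prod_{i} P\bigl(w_i, n_i \mid w_{i-I}, \dots, w_{i-1},\; n_{i-J}, \dots, n_{i-1}\bigr)
```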

The generative model of Non-Patent Document 1 is trained on two kinds of data: data that contains speech recognition errors, in which each misrecognized word is represented simply as a “misrecognized word”, and text data that contains no recognition errors. Every word in both kinds of data is given a label indicating the type of extraction target. At extraction time, the recognition confidence of each recognized word is used to weigh the various possibilities, such as whether the word was recognized correctly and which label should be assigned, and the label sequence whose joint probability is ultimately optimal is output. In this generative model, misrecognized words are labeled so that they are not extracted. Consequently, for a recognized word with low confidence, the model compares the likelihood obtained when the word is treated as correct with the likelihood obtained when it is treated as incorrect and chooses between them, which reduces the influence of recognition errors.

Meanwhile, named entity extraction in natural language processing is known as a technique for extracting information from ordinary character string data without considering recognition errors. Named entities are proper nouns such as person names, place names, and organization names, and noun phrases that refer to specific entities such as dates, times, and monetary amounts. For named entity extraction, discriminative models such as the maximum entropy model and the support vector machine (hereinafter also “SVM”) are known to perform better than generative models such as the hidden Markov model.

A discriminative model selects the N that maximizes the conditional probability P(N | W) of the output label sequence N given the word sequence W, rather than the joint probability P(N, W) of the two. It is a model that uses a large number of features of the item to be classified to determine which type (person name, place name, date, and so on) the item belongs to, and it has the advantage that information about surrounding words, parts of speech, and the like can be used flexibly. For example, Non-Patent Document 2 achieves highly accurate named entity extraction with an SVM by using words, parts of speech, and word character types (words consisting only of kanji, numerals, and so on) as features.
In other words, a discriminative model can make integrated use of many pieces of information even when the features involved are not independent of one another.

In contrast, a generative model, in order to preserve the independence of features, uses either “word surface form only” or the combination “word surface form + part of speech” as a single feature. Combining features in this way increases the number of features, and as more information is merged into the combined feature the number grows exponentially, which aggravates the problem of data sparseness (insufficient training data).
For this reason, making integrated use of many pieces of information in a generative model as is done in discriminative models, that is, using such features as information for the generative model, is severely limited in the information that can be used; although theoretically possible, it has been difficult to realize.

Furthermore, treating “word surface form” and “part of speech” as independent in a generative model and using them simultaneously as separate features leads to an unnatural situation.
This point is explained concretely below. Independence is expressed by the following equation (1).

Here, P(X) is the probability of X, P(X, Y) is the joint probability of X and Y, and P(X | Y) is the conditional probability of X given Y.
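Equation (1) itself is not reproduced in this text. A standard statement of independence that is consistent with the symbols defined above, and with the example P(“ni” | “particle / case particle”) = P(“ni”) discussed next, would be the following (assuming P(Y) > 0); it is offered as a reconstruction, not as the original equation.

```latex
P(X, Y) = P(X)\,P(Y) \quad\Longleftrightarrow\quad P(X \mid Y) = P(X) \tag{1}
```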

For example, consider a word with the surface form “ni” (に) and the part of speech “particle / case particle”. If the two were independent, that is, if P(“ni” | “particle / case particle”) = P(“ni”), then the conditional probability of observing “ni” given that the word is a case particle would equal the probability of observing “ni” as an arbitrary word. This is clearly unnatural given how often the part of speech of “ni” is “particle / case particle” (compared with, for example, the irrealis or continuative form of the verb “niru”).
Similarly, the word surface form (for example, “NTT” (registered trademark)) and the character type (for example, “all uppercase letters”) are not independent.
Likewise, when a hierarchical part-of-speech system is used and several levels of the hierarchy are used as features, the coarse category (for example, “particle”) and a somewhat finer category (for example, “particle / case particle”) are not independent.
Moreover, character type features can themselves be defined in ways that are not mutually independent. For example, the word “THINK” can be regarded as having three character type features: “alphabetic”, “begins with an uppercase letter”, and “all uppercase letters”.
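The “THINK” example can be made concrete with a short sketch. The function name and the feature names are hypothetical, and the sketch only handles ASCII letters; it simply shows that one word can fire several character type features that imply one another, which a discriminative model tolerates.

```python
def char_type_features(word):
    """Return overlapping (non-independent) character type features of a word.

    Hypothetical helper for illustration; only ASCII letters are considered.
    """
    features = set()
    if word and all(c.isascii() and c.isalpha() for c in word):
        features.add("alphabetic")
        if word[0].isupper():
            features.add("starts_with_uppercase")
        if word.isupper():
            # "all_uppercase" implies both features above, so the three are not independent.
            features.add("all_uppercase")
    return features


think_features = char_type_features("THINK")
```

Here `think_features` contains all three features, while a word such as "Think" would receive only the first two.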

In a generative model, which requires features to be independent, using non-independent features as a discriminative model does would require the difficult task of defining an appropriate feature set that preserves independence.
For this reason, discriminative models are widely used in the field of natural language processing.

Non-Patent Document 3 describes a technique that applies a maximum entropy model, which is a discriminative model, to speech recognition results.

Patent Document 1 discloses a classification technique for discriminative models that greatly reduces the computation time required to determine the class to which an input vector belongs.
Non-Patent Document 1: D. D. Palmer and M. Ostendorf, “Improving information extraction by modeling errors in speech recognizer output”, in Proceedings of the First International Conference on Human Language Technology Research, 2001
Non-Patent Document 2: Hideki Isozaki and Hideto Kazawa, “Speeding up SVMs for named entity extraction”, Transactions of Information Processing Society of Japan, 2003, Vol. 44, No. 3, pp. 970-979
Non-Patent Document 3: L. Zhai et al., “Using N-best lists for named entity recognition from Chinese speech”, in Proceedings of Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2004, pp. 37-40
Patent Document 1: JP 2003-331254 A

Although discriminative models are known to perform well, they have not so far used information on the confidence (correct/incorrect prediction) of input words in the way generative models have. As a result, the only existing techniques apply a discriminative model that ignores the presence of errors to error-containing input such as speech recognition results (see Non-Patent Document 3).

Also, even when correct/incorrect information on input words is used in a generative model, as in Non-Patent Document 1, it is difficult from the standpoint of feature independence described above to use information on the features of a misrecognized word (the word itself, its part of speech, its character type, and so on). The word can only be represented as a “misrecognized word”, which makes flexible feature design difficult.

The confidence used in a generative model is a continuous value.
When, in order to learn model information from more data, a discriminative model is trained on both a reference word sequence that contains no recognition errors and a recognized word sequence that contains recognition errors, features of the same design must be defined for both sequences. If a confidence like that used in generative models were used to provide correct/incorrect information, the confidence of the reference word sequence would be expressed as a binary value, “correct” or “incorrect”, whereas the confidence of the recognized word sequence would be a continuous value, so the two could not be compared. In other words, it has been difficult to use information that includes confidence for training a discriminative model.

An object of the present invention is therefore to solve the problems described above and reduce the influence of recognition errors.

In the present invention, the discriminative model is trained on a recognized word sequence that contains errors and is annotated with named entity labels (extraction target information), together with a reference word sequence that contains no errors (for example, a manual transcription of the speech) and is annotated with named entity labels in the same way. Because the recognition confidence feature, which represents the correctness of each recognized word, is binarized, the reference word sequence can be treated as “a recognized word sequence in which every recognized word is correct”. Using the reference word sequence together with the recognized word sequence in this way makes it possible to improve the training of the discriminative model.

The present invention also makes use of features of misrecognized words, which the prior art did not use. Beyond the mere fact that a word was misrecognized, which is all that the prior art based on generative models provides, the present invention can readily use information about which word it was recognized as and what features that word has. Such information about misrecognized words was not considered in the prior art based on discriminative models, which assumed error-free input.
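A minimal sketch of the shared feature design follows. The function name, the feature strings, and the example words are hypothetical and are not taken from the patent; the point is only that a reference token and a misrecognized token are described by the same kind of features, with the binarized confidence as one of them.

```python
def token_features(word, pos, recognized_correctly):
    """Return the sparse binary features of one token as a set of strings.

    Hypothetical feature design: surface form, part of speech, and the
    binarized recognition confidence.
    """
    confidence = "correct" if recognized_correctly else "incorrect"
    return {"word=" + word, "pos=" + pos, "confidence=" + confidence}


# Reference (hand-transcribed) token: its confidence is always "correct".
reference_token = token_features("Kyoto", "noun", True)

# Misrecognized token: it still keeps its own word and part-of-speech features
# instead of being collapsed into a single "misrecognized word" symbol.
misrecognized_token = token_features("Tokyo", "noun", False)
```

Because both tokens share one feature schema, they can be placed in the same training set for an SVM, maximum entropy model, or conditional random field.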

To solve the above problems, the model learning method according to claim 1 is a model learning method for learning model information used to assign, to each word contained in input speech or character data, a label indicating the type of the word. Here, a recognized word sequence is the sequence of words obtained by applying speech recognition or character recognition to model learning data, which is speech or character data, and a reference word sequence is the word sequence that is the correct recognition result for the model learning data, in which each word is given a correct label indicating its type. In the method, a model learning device that learns the model information performs the following steps:
a word sequence association step of comparing each word of the recognized word sequence with each word of the reference word sequence, and assigning, as recognition confidence, information indicating that the recognition result is correct to words in the recognized word sequence that match the reference word sequence and information indicating that the recognition result is incorrect to words that do not match, thereby generating a recognized word sequence with correct/incorrect information that carries recognition confidence;
a recognized word sequence training data creation step of comparing the recognized word sequence with correct/incorrect information with the reference word sequence, and assigning, to each word in it that matches the reference word sequence, the label given to the matching word in the reference word sequence, thereby creating recognized word sequence training data that carries recognition confidence and labels;
a reference word sequence training data creation step of assigning, as the recognition confidence, information indicating that the recognition result is correct to each word in the reference word sequence, thereby creating reference word sequence training data that carries recognition confidence and labels; and
a model creation step of taking the recognized word sequence training data and the reference word sequence training data as input, learning, by means of a support vector machine, a maximum entropy model, or a conditional random field and with at least the recognition confidence as a feature, the model information for assigning to each word contained in the input speech or character data a label indicating the optimal word type, and storing the model information in storage means.
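The word sequence association step and the two training data creation steps can be sketched as follows. This is a minimal sketch, not the patented implementation: the use of Python's `difflib.SequenceMatcher` for the comparison, the function names, the example label names, and the choice to give unmatched words the label "O" are all assumptions made for illustration, since the text does not say how the comparison is carried out or which label unmatched words receive.

```python
from difflib import SequenceMatcher


def build_recognized_training_data(recognized, reference, outside_label="O"):
    """Attach binary confidence and labels to a recognized word sequence.

    recognized: list of recognized words.
    reference:  list of (word, label) pairs from the error-free transcript.
    Returns a list of (word, confidence, label) triples.
    """
    reference_words = [word for word, _ in reference]
    # Assumption: unmatched words start as "incorrect" with the outside label.
    rows = [(word, "incorrect", outside_label) for word in recognized]
    matcher = SequenceMatcher(a=recognized, b=reference_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            i, j = block.a + offset, block.b + offset
            # A matched word is "correct" and inherits the reference label.
            rows[i] = (recognized[i], "correct", reference[j][1])
    return rows


def build_reference_training_data(reference):
    """Every reference word is treated as a correctly recognized word."""
    return [(word, "correct", label) for word, label in reference]


reference = [("Sudoh", "PERSON"), ("visited", "O"),
             ("Kyoto", "LOCATION"), ("yesterday", "O")]
recognized = ["Sudoh", "visited", "Tokyo", "yesterday"]

recognized_rows = build_recognized_training_data(recognized, reference)
reference_rows = build_reference_training_data(reference)
```

Both row lists have the same (word, confidence, label) layout, so they can be concatenated and passed to whichever learner is used in the model creation step.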

With this procedure, the model learning device can compare the recognized word sequence and the reference word sequence, both of which carry correct/incorrect information, and creates the model information on that basis, so accuracy at information extraction time improves.

The information extraction method according to claim 2 is an information extraction method for extracting, from input speech or character data, words given a label indicating a predetermined type as extraction target information. Here, a recognized word sequence is the sequence of words obtained by applying speech recognition or character recognition to model learning data, which is speech or character data, and a reference word sequence is the word sequence that is the correct recognition result for the model learning data, in which each word is given a correct label indicating its type. In the method, a model learning device that learns model information for assigning, to each word contained in input speech or character data, a label indicating the type of the word performs the following steps:
a word sequence association step of comparing each word of the recognized word sequence with each word of the reference word sequence, and assigning, as recognition confidence, information indicating that the recognition result is correct to words in the recognized word sequence that match the reference word sequence and information indicating that the recognition result is incorrect to words that do not match, thereby generating a recognized word sequence with correct/incorrect information that carries recognition confidence;
a recognized word sequence training data creation step of comparing the recognized word sequence with correct/incorrect information with the reference word sequence, and assigning, to each word in it that matches the reference word sequence, the label given to the matching word in the reference word sequence, thereby creating recognized word sequence training data that carries recognition confidence and labels;
a reference word sequence training data creation step of assigning, as the recognition confidence, information indicating that the recognition result is correct to each word in the reference word sequence, thereby creating reference word sequence training data that carries recognition confidence and labels;
a model creation step of taking the recognized word sequence training data and the reference word sequence training data as input, learning, by means of a support vector machine, a maximum entropy model, or a conditional random field and with at least the recognition confidence as a feature, the model information for assigning to each word contained in the input speech or character data a label indicating the optimal word type, and storing the model information in storage means;
a word sequence recognition step of recognizing the input speech or character data as word sequences by speech recognition or character recognition, and creating a word lattice that represents the multiple recognized candidate word sequences as a graph;
a word confidence calculation step of calculating, for each word contained in the word lattice, a score that expresses the correctness of the recognition of the word as a continuous value, and assigning to each word, as recognition confidence, information indicating that the recognition is correct if the score is equal to or greater than a predetermined threshold and information indicating that the recognition is incorrect otherwise, thereby creating, for the word lattice, a recognized word sequence with confidence information that carries recognition confidence;
a label assignment step of assigning a label to each word of the recognized word sequence with confidence information for the word lattice by using the model information created in the model creation step; and
an information extraction step of extracting, from the recognized word sequence with confidence information for the word lattice, the words given a label corresponding to the predetermined type as the extraction target information.
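The word confidence calculation step and the information extraction step can be sketched as follows. This is a simplified illustration under stated assumptions: the lattice is flattened to a plain list of (word, score) pairs, the threshold value 0.5 is arbitrary, and the function names and label names are hypothetical. How the continuous score itself is computed is not shown.

```python
def binarize_confidence(scored_words, threshold=0.5):
    """Map (word, score) pairs to (word, "correct" | "incorrect") pairs.

    A score equal to or greater than the threshold counts as "correct",
    anything else as "incorrect". The default threshold is an assumption.
    """
    return [(word, "correct" if score >= threshold else "incorrect")
            for word, score in scored_words]


def extract_targets(labeled_words, target_labels):
    """Keep the words whose assigned label is one of the target types."""
    return [word for word, label in labeled_words if label in target_labels]


scored = [("Kyoto", 0.9), ("the", 0.5), ("uh", 0.2)]
with_confidence = binarize_confidence(scored)

labeled = [("Sudoh", "PERSON"), ("visited", "O"), ("Kyoto", "LOCATION")]
extracted = extract_targets(labeled, {"PERSON", "LOCATION"})
```

After binarization the recognized words carry the same two-valued confidence feature as the training data, which is what allows the learned model to be applied to them in the label assignment step.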

With this procedure, the information extraction device calculates the confidence of the recognized word sequence as correct/incorrect information and attaches it to the recognized word sequence. Information can then be extracted using the extraction target information assigned on the basis of that correct/incorrect information.

The model learning device according to claim 3 is a model learning device that learns model information used to assign, to each word contained in input speech or character data, a label indicating the type of the word, and comprises:
word sequence recognition means for creating a recognized word sequence, which is the sequence of words obtained by applying speech recognition or character recognition to model learning data, which is speech or character data;
word sequence association means for comparing each word of the recognized word sequence with each word of a reference word sequence, which is the word sequence that is the correct recognition result for the model learning data and in which each word is given a correct label indicating its type, and assigning, as recognition confidence, information indicating that the recognition result is correct to words in the recognized word sequence that match the reference word sequence and information indicating that the recognition result is incorrect to words that do not match, thereby generating a recognized word sequence with correct/incorrect information that carries recognition confidence;
recognized word sequence training data creation means for comparing the recognized word sequence with correct/incorrect information with the reference word sequence, and assigning, to each word in it that matches the reference word sequence, the label given to the matching word in the reference word sequence, thereby creating recognized word sequence training data that carries recognition confidence and labels;
reference word sequence training data creation means for assigning, as the recognition confidence, information indicating that the recognition result is correct to each word in the reference word sequence, thereby creating reference word sequence training data that carries recognition confidence and labels; and
model creation means for taking the recognized word sequence training data and the reference word sequence training data as input, learning, by means of a support vector machine, a maximum entropy model, or a conditional random field and with at least the recognition confidence as a feature, the model information for assigning to each word contained in the input speech or character data a label indicating the optimal word type, and storing the model information in storage means.

With this configuration, the model learning device can compare the recognized word sequence and the reference word sequence, both of which carry correct/incorrect information, and creates the model information on that basis, so accuracy at information extraction time improves.

The information extraction device according to claim 4 is an information extraction device that extracts, from input speech or character data, words given a label indicating a predetermined type as extraction target information, and comprises:
word sequence recognition means for creating, at model learning time, a recognized word sequence, which is the sequence of words obtained by applying speech recognition or character recognition to model learning data, which is speech or character data, and for recognizing, at information extraction time, the input speech or character data as word sequences by speech recognition or character recognition and creating a word lattice that represents the multiple recognized candidate word sequences as a graph;
word sequence association means for comparing, at model learning time, each word of the recognized word sequence with each word of a reference word sequence, which is the word sequence that is the correct recognition result for the model learning data and in which each word is given a correct label indicating its type, and assigning, as recognition confidence, information indicating that the recognition result is correct to words in the recognized word sequence that match the reference word sequence and information indicating that the recognition result is incorrect to words that do not match, thereby generating a recognized word sequence with correct/incorrect information that carries recognition confidence;
recognized word sequence training data creation means for comparing, at model learning time, the recognized word sequence with correct/incorrect information with the reference word sequence, and assigning, to each word in it that matches the reference word sequence, the label given to the matching word in the reference word sequence, thereby creating recognized word sequence training data that carries recognition confidence and labels;
reference word sequence training data creation means for assigning, at model learning time, information indicating that the recognition result is correct as the recognition confidence to each word in the reference word sequence, thereby creating reference word sequence training data that carries recognition confidence and labels;
model creation means for taking, at model learning time, the recognized word sequence training data and the reference word sequence training data as input, learning, by means of a support vector machine, a maximum entropy model, or a conditional random field and with at least the recognition confidence as a feature, model information for assigning to each word contained in the input speech or character data a label indicating the optimal word type, and storing the model information in storage means;
word confidence calculation means for calculating, at information extraction time, for each word contained in the word lattice, a score that expresses the correctness of the recognition of the word as a continuous value, and assigning to each word, as recognition confidence, information indicating that the recognition is correct if the score is equal to or greater than a predetermined threshold and information indicating that the recognition is incorrect otherwise, thereby creating, for the word lattice, a recognized word sequence with confidence information that carries recognition confidence;
label assignment means for assigning, at information extraction time, a label to each word of the recognized word sequence with confidence information for the word lattice by using the model information created by the model creation means; and
information extraction means for extracting, at information extraction time, from the recognized word sequence with confidence information for the word lattice, the words given a label corresponding to the predetermined type as the extraction target information.

With this configuration, the information extraction device calculates the confidence of the recognized word string as correct/incorrect information and assigns it to the recognized word string. Information can then be extracted using the extraction target information assigned on the basis of that correct/incorrect information.

The model learning program according to claim 5 causes a computer to execute the model learning method according to claim 1.
With this configuration, a computer on which the model learning program is installed can realize each function based on the program.

The information extraction program according to claim 6 causes a computer to execute the information extraction method according to claim 2.
With this configuration, a computer on which the information extraction program is installed can realize each function based on the program.

The recording medium according to claim 7 is a computer-readable recording medium on which the model learning program according to claim 5 or the information extraction program according to claim 6 is recorded. Examples of the recording medium include a hard disk, a CD-ROM, a DVD, a flexible disk, and a memory. By having a computer read this recording medium, each function of the information extraction program can be executed on any computer.

According to the present invention, the influence of recognition errors can be reduced.

The best mode for carrying out the model learning method and the information extraction method of the present invention (hereinafter referred to as the "embodiment") is described in detail below with reference to the drawings.
In this embodiment, model learning means creating (updating) the model information used during information extraction, and information extraction means extracting information from input data using extraction target information.

[Configuration of terminal device]
FIG. 1 is a functional block diagram showing an example of a terminal device (model learning device, information extraction device) according to the present invention.
As shown in FIG. 1, the terminal device 100 includes input/output means 110, control means 130, and storage means 150.

The input/output means 110 consists of, for example, an input/output interface, and acquires input data and commands and outputs predetermined data.
Here, the input/output means 110 includes data input means 111, which acquires input data and outputs it to the control means 130, and data output means 115, which acquires the extraction result information sent from the control means 130 and outputs it to a display, a printer, or the like.

The control means 130 consists of, for example, a CPU (Central Processing Unit). It controls the input/output means 110 and the storage means 150 and extracts information from the input data. As shown in FIG. 1, it includes a recognition unit 131, a learning data processing unit 133, a model creation unit 135, and an information extraction unit 137.

The recognition unit 131 acquires the input original data from the data input means 111 and outputs recognized word strings and related information. The original data is, for example, speech data or character data for model learning, or data from which information is to be extracted.
The recognition unit 131 consists of word string recognition means 1310, which performs speech recognition or character recognition, and word confidence calculation means 1311, which calculates the confidence of each recognized word.

During model learning, the word string recognition means 1310 outputs one or more recognized word strings for the input original data (learning input data). During information extraction, it creates a word lattice (a directed acyclic graph that compactly represents multiple recognized word strings) for the input original data and outputs it to the word confidence calculation means 1311.
Reference 1 is one example of such speech recognition technology.
[Reference 1]
Akinobu Lee, Tatsuya Kawahara, Shuji Doshita, "Large Vocabulary Continuous Speech Recognition Parser Based on A* Search Using Grammar Category-Pair Constraints", IPSJ Journal, 1999, Vol. 40, No. 4, pp. 1374-1382
The word lattice is described in detail later.

During information extraction, the word confidence calculation means 1311 calculates the confidence (recognition confidence) of each recognized word using the information (word lattice) output from the word string recognition means 1310. Any word confidence calculation technique known in the speech recognition field can be used. For example, the method described in Reference 2 computes the word posterior probability as the ratio of "the sum of the scores of the speech recognition hypotheses that contain the target word" to "the sum of the scores of all speech recognition hypotheses obtained from the word graph".
[Reference 2]
F. Wessel et al., "Confidence measures for large vocabulary continuous speech recognition", IEEE Transactions on Speech and Audio Processing, 2001, Vol. 9, No. 3, pp. 288-298

The word confidence calculation means 1311 calculates the recognition confidence and creates a recognized word string with confidence. The calculated recognition confidence (a correct/incorrect score) is a continuous value: the larger the value, the more likely the word was recognized correctly, and the smaller the value, the more likely it is an error. The word confidence calculation means 1311 therefore creates a recognized word string with confidence information in which each word is given the value "correct" if its recognition confidence is equal to or greater than a threshold, and "incorrect" otherwise.
The word confidence calculation means 1311 may be included in the word string recognition means 1310 so that it runs at the same time as word string recognition. Alternatively, the word string recognition means 1310 may attach to the recognized word string a score, such as a likelihood, that corresponds to the recognition confidence, and the calculation may take that score as its input.
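The ratio-based posterior and the thresholding step can be sketched as follows. This is a minimal illustration, not the method of Reference 2: it assumes an N-best list of hypotheses with log scores instead of a time-aligned word lattice, it identifies a word only by its surface string, and the function names, example hypotheses, and the threshold of 0.5 are all hypothetical.

```python
import math


def word_posteriors(hypotheses):
    """Approximate word posteriors from (words, log_score) hypotheses.

    Posterior of a word = (sum of scores of hypotheses containing the word)
                          / (sum of scores of all hypotheses).
    """
    # Subtract the maximum log score before exponentiating, for numerical stability.
    max_score = max(score for _, score in hypotheses)
    weights = [math.exp(score - max_score) for _, score in hypotheses]
    total = sum(weights)
    posteriors = {}
    for (words, _), weight in zip(hypotheses, weights):
        for word in set(words):
            posteriors[word] = posteriors.get(word, 0.0) + weight / total
    return posteriors


def binarize(posteriors, threshold=0.5):
    """Map each continuous score to "correct" (>= threshold) or "incorrect"."""
    return {
        word: ("correct" if score >= threshold else "incorrect")
        for word, score in posteriors.items()
    }


# Hypothetical two-hypothesis example with made-up scores.
hypotheses = [
    (["大宮", "行き"], math.log(0.6)),
    (["大宮", "駅"], math.log(0.4)),
]
posteriors = word_posteriors(hypotheses)
flags = binarize(posteriors, threshold=0.5)
```

Here "大宮" appears in every hypothesis and so receives a posterior close to 1, while the two competing continuations split the remaining probability mass.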

The learning data processing unit 133 takes as input the recognized word string for the input original data and a reference word string to which the extraction target information corresponding to the original data (for example, named entity labels) has been assigned, and outputs training data for the classification model.
Here, the reference word string is the correct word string corresponding to the original data (for example, a manual transcription) to which labels (named entity labels) have been assigned; each label is identification information indicating which type of extraction target information the word belongs to. The reference word string and the original data input to the terminal device 100 are stored in a learning source database (hereinafter referred to as the "learning source DB" where appropriate), and the model learning process starts when the learning source database is input to the terminal device 100.

The learning data processing unit 133 includes word string association means 1330, which determines the recognition errors in the recognized word string; recognized word string learning data creation means 1331, which creates learning data from the recognized word string with correct/incorrect information; and reference word string learning data creation means 1332, which creates learning data from the reference word string.

The word string association means 1330 aligns the recognized word string with the reference word string (correct word string) using a known technique such as DP (Dynamic Programming) matching. For each recognized word, it assigns "correct" if the word is the same as the aligned reference word and "incorrect" otherwise, and outputs the resulting recognized word string with correct/incorrect information.

The recognized word string learning data creation means 1331 extracts the features used by the classification model from the recognized word string with correct/incorrect information created by the word string association means 1330. It also compares and aligns that word string with the named entity labels of the reference word string in the learning source database, assigns to each recognized word a label (named entity label) indicating which extraction target information it corresponds to, and stores the result in the model learning database 151 as recognized word string learning data 1510.

This embodiment does not prescribe the design of features other than the recognition confidence; examples include the "word", "part of speech", and "character type" features described in Non-Patent Document 2.

FIG. 2 shows an example of recognized word string learning data stored with the recognition confidence feature added. The labels (extraction target information) assigned here follow the scheme below.
-------
BEGIN: label assigned to the first word of a named entity (e.g., person name-BEGIN)
MIDDLE: label assigned to an intermediate word of a named entity (e.g., person name-MIDDLE)
END: label assigned to the last word of a named entity (e.g., person name-END)
SINGLE: label assigned to a named entity consisting of one word (e.g., person name-SINGLE)
-------
The label information in FIG. 2 shows that the data was created with "place name" and "artifact name" specified as the extraction target information.

In this embodiment, when a recognized word is correct, the terminal device 100 assigns it the label of the corresponding reference word as is; when it is a recognition error, the terminal device assigns a different label (for example, the label "OTHER", which indicates that the word contains no information to be extracted).
When the information to be extracted spans multiple words and even one of those words is a recognition error, the other words in the span may likewise be given the different label (OTHER). For example, in FIG. 2 the correctly recognized words "Kyoto" (京都) and "Kinkakuji" (金閣寺) carry the labels "place name" and "artifact name", respectively. The information to be extracted also includes "Omiya Station" (大宮駅), which consists of the two words "Omiya" (大宮) and "station" (駅), but it was misrecognized as "bound for Omiya" (大宮行き). The word "Omiya", labeled "place name-BEGIN" in the reference, was recognized correctly (recognition confidence feature = "correct"), but the word "station", labeled "place name-END", was misrecognized (recognition confidence feature = "incorrect"). The whole span, including "Omiya", is therefore excluded from extraction (that is, its label is "OTHER").
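The labeling rule above can be sketched in a few lines. This is only an illustration under assumptions: the input is taken to be an already aligned list of (reference word, reference label, recognized word) triples with `None` marking a gap, the function name `project_labels` is hypothetical, and the sample sentence (including the particle "から") is invented around the "Omiya Station" example.

```python
OTHER = "OTHER"


def project_labels(aligned):
    """Transfer reference labels to recognized words.

    aligned: list of (ref_word, ref_label, rec_word); None marks a gap.
    Returns rows of (rec_word, "correct" / "incorrect", label).
    """
    # Step 1: collect the alignment positions that belong to each named entity.
    spans, open_span = [], None
    for i, (ref_word, ref_label, _rec_word) in enumerate(aligned):
        if ref_word is None:  # inserted word: it has no reference label
            if open_span is not None:
                open_span.append(i)
            continue
        if ref_label == OTHER:
            continue
        position = ref_label.rsplit("-", 1)[1]
        if position in ("BEGIN", "SINGLE"):
            open_span = [i]
            spans.append(open_span)
        elif open_span is not None:
            open_span.append(i)
        if position in ("END", "SINGLE"):
            open_span = None

    # Step 2: an entity with any misrecognized word is dropped as a whole.
    broken = set()
    for span in spans:
        if any(aligned[i][0] != aligned[i][2] for i in span):
            broken.update(span)

    # Step 3: emit one row per recognized word.
    rows = []
    for i, (ref_word, ref_label, rec_word) in enumerate(aligned):
        if rec_word is None:  # deleted word: nothing was recognized here
            continue
        correct = ref_word == rec_word
        label = ref_label if correct and i not in broken else OTHER
        rows.append((rec_word, "correct" if correct else "incorrect", label))
    return rows


aligned = [
    ("京都", "place name-SINGLE", "京都"),
    ("から", "OTHER", "から"),
    ("大宮", "place name-BEGIN", "大宮"),
    ("駅", "place name-END", "行き"),
]
rows = project_labels(aligned)
```

Note that "大宮" keeps the confidence value "correct" but still receives the label OTHER, because the entity it belongs to contains a misrecognized word.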

Returning to FIG. 1, the reference word string learning data creation means 1332 extracts the features used by the classification model from the reference word string, adds the recognition confidence feature (treating every word as correctly recognized), adds the named entity labels, and stores the result in the model learning DB 151 as reference word string learning data 1511.
FIG. 3 shows an example of reference word string learning data. Its content corresponds to that of FIG. 2.

The model creation unit 135 includes model creation means 1350, which takes the training data for the classification model (reference numerals 1510 and 1511) as input and outputs the model information it creates.
Specifically, the model creation means 1350 reads the learning data (reference numerals 1510 and 1511) from the model learning database 151 and learns so that the correct label sequence can be assigned to the word string in the input data during information extraction. This embodiment does not prescribe the type of classification model or its learning method; possible models include an SVM (support vector machine), a maximum entropy model, and conditional random fields. Non-Patent Document 2 is an example of a technique that uses an SVM.

The information extraction unit 137 takes as input the recognized word string and confidence for the original data, and extracts information from the recognized word string using the learned model information.
The information extraction unit 137 includes label assignment means 1370, which assigns labels to the recognized word string using the model information 1520 created by the model creation unit 135, and information extraction means 1371, which extracts from the recognized word string the information corresponding to the specified extraction target information.

The label assignment means (extraction target information assignment means) 1370 takes as input the recognized word string with confidence information from the word confidence calculation means 1311, uses the model information 1520 to determine which label each recognized word corresponds to, and assigns the optimal label based on the result. The classification model is applied using the technique appropriate to its type (SVM, maximum entropy model, conditional random field, and so on).
FIG. 4 shows an example of the labels assigned by the label assignment means 1370 when "アメリカの大統領官邸はホワイトハウス" ("The American presidential residence is the White House") is input as the original data during information extraction.

The information extraction means 1371 extracts information by referring to the labels assigned by the label assignment means 1370. For example, if the specified extraction target information is "artifact name", the word "White House" (ホワイトハウス) is extracted from the labeling result illustrated in FIG. 4.
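Extraction from a labeled word string can be sketched as below. The function name `extract_entities`, the English label strings, and the labels in the sample data are assumptions for illustration; the actual labels of FIG. 4 are not reproduced here.

```python
def extract_entities(labeled_words, target_type):
    """Collect the named entities of one type from (word, label) pairs."""
    entities, buffer = [], []
    for word, label in labeled_words:
        if label == "OTHER":
            buffer = []
            continue
        entity_type, position = label.rsplit("-", 1)
        if entity_type != target_type:
            buffer = []
            continue
        if position == "SINGLE":
            entities.append(word)
            buffer = []
        elif position == "BEGIN":
            buffer = [word]
        elif position == "MIDDLE" and buffer:
            buffer.append(word)
        elif position == "END" and buffer:
            entities.append("".join(buffer + [word]))
            buffer = []
        else:
            # MIDDLE or END without a preceding BEGIN: ignore the fragment.
            buffer = []
    return entities


# Hypothetical labeling of the example sentence.
labeled = [
    ("アメリカ", "place name-SINGLE"),
    ("の", "OTHER"),
    ("大統領", "OTHER"),
    ("官邸", "OTHER"),
    ("は", "OTHER"),
    ("ホワイトハウス", "artifact name-SINGLE"),
]
artifacts = extract_entities(labeled, "artifact name")
people = extract_entities(
    [("村山", "person name-BEGIN"), ("富市", "person name-END")], "person name"
)
```

Multi-word entities are joined without a separator because the example text is Japanese; a space-delimited language would need a different join.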

The storage means 150 includes, for example, a RAM (Random Access Memory) and an HDD (Hard Disk Drive). In this case, the RAM is used for arithmetic processing by the control means 130 and stores information acquired through the input/output means 110, while the HDD stores the various databases, predetermined programs, and the processing results of the control means 130.

The storage means 150 also holds a model learning database 151 (hereinafter referred to as the "model learning DB" where appropriate), created by the learning data processing unit 133 described above, and a model information database 152 (hereinafter referred to as the "model information DB" where appropriate), created by the model creation unit 135.

The model learning DB 151 contains recognized word string learning data 1510 and reference word string learning data 1511. The recognized word string learning data 1510 holds learning data created from recognized word strings, and the reference word string learning data 1511 holds learning data created from reference word strings.

The model information DB 152 stores information such as classes, features, and parameters as model information 1520.

Each of the means 1310 to 1371 of the control means 130 described above is realized by the CPU loading a predetermined program stored on the HDD of the storage means 150 into the RAM and executing it.

[Operation of information extraction device]
The operation of the terminal device 100 shown in FIG. 1 is described with reference to FIGS. 5 and 14 (and FIG. 1 where appropriate). FIG. 5 is a flowchart of the model learning method performed by the terminal device shown in FIG. 1. FIG. 14 is a flowchart of the information extraction method performed by the same terminal device. Model learning and information extraction are each explained using named entity extraction from speech data of newspaper articles read aloud as the example.

<Model learning>
The learning source database (learning source DB) used in this embodiment contains morphologically analyzed newspaper article data (reference word strings) and speech data of those articles read aloud (original data; one article corresponds to one audio file). The learning source database corresponds to the learning input data in the claims.

The reference word string is stored in the learning source DB in the format illustrated in FIG. 6. The first column holds the word information corresponding to the correct word string (surface form + reading + part-of-speech information), and the second column holds the named entity label as an example of extraction target information.
A named entity label is information indicating the characteristic (attribute) of each word that is used during information extraction; words are extracted by referring to this label.

In this embodiment, eight types of named entity labels are assigned as examples: person name, place name, organization name, artifact name, date, time, amount, and ratio. Words that belong to none of these types are labeled "OTHER". To represent named entities that span multiple words, labels that combine one of the eight types with the following word position information are used (except for "OTHER").
-------
BEGIN: label assigned to the first word of a named entity (e.g., person name-BEGIN)
MIDDLE: label assigned to an intermediate word of a named entity (e.g., person name-MIDDLE)
END: label assigned to the last word of a named entity (e.g., person name-END)
SINGLE: label assigned to a named entity consisting of one word (e.g., person name-SINGLE)
-------
This embodiment therefore has 33 labels: 8 types × 4 positions + OTHER.
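The label inventory can be enumerated directly to confirm the count. The English type names and the variable names below are illustrative choices, not identifiers from the original.

```python
ENTITY_TYPES = [
    "person name", "place name", "organization name", "artifact name",
    "date", "time", "amount", "ratio",
]
POSITIONS = ["BEGIN", "MIDDLE", "END", "SINGLE"]

# Every type combined with every position, plus the single OTHER label.
LABELS = [f"{t}-{p}" for t in ENTITY_TYPES for p in POSITIONS] + ["OTHER"]
label_count = len(LABELS)  # 8 * 4 + 1
```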

The terminal device 100 acquires the learning input data through the data input means 111 and stores it in the storage means 150 as the learning source DB. The recognition unit 131 then acquires the original data (speech data) and the reference word string contained in the learning source DB.

The word string recognition means 1310 takes the original data (speech data) as input and recognizes it using known speech recognition technology (S1: word string recognition step). The result is obtained as a recognized word string in the form shown in FIG. 7.

In the recognized word string illustrated in FIG. 7, reference numeral 701 indicates the top-ranked recognition candidate, and reference numeral 702 indicates the score of that word string ("-30922.7") together with its breakdown into the acoustic model score ("AM=-35053.9") and the language model score ("LM=4131.28").

Next, the word string association means 1330 aligns the reference word string (correct word string) in the learning source DB with the recognized word string obtained in step S1, and creates a recognized word string with correct/incorrect information based on the alignment (S2: word string association step).
Specifically, the word string association means 1330 first compares the recognized word string with the reference word string (correct word string) and aligns them. FIG. 8 shows the alignment result. The first column is the reference word string (reference side) and the second column is the recognized word string (recognition side); positions with no corresponding word (where a word was inserted or dropped during recognition) are marked (null).
Then, using the alignment result shown in FIG. 8, the word string association means 1330 attaches "correct" to the recognized word string where the reference-side word and the recognition-side word match, and "incorrect" where they do not (that is, where one side is (null) or the words differ), thereby creating the recognized word string with correct/incorrect information shown in FIG. 9.
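The alignment and tagging in step S2 can be sketched with a standard edit-distance alignment, which is one common form of DP matching. The unit costs, the function names, and the sample sentence are assumptions; the patent does not specify the cost function.

```python
def align(reference, recognized):
    """Align two word lists by edit distance; None plays the role of (null)."""
    n, m = len(reference), len(recognized)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = cost[i - 1][j - 1] + (reference[i - 1] != recognized[j - 1])
            cost[i][j] = min(substitution, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the end to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (reference[i - 1] != recognized[j - 1])):
            pairs.append((reference[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((reference[i - 1], None))  # word dropped by the recognizer
            i -= 1
        else:
            pairs.append((None, recognized[j - 1]))  # word inserted by the recognizer
            j -= 1
    pairs.reverse()
    return pairs


def tag_correctness(pairs):
    """Attach "correct" / "incorrect" to every recognized word."""
    return [
        (rec, "correct" if ref == rec else "incorrect")
        for ref, rec in pairs
        if rec is not None
    ]


pairs = align(["大宮", "駅", "に", "着く"], ["大宮", "行き", "に", "着く"])
tagged = tag_correctness(pairs)
```

Dropped words produce no output row, since there is no recognized word to tag; inserted words are aligned with (null) and are therefore tagged "incorrect".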

Next, the recognized word string learning data creation means 1331 aligns the labels indicating the extraction target information assigned to the reference word string (named entity labels) with the recognized word string with correct/incorrect information, using a known technique such as DP matching, and thereby creates the recognized word string learning data shown in FIG. 10 (S3: recognized word string learning data creation step).

The following four word features are used as examples in this embodiment.
-------
Word surface feature
Part-of-speech feature
Character type feature (hiragana, katakana, single kanji character, kanji, etc.)
Recognition confidence feature (in the learning data, the correct/incorrect information of the recognized word obtained from the alignment result)
-------

In other words, the recognized word string learning data creation means 1331 creates the recognized word string learning data by attaching these features and the named entity labels to the recognized word string according to the alignment.

In the recognized word string learning data shown in FIG. 10, the first column is the word surface feature, the second column is the part-of-speech feature, the third column is the character type feature, the fourth column is the recognition confidence feature, and the fifth column is the named entity label.
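A row in this five-column layout can be assembled as sketched below. The character type rules use Unicode block ranges as a simple approximation, and the function names, the category strings, and the part-of-speech values are assumptions; the original does not define how the character type is computed.

```python
def char_type(word):
    """Classify a word as hiragana, katakana, single kanji, kanji, or mixed."""
    def kind(ch):
        if "\u3040" <= ch <= "\u309f":
            return "hiragana"
        if "\u30a0" <= ch <= "\u30ff":
            return "katakana"
        if "\u4e00" <= ch <= "\u9fff":
            return "kanji"
        return "other"

    kinds = {kind(ch) for ch in word}
    if kinds == {"kanji"}:
        return "single kanji" if len(word) == 1 else "kanji"
    if len(kinds) == 1:
        return kinds.pop()
    return "mixed"


def make_row(surface, part_of_speech, confidence, label):
    """Build (surface, part of speech, character type, confidence, label)."""
    return (surface, part_of_speech, char_type(surface), confidence, label)


# Hypothetical part-of-speech values, for illustration only.
row = make_row("村山", "noun", "correct", "OTHER")
```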

In this embodiment, when a named entity in the recognized word string contains a recognition error, every recognized word corresponding to that named entity is given the non-entity label (OTHER).

For example, the person name "Tomiichi Murayama" (村山富市) in the input data is misrecognized as the sequence 「村山」「氏」「に」「位置」「し」「使用」 in the recognized word string, as shown in FIG. 10. In the named entity labels of the reference word string (see FIG. 6), the span from "person name-BEGIN" to "person name-END" marks this person name. In FIG. 10, because "富市" was misrecognized and replaced by other words, "村山" has the recognition confidence feature "correct" but is not given the corresponding "person name-BEGIN" label; it is labeled "OTHER" instead.

The recognized word string learning data creation means 1331 stores the created recognized word string learning data in the model learning DB 151 of the storage means 150 as recognized word string learning data 1510.
Note that the information in the recognized word string learning data of the model learning database shown in FIG. 5 corresponds to the information in the recognized word string learning data shown in FIG. 10 as follows: recognized word string = word surface feature; features = (part-of-speech feature, character type feature); correct/incorrect information = recognition confidence feature; extraction target information = named entity label.

The reference word string learning data creation means 1332 creates reference word string learning data by attaching to the reference word string the same features as in the recognized word string learning data 1510, together with the recognition confidence feature (S4: reference word string learning data creation step). Because every word in the reference word string is regarded as a correctly recognized word, the recognition confidence feature is "correct" (正) for all words. FIG. 11 shows an example of the reference word string learning data created by the reference word string learning data creation means 1332.

The reference word string learning data creation means 1332 stores the created reference word string learning data in the model learning DB 151 of the storage means 150 as reference word string learning data 1511.
Note that the information in the reference word string learning data of the model learning database shown in FIG. 5 corresponds to the information in the reference word string learning data shown in FIG. 11 as follows: reference word string = word surface feature; features = (part-of-speech feature, character type feature); correct/incorrect information = recognition confidence feature; extraction target information = named entity label.

Next, the model creation means 1350 of the model creation unit 135 creates model information using the recognized word string learning data 1510 and the reference word string learning data 1511 (S5: model creation step).

In this embodiment, an SVM, which is a well-known technique, is used as the discriminative model for named entity extraction. TinySVM, which is published as free software, is used as the SVM implementation, and YamCha, also published as free software, is used to realize the model creation means 1350.

Because an SVM is a binary classifier, it cannot be applied directly to a multi-class classification problem such as named entity extraction. Therefore, an SVM that decides whether an item belongs to a given class is created for each class, and multi-class classification takes as the answer the class with the highest score among the outputs of all these SVMs; this is the method called one-against-all (or one-versus-rest). YamCha automatically creates the learning data for each SVM under one-against-all from the learning data, trains the SVMs with TinySVM, and finally stores the parameters of the SVMs (33 in this embodiment) together in a single file as model information 1520. In this embodiment, in addition to the features of the word to be labeled (4 types), the features of the two preceding and two following words (4 types × 4 words = 16 types) are also used. In terms of YamCha options, this corresponds to "F:-2..2:0.." and "MULTI_CLASS=2".
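The one-against-all decision described above can be sketched as follows. This is an illustrative sketch only: the per-class decision functions are passed in as plain Python callables, and the name `predict_one_against_all` is an assumption introduced here, not part of YamCha or TinySVM.

```python
from typing import Callable, Dict, Sequence, Tuple

# One binary decision function per class; each returns a real-valued score
# (positive means "belongs to this class").
DecisionFn = Callable[[Sequence[float]], float]

def predict_one_against_all(
    classifiers: Dict[str, DecisionFn], x: Sequence[float]
) -> Tuple[str, Dict[str, float]]:
    """Score x with every per-class classifier and pick the highest-scoring class."""
    scores = {label: decide(x) for label, decide in classifiers.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

In the embodiment there would be one such decision function per label (33 in total); the sketch returns all scores as well, since the later sequence-level correction uses the scores of every label rather than only the winner.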

The SVM kernel for named entity extraction is a second-degree polynomial kernel, and the soft margin parameter is set to "0.1". In terms of TinySVM options, this corresponds to "-d 2 -C 0.1".

As a result, the model information (class and feature description part) is written out as model information 1520, as illustrated in FIG. 12.
Here, the information used in common by all the SVMs is written out first, as in FIG. 12.
"ClassList:" lists the labels that this model information assigns for information extraction.
The feature number definitions follow. For example, feature number 18 indicates that the surface form of the word itself is "ＡＰ通信" ("F" denotes a feature, "+0" the word position, and "0" the column position in the learning data).
As a further example, consider the features used to assign a named entity label to the first word (word surface feature "村山") in the reference word string learning data shown in FIG. 11. When the part-of-speech feature "名詞-固有名詞" (noun, proper noun) of the next word (word surface feature "富市") is expressed in the model information of FIG. 12, it is written "F:+1:1:名詞-固有名詞". Similarly, when the character type feature "漢字" (kanji) of the word two positions later (word surface feature "首相") is expressed in the model information, it is written "F:+2:2:漢字".
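The feature notation above (word offset, then column position, then value) can be sketched as follows for a window of two words on each side. This is an assumption-laden illustration: the function name `window_features` is introduced here, positions outside the sentence are simply skipped (how YamCha itself treats sentence boundaries is not modeled), and the example rows contain placeholder values except where the text states them.

```python
from typing import List, Sequence

def window_features(rows: Sequence[Sequence[str]], i: int, window: int = 2) -> List[str]:
    """Build "F:<offset>:<column>:<value>" strings for the word at index i.

    rows[j] holds the feature columns of word j in learning-data order:
    0 = word surface, 1 = part of speech, 2 = character type, 3 = recognition confidence.
    """
    features = []
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(rows):  # skip positions outside the sentence
            for column, value in enumerate(rows[j]):
                features.append(f"F:{offset:+d}:{column}:{value}")
    return features

# Values for "富市" (part of speech) and "首相" (character type) follow the text;
# the remaining cells are illustrative placeholders.
rows = [
    ("村山", "名詞-固有名詞", "漢字", "正"),
    ("富市", "名詞-固有名詞", "漢字", "正"),
    ("首相", "名詞-一般", "漢字", "正"),
]
features_for_first_word = window_features(rows, 0)
```

For the first word this produces, among others, "F:+1:1:名詞-固有名詞" and "F:+2:2:漢字", matching the two examples given in the text.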

Following this, as illustrated in FIG. 13, the SVM parameters used by each SVM to decide whether an item belongs to each class in the learning data (support vectors, their weights, and so on) are appended to the model information 1520.
In FIG. 13, reference numeral 1301 indicates part of the parameters of the SVM that decides whether a word is classified under the label "人工物名-BEGIN" (artifact name). At reference numeral 1302, each row represents a support vector: the weight of the support vector comes first, followed by the list of feature numbers and feature values of that support vector.

The model information 1520 is created by the above processing. When new learning source data is input, the model information 1520 stored in the model information DB 152 is updated by performing steps S1 to S5. Specifically, the parameter part of the model information shown in FIG. 13 is updated, and when a new label or a new word has been added, the information of FIG. 12 is updated as well.
In this way, model information that incorporates correct/incorrect information and label information can be created, which improves accuracy at information extraction time. The experimental results are described later.

<Information extraction>
Here, the information extraction process is described with reference to FIG. 14. It is assumed that the model information 1520 has been stored in the terminal device 100 in advance through steps S1 to S5 of FIG. 5. The original data (input data) used for information extraction is "米国がアヘン戦争で香港を占領" ("the United States occupied Hong Kong in the Opium War").

First, the word string recognition means 1310 creates a word lattice from the original data (input data) acquired via the input/output means 110 (S11: word string recognition step).

FIG. 15 shows an example of the word lattice created by the word string recognition means 1310. The word lattice in this embodiment takes the form of a weighted finite state transducer (WFST), written with one state transition per line. The first column is the source state number, the second column is the destination state number, the third and fourth columns are the time frames in the speech corresponding to the transition (the start of the audio file is "0", and time "1" corresponds to 20 ms), the fifth column is the input symbol (here always "eps", meaning no input symbol), the sixth column is the output symbol (the recognized word), and the seventh column is the state transition weight (score).
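The seven-column lattice line described above could be read along the following lines. This is a sketch under assumptions: the columns are taken to be whitespace-separated (the text does not state the delimiter), the names `Arc` and `parse_arc` are introduced here, and the sample line used with it is invented for illustration rather than taken from FIG. 15.

```python
from dataclasses import dataclass

FRAME_MS = 20  # one time frame corresponds to 20 ms, as stated in the text

@dataclass
class Arc:
    src: int          # column 1: source state number
    dst: int          # column 2: destination state number
    start_frame: int  # column 3: first time frame of the transition
    end_frame: int    # column 4: last time frame of the transition
    in_sym: str       # column 5: input symbol ("eps" = no input symbol)
    out_sym: str      # column 6: output symbol (recognized word)
    weight: float     # column 7: state transition weight (score)

    @property
    def start_ms(self) -> int:
        return self.start_frame * FRAME_MS

    @property
    def end_ms(self) -> int:
        return self.end_frame * FRAME_MS

def parse_arc(line: str) -> Arc:
    """Parse one state-transition line; whitespace separation is an assumption."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
    return Arc(int(fields[0]), int(fields[1]), int(fields[2]), int(fields[3]),
               fields[4], fields[5], float(fields[6]))
```

Converting frames to milliseconds in the data class keeps the 20 ms convention in one place, so downstream code does not need to repeat it.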

Next, the word confidence calculation means 1311 calculates confidence using the word lattice and creates a recognized word string with confidence information (S12: word confidence calculation step).
In this embodiment, a confidence calculation method using an SVM is used to calculate word confidence, but the method is not limited to this as long as correct/incorrect labels can be assigned.
The following features are used, as an example, for word confidence calculation.
-------
Word surface form
Part-of-speech number
Word posterior probability (divided into 10 levels: (1) greater than 0 and at most 0.1, (2) greater than 0.1 and at most 0.2, ..., (10) greater than 0.9 and at most 1.0)
-------
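The ten-level discretization of the word posterior probability can be sketched as follows. The function name `posterior_level` is introduced here for illustration, and rejecting values outside (0, 1] is an assumption, since the text does not say how such values are handled.

```python
def posterior_level(p: float) -> int:
    """Map a word posterior probability in (0, 1] to a level from 1 to 10.

    Level k covers the interval ((k - 1) / 10, k / 10], so 0.1 falls in level 1
    and 1.0 falls in level 10.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError(f"posterior probability must be in (0, 1], got {p}")
    # Compare against each upper bound instead of using ceil(p * 10), which can
    # misplace boundary values because of floating-point rounding.
    for level in range(1, 11):
        if p <= level / 10:
            return level
    return 10  # not reached for p <= 1.0; kept as a safe fallback
```

The resulting level, rather than the raw probability, would then be used as a discrete feature alongside the word surface form and part-of-speech number.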

In this embodiment, as when creating the model for named entity extraction, these features of the two preceding and two following words are also used. The word posterior probability is calculated by the posterior probability calculation method using a word graph described in Reference Document 2 mentioned above. The SVM for word confidence calculation, which uses information such as the word posterior probability feature, is likewise implemented with YamCha and TinySVM: an SVM that decides whether each recognized word is correct (CORRECT) or an error (ERROR) is trained (created) using the SVM learning data for word confidence calculation illustrated in FIG. 16. Other methods may be used to calculate the word posterior probability and the word confidence, in which case the SVM learning data for word confidence calculation illustrated in FIG. 16 may be unnecessary.

Here, an example of the process by which the recognized word string learning data creation means 1331 generates the SVM learning data for word confidence calculation illustrated in FIG. 16 is described.
During model learning, the word string recognition means 1310 creates a word lattice from the original data.
The recognized word string learning data creation means 1331 then creates the SVM learning data for word confidence calculation (see FIG. 16) using the word lattice received from the word string recognition means 1310 and the recognized word string with correct/incorrect information (see FIG. 9). The correct/incorrect labels in the SVM learning data for word confidence calculation (see FIG. 16) are assigned according to the correct/incorrect information in the recognized word string with correct/incorrect information (see FIG. 9).

Learning from the SVM learning data for word confidence calculation is a binary classification problem, so one-against-all is not needed; it suffices to train a single SVM that decides whether a word is CORRECT. In this embodiment, the SVM kernel for word confidence calculation is a second-degree polynomial kernel, and the soft margin parameter is set to "0.01". In terms of TinySVM options, this corresponds to "-d 2 -C 0.01".

The parameters of the SVM trained on the SVM learning data for word confidence calculation are stored in the model information DB 152, in the same way as in the processing of the model creation means 1350 during model learning.

As noted above, the process of generating the SVM learning data for word confidence calculation (see FIG. 16) and storing the result in the model information DB 152 is performed because an SVM is used in step S12; it can be omitted when another method is used.

At information extraction time, when a word lattice is input from the word string recognition means 1310, the word confidence calculation means 1311 creates recognized word string data for word confidence calculation, to which the word posterior probability feature used for word confidence calculation is attached. FIG. 17 shows an example of this data. When this recognized word string data for word confidence calculation (in which "あれ" is a misrecognition of "アヘン") is input to YamCha, the output shown as the word confidence calculation result in FIG. 18 is obtained using the stored SVM parameters, that is, the model information created from the SVM learning data for word confidence calculation. The columns added relative to the data of FIG. 17 are the score for being judged CORRECT and the score for being judged ERROR (the CORRECT score with its sign inverted), which correspond to the confidence of the extraction target information. In this embodiment, the SVM output score s is then normalized to a value in the range 0 to 1 using the sigmoid function of equation (2), and the normalized score c(s) is used as the confidence of the recognized word.

FIG. 19 shows the values obtained by normalizing the "CORRECT" scores among the correct/incorrect scores in FIG. 18 using equation (2).

In this embodiment, "0.4" is used as the threshold for determining the confidence feature. Each normalized value is compared with the threshold: a value at or above the threshold is given the information "correct" (正), and any other value is given "incorrect" (誤), and this information is added as the recognition confidence feature. The result is shown in FIG. 20 as the recognized word string with confidence information. Through this procedure, the word confidence calculation means 1311 creates data in which features including the recognition confidence are attached to the recognized word string obtained from the word lattice.
Through the above processing, correct/incorrect information is attached to the recognized word string of the input data.
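The normalization and thresholding steps can be sketched as follows. The exact form of equation (2) is not reproduced in this text, so the sketch assumes a standard logistic sigmoid with a gain parameter `beta` (a hypothetical parameter with an arbitrary default of 1.0); only the threshold "0.4" and the "at or above" rule come from the description. The function names are introduced here for illustration.

```python
import math

def sigmoid(s: float, beta: float = 1.0) -> float:
    """Map an SVM output score to the range 0 to 1 (assumed form of equation (2))."""
    z = beta * s
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form for negative z that avoids overflow in exp().
    e = math.exp(z)
    return e / (1.0 + e)

def confidence_feature(s: float, threshold: float = 0.4, beta: float = 1.0) -> str:
    """Return the binary recognition confidence feature for a CORRECT score s."""
    return "correct" if sigmoid(s, beta) >= threshold else "incorrect"
```

Because the threshold is below 0.5, a word with a slightly negative SVM score can still be marked "correct" under this assumed sigmoid; the actual behavior depends on the real equation (2).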

Next, the label assignment means 1370 assigns labels to the recognized word string with confidence information (see FIG. 20) input from the word confidence calculation means 1311 (S13: extraction target information assignment step).
Specifically, the recognized word string with confidence information (see FIG. 20) is input from the word confidence calculation means 1311 to the label assignment means 1370. Using the model information 1520 created by the model creation means 1350 (step S5 in FIG. 5), the label assignment means 1370 assigns labels to the input word string, for example with SVMs. The result is output as shown in FIG. 21. Compared with FIG. 20, the added parts are the labels and their scores (the output score of each SVM). Scores are output for all output labels, but in the example of FIG. 21 the labels from the third candidate onward are omitted.

If the output of FIG. 21 were used directly as the labeling result, the label sequence could be inconsistent (for "米国", for example, "地名-BEGIN" (place name) would be followed by "人工物名-BEGIN" (artifact name)). The optimal label sequence satisfying the constraints on label adjacency is therefore obtained as follows.

First, the label score s is normalized to a value in the range 0 to 1 by the sigmoid function (see equation (2)). Then a label sequence that maximizes the sum of the normalized scores c(s) is selected, for example by the well-known Viterbi algorithm (see Non-Patent Document 2). This procedure corrects the YamCha output described above and yields a result such as that illustrated in FIG. 22.
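A constrained search of this kind can be sketched as follows. The sketch makes several assumptions that are not stated in the text: labels are either "OTHER" or of the form "<type>-<position>" with positions BEGIN, MIDDLE, END, or SINGLE (MIDDLE is assumed here); the adjacency rules in `allowed` are one plausible reading of the label constraints; every word carries a score for the same label set including OTHER; and the input is non-empty. All function names are introduced for illustration.

```python
from typing import Dict, List, Optional, Tuple

def split_label(label: str) -> Tuple[Optional[str], str]:
    """Split "<type>-<position>" into (type, position); OTHER has no type."""
    if label == "OTHER":
        return None, "OTHER"
    entity_type, _, position = label.rpartition("-")
    return entity_type, position

def allowed(prev: str, cur: str) -> bool:
    """Assumed adjacency rule: an open entity must continue with the same type."""
    prev_type, prev_pos = split_label(prev)
    cur_type, cur_pos = split_label(cur)
    if prev_pos in ("BEGIN", "MIDDLE"):
        return cur_type == prev_type and cur_pos in ("MIDDLE", "END")
    return cur_pos in ("OTHER", "BEGIN", "SINGLE")

def best_label_sequence(scores: List[Dict[str, float]]) -> List[str]:
    """Viterbi search maximizing the sum of normalized scores under allowed()."""
    # The sentence start behaves like a preceding OTHER label.
    best = {lab: (sc, [lab]) for lab, sc in scores[0].items() if allowed("OTHER", lab)}
    for column in scores[1:]:
        step = {}
        for cur, sc in column.items():
            cands = [(total + sc, path) for prev, (total, path) in best.items()
                     if allowed(prev, cur)]
            if cands:
                total, path = max(cands, key=lambda c: c[0])
                step[cur] = (total, path + [cur])
        best = step
    # A sequence may not end inside an open entity.
    finals = [v for lab, v in best.items()
              if split_label(lab)[1] in ("OTHER", "END", "SINGLE")]
    if not finals:
        raise ValueError("no label sequence satisfies the constraints")
    return max(finals, key=lambda c: c[0])[1]
```

When the per-word best labels are "地名-BEGIN" followed by "人工物名-BEGIN", the search instead returns a consistent sequence whose total score is highest among the valid ones, which is the kind of correction FIG. 22 illustrates.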

Next, the information extraction means 1371 extracts, from the output of the label assignment means 1370, the portions corresponding to named entities (from "BEGIN" to "END", or "SINGLE") (S14: information extraction step). The extracted information is output via the data output means 115 as appropriate.
For example, for the data illustrated in FIG. 22, the words to which named entity labels are attached,
-------
米国 place name (地名)
香港 place name (地名)
-------
are output.
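The extraction step can be sketched as follows. The function name `extract_entities` and the (word, label) pair layout are assumptions introduced here, and the token sequence in the usage example is an illustrative guess at the content of FIG. 22, which is not reproduced in the text.

```python
from typing import List, Tuple

def extract_entities(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Collect (text, type) for each BEGIN..END span and each SINGLE word."""
    entities = []
    buffer: List[str] = []
    open_type = None
    for word, label in tagged:
        if label == "OTHER":
            buffer, open_type = [], None
            continue
        entity_type, _, position = label.rpartition("-")
        if position == "SINGLE":
            entities.append((word, entity_type))
            buffer, open_type = [], None
        elif position == "BEGIN":
            buffer, open_type = [word], entity_type
        elif open_type == entity_type:
            # Continuation of the currently open span (END closes it).
            buffer.append(word)
            if position == "END":
                entities.append(("".join(buffer), entity_type))
                buffer, open_type = [], None
        else:
            # Continuation without a matching BEGIN: discard.
            buffer, open_type = [], None
    return entities

tagged = [("米国", "地名-SINGLE"), ("が", "OTHER"), ("あれ", "OTHER"),
          ("戦争", "OTHER"), ("で", "OTHER"), ("香港", "地名-SINGLE"),
          ("を", "OTHER"), ("占領", "OTHER")]
result = extract_entities(tagged)
```

Words are joined without spaces because the example language is Japanese; a space-delimited language would need a different join.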

According to this embodiment, the discriminative model is configured so that the recognized word string learning data 1510 and the reference word string learning data 1511 each include, as correct/incorrect information, a binary feature (the recognition confidence feature) indicating whether each recognized word is correct. This makes it possible to create accurate model information and, when extracting information with that model information, to make use of the correct/incorrect information. The influence of recognition errors can thereby be reduced. For example, the word "あれ", which was misrecognized from the original data, receives the output label OTHER and is therefore not extracted as a named entity.
In addition, the various features of a word can be used not only for correctly recognized words but also for misrecognized words.

The terminal device 100 can also be realized by executing a model learning program that causes a general-purpose computer to execute steps S1 to S5 described above. Likewise, the terminal device 100 can be realized by executing an information extraction program that causes a general-purpose computer to execute steps S11 to S14 described above. These programs can be distributed via a communication line, or written to a recording medium such as a CD-ROM and distributed.

[Experiment]
The method and results of an experiment using the method of this embodiment are shown below.

<Experimental conditions>
・Original data: read speech of newspaper articles (1,174 articles, 10,718 sentences, 106 speakers (about 100 sentences per speaker))
・Reference word strings: text of the same newspaper articles (symbols that are not read aloud, such as parentheses, were removed, and all numerals were replaced with kanji numerals)
・Extraction target information: named entities in the reference word strings (artifact name, organization name, place name, person name, date, time, amount of money, percentage), about 19,000 in total

The text of the reference word strings was morphologically analyzed with the well-known Japanese morphological analyzer ChaSen (茶筌), which provided the word segmentation and part-of-speech information.
The following word character types were used: "漢字一文字" (a single kanji), "漢字" (kanji only, one or more characters), "ひらがな" (hiragana only), "カタカナ" (katakana only), "数字" (digits only), "大文字一文字" (a single uppercase letter), "大文字" (uppercase letters only), "大文字開始" (alphabetic with only the first letter uppercase), "英字" (other words consisting only of letters), and "その他" (all other words).

The speech recognizer used to create the recognized word strings used an acoustic model built from separate read speech and a word trigram model built from separate newspaper article data. The language model weight during speech recognition was set to "15".
Under these conditions, the word recognition accuracy for the 10,718 read sentences was 79.45%, and 82.0% of the named entities survived in the speech recognition results.

<Evaluation metric>
The F-measure for named entities, which is commonly used in this field, was used as the evaluation metric. The F-measure is the harmonic mean of precision and recall, which are given by equations (3) and (4), respectively.

An extracted named entity was counted as correct when speech recognition was correct with no word errors, no words were missing or superfluous, and the named entity label (person name, place name, etc.) was correct; otherwise it was counted as an error.
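The evaluation metric can be sketched as follows. Equations (3) and (4) are not reproduced in this text, so the sketch assumes the standard definitions: precision is the number of correctly extracted named entities divided by the number of extracted named entities, and recall is the same numerator divided by the number of named entities in the reference. The function name and the counts in the usage line are illustrative, not experimental values.

```python
from typing import Tuple

def precision_recall_f(n_correct: int, n_extracted: int, n_reference: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) from entity counts.

    n_correct   - extracted entities judged correct (words, span, and label all right)
    n_extracted - all entities output by the extractor
    n_reference - all entities present in the reference word strings
    """
    precision = n_correct / n_extracted if n_extracted else 0.0
    recall = n_correct / n_reference if n_reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure

p, r, f = precision_recall_f(n_correct=8, n_extracted=10, n_reference=16)
```

Under these definitions, suppressing entities that contain likely misrecognized words lowers the number of extracted entities, which can raise precision at some cost in recall.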

<Methods compared>
(1) No confidence information: a method that performs named entity extraction based only on text data (reference word strings), without using confidence information (corresponding to Non-Patent Document 2). The speech recognition result is used as text.
(2) Comparative example (A): uses confidence information. Text data (reference word strings) is not used in the learning data.
(3) Comparative example (B): uses confidence information. In the learning and test data, when the recognition confidence feature is "incorrect" (誤), the word feature, part-of-speech feature, and character type feature are not used (only the information that the word is "a misrecognized word" is used).
(4) This embodiment: uses confidence information. All functions of this embodiment are used.
(5) Upper bound of this embodiment: the method of this embodiment under the assumption that the correctness of each recognized word is known without error. This is the upper bound of the performance obtainable with this embodiment.

<Experimental results>
The experimental results are shown in FIG. 23.
Compared with the case where confidence information is not used (no confidence information), this embodiment achieves an F-measure about 2.0% higher. This gain comes from a 7.4% improvement in precision (recall decreases by 1.9%).

Comparative example (A), although inferior to this embodiment, improves the F-measure by 0.7% by using confidence information. The likely reason it falls short of this embodiment is that comparative example (A) does not use text data (reference word strings) for model learning, so learning data is insufficient where portions corresponding to named entities have been lost through speech recognition errors. In other words, it was confirmed that the effect of introducing recognition confidence into the discriminative model is further improved by the framework of also using the reference word string data for model learning as error-free recognized word string data (and, for that purpose, binarizing the recognition confidence rather than using continuous values).

Comparative example (B) performs close to this embodiment but is slightly worse (by 0.3% in F-measure and 0.6% in precision). The reason is that this embodiment uses more information than comparative example (B), namely the features of misrecognized words, and the use of these features was found to contribute to the improved performance of this embodiment. Such features were difficult to use in conventional named entity extraction methods based on generative models, so their benefit becomes available only with this embodiment.

By using recognition confidence information, this embodiment can perform named entity extraction while taking into account words that are likely to have been misrecognized. This prevents misrecognized words from being extracted as part of a named entity, which is considered to have led to the improvement in precision. The result for "upper bound of this embodiment" also suggests that further performance gains in information extraction can be expected from more accurate recognition confidence.

As described above, according to this embodiment, recognition errors are represented as binary correct/incorrect feature information by comparing the reference word string with the recognized word string, and the discriminative model is trained with this information, which makes it possible to improve the accuracy of information extraction. Features of misrecognized words can also be used.

An embodiment of the present invention has been described above, but the present invention is not limited to it and can be practiced within a scope that does not depart from its gist. For example, the word features and the types and contents of the labels in this embodiment can be set freely.
The known techniques used here are likewise not limiting, and other methods may be used instead.
As for applicable technical fields, the invention can be applied in various fields beyond information retrieval systems, natural language processing systems, and speech processing systems.

FIG. 1 is a functional block diagram showing an example of a terminal device (model learning device, information extraction device) according to the present invention.
FIG. 2 is a diagram showing an example of recognized word string learning data.
FIG. 3 is a diagram showing an example of reference word string learning data.
FIG. 4 is a diagram showing a labeled result.
FIG. 5 is a flowchart showing the model learning method performed by the terminal device shown in FIG. 1.
FIG. 6 is a diagram showing an example of a reference word string stored in the learning source database.
FIG. 7 is a diagram showing an example of a recognized word string created from original data.
FIG. 8 is a diagram showing the result of the alignment performed by the word string alignment means.
FIG. 9 is a diagram showing an example of a recognized word string with correct/incorrect information.
FIG. 10 is a diagram showing an example of recognized word string learning data.
FIG. 11 is a diagram showing an example of reference word string learning data.
FIG. 12 is a diagram showing an example of model information (class and feature description part).
FIG. 13 is a diagram showing an example of model information (parameter description part).
FIG. 14 is a flowchart showing the information extraction method performed by the terminal device shown in FIG. 1.
FIG. 15 is a diagram showing a word lattice created from input data at information extraction time.
FIG. 16 is a diagram showing an example of SVM learning data for word confidence calculation.
FIG. 17 is a diagram showing an example of recognized word string data for word confidence calculation at information extraction time.
FIG. 18 is a diagram showing an example of a word confidence calculation result at information extraction time.
FIG. 19 is a diagram showing data obtained by normalizing the scores in FIG. 18.
FIG. 20 is a diagram showing an example of a recognized word string with confidence information.
FIG. 21 is a diagram showing an example of a label assignment result.
FIG. 22 is a diagram showing a corrected label assignment result.
FIG. 23 is a diagram showing the experimental results.

Explanation of symbols

100 Terminal device (model learning device, information extraction device)
110 Input/output means
111 Data input means
115 Data output means
130 Control means
131 Recognition unit
133 Learning data processing unit
135 Model creation unit
137 Information extraction unit
150 Storage means
151 Model learning database
152 Model information database
1310 Word string recognition means
1311 Word confidence calculation means
1330 Word string association means
1331 Recognized word string learning data creation means
1332 Reference word string learning data creation means
1350 Model creation means
1370 Label assignment means (extraction information assignment means)
1371 Information extraction means
1510 Recognized word string learning data
1511 Reference word string learning data
1520 Model information

Claims

A model learning method for learning model information used to assign, to each word included in input speech or character data, a label indicating the type of the word,
wherein a recognized word string is a string of words obtained by applying speech recognition or character recognition to model learning data, which is speech or character data, and a reference word string is a word string that is the correct recognition result corresponding to the model learning data and in which each word is assigned a correct label indicating the type of the word,
and wherein a model learning device that learns the model information executes:
a word string association step of comparing each word in the recognized word string with each word in the reference word string, assigning, as a recognition confidence, information indicating that the recognition result is correct to each word in the recognized word string that matches the reference word string, and assigning, as the recognition confidence, information indicating that the recognition result is incorrect to each word in the recognized word string that does not match the reference word string, thereby generating a recognized word string with correct/incorrect information having the recognition confidence;
a recognized word string learning data creation step of comparing the recognized word string with correct/incorrect information with the reference word string and assigning, to each word in the recognized word string with correct/incorrect information that matches the reference word string, the label assigned to the matching word in the reference word string, thereby creating recognized word string learning data having the recognition confidence and the label;
a reference word string learning data creation step of assigning, as the recognition confidence, information indicating that the recognition result is correct to each word in the reference word string, thereby creating reference word string learning data having the recognition confidence and the label; and
a model creation step of taking the recognized word string learning data and the reference word string learning data as input, learning, with a support vector machine, a maximum entropy model, or a conditional random field and using at least the recognition confidence as a feature, the model information for assigning a label indicating the optimum word type to each word included in the input speech or character data, and storing the model information in a storage means.
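For illustration only, the word string association and the two learning data creation steps recited above might be sketched as follows. The claim states only that the word strings are compared; the use of Python's `difflib.SequenceMatcher` for the comparison, the `"T"`/`"F"` encoding of the recognition confidence, the `"O"` label given to unmatched words, and the function name `build_learning_data` are all assumptions made for this sketch and are not taken from the specification.

```python
from difflib import SequenceMatcher

CORRECT, INCORRECT = "T", "F"  # assumed encoding of the binary recognition confidence
NO_LABEL = "O"                 # assumed label for words with no matching reference word


def build_learning_data(recognized, reference):
    """Create both kinds of learning data as (word, confidence, label) triples.

    recognized: list of words output by the recognizer.
    reference:  list of (word, correct_label) pairs.
    """
    ref_words = [word for word, _ in reference]

    # Default: every recognized word is treated as misrecognized and unlabeled.
    recognized_data = [(word, INCORRECT, NO_LABEL) for word in recognized]

    # Word string association: align the two word strings and mark the
    # recognized words that coincide with a reference word as correct,
    # copying the label of the matching reference word.
    matcher = SequenceMatcher(a=recognized, b=ref_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            i, j = block.a + k, block.b + k
            recognized_data[i] = (recognized[i], CORRECT, reference[j][1])

    # Reference words are correct by definition.
    reference_data = [(word, CORRECT, label) for word, label in reference]
    return recognized_data, reference_data


recognized = ["taro", "went", "to", "kyoto"]
reference = [("taro", "PERSON"), ("went", "O"), ("to", "O"), ("tokyo", "LOCATION")]
recognized_data, reference_data = build_learning_data(recognized, reference)
```

In this toy input the misrecognized word "kyoto" receives the confidence "F", so a model trained on both data sets can see labeled words under both correct and incorrect recognition conditions. Both lists of triples would then be passed to the learner (support vector machine, maximum entropy model, or conditional random field) with the confidence as one of the features.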

An information extraction method for extracting, as extraction target information, a word to which a label indicating a predetermined type has been assigned, from speech or character input data,
wherein a recognized word string is a string of words obtained by applying speech recognition or character recognition to model learning data, which is speech or character data, and a reference word string is a word string that is the correct recognition result corresponding to the model learning data and in which each word is assigned a correct label indicating the type of the word,
and wherein a model learning device that learns model information used to assign, to each word included in the input speech or character data, a label indicating the type of the word executes:
a word string association step of comparing each word in the recognized word string with each word in the reference word string, assigning, as a recognition confidence, information indicating that the recognition result is correct to each word in the recognized word string that matches the reference word string, and assigning, as the recognition confidence, information indicating that the recognition result is incorrect to each word in the recognized word string that does not match the reference word string, thereby generating a recognized word string with correct/incorrect information having the recognition confidence;
a recognized word string learning data creation step of comparing the recognized word string with correct/incorrect information with the reference word string and assigning, to each word in the recognized word string with correct/incorrect information that matches the reference word string, the label assigned to the matching word in the reference word string, thereby creating recognized word string learning data having the recognition confidence and the label;
a reference word string learning data creation step of assigning, as the recognition confidence, information indicating that the recognition result is correct to each word in the reference word string, thereby creating reference word string learning data having the recognition confidence and the label;
a model creation step of taking the recognized word string learning data and the reference word string learning data as input, learning, with a support vector machine, a maximum entropy model, or a conditional random field and using at least the recognition confidence as a feature, the model information for assigning a label indicating the optimum word type to each word included in the speech or character input data, and storing the model information in a storage means;
a word string recognition step of recognizing the speech or character input data as word strings by speech recognition or character recognition and creating a word lattice that represents a plurality of recognized word string candidates as a graph;
a word confidence calculation step of calculating, for each word included in the word lattice, a score expressing the correctness of recognition of the word as a continuous value, and assigning to each word, as the recognition confidence, information indicating that the recognition is correct if the score is equal to or greater than a predetermined threshold and information indicating that the recognition is incorrect otherwise, thereby creating, for the word lattice, a recognized word string with confidence information having the recognition confidence;
a label assignment step of assigning a label to each word in the recognized word string with confidence information for the word lattice, using the model information created in the model creation step; and
an information extraction step of extracting, as the extraction target information, each word to which a label corresponding to the predetermined type has been assigned, from the recognized word string with confidence information for the word lattice.
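As a rough illustration of the word confidence calculation, label assignment, and information extraction steps recited above, the following sketch binarizes continuous scores against a threshold and then extracts the words carrying a target label. It is a simplification under stated assumptions: the word lattice is flattened to a single word sequence, the threshold value 0.5 is arbitrary, and `toy_labeler` is a hypothetical dictionary-based stand-in for the learned model information, not the model described in this specification.

```python
def attach_confidence(scored_words, threshold=0.5):
    """Turn (word, score) pairs into (word, confidence) pairs.

    A score equal to or greater than the threshold yields "T" (recognized
    correctly); any other score yields "F". The threshold is an assumed value.
    """
    return [(word, "T" if score >= threshold else "F") for word, score in scored_words]


def toy_labeler(words_with_conf):
    """Hypothetical stand-in for the learned model: a small dictionary lookup
    that refuses to label words whose recognition confidence is "F"."""
    known = {"Taro": "PERSON", "Kyoto": "LOCATION"}
    return [known.get(word, "O") if conf == "T" else "O" for word, conf in words_with_conf]


def extract(words_with_conf, labeler, target_label):
    """Assign a label to every word, then keep the words with the target label."""
    labels = labeler(words_with_conf)
    return [word for (word, _), label in zip(words_with_conf, labels) if label == target_label]


scored = [("Taro", 0.9), ("visited", 0.8), ("Kyoto", 0.2)]
with_conf = attach_confidence(scored)
persons = extract(with_conf, toy_labeler, "PERSON")
locations = extract(with_conf, toy_labeler, "LOCATION")
```

Here "Kyoto" falls below the threshold, so the stand-in labeler does not extract it as a location. A learned model would weigh the confidence together with its other features instead of applying a hard rule like this.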

A model learning device that learns model information used to assign, to each word included in input speech or character data, a label indicating the type of the word, the device comprising:
word string recognition means for creating, from model learning data that is speech or character data, a recognized word string, which is a string of words recognized by speech recognition or character recognition;
word string association means for comparing each word in a reference word string, which is a word string in which each word of the correct recognition result corresponding to the model learning data is assigned a correct label indicating the type of the word, with each word in the recognized word string, assigning, as a recognition confidence, information indicating that the recognition result is correct to each word in the recognized word string that matches the reference word string, and assigning, as the recognition confidence, information indicating that the recognition result is incorrect to each word in the recognized word string that does not match the reference word string, thereby generating a recognized word string with correct/incorrect information having the recognition confidence;
recognized word string learning data creation means for comparing the recognized word string with correct/incorrect information with the reference word string and assigning, to each word in the recognized word string with correct/incorrect information that matches the reference word string, the label assigned to the matching word in the reference word string, thereby creating recognized word string learning data having the recognition confidence and the label;
reference word string learning data creation means for assigning, as the recognition confidence, information indicating that the recognition result is correct to each word in the reference word string, thereby creating reference word string learning data having the recognition confidence and the label; and
model creation means for taking the recognized word string learning data and the reference word string learning data as input, learning, with a support vector machine, a maximum entropy model, or a conditional random field and using at least the recognition confidence as a feature, the model information for assigning a label indicating the optimum word type to each word included in the input speech or character data, and storing the model information in a storage means.

An information extraction device that extracts, as extraction target information, a word to which a label indicating a predetermined type has been assigned, from speech or character input data, the device comprising:
word string recognition means for, at the time of model learning, creating, from model learning data that is speech or character data, a recognized word string, which is a string of words recognized by speech recognition or character recognition, and, at the time of information extraction, recognizing the speech or character input data as word strings by speech recognition or character recognition and creating a word lattice that represents a plurality of recognized word string candidates as a graph;
word string association means for, at the time of model learning, comparing each word in a reference word string, which is a word string in which each word of the correct recognition result corresponding to the model learning data is assigned a correct label indicating the type of the word, with each word in the recognized word string, assigning, as a recognition confidence, information indicating that the recognition result is correct to each word in the recognized word string that matches the reference word string, and assigning, as the recognition confidence, information indicating that the recognition result is incorrect to each word in the recognized word string that does not match the reference word string, thereby generating a recognized word string with correct/incorrect information having the recognition confidence;
recognized word string learning data creation means for, at the time of model learning, comparing the recognized word string with correct/incorrect information with the reference word string and assigning, to each word in the recognized word string with correct/incorrect information that matches the reference word string, the label assigned to the matching word in the reference word string, thereby creating recognized word string learning data having the recognition confidence and the label;
reference word string learning data creation means for, at the time of model learning, assigning, as the recognition confidence, information indicating that the recognition result is correct to each word in the reference word string, thereby creating reference word string learning data having the recognition confidence and the label;
model creation means for, at the time of model learning, taking the recognized word string learning data and the reference word string learning data as input, learning, with a support vector machine, a maximum entropy model, or a conditional random field and using at least the recognition confidence as a feature, model information for assigning a label indicating the optimum word type to each word included in the input speech or character data, and storing the model information in a storage means;
word confidence calculation means for, at the time of information extraction, calculating, for each word included in the word lattice, a score expressing the correctness of recognition of the word as a continuous value, and assigning to each word, as the recognition confidence, information indicating that the recognition is correct if the score is equal to or greater than a predetermined threshold and information indicating that the recognition is incorrect otherwise, thereby creating, for the word lattice, a recognized word string with confidence information having the recognition confidence;
label assignment means for, at the time of information extraction, assigning a label to each word in the recognized word string with confidence information for the word lattice, using the model information created by the model creation means; and
information extraction means for, at the time of information extraction, extracting, as the extraction target information, each word to which a label corresponding to the predetermined type has been assigned, from the recognized word string with confidence information for the word lattice.

A model learning program for causing a computer to execute the model learning method according to claim 1.

An information extraction program for causing a computer to execute the information extraction method according to claim 2.

A computer-readable recording medium in which the model learning program according to claim 5 or the information extraction program according to claim 6 is recorded.