JP2001318947A

JP2001318947A - Information integration system, its method and recording medium having information integration program recorded thereon

Info

Publication number: JP2001318947A
Application number: JP2000139268A
Authority: JP
Inventors: Mitsunori Matsushita; 光範松下; Tsuneaki Kato; 恒昭加藤
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2000-05-12
Filing date: 2000-05-12
Publication date: 2001-11-16

Abstract

PROBLEM TO BE SOLVED: To make it possible to acquire valuable information even from information sources of low reliability by providing the confidence of the information itself together with the possibility of the information. SOLUTION: Information expansion units 11 to 13 expand the information from each information source by referring to tables 14 to 16, which describe the degree of relation between pieces of information. An information integration unit 18 integrates the expanded information across all information sources on the basis of a similarity function, finds the possibility value of each piece of information from the similarity given by that function, and finds the reliability of the information, as a confidence value, from the scatter of the similarity. An expression generation unit 19 selects an expression from a possibility expression table on the basis of the possibility value, selects an expression from a confidence expression table on the basis of the confidence value, and generates a natural sentence by combining the two. An order determination unit 20 applies a selection function to the possibility value and confidence value of each piece of information to select one piece of information, and determines the order in which all the information is arranged.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a system that integrates a plurality of pieces of information by assigning them a possibility value and a confidence value, to the integration method, and to a recording medium on which the program is recorded. In particular, it relates to an information integration system, integration method, and recording medium that allow a user to obtain valuable information from information sources of low reliability and to obtain natural language expressions, based on confidence and possibility, that are easy for the user to understand.

[0002]

[Prior Art] In recent years, the spread of network infrastructure typified by the Internet has made it easy for users to acquire large amounts of diverse data. On the other hand, because publishing information on the Internet is so easy, unreliable information sources containing omissions and errors have also proliferated. To extract reliable information from such unreliable sources, information is generally gathered from a plurality of sources, and various information presentation systems have been proposed. The methods used in conventional information presentation systems fall roughly into the following four types: (1) presenting the information from the plurality of sources to the user as is and leaving the judgment to the user; (2) taking a majority vote among the sources and presenting the value adopted by the most sources; (3) presenting the average value of the sources; and (4) referring to the value of the most reliable source.

[0003]

[Problems to Be Solved by the Invention] However, method (1) is inconvenient because the user must personally judge the multiple pieces of information. With method (2), if there are, for example, three sources and each gives different information, no majority can be formed and no value can be selected. Method (3) applies only to numerical attributes; for nominal-attribute information such as the weather, an average cannot simply be computed. It is possible to place the nominal values on a one-dimensional axis and assign each a number, integrating them as pseudo-numerical attributes, but the following problems remain. Taking weather forecasts as an example, suppose the nominal values sunny, cloudy, and rain are placed on one dimension as sunny = 1.0, cloudy = 0.5, and rain = 0.0. If source A says sunny and source B says rain, the average is 0.5 and the user is presented with the nominal value cloudy. If sources A and B both say cloudy, the average is also 0.5 and the user is again presented with cloudy. The user receives the forecast cloudy in both cases and, although the reliability of the information intuitively differs, has no way of knowing this. In the former case in particular, the result itself is questionable. Moreover, in the former case rain remains possible, if unlikely, yet integration discards that information, so the system cannot properly answer a user whose question is, for example, "Will it rain?"
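The arithmetic behind this objection can be checked with a short Python sketch. The encoding sunny = 1.0, cloudy = 0.5, rain = 0.0 comes from the text; the function name `average_forecast` and the rule of mapping the mean back to the nearest label are assumptions made for illustration.

```python
# Pseudo-numerical encoding taken from the text.
SCALE = {"sunny": 1.0, "cloudy": 0.5, "rain": 0.0}


def average_forecast(forecasts):
    """Average nominal forecasts on the one-dimensional scale.

    Returns the mean and the label closest to it (nearest-label
    rounding is an assumption of this sketch).
    """
    mean = sum(SCALE[f] for f in forecasts) / len(forecasts)
    label = min(SCALE, key=lambda name: abs(SCALE[name] - mean))
    return mean, label


# Sources A and B contradict each other.
print(average_forecast(["sunny", "rain"]))
# Sources A and B agree.
print(average_forecast(["cloudy", "cloudy"]))
```

Both calls give the same mean and the same label, so the averaged output alone cannot tell a user whether the sources agreed or contradicted each other.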

[0004]

Method (4) requires the reliability of the sources to be evaluated in advance, and even for a highly reliable source, missing information due to network trouble, server maintenance at the source, and the like is inherently unavoidable. In this respect it is inferior to methods that consult multiple sources. Methods (1) and (4) also have no index for evaluating the reliability of the information, so users must judge for themselves how far it can be trusted. Furthermore, methods that integrate information, such as (2) and (3), can assign a possibility value to information using fuzzy inference or probabilistic inference, but give no indication of how far the derived reliability can itself be trusted. In the weather forecast case, for example, suppose that integrating multiple pieces of information yields the prediction "the possibility of rain is 0.7." The reliability of that result naturally differs depending on whether many or few sources were used in the integration, yet the user has no way of knowing this.

[0005]

To address this problem, one approach expresses the result with two values, an upper and a lower probability, for example by applying the paper "On a representation method and inference method for uncertain knowledge considering the reliability of inference" (Makoto Suzuki, Toshiyasu Matsushima, Shigeichi Hirasawa: Transactions of the Information Processing Society of Japan, Vol. 35, No. 5, pp. 691-705, 1994). Applied to the example above, this yields a statement such as "the possibility of rain is in the range 0.5 to 0.8." Because the possibility is expressed as an interval, the reliability of the integration result can be read from the width of the interval. What the user wants to know, however, is what information was obtained and whether it can be trusted. Interpreting the interval to answer these questions is still left to the user, so this is not a sufficient solution. In addition, attaching several numbers on the same scale (here, two possibility values) to a piece of information impairs the user's intuitive understanding.

[0006]

Accordingly, an object of the present invention is to solve these conventional problems and to provide an information integration system and information integration method that present not only the possibility of information but also how far the integrated information can be trusted, that allow a user to obtain valuable information even from sources of low reliability, and that yield natural language expressions, based on confidence and possibility, that are easy for the user to understand.

[0007]

[Means for Solving the Problems] To achieve the above object, the information integration system of the present invention compares information from a plurality of information sources, assigns it a plausible possibility, and at the same time assigns a confidence to the integrated information itself. The system expands the information from each source into related information by referring to a table that describes the degree of relation between pieces of information, integrates the expanded information across all sources on the basis of a similarity function, obtains the possibility value of the information from the similarity given by that function, and at the same time obtains the reliability of the information, as a confidence value, from the scatter of that similarity. The invention also provides a method of generating a natural sentence that is easy for the user to understand from the possibility value and confidence value obtained by the above system: an expression is selected from a possibility expression table on the basis of the possibility value, an expression is selected from a confidence expression table on the basis of the confidence value, and the two are combined to generate the natural sentence. The invention further provides a method of determining which information to present to the user from the possibility values and confidence values of the multiple pieces of information obtained by the above system: a selection function is applied to the possibility value and confidence value of each piece of information to select one piece of information.

[0008]

In the present invention, the above system compares information from a plurality of sources, assigns it a plausible possibility, and at the same time assigns a confidence to the integrated information itself. The user can therefore grasp intuitively how far the integrated information can be trusted, which makes the result easy to understand. The above expression generation method selects an expression from the possibility expression table on the basis of the possibility value, selects an expression from the confidence expression table on the basis of the confidence value, and combines them into a natural sentence, so a natural sentence that is easy for the user to understand can be generated from the possibility value and confidence value of the obtained information. Finally, the above selection method makes it possible to determine the information to present to the user from the possibility values and confidence values of the multiple pieces of information obtained.

[0009]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram of an information integration system according to one embodiment of the present invention. The information integration system integrates information from a plurality of information sources, outputs multiple pieces of information accompanied by possibility values and confidence values, and generates natural sentence expressions for them. In FIG. 1, the system comprises information expansion units 11 to 13, which expand the information obtained from an information source (input) by assigning possibility values based on the degree of relation; tables 14 to 16, which describe the degree of relation between pieces of information for each source; an information integration unit 18, which integrates the expanded information from the plurality of sources and assigns confidence; an expression generation unit 19, which generates natural language expressions from the assigned possibility values and confidence values; and an order determination unit 20, which determines a ranking from the possibility values and confidence values. If natural language expressions are not needed, the processing of the expression generation unit 19 can be skipped. Likewise, if ordering is not needed, the processing of the order determination unit 20 can be skipped.

[0010]

FIG. 2 is a flowchart of the processing of the information expansion units in FIG. 1. FIG. 6 shows an example data structure of the information relation table, and FIG. 7 shows an example data structure of the expanded information table. The integration of weather forecast information is used as a concrete example. Each input in FIG. 1 receives information from an individual information source. In the weather forecast example, an information source is, for instance, a web site that provides weather information, and the information is a forecast such as "sunny" or "cloudy." The information input from each source is accepted (step 101) and sent to whichever of the information expansion units 11 to 13 corresponds to that source. The information expansion units 11 to 13 refer to an information relation table 10 such as that shown in FIG. 6 (step 102) and expand the input information. Table 10 can be created in various ways, for example by deriving it from the differences between observed values and each source, by deriving it from the differences between the integrated result and each source, or by hand. The method of creating it is not discussed here.

[0011]

When an input is given, the information expansion units 11 to 13 refer to the information relation table 10 shown in FIG. 6, extract the rows whose input weather matches the input (step 103), output all matching rows (step 104), pass them to the information integration unit 18, and end processing. For example, if the input given from source A to the information expansion units 11 to 13 is "sunny," selecting the rows of the information relation table 10 in FIG. 6 whose input weather matches yields an expanded information table such as that shown in FIG. 7. All the information expanded as in FIG. 7 is sent to the information integration unit 18. The same processing is performed for every source. Information does not, however, have to be available from every source at all times.
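The row lookup of steps 102 to 104 can be sketched in Python as follows. The contents of FIG. 6 are not reproduced in this text, so every row of `RELATION_TABLE` is a hypothetical value, and the names `RELATION_TABLE` and `expand` are assumptions made for illustration.

```python
# Hypothetical stand-in for the information relation table 10 (FIG. 6).
# Each row: (input weather, related weather, degree of relation).
RELATION_TABLE = [
    ("sunny", "sunny", 1.0),
    ("sunny", "cloudy", 0.4),
    ("cloudy", "cloudy", 1.0),
    ("cloudy", "sunny", 0.4),
    ("cloudy", "rain", 0.4),
    ("rain", "rain", 1.0),
    ("rain", "cloudy", 0.4),
]


def expand(observed, table=RELATION_TABLE):
    """Return {related weather: possibility value} for one source's input.

    Mirrors steps 102-104: look up the table and keep every row whose
    input weather matches. A source with no information yields {}.
    """
    if observed is None:
        return {}
    return {related: degree
            for input_weather, related, degree in table
            if input_weather == observed}


# Expanded information for a source that reported "sunny".
print(expand("sunny"))
```

Returning an empty mapping for a silent source keeps the later stages simple, since the text allows some sources to have no information.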

[0012]

FIG. 3 is a flowchart of the processing of the information integration unit in FIG. 1. FIG. 8 shows an example data structure of the reliability coefficient table; FIGS. 9 and 10 show example data structures of the expanded information tables from sources B and D; FIG. 11 shows an example data structure of the table that merges the expanded information of the sources; FIG. 12 shows an example data structure of the table that complements the contents of FIG. 11; FIG. 13 shows an example data structure of the table of reliability coefficients of the information sources; and FIG. 14 shows an example data structure of the integrated table. The information integration unit 18 receives the expanded information tables from the information expansion units 11 to 13 of the respective sources, integrates them for each identical piece of information, and assigns a possibility value and a confidence value to each piece of information. Here, the possibility value indicates how likely the information is to be true, and the confidence indicates how far that possibility value can be trusted. The information integration unit 18 first reads the expanded information tables (step 201), distinguishes the sources that have information from those that do not, and counts the sources that have none (step 202). Based on the number of sources that have information (called the integration target sources), it refers to a reliability coefficient table 10c such as that shown in FIG. 8 and determines the reliability coefficient of the integrated information (step 203). For example, if there are four sources A, B, C, and D and no information could be obtained from source C, the number of integration target sources is 3, so the reliability coefficient is determined to be 0.9 from the reliability coefficient table 10c of FIG. 8.
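Steps 202 and 203 amount to counting the sources that delivered information and looking up a coefficient. The Python sketch below illustrates this under stated assumptions: only the entry "3 sources gives 0.9" comes from the text, the other entries of `RELIABILITY_BY_SOURCE_COUNT` are invented, and the function name is hypothetical.

```python
# Hypothetical stand-in for reliability coefficient table 10c (FIG. 8).
# Only the entry 3 -> 0.9 is stated in the text; the rest are assumed.
RELIABILITY_BY_SOURCE_COUNT = {0: 0.0, 1: 0.5, 2: 0.7, 3: 0.9, 4: 1.0}


def reliability_coefficient(expanded_by_source):
    """Steps 202-203: separate sources with and without information,
    then look up the coefficient for the number of target sources.

    Returns (coefficient, integration target sources, missing count).
    """
    targets = [name for name, rows in expanded_by_source.items() if rows]
    missing = len(expanded_by_source) - len(targets)
    return RELIABILITY_BY_SOURCE_COUNT[len(targets)], targets, missing


# Four sources, with nothing obtained from source C.
expanded = {
    "A": {"sunny": 1.0},
    "B": {"cloudy": 1.0},
    "C": {},
    "D": {"sunny": 1.0},
}
print(reliability_coefficient(expanded))
```

The dictionary covers at most four sources; a real table would need an entry for every source count the system can encounter.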

[0013]

Next, the expanded information tables received from the integration target sources are consulted and the union of the information they contain is generated (step 204). A two-dimensional table is then created from this union and the integration target sources (step 205). For example, suppose there are four sources A, B, C, and D, no information could be obtained from source C, and the expanded information tables from sources A, B, and D are those shown in FIGS. 7, 9, and 10 respectively. The merged table shown in FIG. 11 is created from these tables, and the value from each integration target source is entered in the corresponding cell of the merged table (step 206). Cells of the merged expanded information table of FIG. 11 that have no value are then filled with 0.0 (step 207), which yields the complemented table shown in FIG. 12. Integration of the information then starts from the complemented table of FIG. 12 (step 208), and the confidence and possibility are determined. Steps 208 and 213 are the start and end blocks of the same process and indicate that the integration processing continues between them. Next, the confidence is calculated for each possibility value (step 209).
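Steps 204 to 207 can be sketched in Python as a union over the items followed by zero filling. The contents of FIGS. 7, 9, and 10 are not reproduced in this text, so the sample values below are hypothetical, as is the function name `merge_and_complement`.

```python
def merge_and_complement(expanded_by_source):
    """Steps 204-207: build an (item x target source) table.

    The rows are the union of all items seen in the expanded tables of
    the integration target sources; cells with no value become 0.0.
    """
    targets = sorted(name for name, rows in expanded_by_source.items() if rows)
    items = sorted(set().union(*(expanded_by_source[name] for name in targets)))
    return {
        item: {name: expanded_by_source[name].get(item, 0.0) for name in targets}
        for item in items
    }


# Hypothetical expanded tables; source C delivered nothing.
expanded = {
    "A": {"sunny": 1.0, "cloudy": 0.4},
    "B": {"cloudy": 1.0, "rain": 0.4},
    "C": {},
    "D": {"sunny": 0.8},
}
for item, row in merge_and_complement(expanded).items():
    print(item, row)
```

Source C is dropped from the columns because it is not an integration target source, while an item that a target source did not mention, such as rain for source A, receives 0.0.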

[0014]

Information is integrated as follows. Let the set of integration target sources be S = {s_1, ..., s_j}. For a source s_j (∈ S), let the possibility that a weather event w occurs be given by Poss_w(s_j). The confidence CF(Poss_w(s_j)) that the possibility of weather event w is Poss_w(s_j) is obtained by the following expression (Equation 3).

[Equation 3]

Here, the function sim() expresses the degree of agreement between two possibility values. Various functions can be used as sim(); for example, the function in the following expression (Equation 4) can be used. This function expresses, as a similarity, how alike the possibilities obtained from the sources are: the larger sim(ΔPoss_w) is, the higher the similarity, and the smaller sim(ΔPoss_w) is, the lower the similarity.

[Equation 4]

α(s) is a weight representing the reliability coefficient of source s and takes a value in [0, 1]. Reliability is highest when α(s) is 1 and lowest when it is 0.
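Equations 3 and 4 appear in this text only as markers, so their exact form is not available here. The Python sketch below shows one possible reading, purely as an assumption: sim() is taken to be 1 minus the absolute difference of two possibility values, and the confidence of a source's value is taken to be the α-weighted mean of its similarity to every source's value. Neither choice is confirmed by the source, and the names `sim` and `confidence` are hypothetical.

```python
def sim(delta):
    """Assumed similarity: 1.0 when two possibility values coincide,
    0.0 when they differ by 1. The patent's Equation 4 may differ."""
    return 1.0 - abs(delta)


def confidence(poss_by_source, alpha, candidate):
    """Assumed form of CF(Poss_w(s_j)).

    poss_by_source: {source: Poss_w(source)} for one weather event w
    alpha:          {source: reliability coefficient in [0, 1]}
    candidate:      the source s_j whose possibility value is scored
    """
    total_weight = sum(alpha[s] for s in poss_by_source)
    if total_weight == 0:
        return 0.0
    target = poss_by_source[candidate]
    weighted = sum(alpha[s] * sim(target - p) for s, p in poss_by_source.items())
    return weighted / total_weight


# Two sources agree on a high possibility, one disagrees.
poss = {"A": 1.0, "B": 0.0, "D": 1.0}
alpha = {"A": 1.0, "B": 1.0, "D": 1.0}
for source in poss:
    print(source, confidence(poss, alpha, source))
```

Under these assumptions a value that many trusted sources share scores higher than an outlier, which matches the text's description of confidence as derived from how similar the sources' possibility values are.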

[0015]

Next, α(s) is determined by referring to a table of source reliability coefficients such as that shown in FIG. 13. α(s) need not be static; it may vary with, for example, the update frequency, the time of the latest update, or the population size of the information source. The Poss value (possibility value) of information w after integration is the one that satisfies the following expression (Equation 5).

[Equation 5]

Once the confidences have been calculated in this way, the possibility value with the greatest confidence is adopted as the possibility value of the integrated information and written to the integrated table (step 210). The confidence of that possibility value is adopted as the confidence of the integrated information (step 211), multiplied by the reliability coefficient, and written to the integrated table (step 212). This processing produces an integrated table such as that shown in FIG. 14. The list of integrated information is output for passing to the expression generation unit 19 (step 214).
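Steps 210 to 212 can be sketched in Python as an argmax followed by a scaling. The function name `integrate_row` and the sample numbers are assumptions; the confidences are taken as given inputs here, since the formula that produces them is not reproduced in this text.

```python
def integrate_row(poss_by_source, cf_by_source, reliability_coefficient):
    """Steps 210-212 for one piece of information.

    Picks the possibility value whose confidence is greatest, then
    scales that confidence by the reliability coefficient derived from
    the number of integration target sources.
    Returns (integrated possibility value, integrated confidence value).
    """
    best = max(cf_by_source, key=cf_by_source.get)
    return poss_by_source[best], cf_by_source[best] * reliability_coefficient


# Hypothetical values for the item "sunny" with three target sources.
poss = {"A": 1.0, "B": 0.0, "D": 0.8}
cf = {"A": 0.5, "B": 0.25, "D": 0.75}
print(integrate_row(poss, cf, reliability_coefficient=0.9))
```

If two sources tie on confidence, `max` keeps the first one encountered; the text does not say how ties should be broken.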

[0016]

FIG. 4 is a flowchart of the processing of the expression generation unit in FIG. 1. FIG. 15 shows an example data structure of the possibility value correspondence table, and FIG. 16 shows an example data structure of the confidence value correspondence table. The expression generation unit 19 generates the corresponding linguistic expression from the possibility value and the confidence value. This processing can be omitted when no expression is needed. The expression generation unit 19 first reads the integrated table passed from the information integration unit 18 (step 301) and begins taking out the rows of the table one by one (step 302). Steps 302 and 307 are the start and end blocks of the same process and indicate that the row-by-row processing continues between them. The expression generation unit 19 first generates a linguistic expression from the possibility value of each piece of information. FIG. 15 shows table 10a, which gives the correspondence between possibility values and expressions, where 0 < a < b < c < 1 and the slot [information] is filled with the value of the information item of the integrated table (a nominal-attribute weather forecast such as "sunny"). The expression corresponding to the possibility value is taken from the possibility value correspondence table 10a (step 303).

[0017]

Next, the expression generation unit 19 generates a linguistic expression from the confidence value of each piece of information. FIG. 16 shows table 10b, which gives the correspondence between confidence values and expressions. The expression corresponding to the confidence value is taken from the confidence value correspondence table 10b (step 304), where 0 < d < e < f < g < 1 and the slot [possibility] in table 10b of FIG. 16 is filled with the expression determined by the possibility value. For example, with a = 0.2, b = 0.6, c = 0.8, d = 0.6, and f = 0.8, the above processing yields the integrated table with expressions shown in FIG. 17. The expressions obtained in the above steps are combined (step 305), and the generated expression is added to the integrated table (step 306). The integrated table with expressions is output for passing to the order determination unit 20 (step 308).
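The two table lookups and the combination in steps 303 to 305 can be sketched in Python with threshold lists. The bounds a = 0.2, b = 0.6, c = 0.8, d = 0.6, and f = 0.8 come from the example in the text; the wording of every template, the use of only two confidence bounds, and the treatment of values that fall exactly on a bound are assumptions, because FIGS. 15 and 16 are not reproduced here.

```python
from bisect import bisect_right

# Bounds a, b, c from the example in the text; the wording is hypothetical.
POSSIBILITY_BOUNDS = [0.2, 0.6, 0.8]
POSSIBILITY_TEMPLATES = [
    "it will not be {info}",
    "it might be {info}",
    "it will probably be {info}",
    "it will be {info}",
]

# Only d = 0.6 and f = 0.8 are given in the text, so this sketch uses
# just those two bounds; the wording is again hypothetical.
CONFIDENCE_BOUNDS = [0.6, 0.8]
CONFIDENCE_TEMPLATES = [
    "there is a chance that {possibility}",
    "it is likely that {possibility}",
    "it is certain that {possibility}",
]


def generate_expression(info, poss, cf):
    """Steps 303-305: pick one expression per table and combine them.

    The possibility expression fills the [possibility] slot of the
    confidence expression, as described for table 10b.
    """
    possibility = POSSIBILITY_TEMPLATES[
        bisect_right(POSSIBILITY_BOUNDS, poss)].format(info=info)
    return CONFIDENCE_TEMPLATES[
        bisect_right(CONFIDENCE_BOUNDS, cf)].format(possibility=possibility)


print(generate_expression("sunny", poss=0.9, cf=0.9))
print(generate_expression("rain", poss=0.1, cf=0.5))
```

Keeping the two tables separate means the wording for possibility and for confidence can be tuned independently, which is the point of attaching two distinct values to each piece of information.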

[0018]

FIG. 5 is a flowchart of the processing of the order determination unit in FIG. 1. FIG. 18 shows an example data structure of the sorted integrated table with expressions. The order determination unit 20 reads the integrated table passed from the expression generation unit 19 (step 401) and applies an order calculation function to the possibility value and confidence value of each piece of information (step 402). That is, it applies the following expression (Equation 6) to the confidence value and possibility value assigned to each piece of information, sorts the resulting values in descending order, and thereby determines the ranking (step 403).

(Equation 6) Here, the Poss value denotes the possibility value and the CF value denotes the certainty value. β is a parameter that determines whether the Poss value or the CF value is given more weight, and is set in advance by the user or the system. When β is 1, the equation is the harmonic mean (for n numbers x1, x2, ..., xn, the reciprocal of the arithmetic mean of 1/x1, 1/x2, ..., 1/xn). For example, when the integrated table of FIG. 17 is input and β = 1, the table is sorted and the table of FIG. 18 is obtained. It is sorted in descending order of possibility value. The sorted result (the table of FIG. 18) is output from the output unit 21 (step 404).
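The body of Equation 6 is not reproduced in this text. The Python sketch below therefore assumes one function that has the properties stated above: an F-measure-style weighted harmonic mean of the Poss and CF values, which reduces to the plain harmonic mean when β = 1. The name `order_score` and the formula itself are assumptions for illustration and may differ from the equation in the specification.

```python
def order_score(poss: float, cf: float, beta: float = 1.0) -> float:
    """Hypothetical order calculation function (assumed F-measure form).

    score = (1 + beta^2) * poss * cf / (beta^2 * poss + cf)
    With beta == 1 this equals 2 * poss * cf / (poss + cf),
    the harmonic mean of poss and cf.
    """
    denominator = beta ** 2 * poss + cf
    if denominator == 0.0:
        return 0.0  # avoid division by zero; treat the score as zero
    return (1.0 + beta ** 2) * poss * cf / denominator


def harmonic_mean(values: list[float]) -> float:
    """Reciprocal of the arithmetic mean of the reciprocals of the values."""
    return len(values) / sum(1.0 / v for v in values)


# With beta = 1 the assumed score coincides with the harmonic mean.
print(order_score(0.8, 0.6), harmonic_mean([0.8, 0.6]))
```

Under this assumed form, a β close to 0 makes the score follow the Poss value, while a large β makes it follow the CF value, which is one way a single parameter can decide which of the two is emphasized.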

[0019] In this way, after a plurality of expressions have been generated together with their possibility values and certainty values, as in the table of FIG. 17, the F value of the expression (Equation 4) is used to order them. The F value is calculated for each row, and the rows are rearranged in descending order of F value. For example, when the system is asked "What will the weather be tomorrow?", presenting all of the information is inappropriate for a dialogue system, so the table is sorted in order to utter the single most appropriate sentence. The table of FIG. 18 is the sorted result, and from this table the expression with the highest F value, that is, the top-ranked "It will certainly be sunny", is uttered. Although one expression is selected here, several expressions, such as the top three, may be selected instead.
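The ranking and selection described above can be sketched as follows. The rows are invented sample data, not the contents of FIG. 17, and `f_value` again assumes an F-measure form because the equation itself is not shown in this text; `Row` and `select_top` are hypothetical names.

```python
from typing import NamedTuple


class Row(NamedTuple):
    expression: str
    poss: float  # possibility value
    cf: float    # certainty value


def f_value(row: Row, beta: float = 1.0) -> float:
    # Assumed F-measure form; the specification's own equation is not reproduced here.
    denominator = beta ** 2 * row.poss + row.cf
    if denominator == 0.0:
        return 0.0
    return (1.0 + beta ** 2) * row.poss * row.cf / denominator


def select_top(rows: list[Row], k: int = 1, beta: float = 1.0) -> list[str]:
    """Sort rows in descending order of F value and return the top k expressions."""
    ranked = sorted(rows, key=lambda r: f_value(r, beta), reverse=True)
    return [r.expression for r in ranked[:k]]


# Illustrative rows only.
rows = [
    Row("It may be cloudy", 0.4, 0.5),
    Row("It will certainly be sunny", 0.9, 0.8),
    Row("It will probably not rain", 0.6, 0.7),
]
print(select_top(rows))        # single most appropriate sentence
print(select_top(rows, k=3))   # top three, in ranked order
```

Passing `k` as a parameter reflects the remark that either one expression or several top-ranked expressions may be presented.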

[0020] By converting each step of the flowcharts of FIGS. 2 to 5 into a program and storing the converted program on a recording medium such as a CD-ROM or a hard disk, the information integration method of the present invention can be realized at any location: the recording medium is mounted on any personal computer and the program is installed from it into main memory, or the program is downloaded over a network into the main memory of any other personal computer, and the program is then executed on that computer.

【００２１】[0021]

[Effects of the Invention] As described above, according to the present invention, information from a plurality of information sources is compared and assigned a reasonable possibility, and the integrated information itself is assigned a certainty factor, so that the possibility of the information can be presented together with how far the integrated information can be trusted. The user can therefore obtain valuable information even from information sources of low reliability. In addition, natural language expressions that are easy for the user to understand can be obtained based on the certainty factor and the possibility.

[Brief description of the drawings]

FIG. 1 is an overall configuration diagram of an information integration system according to an embodiment of the present invention.

FIG. 2 is a processing flowchart of the information expansion unit in FIG. 1.

FIG. 3 is a processing flowchart of the information integration unit in FIG. 1.

FIG. 4 is a processing flowchart of the expression generation unit in FIG. 1.

FIG. 5 is a processing flowchart of the order determination unit in FIG. 1.

FIG. 6 is a diagram showing a data configuration example of the information relation table used in the present invention.

FIG. 7 is a diagram showing a data configuration example of the expanded information table used in the present invention.

FIG. 8 is a diagram showing a data configuration example of the reliability coefficient table used in the present invention.

FIG. 9 is a diagram showing a data configuration example of the expanded information table from information source B used in the present invention.

FIG. 10 is a diagram showing a data configuration example of the expanded information table from information source D used in the present invention.

FIG. 11 is a diagram showing a data configuration example of a table that combines the expanded information of each information source used in the present invention.

FIG. 12 is a diagram showing a data configuration example of a table that supplements FIG. 11.

FIG. 13 is a diagram showing a table of the reliability coefficients of the information sources used in the present invention.

FIG. 14 is a diagram of the integrated table used in the present invention.

FIG. 15 is a diagram of the possibility value correspondence table used in the present invention.

FIG. 16 is a diagram of the certainty value correspondence table used in the present invention.

FIG. 17 is a diagram of the integrated table with expressions used in the present invention.

FIG. 18 is a diagram of the sorted integrated table with expressions used in the present invention.

[Explanation of symbols]

11 to 13: information expansion units; 14 to 16: information relation tables; 18: information integration unit; 19: expression generation unit; 20: order determination unit; 21: output unit.

Claims

[Claims]

1. An information integration system that compares information from a plurality of information sources to assign a reasonable possibility and assigns a certainty factor to the information itself obtained by integrating the plurality of pieces of information, comprising: an information expansion unit that expands the information into related information with reference to a table describing the degree of relevance between pieces of information; an information integration unit that integrates the expanded information for all information sources based on a similarity function, obtains the possibility value of the information from the similarity obtained by the similarity function, and obtains the reliability of the information as a certainty value from the state of dispersion of the similarity; an expression generation unit that selects an expression from a possibility expression table based on the possibility value of the information, selects an expression from a certainty expression table based on the certainty value, and combines the selected expressions to generate a natural sentence; and an order determination unit that determines and outputs the arrangement order based on the possibility value and the certainty value assigned to each piece of information.

2. An information integration method that compares information from a plurality of information sources to assign a reasonable possibility and, at the same time, assigns a certainty factor to the information itself obtained by integrating the information, comprising the steps of: expanding the information into related information with reference to a table describing the degree of relevance between pieces of information; integrating the expanded information for all information sources based on a similarity function; obtaining the possibility value of the information from the similarity obtained by the similarity function; and obtaining the reliability of the information as a certainty value from the state of dispersion of the similarity.

3. The information integration method according to claim 2, further comprising, in addition to said steps, the steps of: selecting an expression from a possibility expression table based on the possibility value of the information; selecting an expression from a certainty expression table based on the certainty value; and generating a natural sentence that is easy for the user to understand by combining the selected expressions.

4. The information integration method according to claim 2, further comprising, in addition to said steps, the steps of: applying a selection function based on the possibility value and the certainty value of said information; and selecting one piece of information, thereby determining the information to be presented to the user.

5. The information integration method according to claim 2, wherein the similarity function used in the step of integrating based on the similarity function is a function indicating the degree of coincidence between two possibilities and, for example, where w is an event and ΔPoss is the difference between the possibilities, the following function is used.

6. The information integration method according to claim 4, wherein the function used in the step of sorting based on the value of the function applies the following expression to the possibility value Poss and the certainty value CF assigned to each piece of information, and the obtained values are sorted in descending order to determine the ranking. (Equation 2) (β is a parameter that determines whether the Poss value or the CF value is given more weight)

7. A readable recording medium storing a program, wherein each step of the information integration method according to claim 2 is converted into a program and the converted program is stored on the recording medium.