JP5800974B1

JP5800974B1 - Synonym determination device

Info

Publication number: JP5800974B1
Application number: JP2014201667A
Authority: JP
Inventors: 有加井出; 友浩谷口
Original assignee: Kyocera Communication Systems Co Ltd
Current assignee: Kyocera Communication Systems Co Ltd
Priority date: 2014-09-30
Filing date: 2014-09-30
Publication date: 2015-10-28
Anticipated expiration: 2034-09-30
Also published as: JP2016071708A

Abstract

PROBLEM TO BE SOLVED: To provide a synonym determination device that requires no maintenance of a dictionary or the like and that performs processing quickly. SOLUTION: Keyword extraction means 2 extracts keywords from a plurality of post data 10 recorded in a recording unit 8. Transition calculation means 4 calculates, based on the posting times of the extracted keywords, the temporal transition of the number of appearances of each keyword. Determination means 6 determines that keywords whose temporal transitions in the number of appearances are similar are synonyms. [Selected drawing] FIG. 1

Description

The present invention relates to a device for determining synonyms.

Synonyms are determined in order to normalize keywords and thereby improve the accuracy of search results and analysis results. For example, when keywords are extracted from a document under analysis and their frequency of occurrence in that document is analyzed by count, ignoring synonyms yields inappropriate results. A specific example follows. Suppose that keywords A, B, C, and D occur in a document with the following frequencies.

A: 134
B: 103
C: 68
D: 25

From these results, keyword A appears to be the most frequent word. However, if keywords B and C are synonyms, then keyword B (C), with a combined count of 103 + 68 = 171, is in fact the most frequent word.

As described above, in keyword-based analysis it is important to determine whether synonyms are included.

One method of synonym determination is to prepare a synonym dictionary and make the determination according to it. In addition, Patent Document 1 discloses extracting synonyms based on context.

Patent Document 1: International Publication No. 2014-002776

However, the method of preparing a synonym dictionary has the problem that maintaining the dictionary is cumbersome. The method of determining synonyms based on context has the problem that the processing is extremely complex and time-consuming. These methods are therefore unsuitable for cases such as determining synonyms in real time.

An object of the present invention is to solve the above problems and to provide a synonym determination device that requires no maintenance of a dictionary or the like and that performs processing quickly.

Several independently applicable features of the present invention are described below.

(1)(2) A synonym determination device according to the present invention determines synonyms based on posts on the web, and comprises: keyword extraction means for extracting keywords contained in each post; transition calculation means for calculating the temporal transition of the number of appearances of each extracted keyword; and determination means for determining that keywords whose calculated temporal transitions are similar are synonyms.

Synonyms can therefore be determined from the temporal transition of the number of appearances, without providing a synonym dictionary or the like.

(3) In the synonym determination device according to the present invention, the determination means lowers the degree to which two keywords under determination are regarded as synonyms the more often they appear together in the same post.

Synonyms can therefore be determined more accurately.

(4) In the synonym determination device according to the present invention, the determination means raises the degree to which two keywords under determination are regarded as synonyms the higher the degree of match between their character strings.

(5) In the synonym determination device according to the present invention, for each of the two keywords under determination, the determination means extracts the keywords other than those under determination that appear in the posts in which that keyword appears, and raises the degree to which the two keywords are regarded as synonyms the higher the degree of match between those other keywords.

(6) In the synonym determination device according to the present invention, the transition calculation means calculates the temporal transition of the number of appearances of each keyword over the period from a time corresponding to the start time of a predetermined program to a time corresponding to its end time.

Keywords can therefore be unified so that, for example, keyword transitions that track the excitement in a given program can be obtained accurately.

In the embodiment, step S2 corresponds to the "keyword extraction means".

In the embodiment, step S3 corresponds to the "transition calculation means".

In the embodiment, step S10 corresponds to the "determination means".

The term "program" covers not only programs that can be executed directly by a CPU, but also programs in source form, compressed programs, encrypted programs, and the like.

FIG. 1 is a functional block diagram of the synonym determination device 20 according to one embodiment of the present invention.
FIG. 2 shows the hardware configuration of the synonym determination device 20.
FIG. 3 is a flowchart of the synonym determination program 46.
FIG. 4 is a flowchart of the synonym determination program 46.
FIG. 5 shows an example of acquired posts.
FIG. 6 shows an example of extracted keywords.
FIG. 7 is a graph of the temporal transition of the number of appearances of a keyword.
FIG. 8 is a graph of the temporal transition of the number of appearances of two keywords.
FIG. 9 is a diagram for explaining the calculation of the simultaneous appearance rate Du.
FIG. 10 is a diagram for explaining the calculation of the character string match rate St.
FIG. 11 is a diagram for explaining the calculation of the peripheral word match rate Sp.

1. Overall Configuration
FIG. 1 shows a functional block diagram of the synonym determination device 20 according to one embodiment of the present invention. Keyword extraction means 2 extracts keywords from a plurality of post data 10 recorded in a recording unit 8. Transition calculation means 4 calculates, based on the posting times of the extracted keywords, the temporal transition of the number of appearances of each keyword.

Determination means 6 determines that keywords whose temporal transitions in the number of appearances are similar are synonyms.

2. Hardware Configuration
FIG. 2 shows the hardware configuration of the synonym determination device 20. A memory 32, a display 34, a hard disk 36, a communication circuit 38, a keyboard/mouse 40, and a DVD-ROM drive 42 are connected to a CPU 30. The communication circuit 38 is used to connect to the Internet.

An operating system 44 and a synonym determination program 46 are recorded on the hard disk 36. The synonym determination program 46 performs its functions in cooperation with the operating system 44. These programs were recorded on a DVD-ROM 52 and installed on the hard disk 36 via the DVD-ROM drive 42. They may instead be downloaded via the Internet.

3. Processing of the Synonym Determination Program 46
FIGS. 3 and 4 show flowcharts of the synonym determination program 46. Here, posts about a television program are collected and the temporal variation of the keywords contained in the posts is analyzed. In doing so, keywords that are synonyms are grouped together and evaluated as one.

The CPU 30 accesses a posting site on the Internet (for example, Twitter (trademark)) and collects posts that contain the name of the target television program, or posts tagged with the program name (step S1). In this embodiment, posts made between the start time and the end time of the target program are collected.

FIG. 5 shows an example of collected posts. Only posts 60, 62, and 64 are shown in the figure, but in practice many posts are collected.

Next, the CPU 30 performs morphological analysis on each collected post, extracts keywords, and records them (step S2). FIG. 6 shows a list of extracted keywords. In this embodiment, the number of appearances of each keyword is also counted when the keywords are extracted.

Subsequently, the CPU 30 records, for each keyword, the change over time in the number of appearances (step S3). FIG. 7 shows the change over time in the number of appearances of the keyword "まおちゃん" (Mao-chan). In this way, the change over time in the number of appearances is obtained for each keyword. In this example, the number of appearances is sampled at one-minute intervals (the predetermined interval).
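
Step S3 can be pictured as bucketing keyword appearances into fixed time bins. The following is a minimal sketch, assuming each post has already been reduced to a posting time and a list of extracted keywords; the function name `appearance_counts`, the `bin_seconds` parameter, and the sample posts are hypothetical and do not come from the patent.

```python
from collections import Counter, defaultdict
from datetime import datetime


def appearance_counts(posts, start, bin_seconds=60):
    """Return {keyword: Counter({bin index: appearances})}.

    posts: iterable of (posted_at, keywords) pairs (assumed input shape).
    start: program start time; bin 0 covers the first bin_seconds seconds.
    """
    series = defaultdict(Counter)
    for posted_at, keywords in posts:
        bin_index = int((posted_at - start).total_seconds() // bin_seconds)
        for keyword in keywords:
            series[keyword][bin_index] += 1
    return series


# Made-up sample data for illustration only.
program_start = datetime(2014, 9, 30, 21, 0, 0)
sample_posts = [
    (datetime(2014, 9, 30, 21, 0, 10), ["まおちゃん", "ジャンプ"]),
    (datetime(2014, 9, 30, 21, 0, 50), ["まおちゃん"]),
    (datetime(2014, 9, 30, 21, 2, 5), ["まおちゃん"]),
]
series = appearance_counts(sample_posts, program_start)
```

With one-minute bins, `series["まおちゃん"]` holds two appearances in bin 0 and one in bin 2; a bin with no appearances reads as zero.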

Next, the CPU 30 determines the similarity of the temporal transitions between keywords (step S4). FIG. 8 shows the temporal transitions of the number of appearances of two keywords being compared. Keywords used in posts about a program appear in step with the program's content and moments of excitement. Keywords whose numbers of appearances follow similar temporal transitions are therefore likely to be synonyms. In this embodiment, the shapes of the two keywords' appearance-count graphs are compared, their graphical similarity Sf is calculated, and Sf is used as an indicator of the synonym rate (the likelihood that the keywords are synonyms).

The CPU 30 calculates the temporal-transition similarity Sf for every combination of target keywords (for example, keywords that appear at least a predetermined number of times). In this embodiment, the correlation function of the two graphs is used as the temporal-transition similarity Sf.
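
The text does not spell out the correlation computation. One plausible reading, shown here purely as an assumption, is the Pearson correlation coefficient between the two per-minute count series. The function name `temporal_similarity` and the sample series are hypothetical.

```python
from math import sqrt


def temporal_similarity(a, b):
    """Pearson correlation of two equal-length count series.

    Used here as an assumed stand-in for the similarity Sf.
    """
    if len(a) != len(b):
        raise ValueError("series must cover the same time bins")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat series carries no shape information
    return cov / sqrt(var_a * var_b)


# Made-up series: two keywords that peak in the same minute,
# and two keywords that move in opposite directions.
sf_similar = temporal_similarity([0, 5, 12, 3, 0], [1, 4, 10, 2, 0])
sf_opposite = temporal_similarity([0, 1, 2, 3], [3, 2, 1, 0])
```

Series that rise and fall together score close to 1, while series that move in opposite directions score close to -1.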

Next, the CPU 30 calculates the simultaneous appearance rate Du, the rate at which two keywords appear in the same post (step S5). This reflects the judgment that two keywords appearing in the same post are unlikely to be synonyms. In other words, it rests on the assumption that a writer seldom rephrases the same meaning in different words within a single post. The assumption holds especially well on services such as Twitter (trademark), where a single post contains few characters.

In this embodiment, the simultaneous appearance rate Du is calculated as follows. First, the number of posts, among all target posts, in which one of the two keywords appears is counted. This is the appearance post count X of that keyword. Similarly, the number of posts in which the other keyword appears is counted. This is the appearance post count Y of the other keyword. Finally, the number of posts in which both keywords appear together is counted. This is the co-occurrence post count Z. FIG. 9 shows this relationship schematically.

The simultaneous appearance rate Du is calculated by the following equation.

Du = Z / Q

Here, Q is the larger of X and Y.

In this way, the simultaneous appearance rate Du is calculated for every combination of keywords.
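
The calculation of Du from X, Y, and Z can be sketched as below. Each post is assumed to be represented as a set of its extracted keywords; the function name and the sample posts are hypothetical.

```python
def simultaneous_appearance_rate(posts, kw_a, kw_b):
    """Du = Z / Q, where Q = max(X, Y).

    posts: iterable of keyword sets, one per post (assumed input shape).
    """
    posts = list(posts)
    x = sum(1 for kws in posts if kw_a in kws)  # posts containing kw_a
    y = sum(1 for kws in posts if kw_b in kws)  # posts containing kw_b
    z = sum(1 for kws in posts if kw_a in kws and kw_b in kws)  # both
    q = max(x, y)
    return z / q if q else 0.0


# Made-up sample data: X = 3, Y = 2, Z = 1, so Du = 1/3.
du_posts = [
    {"松本潤", "ドラマ"},
    {"松本くん", "ドラマ"},
    {"松本潤", "松本くん"},
    {"松本潤"},
]
du = simultaneous_appearance_rate(du_posts, "松本潤", "松本くん")
```

Returning 0.0 when neither keyword appears is a choice made for this sketch; the text does not cover that case.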

Next, the CPU 30 calculates the match rate between the character strings of the two keywords being compared (step S6). This is because the more closely two keywords match as character strings, the more likely they are to be synonyms. In this embodiment, the character string match rate St is calculated as follows.

For example, as shown in Example 1 of FIG. 10, suppose that the character string match rate St between the keyword "松本潤" (Matsumoto Jun) and the keyword "松本くん" (Matsumoto-kun) is to be calculated. The CPU 30 refers to a kanji reading dictionary recorded on the hard disk 36 and obtains the readings "マツモトジュン" and "マツモトクン", respectively. The reading dictionary may hold readings per kanji character or per compound word.

The character string match rate St is calculated by the following equation.

St = C / L

Here, C is the number of matching reading characters and L is the number of characters in the longer reading.

In Example 1, the longer reading has L = 7 characters and C = 5 characters match, so the character string match rate is 5/7.

Similarly, in Example 2, the character string match rate is 5/7.

The above processing is performed for every combination of keywords to calculate the character string match rate St.
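
The text does not define how the matching count C is obtained. The sketch below assumes C is the length of the longest common subsequence of the two readings, an interpretation that reproduces the 5/7 of Example 1 (マツモト plus the final ン). The function names are hypothetical, and the readings are passed in directly rather than looked up in a reading dictionary.

```python
import unicodedata
from fractions import Fraction


def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def string_match_rate(reading_a, reading_b):
    """St = C / L, with C assumed to be the LCS length of the readings."""
    # Normalize so that voiced kana such as ジ count as one character.
    reading_a = unicodedata.normalize("NFC", reading_a)
    reading_b = unicodedata.normalize("NFC", reading_b)
    longest = max(len(reading_a), len(reading_b))
    if longest == 0:
        return Fraction(0)
    return Fraction(lcs_length(reading_a, reading_b), longest)


st = string_match_rate("マツモトジュン", "マツモトクン")  # Example 1 readings
```

A position-by-position comparison would give 4 matches for Example 1 rather than 5, which is why the subsequence interpretation is used here.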

Next, the match rate Sp between the peripheral words (other keywords appearing in the same post) of the two keywords being compared is calculated (steps S7 and S8). The CPU 30 first extracts the peripheral words of one keyword, that is, the keywords that appear in the same posts as that keyword. It then extracts the peripheral words of the other keyword in the same way (step S7).

Next, the match rate Sp of the peripheral words is calculated between the two keywords being compared (step S8). The peripheral word match rate Sp is calculated by the following equation. FIG. 11 shows these relationships schematically.

Sp = B / M

Here, B is the number of peripheral words shared by both keywords, and M is the larger of the two keywords' peripheral word counts. For example, if one keyword has 30 peripheral words and the other has 25, M is 30. The number of peripheral words is the number of distinct peripheral words, not the number of times they appear. For example, if another keyword (peripheral word) that appears in the same post as the target keyword appears again in that post, it is counted once. Likewise, if that peripheral word also appears in another post, it is still counted once.
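
Because peripheral words are counted by distinct type, sets are a natural fit. The following is a minimal sketch, assuming each post is a list of extracted keywords; the function names and the sample posts are hypothetical.

```python
def peripheral_words(posts, keyword, exclude):
    """Distinct keywords that share a post with `keyword`, minus `exclude`."""
    words = set()
    for keywords in posts:
        if keyword in keywords:
            words.update(keywords)  # a set counts repeats only once
    return words - set(exclude)


def peripheral_match_rate(posts, kw_a, kw_b):
    """Sp = B / M: shared peripheral words over the larger peripheral set."""
    posts = list(posts)
    pair = {kw_a, kw_b}  # the keywords under determination are excluded
    around_a = peripheral_words(posts, kw_a, pair)
    around_b = peripheral_words(posts, kw_b, pair)
    m = max(len(around_a), len(around_b))
    return len(around_a & around_b) / m if m else 0.0


# Made-up sample data: each keyword has 3 peripheral words, 2 are shared.
sp_posts = [
    ["松本潤", "ドラマ", "かっこいい"],
    ["松本くん", "ドラマ"],
    ["松本潤", "主題歌", "ドラマ"],
    ["松本くん", "かっこいい", "笑顔", "笑顔"],
]
sp = peripheral_match_rate(sp_posts, "松本潤", "松本くん")
```

Here "ドラマ" appears in several posts and "笑顔" twice in one post, yet each counts once, giving B = 2, M = 3, and Sp = 2/3.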

Once the temporal-transition similarity Sf, the simultaneous appearance rate Du, the character string match rate St, and the peripheral word match rate Sp have been calculated as described above, the CPU 30 calculates the synonym rate E between keywords (step S9). The synonym rate E is calculated by the following equation.

E = Sf - Du + St + Sp

The CPU 30 calculates the synonym rate for every combination of keywords.

Next, the CPU 30 extracts the keyword combinations whose synonym rate E is equal to or greater than a predetermined value and treats them as synonyms (step S10). Synonyms are determined in this way.
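
Steps S9 and S10 combine the four indicators and apply a threshold. In the sketch below, the function names, the component values, and the threshold of 2.0 are all hypothetical; the text only states that a predetermined value is used.

```python
def synonym_rate(sf, du, st, sp):
    """E = Sf - Du + St + Sp."""
    return sf - du + st + sp


def select_synonyms(scores, threshold):
    """Return the keyword pairs whose synonym rate E is at least `threshold`.

    scores: {(kw_a, kw_b): {"sf": ..., "du": ..., "st": ..., "sp": ...}}
    """
    return [
        pair
        for pair, parts in scores.items()
        if synonym_rate(**parts) >= threshold
    ]


# Made-up component values for two keyword pairs.
pair_scores = {
    ("松本潤", "松本くん"): {"sf": 0.9, "du": 0.1, "st": 5 / 7, "sp": 0.6},
    ("松本潤", "ドラマ"): {"sf": 0.4, "du": 0.5, "st": 0.0, "sp": 0.2},
}
synonyms = select_synonyms(pair_scores, threshold=2.0)
```

Du enters with a negative sign, so a pair that often appears together in one post needs stronger support from the other three indicators to reach the threshold.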

Once synonyms have been determined as described above, the CPU 30 merges the keywords determined to be synonyms into a single keyword and analyzes, for example, how the keywords appearing in the posts change over time. This makes a more accurate analysis possible.
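
Merging synonyms into a single keyword can be sketched with a small union-find structure. The text does not say which keyword of a synonym group is kept as the representative; keeping the more frequent one is an assumption made for this sketch. The function name is hypothetical, and the counts reuse the A, B, C, D example from the background section.

```python
def merge_synonyms(counts, synonym_pairs):
    """Sum appearance counts over groups of keywords linked as synonyms.

    counts: {keyword: count}; synonym_pairs: pairs of keywords in `counts`.
    """
    parent = {keyword: keyword for keyword in counts}

    def find(keyword):
        # Follow parent links to the group's representative keyword.
        while parent[keyword] != keyword:
            parent[keyword] = parent[parent[keyword]]
            keyword = parent[keyword]
        return keyword

    for kw_a, kw_b in synonym_pairs:
        root_a, root_b = find(kw_a), find(kw_b)
        if root_a != root_b:
            # Assumption: the more frequent keyword represents the group.
            if counts[root_a] < counts[root_b]:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    merged = {}
    for keyword, count in counts.items():
        root = find(keyword)
        merged[root] = merged.get(root, 0) + count
    return merged


keyword_counts = {"A": 134, "B": 103, "C": 68, "D": 25}
merged_counts = merge_synonyms(keyword_counts, [("B", "C")])
```

With B and C merged, the combined count of 171 exceeds the 134 of keyword A, matching the conclusion of the background example.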

4. Others
(1) In the above embodiment, the temporal-transition similarity Sf is calculated from the graphical similarity of the graphs. However, Sf may instead be calculated from information such as the times at which the number of appearances exceeded a threshold.

(2) In the above embodiment, the character string match rate St is calculated after replacing kanji with their readings. However, St may instead be calculated on the kanji as they are.

(3) In the above embodiment, the synonym rate E is calculated by taking into account all of the temporal-transition similarity Sf, the simultaneous appearance rate Du, the character string match rate St, and the peripheral word match rate Sp. However, the synonym rate E may be calculated using any one of them alone. For example, the temporal-transition similarity Sf may be used as the synonym rate E as it is.

The synonym rate E may also be calculated from any combination of two or three of them. For example, E may be calculated by taking into account the temporal-transition similarity Sf, the simultaneous appearance rate Du, and the peripheral word match rate Sp.

(4) In the above embodiment, synonyms are determined using posts made between the start time and the end time of the target program. However, the determination may instead cover, for example, the entire broadcast channel.

Claims

1. A synonym determination device that determines synonyms based on posts on the web, comprising:
keyword extraction means for extracting keywords contained in each post;
transition calculation means for calculating the temporal transition of the number of appearances of each extracted keyword; and
determination means for determining that keywords whose calculated temporal transitions are similar are synonyms.

2. A synonym determination program for implementing a synonym determination device by a computer, the program causing the computer to function as:
keyword extraction means for extracting keywords contained in each post;
transition calculation means for calculating the temporal transition of the number of appearances of each extracted keyword; and
determination means for determining that keywords whose calculated temporal transitions are similar are synonyms.

3. The device of claim 1 or the program of claim 2,
wherein the determination means lowers the degree to which two keywords under determination are regarded as synonyms the more often the two keywords appear together in the same post.

4. The device or program of any one of claims 1 to 3,
wherein the determination means raises the degree to which two keywords under determination are regarded as synonyms the higher the degree of match between the character strings of the two keywords.

5. The device or program of any one of claims 1 to 4,
wherein, for each of the two keywords under determination, the determination means extracts the keywords other than those under determination that appear in the posts in which that keyword appears, and raises the degree to which the two keywords are regarded as synonyms the higher the degree of match between those other keywords.

6. The device or program of any one of claims 1 to 5,
wherein the transition calculation means calculates the temporal transition of the number of appearances of each keyword over the period from a time corresponding to the start time of a predetermined program to a time corresponding to its end time.