JP6447680B2

JP6447680B2 - Information processing apparatus, information processing method, and program

Info

Publication number: JP6447680B2
Application number: JP2017154336A
Authority: JP
Inventors: 凌澤田; 賢太郎山田
Original assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Current assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Priority date: 2017-08-09
Filing date: 2017-08-09
Publication date: 2019-01-09
Anticipated expiration: 2033-06-25
Also published as: JP2017224334A

Description

The present invention relates to a technique for classifying posters who have posted text.

In recent years, with the growth of social media, the amount of information published by the general public has been increasing. One representative form of social media is the microblog, which is characterized by short posts and a high degree of anonymity. Because of these characteristics, microblog users are thought to express their candid opinions more readily and closer to real time, so microblog posts are used for marketing, reputation monitoring, and similar purposes.

When microblogs are analyzed for use in marketing and the like, improving the accuracy of the analysis is important. To that end, posts need to be classified, for example by their content or by the standpoint of the poster, before they are analyzed.

For example, to analyze microblog posts about a certain product, each post must first be classified according to whether it was published by a media outlet for advertising purposes or by a user of the product to share impressions of using it.

As an example of such classification, Patent Document 1 discloses a system that classifies posters based on their characteristics.

Patent Document 1: JP 2012-221286 A

As described above, posters need to be classified more accurately, but there is a large, unspecified number of posters and classifying them accurately is not easy. The technique disclosed in Patent Document 1 uses only the characteristics of the poster for classification and does not take the content of the posts into account, so it is insufficient for improving classification accuracy.

In addition, although Patent Document 1 sets rules in advance for sorting posters into specific categories, these rules are fixed, and the user cannot configure them to reflect useful knowledge or insights of their own.

The present invention has been made to solve the above problems, and its object is to provide a mechanism capable of appropriately classifying posters who have posted articles on a microblog or the like.

The present invention is an information processing apparatus comprising: dictionary data creation means for creating dictionary data based on content posted by posters who have been classified into categories according to a user's instruction; rule setting means for accepting the setting of a rule for classifying posters; and classification means for classifying unclassified posters into the categories based on the dictionary data created by the dictionary data creation means, the ratio of the posters classified into each category to the total number of classified posters, and the rule set by the rule setting means.

The present invention is also an information processing method comprising: a dictionary data creation step in which dictionary data creation means of an information processing apparatus creates dictionary data based on content posted by posters who have been classified into categories according to a user's instruction; a rule setting step in which rule setting means of the information processing apparatus accepts the setting of a rule for classifying posters; and a classification step in which classification means of the information processing apparatus classifies unclassified posters into the categories based on the dictionary data created in the dictionary data creation step, the ratio of the posters classified into each category to the total number of classified posters, and the rule set in the rule setting step.

The present invention is also a program executed in an information processing apparatus, the program causing the information processing apparatus to function as: dictionary data creation means for creating dictionary data based on content posted by posters who have been classified into categories according to a user's instruction; rule setting means for accepting the setting of a rule for classifying posters; and classification means for classifying unclassified posters into the categories based on the dictionary data created by the dictionary data creation means, the ratio of the posters classified into each category to the total number of classified posters, and the rule set by the rule setting means.

According to the present invention, it is possible to provide a mechanism capable of classifying, with high accuracy, posters who have posted to a microblog or the like.

FIG. 1 is a diagram showing the configuration of the poster classification system according to the embodiment of the present invention.
FIG. 2 is a diagram showing an example of a screen on which the user registers rules for classification into specific categories according to the embodiment.
FIG. 3 is a diagram showing an example of the screen displayed when the "Setting" column in FIG. 2 is set to "Confirmed" according to the embodiment.
FIG. 4 is a diagram showing an example of the rules that can be set by the user according to the embodiment.
FIG. 5 is a diagram showing an example of a screen for creating the dictionary table according to the embodiment.
FIG. 6 is a diagram showing an example of a table in which posted texts are registered according to the embodiment.
FIG. 7 is a diagram showing an example of a table in which poster data is registered according to the embodiment.
FIG. 8 is a diagram showing an example of a table in which rules registered by the user are stored according to the embodiment.
FIG. 9 is a diagram showing an example of a table in which the categories used for classification are registered according to the embodiment.
FIG. 10 is a diagram showing an example of the dictionary table used when classifying into categories according to the embodiment.
FIG. 11 is a flowchart explaining the "dictionary creation processing" according to the embodiment.
FIG. 12 is a flowchart explaining the "unjudged poster classification processing" according to the embodiment.
FIG. 13 is a flowchart explaining the "poster determination on rule update and deletion" according to the embodiment.
FIG. 14 is a diagram showing an example of a table that associates category IDs with category names according to the embodiment.
FIG. 15 is a diagram showing an example of the dictionary table according to the embodiment.
FIG. 16 is a diagram showing an example of the rule registration data table according to the embodiment.
FIG. 17 is a flowchart showing an example of the processing performed when poster data has changed according to the embodiment.
FIG. 18 is a diagram showing an example of the hardware configuration of the information processing apparatus 101.
FIG. 19 is a diagram showing an example of the warning screen displayed when the selected rule is not an appropriate rule.
FIG. 20 is a diagram showing an example of a screen that displays a warning of possible misclassification when a confirmed rule is set.
FIG. 21 is a diagram showing an example of the warning screen displayed when the category set by the user is not the category with the highest weight.

FIG. 1 is a diagram showing the configuration of the poster classification system according to an embodiment of the present invention. The poster classification system includes a user rule setting unit, a teacher data registration unit, a dictionary creation unit, a rule weight setting unit, a posted text decomposition unit, a category classification probability calculation unit, and a determination unit.

It also includes a rule registration data table, a category master table, a poster data table, a posted text data table, and a dictionary table.

The user rule setting unit has a function of registering rules set by the user. The user sets rules via the rule setting screen, an example of which is shown in FIG. 2.

FIG. 2 is a diagram showing an example of the rule setting screen. This embodiment describes a case in which posters are classified into three categories ("BOT", "News", and "General user"). Here, "BOT" refers to an account programmed to post automatically for the purpose of directing readers to a specific site. "News" refers to media such as news sites. "General user" refers to any poster other than a BOT or News. Accordingly, three tuples are registered in the category master table shown in FIG. 9.

In FIG. 2, the items that accept settings from the user are "Rule 201" and "Setting 206".

Rule 201 is a pull-down menu that accepts the user's selection of one of the rules it lists. The rules listed in the pull-down menu are those registered in the table shown in FIG. 4. These rules are registered by the user via a rule registration screen (not shown) or the like. Rules may also be combined using logical conjunction (AND).

Here, a rule is a condition on information unrelated to the posted text (posted content), as shown in FIG. 4: for example, the self-introduction text, the number of posts, or information about linked users. Creating rules from information unrelated to the posted content makes it possible to use information other than the posted text as an indicator for category classification. Classifying with this additional information achieves more appropriate category classification than using only the posted content or only the poster information.

When Rule 201 is selected, Match count 202 shows the calculated number of posters who match the selected rule among the posters whose teacher data flag is 1. BOT 203, News 204, and General user 205 show the calculated number and ratio (proportion) of posters classified into each category among the posters whose teacher data flag is 1 and who match the selected rule.

Displaying the numbers and ratios of posters in this way lets the user refer to these values while setting rules, which makes it possible to set appropriate rules.

The ratios calculated here are registered as "weights" in the rule registration data table shown in FIG. 8. For example, when rule 1 has the ratios shown in FIG. 2 (90%, 10%, 0%), the integer values 90, 10, and 0 are registered as the weights for the respective categories.
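The match count, per-category counts, and integer weights described above could be computed roughly as follows. This is a minimal sketch, not the patented implementation: the record fields (`teacher_flag`, `category`, `post_count`), the function name `rule_weights`, and the use of `round` to obtain integer weights are all assumptions made for illustration.

```python
from collections import Counter

# Category names follow the three categories of the embodiment.
CATEGORIES = ("BOT", "News", "General user")


def rule_weights(posters, rule):
    """Return (match count, per-category counts, per-category integer weights).

    Only posters whose teacher data flag is 1 are considered, as in the
    rule setting screen. `rule` is a hypothetical predicate on a poster record.
    """
    matched = [p for p in posters if p["teacher_flag"] == 1 and rule(p)]
    counts = Counter(p["category"] for p in matched)
    total = len(matched)
    # Assumption: weights are percentages rounded to the nearest integer.
    weights = {
        c: (round(100 * counts[c] / total) if total else 0) for c in CATEGORIES
    }
    return total, {c: counts[c] for c in CATEGORIES}, weights


# Hypothetical teacher data: 9 BOTs and 1 News account post heavily.
posters = (
    [{"teacher_flag": 1, "category": "BOT", "post_count": 5000}] * 9
    + [{"teacher_flag": 1, "category": "News", "post_count": 4000}]
    + [{"teacher_flag": 1, "category": "General user", "post_count": 10}] * 5
    + [{"teacher_flag": 0, "category": None, "post_count": 9000}]
)
total, counts, weights = rule_weights(posters, lambda p: p["post_count"] >= 1000)
print(total, counts, weights)
```

With this sample data the rule matches ten teacher-data posters and yields the weights 90, 10, and 0 used in the example above; the unclassified poster is ignored because its teacher data flag is 0.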

If the calculated ratios (weights) turn out to be values such as 30, 30, and 40, the rule is unsuitable for deciding which category a poster should be classified into. Such a rule does not capture the characteristics of the posters belonging to any one category, so using it does not yield appropriate classification. In such a case, the user is therefore warned that the rule is not appropriate.

FIG. 19 shows an example of a screen warning that a rule is not appropriate.

Whether a rule is appropriate can be determined by registering a criterion in advance, for example whether any category has a ratio (weight) exceeding 80 (appropriate if so, inappropriate if not), and judging against that criterion.
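The example criterion above can be sketched as a single check. The function name `is_appropriate_rule` and the `threshold` parameter are hypothetical; the value 80 is only the example given in the text, and "exceeding" is read as a strict comparison.

```python
def is_appropriate_rule(weights, threshold=80):
    """Return True if any category weight strictly exceeds the threshold."""
    return any(w > threshold for w in weights.values())


print(is_appropriate_rule({"BOT": 90, "News": 10, "General user": 0}))
print(is_appropriate_rule({"BOT": 30, "News": 30, "General user": 40}))
```

A rule for which this check fails would trigger the warning screen of FIG. 19.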

Setting 206 accepts the user's selection of either "Automatic" or "Confirmed" via radio buttons. "Automatic" means that the determination unit makes its determination using the probabilities calculated by the category classification probability calculation unit (a rule set to Automatic is called an "automatic rule"). "Confirmed" means that the determination unit makes its determination using only that rule (a rule set to Confirmed is called a "confirmed rule").

If "Confirmed" is selected for a rule in which no category has a weight of 100, the user is warned that misclassifications will be included. With a confirmed rule, every poster who matches the rule is always classified into the category set on the screen shown in FIG. 3. The poster is thus judged to belong to the set category without regard to the probability (possibility) of belonging to another category, so a warning that the possibility of misclassification remains is needed to alert the user.

When "Confirmed" is selected, the confirmed rule registration screen shown in FIG. 3 is displayed and accepts a setting that specifies which category the posters matching the rule are to belong to. In the example of FIG. 3, posters matching rule 3 are determined to belong to the category "General user". A warning may also be issued when the category set by the user is not the category with the highest weight. For example, a warning may be issued if "BOT" is selected on the screen of FIG. 3 even though the weights are 10 for "BOT", 10 for "News", and 80 for "General user". Such a warning helps prevent rules from being set incorrectly. FIG. 20 shows an example of a warning screen indicating the possibility of misclassification.

FIG. 21 shows an example of the warning screen displayed when the category set by the user is not the category with the highest weight.
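The two warning conditions for confirmed rules can be sketched as follows. The function name `confirmed_rule_warnings` and the message strings are hypothetical; only the two conditions themselves come from the description.

```python
def confirmed_rule_warnings(weights, chosen_category):
    """Return the warnings that apply when a rule is set to "Confirmed"."""
    warnings = []
    # No category has a weight of 100: misclassification remains possible.
    if all(w != 100 for w in weights.values()):
        warnings.append("misclassification possible")
    # The category chosen by the user is not the highest-weighted one.
    if weights[chosen_category] < max(weights.values()):
        warnings.append("chosen category is not the highest-weighted one")
    return warnings


weights = {"BOT": 10, "News": 10, "General user": 80}
print(confirmed_rule_warnings(weights, "BOT"))
print(confirmed_rule_warnings(weights, "General user"))
```

In the example from the text (weights 10, 10, 80 with "BOT" chosen), both warnings apply; choosing "General user" leaves only the misclassification warning.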

A poster whose teacher data flag is 1 is a poster who has been classified into a category by the user through the processing of the flowchart shown in FIG. 11, that is, a poster whose category was determined by the user's visual inspection.

The teacher data registration unit has a function of registering teacher data. FIG. 5 shows an example of the screen used to register teacher data.

FIG. 5 displays poster IDs (501) randomly extracted from the poster data table and the posted texts (502) posted by those posters. The user checks each posted text, judges which category the poster belongs to, and registers it.

The category column (503) displays the category names registered in the category master table (FIG. 9) and accepts a selection via radio buttons.

Clicking a poster ID displays the data registered for that poster in the poster data table. Clicking a posted text newly collects and displays that poster's past posts. If past posts cannot be retrieved, other posts by that poster are displayed from the posted text data table.

When the user presses the registration button (504), the poster's category is registered: the selected category ID is registered in the "category ID" attribute of the poster data table (FIG. 7) for that poster ID, and "1" is registered in the "teacher data flag" attribute so that the record is used as teacher data.

Pressing the batch registration button (505) registers the categories of all posters displayed on the screen shown in FIG. 5 at once.

The dictionary data creation unit has a function of creating the dictionary table (FIG. 10) from the teacher data. It obtains, from the posted text data table (FIG. 6), the posted texts of the poster IDs whose "teacher data flag" attribute in the poster data table (FIG. 7) is "1". It then performs morphological analysis on the obtained posted text data and registers each resulting word in the "word" attribute of the dictionary table (FIG. 10). It also registers the number of times the word appears in the posted texts of the category in the "word occurrence count" attribute, the category ID of the posters whose texts contained the word in the "category ID" attribute, and the total number of posters in the category in the "posters per category" attribute.
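The construction of the dictionary table described above might look like the following sketch. The field names, the function `build_dictionary`, and the sample records are hypothetical. The `tokenize` function is only a whitespace-splitting stand-in for morphological analysis, which for Japanese text would require a dedicated analyzer.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Stand-in for morphological analysis (assumption): split on whitespace.
    return text.lower().split()


def build_dictionary(posters, posts):
    """Build dictionary rows from the posts of teacher-data posters."""
    # Map each teacher-data poster to the category assigned by the user.
    teacher = {
        p["poster_id"]: p["category_id"] for p in posters if p["teacher_flag"] == 1
    }
    word_counts = defaultdict(Counter)  # category_id -> word -> occurrences
    for post in posts:
        if post["poster_id"] not in teacher:
            continue  # posts by unclassified posters are not used
        word_counts[teacher[post["poster_id"]]].update(tokenize(post["text"]))
    posters_per_category = Counter(teacher.values())
    return [
        {
            "word": word,
            "category_id": cat,
            "word_count": n,
            "posters_in_category": posters_per_category[cat],
        }
        for cat, counter in word_counts.items()
        for word, n in counter.items()
    ]


posters = [
    {"poster_id": "u1", "category_id": 1, "teacher_flag": 1},
    {"poster_id": "u2", "category_id": 1, "teacher_flag": 1},
    {"poster_id": "u3", "category_id": 3, "teacher_flag": 1},
    {"poster_id": "u4", "category_id": None, "teacher_flag": 0},
]
posts = [
    {"poster_id": "u1", "text": "buy now buy"},
    {"poster_id": "u2", "text": "buy cheap"},
    {"poster_id": "u3", "text": "nice lunch today"},
    {"poster_id": "u4", "text": "buy lunch"},
]
rows = build_dictionary(posters, posts)
for row in rows:
    print(row)
```

Each row corresponds to one (word, category) pair, mirroring the four attributes of the dictionary table.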

If a posted text contains a URL (Uniform Resource Locator), its host name is registered as a word. If the URL is a shortened URL, the site indicated by the shortened URL is actually accessed to obtain the real URL, and the host name of the obtained URL is registered as a word.

A shortened URL is a URL that represents a real URL using fewer characters. Various services convert real URLs into shortened URLs, and the same real URL may be converted into different strings depending on the service. The site must therefore actually be accessed to obtain the real URL; otherwise, URLs that in fact point to exactly the same content (the same site) would be registered as different words.

The host name, rather than the entire string of the real URL, is registered as the word in order to register the site the URL points to rather than the content it points to.

That is, registering the entire URL string would register something transient (a piece of content on a site), whereas registering the host name registers something persistent (the site).

For example, a poster who should be classified into the "News" category is not "a poster who posted about the June 1, 2013 article of the ABC Newspaper" but "a poster who posted about articles of the ABC Newspaper". Since it is appropriate to judge by whether a poster posted about something persistent rather than something transient, the host name is registered in the dictionary.

Furthermore, if the entire URL string were registered in the dictionary as a word, that word would likely become one that is no longer used in future classification.

For example, if a URL pointing to "Product A" is registered in the dictionary and Product A is then discontinued, it is likely that fewer and fewer people will post about Product A. The URL registered in the dictionary would thus fall out of use. Similarly for news, fewer people can be expected to post about past articles.

For these reasons, in this embodiment the host name of a URL is registered in the dictionary.
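Reducing URLs to their host names before dictionary registration can be sketched with the Python standard library. The regular expression, the function name `replace_urls_with_hosts`, and the example URLs are assumptions for illustration. Resolving shortened URLs is deliberately left out because it requires network access to the shortening service.

```python
import re
from urllib.parse import urlsplit

# Simplified URL matcher (assumption): http/https followed by non-space characters.
URL_PATTERN = re.compile(r"https?://\S+")


def replace_urls_with_hosts(text):
    """Replace every URL in the text with its host name."""

    def to_host(match):
        try:
            host = urlsplit(match.group(0)).hostname
        except ValueError:
            host = None  # malformed URL: leave the original string in place
        return host if host else match.group(0)

    return URL_PATTERN.sub(to_host, text)


a = replace_urls_with_hosts("Read https://news.example.com/2013/06/01/story?id=7 now")
b = replace_urls_with_hosts("Read https://news.example.com/2013/06/02/other now")
print(a)
print(b)
```

Two posts linking to different articles on the same site produce the same token, which is the effect the embodiment aims for.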

The rule weight calculation unit has a function of extracting, from the poster data table of FIG. 7, the posters who match the rules registered in the rule registration data table shown in FIG. 8, calculating a weight for each category, and registering the calculated weights in the poster data table.

The weight registered here is the sum of the weights of all the rules that apply.
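Summing the weights of all applicable rules per category can be sketched as follows. The rule representation (a `condition` predicate plus a `weights` mapping), the function name `total_rule_weights`, and the sample rules are hypothetical.

```python
def total_rule_weights(poster, rules, categories):
    """Sum, per category, the weights of every rule the poster matches."""
    totals = {c: 0 for c in categories}
    for rule in rules:
        if not rule["condition"](poster):
            continue
        for c in categories:
            totals[c] += rule["weights"][c]
    return totals


categories = ("BOT", "News", "General user")
rules = [
    {
        "condition": lambda p: p["post_count"] >= 1000,
        "weights": {"BOT": 90, "News": 10, "General user": 0},
    },
    {
        "condition": lambda p: "news" in p["bio"].lower(),
        "weights": {"BOT": 20, "News": 80, "General user": 0},
    },
    {
        "condition": lambda p: p["post_count"] < 100,
        "weights": {"BOT": 0, "News": 5, "General user": 95},
    },
]
poster = {"post_count": 5000, "bio": "Breaking news every hour"}
totals = total_rule_weights(poster, rules, categories)
print(totals)
```

Here the poster matches the first two rules only, so the registered totals are the element-wise sum of those two weight sets.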

The posted text analysis unit has a function of extracting posted texts from the posted text data table (FIG. 6) and breaking them down into words by morphological analysis. Morphological analysis can be realized with known techniques, so a detailed description is omitted here.

The category classification probability calculation unit has a function of calculating, for a given poster, the probability of being classified into each category, using the weights calculated by the rule weight calculation unit, the words obtained by the posted text analysis unit, and the dictionary table shown in FIG. 10. The specific processing is described later.

The determination unit has a function of comparing the probabilities calculated by the category classification probability calculation unit and classifying the poster into the category with the highest probability. It then updates the poster data table of FIG. 7.
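The comparison performed by the determination unit amounts to selecting the category with the largest probability. A minimal sketch, with a hypothetical function name and sample values; how ties are broken is not stated in the description, so this sketch simply keeps the first category encountered.

```python
def determine_category(probabilities):
    """Return the category with the highest probability.

    Assumption: on a tie, the first category in iteration order wins.
    """
    return max(probabilities, key=probabilities.get)


print(determine_category({"BOT": 0.2, "News": 0.7, "General user": 0.1}))
```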

The posted text data table shown in FIG. 6 is a table in which posted text data collected from microblogs and the like is registered. It holds a posted text ID that uniquely identifies each posted text, a poster ID identifying the poster who posted it, the posted text itself, the posting date and time, location information indicating where it was posted, and so on.

The poster data table shown in FIG. 7 is a table in which poster data collected from microblogs and the like is registered.

The poster data table holds a poster ID identifying each poster, the poster's registered name, a category ID identifying the category determined by the determination unit, the teacher data flag, the total weight of each category calculated by the rule weight calculation unit, the data registration date and time, the poster's self-introduction text, and so on.

The rule registration data table shown in FIG. 8 is a table in which the rules set by the user via the screen shown in FIG. 2 are registered.

The deletion flag is registered as "0" and is set to "1" when the user issues a deletion instruction via the screen of FIG. 2.

The category master table shown in FIG. 9 is a table that manages the categories into which posters are classified. Category IDs and category names are registered in it in association with each other.

The dictionary table shown in FIG. 10 is a table in which the dictionary data used by the category classification probability calculation unit is registered. As described above, the data in the dictionary table is created and registered by the dictionary data creation unit.

次に、図１１を用いて、教師データ登録部と辞書データ作成部による辞書データ作成処理を説明する。 Next, dictionary data creation processing by the teacher data registration unit and the dictionary data creation unit will be described with reference to FIG.

図１１に示す処理は、情報処理装置１０１のＣＰＵ１８０１が所定の制御プログラムを読み出して実行する処理である。 The process illustrated in FIG. 11 is a process in which the CPU 1801 of the information processing apparatus 101 reads and executes a predetermined control program.

ステップＳ１０１では、情報処理装置１０１のＣＰＵ１８０１が、マイクロブログ等の分析対象とする投稿文と当該投稿文を投稿した投稿者に関する情報とをインターネット等から取得する。 In step S 101, the CPU 1801 of the information processing apparatus 101 acquires a posted sentence to be analyzed such as a microblog and information on a poster who posted the posted sentence from the Internet or the like.

取得した情報については、投稿文データテーブル（図６）や、投稿者データテーブル（図７）に格納する。 The acquired information is stored in a posted text data table (FIG. 6) or a poster data table (FIG. 7).

ステップＳ１０２では、情報処理装置１０１のＣＰＵ１８０１は、図５に示す教師データ作成画面を表示し、ステップＳ１０１で取得したデータを表示する。そして、ユーザから各投稿者に対するカテゴリ分類を受け付ける。ユーザにより登録ボタン（５０４、５０５）が押下されると、受け付けたカテゴリを示すカテゴリＩＤを含むデータを、投稿者データテーブル（図７）に登録し、教師データフラグを「１」として登録する。 In step S102, the CPU 1801 of the information processing apparatus 101 displays the teacher data creation screen shown in FIG. 5, and displays the data acquired in step S101. And the category classification | category with respect to each contributor is received from a user. When the registration button (504, 505) is pressed by the user, data including a category ID indicating the accepted category is registered in the contributor data table (FIG. 7), and the teacher data flag is registered as “1”.

ステップＳ１０３では、情報処理装置１０１のＣＰＵ１８０１は、カテゴリ別の投稿文を取得する。具体的には、ステップＳ１０２で登録されたカテゴリＩＤが同一の投稿者による投稿文を投稿文データテーブルから取得する。 In step S103, the CPU 1801 of the information processing apparatus 101 acquires a posted message for each category. Specifically, a posted message by a poster who has the same category ID registered in step S102 is acquired from the posted message data table.

ステップＳ１０４では、情報処理装置１０１のＣＰＵ１８０１は、ステップＳ１０３で取得した投稿文を形態素解析により単語に分解する。 In step S104, the CPU 1801 of the information processing apparatus 101 decomposes the posted sentence acquired in step S103 into words by morphological analysis.

ステップＳ１０５では、情報処理装置１０１のＣＰＵ１８０１は、ステップＳ１０４の処理により得られた単語毎に、カテゴリ毎の当該単語の出現数、当該単語を使用した投稿者数を辞書テーブルに登録する。そして、本フローチャートの処理を終了する。 In step S105, the CPU 1801 of the information processing apparatus 101 registers the number of appearances of the word for each category and the number of contributors using the word in the dictionary table for each word obtained by the process of step S104. And the process of this flowchart is complete | finished.

Through the above processing, posters are classified into categories based on the user's judgment, and dictionary data can be created from the classification results. The category classification processing of FIG. 12 is executed with reference to the dictionary data created here.

Next, with reference to FIG. 12, the processing for classifying posters whose category has not yet been determined, using the dictionary data created by the processing of FIG. 11, will be described.

The processing shown in FIG. 12 is executed by the CPU 1801 of the information processing apparatus 101 reading out and executing a predetermined control program.

In step S201, the CPU 1801 of the information processing apparatus 101 acquires from the posted text data table the posted texts of the poster to be judged (the poster to be classified).

The posted texts acquired here are those stored in step S101 of FIG. 11 that were written by posters who have not yet been classified into a category.

In step S202, the CPU 1801 of the information processing apparatus 101 acquires the poster information (self-introduction text) of the poster to be judged from the poster data table (FIG. 7).

In step S203, the CPU 1801 of the information processing apparatus 101 calculates a prior probability for each category using the data registered in the dictionary table.
The prior probability is calculated by the following formula.
Prior probability = number of posters in the category / sum of the numbers of posters over all categories
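
The formula can be checked with a short sketch. The function name `prior_probabilities` is hypothetical; the counts (70, 20, and 10 posters out of 100) are the ones used in the worked example later in this description.

```python
from fractions import Fraction


def prior_probabilities(posters_per_category):
    """Return each category's share of the labelled posters."""
    total = sum(posters_per_category.values())
    if total == 0:
        raise ValueError("no labelled posters")
    return {c: Fraction(n, total) for c, n in posters_per_category.items()}


priors = prior_probabilities({"BOT": 70, "news": 20, "general": 10})
```

`Fraction` is used only so that the ratios stay exact; floating-point values would serve equally well.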

In step S204, the CPU 1801 of the information processing apparatus 101 acquires the rules registered by the user from the rule registration data table (FIG. 8).

In step S205, the CPU 1801 of the information processing apparatus 101 determines whether the poster to be judged matches any confirmation rule among the rules acquired in step S204.

If a confirmation rule applies to the poster to be judged (step S205: YES), the processing proceeds to step S211.

In step S211, the CPU 1801 of the information processing apparatus 101 classifies the poster to be judged into the category corresponding to the confirmation rule and ends the processing. If the poster would be classified into more than one category (that is, more than one confirmation rule applies), the poster is classified into the category with the highest prior probability.

If no confirmation rule applies to the poster to be judged (step S205: NO), the processing proceeds to step S206.
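
The branch at step S205, including the tie-break by prior probability, might look like the sketch below. The rule representation (a dict with `confirm`, `category`, and a `matches` predicate) and the sample rules are assumptions made for illustration; the specification does not define the layout of the rule registration data table at this level.

```python
def classify_by_confirmation_rules(poster, rules, priors):
    """Return the category fixed by a matching confirmation rule, or None."""
    matched = {r["category"] for r in rules if r["confirm"] and r["matches"](poster)}
    if not matched:
        return None  # step S205: NO, continue with step S206
    # Several confirmation rules matched: pick the category with the highest prior.
    return max(matched, key=lambda c: priors[c])


# Hypothetical rules and priors
rules = [
    {"confirm": True, "category": "news", "matches": lambda p: "news" in p["bio"]},
    {"confirm": True, "category": "BOT", "matches": lambda p: p["bio"].endswith("bot")},
    {"confirm": False, "category": "general", "matches": lambda p: True},
]
priors = {"BOT": 0.7, "news": 0.2, "general": 0.1}

both = classify_by_confirmation_rules({"bio": "news bot"}, rules, priors)
one = classify_by_confirmation_rules({"bio": "daily news"}, rules, priors)
none = classify_by_confirmation_rules({"bio": "hello"}, rules, priors)
```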

In step S206, the CPU 1801 of the information processing apparatus 101 calculates, for each category, the sum of the weights of the automatic rules that apply to the poster to be judged, and registers the calculated per-category weight totals in the poster data table.

In step S207, the CPU 1801 of the information processing apparatus 101 updates the prior probability so as to raise the likelihood of the category with the highest weight total calculated in step S206.
An example of the prior probability update is described below.

For example, suppose there are three categories (BOT, news, general user) with prior probabilities A, B, and C, respectively, of which A is the highest. Suppose also that the news category has the largest weight total. In this case, the news category should be made more likely, so its prior probability is updated from B to A. This raises the prior probability of news.

Updating the prior probability in light of the rule-based weights in this way enables category classification that takes into account not only the posted content but also information about the poster. Furthermore, by setting the prior probability equal to that of the category with the largest prior probability, classification can be made on the basis of the posted content without the prior probability influencing the result.

Other methods are also conceivable, such as multiplying each prior probability by the ratio of the per-category weights. In the present invention, any specific method may be used for the prior probability update.
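
Steps S206 and S207 can be sketched as below, covering both update methods mentioned above. The rule layout (`matches` predicate plus per-category `weights`), the function names, and the weight values are hypothetical. Note that the first method leaves the priors unnormalized, as in the description, where news is simply raised to the largest prior.

```python
def weight_totals(poster, auto_rules, categories):
    """Step S206: sum the weights of the matching automatic rules per category."""
    totals = {c: 0.0 for c in categories}
    for rule in auto_rules:
        if rule["matches"](poster):
            for category, weight in rule["weights"].items():
                totals[category] += weight
    return totals


def raise_prior_to_max(priors, totals):
    """Step S207, first method: give the top-weighted category the largest prior."""
    top = max(totals, key=totals.get)
    updated = dict(priors)
    updated[top] = max(priors.values())
    return updated


def scale_priors_by_weight(priors, totals):
    """Alternative method: multiply each prior by its share of the total weight."""
    total_weight = sum(totals.values())
    if total_weight == 0:
        return dict(priors)
    return {c: priors[c] * totals[c] / total_weight for c in priors}


# Hypothetical automatic rule and poster
auto_rules = [{"matches": lambda p: "news" in p["bio"],
               "weights": {"BOT": 0.2, "news": 0.7, "general": 0.1}}]
priors = {"BOT": 0.7, "news": 0.2, "general": 0.1}
totals = weight_totals({"bio": "A news site"}, auto_rules, priors)
updated = raise_prior_to_max(priors, totals)
```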

In step S208, the CPU 1801 of the information processing apparatus 101 splits the posted texts of the poster to be judged into words by morphological analysis. The conditions of the morphological analysis (the dictionary used for analysis, the parts of speech to extract, and so on) are the same as those used when the dictionary table was created.

If a posted text contains a URL (Uniform Resource Locator), its host name is registered as a word. If the URL is a shortened URL, the site indicated by the shortened URL is actually accessed to obtain the actual URL, and the host name of the obtained URL is registered as a word.

A shortened URL represents an actual URL with fewer characters. Various services convert actual URLs into shortened URLs, and the same URL is converted into different character strings depending on the service. The actual URL is therefore obtained by accessing the site, which prevents URLs that point to exactly the same content from being registered as different words.
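
The URL handling can be sketched with the Python standard library. This is an illustration under assumptions: the regular expression, the `SHORTENER_HOSTS` list, and the function names are hypothetical, and the specification does not say how a shortened URL is recognized. `resolve_short_url` performs a real network request, so the resolver is injectable for offline use.

```python
import re
import urllib.request
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://\S+")
SHORTENER_HOSTS = {"bit.ly", "t.co"}  # illustrative list (assumption)


def resolve_short_url(url, timeout=5.0):
    """Access the shortened URL and return the URL reached after redirects."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


def url_host_words(text, resolve=resolve_short_url):
    """Return the host name of every URL in the text, expanding shortened URLs."""
    words = []
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).hostname
        if host in SHORTENER_HOSTS:
            host = urlparse(resolve(url)).hostname
        if host:
            words.append(host)
    return words


# Offline usage with a stub resolver (hypothetical target URL)
hosts = url_host_words(
    "see https://t.co/abc and http://example.com/x",
    resolve=lambda u: "https://news.example.org/story",
)
```

Registering only the host name means that two different shortened strings for the same site collapse into one dictionary word.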

In step S209, the CPU 1801 of the information processing apparatus 101 obtains, for each category, the likelihood of each word, using the words obtained by the morphological analysis in step S208 and the dictionary table. The likelihood of each word is obtained by the following formula.

Word likelihood = number of occurrences of the word registered in the dictionary table for the category / total number of word occurrences in the dictionary table for the category

In step S210, the CPU 1801 of the information processing apparatus 101 obtains the posterior probability for each category using the prior probability obtained in step S207 and the word likelihoods obtained in step S209.

In step S211, the CPU 1801 of the information processing apparatus 101 determines the category with the largest posterior probability obtained in step S210 to be the category of the poster to be judged, and registers that category in the poster data table.

When naive Bayes classification is used as in this embodiment, a zero-frequency problem arises: if a word not registered in the dictionary table appears, the posterior probability becomes 0. This problem can be addressed by using a technique such as Laplace smoothing.
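
A word likelihood with add-one (Laplace) smoothing might be computed as below. This sketch uses the textbook form, which adds the vocabulary size to the denominator; the worked example in this description instead assigns an unseen word 1 over the unchanged category total, so the two differ slightly. The function name, `vocabulary_size`, and the sample counts are assumptions.

```python
def word_likelihood(word, category_counts, vocabulary_size, alpha=1.0):
    """Smoothed P(word | category) from per-category word occurrence counts."""
    total = sum(category_counts.values())
    return (category_counts.get(word, 0) + alpha) / (total + alpha * vocabulary_size)


# Hypothetical counts for one category: 60 word occurrences in total
counts = {"sale": 10, "coupon": 50}
seen = word_likelihood("sale", counts, vocabulary_size=3)
unseen = word_likelihood("breaking", counts, vocabulary_size=3)
```

With smoothing, an unseen word yields a small positive likelihood instead of zeroing the whole product.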

Next, with reference to FIG. 13, the processing performed when a new rule is added to the rule registration data table or an existing rule is deleted will be described.

The processing shown in FIG. 13 is executed by the CPU 1801 of the information processing apparatus 101 reading out and executing a predetermined control program.

In step S301, the CPU 1801 of the information processing apparatus 101 extracts the newly added rule or the deleted rule. This is executed when the registration button or the delete button is pressed on the rule setting screen shown in FIG. 2.

In step S302, the CPU 1801 of the information processing apparatus 101 acquires the posted texts and the poster data of the posters to whom the newly added or deleted rule applies, from the posted text data table and the poster data table, respectively.

In step S303, the CPU 1801 of the information processing apparatus 101 calculates the prior probabilities. The calculation is the same as in step S203.

Then, the processing of steps S305 to S313 is executed for every poster acquired in step S302 (step S304).

In step S305, the CPU 1801 of the information processing apparatus 101 determines whether the poster to be judged matches any confirmation rule among the rules registered in the rule registration data table.

If a confirmation rule applies to the poster to be judged (step S305: YES), the processing proceeds to step S313.

In step S313, the CPU 1801 of the information processing apparatus 101 determines the category of the poster to be judged to be the category corresponding to the confirmation rule, and registers that category in the poster data table. If the poster would be classified into more than one category (that is, more than one confirmation rule applies), the poster is classified into the category with the highest prior probability.

If no confirmation rule applies to the poster to be judged (step S305: NO), the processing proceeds to step S306.

In step S306, the CPU 1801 of the information processing apparatus 101 acquires the per-category weights from the poster data table.

In step S307, if a rule has been newly added, the CPU 1801 of the information processing apparatus 101 adds the weight of that rule to the weight of each category; if a rule has been deleted, it subtracts the weight of that rule from the weight of each category. The poster data table is then updated with the resulting weights.

In step S308, the CPU 1801 of the information processing apparatus 101 determines whether the category with the largest weight has changed as a result of step S307.

If the category with the largest weight has changed (step S308: YES), the processing proceeds to step S309.

If the category with the largest weight has not changed (step S308: NO), the processing moves on to the next poster, for whom steps S305 to S313 are executed.
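
Steps S307 and S308 amount to an incremental weight update followed by a check on the top category. A minimal sketch, with a hypothetical function name and integer weights chosen for illustration:

```python
def apply_rule_change(weights, rule_weights, added):
    """Add (or subtract) a rule's per-category weights; report whether the
    category with the largest weight changed."""
    sign = 1 if added else -1
    updated = {c: weights[c] + sign * rule_weights.get(c, 0) for c in weights}
    top_changed = max(updated, key=updated.get) != max(weights, key=weights.get)
    return updated, top_changed


# Hypothetical stored weights for one poster
weights = {"BOT": 5, "news": 3, "general": 1}
after_add, changed_by_add = apply_rule_change(weights, {"news": 4}, added=True)
after_del, changed_by_del = apply_rule_change(after_add, {"news": 4}, added=False)
_, changed_small = apply_rule_change(weights, {"general": 1}, added=True)
```

Only posters for whom the flag is true need the more expensive reclassification of steps S309 to S313.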

In step S309, the CPU 1801 of the information processing apparatus 101 updates the prior probability of the category with the largest weight. The processing of step S309 is the same as that of step S207 in FIG. 12.

The processing of steps S310 to S313 is the same as that of steps S208 to S211 in FIG. 12, respectively, and its description is therefore omitted here.

Next, with reference to FIG. 17, the processing executed when the poster data has changed will be described.

The processing shown in FIG. 17 is executed by the CPU 1801 of the information processing apparatus 101 reading out and executing a predetermined control program.

In step S401, the CPU 1801 of the information processing apparatus 101 creates a dummy table having the same attributes as the poster data table. Then, steps S403 to S419 are executed for every poster registered in the poster data table whose teacher data flag is "0" (step S402).

In step S403, the CPU 1801 of the information processing apparatus 101 acquires the latest poster data and registers it in the dummy table created in step S401. For the category ID and the weight of category N, the attribute values already registered in the poster data table are registered. The teacher data flag is registered as "0".

In step S404, the CPU 1801 of the information processing apparatus 101 compares the attribute values of the poster data table, excluding the category ID and the weight of category N, with the attribute values registered in the dummy table, and checks whether there is any difference.

If there is a difference (step S405: YES), the processing proceeds to step S406.

If there is no difference (step S405: NO), the processing moves on to the next poster.
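
The comparison in steps S404 and S405 can be sketched as a field-by-field check that skips the derived attributes. The field names (`category_id`, `weights`, `bio`, `followers`) are hypothetical stand-ins for the attributes of the poster data table.

```python
EXCLUDED = {"category_id", "weights"}  # derived attributes, not compared


def has_changed(stored, latest, excluded=EXCLUDED):
    """True if any attribute outside `excluded` differs between the records."""
    keys = (stored.keys() | latest.keys()) - excluded
    return any(stored.get(k) != latest.get(k) for k in keys)


# Hypothetical records for one poster
stored = {"poster_id": "A_news", "bio": "Posts from the A news site.",
          "followers": 100, "category_id": "B", "weights": {"B": 0.7}}
same = dict(stored, category_id=None)  # only a derived attribute differs
edited = dict(stored, bio="Official account")

unchanged = has_changed(stored, same)
changed = has_changed(stored, edited)
```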

If there is a difference, the data registered in the dummy table is evaluated against all rules (step S406).

In step S407, the CPU 1801 of the information processing apparatus 101 determines whether the poster registered in the dummy table matches any confirmation rule among the rules registered in the rule registration data table.

If a matching confirmation rule exists (step S407: YES), the processing proceeds to step S418.

In step S418, the CPU 1801 of the information processing apparatus 101 registers in the dummy table the category set in the confirmation rule determined to match in step S407.

If no matching confirmation rule exists (step S407: NO), the processing proceeds to step S408.

In step S408, the CPU 1801 of the information processing apparatus 101 acquires the per-category weights from the poster data table.

In step S409, the CPU 1801 of the information processing apparatus 101 registers the weights acquired in step S408 in the dummy table.

Steps S407 to S409 above are performed for all rules (step S410).

In step S411, the CPU 1801 of the information processing apparatus 101 determines whether the category with the highest weight has changed.

If the category with the highest weight has changed (step S411: YES), the processing proceeds to step S412.

If the category with the highest weight has not changed (step S411: NO), the processing proceeds to step S419.

In step S412, the CPU 1801 of the information processing apparatus 101 calculates the prior probabilities.

In step S413, the CPU 1801 of the information processing apparatus 101 updates the prior probability using the prior probabilities calculated in step S412.

In step S414, the CPU 1801 of the information processing apparatus 101 acquires the posted texts of the poster to be judged.

In step S415, the CPU 1801 of the information processing apparatus 101 splits the posted texts acquired in step S414 into words by morphological analysis.

In step S416, the CPU 1801 of the information processing apparatus 101 obtains, for each category, the likelihood of each word, using the words obtained in step S415 and the dictionary table.

In step S417, the CPU 1801 of the information processing apparatus 101 obtains the posterior probability for each category using the prior probability and the word likelihoods.

In step S418, the CPU 1801 of the information processing apparatus 101 identifies the category with the largest posterior probability and registers that category in the dummy table.

In step S419, the CPU 1801 of the information processing apparatus 101 overwrites the poster data table with the values registered in the dummy table.

Through the above processing, a poster can be reclassified when the poster information has changed.

Next, the processing for classifying a poster whose category has not yet been determined will be described using a specific example.

Here, assuming that the tables shown in FIGS. 14 to 16 have been prepared in advance, the specific processing for classifying an undetermined poster (referred to here as "A_news") is described following the flowchart of FIG. 12.

As a condition for creating the dictionary table, the parts of speech extracted in morphological analysis are nouns and adjectives.

First, the posted texts of the poster to be judged are acquired. Suppose that this poster has only one posted text: "[Breaking news] The sale has started!" (［速報］セールが始まりました！).

Next, the poster information is acquired. Suppose that the self-introduction text of the poster to be judged is "Posts from the A news site."

Next, the prior probabilities are calculated. From the dictionary table of FIG. 15, the prior probabilities are BOT: 70/100, news: 20/100, and general user: 10/100. Specifically, 100 posters are registered in the dictionary table, of whom 70 are in the BOT category, 20 in news, and 10 in general user.

Next, the applicable rules are checked. Here, A_news matches rule 2 (assume that rule 2 is "the self-introduction text contains the word 'news'"). The weights of rule 2 for the respective categories are therefore registered in the poster data table.

Then the prior probability is updated. Comparing the per-category weights obtained above shows that category B (news) has the largest weight (FIG. 16). The prior probability of category B is therefore updated from 20/100 to 70/100. Here, the update uses the method described above: the prior probability of the category with the largest weight is set equal to the largest prior probability.

Next, the posted text is morphologically analyzed and split into words. Extracting the nouns and adjectives from the A_news post "[Breaking news] The sale has started!" yields "breaking news" (速報) and "sale" (セール).

Next, the likelihoods are calculated for each category, starting with BOT. In the dictionary table of FIG. 15, the words for BOT (category ID: A) do not include "breaking news" but do include "sale". The likelihood of "breaking news" for BOT is therefore 1/60, and that of "sale" is 10/60. The likelihood is calculated as the number of occurrences of the word in the category divided by the total number of word occurrences in the category. The likelihood of "breaking news" is 1/60 rather than 0/60 because the Laplace smoothing described above is applied.

Similarly, the likelihoods of "breaking news" and "sale" for news are 20/60 and 1/60, respectively, and those for general user are 1/40 and 1/40.

Next, the posterior probabilities are calculated, starting with BOT. The prior probability of BOT is 70/100, and the likelihoods of the words "breaking news" and "sale" are 1/60 and 10/60, respectively. The posterior probability is therefore (70/100 × 1/60) × (70/100 × 10/60) = 0.00136 (rounded at the sixth decimal place).

Similarly, using the updated prior probability, the posterior probability of news is (70/100 × 20/60) × (70/100 × 1/60) = 0.00272, and that of general user is (10/100 × 1/40) × (10/100 × 1/40) = 0.00001.

Next, the calculated posterior probabilities are compared to decide the category. The posterior probability of news is the highest, so the poster "A_news" is classified into the news category.
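
The arithmetic of this worked example can be reproduced with the sketch below. It follows the calculation exactly as written above, in which the prior is multiplied in once per word; a conventional naive Bayes score would apply the prior only once. The variable and function names are hypothetical.

```python
from fractions import Fraction as F

# Priors after the update (news raised from 20/100 to 70/100)
priors = {"BOT": F(70, 100), "news": F(70, 100), "general": F(10, 100)}

# Likelihoods of "breaking news" and "sale" per category, from the example
likelihoods = {
    "BOT": [F(1, 60), F(10, 60)],
    "news": [F(20, 60), F(1, 60)],
    "general": [F(1, 40), F(1, 40)],
}


def posterior(prior, word_likelihoods):
    """Product of (prior x likelihood) over the words, as in the example."""
    score = F(1)
    for likelihood in word_likelihoods:
        score *= prior * likelihood
    return score


posteriors = {c: posterior(priors[c], likelihoods[c]) for c in priors}
rounded = {c: round(float(p), 5) for c, p in posteriors.items()}
best = max(posteriors, key=posteriors.get)
```

The rounded scores match the values stated in the example (0.00136, 0.00272, and 0.00001), and the largest one selects the news category.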

Finally, in the poster data table, the news category ID "B" is registered in the attribute "category ID" of the record whose attribute "poster ID" is A_news.

FIG. 18 is a diagram showing the hardware configuration of the information processing apparatus 101.

In FIG. 18, reference numeral 1801 denotes a CPU, which comprehensively controls the devices and controllers connected to the system bus 1804. The ROM 1803 or the external memory 1811 stores the BIOS (Basic Input/Output System) and the operating system program (hereinafter, OS), which are control programs of the CPU 1801, as well as the various programs needed to implement the functions executed by each server or PC.

Reference numeral 1802 denotes a RAM, which functions as the main memory, work area, and the like of the CPU 1801. The CPU 1801 implements various operations by loading the programs and other data needed for processing from the ROM 1803 or the external memory 1811 into the RAM 1802 and executing the loaded programs.

Reference numeral 1805 denotes an input controller, which controls input from the input device 1809 and the like. Reference numeral 1806 denotes a video controller, which controls display on a display device 1810 such as a liquid crystal display. The display device is not limited to a liquid crystal display and may be, for example, a CRT display. These are used by the client as needed.

Reference numeral 1807 denotes a memory controller, which controls access to the external memory 1811, such as a hard disk (HD) or flexible disk (FD) that stores the boot program, various applications, font data, user files, edit files, and various data, or a CompactFlash (registered trademark) memory connected to a PCMCIA card slot via an adapter.

Reference numeral 1808 denotes a communication I/F controller, which connects to and communicates with external devices via a network and executes communication control processing on the network. For example, communication using TCP/IP is possible.

The CPU 1801 enables display on the display device 1810 by, for example, rasterizing outline fonts into a display information area in the RAM 1802. The CPU 1801 also allows the user to give instructions with a mouse cursor (not shown) or the like on the display device 1810.

The various programs that run on the hardware are recorded in the external memory 1811 and are executed by the CPU 1801 after being loaded into the RAM 1802 as needed.

なお、上述した各種データの構成及びその内容はこれに限定されるものではなく、用途や目的に応じて、様々な構成や内容で構成されることは言うまでもない。 It should be noted that the configuration and contents of the various data described above are not limited to this, and it goes without saying that the various data and configurations are configured according to the application and purpose.

また、本発明におけるプログラムは、図１１〜図１３、図１７の処理をコンピュータに実行させるプログラムである。なお、本発明におけるプログラムは、図１１〜図１３、図１７の各処理ごとのプログラムであってもよい。 Moreover, the program in this invention is a program which makes a computer perform the process of FIGS. 11-13, and FIG. The program in the present invention may be a program for each process in FIGS. 11 to 13 and FIG.

以上のように、前述した実施形態の機能を実現するプログラムを記録した記録媒体を、システムあるいは装置に供給し、そのシステムあるいは装置のコンピュータ（またはＣＰＵやＭＰＵ）が記録媒体に格納されたプログラムを読み出し、実行することによっても本発明の目的が達成されることは言うまでもない。 As described above, a recording medium that records a program that implements the functions of the above-described embodiments is supplied to a system or apparatus, and a computer (or CPU or MPU) of the system or apparatus stores the program stored in the recording medium. It goes without saying that the object of the present invention can also be achieved by reading and executing.

この場合、記録媒体から読み出されたプログラム自体が本発明の新規な機能を実現することになり、そのプログラムを記録した記録媒体は本発明を構成することになる。 In this case, the program itself read from the recording medium realizes the novel function of the present invention, and the recording medium recording the program constitutes the present invention.

プログラムを供給するための記録媒体としては、例えば、フレキシブルディスク、ハードディスク、光ディスク、光磁気ディスク、ＣＤ−ＲＯＭ、ＣＤ−Ｒ、ＤＶＤ−ＲＯＭ、磁気テープ、不揮発性のメモリカード、ＲＯＭ、ＥＥＰＲＯＭ、シリコンディスク等を用いることが出来る。 As a recording medium for supplying the program, for example, a flexible disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, DVD-ROM, magnetic tape, nonvolatile memory card, ROM, EEPROM, silicon A disk or the like can be used.

また、コンピュータが読み出したプログラムを実行することにより、前述した実施形態の機能が実現されるだけでなく、そのプログラムの指示に基づき、コンピュータ上で稼働しているＯＳ（オペレーティングシステム）等が実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。 Further, by executing the program read by the computer, not only the functions of the above-described embodiments are realized, but also an OS (operating system) operating on the computer based on an instruction of the program is actually It goes without saying that a case where the function of the above-described embodiment is realized by performing part or all of the processing and the processing is included.

さらに、記録媒体から読み出されたプログラムが、コンピュータに挿入された機能拡張ボードやコンピュータに接続された機能拡張ユニットに備わるメモリに書き込まれた後、そのプログラムコードの指示に基づき、その機能拡張ボードや機能拡張ユニットに備わるＣＰＵ等が実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。 Furthermore, after the program read from the recording medium is written to the memory provided in the function expansion board inserted into the computer or the function expansion unit connected to the computer, the function expansion board is based on the instructions of the program code. It goes without saying that the case where the CPU or the like provided in the function expansion unit performs part or all of the actual processing and the functions of the above-described embodiments are realized by the processing.

また、本発明は、複数の機器から構成されるシステムに適用しても、ひとつの機器から成る装置に適用しても良い。また、本発明は、システムあるいは装置にプログラムを供給することによって達成される場合にも適応できることは言うまでもない。この場合、本発明を達成するためのプログラムを格納した記録媒体を該システムあるいは装置に読み出すことによって、そのシステムあるいは装置が、本発明の効果を享受することが可能となる。 The present invention may be applied to a system constituted by a plurality of devices or an apparatus constituted by a single device. Needless to say, the present invention can be applied to a case where the present invention is achieved by supplying a program to a system or apparatus. In this case, by reading a recording medium storing a program for achieving the present invention into the system or apparatus, the system or apparatus can enjoy the effects of the present invention.

さらに、本発明を達成するためのプログラムをネットワーク上のサーバ、データベース等から通信プログラムによりダウンロードして読み出すことによって、そのシステムあるいは装置が、本発明の効果を享受することが可能となる。なお、上述した各実施形態およびその変形例を組み合わせた構成も全て本発明に含まれるものである。 Furthermore, by downloading and reading a program for achieving the present invention from a server, database, or the like on a network using a communication program, the system or apparatus can enjoy the effects of the present invention. All configurations obtained by combining the above-described embodiments and their modifications are also included in the present invention.

１０１情報処理装置
101 Information processing apparatus

Claims

An information processing apparatus that stores teacher data in which a poster is associated with a category of the poster, the apparatus comprising:
accepting means for accepting a setting of a condition for classifying posters into categories; and
classification means for classifying an unclassified poster into a category based on the breakdown of categories of those posters, among the posters stored as the teacher data, who satisfy the condition accepted by the accepting means.

The information processing apparatus according to claim 1, wherein the accepting means accepts a plurality of conditions for classifying posters into categories, and
the classification means treats the breakdown of categories of the posters who satisfy a condition as the probability of each category for a poster who satisfies that condition, and classifies the unclassified poster into a category based on the probabilities for the plurality of conditions.
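
The classification recited in the two claims above can be illustrated with a minimal Python sketch. The teacher data, the condition predicates, and all function names here are hypothetical. The claims do not state how the probabilities from several conditions are combined, so the sketch assumes a simple product of the per-condition probabilities; this is an illustration under those assumptions, not the claimed implementation.

```python
from collections import Counter

# Hypothetical teacher data: poster id -> (profile text, category).
TEACHER = {
    "a": ("official news account", "media"),
    "b": ("news and press releases", "media"),
    "c": ("coffee lover, daily life", "user"),
    "d": ("daily news junkie", "user"),
}

# Hypothetical conditions: name -> predicate over a poster's profile text.
CONDITIONS = {
    "mentions_news": lambda text: "news" in text,
    "mentions_daily": lambda text: "daily" in text,
}


def breakdown(condition):
    """Category proportions among teacher posters that satisfy the condition."""
    counts = Counter(cat for text, cat in TEACHER.values() if condition(text))
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


def classify(text, categories=("media", "user")):
    """Classify an unclassified poster from the conditions it satisfies.

    Assumption: per-condition probabilities are combined by multiplication.
    Returns None when no condition applies to the poster.
    """
    scores = {cat: 1.0 for cat in categories}
    matched = False
    for condition in CONDITIONS.values():
        if not condition(text):
            continue
        probs = breakdown(condition)
        if not probs:
            continue  # no teacher poster satisfies this condition
        matched = True
        for cat in categories:
            scores[cat] *= probs.get(cat, 0.0)
    if not matched:
        return None
    return max(scores, key=scores.get)
```

In this toy data, two of the three teacher posters matching `mentions_news` are media accounts, so a poster who satisfies only that condition is classified as `media`.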

The information processing apparatus according to claim 1, further comprising confirmation condition setting accepting means for accepting, for a condition whose setting is accepted by the accepting means, a setting indicating that a poster is to be classified into a category solely by satisfying the condition, and a setting of the category into which the poster is classified when the condition is satisfied,
wherein the classification means classifies a poster who satisfies a condition whose setting has been accepted by the confirmation condition setting accepting means into the category set for that condition.
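
A confirmation condition of this kind can be sketched as a rule that bypasses the probability-based classification. The following Python example is a minimal illustration; the function name, the rule list, and the fallback classifier are all hypothetical and are not taken from the specification.

```python
def classify_with_confirmed(text, confirmed_rules, fallback):
    """Return the category of the first satisfied confirmation condition.

    confirmed_rules: list of (predicate, category) pairs (hypothetical format).
    fallback: classifier used when no confirmation condition is satisfied.
    """
    for predicate, category in confirmed_rules:
        if predicate(text):
            return category  # satisfying the condition alone decides the category
    return fallback(text)


# Hypothetical rule: a profile containing "official" is always a media account.
rules = [(lambda t: "official" in t.lower(), "media")]


def always_user(text):
    """Stand-in for the probability-based classifier."""
    return "user"
```

With these definitions, `classify_with_confirmed("Official account of X", rules, always_user)` returns `"media"` without consulting the fallback classifier.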

The information processing apparatus according to any one of claims 1 to 3, further comprising determination means for determining whether the condition is appropriate based on the breakdown of categories of those posters, among the posters stored as the teacher data, who satisfy the condition accepted by the accepting means.

The information processing apparatus according to claim 4, further comprising notification means for notifying a result of the determination.
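
The determination and notification recited in the two claims above can be sketched as follows. The claims do not specify the criterion for appropriateness, so this Python example assumes one: a condition is treated as appropriate when enough teacher posters satisfy it and one category clearly dominates the breakdown. The function name and both thresholds are hypothetical.

```python
from collections import Counter


def judge_condition(matching_categories, min_share=0.7, min_count=3):
    """Judge a condition from the categories of teacher posters satisfying it.

    Returns (is_appropriate, message); the message stands in for the notification.
    min_share and min_count are assumed thresholds, not values from the patent.
    """
    counts = Counter(matching_categories)
    total = sum(counts.values())
    if total < min_count:
        return False, f"only {total} matching poster(s) in teacher data"
    top_cat, top_n = counts.most_common(1)[0]
    share = top_n / total
    if share < min_share:
        return False, f"weak skew: top category {top_cat!r} has share {share:.2f}"
    return True, f"top category {top_cat!r} has share {share:.2f}"
```

Under this assumption, a condition whose matching posters split evenly between categories is reported as inappropriate, because satisfying it says little about the category of a poster.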

An information processing method in an information processing apparatus that stores teacher data in which a poster is associated with a category of the poster, the method comprising:
an accepting step in which accepting means of the information processing apparatus accepts a setting of a condition for classifying posters into categories; and
a classification step in which classification means of the information processing apparatus classifies an unclassified poster into a category based on the breakdown of categories of those posters, among the posters stored as the teacher data, who satisfy the condition accepted in the accepting step.

A program executable in an information processing apparatus that stores teacher data in which a poster is associated with a category of the poster, the program causing the information processing apparatus to function as:
accepting means for accepting a setting of a condition for classifying posters into categories; and
classification means for classifying an unclassified poster into a category based on the breakdown of categories of those posters, among the posters stored as the teacher data, who satisfy the condition accepted by the accepting means.