JP2017027168A

JP2017027168A - Taste learning method, taste learning program and taste learning device

Info

Publication number: JP2017027168A
Application number: JP2015142468A
Authority: JP
Inventors: 康司白石; Koji Shiraishi; 純佐伯; Jun Saeki; 一英西部; Kazuhide Nishibe; 実紀油谷; Sanenori Yuya
Original assignee: TIS Inc
Current assignee: TIS Inc
Priority date: 2015-07-16
Filing date: 2015-07-16
Publication date: 2017-02-02
Anticipated expiration: 2035-07-16
Also published as: JP6710907B2

Abstract

PROBLEM TO BE SOLVED: To extract data representing preferences in a unified manner from sentences written by a plurality of users. SOLUTION: A preference learning method includes: a decomposition step of reading a sentence associated with a user and decomposing it into words; a feature word extraction step of extracting predetermined words from the decomposed sentence; and an interest list generation step of replacing the words extracted in the feature word extraction step with representative words. The replacement uses a table built by clustering vectors, generated from a plurality of sentences, that represent the features of words; the table stores the correspondence between the words contained in each cluster and that cluster's representative word, which is a word close to the cluster's centroid. A computer executes these steps. SELECTED DRAWING: Figure 3

Description

The present invention relates to a preference learning method, a preference learning program, and a preference learning device.

Conventionally, one proposed technique for extracting a user's preferences works as follows. It uses predetermined verbs that a person is likely to use when expressing interest, and it extracts pairs of a verb and a noun from the document data being analyzed (Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2011-180646 (JP 2011-180646 A)

Conventionally, techniques have been proposed that extract a user's interests from text by extracting predetermined words. However, users with similar preferences may still express them in different wording, for example by using synonyms.

Accordingly, an object of the present invention is to extract data that represents preferences in a unified manner from sentences written by a plurality of users.

In the preference learning method according to the present invention, a computer executes three steps:

- a decomposition step of reading a sentence associated with a user and decomposing it into words;
- a feature word extraction step of extracting predetermined words from the decomposed sentence; and
- an interest list generation step of replacing the words extracted in the feature word extraction step with representative words.

The replacement uses a table built by clustering vectors that represent word features and that are generated from a plurality of sentences. The table stores the correspondence between the words contained in each cluster and that cluster's representative word, which is a word close to the cluster's centroid.

Replacing words with representative words absorbs inconsistencies in the characters (orthographic variants) and terms that users choose when writing freely. Preference information can then be generated with representative words that can be used uniformly. In other words, data representing preferences in a unified manner can be extracted from sentences written individually by a plurality of users.
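The three steps can be illustrated with a minimal Python sketch. It is not the patent's implementation. It makes the following assumptions:

- Whitespace splitting stands in for Japanese morphological analysis.
- `PREDETERMINED_WORDS` and `REPRESENTATIVE_TABLE` are hypothetical.
- Every example word except "camp" and "outdoor" (taken from FIG. 13) is invented for illustration.

```python
# Hypothetical representative-word table (cluster member -> representative word).
# "camp" -> "outdoor" follows FIG. 13; "barbecue" is an invented example.
REPRESENTATIVE_TABLE = {
    "camp": "outdoor",
    "barbecue": "outdoor",
    "outdoor": "outdoor",
}
# Hypothetical set of predetermined words to extract (e.g. a field vocabulary).
PREDETERMINED_WORDS = {"camp", "barbecue", "outdoor", "museum"}


def decompose(sentence: str) -> list[str]:
    """Decomposition step; whitespace splitting stands in for morphological analysis."""
    return [w.strip(".,!?").lower() for w in sentence.split()]


def extract_feature_words(words: list[str]) -> list[str]:
    """Feature word extraction step: keep predetermined words, without duplicates."""
    found = []
    for w in words:
        if w in PREDETERMINED_WORDS and w not in found:
            found.append(w)
    return found


def build_interest_list(feature_words: list[str]) -> list[str]:
    """Interest list generation step: replace each word with its representative word."""
    result = []
    for w in feature_words:
        rep = REPRESENTATIVE_TABLE.get(w, w)  # keep the word itself if it has no entry
        if rep not in result:
            result.append(rep)
    return result


sentence = "Went to camp and had a barbecue, then a museum."
print(build_interest_list(extract_feature_words(decompose(sentence))))
# ['outdoor', 'museum']
```

In this sketch, "camp" and "barbecue" map to the same representative word, so the two phrasings collapse into a single interest entry.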

The method may further include two steps:

- a step of acquiring, from information published by an SNS (Social Networking Service), sentences associated with the user's identification information; and
- a step of calculating a predetermined interest level for each acquired sentence, using parameters weighted per SNS and per user.

In the interest list generation step, the interest level calculated for the sentence from which a word was extracted is then stored in association with the representative word that replaced that word. The interest level can thus be obtained with parameters weighted per SNS, for example according to how the user uses each SNS.

The interest level may also be calculated with parameters further weighted by any of the following:

- the user's operation on the SNS;
- whether an SNS post has an attached file;
- whether the post contains a link to an external site.

The interest level then reflects the user's operations and the presence or absence of accompanying information, based on information published by an external site, such as posts to an SNS.

The predetermined words used in the feature word extraction step may be words related to a predetermined field. The user's preferences concerning that field can then be learned.

The sentence associated with the user may be an answer to a question posed to the user. In that case, the part of speech of the word expected as the answer is defined for each question. In the decomposition step, the words whose part of speech matches the expected one are extracted as the content of the answer. This improves the accuracy of extracting the target words from the sentence.
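Filtering an answer by its expected part of speech could look like the following sketch. It assumes a morphological analyzer has already produced `(surface form, part of speech)` pairs. The English part-of-speech labels and the example answer are hypothetical placeholders for the analyzer's real tag set.

```python
# A token is (surface form, part of speech), as a morphological analyzer might emit it.
# The POS labels used here ("noun", "adjective", ...) are hypothetical placeholders.
Token = tuple[str, str]


def extract_answer(tokens: list[Token], expected_pos: set[str]) -> list[str]:
    """Keep only the words whose part of speech matches what the question expects."""
    return [surface for surface, pos in tokens if pos in expected_pos]


# Hypothetical analyzer output for the answer "I like quiet hot springs".
tokens = [("I", "pronoun"), ("like", "verb"), ("quiet", "adjective"),
          ("hot spring", "noun")]
print(extract_answer(tokens, {"noun"}))  # ['hot spring']
```

For a factoid question whose answer category expects a noun, only the noun survives. A question expecting an adjective would keep "quiet" instead.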

The elements described in the means for solving the problem can be combined wherever possible without departing from the problem and technical idea of the present invention. They can also be provided in any of these forms:

- a device such as a computer, or a system including a plurality of devices;
- a method executed by a computer;
- a program executed by a computer.

A computer-readable recording medium storing the program may also be provided.

Here, a computer-readable recording medium is a medium that stores information such as data and programs by electrical, magnetic, optical, mechanical, or chemical action and that a computer can read.

- Removable media: flexible disks, magneto-optical disks, CD-ROMs, CD-R/Ws, DVDs, DATs, 8 mm tapes, and memory cards.
- Media fixed in the computer: hard disks and ROM (Read Only Memory).

According to the present invention, data representing preferences in a unified manner can be extracted from sentences written by a plurality of users.

FIG. 1 is a diagram illustrating an example of a system configuration.
FIG. 2 is a diagram illustrating an example of an agent.
FIG. 3 is a functional block diagram illustrating an example of a preference analysis device.
FIG. 4 is a diagram illustrating an example of a document table stored in the text storage unit.
FIG. 5 is a diagram illustrating an example of an answer table stored in the text storage unit.
FIG. 6 is a diagram illustrating an example of a first parameter table stored in the parameter storage unit.
FIG. 7 is a diagram illustrating an example of a second parameter table stored in the parameter storage unit.
FIG. 8 is a diagram illustrating an example of a third parameter table stored in the parameter storage unit.
FIG. 9 is a diagram illustrating an example of a word table stored in the word storage unit.
FIG. 10 is a diagram illustrating an example of a feature word table stored in the word storage unit.
FIG. 11 is a diagram illustrating an example of data stored in the term storage unit.
FIG. 12 is a diagram illustrating an example of semantic vectors.
FIG. 13 is a diagram illustrating an example of a list held in the representative word dictionary.
FIG. 14 is a diagram illustrating an example of the association between representative words and scores.
FIG. 15 is an apparatus configuration diagram illustrating an example of a computer.
FIG. 16 is a process flowchart illustrating an example of document analysis processing.
FIG. 17 is a process flowchart illustrating an example of answer analysis processing.

Embodiments of the present invention are described below with reference to the drawings. The configurations shown in the embodiments are examples, and the present invention is not limited to them.

<System configuration>
FIG. 1 is a diagram illustrating an example of the system configuration according to the present embodiment. The system of FIG. 1 includes a user device 1, a preference analysis device 2, and an SNS (Social Networking Service) providing device 3. These are connected to one another via a network 4 such as the Internet. There may be a plurality of each component.

In the present embodiment, the user installs a software agent on his or her own computer. The agent (also called an “agent program” or simply an “agent”) extracts the user's preferences and proposes, for example, travel plans.

The user device 1 is a computer carried by the user, and it executes the agent program according to the present embodiment. The agent program asks questions to learn the user's preferences and sends the answers, given for example as sentences, to the preference analysis device 2. The user may also post sentences to an SNS or the like under his or her own account.

The preference analysis device 2 analyzes the user's answers and posts to the SNS or the like, and extracts the user's preferences. The preference analysis device 2 or another device may then propose a plan, such as a trip, to the user based on the extracted preferences.

The SNS providing device 3 is a server that provides a place where users communicate with one another, as in a so-called SNS. In the present embodiment, services such as miniblogs are also regarded as SNSs.

FIG. 2 is a diagram illustrating an example of the agent's execution screen according to the present embodiment. The agent according to the present embodiment is an application program (also simply called an “application”) that runs on the user device 1. Examples of the user device 1 include:

- mobile devices such as smartphones, tablet PCs (Personal Computers), laptop PCs, and smart watches;
- stationary computers such as desktop PCs;
- street computers such as kiosk terminals and digital signage;
- computers that provide services in a living space, such as so-called personal robots.

The agent may also reside on the user device 1, like a so-called widget. It autonomously exchanges information with the user and with software such as APIs. Specifically, the agent may acquire the following information:

- information measured by sensors of the user device 1;
- the usage status of applications, such as a camera, installed on the user device 1;
- the user's schedule, managed on the user device 1 or in the so-called cloud.

From this information, the agent recognizes the user's state and outputs information at predetermined timings. The user device 1 may also include an audio input unit such as a microphone and an audio output unit such as a speaker. The agent can then exchange information with the user by voice, using existing speech recognition and speech synthesis technologies. As shown in FIG. 2, an image of a character may be displayed on the display device of the user device 1.

<Functional configuration>
FIG. 3 is a functional block diagram illustrating an example of the preference analysis device 2 according to the present embodiment. The preference analysis device 2 of FIG. 3 includes the following:

- a document acquisition unit 201;
- an answer acquisition unit 202;
- a text storage unit 203;
- a parameter storage unit 204;
- an interest level assignment unit 205;
- a morphological analysis unit 206;
- a word storage unit 207;
- a feature word extraction unit 208;
- a feature word storage unit 209;
- a specific field dictionary 210;
- a field limiting unit 211;
- a term storage unit 212;
- a representative word dictionary 213;
- a representative word replacement unit 214;
- a representative word storage unit 215;
- a score determination unit 216;
- a preference information storage unit 217.

The document acquisition unit 201 uses the user's account information for SNSs and the like, stored in advance, to acquire sentences from the SNS providing device 3. These are the sentences on which the user has performed posting-related operations on the SNS, including posting, registering as a favorite, and sharing (reposting). In the present embodiment, an entry that the user has posted (or otherwise acted on) on an SNS is called a document.

The answer acquisition unit 202 asks the user questions through the agent on the user device 1 and receives the user's answers from the user device 1. The user may type answers freely as text. Alternatively, the user may speak the answer, and the user device 1 converts the speech into text with existing speech recognition.

The text storage unit 203 consists of a main storage device, an auxiliary storage device, or the like. It holds the sentences acquired as documents or answers, for example information such as that shown in FIG. 4 or FIG. 5.

FIG. 4 is a diagram illustrating an example of the document table stored in the text storage unit 203. The table of FIG. 4 includes the following items:

- User ID: identification information that uniquely identifies the user. The identification information the user uses on each SNS (not shown) is also assumed to be held in association with the user ID.
- Date and time: when an operation such as posting was performed.
- SNS: identification information that uniquely identifies the SNS on which the user performed the operation.
- Operation: the operation the user performed on the SNS, such as posting, registering as a favorite, or sharing (reposting).
- Text: the sentence the user posted or otherwise acted on.
- Attachment: whether a file is attached to the post.
- Interest level: the interest level calculated by the processing described later.

FIG. 5 is a diagram illustrating an example of the answer table stored in the text storage unit 203. The table of FIG. 5 includes the following items:

- Question ID: identification information for managing the question.
- Question: the content of the question the agent asks the user. Questions are used to learn the user's preferences.
- Answer category: information registered in advance about the wording expected in the answer (words, parts of speech, and so on).
- User ID: identification information that uniquely identifies the user.
- Answer: the user's answer.
- Interest level: the interest level set by the processing described later.

The questions in the present embodiment fall mainly into three types:

- questions that can be answered with yes or no (affirmation or negation);
- questions answered by choosing one of two or more options;
- so-called factoid questions, answered with a word such as a noun or an adjective.

The answer category in FIG. 5 holds information indicating this classification. That information is used when recognizing the content of the answer the user enters.

The parameter storage unit 204 consists of a main storage device, an auxiliary storage device, or the like. It stores three kinds of parameters in advance for calculating the interest level:

- First parameter: set for each user and each type of SNS. It expresses, for example as a number, how much each user values each of a plurality of SNSs. Its value can be determined from, for example, how often the user posts to the SNS or how often the user browses it.
- Second parameter: set for each operation, such as posting, registering as a favorite, and sharing (reposting).
- Third parameter: associated with additional information, such as whether the sentence contains a URI (Uniform Resource Identifier) or an attached file (such as an image file).

The interest level assignment unit 205 calculates the interest level of each sentence acquired by the document acquisition unit 201. It uses the first, second, and third parameters and a predetermined formula, and registers the interest level in association with the sentence stored in the text storage unit 203. Sentences acquired by the answer acquisition unit 202, by contrast, are given a predetermined value as their interest level. A different value may be given for each question asked to obtain the answer.

Specifically, for posts and other operations on an SNS, the interest level may be obtained by, for example, the following formula (1).

$$\text{interest level} = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 \qquad (1)$$

The symbols are defined as follows:

- $a_0$ is a predetermined coefficient.
- $a_1$ to $a_3$ are the first to third parameters described above.
- $x_1$ is a weight parameter that weights the result by SNS type. For example, it is set in advance so that SNSs rated as more important to each user yield a higher interest level.
- $x_2$ is the number of operations (actions) the user performs on the SNS, such as posting, browsing, registering as a favorite, and sharing, or a value based on that number. For example, it is a vector containing the count of each operation type.
- $x_3$ is the number of pieces of additional information, such as URIs and attached files, or a value based on that number. For example, it is a vector containing, for each type of additional information, the number of posts that include it.

$x_1$ is a parameter that depends on the type of SNS. For each user, its value is weighted according to, for example, how the user uses each SNS. FIG. 6 is a diagram illustrating an example of the first parameter table, which is stored in the parameter storage unit 204 and holds the first parameter ($x_1$). The table of FIG. 6 includes the following items:

- User ID: identification information that uniquely identifies the user.
- SNS: identification information that uniquely identifies the SNS.
- Parameter 1: a weight parameter set in advance for each user and each SNS.

For each user, a vector of the weight parameters for a plurality of SNSs, in a predetermined order, is generated and used as the first parameter ($x_1$).

$x_2$ is a parameter weighted according to the operations the user performs on the SNS. FIG. 7 is a diagram illustrating an example of the second parameter table, which is stored in the parameter storage unit 204 and holds the second parameter ($x_2$). The table of FIG. 7 includes the following items:

- User ID: identification information that uniquely identifies the user.
- SNS: identification information that uniquely identifies the SNS.
- Operation: operations the user performs on the SNS, such as posting, favoriting, sharing, following a link, displaying, and quoting.
- Parameter 2: the second parameter, weighted in advance for each operation.
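The sketch below prepares per-operation inputs: a vector of operation counts, as in the description of $x_2$ under formula (1), and the matching weights from the second parameter table. Several details are hypothetical: the operation names, their fixed order, the log format, and the default weight of 0.0 for missing rows.

```python
from collections import Counter

# Hypothetical fixed order of operation types used to lay out the vectors.
OPERATIONS = ["post", "favorite", "share"]

# Hypothetical second parameter table: (user ID, SNS, operation) -> parameter 2.
SECOND_PARAMETER_TABLE = {
    ("u001", "sns_a", "post"): 1.0,
    ("u001", "sns_a", "favorite"): 0.4,
    ("u001", "sns_a", "share"): 0.6,
}


def operation_counts(log: list[tuple[str, str, str]], user_id: str, sns: str) -> list[int]:
    """Count each operation type one user performed on one SNS, in OPERATIONS order."""
    counts = Counter(op for uid, s, op in log if uid == user_id and s == sns)
    return [counts[op] for op in OPERATIONS]  # Counter returns 0 for unseen keys


def second_parameter_vector(user_id: str, sns: str) -> list[float]:
    """Look up the per-operation weights, defaulting to 0.0 when no row exists."""
    return [SECOND_PARAMETER_TABLE.get((user_id, sns, op), 0.0) for op in OPERATIONS]


log = [("u001", "sns_a", "post"), ("u001", "sns_a", "share"),
       ("u001", "sns_a", "post"), ("u002", "sns_a", "post")]
print(operation_counts(log, "u001", "sns_a"))   # [2, 0, 1]
print(second_parameter_vector("u001", "sns_a"))  # [1.0, 0.4, 0.6]
```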

$x_3$ is a parameter weighted according to elements that accompany the user's operation. FIG. 8 is a diagram illustrating an example of the third parameter table, which is stored in the parameter storage unit 204 and holds the third parameter ($x_3$). The table of FIG. 8 includes the following items:

- User ID: identification information that uniquely identifies the user.
- SNS: identification information that uniquely identifies the SNS.
- Additional information: conditions accompanying the operation, such as the post containing a URI or having an attached file.
- Parameter 3: the third parameter, weighted in advance for each kind of additional information.
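Counting posts that carry each kind of additional information, as in the vector description of $x_3$ under formula (1), could be sketched as follows. The post format is a hypothetical dictionary with `text` and `attachment` keys. The simplified `https?://` pattern is an assumption, not a full URI grammar.

```python
import re

# Simplified URI check (assumption): only http/https links are detected.
URI_PATTERN = re.compile(r"https?://\S+")
# Hypothetical fixed order of additional-information types.
ADDITIONAL_INFO = ["uri", "attachment"]


def additional_info_counts(posts: list[dict]) -> list[int]:
    """Count the posts containing each type of additional information, in ADDITIONAL_INFO order."""
    counts = {kind: 0 for kind in ADDITIONAL_INFO}
    for post in posts:
        if URI_PATTERN.search(post["text"]):
            counts["uri"] += 1
        if post.get("attachment", False):
            counts["attachment"] += 1
    return [counts[kind] for kind in ADDITIONAL_INFO]


posts = [
    {"text": "Great view https://example.com/photo", "attachment": True},
    {"text": "Heading to the coast", "attachment": True},
    {"text": "No links here", "attachment": False},
]
print(additional_info_counts(posts))  # [1, 2]
```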

In the present embodiment, the interest level described above is calculated from the posts and other operations the user performs on a given SNS. It represents how interested the user is in the predetermined words contained in the sentence.

The morphological analysis unit 206 of FIG. 3 reads the sentences associated with the user from the text storage unit 203, performs morphological analysis, and decomposes them into words. It extracts common nouns and proper nouns using a morphological analysis dictionary (not shown) or, for example, the headword list of an online dictionary service that an unspecified number of users can edit. It also extracts named entities, such as proper nouns not registered in the dictionary and date/time expressions. As described later, words not registered in the dictionary (also called “unknown words”) can also be extracted as named entities by analyzing the context.

The word storage unit 207 consists of a main storage device, an auxiliary storage device, or the like, and holds the words produced by the morphological analysis unit 206. The interest level assigned to a sentence before decomposition is stored in association with each word obtained from that sentence.

FIG. 9 is a diagram illustrating an example of the word table stored in the word storage unit 207. The table of FIG. 9 is almost the same as that of FIG. 4, but has a word item instead of the text item. The word field holds the words contained in the sentences above, as obtained by morphological analysis.

The feature word extraction unit 208 of FIG. 3 identifies named entities using a so-called dependency parser. It removes duplicate words within a sentence and extracts the identified named entities as feature words.

The dependency parser can determine whether a word is a named entity from the words in the sentence and their parts of speech (that is, from the context). It can also classify what each named entity refers to into predetermined categories, such as person, place, and action. Various existing methods can be used for such a dependency parser.

The feature word storage unit 209 consists of a main storage device, an auxiliary storage device, or the like. It stores the extracted feature words in association with the interest level described above.
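The deduplication and category-keeping part of feature word extraction can be sketched on top of the parser's output. The sketch assumes the parser has already labeled each token with an entity category, or `None` for non-entities. The labels and example tokens are hypothetical, and the parser itself is not shown.

```python
from typing import Optional


def extract_feature_words(tokens: list[tuple[str, Optional[str]]], interest: float) -> list[dict]:
    """Keep each named entity once per sentence, with its category and the sentence's interest."""
    seen = set()
    features = []
    for word, category in tokens:
        if category is None or word in seen:  # skip non-entities and repeated words
            continue
        seen.add(word)
        features.append({"feature_word": word, "category": category, "interest": interest})
    return features


# Hypothetical parser output: (word, entity category or None).
tokens = [("Kyoto", "place"), ("temple", None), ("Kyoto", "place"), ("hiking", "action")]
print(extract_feature_words(tokens, 1.5))
```

The resulting rows have the same shape as the feature word table in FIG. 10: feature word, category, and interest level.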

FIG. 10 is a diagram illustrating an example of the feature word table stored in the feature word storage unit 209. The table of FIG. 10 includes the items feature word, category, and interest level. The feature word field holds the extracted feature words. The category field holds the category of what the feature word refers to, such as person, place, or action, as classified by the dependency parser.

The specific field dictionary 210 is, for example, a list of the words classified into a specific field in an online dictionary service that an unspecified number of users can edit. It may be held by another device connected via the network 4 rather than by the preference analysis device 2. In the present embodiment, a dictionary (not shown) is assumed to be prepared in advance. It contains terms categorized under the target topic, for example tourism, and related fields.

The field limiting unit 211 uses the specific field dictionary 210 to extract terms related to a predetermined field from the feature words stored in the feature word storage unit 209. The term storage unit 212 consists of a main storage device, an auxiliary storage device, or the like, and stores the terms the field limiting unit 211 extracts. For example, words not registered in the specific field dictionary 210 are deleted from the feature word list shown in FIG. 10. The result is a list of tourism-related terms, such as that shown in FIG. 11, which is stored in the term storage unit 212.
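Field limitation reduces to a membership filter against the field dictionary, as in this sketch. The tourism term set and the feature rows are hypothetical examples.

```python
def limit_to_field(feature_rows: list[dict], field_dictionary: set[str]) -> list[dict]:
    """Drop feature words that the specific field dictionary does not contain."""
    return [row for row in feature_rows if row["feature_word"] in field_dictionary]


tourism_terms = {"Kyoto", "hiking", "camping"}  # hypothetical tourism-field dictionary
rows = [{"feature_word": "Kyoto", "interest": 1.5},
        {"feature_word": "smartphone", "interest": 0.9}]
print(limit_to_field(rows, tourism_terms))
```

A `set` keeps each lookup constant-time even when the field dictionary is large.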

The representative word dictionary 213 consists of a main storage device, an auxiliary storage device, or the like, and holds in advance the correspondence between each feature word and a representative word whose meaning is similar to that feature word. The representative words are associated with the feature words and stored beforehand, at a predetermined timing.

Specifically, feature words that users have used in sentences in a predetermined field are sampled, and space vectors representing co-occurrence relationships (also called "semantic vectors" or "distributed representations") are generated. For example, sentences that mention the target field, tourism, are collected as samples without restricting which users wrote them, and a corpus is built. From the corpus, semantic vectors reflecting the semantic features of words are then generated using a technique such as Word2Vec. In the present embodiment, it is assumed that semantic vectors such as those shown in FIG. 12, including the terms described above, have been generated in advance. The table in FIG. 12 has the fields word and semantic vector, and a semantic vector is generated and stored for each word obtained by decomposing the sentences in the corpus. The semantic vectors are then clustered so that feature words estimated to have similar meanings are grouped together. Clustering can be performed with an existing technique such as the k-means method. Then, for example, among the feature words in the same cluster, the word whose semantic vector is closest to the centroid of that cluster is chosen as the representative word, and a list associating the representative words with the feature words is held in the representative word dictionary 213. In the present embodiment, it is assumed that a list such as that shown in FIG. 13 is registered. In the example of FIG. 13, the word "camp" is associated with the representative word "outdoor".
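The dictionary-building procedure above (cluster the semantic vectors, then take the member nearest each centroid as the representative word) can be sketched as follows. This is a minimal illustration under stated assumptions: the vectors are taken as already produced (for example by Word2Vec), the 2-D values are hypothetical toy data, and clustering is a plain Lloyd's k-means initialised from the first k vectors rather than any particular library's implementation.

```python
from __future__ import annotations


def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: list[list[float]]) -> list[float]:
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def kmeans(vectors: list[list[float]], k: int,
           iterations: int = 20) -> tuple[list[int], list[list[float]]]:
    """Plain Lloyd's k-means; centroids start at the first k vectors (a simplification)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


def build_representative_dictionary(semantic_vectors: dict[str, list[float]],
                                    k: int) -> dict[str, str]:
    """Map every word to the member of its cluster closest to the cluster centroid."""
    words = list(semantic_vectors)
    labels, centroids = kmeans([semantic_vectors[w] for w in words], k)
    dictionary: dict[str, str] = {}
    for c in range(k):
        members = [w for w, label in zip(words, labels) if label == c]
        if not members:
            continue
        representative = min(
            members, key=lambda w: squared_distance(semantic_vectors[w], centroids[c]))
        for w in members:
            dictionary[w] = representative
    return dictionary


# Hypothetical 2-D "semantic vectors"; real Word2Vec vectors have far more dimensions.
SEMANTIC_VECTORS = {
    "outdoor": [1.0, 0.0],
    "theme park": [0.0, 1.0],
    "camp": [0.9, 0.2],
    "hiking": [1.1, -0.2],
    "amusement park": [0.2, 0.9],
    "water park": [-0.2, 1.1],
}
representatives = build_representative_dictionary(SEMANTIC_VECTORS, k=2)
print(representatives["camp"], representatives["amusement park"])  # outdoor theme park
```

Production k-means implementations usually use smarter initialisation (such as k-means++) and several restarts; the deterministic start here only keeps the toy example reproducible.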

The representative word replacement unit 214 uses the information held in the representative word dictionary 213 to replace the feature words stored in the feature word storage unit 209 with representative words. Because feature words with similar meanings are replaced with a common representative word, inconsistencies in notation or terminology can be absorbed even when users chose different words (feature words) in their sentences. The representative word storage unit 215 consists of a main storage device, an auxiliary storage device, or the like, and stores the representative words produced by the representative word replacement unit 214 in association with the interest levels described above.
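Replacement itself is a dictionary lookup that carries each term's interest level along. In this sketch the dictionary entries are hypothetical, modeled on the "camp" and "○× Amusement Park" examples in the text, and keeping unregistered terms unchanged is an assumption, since the text does not say how such terms are handled.

```python
from __future__ import annotations


def replace_with_representatives(
        terms: list[tuple[str, float]],
        representative_dictionary: dict[str, str]) -> list[tuple[str, float]]:
    """Swap each term for its representative word, keeping its interest level.

    Assumption: a term with no dictionary entry is kept as-is; the text does
    not say how unregistered terms are handled.
    """
    return [(representative_dictionary.get(term, term), interest)
            for term, interest in terms]


# Hypothetical dictionary entries modeled on the examples in the text.
REPRESENTATIVE_DICTIONARY = {"camp": "outdoor", "○× Amusement Park": "theme park"}
terms = [("camp", 0.8), ("outdoor", 0.5), ("hot spring", 0.6)]
print(replace_with_representatives(terms, REPRESENTATIVE_DICTIONARY))
# [('outdoor', 0.8), ('outdoor', 0.5), ('hot spring', 0.6)]
```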

The score determination unit 216 generates the user's preference information from the representative words and interest levels stored in the representative word storage unit 215, and stores it in the preference information storage unit 217 in association with the representative words. The preference information is represented as a set of pairs, each consisting of a representative word and a score generated from the interest level. For example, when a plurality of feature words are replaced with the same representative word, the duplicate representative words are removed, and the highest of the interest levels associated with the original feature words becomes the score of that representative word. In the present embodiment, the correspondence between representative words and scores shown in FIG. 14 is stored. Here, the term "camp" in FIG. 11 is replaced with the representative word "outdoor", and the duplicated "outdoor" records are reduced to the one with the highest interest level (in this embodiment, both records have the same interest level).
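The scoring rule described above (remove duplicate representative words and keep the highest interest level) reduces to a per-key maximum. The function name and input values below are hypothetical.

```python
from __future__ import annotations


def determine_scores(replaced: list[tuple[str, float]]) -> dict[str, float]:
    """Collapse duplicate representative words, keeping the highest interest level as the score."""
    scores: dict[str, float] = {}
    for word, interest in replaced:
        if word not in scores or interest > scores[word]:
            scores[word] = interest
    return scores


# Hypothetical input: output of the replacement step for one user.
replaced = [("outdoor", 0.8), ("outdoor", 0.5), ("hot spring", 0.6)]
print(determine_scores(replaced))  # {'outdoor': 0.8, 'hot spring': 0.6}
```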

With the set of representative word and score pairs generated in this way (that is, the preference information), information matching a user's preferences can be provided to that user. For example, users whose preference information shows a predetermined tendency may be extracted by so-called content-based filtering and provided with some information. Alternatively, by so-called collaborative filtering, users whose preference information shows a similar tendency may be provided with information in which other users have shown interest.
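One simple way to use the preference information for collaborative filtering is to compare users' score profiles by cosine similarity and suggest what the nearest neighbour likes. The text does not specify a similarity measure or recommendation rule; cosine similarity, the single-nearest-neighbour rule, and the user data below are all assumptions for illustration.

```python
from __future__ import annotations

import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse {representative word: score} profiles."""
    dot = sum(score * b[word] for word, score in a.items() if word in b)
    norm_a = math.sqrt(sum(s * s for s in a.values()))
    norm_b = math.sqrt(sum(s * s for s in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(target: str, preferences: dict[str, dict[str, float]]) -> list[str]:
    """Suggest words the most similar other user scored that the target has not."""
    others = [user for user in preferences if user != target]
    if not others:
        return []
    nearest = max(others, key=lambda user: cosine_similarity(preferences[target],
                                                           preferences[user]))
    return sorted(w for w in preferences[nearest] if w not in preferences[target])


# Hypothetical preference information for three users.
PREFERENCES = {
    "user_a": {"outdoor": 0.8, "hot spring": 0.6},
    "user_b": {"outdoor": 0.7, "hot spring": 0.5, "theme park": 0.9},
    "user_c": {"museum": 1.0},
}
print(recommend("user_a", PREFERENCES))  # ['theme park']
```

Because all users' profiles are keyed by the same representative words, overlapping interests that were originally written with different words still contribute to the dot product.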

<Device configuration>
The user device 1, the preference analysis device 2, and the SNS providing device 3 are computers such as the one shown in FIG. 15. FIG. 15 is a device configuration diagram illustrating an example of a computer. The computer includes, for example, a CPU (Central Processing Unit) 1001, a main storage device 1002, an auxiliary storage device 1003, a communication IF (Interface) 1004, an input/output IF (Interface) 1005, a drive device 1006, and a communication bus 1007. The CPU 1001 performs the processing described in this embodiment by executing a program. The main storage device 1002 caches programs and data read by the CPU 1001 and provides a work area for the CPU. Specifically, the main storage device is a RAM (Random Access Memory), a ROM (Read Only Memory), or the like. The auxiliary storage device 1003 stores the programs executed by the CPU 1001, setting information used in this embodiment, and the like. Specifically, the auxiliary storage device 1003 is an HDD (Hard Disk Drive), an SSD (Solid State Drive), flash memory, or the like. The main storage device 1002 and the auxiliary storage device 1003 function as the preference information storage unit 103, the tourism information storage unit 107, the schedule information storage unit 108, and the like. The communication IF 1004 transmits data to and receives data from other computers. Specifically, the communication IF 1004 is a wired or wireless network card or the like. The input/output IF 1005 is connected to input/output devices, accepts input from the user, and outputs information to the user. Specifically, the input/output devices are an imaging device such as a camera, a keyboard, a mouse, a display, a touch panel, or sensors such as a GPS receiver, a magnetic sensor, or an acceleration sensor. The drive device 1006 reads data recorded on storage media such as a flexible disk, a CD (Compact Disc), a DVD (Digital Versatile Disc), or a BD (Blu-ray (registered trademark) Disc), and writes data to such media. These components are connected by the communication bus 1007. A plurality of any of these components may be provided, and some components (for example, the drive device 1006) may be omitted. The input/output devices may also be integrated with the computer. The program executed in this embodiment may be provided via a portable storage medium readable by the drive device, an auxiliary storage device such as USB memory, a network IF, or the like. When the CPU 1001 executes the program, the computer described above operates as the preference analysis device 2. The functions described above may also be provided by a plurality of devices, each handling part of the above configuration.

<Document analysis processing>
FIG. 16 is a process flow diagram showing an example of the document analysis processing. The document acquisition unit 201 of the preference analysis device 2 monitors the operations the user performs on the SNS; whenever a post or similar action is made from the account the user uses on the SNS, it acquires the posted sentence and stores it in the sentence storage unit 203. The interest level assignment unit 205 of the preference analysis device 2 then calculates an interest level based on the type of SNS on which the sentence was posted, the operation the user performed on the SNS, the presence or absence of predetermined elements accompanying the operation, and so on, and stores it in association with the sentence (FIG. 16: S1). In this step, the interest level is obtained, for example, by equation (1) described above, and its value is registered in data such as that shown in FIG. 4. Next, the morphological analysis unit 206 performs morphological analysis on the sentences stored in the sentence storage unit 203 and decomposes each sentence into its constituent words (S2). This step generates data such as that shown in FIG. 9. The feature word extraction unit 208 then extracts, from these words, feature words that are common nouns or proper nouns, and the field limiting unit 211 uses the specific field dictionary, which holds in advance the terms of the field targeted in this embodiment, to extract the feature words belonging to the predetermined field and stores them in the term storage unit 212 (S3). As explained for the field limiting unit 211, this step narrows the words down to, for example, only the terms registered in a tourism dictionary, producing term and interest level pairs such as those shown in FIG. 11. The processing of the feature word extraction unit 208 may be omitted, in which case the field limiting unit 211 narrows the words in the word storage unit 207 directly down to terms of the specific field.

The representative word replacement unit 214 then replaces the terms stored in the term storage unit 212 with representative words, based on the pairs of representative words and terms stored in advance in the representative word dictionary 213 (S4). The score determination unit 216 generates the user's preference information, represented as a set of representative word and score pairs, from the pairs of representative words and interest levels, and stores it in the preference information storage unit 217 (S5). In other words, the preference information is data that lists the objects of the user's interest and assigns each a score indicating the degree of interest.
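The feature word extraction in S3 can be illustrated on already-analyzed tokens. This sketch assumes that morphological analysis (S2) has been done by an external Japanese analyzer that yields (surface, part of speech) pairs; the tag names "noun-common" and "noun-proper" are hypothetical stand-ins for that analyzer's own tag set, and the interest level from S1 is passed in as a number because equation (1) is not reproduced here. Duplicates within one post are dropped, as the modification section describes.

```python
from __future__ import annotations

# Hypothetical POS labels; a real Japanese morphological analyzer has its own tag set.
FEATURE_POS = {"noun-common", "noun-proper"}


def extract_feature_words(tokens: list[tuple[str, str]],
                          interest: float) -> list[tuple[str, float]]:
    """Return (word, interest level) for each distinct common or proper noun in one post."""
    seen: set[str] = set()
    features = []
    for surface, pos in tokens:
        if pos in FEATURE_POS and surface not in seen:
            seen.add(surface)  # duplicates within one post are dropped
            features.append((surface, interest))
    return features


# Hypothetical analyzer output for one post whose interest level (S1) is 0.7.
tokens = [("camp", "noun-common"), ("went", "verb"),
          ("camp", "noun-common"), ("Mt. Fuji", "noun-proper")]
print(extract_feature_words(tokens, 0.7))  # [('camp', 0.7), ('Mt. Fuji', 0.7)]
```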

Replacing terms with representative words in S4 absorbs inconsistencies in the notation and terminology of sentences that users wrote freely, so that preference information can be generated with representative words that can be used uniformly. In addition, as explained for the representative word dictionary 213, taking, for example, the word closest to the centroid of a cluster of semantic vectors as the representative word for the terms in that cluster makes the replacement semantically appropriate. This processing therefore makes it possible to extract data that represents preferences in a uniform way from sentences written by a plurality of users.

<Answer analysis processing>
FIG. 17 is a process flow diagram showing an example of the answer analysis processing. The answer acquisition unit 202 of the preference analysis device 2 causes the user device 1 to output a question to the user via the network 4 (FIG. 17: S11). In this step, for example, the question "Of the tourist spots you have visited, which is your favorite?" shown as Q1 in FIG. 5 is output. The answer acquisition unit 202 then acquires the user's answer from the user device 1 via the network 4 and stores it in the sentence storage unit 203 (S12). In this step, it is assumed that the user answered "It's ○× Amusement Park." Questions and answers may be input and output as text or speech, or by presenting options and having the user select one. For example, the information is collected through an agent such as the one shown in FIG. 2. Next, the morphological analysis unit 206 performs morphological analysis on the sentence stored in the sentence storage unit 203 and decomposes it into its constituent words (S13). This step is the same as in the document analysis processing; for example, the answer above is decomposed into "○× Amusement Park" and the copula "desu" ("it is"). The feature word extraction unit 208 then extracts, from these words, feature words that are common nouns or proper nouns, and the field limiting unit 211 uses the specific field dictionary, which holds in advance the terms of the field targeted in this embodiment, to extract the feature words belonging to the predetermined field and stores them in the term storage unit 212 (S14). As explained for the field limiting unit 211, this step narrows the words down to, for example, only the terms registered in a tourism dictionary; here, "○× Amusement Park" is extracted from the words generated in S13. In the answer analysis processing, a predetermined interest level is associated with each extracted word, and a separate interest level may be set for each question. The pairs of terms and interest levels are then stored in the term storage unit 212. In the answer analysis processing as well, the processing of the feature word extraction unit 208 may be omitted, in which case the field limiting unit 211 narrows the words in the word storage unit 207 directly down to terms of the specific field.

The representative word replacement unit 214 then replaces the terms stored in the term storage unit 212 with representative words, based on the pairs of representative words and terms stored in advance in the representative word dictionary 213 (S15). In the present embodiment, it is assumed that the representative word "theme park" is registered in the representative word dictionary 213 in association with the term "○× Amusement Park", so that the term is replaced accordingly. The score determination unit 216 generates the user's preference information, represented as a set of representative word and score pairs, from the pairs of representative words and interest levels, and stores it in the preference information storage unit 217 (S16). Alternatively, no interest level may be set in S14, and a predetermined score may be assigned for the first time in S16. In the answer analysis processing as well, the preference information is, in other words, data that lists the objects of the user's interest and assigns each a score indicating the degree of interest.
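Attaching a predetermined interest level to the terms extracted from an answer, optionally set per question, might look like the sketch below. The default value, the per-question table, and the question ID "Q1" used as a key are hypothetical; the text only says that a predetermined value is used and that it may vary by question.

```python
from __future__ import annotations

DEFAULT_INTEREST = 1.0            # hypothetical predetermined interest level
QUESTION_INTEREST = {"Q1": 0.9}   # hypothetical per-question overrides


def answer_terms_with_interest(question_id: str,
                               terms: list[str]) -> list[tuple[str, float]]:
    """Attach the question's interest level (or the default) to each extracted term."""
    interest = QUESTION_INTEREST.get(question_id, DEFAULT_INTEREST)
    return [(term, interest) for term in terms]


print(answer_terms_with_interest("Q1", ["○× Amusement Park"]))
# [('○× Amusement Park', 0.9)]
```

The resulting pairs have the same shape as those produced by the document analysis processing, so the same replacement and scoring steps can be reused for S15 and S16.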

In the answer analysis processing as well, replacing terms with representative words in S15 absorbs inconsistencies in the notation and terminology of the answers users wrote, so that preference information can be generated with representative words that can be used uniformly. In addition, by asking questions for which the part of speech of the expected answer can be anticipated in advance, the answer analysis processing can improve the accuracy with which words related to the user's preferences are extracted from sentences.
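The idea of anticipating the answer's part of speech can be sketched as a per-question filter over analyzed tokens. The mapping from question to expected parts of speech, the tag names, and the token list (a hypothetical analysis of "It's ○× Amusement Park.") are assumptions for illustration; a real system would use its analyzer's tag set.

```python
from __future__ import annotations

# Hypothetical: the part(s) of speech expected in the answer to each question.
EXPECTED_POS = {"Q1": {"noun-proper", "noun-common"}}


def extract_answer_content(question_id: str,
                           tokens: list[tuple[str, str]]) -> list[str]:
    """Keep only the words whose part of speech matches what the question expects."""
    expected = EXPECTED_POS.get(question_id, set())
    return [surface for surface, pos in tokens if pos in expected]


# Hypothetical analyzer output for the answer "It's ○× Amusement Park."
tokens = [("○× Amusement Park", "noun-proper"), ("desu", "auxiliary-verb")]
print(extract_answer_content("Q1", tokens))  # ['○× Amusement Park']
```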

<Modification>
In the embodiment described above, even when the same word appears more than once in a single post or answer, the feature word extraction unit 208 removes the duplicates. Likewise, when the same word appears in different posts or answers, the score determination unit 216 adopts the highest interest level. The invention is not limited to this approach, however; when the same word appears multiple times, a formula that, for example, raises the interest level may be adopted instead.
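The text does not give a formula for raising the interest level of a repeated word. As one hypothetical example, assuming interest levels lie in [0, 1], a noisy-OR combination grows with every occurrence while never dropping below the single highest level or exceeding 1.

```python
from __future__ import annotations

from math import prod


def combined_interest(levels: list[float]) -> float:
    """Hypothetical combination that grows with each repeated occurrence.

    Assumes interest levels lie in [0, 1] and combines them as a noisy-OR,
    1 - prod(1 - x); the result never falls below the largest single level
    and never exceeds 1.
    """
    return 1.0 - prod(1.0 - x for x in levels)


print(max([0.5, 0.5]), combined_interest([0.5, 0.5]))  # 0.5 0.75
```

Compared with taking the maximum, this rewards a word that recurs across posts, which is the behaviour the modification describes.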

As described above, a question in the embodiment may be one that can be answered affirmatively or negatively, or one answered by choosing one of two or more options. In this case, the user's preferences may be represented, for example, by combinations of questions and answers. Specifically, the user's preference information can be generated, for example, in the form of a feature vector whose elements are the answers to the questions.
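A feature vector built from closed-form answers could, for instance, one-hot encode each answer in a fixed question order. The questionnaire below is hypothetical, and one-hot encoding is one possible choice; a yes/no question could equally be a single 0/1 element.

```python
from __future__ import annotations

# Hypothetical questionnaire: each question maps to its list of allowed answers.
QUESTIONS = {
    "likes_hot_springs": ["yes", "no"],
    "preferred_season": ["spring", "summer", "autumn", "winter"],
}


def answers_to_vector(answers: dict[str, str]) -> list[int]:
    """One-hot encode each answer in a fixed question order; unanswered questions give zeros."""
    vector: list[int] = []
    for question, options in QUESTIONS.items():
        answer = answers.get(question)
        vector.extend(1 if option == answer else 0 for option in options)
    return vector


print(answers_to_vector({"likes_hot_springs": "yes", "preferred_season": "autumn"}))
# [1, 0, 0, 0, 1, 0]
```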

In the embodiment, the information stored in tables has been illustrated as ordinary tables of rows and columns, but the table design and format are not particularly limited. For example, the tables can be appropriately normalized in consideration of software performance and the like. The database is not limited to an RDBMS (Relational Database Management System); a key-value or other management system of the kind known as NoSQL may also be adopted.

Reference signs list
1 User device
2 Preference analysis device
201 Document acquisition unit
202 Answer acquisition unit
203 Sentence storage unit
204 Parameter storage unit
205 Interest level assignment unit
206 Morphological analysis unit
207 Word storage unit
208 Feature word extraction unit
209 Feature word storage unit
210 Specific field dictionary
211 Field limiting unit
212 Term storage unit
213 Representative word dictionary
214 Representative word replacement unit
215 Representative word storage unit
216 Score determination unit
217 Preference information storage unit
3 SNS providing device
4 Network

Claims

A preference learning method in which a computer executes:
a decomposition step of reading a sentence associated with a user and decomposing the sentence into words;
a feature word extraction step of extracting predetermined words from the sentence decomposed into words; and
an interest list generation step of replacing the words extracted in the feature word extraction step with representative words, using a table that stores, for clusters obtained by clustering vectors that represent features of words and were generated from a plurality of sentences, the correspondence between the representative word of each cluster, which is a word near the centroid of that cluster, and the words included in that cluster.

The preference learning method according to claim 1, further including:
a step of acquiring the sentence associated with the user from information published on an SNS (Social Networking Service); and
a step of calculating, for the acquired sentence, a predetermined interest level using parameters weighted for each SNS and for each user,
wherein, in the interest list generation step, the interest level calculated for the sentence from which a word was extracted is stored in association with the representative word that replaced that word.

The preference learning method according to claim 2, wherein the interest level is calculated using parameters further weighted according to the user's operation on the SNS, the presence or absence of a file attached on the SNS, or the presence or absence of a link to an external site.

The preference learning method according to any one of claims 1 to 3, wherein the predetermined word used in the feature word extraction step is a word related to a predetermined field.

The preference learning method according to claim 1, wherein the sentence associated with the user is an answer to a question posed to the user,
the part of speech of the word expected as an answer to the question is defined, and
in the decomposition step, a word of the part of speech expected as the answer is extracted as the content of the answer.

A preference learning program that causes a computer to execute:
a decomposition step of reading a sentence associated with a user and decomposing the sentence into words;
a feature word extraction step of extracting predetermined words from the sentence decomposed into words; and
an interest list generation step of replacing the words extracted in the feature word extraction step with representative words, using a table that stores, for clusters obtained by clustering vectors that represent features of words and were generated from a plurality of sentences, the correspondence between the representative word of each cluster, which is a word near the centroid of that cluster, and the words included in that cluster.

A preference learning device comprising:
a decomposition unit that reads a sentence associated with a user and decomposes the sentence into words;
a feature word extraction unit that extracts predetermined words from the sentence decomposed into words; and
an interest list generation unit that replaces the words extracted by the feature word extraction unit with representative words, using a table that stores, for clusters obtained by clustering vectors that represent features of words and were generated from a plurality of sentences, the correspondence between the representative word of each cluster, which is a word near the centroid of that cluster, and the words included in that cluster.