JP5068304B2

JP5068304B2 - Extraction apparatus, method and program

Info

Publication number: JP5068304B2
Application number: JP2009298413A
Authority: JP
Inventors: 充佐藤; 秀人湯澤
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2009-12-28
Filing date: 2009-12-28
Publication date: 2012-11-07
Anticipated expiration: 2029-12-28
Also published as: JP2011138347A

Description

The present invention relates to an extraction apparatus, method, and program for extracting queries used to search for content.

Conventionally, content search systems on the Internet select content by matching it against the keywords contained in a query entered at a terminal.

In this case, the content presented as search results depends on the keywords that the user of the search system specified as the query. That is, whether the user can reach the desired content depends on whether an effective query is entered.

Against this background, a technique has been proposed in which the search system automatically recommends additional keywords from a database accumulated in advance, thereby narrowing down the search results (see, for example, Patent Document 1).

Patent Document 1: JP 2009-237912 A

In recent years, services have been provided that publish question/answer type content, in which one user posts a question and other users post answers. Such content asks and answers about, for example, the "difference", "price", or "meaning" of a certain subject (the direct object). To search for this content, it is therefore effective to specify, together with the direct object, an abstract noun (the indirect object) expressing a property, intent, type, or the like in the same query.

However, classifying direct objects and indirect objects and recommending combinations of them to users has required the dictionaries and recommendation algorithms involved to be prepared largely through manual maintenance by an administrator.

An object of the present invention is to provide an extraction apparatus, method, and program capable of automatically extracting keyword candidates for efficiently searching question/answer type content.

The present invention provides the following solutions.

(1) An extraction apparatus for extracting queries used to search for question/answer type content, comprising: query storage means for storing a history of the queries used when any of the content was viewed from a terminal connectable to the extraction apparatus; extraction means for extracting, from each of the queries stored in the query storage means, the keywords constituting that query; determination means for determining the category to which each of one or more keywords belongs, the one or more keywords co-occurring with a keyword extracted by the extraction means in the same query at or above a predetermined frequency; classification means for classifying the keyword extracted by the extraction means as a first keyword when the variation in the categories determined by the determination means is less than a predetermined value, and as a second keyword when the variation in the categories is equal to or greater than the predetermined value; and keyword storage means for storing the first keywords and second keywords classified by the classification means.

With this configuration, the extraction apparatus stores a history of the queries used when any of the question/answer type content was viewed, and extracts from each stored query the keywords constituting it. The extraction apparatus then determines the category to which each of the one or more keywords co-occurring with an extracted keyword in the same query at or above a predetermined frequency belongs. When the variation in the categories is less than a predetermined value, the extracted keyword is classified and stored as a first keyword; when the variation is equal to or greater than the predetermined value, it is classified and stored as a second keyword.

The extraction apparatus can therefore classify an extracted target keyword automatically, using the degree of variation in the categories of its co-occurring keywords as the criterion. That is, the apparatus exploits the fact that this variation is small (less than the predetermined value) for a first keyword (direct object) and large (equal to or greater than the predetermined value) for a second keyword (indirect object), and so can automatically distinguish direct objects from indirect objects. As a result, the extraction apparatus can automatically extract, from the keywords that were used when question/answer type content was successfully viewed, keyword candidates for searching that content efficiently.

(2) The extraction apparatus according to (1), wherein, when the keywords co-occurring in the same query are, as a result of classification by the classification means, a combination of a first keyword and a second keyword, the keyword storage means stores the combination in association.

With this configuration, the extraction apparatus stores a first keyword (direct object) and a second keyword (indirect object) as a co-occurring combination, so a query with a proven record of leading to viewed content can be used as a recommended query. As a result, the extraction apparatus can improve the accuracy of recommended queries.

(3) The extraction apparatus according to (2), further comprising aggregation means for counting the number of times any of the content was viewed using a query containing the combination, wherein the keyword storage means stores the number of times counted by the aggregation means in association with the combination.

With this configuration, the extraction apparatus counts the number of times any of the question/answer type content was viewed as a result of a search based on a combination of a first keyword (direct object) and a second keyword (indirect object). By storing this count in association with the combination, the extraction apparatus can easily select queries that readily lead to content (that is, queries with high counts). As a result, the extraction apparatus can give priority to recommending queries that readily lead to content.

(4) The extraction apparatus according to (2) or (3), wherein the query storage means stores the query used when any of the content was viewed in association with the URL of that content, the extraction apparatus further comprises selection means for referring to the query storage means and selecting, from the content viewed using queries containing the combination, content with a relatively high viewing frequency, and the keyword storage means stores the URL of the content selected by the selection means in association with the combination.

With this configuration, the extraction apparatus stores, as a search history, the query used when any of the question/answer type content was viewed in association with the URL of the viewed content. The extraction apparatus also selects, from the viewed content, content with a relatively high viewing frequency. By storing the URL of the selected content in association with the combination of a first keyword (direct object) and a second keyword (indirect object), the extraction apparatus can efficiently provide useful, frequently viewed content when that combination is used as a query.

(5) The extraction apparatus according to any one of (1) to (4), wherein, when a plurality of first keywords co-occur in the same query as a result of classification by the classification means, the keyword storage means stores the plurality of first keywords in association with one another.

With this configuration, the extraction apparatus stores a plurality of first keywords (direct objects) as a co-occurring combination, and can therefore efficiently recommend direct objects that are meaningful to combine, or that readily lead to the desired content when combined.

(6) A method by which an extraction apparatus extracts queries used to search for question/answer type content, comprising: a query storage step of storing a history of the queries used when any of the content was viewed from a terminal connectable to the extraction apparatus; an extraction step of extracting, from each of the queries stored in the query storage step, the keywords constituting that query; a determination step of determining the category to which each of one or more keywords belongs, the one or more keywords co-occurring with a keyword extracted in the extraction step in the same query at or above a predetermined frequency; a classification step of classifying the keyword extracted in the extraction step as a first keyword when the variation in the categories determined in the determination step is less than a predetermined value, and as a second keyword when the variation in the categories is equal to or greater than the predetermined value; and a keyword storage step of storing the first keywords and second keywords classified in the classification step.

With this configuration, the same effect as in (1) can be expected when the extraction apparatus executes the method.

(7) A program that causes the extraction apparatus to execute the method according to (6).

With this configuration, the same effect as in (1) can be expected by causing the extraction apparatus to execute the program.

According to the present invention, keyword candidates for efficiently searching question/answer type content can be extracted automatically.

FIG. 1 is a diagram showing the functional configuration of the extraction server according to the first embodiment.
FIG. 2 is a diagram showing an example of the query log table stored in the query log DB according to the first embodiment.
FIG. 3 is a diagram showing an example of the direct object candidate table and the indirect object candidate table according to the first embodiment.
FIG. 4 is a flowchart showing the processing in the control unit according to the first embodiment.
FIG. 5 is a diagram showing the functional configuration of the extraction server according to the second embodiment.
FIG. 6 is a diagram showing an example of the keyword candidate table according to the second embodiment.
FIG. 7 is a flowchart showing the processing in the control unit according to the second embodiment.

<First Embodiment>
Hereinafter, a first embodiment, which is an example of an embodiment of the present invention, will be described with reference to the drawings.

[Functional configuration]
FIG. 1 is a diagram showing the functional configuration of the extraction server 1 (extraction apparatus) according to the present embodiment. The extraction server 1 is an apparatus used mainly by an administrator who manages a system for searching Web pages, in particular question/answer type content. That is, the extraction server 1 extracts candidate search keywords for constructing queries, in order to maintain the algorithm or dictionary data that determines which queries for searching question/answer type content the search system should recommend.

The present embodiment is applied to a computer (the extraction server 1) and its peripheral devices. Each unit in the present embodiment is implemented by the hardware of the computer and its peripheral devices, together with software that controls the hardware.

In addition to a CPU serving as the control unit 10, the hardware includes a storage unit 20, a communication unit, a display unit, and an input unit. Examples of the storage unit 20 include memory (RAM, ROM, etc.), a hard disk drive (HDD), and an optical disc (CD, DVD, etc.) drive. Examples of the communication unit include various wired and wireless interface devices. Examples of the display unit include various displays such as liquid crystal displays and plasma displays. Examples of the input unit include a keyboard and a pointing device (mouse, trackball, etc.).

The software includes the computer programs and data that control the hardware. The computer programs and data are stored in the storage unit 20 and are executed or referenced by the control unit 10 as appropriate. They can also be distributed via a communication line, or recorded on and distributed via a computer-readable medium such as a CD-ROM.

The control unit 10 of the extraction server 1 includes a keyword extraction unit 11 (extraction means), a category determination unit 12 (determination means), and a keyword classification unit 13 (classification means). The storage unit 20 of the extraction server 1 includes a query log DB (database) 21 (query storage means) and a keyword candidate DB 22 (keyword storage means).

The keyword extraction unit 11 extracts, from each query stored in the query log DB 21, the keywords constituting that query. Specifically, the keyword extraction unit 11 splits the query text at a predefined delimiter (for example, a space) and extracts each resulting piece of text as a keyword. In this way, the words and phrases explicitly specified by the user can be used as keywords. When the query text contains no delimiter, for example, the keyword extraction unit 11 may instead perform morphological analysis on the query text and extract words of predetermined types as keywords.

When synonyms or near-synonyms of an extracted word are registered in an existing dictionary, the keyword extraction unit 11 may treat one of them as a representative word and convert the extracted word to that representative word. This normalizes variations in wording between queries.
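
As an illustration only, the delimiter-based extraction and the conversion to representative words described above might be sketched as follows in Python. The function name `extract_keywords` and the contents of the synonym dictionary are assumptions made for this sketch and do not appear in the embodiment; the morphological-analysis fallback is omitted.

```python
import re

# Hypothetical synonym dictionary mapping a variant to its representative word.
SYNONYMS = {"cost": "price", "pricing": "price"}


def extract_keywords(query, synonyms=SYNONYMS):
    """Split a query at whitespace and map each piece to its representative word."""
    # In Python 3, \s also matches the full-width space (U+3000) common in Japanese queries.
    pieces = [p for p in re.split(r"\s+", query) if p]
    return [synonyms.get(p, p) for p in pieces]
```

Under these assumptions, the queries "camera cost" and "camera pricing" both yield the keywords "camera" and "price", so the two wordings are counted as the same query.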

The query log DB 21 stores a history of the queries used when any of the question/answer type content was viewed from a terminal connectable to the extraction server 1.
FIG. 2 is a diagram showing an example of the query log table stored in the query log DB 21 according to the present embodiment.

The query log table stores queries entered into various search engines together with the URL of the content selected (clicked) from the list of search results returned for each query. The query and URL pairs stored in the query log table may be limited to cases where the selected content belongs to a predetermined site (question/answer type content).
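
One possible way to restrict the query log to clicks on a predetermined site is sketched below. The row layout (a query string paired with a clicked URL), the function name `filter_qa_clicks`, and the host name `qa.example.com` are all hypothetical and chosen only for illustration.

```python
from urllib.parse import urlparse

# Hypothetical host name of the question/answer site.
QA_HOSTS = {"qa.example.com"}


def filter_qa_clicks(log_rows, qa_hosts=QA_HOSTS):
    """Keep only (query, url) rows whose clicked URL is on a question/answer host."""
    return [(query, url) for query, url in log_rows
            if urlparse(url).hostname in qa_hosts]
```

Filtering at this stage means that every later step works only with queries that actually led to question/answer type content.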

The category determination unit 12 determines the category to which each of one or more keywords belongs, the keywords being those that co-occur with a keyword extracted by the keyword extraction unit 11 in the same query at or above a predetermined frequency. Specifically, when a keyword (the target word) is extracted, the category determination unit 12 extracts the other keywords (co-occurring words) that appear in the same query as the target word at or above a predetermined frequency (for example, a ratio). The category determination unit 12 then determines which category each co-occurring word belongs to, based on classification rules set in advance for a plurality of categories.

As a classification rule, for example, a co-occurring word may be assigned to the category whose feature words (defined for each category) have a relatively high similarity to it; however, the rule is not limited to this, and various methods are applicable.
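
The extraction of co-occurring words and the similarity-based classification rule might be sketched as follows. This is a minimal sketch under stated assumptions: the frequency is taken to be the ratio of the target word's queries that also contain the co-occurring word, the categories and feature words are invented, and character-level similarity from `difflib.SequenceMatcher` is used purely as a stand-in for whatever similarity measure an actual system would use.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical categories and their feature words.
FEATURE_WORDS = {
    "electronics": ["camera", "tablet"],
    "food": ["ramen", "sushi"],
}


def cooccurring_words(target, queries, min_ratio=0.2):
    """Return words found in at least `min_ratio` of the queries that contain `target`.

    `queries` is a list of keyword lists, one per logged query.
    """
    with_target = [set(q) for q in queries if target in q]
    if not with_target:
        return []
    counts = Counter(w for q in with_target for w in q if w != target)
    return sorted(w for w, n in counts.items() if n / len(with_target) >= min_ratio)


def categorize(word, feature_words=FEATURE_WORDS):
    """Assign `word` to the category whose feature words are most similar to it."""
    def best_similarity(category):
        return max(SequenceMatcher(None, word, f).ratio()
                   for f in feature_words[category])
    return max(feature_words, key=best_similarity)
```

Applying `categorize` to each word returned by `cooccurring_words` gives the list of categories whose variation is examined in the classification that follows.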

When the variation in the categories determined by the category determination unit 12 is less than a predetermined value, the keyword classification unit 13 classifies the target word extracted by the keyword extraction unit 11 as a direct object (first keyword). When the variation in the categories is equal to or greater than the predetermined value, it classifies the target word as an indirect object (second keyword).

Direct objects include, for example, product names of various kinds (categories). The indirect objects paired with such direct objects, for example "price" or "usage", are of comparatively limited kinds (categories). In other words, the categories of the keywords co-occurring with a given product name vary little, so the keyword classification unit 13 can determine that the product name is a direct object. Conversely, the keywords co-occurring with "price" include a wide range of products and their categories vary widely, so the keyword classification unit 13 can determine that "price" is an indirect object.

The variation in categories refers to, for example, the number of determined categories. This number may be counted excluding categories to which relatively few co-occurring words were assigned (for example, categories whose absolute number of words is below a predetermined value, or whose ratio to the maximum is below a predetermined value).
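
The classification by category variation can be illustrated with a short sketch. Here the variation is taken to be the number of distinct categories, ignoring categories with fewer than `min_words` co-occurring words; the function name, parameter names, and default values are assumptions for illustration, since the embodiment leaves the predetermined values open.

```python
from collections import Counter


def classify(cooccurring_categories, threshold=2, min_words=1):
    """Classify a target word from the categories of its co-occurring words.

    `cooccurring_categories` holds one category label per co-occurring word.
    """
    sizes = Counter(cooccurring_categories)
    # Variation: number of categories, excluding sparsely populated ones.
    variation = sum(1 for n in sizes.values() if n >= min_words)
    return "direct" if variation < threshold else "indirect"
```

For example, a product name whose co-occurring words all fall into one category is classified as a direct object, whereas "price", whose co-occurring words span several product categories, is classified as an indirect object.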

The keyword classification unit 13 then stores the direct objects and indirect objects classified in this way in the keyword candidate DB 22.
FIG. 3 is a diagram showing an example of the direct object candidate table and the indirect object candidate table stored in the keyword candidate DB 22 according to the present embodiment.

The direct object candidate table (a) stores the keywords classified as direct objects by the keyword classification unit 13, and the indirect object candidate table (b) stores the keywords classified as indirect objects; the two tables are stored independently.

[Processing flow]
FIG. 4 is a flowchart showing the processing performed by the control unit 10 of the extraction server 1 according to the present embodiment. It is assumed that the control unit 10 has already accumulated in the query log DB 21 a sufficient query log, that is, a history of the queries used when question/answer type content was viewed (query storage step).

In step S1, the control unit 10 (keyword extraction unit 11) reads from the query log DB 21 the query log, that is, the history of queries whose searches led to question/answer type content.

In step S2 (extraction step), the control unit 10 (keyword extraction unit 11) analyzes each query in the query log read in step S1 and extracts a target word.

In step S3, the control unit 10 (category determination unit 12) extracts the co-occurring words that appear within queries together with the target word extracted in step S2 at or above a predetermined frequency.

In step S4 (determination step), the control unit 10 (category determination unit 12) determines the category to which each of the co-occurring words extracted in step S3 belongs.

In step S5 (classification step), the control unit 10 (keyword classification unit 13) determines whether the variation in the categories determined in step S4 is less than a predetermined value. If the determination is YES, the control unit 10 proceeds to step S6; if NO, it proceeds to step S7.

In step S6 (classification step), because the variation in the categories of the co-occurring words is small, the control unit 10 (keyword classification unit 13) classifies the target word extracted in step S2 as a direct object.

In step S7 (classification step), because the variation in the categories of the co-occurring words is large, the control unit 10 (keyword classification unit 13) classifies the target word extracted in step S2 as an indirect object.

In step S8 (keyword storage step), the control unit 10 (keyword classification unit 13) stores the target words classified as direct objects in step S6 and the target words classified as indirect objects in step S7 in the keyword candidate DB 22 as keyword candidates.

In step S9, the control unit 10 determines whether all target words have been processed. If the determination is YES, the control unit 10 ends the processing; if NO, it returns to step S2 and continues processing with the next target word.

As described above, according to the present embodiment, the extraction server 1 can automatically classify an extracted target word as a direct object or an indirect object, using the degree of variation in the categories of its highly co-occurring keywords as the criterion. The extraction server 1 can therefore automatically extract, from the queries used when question/answer type content was viewed, keyword candidates for searching that content efficiently.

<Second Embodiment>
Hereinafter, a second embodiment, which is an example of an embodiment of the present invention, will be described with reference to the drawings. Components that are the same as in the first embodiment are given the same reference numerals, and their description is omitted or simplified.

[Functional configuration]
FIG. 5 is a diagram showing the functional configuration of the extraction server 1a (extraction apparatus) according to the present embodiment.
In addition to the configuration of the first embodiment, the extraction server 1a further includes an aggregation unit 14 and a selection unit 15 in the control unit 10a. The information stored in the keyword candidate DB 22a of the storage unit 20a also differs from that in the keyword candidate DB 22 of the first embodiment.

When, as a result of classification by the keyword classification unit 13, the same query contains a combination of a direct object and an indirect object, the aggregation unit 14 counts the number of times any of the question/answer type content was viewed using a query containing that combination. Specifically, this is the number of times a user selected (clicked) question/answer type content from the list of search results returned by the search system.

When, as a result of classification by the keyword classification unit 13, the same query contains a combination of a direct object and an indirect object, the selection unit 15 refers to the query log DB 21 via the keyword extraction unit 11 and selects, from the content viewed using queries containing that combination, content with a relatively high viewing frequency. The selection unit 15 may select the single most frequently viewed content, or may select multiple pieces of content whose viewing frequency is at or above a predetermined level.

The keyword candidate DB 22a stores each combination of a direct object and an indirect object co-occurring in the same query in association with the count produced by the aggregation unit 14 and the URL of the content selected by the selection unit 15.

In addition, when a plurality of direct objects co-occur in the same query as a result of classification by the keyword classification unit 13, the keyword candidate DB 22a stores those direct objects in association with one another.

FIG. 6 is a diagram showing an example of the keyword candidate table stored in the keyword candidate DB 22a according to the present embodiment.

The keyword candidate table stores, in association with one another, a combination of a direct object and an indirect object, the number of times content was viewed using queries containing that combination (the click count), and the URL of the content viewed frequently using that combination.

For example, the table shows that for queries containing "XX", "□□", and "difference", such as "difference between XX and □□", question/answer type content was clicked 1000 times, and "http://ccc~" was viewed most often.
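The row layout described for the keyword candidate table can be pictured with a minimal Python sketch. This is an illustration only, not the patent's implementation: the class name KeywordCandidate, its field names, and the helper find_candidate are hypothetical, and the single row reuses the example values given for FIG. 6 (the URL is kept in its abbreviated form).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordCandidate:
    """One row of a keyword candidate table (hypothetical field names)."""
    direct_objects: frozenset  # one or more co-occurring direct objects
    indirect_object: str       # abstract noun such as "difference" or "price"
    clicks: int                # times content was viewed via this combination
    top_url: str               # URL of the frequently viewed content


# Single example row taken from the description of FIG. 6.
TABLE = [
    KeywordCandidate(frozenset({"XX", "□□"}), "difference", 1000, "http://ccc~"),
]


def find_candidate(table, direct_objects, indirect_object):
    """Return the row for the given combination, or None if it is absent."""
    key = frozenset(direct_objects)
    for row in table:
        if row.direct_objects == key and row.indirect_object == indirect_object:
            return row
    return None


row = find_candidate(TABLE, ["□□", "XX"], "difference")
print(row.clicks, row.top_url)
```

Using a frozenset for the direct objects is a design choice of this sketch: it makes the lookup independent of the order in which the keywords appear in the query.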

[Processing flow]
FIG. 7 is a flowchart showing the processing performed by the control unit 10a of the extraction server 1a according to the present embodiment.

Steps S11 to S17 are the same as steps S1 to S7 of the first embodiment: the control unit 10a classifies each target word constituting a query as either a direct object or an indirect object.

In step S18, when a combination of a direct object and an indirect object is contained in the same query, the control unit 10a (counting unit 14) counts the number of times question/answer type content was clicked in the search results for that query.

In step S19, when a combination of a direct object and an indirect object is contained in the same query, the control unit 10a (selection unit 15) selects the URL of content that was clicked with high frequency in the search results for that query.

In step S20, the control unit 10a (keyword classification unit 13) stores, in the keyword candidate DB 22a as a keyword candidate, the combination of the target word classified as a direct object in step S16 and the target word classified as an indirect object in step S17, in association with the count tallied in step S18 and the URL selected in step S19.

In step S21, the control unit 10a determines whether all target words have been processed. If the determination is YES, the control unit 10a ends the processing; if NO, it returns to step S12 and continues with the next target word.
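Steps S18 to S20 can be sketched roughly as follows. The sketch assumes that the query log is available as (keywords, clicked URL) pairs, one per click, and that the classification of steps S11 to S17 has already produced a set of direct objects and a set of indirect objects. All function and variable names are hypothetical. It processes the whole log in one pass rather than looping per target word as in FIG. 7, and it keeps only the single most clicked URL per combination, which is one of the two options the description allows for the selection unit 15.

```python
from collections import Counter, defaultdict


def build_keyword_candidates(click_log, direct_objects, indirect_objects):
    """Build {(direct, indirect): (click_count, top_url)} from a click log.

    click_log: iterable of (keywords, url) pairs, one entry per click.
    direct_objects / indirect_objects: sets produced by an earlier
    classification step (assumed to exist; not implemented here).
    """
    clicks = Counter()                # S18: clicks per combination
    url_clicks = defaultdict(Counter) # per-combination clicks by URL

    for keywords, url in click_log:
        directs = [k for k in keywords if k in direct_objects]
        indirects = [k for k in keywords if k in indirect_objects]
        for d in directs:
            for i in indirects:
                clicks[(d, i)] += 1
                url_clicks[(d, i)][url] += 1

    table = {}
    for combo, count in clicks.items():
        top_url, _ = url_clicks[combo].most_common(1)[0]  # S19: most clicked URL
        table[combo] = (count, top_url)                   # S20: store together
    return table


# Hypothetical log entries for illustration only.
log = [
    (("camera", "price"), "http://example.com/a"),
    (("camera", "price"), "http://example.com/a"),
    (("camera", "price"), "http://example.com/b"),
    (("camera",), "http://example.com/c"),  # no indirect object: ignored
]
print(build_keyword_candidates(log, {"camera"}, {"price"}))
```

Queries that lack either a direct object or an indirect object contribute nothing to the table, which mirrors the condition stated in steps S18 and S19.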

As described above, according to the present embodiment, the extraction server 1a stores direct objects and indirect objects as co-occurring combinations. An administrator can therefore use queries that have actually led to content being viewed as recommended queries, so the extraction server 1a can improve the accuracy of recommended queries.

The extraction server 1a also stores the number of times question/answer type content was clicked as a result of searches based on a combination of a direct object and an indirect object. The extraction server 1a can therefore easily select queries with many clicks, that is, queries that readily lead to content, and can preferentially recommend such efficient queries.

For example, according to the keyword candidate table of FIG. 6, when the keyword "XX" is specified in the search system, the extraction server 1a can recommend queries that combine "XX" with "word of mouth", "price", and "difference", in descending order of click count.
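One way such a recommendation could be derived from stored click counts is sketched below. All click counts here are invented for illustration (the text gives only the resulting order), and recommend_queries and the table layout are hypothetical names, not part of the patent.

```python
# Hypothetical table: (direct object, indirect object) -> click count.
# The counts are made up; only their relative order follows the text.
CLICKS = {
    ("XX", "difference"): 1000,
    ("XX", "price"): 2000,
    ("XX", "word of mouth"): 3000,
    ("YY", "meaning"): 500,
}


def recommend_queries(clicks, direct_object, limit=3):
    """Return recommended queries for a direct object, most clicked first."""
    matches = [
        (count, indirect)
        for (direct, indirect), count in clicks.items()
        if direct == direct_object
    ]
    matches.sort(key=lambda m: m[0], reverse=True)
    return [f"{direct_object} {indirect}" for _, indirect in matches[:limit]]


print(recommend_queries(CLICKS, "XX"))
```

A keyword with no stored combinations simply yields an empty list, so the caller can fall back to an ordinary search.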

The extraction server 1a also stores question/answer type content that is viewed with high frequency as a result of searches based on a combination of a direct object and an indirect object. Therefore, when this combination is used as a query, the extraction server 1a can efficiently provide useful, frequently viewed content.

Furthermore, because the extraction server 1a stores multiple direct objects as co-occurring combinations, it can efficiently recommend direct objects that are meaningful to combine, or that readily lead to the desired content when combined.

Although embodiments of the present invention have been described above, the present invention is not limited to these embodiments. The effects described in the embodiments merely list the most preferable effects arising from the present invention, and the effects of the present invention are not limited to those described in the embodiments.

In the embodiments described above, the extraction server 1 or 1a was described as including each unit; however, this is not a limitation, and the units may be distributed across multiple servers as appropriate.

1, 1a Extraction server (extraction device)
10, 10a Control unit
11 Keyword extraction unit (extraction means)
12 Category determination unit (determination means)
13 Keyword classification unit (classification means)
14 Counting unit
15 Selection unit
20, 20a Storage unit
21 Query log DB (query storage means)
22, 22a Keyword candidate DB (keyword storage means)

Claims

An extraction device that extracts queries used to search for question/answer type content, comprising:
Query storage means for storing a history of the queries used when any of the content was viewed from a terminal connectable to the extraction device;
Extraction means for extracting, from each of the queries stored in the query storage means, a keyword constituting the query;
Determination means for determining the category to which each of one or more keywords belongs, the one or more keywords co-occurring in the same query with the keyword extracted by the extraction means at a predetermined frequency or more;
Classification means for classifying the keyword extracted by the extraction means as a first keyword when the variation among the categories determined by the determination means is less than a predetermined value, and as a second keyword when the variation among the categories is equal to or greater than a predetermined value; and
Keyword storage means for storing the first keyword and the second keyword classified by the classification means.

The extraction device according to claim 1, wherein, when the classification by the classification means shows that the keywords co-occurring in the same query are a combination of the first keyword and the second keyword, the keyword storage means stores the first keyword and the second keyword in association with each other.

The extraction device according to claim 2, further comprising a counting unit that counts the number of times any of the content was viewed using a query including the combination,
wherein the keyword storage means stores the count tallied by the counting unit in association with the combination.

The extraction device according to claim 2, wherein the query storage means stores the query used when any of the content was viewed in association with the URL of that content,
the extraction device further comprising a selection unit that refers to the query storage means and selects, from the content viewed using queries including the combination, content with a relatively high viewing frequency,
wherein the keyword storage means stores the URL of the content selected by the selection unit in association with the combination.

The extraction device according to claim 4, wherein, when the classification by the classification means shows that a plurality of the first keywords co-occur in the same query, the keyword storage means stores the plurality of first keywords in association with each other.

A method by which an extraction device extracts queries used to search for question/answer type content, comprising:
A query storage step of storing a history of the queries used when any of the content was viewed from a terminal connectable to the extraction device;
An extraction step of extracting, from each of the queries stored in the query storage step, a keyword constituting the query;
A determination step of determining the category to which each of one or more keywords belongs, the one or more keywords co-occurring in the same query with the keyword extracted in the extraction step at a predetermined frequency or more;
A classification step of classifying the keyword extracted in the extraction step as a first keyword when the variation among the categories determined in the determination step is less than a predetermined value, and as a second keyword when the variation among the categories is equal to or greater than a predetermined value; and
A keyword storage step of storing the first keyword and the second keyword classified in the classification step.

A program that causes the extraction device to execute the method according to claim 6.