JP7485706B2

JP7485706B2 - Information processing device, information processing method, and information processing program

Info

Publication number: JP7485706B2
Application number: JP2022024322A
Authority: JP
Inventors: 颯太山城
Original assignee: Individual
Current assignee: Individual
Priority date: 2022-02-18
Filing date: 2022-02-18
Publication date: 2024-05-16
Anticipated expiration: 2042-02-18
Also published as: JP2023121078A

Description

The present invention relates to an information processing device, an information processing method, and an information processing program.

In an online job search service, there are cases where one wants to add question information about a company, posted on a Q&A (Question and Answer) site ("question site" where appropriate), to the company information detail page that introduces the company.

JP 2005-332271 A

However, the conventional technology leaves room for improvement in efficiently providing information to customers ("users" where appropriate). For example, questions posted to a question site cover a wide range of topics, so even questions in the job category are often unrelated to employment. With the conventional technology, it is difficult to efficiently exclude such questions.

The present application has been made in view of the above, and aims to provide an information processing device, an information processing method, and an information processing program that make it possible to provide information to users efficiently.

To solve the above problem and achieve the object, an information processing device according to the present invention includes a generation unit that extracts, from the character strings included in posted information related to a predetermined target, useful character strings that are useful in a service different from the service on which the posted information was provided and non-useful character strings that are not useful in that service, and that generates character string information associating target information indicating the target with the useful character strings and the non-useful character strings.

An information processing method according to the present invention is an information processing method executed by an information processing device, and includes a generation step of extracting, from the character strings included in posted information related to a predetermined target, useful character strings that are useful in a service different from the service on which the posted information was provided and non-useful character strings that are not useful in that service, and generating character string information associating target information indicating the target with the useful character strings and the non-useful character strings.

An information processing program according to the present invention causes a computer to execute a generation procedure of extracting, from the character strings included in posted information related to a predetermined target, useful character strings that are useful in a service different from the service on which the posted information was provided and non-useful character strings that are not useful in that service, and generating character string information associating target information indicating the target with the useful character strings and the non-useful character strings.

According to the present invention, information can be provided to users efficiently.

FIG. 1 is a diagram showing a configuration example of the information processing system according to the embodiment.
FIG. 2 is a block diagram showing a configuration example of the information processing device according to the embodiment.
FIG. 3 is a diagram showing specific example 1 of the selection process according to the embodiment.
FIG. 4 is a diagram showing specific example 2 of the selection process according to the embodiment.
FIG. 5 is a diagram showing a specific example of the extraction process according to the embodiment.
FIG. 6 is a flowchart showing an example of the flow of information processing according to the embodiment.
FIG. 7 is a diagram showing an example of a hardware configuration.

Modes for carrying out the information processing device, information processing method, and information processing program according to the present application (hereinafter, "embodiments") are described in detail below with reference to the drawings. The information processing device, information processing method, and information processing program according to the present application are not limited by these embodiments. In the following embodiments, identical parts are given identical reference numerals, and redundant descriptions are omitted.

[Embodiment]
The configuration of the information processing system 100 according to the embodiment, the configuration of the information processing device 10, specific examples of information processing, and the flow of information processing are described below in this order, followed by the effects of the embodiment.

[1. Configuration of Information Processing System 100]
The processing of the information processing system 100 according to the embodiment is described with reference to FIG. 1. FIG. 1 is a diagram showing a configuration example of the information processing system 100 according to the embodiment. The configuration example, the processing, and the effects of the information processing system 100 are described below in that order.

(1-1. Configuration Example of Information Processing System 100)
The information processing system 100 shown in FIG. 1 includes an information processing device 10. The information processing system 100 may include a plurality of information processing devices 10. The information processing system 100 involves two kinds of data: the question site post ("posted information" where appropriate) 20, which is input to the information processing device 10, and the Q&A list ("display information" where appropriate) 31, which the information processing device 10 outputs and which is displayed on the job hunting support site display screen 30.

(1-1-1. Information Processing Device 10)
The information processing device 10 is a device (computer) used by an administrator of the job hunting support site, who collects question site posts 20 and creates the job hunting support site display screen 30. The information processing device 10 accepts operations by the administrator of the job hunting support site. The information processing device 10 is realized by, for example, a smartphone, a tablet terminal, a notebook PC (Personal Computer), a desktop PC, a mobile phone, or a PDA (Personal Digital Assistant). The example of FIG. 1 shows a case where the information processing device 10 is realized by a desktop PC.

(1-1-2. Question Site Post 20)
The question site post 20 is text data posted on a question site by users of that site, and includes a question and the answers to that question. The question site post 20 may consist of a question only. The question site post 20 may also consist of a posted question and only the answer that was rated the best answer among the answers to that question. Furthermore, the question site post 20 may include image data, video data, or audio data in addition to text data.

(1-1-3. Job Hunting Support Site Display Screen 30)
The job hunting support site display screen 30 is a web page created for each company ("business operator" where appropriate) registered on the job hunting support site. In addition to the Q&A list 31, it includes information such as a company overview, job openings, and annual income and salary.

(1-2. Processing of Information Processing System 100)
In the information processing system 100, the operator of the information processing device 10 first executes the input process for the question site post 20 (see FIG. 1 (1)). Next, the information processing device 10 executes the information processing that generates the Q&A list 31 from the question site post 20 (see FIG. 1 (2)). Finally, the information processing device 10 executes the output process for the job hunting support site display screen 30 including the Q&A list 31 (see FIG. 1 (3)). The processing of the information processing system 100 is described in detail below in the order of input process, information processing, and output process.

(1-2-1. Input Process)
As shown in FIG. 1 (1), the information processing device 10 accepts input of the question site post 20 from the operator and acquires the input data. The information processing device 10 may instead accept, via its communication unit 11, a question site post 20 transmitted from a terminal (not shown).

(1-2-2. Information Processing)
As shown in FIG. 1 (2), the information processing device 10 generates the Q&A list 31 from the question site post 20. This information processing is described in detail below in the order of the posted information classification process, the character string information generation process, and the posted information selection process.

(1-2-2-1. Posted Information Classification Process)
As shown in FIG. 1 (2-1), the information processing device 10 classifies the input question site post 20 using a classification model ("learning model" where appropriate) 14d, which is a machine learning model (posted information classification process). The classification model 14d is, for example, a DNN (Deep Neural Network) trained on annotation data ("learning data" where appropriate), in which each collected question is labeled as to whether or not it is useful information related to employment. As shown in the example of FIG. 1, when the question site post 20 is input, the information processing device 10 outputs, as the classification result, a classification list 14b-1 that indicates the usefulness judgment for each question.

In the example of FIG. 1, the information processing device 10 classifies question No. 1 in the question site post 20, "Is Company Z a foreign-owned company?", as an employment-related question based on the company name "Company Z" and the keyword "foreign-owned company". The information processing device 10 likewise classifies question No. 2, "What are the annual salaries at Company Y and Company R?", as an employment-related question based on the company names "Company Y" and "Company R" and the keyword "annual salary".

On the other hand, question No. 3 in the question site post 20, "Is Company L's app convenient?", contains the company name "Company L" but asks about a product of Company L, an app development company, so the information processing device 10 classifies it as a question unrelated to employment. Similarly, question No. 4, "Can I wear Company U's clothes to the interview?", contains the company name "Company U" but asks about a product of Company U, a clothing retailer, so the information processing device 10 classifies it as a question unrelated to employment.

As the classification result of the classification model 14d, the information processing device 10 outputs a classification list 14b-1 that records the employment-related usefulness judgment for each question: "○" for question No. 1, "○" for question No. 2, "×" for question No. 3, "×" for question No. 4, and so on.
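The shape of the classification list 14b-1 can be sketched as follows. This is only an illustration under stated assumptions: the names `ClassificationEntry`, `build_classification_list`, and `toy_model` are hypothetical, and `toy_model` is a trivial cue-word rule standing in for the trained classification model 14d, which the description says is a DNN or similar model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClassificationEntry:
    no: int          # question number within the post
    question: str    # question text
    useful: bool     # True corresponds to "○", False to "×"


def build_classification_list(
    questions: List[str], is_job_related: Callable[[str], bool]
) -> List[ClassificationEntry]:
    """Apply a classifier to each question and record its judgment."""
    return [
        ClassificationEntry(no, text, is_job_related(text))
        for no, text in enumerate(questions, start=1)
    ]


def toy_model(text: str) -> bool:
    # Hypothetical stand-in for the trained model 14d (assumption):
    # a question counts as job-related if it contains a cue word.
    return any(cue in text for cue in ("foreign-owned", "annual salar"))


questions = [
    "Is Company Z a foreign-owned company?",
    "What are the annual salaries at Company Y and Company R?",
    "Is Company L's app convenient?",
    "Can I wear Company U's clothes to the interview?",
]
classification_list = build_classification_list(questions, toy_model)
```

Passing the classifier in as a callable keeps the list-building step independent of how the model is actually trained.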

(1-2-2-2. Character String Information Generation Process)
As shown in FIG. 1 (2-2), the information processing device 10 generates a keyword dictionary ("character string information" where appropriate) 14b-2, which is used for the keyword matching described later (character string information generation process). The keyword dictionary 14b-2 is created for each company and lists the keywords used to judge whether information is useful for a predetermined service (for example, a support site for employment, company briefings, or finance). In addition to keywords related to company names (hereinafter, "company name keywords"), the keywords listed in the keyword dictionary 14b-2 include useful keywords ("useful character strings" where appropriate), which indicate that a question is useful to display on the service, and non-useful keywords ("non-useful character strings" where appropriate), which indicate that a question is not useful to display on the service.

First, the listing of company name keywords is described. The information processing device 10 acquires the official name of a company and its company ID from a company name database, such as a catalog of company names. For example, the information processing device 10 acquires the official name "Y Co., Ltd." of the company "Company Y" and the company ID "23", and generates a keyword dictionary 14b-2 that links "Y Co., Ltd." with "23". Next, the information processing device 10 acquires information held by an external resource such as an Internet encyclopedia ("external resource information" where appropriate) and adds spelling variants of the company name to the keyword dictionary 14b-2. For example, based on external resource information such as the address, date of establishment, name of the president, and number of employees, the information processing device 10 records that "Y Co., Ltd." is also written as "Company Y", "Y", "Y Japan", and so on.
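The company name part of the dictionary can be pictured as a lookup from every known spelling to the company ID. The sketch below assumes the official names and the spelling variants have already been obtained; the function name `build_company_name_index` and the data layout are hypothetical and are not taken from the description.

```python
from typing import Dict, List


def build_company_name_index(
    official_names: Dict[int, str],
    variants: Dict[int, List[str]],
) -> Dict[str, int]:
    """Map the official name and each spelling variant to the company ID."""
    index: Dict[str, int] = {}
    for company_id, official in official_names.items():
        index[official] = company_id
        for alias in variants.get(company_id, []):
            index[alias] = company_id
    return index


# Example data modeled on the "Y Co., Ltd." / ID 23 example in the text.
name_index = build_company_name_index(
    official_names={23: "Y Co., Ltd."},
    variants={23: ["Company Y", "Y Japan"]},
)
```

A very short variant such as "Y" is left out of the example on purpose: with plain substring matching it would match almost any text, so a real system would need some additional disambiguation that the description does not detail.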

Next, the listing of useful keywords and non-useful keywords is described. A useful keyword is a character string extracted for each company that indicates that a question containing it is likely to be useful for the predetermined service. For example, "job openings" is a useful keyword found in employment-related questions about the workwear retailer "Company W". A non-useful keyword, in contrast, is a character string extracted for each company that indicates that a question containing it is likely not to be useful for the predetermined service. For example, "update" is a non-useful keyword found in questions about the software development company "Company M" that are unrelated to employment.

When listing useful and non-useful keywords related to employment, the information processing device 10 assigns provisional labels by classifying collected unlabeled questions that contain a specific company name, using the classification model 14d trained on employment-related annotation data. For example, when creating the keyword dictionary 14b-2 for "Company Y", the information processing device 10 accepts unlabeled questions containing the company name "Company Y" and uses the employment-related classification model 14d to output provisional label data, that is, questions to which a provisional label of employment-related or not employment-related has been assigned.
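The provisional labeling step might look like the following sketch. The function name `pseudo_label` is hypothetical, matching a company name by substring is an assumption, and the classifier passed in is a placeholder for the trained classification model 14d.

```python
from typing import Callable, Iterable, List, Tuple


def pseudo_label(
    unlabeled_questions: Iterable[str],
    company_names: Iterable[str],
    is_job_related: Callable[[str], bool],
) -> List[Tuple[str, bool]]:
    """Attach a provisional label to each question that names the company."""
    names = list(company_names)
    labeled = []
    for text in unlabeled_questions:
        # Only questions that mention the company are labeled (assumption:
        # a mention is detected by simple substring search).
        if any(name in text for name in names):
            labeled.append((text, is_job_related(text)))
    return labeled
```

For example, calling `pseudo_label(questions, ["Company Y", "Y Japan"], model)` would yield pairs of question text and a boolean provisional label, where `True` means employment-related.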

Next, the information processing device 10 extracts, as useful keywords, words that rarely appear in the positive examples of the annotation data or in the negative examples of the provisional label data but appear relatively often in the positive examples of the provisional label data. Conversely, the information processing device 10 extracts, as non-useful keywords, words that rarely appear in the negative examples of the annotation data or in the positive examples of the provisional label data but appear relatively often in the negative examples of the provisional label data. The information processing device 10 extracts the useful and non-useful keywords based on word frequency or mutual information. The information processing device 10 may also determine the part of speech of each character string in a question and extract only specific parts of speech (for example, nouns).
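The frequency-based variant of this extraction can be sketched as follows. The thresholds `high` and `low`, the function names, and the use of document frequency (the share of questions containing a word) are all assumptions made for illustration; the description only says that frequency or mutual information is used. Questions are assumed to be tokenized already, which for Japanese text would require a morphological analyzer.

```python
from collections import Counter
from typing import Dict, List, Sequence


def doc_rate(docs: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Fraction of documents that contain each word."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(set(doc))
    total = max(len(docs), 1)
    return {word: n / total for word, n in counts.items()}


def extract_keywords(
    target: Sequence[Sequence[str]],
    exclude_a: Sequence[Sequence[str]],
    exclude_b: Sequence[Sequence[str]],
    high: float = 0.5,   # hypothetical "relatively frequent" threshold
    low: float = 0.2,    # hypothetical "rarely included" threshold
) -> List[str]:
    """Words frequent in `target` but rare in both excluded sets."""
    t, a, b = doc_rate(target), doc_rate(exclude_a), doc_rate(exclude_b)
    return sorted(
        w for w, r in t.items()
        if r >= high and a.get(w, 0.0) <= low and b.get(w, 0.0) <= low
    )


annotated_pos = [["salary", "interview"], ["interview", "hiring"], ["salary", "overtime"]]
annotated_neg = [["price", "delivery"], ["store", "price"]]
pseudo_pos = [["company_y", "recruiting", "salary"], ["company_y", "recruiting", "interview"]]
pseudo_neg = [["company_y", "update", "app"], ["company_y", "update", "login"]]

# Useful: frequent in provisional positives, rare in annotated positives
# and in provisional negatives.
useful = extract_keywords(pseudo_pos, annotated_pos, pseudo_neg)
# Non-useful: frequent in provisional negatives, rare in annotated negatives
# and in provisional positives.
non_useful = extract_keywords(pseudo_neg, annotated_neg, pseudo_pos)
```

In this toy data, "salary" and "interview" are dropped from the useful keywords because they are already common in the generic annotated positives, which leaves the company-specific word "recruiting". The company name token is dropped from both lists because it appears in positive and negative questions alike.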

Finally, the information processing device 10 generates the keyword dictionary 14b-2, which links the extracted useful and non-useful keywords with the company name and company ID. Based on the keyword dictionaries 14b-2 generated for individual companies, the information processing device 10 may also extract useful and non-useful keywords common to a category of companies, or general-purpose useful and non-useful keywords common to all companies.
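One simple reading of "keywords common to a category" is the intersection of the per-company keyword sets. The sketch below takes that reading as an assumption; the description does not state how commonality is decided, and the function name `common_keywords` is hypothetical.

```python
from typing import Dict, Iterable, Set


def common_keywords(
    per_company: Dict[int, Set[str]], company_ids: Iterable[int]
) -> Set[str]:
    """Keywords shared by every listed company (set intersection)."""
    sets = [per_company[cid] for cid in company_ids]
    return set.intersection(*sets) if sets else set()


# Hypothetical per-company useful keywords, keyed by company ID.
useful_by_company = {
    23: {"job openings", "annual salary", "overtime"},
    7: {"job openings", "annual salary", "transfer"},
    4: {"job openings", "remote work"},
}
```

Passing the IDs of one category gives the category-level keywords, and passing all IDs gives the general-purpose ones. A looser rule, such as keeping keywords shared by most companies, would be an equally plausible design.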

(1-2-2-3. Posted Information Selection Process)
As shown in FIG. 1 (2-3), the information processing device 10 uses the keyword dictionary 14b-2 to perform keyword matching on the questions in the question site post 20 (posted information selection process). Based on the company name keywords in the keyword dictionary 14b-2, the information processing device 10 outputs a selection list 14b-3, which is the keyword matching result (selection result) linking each question with a company ID.

The information processing device 10 also deletes questions from the selection list 14b-3 based on the classification list 14b-1, which is the classification result. For example, of questions No. 1 to No. 4 in the selection list 14b-3, the information processing device 10 deletes questions No. 3 and No. 4, whose usefulness judgment in the classification list 14b-1 is "×".
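Keyword matching followed by deletion of the "×" questions can be sketched as below. The names `match_companies` and `build_selection_list`, the substring matching, the company IDs other than 23, and the choice to drop questions that match no company are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple


def match_companies(question: str, name_index: Dict[str, int]) -> Set[int]:
    """IDs of all companies whose name keyword occurs in the question."""
    return {cid for name, cid in name_index.items() if name in question}


def build_selection_list(
    questions: List[str],
    name_index: Dict[str, int],
    useful_flags: List[bool],
) -> List[Tuple[str, Set[int]]]:
    """Link questions to company IDs, keeping only those judged useful."""
    selection = []
    for question, useful in zip(questions, useful_flags):
        company_ids = match_companies(question, name_index)
        # Drop "×" questions and questions that match no company (assumption).
        if useful and company_ids:
            selection.append((question, company_ids))
    return selection


name_index = {"Company Z": 1, "Company Y": 23, "Company R": 7,
              "Company L": 4, "Company U": 5}
questions = [
    "Is Company Z a foreign-owned company?",
    "What are the annual salaries at Company Y and Company R?",
    "Is Company L's app convenient?",
    "Can I wear Company U's clothes to the interview?",
]
selection_list = build_selection_list(questions, name_index, [True, True, False, False])
```

Question No. 2 ends up linked to two company IDs, which matches the idea that a single question can be shown on more than one company's page.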

Furthermore, the information processing device 10 changes the links between questions and company IDs in the selection list 14b-3 based on the useful and non-useful keywords in the keyword dictionary 14b-2. For example, if a question that contains no useful keyword is linked to a company ID, the information processing device 10 removes that link. Likewise, if a question that contains a non-useful keyword is linked to a company ID, the information processing device 10 removes that link. The information processing device 10 can execute the selection process using either the useful keywords or the non-useful keywords alone. When a question contains both a useful keyword and a non-useful keyword, the information processing device 10 may give priority to the useful keyword and keep the link, or give priority to the non-useful keyword and remove the link.
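The link adjustment for a single company, including the configurable priority when both kinds of keyword occur, can be sketched as follows. The function name `keep_link`, the `prefer` parameter, and the example keywords are hypothetical.

```python
from typing import Iterable


def keep_link(
    question: str,
    useful_keywords: Iterable[str],
    non_useful_keywords: Iterable[str],
    prefer: str = "non_useful",
) -> bool:
    """Decide whether a question stays linked to a company ID.

    prefer="useful" keeps the link when both kinds of keyword occur;
    prefer="non_useful" removes it in that case.
    """
    has_useful = any(k in question for k in useful_keywords)
    has_non_useful = any(k in question for k in non_useful_keywords)
    if has_useful and has_non_useful:
        return prefer == "useful"
    # Otherwise the link survives only if a useful keyword is present.
    return has_useful


# Hypothetical dictionary entries for the software company "Company M".
m_useful = ["job openings"]
m_non_useful = ["update"]
```

With these entries, "Does Company M have job openings?" keeps its link, "When is Company M's next update?" loses it, and "Are there job openings on Company M's update team?" depends on the `prefer` setting.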

Through the processing described above, the information processing device 10 outputs the selection list 14b-3, which is the selection result created for each company, as the Q&A list 31, which is the display information shown on the job hunting support site display screen 30. For example, as the Q&A list 31 to be shown on the job hunting support site display screen 30 for "Company Y", the information processing device 10 outputs the question "How much is the annual salary at Company Y?" together with its answer.

Based on the selection lists 14b-3 created for each company, the information processing device 10 can also output, as display information, information about other companies for which a similar question has been posted. For example, the information processing device 10 can output "Company R" as a company other than "Company Y" for which the question "How much is the annual salary?" has been posted.
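How "similar" questions are detected is not specified in the description. As a minimal sketch, the code below assumes that two questions are similar when they are identical after the company name is replaced by a placeholder; the function name and data layout are hypothetical.

```python
from typing import Dict, List


def companies_with_similar_questions(
    selection: Dict[int, List[str]],
    names: Dict[int, str],
    target_id: int,
) -> List[int]:
    """IDs of other companies that share a question template with the target."""

    def templates(company_id: int) -> set:
        # Replace the company name with a placeholder so that questions
        # about different companies can be compared (assumption).
        name = names[company_id]
        return {q.replace(name, "{company}") for q in selection.get(company_id, [])}

    target_templates = templates(target_id)
    return sorted(
        cid for cid in selection
        if cid != target_id and templates(cid) & target_templates
    )


selection = {
    23: ["How much is the annual salary at Company Y?"],
    7: ["How much is the annual salary at Company R?"],
    1: ["Is Company Z a foreign-owned company?"],
}
names = {23: "Company Y", 7: "Company R", 1: "Company Z"}
similar = companies_with_similar_questions(selection, names, target_id=23)
```

Exact template matching is deliberately strict. A practical system would more likely use some text similarity measure, which is outside what the description states.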

(1-2-3. Output Process)
As shown in FIG. 1 (3), the information processing device 10 outputs the job hunting support site display screen 30 including the Q&A list 31. The information processing device 10 may also transmit the Q&A list 31, which is the output display information, to a terminal (not shown) via the communication unit 11.

(1-3. Effects of Information Processing System 100)
The effects of the information processing system 100 are described in detail below, after a concrete explanation of the problems with a technology that does not use useful and non-useful keywords.

(1-3-1. Problem)
First, the problem with a technology that does not use useful and non-useful keywords is described, that is, classification using only a classification model 14d trained for each predetermined service. For example, consider using the employment-related classification model 14d to classify the question "If I obtain a qualification such as XX provided by Company M, will it help me get a job at Company Y?". The classification model 14d responds to the words "qualification" and "job" and judges the question to be employment-related. However, it also responds to the company names "Company M" and "Company Y", and therefore judges the question to be employment-related for both "Company M" and "Company Y". Because "XX" is a qualification provided by "Company M", it is not appropriate to classify the question as one about employment at "Company M". In this example, a question containing the qualification name "XX" and the word "qualification" should preferably not be linked to "Company M" but should be linked to "Company Y". At the same time, solving this problem by creating a dedicated employment-related classification model 14d for each company, such as "Company M" and "Company Y", and using it for classification is also difficult.

As described above, the criteria for whether a question is related to employment differ from company to company, so it is difficult to create the Q&A list 31 to be displayed on the job hunting support site using only the classification process of the general-purpose classification model 14d. The information processing system 100 according to the embodiment therefore resolves this problem by creating, in addition to the classification process of the classification model 14d, a keyword dictionary 14b-2 containing useful and non-useful keywords for each company, and by using that keyword dictionary 14b-2 to select the Q&A list 31.

(1-3-2. Overview)
In the information processing system 100, the information processing device 10 acquires the question site post 20 posted on the question site. Based on at least one of the useful keywords, which are useful when displaying the Q&A list 31 on the job hunting support site, and the non-useful keywords, which are not useful, it selects from the question site post 20, for each company registered on the job hunting support site, the Q&A list 31 to be displayed on the job hunting support site, and displays that Q&A list 31. In doing so, the information processing device 10 classifies the acquired question site post 20 using the classification model 14d, which has learned the relationship between question site posts 20 and their usefulness on the job hunting support site, and selects the questions to be displayed in the Q&A list 31 from the question site posts 20 classified as useful. Furthermore, the information processing device 10 extracts, from the words included in the question site posts 20 related to a company, useful keywords that are useful on the job hunting support site and non-useful keywords that are not useful, and generates a keyword dictionary 14b-2 that associates the company ID indicating the company with the useful and non-useful keywords.

(1-3-3. Effects)
The information processing system 100 can therefore provide information to users efficiently. That is, the information processing system 100 can efficiently create the Q&A list 31 viewed by users of the job hunting support site. Moreover, because the information processing system 100 can automatically generate the keyword dictionary 14b-2 containing the useful and non-useful keywords used to create the Q&A list 31, it can create the Q&A list 31 even more efficiently and effectively.

〔２．情報処理装置１０の構成〕
図２を用いて、実施形態に係る情報処理装置１０の構成について説明する。図２は、実施形態に係る情報処理装置１０の構成例を示すブロック図である。図２に示すように、情報処理装置１０は、通信部１１、入力部１２、出力部１３、記憶部１４および制御部１５を有する。 2. Configuration of information processing device 10
The configuration of the information processing device 10 according to the embodiment will be described with reference to Fig. 2. Fig. 2 is a block diagram showing an example of the configuration of the information processing device 10 according to the embodiment. As shown in Fig. 2, the information processing device 10 has a communication unit 11, an input unit 12, an output unit 13, a storage unit 14, and a control unit 15.

（２－１．通信部１１）
通信部１１は、例えば、ＮＩＣ（Network Interface Card）等によって実現される。そして、通信部１１は、所定の通信網（ネットワーク）と有線または無線で接続され、各種装置との間で情報の送受信を行う。 (2-1. Communication Unit 11)
The communication unit 11 is realized by, for example, a network interface card (NIC) etc. The communication unit 11 is connected to a predetermined communication network by wire or wirelessly, and transmits and receives information to and from various devices.

（２－２．入力部１２）
入力部１２は、例えば、キーボードやマウス等で実現される。そして、入力部１２は、情報処理装置１０の管理者等から各種操作を受け付ける。例えば、入力部１２は、情報処理装置１０の管理者等から質問サイト投稿文２０の入力を受け付ける。 (2-2. Input unit 12)
The input unit 12 is realized by, for example, a keyboard, a mouse, etc. The input unit 12 accepts various operations from an administrator or the like of the information processing device 10. For example, the input unit 12 accepts input of a question site post 20 from an administrator or the like of the information processing device 10.

（２－３．出力部１３）
出力部１３は、例えば、液晶ディスプレイ等で実現される。そして、出力部１３は、各種情報を表示する。例えば、出力部１３は、情報処理装置１０の制御部１５によって生成された就職活動支援サイト表示画面３０を表示する。 (2-3. Output unit 13)
The output unit 13 is realized by, for example, a liquid crystal display, etc. The output unit 13 displays various information. For example, the output unit 13 displays a job hunting support site display screen 30 generated by the control unit 15 of the information processing device 10.

（２－４．記憶部１４）
記憶部１４は、例えば、ＲＡＭ（Random Access Memory）、フラッシュメモリ（Flash Memory）等の半導体メモリ素子、または、ハードディスク、光ディスク等の記憶装置によって実現される。実施形態に係る記憶部１４は、図２に示すように、収集情報記憶部１４ａ、処理結果記憶部１４ｂ、学習データ記憶部１４ｃおよび学習モデル１４ｄを有する。そして、記憶部１４は、制御部１５が動作する際に参照する各種情報や、制御部１５が動作した際に取得した各種情報を記憶する。 (2-4. Storage unit 14)
The storage unit 14 is realized by, for example, a semiconductor memory element such as a random access memory (RAM) or a flash memory, or a storage device such as a hard disk or an optical disk. As shown in Fig. 2, the storage unit 14 according to the embodiment has a collected information storage unit 14a, a processing result storage unit 14b, a learning data storage unit 14c, and a learning model 14d. The storage unit 14 stores various pieces of information referenced when the control unit 15 operates and various pieces of information acquired when the control unit 15 operates.

（２－４－１．収集情報記憶部１４ａ）
収集情報記憶部１４ａは、情報処理装置１０の取得部１５ａによって取得された収集情報を記憶する。例えば、収集情報記憶部１４ａは、質問サイトサーバから取得した質問サイト投稿文２０である投稿情報、企業名データベースから取得した企業の情報である事業者情報、インターネット百科事典等の外部リソースから取得した外部リソース情報等を記憶する。 (2-4-1. Collected information storage unit 14a)
The collected information storage unit 14a stores collected information acquired by the acquisition unit 15a of the information processing device 10. For example, the collected information storage unit 14a stores posted information which is the question site posted message 20 acquired from a question site server, business information which is information on a business acquired from a business name database, external resource information acquired from an external resource such as an Internet encyclopedia, and the like.

（２－４－２．処理結果記憶部１４ｂ）
処理結果記憶部１４ｂは、情報処理装置１０の分類部１５ｂ、生成部１５ｃおよび選択部１５ｄによって出力された処理結果を記憶する。例えば、処理結果記憶部１４ｂは、分類結果である分類部１５ｂによって出力された分類リスト１４ｂ－１、生成結果である生成部１５ｃによって出力されたキーワード辞書（文字列情報）１４ｂ－２、選択結果である選択部１５ｄによって出力された選択リスト１４ｂ－３等を記憶する。また、処理結果記憶部１４ｂは、表示部１５ｅが表示するＱ＆Ａ一覧（表示情報）３１を記憶してもよい。 (2-4-2. Processing result storage unit 14b)
The processing result storage unit 14b stores the processing results output by the classification unit 15b, the generation unit 15c, and the selection unit 15d of the information processing device 10. For example, the processing result storage unit 14b stores a classification list 14b-1 output by the classification unit 15b, which is the classification result, a keyword dictionary (character string information) 14b-2 output by the generation unit 15c, which is the generation result, and a selection list 14b-3 output by the selection unit 15d, which is the selection result. The processing result storage unit 14b may also store a Q&A list (display information) 31 displayed by the display unit 15e.

（２－４－３．学習データ記憶部１４ｃ）
学習データ記憶部１４ｃは、機械学習モデルの学習を行うための学習データを記憶する。例えば、学習データ記憶部１４ｃは、学習データ「入力データ、正解情報」として「質問サイト投稿文、就職に関する有用判定」等の機械学習モデル１４ｄに入力するためのラベル付きのアノテーションデータを記憶する。 (2-4-3. Learning Data Storage Unit 14c)
The learning data storage unit 14c stores learning data for training the machine learning model. For example, the learning data storage unit 14c stores labeled annotation data to be input to the machine learning model 14d, in which each item of learning data is a pair of input data and correct answer information, such as a question site post paired with a usefulness judgment regarding employment.

（２－４－４．学習モデル１４ｄ）
学習モデル１４ｄは、質問サイト投稿文２０の入力に応じて就職に関する有用判定を出力するように学習された、単語ベースの線形分類モデルである。例えば、学習モデル１４ｄは、学習データ記憶部１４ｃに記憶される就職に関するアノテーションデータを用いて生成された学習済みモデルである。 (2-4-4. Learning model 14d)
The learning model 14d is a word-based linear classification model trained to output a usefulness judgment regarding employment in response to an input of a question site post 20. For example, the learning model 14d is a trained model generated using the employment-related annotation data stored in the learning data storage unit 14c.
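As a non-limiting illustration, a word-based linear classification model of the kind described above might be sketched as follows. The sketch assumes that each question has already been split into words, and it uses a simple perceptron update as a stand-in for the training method, which the embodiment does not fix. The names train_linear_classifier and classify, and the parameters epochs and lr, are hypothetical and do not appear in the embodiment.

```python
from collections import defaultdict


def train_linear_classifier(examples, epochs=10, lr=1.0):
    """Train one weight per word with perceptron updates.

    examples: list of (tokens, label) pairs, where label is True for
    "employment-related" and False for "non-employment-related".
    Returns (weights, bias). All names here are illustrative assumptions.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for tokens, label in examples:
            words = set(tokens)  # presence of a word, not its count
            score = bias + sum(weights[w] for w in words)
            predicted = score > 0
            if predicted != label:
                # Move the weights toward the correct label.
                step = lr if label else -lr
                for w in words:
                    weights[w] += step
                bias += step
    return dict(weights), bias


def classify(tokens, weights, bias):
    """Return True if the question is judged employment-related."""
    score = bias + sum(weights.get(w, 0.0) for w in set(tokens))
    return score > 0
```

Because the model is linear over words, each learned weight can be read directly as evidence for or against the "employment-related" label.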

（２－５．制御部１５）
制御部１５は、例えば、ＣＰＵ（Central Processing Unit）やＭＰＵ（Micro Processing Unit）等によって、情報処理装置１０内部の記憶装置に記憶されている各種プログラム（情報処理プログラムの一例に相当）がＲＡＭを作業領域として実行されることにより実現される。また、制御部１５は、例えば、ＡＳＩＣ（Application Specific Integrated Circuit）やＦＰＧＡ（Field Programmable Gate Array）等の集積回路により実現される。 (2-5. Control Unit 15)
The control unit 15 is realized, for example, by a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing various programs (corresponding to an example of an information processing program) stored in a storage device inside the information processing device 10 using a RAM as a working area. The control unit 15 may alternatively be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

図２に示すように、制御部１５は、取得部１５ａ、分類部１５ｂ、生成部１５ｃ、選択部１５ｄ、表示部１５ｅおよび学習部１５ｆを有し、以下に説明する情報処理の機能や作用を実現または実行する。なお、制御部１５の内部構成は、図２に示した構成に限られず、後述する情報処理を行う構成であれば他の構成であってもよい。また、制御部１５が有する各処理部の接続関係は、図２に示した接続関係に限られず、他の接続関係であってもよい。 As shown in FIG. 2, the control unit 15 has an acquisition unit 15a, a classification unit 15b, a generation unit 15c, a selection unit 15d, a display unit 15e, and a learning unit 15f, and realizes or executes the functions and actions of the information processing described below. Note that the internal configuration of the control unit 15 is not limited to the configuration shown in FIG. 2, and may be other configurations as long as they perform the information processing described below. Also, the connection relationships between the processing units in the control unit 15 are not limited to the connection relationships shown in FIG. 2, and may be other connection relationships.

（２－５－１．取得部１５ａ）
取得部１５ａは、質問サイトにおいて投稿された投稿情報を取得する。例えば、取得部１５ａは、質問サイトを管理するサーバから、投稿情報である質問サイト投稿文２０を取得する。また、取得部１５ａは、企業名データベースから、企業の情報である事業者情報を取得する。また、取得部１５ａは、インターネット百科事典等の外部リソースから、外部リソース情報を取得する。 (2-5-1. Acquisition unit 15a)
The acquisition unit 15a acquires posted information posted on the question site. For example, the acquisition unit 15a acquires the question site posts 20, which are posted information, from a server that manages the question site. The acquisition unit 15a also acquires business information, which is information on companies, from a company name database. The acquisition unit 15a also acquires external resource information from external resources such as Internet encyclopedias.

なお、取得部１５ａは、取得した投稿情報、事業者情報および外部リソース情報を収集情報記憶部１４ａに格納する。 The acquisition unit 15a stores the acquired posted information, business information, and external resource information in the collected information storage unit 14a.

（２－５－２．分類部１５ｂ）
分類部１５ｂは、投稿情報と所定のサービスにおける有用性とを学習した機械学習モデル１４ｄを用いて、取得された投稿情報を分類する。例えば、分類部１５ｂは、質問サイト投稿文２０と就職活動支援サイトにおける有用性とをアノテーションデータを用いて学習した分類モデル１４ｄを用いて、取得された質問サイト投稿文２０を「就職関連」または「就職非関連」のいずれかに分類する。 (2-5-2. Classification unit 15b)
The classification unit 15b classifies the acquired posted information using the machine learning model 14d that has learned about the posted information and its usefulness in a predetermined service. For example, the classification unit 15b classifies the acquired question site posted message 20 into either "employment-related" or "non-employment-related" using the classification model 14d that has learned about the question site posted message 20 and its usefulness in a job-hunting support site using annotation data.

なお、分類部１５ｂは、分類対象となる質問サイト投稿文２０を収集情報記憶部１４ａから取得する。一方、分類部１５ｂは、分類結果である質問文ごとに有用判定が示された分類リスト１４ｂ－１を処理結果記憶部１４ｂに格納する。 The classification unit 15b acquires the question site posts 20 to be classified from the collected information storage unit 14a. Meanwhile, the classification unit 15b stores in the processing result storage unit 14b a classification list 14b-1 in which the usefulness judgment is indicated for each question, which is the classification result.

（２－５－３．生成部１５ｃ）
生成部１５ｃは、機械学習モデル１４ｄの学習に用いる学習データと、所定の対象に関連する投稿情報から機械学習モデル１４ｄを用いて出力された分類結果とに基づいて、当該対象に関連する投稿情報に含まれる文字列のうち、当該投稿情報が提供されたサービスとは異なるサービスにおいて有用である有用文字列（有用キーワード）、および有用でない非有用文字列（非有用キーワード）を抽出し、当該対象を示す対象情報と、有用文字列および非有用文字列とを対応付けた文字列情報を生成する。例えば、生成部１５ｃは、文字列情報として、対象に関するコンテンツを表示する際に、当該コンテンツ内に配置して表示する投稿情報の選択に用いられる情報を生成する。 (2-5-3. Generation unit 15c)
Based on the learning data used for training the machine learning model 14d and on the classification result output by the machine learning model 14d for posted information related to a predetermined target, the generation unit 15c extracts, from among the character strings included in the posted information related to the target, useful strings (useful keywords) that are useful in a service different from the service on which the posted information was provided, and non-useful strings (non-useful keywords) that are not useful in that service. The generation unit 15c then generates string information in which target information indicating the target is associated with the useful strings and the non-useful strings. For example, the generation unit 15c generates, as the string information, information used to select the posted information to be placed and displayed within content related to the target when that content is displayed.

具体的な例を挙げて説明すると、生成部１５ｃは、就職に関する分類モデル１４ｄの学習に用いるアノテーションデータと、企業「Ｙ社」に関連する質問サイト投稿文２０から分類モデル１４ｄを用いて出力された分類リスト１４ｂ－１とに基づいて、「Ｙ社」に関連する質問サイト投稿文２０に含まれる単語のうち、就職活動支援サイトにおいて有用である有用キーワード、および有用でない非有用キーワードを抽出し、「Ｙ社」を示す企業ＩＤ「２３」と、有用キーワードおよび非有用キーワードとを対応付けたキーワード辞書１４ｂ－２を生成する。 To explain this with a specific example, based on the annotation data used to train the employment-related classification model 14d and on the classification list 14b-1 output by the classification model 14d for the question site posts 20 related to the company "Company Y," the generation unit 15c extracts, from among the words contained in the question site posts 20 related to "Company Y," useful keywords that are useful on the job hunting support site and non-useful keywords that are not useful. The generation unit 15c then generates a keyword dictionary 14b-2 that associates the company ID "23" indicating "Company Y" with the useful keywords and the non-useful keywords.

有用キーワードおよび非有用キーワードの抽出処理について説明すると、生成部１５ｃは、文字列の出現頻度または相互情報量を含む判定値を算出し、学習データの正例において判定値が所定の閾値未満であって、分類結果の負例において判定値が所定の閾値未満であって、かつ分類結果の正例において判定値が所定の閾値以上である文字列を有用文字列として抽出し、文字列情報を生成する。すなわち、生成部１５ｃは、全体の企業における就職関連を示す質問文に少なく、かつ特定の企業における就職関連を示す質問文に顕著に多い単語を、企業ごとの特徴を示す就職関連のキーワードである有用キーワードとして抽出する。 Explaining the process of extracting useful keywords and non-useful keywords, the generation unit 15c calculates a judgment value including the frequency of occurrence or mutual information of a string, extracts as useful strings strings whose judgment values are less than a predetermined threshold in positive examples of the learning data, whose judgment values are less than the predetermined threshold in negative examples of the classification results, and whose judgment values are equal to or greater than the predetermined threshold in positive examples of the classification results, and generates string information. That is, the generation unit 15c extracts words that are rare in questions indicating employment-related matters at all companies and that are significantly more common in questions indicating employment-related matters at a specific company, as useful keywords that are employment-related keywords that indicate the characteristics of each company.

一方、生成部１５ｃは、文字列の出現頻度または相互情報量を含む判定値を算出し、学習データの負例において判定値が所定の閾値未満であって、分類結果の正例において判定値が所定の閾値未満であって、かつ分類結果の負例において判定値が所定の閾値以上である文字列を非有用文字列として抽出し、文字列情報を生成する。すなわち、生成部１５ｃは、全体の企業における就職非関連を示す質問文に少なく、かつ特定の企業における就職非関連を示す質問文に顕著に多い単語を、企業ごとの特徴を示す就職非関連のキーワードである非有用キーワードとして抽出する。 Meanwhile, the generation unit 15c calculates a judgment value including the frequency of occurrence or mutual information of a string, extracts strings whose judgment value is less than a predetermined threshold in negative examples of the learning data, whose judgment value is less than the predetermined threshold in positive examples of the classification results, and whose judgment value is equal to or greater than the predetermined threshold in negative examples of the classification results, as non-useful strings, and generates string information. That is, the generation unit 15c extracts words that are rare in questions indicating non-employment-related topics at all companies and that are significantly more common in questions indicating non-employment-related topics at a specific company, as non-useful keywords that are non-employment-related keywords that indicate the characteristics of each company.
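The judgment value described above may be based on mutual information rather than raw frequency. As one hedged illustration, the sketch below computes pointwise mutual information (PMI) between the presence of a word and membership in a class of questions. PMI is a pointwise variant chosen here for simplicity and is an assumption, since the embodiment does not specify which formulation is used. The function name, the base-2 logarithm, and the use of negative infinity when the word never occurs in the class are likewise hypothetical choices.

```python
import math


def pointwise_mutual_information(word, docs_in_class, docs_out_of_class):
    """PMI (in bits) between "word is present" and "question is in the class".

    docs_in_class / docs_out_of_class: lists of tokenized questions.
    Returns -inf when the word never occurs in the class (assumed convention).
    """
    n_in = len(docs_in_class)
    total = n_in + len(docs_out_of_class)
    with_word_in = sum(1 for d in docs_in_class if word in d)
    with_word_all = with_word_in + sum(1 for d in docs_out_of_class if word in d)
    if total == 0 or n_in == 0 or with_word_in == 0:
        return float("-inf")
    p_joint = with_word_in / total   # P(word present, in class)
    p_word = with_word_all / total   # P(word present)
    p_class = n_in / total           # P(in class)
    return math.log2(p_joint / (p_word * p_class))
```

A positive value indicates that the word occurs in the class more often than independence would predict, so such a value could be compared with the predetermined threshold in the same way as an occurrence frequency.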

なお、生成部１５ｃは、学習データであるアノテーションデータを学習データ記憶部１４ｃから取得する。また、生成部１５ｃは、所定の対象に関連する投稿情報である質問サイト投稿文２０を収集情報記憶部１４ａから取得する。一方、生成部１５ｃは、生成結果の文字列情報であるキーワード辞書１４ｂ－２を処理結果記憶部１４ｂに格納する。 The generation unit 15c acquires the annotation data, which is learning data, from the learning data storage unit 14c. The generation unit 15c also acquires the question site posts 20, which are posted information related to a predetermined target, from the collected information storage unit 14a. Meanwhile, the generation unit 15c stores the keyword dictionary 14b-2, which is the string information produced as the generation result, in the processing result storage unit 14b.

（２－５－４．選択部１５ｄ）
選択部１５ｄは、所定のサービスにおいて投稿情報に関する表示情報を表示する際に有用である有用文字列（有用キーワード）、および／または有用でない非有用文字列（非有用キーワード）に基づいて、所定の対象と当該投稿情報とを対応付けた表示情報を選択する。例えば、選択部１５ｄは、就職活動支援サイトにおける有用キーワード、および非有用キーワードのうち少なくとも１つに基づいて、就職活動支援サイトに登録された事業者（企業）ごとに、表示情報であるＱ＆Ａ一覧３１を選択する。 (2-5-4. Selection unit 15d)
The selection unit 15d selects display information in which a predetermined target is associated with the posted information, based on useful character strings (useful keywords) that are useful when displaying display information related to the posted information in a predetermined service, and/or non-useful character strings (non-useful keywords) that are not useful. For example, the selection unit 15d selects a Q&A list 31, which is display information, for each business entity (company) registered on the job hunting support site, based on at least one of useful keywords and non-useful keywords in the job hunting support site.

選択部１５ｄは、有用性があると分類された投稿情報から、表示情報を選択する。例えば、選択部１５ｄは、分類部１５ｂによって就職活動支援サイトにおいて有用である「就職関連」と分類された質問サイト投稿文２０から、企業ごとにＱ＆Ａ一覧３１を選択する。このとき、選択部１５ｄは、企業名を含む質問サイト投稿文２０を当該企業ＩＤと紐づけて企業ごとの選択リスト１４ｂ－３を作成し、当該選択リスト１４ｂ－３のうち分類部１５ｂによって「就職非関連」と分類された質問サイト投稿文２０を削除し、企業ごとにＱ＆Ａ一覧３１を選択する。 The selection unit 15d selects display information from the posted information classified as useful. For example, the selection unit 15d selects a Q&A list 31 for each company from the posts 20 to the question site classified by the classification unit 15b as "employment-related," which is useful for a job-hunting support site. At this time, the selection unit 15d creates a selection list 14b-3 for each company by linking the posts 20 to the question site that include a company name with the company ID, deletes the posts 20 to the question site that are classified by the classification unit 15b as "non-employment-related" from the selection list 14b-3, and selects a Q&A list 31 for each company.
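The two steps just described, linking posts to companies by company name and then deleting posts classified as "non-employment-related," might be sketched as follows. This is a minimal sketch under stated assumptions: posts are plain strings, a company is matched by simple substring search on its name, and the function name build_selection_lists and the dictionary layouts are hypothetical.

```python
def build_selection_lists(posts, company_names, is_employment_related):
    """Build a per-company selection list of post IDs.

    posts: dict mapping post ID -> post text.
    company_names: dict mapping company ID -> company name.
    is_employment_related: dict mapping post ID -> bool (classification result).
    All parameter names and layouts are illustrative assumptions.
    """
    # Step 1: link each post to every company whose name appears in it.
    selection = {company_id: [] for company_id in company_names}
    for post_id, text in posts.items():
        for company_id, name in company_names.items():
            if name in text:
                selection[company_id].append(post_id)
    # Step 2: delete posts classified as "non-employment-related".
    for company_id in selection:
        selection[company_id] = [
            post_id
            for post_id in selection[company_id]
            if is_employment_related.get(post_id, False)
        ]
    return selection
```

Posts that have no classification result are treated as non-employment-related in this sketch, which is a conservative choice rather than something the embodiment prescribes.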

さらに、選択部１５ｄは、有用キーワードが含まれる投稿情報を表示情報として選択し、非有用キーワードが含まれる投稿情報を表示情報として選択しない。例えば、選択部１５ｄは、企業名をもとに作成された企業ごとの選択リスト１４ｂ－３の有用キーワードが含まれていない質問文のうち、企業ＩＤと紐づけされている質問文があれば、当該企業ＩＤとの紐づけから外す。一方、選択部１５ｄは、企業名をもとに作成された企業ごとの選択リスト１４ｂ－３の非有用キーワードが含まれている質問文のうち、企業ＩＤと紐づけされている質問文があれば、当該企業ＩＤとの紐づけから外す。このとき、選択部１５ｄは、有用キーワードと非有用キーワードとのうち、いずれか一方を用いて選択処理を実行することができる。すなわち、選択部１５ｄは、有用キーワードおよび非有用キーワードが両方含まれている質問文がある場合には、有用キーワードを優先して紐づけしてもよいし、非有用キーワードを優先して紐づけから外してもよい。 Furthermore, the selection unit 15d selects posted information that includes a useful keyword as display information, and does not select posted information that includes a non-useful keyword as display information. For example, in the selection list 14b-3 created for each company based on the company name, if a question that does not include any useful keyword is linked to a company ID, the selection unit 15d removes the link between that question and the company ID. Likewise, if a question in the selection list 14b-3 that includes a non-useful keyword is linked to a company ID, the selection unit 15d removes the link between that question and the company ID. At this time, the selection unit 15d can execute the selection process using only one of the useful keywords and the non-useful keywords. That is, when a question includes both a useful keyword and a non-useful keyword, the selection unit 15d may give priority to the useful keyword and keep the link, or may give priority to the non-useful keyword and remove the link.
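The keyword-based selection described above might be sketched as follows for the questions linked to a single company. The sketch assumes that questions are already tokenized and that both keyword sets are applied together, and the prefer_useful flag models the choice of which keyword type takes priority when a question contains both. The function name and parameters are hypothetical.

```python
def select_questions(questions, useful, non_useful, prefer_useful=True):
    """Keep only the questions that should stay linked to the company.

    questions: list of tokenized questions (lists of words).
    useful / non_useful: sets of keywords registered for the company.
    prefer_useful: tie-break when a question contains both keyword types.
    All names are illustrative assumptions.
    """
    selected = []
    for tokens in questions:
        words = set(tokens)
        has_useful = bool(words & useful)
        has_non_useful = bool(words & non_useful)
        if has_useful and has_non_useful:
            # Both types present: keep or drop according to the priority.
            keep = prefer_useful
        else:
            # Otherwise keep only questions with a useful keyword
            # and without any non-useful keyword.
            keep = has_useful and not has_non_useful
        if keep:
            selected.append(tokens)
    return selected
```

For example, with useful keywords {"job"} and non-useful keywords {"update"}, a question containing both words is kept when prefer_useful is True and dropped when it is False.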

なお、選択部１５ｄは、投稿情報である質問サイト投稿文２０を収集情報記憶部１４ａから取得する。また、選択部１５ｄは、分類結果である分類リスト１４ｂ－１を処理結果記憶部１４ｂから取得する。一方、選択部１５ｄは、選択結果である企業ごとの最終的な選択リスト１４ｂ－３、すなわち企業ごとのＱ＆Ａ一覧３１を処理結果記憶部１４ｂに格納する。 The selection unit 15d acquires the posted information, ie, the question site posts 20, from the collected information storage unit 14a. The selection unit 15d also acquires the classification list 14b-1, which is the classification result, from the processing result storage unit 14b. On the other hand, the selection unit 15d stores the final selection list 14b-3 for each company, which is the selection result, i.e., the Q&A list 31 for each company, in the processing result storage unit 14b.

（２－５－５．表示部１５ｅ）
表示部１５ｅは、表示情報を表示する。例えば、表示部１５ｅは、就職活動支援サイト表示画面３０上に、企業ごとにＱ＆Ａ一覧３１を表示する。また、表示部１５ｅは、就職活動支援サイト表示画面３０上に、共通する質問文に出現する企業名を表示情報として表示する。なお、表示部１５ｅは、企業ごとのＱ＆Ａ一覧３１等の表示情報を処理結果記憶部１４ｂから取得する。また、表示部１５ｅは、企業ごとのＱ＆Ａ一覧３１等の表示情報を、図示しない事業者端末やデータベースに送信してもよい。 (2-5-5. Display unit 15e)
The display unit 15e displays the display information. For example, the display unit 15e displays a Q&A list 31 for each company on the job hunting support site display screen 30. Furthermore, the display unit 15e displays the names of companies appearing in common questions as display information on the job hunting support site display screen 30. The display unit 15e acquires the display information such as the Q&A list 31 for each company from the processing result storage unit 14b. Furthermore, the display unit 15e may transmit the display information such as the Q&A list 31 for each company to a business operator terminal or a database (not shown).

（２－５－６．学習部１５ｆ）
学習部１５ｆは、収集された質問文に対して当該質問文が就職に関する有用な情報であるか否かのラベルが付与されたアノテーションデータを用いて、入力された質問サイト投稿文２０が「就職関連」または「就職非関連」のいずれであるかの分類結果を出力するように、機械学習モデル１４ｄの学習を行う。このとき、学習部１５ｆは、バックプロパゲーション等により機械学習モデル１４ｄの学習を行ってもよい。また、学習部１５ｆは、複数の機械学習モデル１４ｄの学習を行うこともできる。 (2-5-6. Learning unit 15f)
The learning unit 15f trains the machine learning model 14d, using annotation data in which each collected question is labeled as to whether it is useful information regarding employment, so that the model outputs a classification result indicating whether an input question site post 20 is "employment-related" or "non-employment-related". At this time, the learning unit 15f may train the machine learning model 14d by backpropagation or the like. The learning unit 15f may also train multiple machine learning models 14d.

〔３．情報処理の具体例〕
続いて、実施形態に係る情報処理の具体例について説明する。以下では、情報処理装置１０の有用キーワードを用いた選択処理、非有用キーワードを用いた選択処理、有用キーワードおよび非有用キーワードの抽出処理の具体例について説明する。 [3. Specific examples of information processing]
Next, specific examples of the information processing according to the embodiment will be described. Specific examples of the selection process using useful keywords, the selection process using non-useful keywords, and the extraction process of useful keywords and non-useful keywords performed by the information processing device 10 are described below.

（３－１．有用キーワードを用いた選択処理）
図３を用いて、実施形態に係る有用キーワードを用いた選択処理について説明する。図３は、実施形態に係る選択処理の具体例１を示す図である。以下では、作業服販売会社「Ｗ社」の就職関連の質問文に含まれる有用キーワードの例について説明する。 (3-1. Selection process using useful keywords)
A selection process using useful keywords according to the embodiment will be described with reference to Fig. 3. Fig. 3 is a diagram showing a specific example 1 of the selection process according to the embodiment. Below, an example of useful keywords included in employment-related questions about the workwear sales company "Company W" will be described.

図３に示すように、「Ｗ社」の有用キーワードとして「求人」等が登録されている場合（図３（１）参照）、情報処理装置１０は、「求人」が含まれている質問文を「Ｗ社」の就職関連の質問文として選択する（図３（２）参照）。すなわち、情報処理装置１０は、「求人」というキーワードが「Ｗ社」の企業名が含まれる質問文において、就職関連の質問文である可能性が高いと判定する。図３の例で示すように、有用キーワードが含まれる質問文は、所定のサービス（例：就職活動支援サイト）に有用である可能性が高いことを示す。 As shown in FIG. 3, when "job vacancies" and the like are registered as useful keywords for "Company W" (see FIG. 3 (1)), the information processing device 10 selects questions containing "job vacancies" as employment-related questions for "Company W" (see FIG. 3 (2)). In other words, the information processing device 10 determines that a question that includes the company name "Company W" and the keyword "job vacancies" is highly likely to be an employment-related question. As the example of FIG. 3 shows, a question that contains a useful keyword is highly likely to be useful for a predetermined service (e.g., a job hunting support site).

（３－２．非有用キーワードを用いた選択処理）
図４を用いて、実施形態に係る非有用キーワードを用いた選択処理について説明する。図４は、実施形態に係る選択処理の具体例２を示す図である。以下では、ソフトウェア開発会社「Ｍ社」の就職非関連の質問文に含まれる非有用キーワードの例について説明する。 (3-2. Selection process using non-useful keywords)
A selection process using non-useful keywords according to the embodiment will be described with reference to Fig. 4. Fig. 4 is a diagram showing a specific example 2 of the selection process according to the embodiment. Below, an example of non-useful keywords included in non-employment-related questions about the software development company "Company M" will be described.

図４に示すように、「Ｍ社」の非有用キーワードとして「アップデート」、「表計算ソフトＥ」等が登録されている場合（図４（１）参照）、情報処理装置１０は、「アップデート」、「表計算ソフトＥ」が含まれている質問文を「Ｍ社」の就職非関連の質問文として選択する（図４（２）参照）。すなわち、情報処理装置１０は、「Ｍ社」のサービスに関連する「アップデート」や、「Ｍ社」の製品である「表計算ソフトＥ」というキーワードが「Ｍ社」の企業名が含まれる質問文において、就職非関連の質問文である可能性が高いと判定する。図４の例で示すように、非有用キーワードが含まれる質問文は、所定のサービス（例：就職活動支援サイト）に有用でない可能性が高いことを示す。 As shown in FIG. 4, when "update", "spreadsheet software E", etc. are registered as non-useful keywords for "Company M" (see FIG. 4 (1)), the information processing device 10 selects questions containing "update" or "spreadsheet software E" as non-employment-related questions for "Company M" (see FIG. 4 (2)). That is, the information processing device 10 determines that a question that includes the company name "Company M" together with a keyword such as "update", which relates to the services of "Company M", or "spreadsheet software E", which is a product of "Company M", is highly likely to be a non-employment-related question. As the example of FIG. 4 shows, a question that contains a non-useful keyword is highly likely not to be useful for a predetermined service (e.g., a job hunting support site).

（３－３．有用キーワードおよび非有用キーワードの抽出処理）
図５を用いて、実施形態に係る有用キーワードおよび非有用キーワードの抽出処理について説明する。図５は、実施形態に係る抽出処理の具体例を示す図である。ここで、アノテーションデータの正例は、全体の企業における「就職関連」とラベル付けされた質問文であり、アノテーションデータの負例は、全体の企業における「就職非関連」とラベル付けされた質問文であり、仮ラベルデータの正例は、特定の企業における「就職関連」と仮ラベル付けされた質問文であり、仮ラベルデータの負例は、特定の企業における「就職非関連」と仮ラベル付けされた質問文である。 (3-3. Extraction process of useful and non-useful keywords)
The extraction process of useful keywords and non-useful keywords according to the embodiment will be described with reference to Fig. 5. Fig. 5 is a diagram showing a specific example of the extraction process according to the embodiment. Here, a positive example of annotation data is a question sentence labeled as "employment-related" for all companies, a negative example of annotation data is a question sentence labeled as "non-employment-related" for all companies, a positive example of temporary label data is a question sentence temporarily labeled as "employment-related" for a specific company, and a negative example of temporary label data is a question sentence temporarily labeled as "non-employment-related" for a specific company.

図５に示すように、情報処理装置１０は、アノテーションデータの正例において少なく、仮ラベルデータの負例において少なく、かつ仮ラベルデータの正例において多く含まれる単語を有用キーワードとして抽出する。すなわち、情報処理装置１０は、全体の企業における就職関連を示す質問文に少なく、かつ特定の企業における就職関連を示す質問文に顕著に多い単語を、企業ごとの特徴を示す就職関連のキーワードである有用キーワードとして抽出する。 As shown in FIG. 5, the information processing device 10 extracts as useful keywords words that are rare in positive examples of annotation data, rare in negative examples of temporary label data, and common in positive examples of temporary label data. In other words, the information processing device 10 extracts words that are rare in questions related to employment at all companies, and that are noticeably common in questions related to employment at a specific company, as useful keywords that are employment-related keywords that indicate the characteristics of each company.

一方、情報処理装置１０は、アノテーションデータの負例において少なく、仮ラベルデータの正例において少なく、かつ仮ラベルデータの負例において多く含まれる単語を非有用キーワードとして抽出する。すなわち、情報処理装置１０は、全体の企業における就職非関連を示す質問文に少なく、かつ特定の企業における就職非関連を示す質問文に顕著に多い単語を、企業ごとの特徴を示す就職非関連のキーワードである非有用キーワードとして抽出する。 On the other hand, the information processing device 10 extracts as non-useful keywords words that are rare in negative examples of the annotation data, rare in positive examples of the temporary label data, and common in negative examples of the temporary label data. In other words, the information processing device 10 extracts words that are rare in questions indicating non-employment-related issues across all companies, and that are significantly common in questions indicating non-employment-related issues at a specific company, as non-useful keywords that are non-employment-related keywords that indicate the characteristics of each company.
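The extraction conditions of Fig. 5 might be sketched as follows. The sketch assumes that the judgment value is the fraction of questions in a set that contain the word, and that one common threshold separates "rare" from "common". The embodiment only states that a predetermined threshold is used, so the value 0.6, the function names, and the parameter names are hypothetical.

```python
def doc_frequency(word, docs):
    """Fraction of tokenized questions in docs that contain word."""
    if not docs:
        return 0.0
    return sum(1 for d in docs if word in d) / len(docs)


def extract_keywords(ann_pos, ann_neg, tmp_pos, tmp_neg, threshold=0.6):
    """Return (useful, non_useful) keyword sets for one company.

    ann_pos / ann_neg: annotation data over all companies (positive / negative).
    tmp_pos / tmp_neg: temporarily labeled questions of the specific company.
    threshold: assumed common cut-off between "rare" and "common".
    """
    vocabulary = set().union(*tmp_pos, *tmp_neg)
    useful, non_useful = set(), set()
    for word in vocabulary:
        # Useful: rare in general employment-related questions, rare in the
        # company's non-employment questions, common in its employment ones.
        if (doc_frequency(word, ann_pos) < threshold
                and doc_frequency(word, tmp_neg) < threshold
                and doc_frequency(word, tmp_pos) >= threshold):
            useful.add(word)
        # Non-useful: the mirror-image condition.
        if (doc_frequency(word, ann_neg) < threshold
                and doc_frequency(word, tmp_pos) < threshold
                and doc_frequency(word, tmp_neg) >= threshold):
            non_useful.add(word)
    return useful, non_useful
```

In this sketch, a word that is also common in the annotation data for all companies is excluded, so the extracted keywords are those that characterize the specific company.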

〔４．情報処理の流れ〕
図６を用いて、実施形態に係る情報処理装置１０の情報処理の手順について説明する。図６は、実施形態に係る情報処理の流れの一例を示すフローチャートである。なお、下記のステップＳ１０１～Ｓ１０５は、異なる順序で実行することもできる。また、下記のステップＳ１０１～Ｓ１０５のうち、省略される処理があってもよい。 [4. Information processing flow]
The procedure of information processing of the information processing device 10 according to the embodiment will be described with reference to Fig. 6. Fig. 6 is a flowchart showing an example of the flow of information processing according to the embodiment. Note that the following steps S101 to S105 may be executed in a different order. Also, among the following steps S101 to S105, some processing may be omitted.

（４－１．投稿情報取得処理）
第１に、情報処理装置１０の取得部１５ａは、投稿情報取得処理を実行する（ステップＳ１０１）。例えば、取得部１５ａは、質問サイトサーバから、投稿された質問サイト投稿文２０を取得する。 (4-1. Posted Information Acquisition Process)
First, the acquisition unit 15a of the information processing device 10 executes a posted information acquisition process (step S101). For example, the acquisition unit 15a acquires the posted question site posts 20 from the question site server.

（４－２．投稿情報分類処理）
第２に、情報処理装置１０の分類部１５ｂは、投稿情報分類処理を実行する（ステップＳ１０２）。例えば、分類部１５ｂは、就職に関する分類モデル１４ｄを用いて、質問サイト投稿文２０を就職関連、または就職非関連に分類する。 (4-2. Posted Information Classification Processing)
Second, the classification unit 15b of the information processing device 10 executes a posted information classification process (step S102). For example, the classification unit 15b classifies the question site posted message 20 into employment-related or non-employment-related messages using the employment-related classification model 14d.

（４－３．文字列情報生成処理）
第３に、情報処理装置１０の生成部１５ｃは、文字列情報生成処理を実行する（ステップＳ１０３）。例えば、生成部１５ｃは、アノテーションデータと企業ごとの質問サイト投稿文２０とから、就職に関する分類モデル１４ｄを用いて、企業ごとに有用キーワードおよび非有用キーワードを紐づけしたキーワード辞書１４ｂ－２を生成する。 (4-3. Character String Information Generation Process)
Third, the generating unit 15c of the information processing device 10 executes a character string information generating process (step S103). For example, the generating unit 15c generates a keyword dictionary 14b-2 in which useful keywords and non-useful keywords are linked for each company from the annotation data and the question site posts 20 for each company, using the classification model 14d related to employment.

（４－４．投稿情報選択処理）
第４に、情報処理装置１０の選択部１５ｄは、投稿情報選択処理を実行する（ステップＳ１０４）。例えば、選択部１５ｄは、キーワード辞書１４ｂ－２をもとに、企業ごとに選択リスト１４ｂ－３を作成し、就職活動支援サイトに表示するＱ＆Ａ一覧３１を生成する。 (4-4. Posted Information Selection Process)
Fourth, the selection unit 15d of the information processing device 10 executes a posted information selection process (step S104). For example, the selection unit 15d creates a selection list 14b-3 for each company based on the keyword dictionary 14b-2, and generates a Q&A list 31 to be displayed on the job hunting support site.

（４－５．投稿情報表示処理）
第５に、情報処理装置１０の表示部１５ｅは、投稿情報表示処理を実行し（ステップＳ１０５）、処理を終了する。例えば、表示部１５ｅは、企業ごとにＱ＆Ａ一覧３１を含む就職活動支援サイト表示画面３０を表示する。 (4-5. Posted Information Display Processing)
Fifth, the display unit 15e of the information processing device 10 executes the posted information display process (step S105), and ends the process. For example, the display unit 15e displays the job hunting support site display screen 30 including the Q&A list 31 for each company.
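The flow of steps S101 to S105 might be sketched as a simple sequence of calls. In this sketch each processing unit is passed in as a callable, so that steps can be replaced or reordered as the embodiment permits. The function name run_information_processing and all parameter names are hypothetical.

```python
def run_information_processing(fetch_posts, classify_post, generate_dictionary,
                               select_posts, display):
    """Run steps S101 to S105 in order and return the per-company Q&A lists.

    Each argument is a callable standing in for one processing unit;
    the signatures used here are illustrative assumptions.
    """
    # S101: acquire the posts from the question site (dict: post ID -> text).
    posts = fetch_posts()
    # S102: classify each post as employment-related (True) or not (False).
    labels = {post_id: classify_post(text) for post_id, text in posts.items()}
    # S103: generate the keyword dictionary for each company.
    dictionary = generate_dictionary(posts, labels)
    # S104: select the posts that form the Q&A list of each company.
    qa_lists = select_posts(posts, labels, dictionary)
    # S105: display the Q&A lists.
    display(qa_lists)
    return qa_lists
```

Passing the units as callables keeps the sketch independent of how each step is implemented, for example a trained classification model for S102 or a threshold-based extractor for S103.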

〔５．実施形態の効果〕
最後に、実施形態の効果について説明する。以下では、実施形態に係る処理に対応する効果１～７について説明する。 5. Effects of the embodiment
Finally, the effects of the embodiment will be described. Effects 1 to 7, which correspond to the processing according to the embodiment, are described below.

（５－１．効果１）
上述した実施形態に係る処理では、質問サイトにおいて投稿された投稿情報を取得し、所定の対象と当該投稿情報とを対応付けた表示情報を選択し、当該表示情報を表示する。このため、本処理では、利用者に対して効率的に情報を提供することができる。 (5-1. Effect 1)
In the process according to the embodiment described above, posted information posted on a question site is acquired, display information in which a predetermined target is associated with the posted information is selected, and the display information is displayed. Therefore, in this process, information can be efficiently provided to users.

（５－２．効果２）
上述した実施形態に係る処理では、投稿情報と所定のサービスにおける有用性とを学習した機械学習モデルを用いて、取得した投稿情報を分類し、有用性があると分類した投稿情報から、表示情報を選択する。このため、本処理では、所定のサービスにおける分類モデルを用いて、利用者に対して効率的に情報を提供することができる。 (5-2. Effect 2)
In the process according to the embodiment described above, the acquired posted information is classified using a machine learning model that has learned about the posted information and its usefulness in a predetermined service, and display information is selected from the posted information classified as useful. Therefore, in this process, information can be efficiently provided to users using a classification model in a predetermined service.

（５－３．効果３）
上述した実施形態に係る処理では、有用文字列が含まれる投稿情報を表示情報として選択し、非有用文字列が含まれる投稿情報を表示情報として選択しない。このため、本処理では、利用者に対して効率的に効果的な情報を提供することができる。 (5-3. Effect 3)
In the process according to the embodiment described above, posted information including useful character strings is selected as display information, and posted information including non-useful character strings is not selected as display information. Therefore, in this process, it is possible to efficiently provide effective information to users.

(5-4. Effect 4)
In the processing according to the embodiment described above, display information that associates businesses registered on a job hunting support site with posted information is selected based on useful character strings, which are useful when displaying display information on the job hunting support site, and/or non-useful character strings, which are not. This processing can therefore provide information efficiently to users of the job hunting support site.

(5-5. Effect 5)
In the processing according to the embodiment described above, useful strings and non-useful strings are extracted from the strings included in posted information related to a predetermined target, based on the learning data used to train the machine learning model and on the classification results that the machine learning model outputs for that posted information. Useful strings are strings that are useful in a service different from the service in which the posted information was provided, and non-useful strings are strings that are not useful there. String information is then generated that associates target information indicating the predetermined target with the useful strings and the non-useful strings. By generating string information effectively for each predetermined target, this processing can provide information to users efficiently.

(5-6. Effect 6)
In the processing according to the embodiment described above, a judgment value that includes the appearance frequency or the mutual information of a character string is calculated. A character string is extracted as a useful character string when its judgment value is less than a predetermined threshold in the positive examples of the learning data, less than the predetermined threshold in the negative examples of the classification result, and equal to or greater than the predetermined threshold in the positive examples of the classification result. Character string information is then generated. By effectively generating character string information that includes useful character strings for each predetermined target, this processing can provide information to users efficiently.
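
The three threshold conditions of Effect 6 can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the embodiment: the judgment value is taken to be the fraction of posts in which a token appears, tokens are obtained by whitespace splitting (Japanese text would need a morphological analyzer instead), a single threshold is shared by all three conditions, and the names `doc_frequency` and `extract_useful` are hypothetical.

```python
from collections import Counter


def doc_frequency(posts):
    """Judgment value (assumed): fraction of posts containing each token."""
    if not posts:
        return {}
    counts = Counter()
    for post in posts:
        counts.update(set(post.lower().split()))
    return {token: n / len(posts) for token, n in counts.items()}


def extract_useful(train_pos, pred_pos, pred_neg, threshold):
    """Tokens that are frequent in predicted positives but rare in both
    the training positives and the predicted negatives."""
    f_train_pos = doc_frequency(train_pos)
    f_pred_pos = doc_frequency(pred_pos)
    f_pred_neg = doc_frequency(pred_neg)
    return sorted(
        token
        for token, value in f_pred_pos.items()
        if value >= threshold
        and f_train_pos.get(token, 0.0) < threshold
        and f_pred_neg.get(token, 0.0) < threshold
    )


# Hypothetical data for a hypothetical target "acme".
train_pos = ["salary interview", "interview tips", "salary negotiation"]
pred_pos = ["acme overtime interview", "acme overtime pay", "overtime at acme"]
pred_neg = ["acme stock price", "acme product review"]
print(extract_useful(train_pos, pred_pos, pred_neg, threshold=0.5))
# prints ['overtime']
```

In this example "acme" is rejected because it is also frequent in the predicted negatives, while "overtime" passes all three conditions.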

(5-7. Effect 7)
In the processing according to the embodiment described above, a judgment value that includes the appearance frequency or the mutual information of a character string is calculated. A character string is extracted as a non-useful character string when its judgment value is less than a predetermined threshold in the negative examples of the learning data, less than the predetermined threshold in the positive examples of the classification result, and equal to or greater than the predetermined threshold in the negative examples of the classification result. The character string information is then generated. By effectively generating character string information that includes non-useful character strings for each predetermined target, this processing can provide information to users efficiently.
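
Effect 7 mirrors the rule for useful character strings, and its result feeds the character string information for a target. The sketch below illustrates both steps under assumptions that are not in the source: the judgment value is the fraction of posts containing a whitespace-separated token, one threshold is used for all conditions, and the record layout returned by the hypothetical `build_string_info` is only one possible representation.

```python
from collections import Counter


def frequency(posts):
    """Judgment value (assumed): fraction of posts containing each token."""
    if not posts:
        return {}
    counts = Counter(tok for post in posts for tok in set(post.lower().split()))
    return {tok: n / len(posts) for tok, n in counts.items()}


def extract_non_useful(train_neg, pred_pos, pred_neg, threshold):
    """Tokens frequent in predicted negatives but rare in both the
    training negatives and the predicted positives."""
    f_train_neg = frequency(train_neg)
    f_pred_pos = frequency(pred_pos)
    return sorted(
        tok
        for tok, value in frequency(pred_neg).items()
        if value >= threshold
        and f_train_neg.get(tok, 0.0) < threshold
        and f_pred_pos.get(tok, 0.0) < threshold
    )


def build_string_info(target, useful, non_useful):
    """Associate target information with both kinds of string (assumed layout)."""
    return {"target": target, "useful": list(useful), "non_useful": list(non_useful)}


# Hypothetical data for a hypothetical target "acme".
train_neg = ["weather today", "lunch menu"]
pred_pos = ["acme interview process", "acme hiring"]
pred_neg = ["acme stock price", "acme stock forecast", "stock split acme"]
non_useful = extract_non_useful(train_neg, pred_pos, pred_neg, threshold=0.6)
print(build_string_info("acme", [], non_useful))
# prints {'target': 'acme', 'useful': [], 'non_useful': ['stock']}
```

Here "acme" is not extracted because it is equally frequent in the predicted positives, so only "stock" is recorded as non-useful for the target.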

[Hardware configuration]
The information processing device 10 according to the embodiment described above is realized, for example, by a computer 1000 configured as shown in Fig. 7. The information processing device 10 is used as the example in the following description. Fig. 7 is a hardware configuration diagram showing an example of a computer that realizes the functions of the information processing device 10. The computer 1000 has a CPU 1100, a RAM 1200, a ROM 1300, an HDD 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

The CPU 1100 operates based on programs stored in the ROM 1300 or the HDD 1400 and controls each component. The ROM 1300 stores a boot program that the CPU 1100 executes when the computer 1000 starts, programs that depend on the hardware of the computer 1000, and the like.

The HDD 1400 stores programs executed by the CPU 1100, data used by those programs, and the like. The communication interface 1500 receives data from other devices via a predetermined communication network and passes it to the CPU 1100, and transmits data generated by the CPU 1100 to other devices via the predetermined communication network.

The CPU 1100 controls output devices such as a display and a printer, and input devices such as a keyboard and a mouse, via the input/output interface 1600. The CPU 1100 acquires data from the input devices via the input/output interface 1600. The CPU 1100 also outputs generated data to the output devices via the input/output interface 1600.

The media interface 1700 reads a program or data stored in a recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads the program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

For example, when the computer 1000 functions as the information processing device 10 according to the embodiment, the CPU 1100 of the computer 1000 realizes the functions of the control unit 15 by executing programs loaded onto the RAM 1200. The CPU 1100 of the computer 1000 reads these programs from the recording medium 1800 and executes them; as another example, it may acquire these programs from another device via a predetermined communication network.

[Others]
Although embodiments of the present application have been described above, the present invention is not limited by the contents of these embodiments. The components described above include those that a person skilled in the art could easily conceive of, those that are substantially the same, and those within the so-called range of equivalents. The components described above can also be combined as appropriate. Furthermore, various omissions, substitutions, or modifications of the components can be made without departing from the gist of the embodiments described above.

Among the processes described in the above embodiments, all or part of the processes described as being performed automatically can also be performed manually, and all or part of the processes described as being performed manually can also be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed as desired unless otherwise specified. For example, the various information shown in each drawing is not limited to the information illustrated.

Each component of each illustrated device is a functional concept and does not necessarily have to be physically configured as illustrated. In other words, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of each device can be functionally or physically distributed or integrated in any unit according to various loads, usage conditions, and the like.

For example, the information processing device 10 described above may be realized by a plurality of server computers, and its configuration can be changed flexibly; depending on the function, it may be realized by calling an external platform or the like through an API (Application Programming Interface), network computing, or the like.

The embodiments and modifications described above can be combined as appropriate to the extent that the processing contents do not contradict each other.

The "unit (section, module, unit)" described above can be read as "means", "circuit", or the like. For example, the control unit can be read as a control means or a control circuit.

REFERENCE SIGNS LIST
10 Information processing device
11 Communication unit
12 Input unit
13 Output unit
14 Memory unit
14a Collected information memory unit
14b Processing result memory unit
14c Learning data memory unit
14d Learning model
15 Control unit
15a Acquisition unit
15b Classification unit
15c Generation unit
15d Selection unit
15e Display unit
15f Learning unit
20 Question site post (posted information)
30 Job hunting support site display screen
31 Q&A list (display information)
100 Information processing system

Claims

1. An information processing apparatus comprising:
a classification unit that classifies acquired posted information using a machine learning model that has learned posted information related to a predetermined target and the usefulness of the posted information on a second website different from a first website on which the posted information is provided; and
a generation unit that, based on learning data used to train the machine learning model and a classification result output by the machine learning model from the posted information related to the target, extracts, from character strings included in the posted information, useful character strings that are useful on the second website and non-useful character strings that are not useful, and generates character string information in which target information indicating a character string that identifies the target is associated with the useful character strings and the non-useful character strings,
wherein the generation unit calculates a judgment value including an appearance frequency of a character string, and generates, as the useful character string, a character string for which the judgment value is less than a predetermined threshold in positive examples of the learning data, less than the predetermined threshold in negative examples of the classification result, and equal to or greater than the predetermined threshold in positive examples of the classification result.

2. An information processing apparatus comprising:
a classification unit that classifies acquired posted information using a machine learning model that has learned posted information related to a predetermined target and the usefulness of the posted information on a second website different from a first website on which the posted information is provided; and
a generation unit that, based on learning data used to train the machine learning model and a classification result output by the machine learning model from the posted information related to the target, extracts, from character strings included in the posted information, useful character strings that are useful on the second website and non-useful character strings that are not useful, and generates character string information in which target information indicating a character string that identifies the target is associated with the useful character strings and the non-useful character strings,
wherein the generation unit calculates a judgment value including an appearance frequency of a character string, and generates, as the non-useful character string, a character string for which the judgment value is less than a predetermined threshold in negative examples of the learning data, less than the predetermined threshold in positive examples of the classification result, and equal to or greater than the predetermined threshold in negative examples of the classification result.

3. The information processing apparatus according to claim 1, wherein the generation unit extracts, from character strings included in posted information that is posted on a question site and is related to businesses registered on a job hunting support site, useful character strings that are useful on the job hunting support site and non-useful character strings that are not useful, and generates the useful character strings and the non-useful character strings in association with the businesses.

4. The information processing apparatus according to claim 1, wherein the generation unit generates, as the character string information, information used for selecting posted information to be arranged and displayed within content related to a target when the content is displayed.

5. An information processing method executed by an information processing device, the method comprising:
a classification step of classifying acquired posted information using a machine learning model that has learned posted information related to a predetermined target and the usefulness of the posted information on a second website different from a first website on which the posted information is provided; and
a generation step of, based on learning data used to train the machine learning model and a classification result output by the machine learning model from the posted information related to the target, extracting, from character strings included in the posted information, useful character strings that are useful on the second website and non-useful character strings that are not useful, and generating character string information in which target information indicating a character string that identifies the target is associated with the useful character strings and the non-useful character strings,
wherein the generation step calculates a judgment value including an appearance frequency of a character string, and generates, as the useful character string, a character string for which the judgment value is less than a predetermined threshold in positive examples of the learning data, less than the predetermined threshold in negative examples of the classification result, and equal to or greater than the predetermined threshold in positive examples of the classification result.

6. An information processing method executed by an information processing device, the method comprising:
a classification step of classifying acquired posted information using a machine learning model that has learned posted information related to a predetermined target and the usefulness of the posted information on a second website different from a first website on which the posted information is provided; and
a generation step of, based on learning data used to train the machine learning model and a classification result output by the machine learning model from the posted information related to the target, extracting, from character strings included in the posted information, useful character strings that are useful on the second website and non-useful character strings that are not useful, and generating character string information in which target information indicating a character string that identifies the target is associated with the useful character strings and the non-useful character strings,
wherein the generation step calculates a judgment value including an appearance frequency of a character string, and generates, as the non-useful character string, a character string for which the judgment value is less than a predetermined threshold in negative examples of the learning data, less than the predetermined threshold in positive examples of the classification result, and equal to or greater than the predetermined threshold in negative examples of the classification result.

7. An information processing program that causes a computer to execute:
a classification step of classifying acquired posted information using a machine learning model that has learned posted information related to a predetermined target and the usefulness of the posted information on a second website different from a first website on which the posted information is provided; and
a generation step of, based on learning data used to train the machine learning model and a classification result output by the machine learning model from the posted information related to the target, extracting, from character strings included in the posted information, useful character strings that are useful on the second website and non-useful character strings that are not useful, and generating character string information in which target information indicating a character string that identifies the target is associated with the useful character strings and the non-useful character strings,
wherein the generation step includes calculating a judgment value including an appearance frequency of a character string, and generating, as the useful character string, a character string for which the judgment value is less than a predetermined threshold in positive examples of the learning data, less than the predetermined threshold in negative examples of the classification result, and equal to or greater than the predetermined threshold in positive examples of the classification result.

8. An information processing program that causes a computer to execute:
a classification step of classifying acquired posted information using a machine learning model that has learned posted information related to a predetermined target and the usefulness of the posted information on a second website different from a first website on which the posted information is provided; and
a generation step of, based on learning data used to train the machine learning model and a classification result output by the machine learning model from the posted information related to the target, extracting, from character strings included in the posted information, useful character strings that are useful on the second website and non-useful character strings that are not useful, and generating character string information in which target information indicating a character string that identifies the target is associated with the useful character strings and the non-useful character strings,
wherein the generation step includes calculating a judgment value including an appearance frequency of a character string, and generating, as the non-useful character string, a character string for which the judgment value is less than a predetermined threshold in negative examples of the learning data, less than the predetermined threshold in positive examples of the classification result, and equal to or greater than the predetermined threshold in negative examples of the classification result.