JP2015018491A

JP2015018491A - Information processing device and method

Info

Publication number: JP2015018491A
Application number: JP2013146556A
Authority: JP
Inventors: Tadashi Emori (江森 正)
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2013-07-12
Filing date: 2013-07-12
Publication date: 2015-01-29
Anticipated expiration: 2033-07-12
Also published as: JP5850886B2

Abstract

PROBLEM TO BE SOLVED: To use the voice of users whose attributes are unknown as teacher data for machine learning. SOLUTION: An information processing device comprises: voice acquisition means that acquires voice uttered by a user; attribute estimation means that estimates an attribute of the acquired voice by using a discriminator trained in advance on input data and teacher data whose attributes are known; operation reception means that receives an operation on an advertisement that is associated with an attribute and has been output to the user; and learning means that updates the parameters of the discriminator by using a pair consisting of the attribute of the operated advertisement and the acquired voice.

Description

The present invention relates to online advertising based on machine learning.

On smart devices, speech recognition is widely used in place of text input for entering search terms and for other purposes. Besides extracting the search term that represents the content of the utterance, the spoken voice can be passed to a discriminator that identifies the user's gender, age, or other attributes; using the estimated attributes when selecting advertisements makes it possible to present advertisements better suited to the user. Conventionally, the teacher data for a discriminator that estimates attributes from voice has been the voice of users whose attributes are already known (see, for example, Patent Documents 1 and 2).

Patent Document 1: JP 2012-029099 A
Patent Document 2: JP 2012-002856 A

However, the number of users whose attributes are known from user registration, access history, or other information is limited. Utterances by the many other users cannot be used as teacher data, which limits how far the accuracy can be improved.

An object of the present invention is to use the voices of users whose attributes are unknown as teacher data for machine learning.

In view of this object, an information processing apparatus according to one aspect (1) of the present invention comprises: voice acquisition means for acquiring voice uttered by a user; attribute estimation means for estimating an attribute from the acquired voice by using a discriminator trained in advance with input data and teacher data whose attributes are known; operation reception means for receiving an operation on an advertisement that is associated with an attribute and has been output to the user; and learning means for updating the parameters of the discriminator by using a pair consisting of the attribute related to the operated advertisement and the acquired voice.

An information processing method according to another aspect (5) of the present invention expresses the above aspect in the method category. In this method, a computer executes: a voice acquisition process of acquiring voice uttered by a user; an attribute estimation process of estimating an attribute from the acquired voice by using a discriminator trained in advance with input data and teacher data whose attributes are known; an operation reception process of receiving an operation on an advertisement that is associated with an attribute and has been output to the user; and a learning process of updating the parameters of the discriminator by using a pair consisting of the attribute related to the operated advertisement and the acquired voice.

In another aspect (2) of the present invention, in any of the above aspects, the attribute is gender.

In another aspect (3) of the present invention, any of the above aspects further comprises speech recognition means for recognizing a word from the acquired voice; the advertisement selection means selects an advertisement to be output from among the advertisements stored in the advertisement storage means on the basis of the recognized word; and the learning means uses the pair of the voice and the attribute associated with the operated advertisement to update the parameters.

In another aspect (4) of the present invention, any of the above aspects further comprises speech recognition means for recognizing a word from the acquired voice; the advertisement selection means selects an advertisement to be output from among the advertisements stored in the advertisement storage means on the basis of at least one of the estimated attribute and the recognized word; and, when the attribute associated with the operated advertisement matches the estimated attribute, the learning means uses the pair of the voice and that attribute to update the parameters.

According to the present invention, the voices of users whose attributes are unknown can be used as teacher data for machine learning.

FIG. 1 is a functional block diagram showing the configuration of an embodiment of the present invention.
FIG. 2 is a diagram showing an example of data (advertisement data) in the embodiment.
FIG. 3 is a diagram showing an example of data (teacher data) in the embodiment.
FIG. 4 is a flowchart showing the processing procedure in the embodiment.
FIG. 5 is a conceptual diagram showing the overall processing in the embodiment.
FIG. 6 is a conceptual diagram showing an example of processing in the embodiment (learning based on a match between the estimated attribute and the advertisement attribute).
FIG. 7 is a conceptual diagram showing an example of processing in the embodiment (learning that prioritizes the advertisement attribute).

Next, a mode for carrying out the present invention (hereinafter the "embodiment") is described with reference to the drawings. Premises already stated in the background art, problem, and other sections are omitted where appropriate.

[1. Configuration]
FIG. 1 shows the configuration of the information processing apparatus of this embodiment (also called "the apparatus 1"). The apparatus 1 is a server device. In response to a search request received from the terminal T by speech recognition, it returns to the terminal T search results accompanied by an advertisement selected on the basis of at least one of the recognized word and an attribute (assumed here to be gender) estimated from the voice using the result of machine learning. It then performs machine learning according to operations on the advertisement it sent (for example, a selection operation such as a mouse click or a screen tap). The apparatus 1 may also be implemented by combining devices or systems for individual functions or means.

The terminal T is a smart device (for example, a smartphone or a tablet PC) that accesses the apparatus 1 via a communication network N (the Internet, a mobile phone network, or the like). There are many terminals T, corresponding to the many users.

The apparatus 1 has a computer configuration, namely an arithmetic control unit 6 such as a CPU, a storage device 7 such as a main memory and an auxiliary storage device, and a communication device 8 (communication equipment, a communication adapter, or the like) for communicating over the communication network N. The terminal T likewise has a computer configuration, though with different specifications (not shown), together with the components of a mobile terminal such as a microphone, a microphone and speaker for calls, and so on (not shown).

In the apparatus 1, the arithmetic control unit 6 executes a computer program (not shown) stored in the storage device 7 to realize each element shown in FIG. 1.

Among the realized elements, the information storage means are not limited to so-called local storage in the apparatus 1 and may be remote storage provided by network computing (the cloud) or the like. The storage means described in this application are the main ones, divided into units chosen for convenience of explanation. An actual storage means may include input/output, management, and other functions associated with storing information; its units may be divided or merged; and other storage means such as a work area may be used as appropriate.

Among the storage means, the advertisement storage means 45 stores advertisements associated with attributes (for example, FIG. 2). The advertisements illustrated in FIG. 2 are search-linked advertisements that also take user attributes into account. For example, the advertisement with advertisement ID "A01" is selected as an advertisement to be output in response to a search request from the terminal T when the request contains the designated word (for example, "tea") or when the user's attribute matches the advertisement attribute (here, "female").
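The selection rule for the FIG. 2 example (designated word OR matching advertisement attribute) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Advertisement` record, the `select_ads` function, the second advertisement "A02", and the example.com URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Advertisement:
    ad_id: str
    designated_words: frozenset  # words that trigger the ad, e.g. {"tea"}
    ad_attributes: frozenset     # "advertisement attributes", e.g. {"female"}
    text: str                    # advertisement text shown to the user
    url: str                     # hyperlink target (placeholder URL here)


def select_ads(ads, recognized_words, estimated_attribute=None):
    """Select ads whose designated word is among the recognized words, or whose
    advertisement attributes include the estimated user attribute."""
    words = set(recognized_words)
    selected = []
    for ad in ads:
        keyword_hit = bool(ad.designated_words & words)
        attribute_hit = estimated_attribute in ad.ad_attributes
        if keyword_hit or attribute_hit:
            selected.append(ad)
    return selected


# Hypothetical contents of the advertisement storage means 45.
ads = [
    Advertisement("A01", frozenset({"tea"}), frozenset({"female"}),
                  "Tea, free shipping", "https://example.com/tea"),
    Advertisement("A02", frozenset({"golf"}), frozenset({"male"}),
                  "Golf clubs on sale", "https://example.com/golf"),
]
```

Because one advertisement may carry several advertisement attributes, the sketch stores them as a set rather than a single string.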

In the above example, the output advertisement is advertisement text (for example, "Tea, free shipping...") with a hyperlink to a URL (http://tea...). An attribute associated with an advertisement as an output condition in this way is called an "advertisement attribute". One advertisement may be associated with multiple advertisement attributes.

The teacher data storage means 75 stores teacher data in the broad sense, that is, the data used for learning in machine learning (for example, FIG. 3). Each unit of teacher data is a pair consisting of uttered voice data as the input data and the attribute that should ideally be output when that voice data is given, that is, teacher data in the narrow sense.

As illustrated in FIG. 3, the teacher data includes existing data used in the previous learning (in FIG. 3, an "addition flag" of "0" denotes existing data) and additional data added after the previous learning (an "addition flag" of "1" denotes new data).
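One way to represent the FIG. 3 records is sketched below, assuming each utterance has already been reduced to a list of per-frame feature values. The `TeacherRecord` class, the `split_by_flag` helper, and the numeric values are hypothetical illustrations, not data from the patent.

```python
from dataclasses import dataclass


@dataclass
class TeacherRecord:
    voice: list          # input data: per-frame acoustic features of one utterance
    attribute: str       # teacher data in the narrow sense, e.g. "female"
    addition_flag: int   # 0 = existing (used in the previous learning), 1 = additional


def split_by_flag(records):
    """Split the stored teacher data into existing and additional records."""
    existing = [r for r in records if r.addition_flag == 0]
    additional = [r for r in records if r.addition_flag == 1]
    return existing, additional


# Hypothetical records; the feature values are placeholders.
records = [
    TeacherRecord([118.0, 121.0], "male", 0),
    TeacherRecord([221.0, 215.0], "female", 0),
    TeacherRecord([210.0, 230.0], "female", 1),
]
existing, additional = split_by_flag(records)
```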

The arrows in FIG. 1 and the other drawings illustrate the main directions of data, control, and other flows; they neither exclude other flows nor restrict direction. Each means other than the storage means is a processing means that realizes or executes the information processing functions or operations described below. These functions and operations are units of explanation only and do not limit the actual hardware and software elements.

[2. Operation]
FIG. 4 is a flowchart showing the operation of the apparatus 1. FIG. 5 is a conceptual diagram of the overall processing in this embodiment, and FIGS. 6 and 7 are conceptual diagrams showing examples of the processing. First, an overview of the operation is given along the flowchart of FIG. 4, with some steps omitted.

[2-1. Overview]
When the user of a terminal T searches by speech recognition (for example, web search, image search, dictionary search, product search, or search of microblog posts), the terminal T transmits the uttered voice data to the apparatus 1.

In the apparatus 1, the voice acquisition means 20 acquires the voice uttered by the user from the terminal T (step S11), and the attribute estimation means 30 estimates an attribute from the acquired voice by using a discriminator having the optimized parameters P (step S12).

Any supervised learning method that uses input data and its teacher data (teacher data in the narrow sense; the two together form teacher data in the broad sense) may serve as the discriminator and as the algorithm for optimizing the parameters P. For example, when gender or a similar attribute is determined from voice data, a GMM (Gaussian Mixture Model) is widely used as the discriminator, and the EM algorithm is used as the machine learning that optimizes its parameters.

The teacher data is used to optimize the parameters of the GMM serving as the discriminator. For example, voice data (input data) whose narrow-sense teacher data is "male" is used to train the GMM that identifies the attribute "male". Likewise, voice data whose narrow-sense teacher data is "female" is used to optimize the parameters of the GMM that identifies the attribute "female". The optimized parameters are the parameters P mentioned above.
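A dependency-free sketch of training one GMM per attribute with the EM algorithm follows. It is a simplification under stated assumptions: each frame is a single hypothetical scalar feature (for example, pitch), whereas real systems model multi-dimensional acoustic features per frame (MFCCs are a common choice). The function names and defaults (`fit_gmm_1d`, `k`, `iterations`, `min_var`) are illustrative, not from the patent.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, k=2, iterations=50, seed=0, min_var=1e-3):
    """Fit a 1-D Gaussian mixture to `data` with the EM algorithm.

    Returns a list of (weight, mean, variance) tuples, one per component.
    """
    rng = random.Random(seed)
    n = len(data)
    mu = sum(data) / n
    var0 = max(sum((x - mu) ** 2 for x in data) / n, min_var)
    weights = [1.0 / k] * k
    means = rng.sample(list(data), k)  # initialise means from random samples
    variances = [var0] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, min_var
            )
    return list(zip(weights, means, variances))


def train_attribute_gmms(teacher_data, k=2):
    """teacher_data: iterable of (frames, attribute) pairs.
    Pools the frames of each attribute and fits one GMM per attribute."""
    pooled = {}
    for frames, attribute in teacher_data:
        pooled.setdefault(attribute, []).extend(frames)
    return {attr: fit_gmm_1d(frames, k) for attr, frames in pooled.items()}
```

The resulting dictionary of per-attribute mixtures plays the role of the parameters P in this sketch.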

To estimate the attribute of input voice, the input voice is matched (scored) against both the GMM optimized with male voice data and the GMM optimized with female voice data, and the gender judged closer by the matching (the one with the higher score) is called the "estimated attribute". Gender is used here as an example, but the same approach applies to other attributes such as age.
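The score comparison can be sketched as below, assuming the score is the average per-frame log-likelihood under each attribute's GMM (a common choice; the patent only says "score"). The single-component models and their pitch-like parameters are hypothetical placeholders standing in for trained parameters P.

```python
import math


def log_likelihood(gmm, frames):
    """Average per-frame log-likelihood of `frames` under a 1-D GMM
    given as a list of (weight, mean, variance) tuples."""
    total = 0.0
    for x in frames:
        p = sum(
            w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
            for w, m, v in gmm
        )
        total += math.log(max(p, 1e-300))  # avoid log(0)
    return total / len(frames)


def estimate_attribute(models, frames):
    """Score the utterance against each attribute's GMM and return the
    best-scoring attribute (the "estimated attribute") and all scores."""
    scores = {attr: log_likelihood(gmm, frames) for attr, gmm in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical trained parameters (per-frame pitch in Hz), for illustration only.
models = {
    "male": [(1.0, 120.0, 400.0)],
    "female": [(1.0, 220.0, 400.0)],
}
attribute, scores = estimate_attribute(models, [205.0, 215.0, 230.0])
```

Averaging over frames keeps scores comparable between utterances of different lengths; with more attributes (for example, age groups), the same argmax over models applies.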

Meanwhile, the speech recognition means 32 recognizes words from the acquired voice (step S13). A word recognized in this way is called a "recognized word". Speech recognition uses known techniques (for example, statistical methods, dynamic time warping, or hidden Markov models). One use of the recognized word is as a search keyword: the search means 35 searches with the recognized word as the key, using the index data stored in the index storage means 36, and obtains search results (step S14).
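The patent does not specify the format of the index data. As one hedged possibility, the sketch below uses an inverted index (word to document IDs) and an AND search over the recognized words. Whitespace tokenization is a simplifying assumption that suits English text; Japanese text would need a morphological analyzer instead. The documents are hypothetical.

```python
from collections import defaultdict


def build_index(documents):
    """Build an inverted index: word -> set of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, recognized_words):
    """Return the sorted IDs of documents containing every recognized word."""
    postings = [index.get(w.lower(), set()) for w in recognized_words]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical documents standing in for the index storage means 36.
documents = {
    "d1": "black tea shop",
    "d2": "green tea recipes",
    "d3": "coffee beans",
}
index = build_index(documents)
```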

The advertisement selection means 42 selects, from among the advertisements stored in the advertisement storage means 45, an advertisement to be output together with the search results (step S15). The advertisement output means 50 outputs the selected advertisement to the user by transmitting it to the terminal T together with the search results from the search means 35 (step S16).

The operation reception means 60 receives a click on the output advertisement (step S17). Here, "click" means any operation on the advertisement, including a tap or the like, not only a mouse click.

When the advertisement is clicked (step S17: "YES"), the learning means 72 uses the pair of the attribute related to the clicked advertisement and the acquired voice (steps S19 and S21) for machine learning, that is, for parameter adjustment by relearning (steps S22 and S23). More specifically, teacher data in the broad sense, each item being a pair of the attribute related to the clicked advertisement (teacher data in the narrow sense) and the acquired voice (input data), is stored in the teacher data storage means 75 as additional data; once at least a predetermined number of such items have accumulated (step S22: "YES"), relearning is performed (step S23).
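Steps S22 and S23 can be sketched as a threshold check on the number of additional records. Assumptions to note: relearning here uses all stored data (existing plus additional), after which every addition flag is reset to 0; `record_click` and the `retrain` callback are hypothetical names, with `retrain` standing in for something like a per-attribute GMM trainer.

```python
def record_click(store, voice_frames, attribute, threshold, retrain):
    """Store a (voice, attribute) pair as additional data (flag 1).

    Once at least `threshold` additional pairs have accumulated (step S22:
    "YES"), call `retrain` on all stored pairs (step S23) and mark every
    record as existing (flag 0). Returns True if relearning ran.
    """
    store.append({"voice": list(voice_frames), "attribute": attribute, "added": 1})
    pending = sum(r["added"] for r in store)
    if pending < threshold:  # step S22: "NO", keep accumulating
        return False
    retrain([(r["voice"], r["attribute"]) for r in store])  # step S23
    for r in store:
        r["added"] = 0
    return True


# Usage with a stand-in retrain callback that just records each batch.
store = []
batches = []
first = record_click(store, [212.0, 224.0], "female", threshold=2, retrain=batches.append)
second = record_click(store, [118.0, 126.0], "male", threshold=2, retrain=batches.append)
```

Batching updates this way avoids re-running EM after every single click.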

The "attribute related to the clicked advertisement" used for learning may be, for example, the advertisement attribute, or the estimated attribute that served as the basis for selecting that advertisement, among others. Two examples follow.

[2-2. Attributes used for learning]
The first example is learning based on a match between the estimated attribute and the advertisement attribute (FIG. 6). In this case, the advertisement selection means 42 selects the advertisement to be output from among the advertisements stored in the advertisement storage means 45 on the basis of at least one of the estimated attribute and the recognized word (step S15).

When the advertisement attribute of the clicked advertisement matches the estimated attribute (step S18: "YES"), the learning means 72 uses the pair of the matching attribute and the voice (step S19) for learning (steps S22 and S23). In the example of FIG. 6, the estimated attribute and the advertisement attribute of the advertisement selected on the basis of the recognized word are both "female", so the pair of the underlying voice and the attribute "female" is used for learning.

The second example is learning that gives priority to the advertisement attribute (FIG. 7). In the flow of FIG. 4, when the advertisement attribute and the estimated attribute do not match (step S18: "NO") and the apparatus is configured to prioritize the advertisement attribute (step S20: "YES"), the learning of the second example is performed (steps S21 to S23).

Prioritizing the advertisement attribute is effective when the advertisement selection means 42 selects the advertisement to be output from among those stored in the advertisement storage means 45 on the basis of the recognized word (step S15). In this case, the advertisement attribute of the advertisement selected from the recognized word does not necessarily match the attribute estimated from the voice.

The learning means 72 then uses the pair of the clicked advertisement's advertisement attribute and the voice (step S21) for learning (steps S22 and S23). In the example of FIG. 7, the estimated attribute (here, "male") is ignored, and the pair of the voice and the advertisement attribute ("female") of the advertisement selected on the basis of the recognized word (for example, "tea") is used for learning.
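The branch in steps S18 to S21 that decides which attribute, if any, is paired with the voice can be sketched as a small function. Two assumptions: each clicked advertisement has a single advertisement attribute, and when the attributes differ and prioritization is disabled (step S20: "NO"), the pair is simply not used for learning. The name `training_label` is hypothetical.

```python
def training_label(ad_attribute, estimated_attribute, prefer_ad_attribute):
    """Return the attribute to pair with the voice for learning, or None.

    - Step S18 "YES": the clicked ad's attribute matches the estimated
      attribute, so that attribute is used (step S19, first example).
    - Step S18 "NO" and S20 "YES": the ad attribute is used and the
      estimate is ignored (step S21, second example).
    - Otherwise the pair is not used for learning.
    """
    if ad_attribute == estimated_attribute:
        return ad_attribute
    if prefer_ad_attribute:
        return ad_attribute
    return None


# FIG. 6-style case: both attributes are "female".
match_case = training_label("female", "female", prefer_ad_attribute=False)
# FIG. 7-style case: the estimate says "male", the clicked tea ad says "female".
priority_case = training_label("female", "male", prefer_ad_attribute=True)
```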

[3. Effects]
(1) As described above, in this embodiment, even voice uttered by users whose attributes are unknown can be used as machine-learning teacher data suited to the purpose of advertisement selection, by pairing it with the attribute related to the operated advertisement (FIGS. 5 to 7).

In particular, when the purpose is to choose advertisements to output to the speaking user, the attribute estimated from the voice need not exactly match the user's true attribute (for example, biological sex). For example, using the voice of a man who tends to click advertisements aimed at women as teacher data labeled "female" still serves the purpose of advertisement selection.

(2) In this embodiment, machine learning is strengthened for gender, the most basic user attribute, so the benefit of more accurate attribute estimation extends to many fields of application.

(3) In this embodiment, using the pair of the voice and the attribute corresponding to the operated advertisement (the advertisement attribute) for machine learning (FIG. 7) makes the learning better suited to the purpose of advertisement selection, which can be expected to improve the click-through rate (CTR).

(4) In this embodiment, whether the advertisement was selected on the basis of the recognized word or of the estimated attribute, when the attribute corresponding to the operated advertisement (the advertisement attribute) matches the estimated attribute, the pair of that attribute and the voice is used for learning (FIG. 6). This increases the amount of highly reliable teacher data.

[4. Other embodiments]
The above embodiment and the drawings are merely examples; the presence and arrangement of each element, the order and content of the processing, and so on may be changed as appropriate. The present invention therefore also includes the modifications illustrated below and other embodiments. For example, the attribute is not limited to gender and may be age group, the region where the user lives or comes from, or another attribute.

Each aspect of the present invention can also be understood in other categories not explicitly stated (methods, programs, systems including terminals, and so on). In the method and program categories, "means" in the apparatus category should be read as "process" or "step" as appropriate. All or any part of the "means" may also be read as "part" (unit, section, module, etc.).

The processes and steps shown in the embodiment may also be modified, for example by changing their order, executing several of them together, or executing them piece by piece. The hardware elements that realize or execute the individual means, processes, and steps may be shared, or may differ for each means, process, or step, or for each timing.

The individual means described in this application may be realized by calling functions provided by an external server through an API (application programming interface) or network computing (the so-called cloud, etc.). Furthermore, elements such as the means are not limited to computers and may be realized by other information processing mechanisms that exist now or appear in the future.

1 Information processing device (present apparatus)
6 Arithmetic control unit
7 Storage device
8 Communication device
20 Voice acquisition means
30 Attribute estimation means
32 Speech recognition means
35 Search means
36 Index storage means
42 Advertisement selection means
45 Advertisement storage means
50 Advertisement output means
60 Operation reception means
72 Learning means
75 Teacher data storage means
N Communication network
T Terminal

Claims

1. An information processing apparatus comprising:
voice acquisition means for acquiring voice uttered by a user;
attribute estimation means for estimating an attribute from the acquired voice by using a discriminator trained in advance with input data and teacher data whose attributes are known;
operation reception means for receiving an operation on an advertisement that is associated with an attribute and has been output to the user; and
learning means for updating the parameters of the discriminator by using a pair consisting of the attribute related to the operated advertisement and the acquired voice.

2. The information processing apparatus according to claim 1, wherein the attribute is gender.

3. The information processing apparatus according to claim 1, further comprising speech recognition means for recognizing a word from the acquired voice, wherein
the advertisement selection means selects an advertisement to be output from among the advertisements stored in the advertisement storage means on the basis of the recognized word, and
the learning means uses the pair of the voice and the attribute associated with the operated advertisement to update the parameters.

4. The information processing apparatus according to claim 1 or 2, further comprising speech recognition means for recognizing a word from the acquired voice, wherein
the advertisement selection means selects an advertisement to be output from among the advertisements stored in the advertisement storage means on the basis of at least one of the estimated attribute and the recognized word, and
the learning means uses the pair of the voice and the attribute to update the parameters when the attribute associated with the operated advertisement matches the estimated attribute.

5. An information processing method in which a computer executes:
a voice acquisition process of acquiring voice uttered by a user;
an attribute estimation process of estimating an attribute from the acquired voice by using a discriminator trained in advance with input data and teacher data whose attributes are known;
an operation reception process of receiving an operation on an advertisement that is associated with an attribute and has been output to the user; and
a learning process of updating the parameters of the discriminator by using a pair consisting of the attribute related to the operated advertisement and the acquired voice.