JP2023027697A

JP2023027697A - Terminal device, transmission method, transmission program and information processing system

Info

Publication number: JP2023027697A
Application number: JP2021132964A
Authority: JP
Inventors: 健一磯; Kenichi Iso
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2021-08-17
Filing date: 2021-08-17
Publication date: 2023-03-02
Anticipated expiration: 2041-08-17
Also published as: JP7430672B2

Abstract

To appropriately transmit information about speech collected by a terminal device to other devices, even when the terminal device performs speech recognition.

SOLUTION: A terminal device is used by a user and performs speech recognition on the device itself, and includes a collection unit and a transmission unit. The collection unit collects speech uttered by the user and the recognition result obtained by speech recognition of the utterance, in association with each other, in a storage unit within the device. When the speech collected by the collection unit meets a predetermined condition, the transmission unit transmits information about the speech to a server device in accordance with the user's permission.

SELECTED DRAWING: Figure 1

Description

本発明は、端末装置、送信方法、送信プログラム及び情報処理システムに関する。 The present invention relates to a terminal device, transmission method, transmission program, and information processing system.

ユーザが発話した音声を認識する音声認識が様々なサービスで利用されている。例えば、発話情報とその発話情報の示す発話内容とを用いた学習により作成されたモデルを用いて、音声認識を行う技術が提供されている（例えば、特許文献１参照）。 Voice recognition for recognizing voice uttered by a user is used in various services. For example, there is provided a technique for recognizing speech using a model created by learning using utterance information and utterance content indicated by the utterance information (see, for example, Patent Document 1).

Patent Document 1: JP 2021-081527 A

However, with the conventional technology described above, it may be difficult to obtain information about the voice uttered by the user. For example, in the conventional technology, the server device performs speech recognition on the user's utterance (voice) and transmits the recognition result to the terminal device used by the user, so the server device necessarily obtains the user's utterance (voice) data. By contrast, in so-called on-device speech recognition, in which speech recognition is performed on the terminal device itself, recognition completes without the user's utterance (voice) data being sent from the terminal device to the server device. It is therefore difficult for devices other than the terminal device used by the user, such as a server device, to collect information about the voice uttered by the user.

The present application has been made in view of the above, and its object is to appropriately transmit information about speech collected by a terminal device to another device even when the terminal device performs speech recognition.

A terminal device according to the present application is a terminal device that is used by a user and performs speech recognition on the device itself, the terminal device comprising: a collection unit that collects, in a storage unit within the device, a voice uttered by the user and a recognition result of the utterance obtained by the speech recognition, in association with each other; and a transmission unit that, when the voice collected by the collection unit satisfies a predetermined condition, transmits information about the voice to a server device in accordance with the user's permission.

実施形態の一態様によれば、端末装置が音声認識を行う場合であっても、端末装置で収集される音声に関する情報を他の装置へ適切に送信することができる。 According to one aspect of the embodiments, even when the terminal device performs speech recognition, it is possible to appropriately transmit information about speech collected by the terminal device to another device.

FIG. 1 is an explanatory diagram showing an outline of information processing according to the embodiment.
FIG. 2 is a diagram illustrating a configuration example of an information processing system according to the embodiment.
FIG. 3 is a diagram illustrating a configuration example of a terminal device according to the embodiment.
FIG. 4 is a diagram illustrating an example of a model information storage unit.
FIG. 5 is a diagram illustrating an example of a collected information storage unit.
FIG. 6 is a diagram illustrating a configuration example of a server device according to the embodiment.
FIG. 7 is a diagram illustrating an example of a model information storage unit.
FIG. 8 is a diagram illustrating an example of a learning data information storage unit.
FIG. 9 is a flowchart showing a processing procedure according to the embodiment.
FIG. 10 is a diagram illustrating an example of a hardware configuration.

以下に、本願に係る端末装置、送信方法、送信プログラム及び情報処理システムを実施するための形態（以下、「実施形態」と記載する）について図面を参照しつつ詳細に説明する。なお、この実施形態により本願に係る端末装置、送信方法、送信プログラム及び情報処理システムが限定されるものではない。また、以下の実施形態において同一の部位には同一の符号を付し、重複する説明は省略される。 Hereinafter, modes for implementing a terminal device, a transmission method, a transmission program, and an information processing system according to the present application (hereinafter referred to as "embodiments") will be described in detail with reference to the drawings. Note that the terminal device, transmission method, transmission program, and information processing system according to the present application are not limited to this embodiment. Also, in the following embodiments, the same parts are denoted by the same reference numerals, and overlapping descriptions are omitted.

(Embodiment)
[1. Information processing]
First, with reference to FIG. 1, an overview of the information processing performed by the information processing system 1 according to the embodiment will be described. FIG. 1 is an explanatory diagram showing an outline of information processing according to the embodiment. FIG. 1 illustrates, as an example, a case in which the server device 100 transmits to the terminal device 10 information on a condition that designates the voice for which it requests information from the terminal device 10, and the terminal device 10 transmits to the server device 100 information about voice that satisfies the condition received from the server device 100. Note that, as long as the server device 100 can obtain information about voice from the terminal device 10, the terminal device 10 may instead transmit information about voice to the server device 100 based on a preset condition, without the server device 100 designating a condition.

In the following, the terminal device 10 may be referred to as the user. That is, in the following, "user" can also be read as "terminal device 10". FIG. 1 describes, as an example, a case in which the terminal device 10 is a smartphone, but the terminal device 10 is not limited to a smartphone and may be any device (equipment) capable of collecting the voice uttered by the user and transmitting it to the server device 100; details on this point are given later. FIG. 1 also describes, as an example, a case in which the terminal device 10 transmits voice data to the server device 100 as the information about voice, but the information about voice that the terminal device 10 transmits to the server device 100 is not limited to voice data and may be various kinds of information; details on this point are given later.

An example of information processing will be described below with reference to FIG. 1. For the sake of explanation, FIG. 1 shows a case in which a single utterance is the target of transmission, but the terminal device 10 may instead obtain the user's permission and perform transmission when the number of collected utterances (voices) exceeds a certain threshold (for example, 50 or 100).

FIG. 1 shows a case in which the user is the user identified by the user ID "U1" (hereinafter sometimes referred to as "user U1"). The terminal device 10 used by the user U1 performs speech recognition within the device itself using the model M1, which is a speech recognition model, and provides the user U1 with a service according to the speech recognition result. For the sake of explanation, FIG. 1 takes as an example a case in which the model M1 converts the voice information of the user's utterance into characters, but the model M1 may be a speech recognition model that performs various other functions, such as identifying the user (speaker) who made the utterance, as long as it performs processing related to speech recognition.

First, the server device 100 transmits to the terminal device 10 information indicating a condition CN1 that designates the voice the terminal device 10 is requested to provide (step S11). The terminal device 10 stores the received information indicating the condition CN1 in the storage unit 120 (see FIG. 3). For example, the terminal device 10 compares the condition CN1 stored in the storage unit 120 with information about a voice, determines whether the voice satisfies the condition CN1, and transmits voice data that satisfies the condition CN1 to the server device 100. To simplify the explanation, the condition CN1 is assumed below to be that the voice includes a specific word (hereinafter "new word NX"); examples of other conditions are given later.

まず、ユーザＵ１が「ＸＸＸＸ」と発話する。なお、「ＸＸＸＸ」は具体的な内容を含む発話であるものとする。端末装置１０は、ユーザＵ１の発話ＰＡを検知し、ユーザＵ１の発話ＰＡである「ＸＸＸＸ」の音声データを入力として受け付ける（ステップＳ１２）。 First, user U1 speaks "XXXX". It is assumed that "XXXX" is an utterance containing specific content. The terminal device 10 detects the utterance PA of the user U1, and receives as an input voice data of "XXXX" which is the utterance PA of the user U1 (step S12).

そして、端末装置１０は、入力として受け付けた「ＸＸＸＸ」の音声データと、モデルＭ１とを利用して音声認識の処理を行う（ステップＳ１３）。端末装置１０は、「ＸＸＸＸ」の音声データをモデルＭ１に入力し、モデルＭ１に文字データを出力させることにより、音声を文字に変換する処理（音声認識処理）を行う。図１では、「ＸＸＸＸ」の音声データが入力されたモデルＭ１は、「ＸＸＸＸ」の文字データを出力する。なお、モデルＭ１は、入力された音声に対する文字とともに、その音声認識の確度を示すスコアを出力してもよい。また、「ＸＸＸＸ」の文字データには、新語ＮＸが含まれるものとする。 Then, the terminal device 10 performs speech recognition processing using the speech data of "XXXX" received as input and the model M1 (step S13). The terminal device 10 inputs voice data of "XXXX" to the model M1 and causes the model M1 to output character data, thereby performing processing (voice recognition processing) of converting the voice into characters. In FIG. 1, the model M1 to which voice data of "XXXX" is input outputs character data of "XXXX". Note that the model M1 may output a score indicating the accuracy of speech recognition along with the characters corresponding to the input speech. It is also assumed that the character data of "XXXX" includes the new word NX.

Then, the terminal device 10 associates the voice uttered by the user with the recognition result obtained by speech recognition of the utterance and collects them in the storage unit DB (step S14). In FIG. 1, the terminal device 10 associates the utterance PA uttered by the user U1 with the recognition result of the utterance PA and collects them in the storage unit 120. For example, the terminal device 10 associates the voice data of "XXXX", which is the utterance PA, with the character data of "XXXX", which is the recognition result of that voice data, and stores them in the collected information storage unit 142 (see FIG. 3) within the terminal device 10.
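The association made in step S14 can be pictured as a simple record store. The following is a minimal sketch only: the class names `CollectedUtterance` and `CollectedInfoStore`, the field names, and the use of raw bytes for the voice data are all assumptions introduced for illustration and do not appear in the publication.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CollectedUtterance:
    """One utterance paired with its on-device recognition result."""
    utterance_id: str
    audio: bytes                   # voice data as captured (format is an assumption)
    text: str                      # character data output by the recognition model
    score: Optional[float] = None  # optional recognition confidence, if the model gives one


@dataclass
class CollectedInfoStore:
    """Hypothetical stand-in for the collected information storage unit."""
    records: List[CollectedUtterance] = field(default_factory=list)

    def collect(self, utterance_id: str, audio: bytes, text: str,
                score: Optional[float] = None) -> CollectedUtterance:
        # Store voice data and recognition result together as one record.
        record = CollectedUtterance(utterance_id, audio, text, score)
        self.records.append(record)
        return record


store = CollectedInfoStore()
store.collect("PA", b"\x00\x01", "XXXX", score=0.9)
```

Keeping the voice data and the recognized text in the same record is what later allows the text to be sent as a label alongside the audio.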

そして、端末装置１０は、収集した音声が条件を満たすか否かを判定し、条件を満たすと判定した場合、ユーザに通知する（ステップＳ１５）。図１では、端末装置１０は、収集した発話ＰＡが条件ＣＮ１を満たすか否かを判定する。例えば、端末装置１０は、発話ＰＡの文字データと、条件ＣＮ１とを比較し、発話ＰＡが条件ＣＮ１を満たすか否かを判定する。このように、端末装置１０は、内容に関する条件（「内容条件」ともいう）である条件ＣＮ１を用いて発話ＰＡが所定の内容を含むか否かを判定する。端末装置１０は、収集した発話ＰＡの文字データには新語ＮＸが含まれるため、条件ＣＮ１を満たすと判定する。 Then, the terminal device 10 determines whether or not the collected voice satisfies the conditions, and if it determines that the conditions are satisfied, the terminal device 10 notifies the user (step S15). In FIG. 1, the terminal device 10 determines whether or not the collected utterance PA satisfies the condition CN1. For example, the terminal device 10 compares the character data of the utterance PA with the condition CN1, and determines whether the utterance PA satisfies the condition CN1. In this manner, the terminal device 10 determines whether or not the utterance PA includes predetermined content using the condition CN1, which is a condition regarding content (also referred to as a "content condition"). Since the character data of the collected utterance PA includes the new word NX, the terminal device 10 determines that the condition CN1 is satisfied.
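As a rough illustration of the content-condition check in step S15, the sketch below tests whether a recognition result contains any word listed in the condition. The function name `satisfies_content_condition` and the use of plain substring matching are assumptions; the publication does not specify how the comparison is carried out.

```python
from typing import Iterable


def satisfies_content_condition(recognized_text: str,
                                target_words: Iterable[str]) -> bool:
    """Return True if the recognition result contains any target word."""
    # Substring matching is an assumption made for this sketch.
    return any(word in recognized_text for word in target_words)


condition_cn1 = ["NX"]  # placeholder standing in for the new word NX
hit = satisfies_content_condition("XX NX XX", condition_cn1)
miss = satisfies_content_condition("XXXX", condition_cn1)
```

Here `hit` is true because the text contains the placeholder word, and `miss` is false because it does not.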

Therefore, the terminal device 10 notifies the user U1 of the utterance PA as a candidate for transmission to the server device 100. For example, the terminal device 10 displays the utterance PA as a candidate for transmission to the server device 100, showing on the screen the character string "XXXX" together with an explanation that it is a candidate. In this case, the terminal device 10 may display, together with the information indicating that the utterance PA is a candidate, an interface for accepting whether the user U1 grants permission. For example, the terminal device 10 displays, together with the information indicating that the utterance PA is a candidate, buttons with which the user specifies whether transmission is allowed. For example, the terminal device 10 may display a permission button labeled "Permit transmission" or the like and a rejection button labeled "Do not permit transmission" or the like. In this way, the terminal device 10 asks the user for permission when the condition is satisfied.

In this case, the terminal device 10 accepts, via the permission button, the user U1's permission to transmit the utterance PA to the server device 100. For example, when the user U1 selects the permission button, the terminal device 10 determines that the user U1 has permitted transmission of the utterance PA to the server device 100. When the user U1 selects the rejection button, the terminal device 10 determines that the user U1 has not permitted transmission of the utterance PA to the server device 100. Note that the above is merely an example, and the terminal device 10 is not limited to on-screen notification and acceptance of permission; it may notify the user and accept permission through various modalities. For example, the terminal device 10 may notify the user U1 by voice output that the utterance PA is a candidate for transmission to the server device 100. The terminal device 10 may also accept by voice whether the user U1 permits transmission of the utterance PA to the server device 100.

The terminal device 10 accepts the user U1's permission to transmit the utterance PA to the server device 100 (step S16). For example, the terminal device 10 accepts the user U1's permission to transmit the utterance PA to the server device 100 through the user U1's operation of selecting the permission button.

そして、端末装置１０は、ユーザＵ１が送信を許諾した発話ＰＡに関する情報をサーバ装置１００へ送信する（ステップＳ１７）。図１では、端末装置１０は、発話ＰＡの音声データ及びその認識結果をサーバ装置１００へ送信する。すなわち、端末装置１０は、発話ＰＡである「ＸＸＸＸ」の音声データと、その音声データの認識結果である「ＸＸＸＸ」の文字データとをサーバ装置１００へ送信する。なお、上記は一例に過ぎず、端末装置１０は、発話ＰＡの音声データのみをサーバ装置１００へ送信してもよい。 Then, the terminal device 10 transmits, to the server device 100, information regarding the utterance PA, for which transmission is permitted by the user U1 (step S17). In FIG. 1 , the terminal device 10 transmits voice data of the utterance PA and its recognition result to the server device 100 . That is, the terminal device 10 transmits to the server device 100 the voice data of "XXXX" which is the utterance PA and the character data of "XXXX" which is the recognition result of the voice data. Note that the above is merely an example, and the terminal device 10 may transmit only the voice data of the utterance PA to the server device 100 .
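Steps S16 and S17 amount to gating transmission on the user's permission and then choosing what to send. The sketch below shows one way to express that; `build_payload`, its parameters, and the dictionary layout are hypothetical, and no actual network call is made.

```python
from typing import Dict, Optional


def build_payload(audio: bytes, text: str, permitted: bool,
                  include_text: bool = True) -> Optional[Dict[str, object]]:
    """Return the data to transmit, or None when the user has not permitted it."""
    if not permitted:
        # Without permission nothing leaves the device.
        return None
    payload: Dict[str, object] = {"audio": audio}
    if include_text:
        # The recognition result may be sent along with the voice data.
        payload["text"] = text
    return payload


with_text = build_payload(b"\x00\x01", "XXXX", permitted=True)
audio_only = build_payload(b"\x00\x01", "XXXX", permitted=True, include_text=False)
refused = build_payload(b"\x00\x01", "XXXX", permitted=False)
```

The `include_text` flag mirrors the two variants described above: sending the voice data with its recognition result, or sending the voice data alone.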

The server device 100 adds the information about the voice received from the terminal device 10 to the data used for learning (step S18). In FIG. 1, the server device 100 receives from the terminal device 10 the combination of the voice data of "XXXX", which is the utterance PA, and the character data of "XXXX", which is the recognition result of that voice data (hereinafter also referred to as "new data PDT"), and adds the received new data PDT to the data set DS1, which is a learning data set. For example, the server device 100 stores the new data PDT, in which the character data of "XXXX" is associated as a label with the voice data of "XXXX" of the utterance PA, in the learning data information storage unit 122 (see FIG. 8) as data of the data set DS1.

そして、サーバ装置１００は、新規データＰＤＴが追加されたデータセットＤＳ１を用いて、モデルＭ１を学習する（ステップＳ１９）。サーバ装置１００は、データセットＤＳ１を用いて、モデルＭ１の重み等のパラメータを学習（更新）する。モデルＭ１の学習処理には、任意の手法が採用可能である。 Then, the server device 100 learns the model M1 using the data set DS1 to which the new data PDT has been added (step S19). The server device 100 learns (updates) parameters such as weights of the model M1 using the data set DS1. Any method can be adopted for the learning process of the model M1.

For example, the server device 100 performs the learning process by a method such as backpropagation (error backpropagation) so that the character data output by the model M1 approaches the correct data (label) corresponding to the voice data input to the model M1. For example, through the learning process the server device 100 adjusts the values of the weights (that is, connection coefficients) that are taken into account when values are passed between nodes. In this way, the server device 100 learns the model M1 by processing such as backpropagation, which corrects the parameters (connection coefficients) so as to reduce the error between the output of the model M1 and the correct data corresponding to the input. For example, the server device 100 generates the model M1 by performing processing such as backpropagation so as to minimize a predetermined loss function. As a result, the server device 100 can perform the learning process of learning the parameters of the model M1.
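The idea of correcting a weight so that the loss shrinks can be shown with a deliberately tiny example. The sketch below is not a speech recognition model: it fits a single weight `w` in the toy model `y = w * x` by gradient descent on a squared-error loss, which is the kind of update that backpropagation supplies gradients for in a multi-layer network. The function name, learning rate, and data are all assumptions for illustration.

```python
from typing import List, Tuple


def train(samples: List[Tuple[float, float]], lr: float = 0.05,
          epochs: int = 200) -> float:
    """Fit w in y = w * x by minimizing squared error with gradient descent."""
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:
            prediction = w * x
            # Derivative of (prediction - y) ** 2 with respect to w.
            gradient = 2.0 * (prediction - y) * x
            # Move the weight against the gradient to reduce the loss.
            w -= lr * gradient
    return w


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2 * x
w = train(samples)
```

With this data the learned weight approaches 2, the value that makes the error between output and label vanish.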

上述したように、情報処理システム１は、端末装置１０が音声認識を行う場合であっても、端末装置１０で収集される音声に関する情報をサーバ装置１００へ適切に送信することができる。したがって、情報処理システム１は、端末装置１０が音声認識を行う場合であっても、モデルを学習するために必要なデータを収集することができる。 As described above, the information processing system 1 can appropriately transmit information about voices collected by the terminal device 10 to the server device 100 even when the terminal device 10 performs voice recognition. Therefore, the information processing system 1 can collect data necessary for learning a model even when the terminal device 10 performs speech recognition.

For example, when the user uses an application equipped with on-device speech recognition (also simply called an "app") on the terminal device 10, the information processing system 1 saves (accumulates) the uttered voice and the recognition result in the terminal device 10. For example, when an app is used on a terminal device 10 on which an arbitrary app such as a car navigation app or a shopping app is installed, the information processing system 1 saves (accumulates) the uttered voice and the recognition result in the terminal device 10. Then, when the accumulated (collected) utterances satisfy a predetermined criterion, the information processing system 1 asks the user for permission to transmit them to the server device 100, and, if the user's permission is obtained, transmits the permitted information from the terminal device 10 to the server device 100. Through such processing, the information processing system 1 can appropriately transmit information about voices collected by the terminal device 10 to the server device 100 even when the terminal device 10 performs speech recognition.

[1-1. Other examples]
Note that the processing shown in FIG. 1 is merely an example, and the information processing system 1 may transmit various information from the terminal device 10 to the server device 100 using various conditions. Examples for each element are described below.

[1-1-1. Information to be sent]
FIG. 1 described, as an example, a case in which voice data is transmitted from the terminal device 10 to the server device 100, but the information transmitted from the terminal device 10 to the server device 100 may be any information as long as it relates to voice.

端末装置１０は、音声のデータをサーバ装置１００に送信してもよい。端末装置１０は、音声の波形データをサーバ装置１００に送信してもよい。端末装置１０は、音声のデータを圧縮したデータをサーバ装置１００に送信してもよい。端末装置１０は、音声から抽出した特徴情報をサーバ装置１００に送信してもよい。 The terminal device 10 may transmit voice data to the server device 100 . The terminal device 10 may transmit voice waveform data to the server device 100 . The terminal device 10 may transmit data obtained by compressing voice data to the server device 100 . The terminal device 10 may transmit feature information extracted from the voice to the server device 100 .

上述のように、端末装置１０からサーバ装置１００へ送信する情報は、音声波形またはその圧縮したもの等の様々な情報であってもよい。端末装置１０からサーバ装置１００へ送信する情報は、音声波形から抽出した特徴量であってもよい。ここでいう特徴量とは、例えば元となる音声データよりもサイズが小さいデータであり、個人性に関する情報を極力含まないスペクトル情報などであってもよい。また、端末装置１０からサーバ装置１００へ送信する情報は、発話内容を検聴確認可能なレベルで不可逆圧縮してサイズを極力小さくした音声等の圧縮音声のデータであってもよい。 As described above, the information transmitted from the terminal device 10 to the server device 100 may be various information such as voice waveforms or compressed versions thereof. The information to be transmitted from the terminal device 10 to the server device 100 may be a feature quantity extracted from the voice waveform. The feature amount here is, for example, data whose size is smaller than that of the original voice data, and may be spectrum information or the like that contains as little information about individuality as possible. The information to be transmitted from the terminal device 10 to the server device 100 may be data of compressed voice such as voice in which the size is reduced as much as possible by irreversibly compressing the utterance content to a level that allows listening confirmation.

[1-1-2. Information conditions]
FIG. 1 described, as an example, the condition CN1, which is whether the voice contains a specific word, but the condition on the information to be transmitted to the server device 100 (transmission information condition) is not limited to whether the voice contains a specific word and may be any of various conditions.

送信情報条件は、収集した発話（音声）の数に関する条件（「数条件」ともいう）であってもよい。例えば、送信情報条件は、収集した発話（音声）の数が一定の閾値を超えたことであってもよい。この場合、端末装置１０は、収集した音声の数が所定数（例えば５０、１００等）以上であるか否かを判定する。例えば、端末装置１０は、収集した音声の数が所定数以上になった場合、条件を満たしたと判定し、ユーザの許諾に応じて音声に関する情報をサーバ装置１００に送信する。 The transmission information condition may be a condition related to the number of collected utterances (speech) (also referred to as a “number condition”). For example, the transmission information condition may be that the number of collected utterances (speech) exceeds a certain threshold. In this case, the terminal device 10 determines whether or not the number of collected voices is equal to or greater than a predetermined number (eg, 50, 100, etc.). For example, when the number of collected voices reaches or exceeds a predetermined number, the terminal device 10 determines that the condition is satisfied, and transmits information about voices to the server device 100 according to the user's permission.

例えば、送信情報条件は、収集した発話（音声）の音声認識に関するスコアが所定の条件（「スコア条件」ともいう）を満たすことであってもよい。この場合、端末装置１０は、収集した音声の音声認識に関するスコアが所定の閾値（例えば０．５、０．７等）以上であるか否かを判定する。例えば、端末装置１０は、収集した音声の音声認識に関するスコアが所定の閾値以上である場合、条件を満たしたと判定し、ユーザの許諾に応じて音声に関する情報をサーバ装置１００に送信する。 For example, the transmission information condition may be that the score related to voice recognition of collected utterances (speech) satisfies a predetermined condition (also referred to as "score condition"). In this case, the terminal device 10 determines whether or not the speech recognition score of the collected speech is equal to or greater than a predetermined threshold value (eg, 0.5, 0.7, etc.). For example, when the voice recognition score of the collected voice is equal to or greater than a predetermined threshold, the terminal device 10 determines that the condition is satisfied, and transmits information about the voice to the server device 100 according to the user's permission.

また、例えば、端末装置１０は、収集した音声の音声認識に関するスコアが所定の閾値（例えば０．５、０．７等）未満であるか否かを判定する。例えば、端末装置１０は、収集した音声の音声認識に関するスコアが所定の閾値未満である場合、条件を満たしたと判定し、ユーザの許諾に応じて音声に関する情報をサーバ装置１００に送信する。 In addition, for example, the terminal device 10 determines whether or not the score regarding voice recognition of the collected voice is less than a predetermined threshold (for example, 0.5, 0.7, etc.). For example, when the voice recognition score of the collected voice is less than a predetermined threshold, the terminal device 10 determines that the condition is satisfied, and transmits information about the voice to the server device 100 according to the user's permission.
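The score condition is described in two directions: a score at or above a threshold, or a score below it. A minimal sketch that covers both is shown here; the function name and the `mode` values `"at_least"` and `"below"` are assumptions introduced for this example.

```python
def satisfies_score_condition(score: float, threshold: float,
                              mode: str = "at_least") -> bool:
    """Check a recognition score against a threshold in the chosen direction."""
    if mode == "at_least":
        # Select confidently recognized utterances.
        return score >= threshold
    if mode == "below":
        # Select utterances the recognizer was unsure about.
        return score < threshold
    raise ValueError(f"unknown mode: {mode}")


confident = satisfies_score_condition(0.8, 0.7, mode="at_least")
uncertain = satisfies_score_condition(0.4, 0.5, mode="below")
```

Which direction is useful depends on what data is wanted: high scores suggest reliable labels, while low scores point to utterances the current model handles poorly.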

また、例えば、送信情報条件は、収集した音声がノイズに関する条件（「ノイズ条件」ともいう）を満たすことであってもよい。この場合、端末装置１０は、収集した音声の信号対雑音比（ＳＮ比）が所定値以上であるか否かを判定する。例えば、端末装置１０は、収集した音声のＳＮ比が所定値以上である場合、条件を満たしたと判定し、ユーザの許諾に応じて音声に関する情報をサーバ装置１００に送信する。 Further, for example, the transmission information condition may be that the collected voice satisfies a noise condition (also referred to as a “noise condition”). In this case, the terminal device 10 determines whether or not the signal-to-noise ratio (SN ratio) of the collected voice is equal to or greater than a predetermined value. For example, when the SN ratio of the collected voice is equal to or greater than a predetermined value, the terminal device 10 determines that the condition is satisfied, and transmits information about the voice to the server device 100 according to the user's permission.

また、例えば、端末装置１０は、収集した音声の信号対雑音比（ＳＮ比）が所定値未満であるか否かを判定する。例えば、端末装置１０は、収集した音声のＳＮ比が所定値未満である場合、条件を満たしたと判定し、ユーザの許諾に応じて音声に関する情報をサーバ装置１００に送信する。 Also, for example, the terminal device 10 determines whether or not the signal-to-noise ratio (SN ratio) of the collected voice is less than a predetermined value. For example, when the SN ratio of the collected voice is less than a predetermined value, the terminal device 10 determines that the condition is satisfied, and transmits information about the voice to the server device 100 according to the user's permission.
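For the noise condition, a signal-to-noise ratio is commonly expressed in decibels as ten times the base-10 logarithm of the ratio of signal power to noise power. The sketch below computes it from two sample sequences. How the terminal device separates speech from noise is not stated in the publication, so passing them in as separate lists is an assumption, as are the function names.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average power of a sample sequence (mean of squared amplitudes)."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels."""
    noise_power = mean_power(noise)
    if noise_power <= 0.0:
        raise ValueError("noise power must be positive")
    return 10.0 * math.log10(mean_power(signal) / noise_power)


signal = [1.0, -1.0, 1.0, -1.0]   # toy speech segment
noise = [0.1, -0.1, 0.1, -0.1]    # toy noise-only segment
value = snr_db(signal, noise)     # about 20 dB for a 100:1 power ratio
```

The resulting value can then be compared with a predetermined value in either direction, as in the two variants of the noise condition described above.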

Note that the above is merely an example, and the information processing system 1 may use various conditions as appropriate. The transmission information condition may be the type of the terminal device 10. For example, when utterances (voices) of users on car navigation systems are insufficient, the information processing system 1 may set as a condition (terminal condition) that the type of the terminal device 10 is "car navigation". The transmission information condition may be an attribute of the user. For example, when children's utterances (voices) are insufficient, the information processing system 1 may set as a condition (speaker condition) that the speaker is a "child" or that the fundamental frequency of the voice is equal to or greater than a predetermined value. This too is merely an example; when collecting children's utterances (voices), the information processing system 1 may use a classifier for "adult/child" or "age group". In this case, the information processing system 1 may collect children's utterances (voices) using a classifier that uses not only the fundamental frequency but also spectral information and the like as features.

The information processing system 1 may also use the conditions described above in combination. For example, the terminal device 10 may transmit information about voices to the server device 100 in accordance with the user's permission when the number of voices that satisfy at least one of the content condition, the score condition, or the noise condition satisfies the number condition. For example, the terminal device 10 collects, as relevant voices, voices that satisfy at least one of the content condition, the score condition, or the noise condition. Then, when the number of relevant voices reaches a predetermined number (for example, 30 or 150), the terminal device 10 may transmit the information about the voices to the server device 100 in accordance with the user's permission.
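The combination described above can be sketched as two steps: keep the utterances that satisfy at least one condition, then compare their count with the required number. The dictionary keys, the thresholds, and the function names below are hypothetical and chosen only to make the example self-contained.

```python
from typing import Any, Callable, Dict, List

Utterance = Dict[str, Any]
Predicate = Callable[[Utterance], bool]


def select_relevant(utterances: List[Utterance],
                    predicates: List[Predicate]) -> List[Utterance]:
    """Keep utterances that satisfy at least one of the given conditions."""
    return [u for u in utterances if any(p(u) for p in predicates)]


def meets_count_condition(utterances: List[Utterance],
                          predicates: List[Predicate],
                          required_count: int) -> bool:
    """True when enough relevant utterances exist to ask for permission."""
    return len(select_relevant(utterances, predicates)) >= required_count


predicates: List[Predicate] = [
    lambda u: "NX" in u["text"],   # content condition (placeholder word)
    lambda u: u["score"] < 0.5,    # score condition (assumed threshold)
    lambda u: u["snr_db"] < 10.0,  # noise condition (assumed threshold)
]
utterances = [
    {"text": "AA NX", "score": 0.9, "snr_db": 25.0},  # matches content condition
    {"text": "BB", "score": 0.3, "snr_db": 25.0},     # matches score condition
    {"text": "CC", "score": 0.9, "snr_db": 25.0},     # matches nothing
]
relevant = select_relevant(utterances, predicates)
```

In this toy data two of the three utterances are relevant, so a required count of 2 would trigger a permission request while a required count of 3 would not.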

In this way, the terminal device 10 may select, from among the collected voices (user utterances), utterances judged to contribute to improving the speech recognition engine, and may obtain transmission permission when their number exceeds a certain number. For example, the score that the speech recognition engine assigns to each utterance may be used as a criterion for judging that an utterance contributes to improving the speech recognition engine. The signal-to-noise ratio (SNR) of each utterance may be used as such a criterion. The recognition result text of each utterance may also be used as such a criterion. For example, the information processing system 1 may select utterances that contain words wanted for improving the speech recognition engine.
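The combination of conditions described above can be sketched as a filter followed by a count check. This is a minimal sketch under stated assumptions: the `Utterance` fields, the function names, and the three example predicates (including their thresholds and comparison directions) are hypothetical placeholders, since the concrete conditions are whatever the system designates.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Utterance:
    voice_id: str
    text: str       # recognition result text
    score: float    # score assigned by the speech recognition engine
    snr_db: float   # signal-to-noise ratio of the utterance


Condition = Callable[[Utterance], bool]


def select_relevant(utterances: List[Utterance],
                    conditions: List[Condition]) -> List[Utterance]:
    """Keep utterances that satisfy at least one of the given conditions."""
    return [u for u in utterances if any(cond(u) for cond in conditions)]


def number_condition_met(relevant: List[Utterance], required_count: int) -> bool:
    """True once enough relevant utterances exist to ask for permission."""
    return len(relevant) >= required_count


# Hypothetical example predicates; words and thresholds are assumptions.
def content_condition(u: Utterance) -> bool:
    return "navigation" in u.text


def score_condition(u: Utterance) -> bool:
    return u.score < 0.5


def noise_condition(u: Utterance) -> bool:
    return u.snr_db < 10.0
```

Passing the conditions as a list keeps the selection logic independent of which conditions the server device 100 actually requests.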

[1-1-3. Transmission timing]
FIG. 1 describes, as an example, the case where information is transmitted at the timing when the user's permission is obtained, but any timing can be adopted as the timing of transmission to the server device 100.

After accepting the user's permission for a voice, the terminal device 10 transmits information about that voice to the server device 100 at a predetermined timing. After accepting the user's permission for a voice, the terminal device 10 transmits information about that voice to the server device 100 while the communication environment satisfies a predetermined communication condition. For example, after accepting the user's permission for a voice, the terminal device 10 transmits information about that voice to the server device 100 while communicating over Wi-Fi (registered trademark) (Wireless Fidelity).

Also, after accepting the user's permission for a voice, the terminal device 10 transmits information about that voice to the server device 100 at a timing when the usage rate of the terminal device 10 is low. For example, after accepting the user's permission for a voice, the terminal device 10 transmits information about that voice to the server device 100 at the timing when the usage rate of the processor of the terminal device 10 falls below a predetermined threshold. For example, after accepting the user's permission for a voice, the terminal device 10 transmits information about that voice to the server device 100 while the terminal device 10 is being charged.

As described above, after obtaining the user's transmission permission, the terminal device 10 transmits the information to the server device 100 when it is connected to Wi-Fi, or at a timing when the user's usage rate of the terminal device 10 is low (for example, when the CPU load is at or below a predetermined value, or while the device is charging at home late at night).
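The timing rule summarized above can be sketched as a single predicate over the device state. This is an illustrative sketch only: the `DeviceState` fields, the function name, and the default CPU load threshold of 0.2 are assumptions, and how a real terminal obtains these values is platform specific and not shown.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    permission_granted: bool  # user has permitted transmission
    on_wifi: bool             # communication condition (Wi-Fi connected)
    cpu_load: float           # processor usage rate, 0.0 to 1.0
    charging: bool            # terminal is currently being charged


def should_transmit_now(state: DeviceState,
                        cpu_load_threshold: float = 0.2) -> bool:
    """Return True when permitted data may be sent to the server now.

    Transmission never happens without permission. With permission, it is
    allowed on Wi-Fi, or when the device is lightly used (low CPU load or
    charging). The threshold value is a hypothetical default.
    """
    if not state.permission_granted:
        return False
    low_usage = state.cpu_load <= cpu_load_threshold or state.charging
    return state.on_wifi or low_usage
```

A stricter policy, such as requiring both Wi-Fi and low usage, would only change the final expression from `or` to `and`.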

The terminal device 10 may also preferentially transmit information about high-value voices according to the value of each voice. The terminal device 10 assigns priorities (ranks) according to the speech recognition scores and transmits information to the server device 100 in descending order of priority. For example, the terminal device 10 assigns higher priorities (ranks) in descending order of speech recognition score and transmits information to the server device 100 starting from the voice with the highest priority.
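The priority ordering in the last example can be sketched as a sort on the speech recognition score. The dictionary keys `voice_id` and `score` and the function name are assumptions made for illustration.

```python
def transmission_order(utterances):
    """Return utterances ordered from highest to lowest recognition score.

    Each utterance is a dict with hypothetical keys "voice_id" and "score".
    sorted() is stable, so utterances with equal scores keep their
    original relative order.
    """
    return sorted(utterances, key=lambda u: u["score"], reverse=True)
```

If a different notion of value is used, only the `key` function needs to change.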

[1-1-4. Selection by user]
As described above, the terminal device 10 transmits to the server device 100 only the information about the voices that the user has permitted among the candidates notified to the user. For example, when obtaining transmission permission from the user, the terminal device 10 presents the user with a list of utterances to be transmitted (recognition result text, links to the utterances, and the like), and when the user selects utterances that the user does not want to transmit, the terminal device 10 excludes the selected utterances from the transmission targets.
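The exclusion step can be sketched as a simple filter over the candidate list. The function name and the use of voice IDs as list entries are assumptions for illustration; the actual presentation to the user (text and links) is not modeled here.

```python
def apply_user_selection(candidate_ids, excluded_ids):
    """Remove the utterances the user chose not to send.

    candidate_ids: voice IDs presented to the user, in presentation order.
    excluded_ids:  voice IDs the user marked as "do not send".
    Returns the remaining IDs in their original order.
    """
    excluded = set(excluded_ids)
    return [voice_id for voice_id in candidate_ids if voice_id not in excluded]
```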

[1-1-5. Incentive]
The information processing system 1 may provide an incentive to the user in order to obtain transmission permission from the user. For example, when the user permits the transmission of voices, the information processing system 1 may provide the user with various types of incentives, such as electronic money, points, or coupons, according to the permitted voices.

For example, the terminal device 10 may notify the user of information indicating the incentive to be provided to the user when information about voices is transmitted to the server device 100. For example, the terminal device 10 may notify the user that a better incentive is provided for a voice of higher value. For example, the terminal device 10 may notify the user that more points are provided for a voice with a higher score.

For example, the terminal device 10 may notify the user that an incentive is provided when the user attaches a label (correct answer) to a voice. For example, the terminal device 10 may notify the user that an incentive is provided when the user checks the recognition result of a voice whose score is below a predetermined threshold and corrects it if it is wrong.

[2. Configuration example of information processing system]
Next, the configuration of the information processing system 1 including the server device 100 according to the embodiment will be described with reference to FIG. 2. FIG. 2 is a diagram showing a configuration example of the information processing system 1 according to the embodiment. As shown in FIG. 2, the information processing system 1 according to the embodiment includes a plurality of terminal devices 10 and a server device 100. These devices are communicably connected via a network N by wire or wirelessly. The network N is, for example, a LAN (Local Area Network) or a WAN (Wide Area Network) such as the Internet.

The number of each device included in the information processing system 1 shown in FIG. 2 is not limited to that illustrated. For example, FIG. 2 shows only three terminal devices 10-1, 10-2, and 10-3 to simplify the illustration, but this is merely an example and not a limitation; there may be four or more.

The terminal device 10 is an information processing device (computer) that is used by a user and provides the user with services based on speech recognition performed within the device itself. The terminal device 10 can connect to the network N and communicate with the server device 100 via a wireless communication network such as LTE (Long Term Evolution), 4G (4th Generation), or 5G (5th Generation: fifth-generation mobile communication system), or via short-range wireless communication such as Bluetooth (registered trademark) or a wireless LAN (Local Area Network).

In FIG. 2, the terminal device 10-1 is a smartphone used by a user. As long as it is a device used by the user, the terminal device 10-1 may be a smart device such as a tablet terminal, a feature phone, a PC (Personal Computer), a PDA (Personal Digital Assistant), a car navigation system, a wearable device such as a smartwatch or a head-mounted display, smart glasses, or the like.

In FIG. 2, the terminal device 10-2 is a smart speaker used by the user. As long as it is a device used by the user, the terminal device 10-2 may be any IoT (Internet of Things) device such as a television or a refrigerator.

In FIG. 2, the terminal device 10-3 is a moving body, such as an automobile or other vehicle, equipped with a car navigation function. The terminal device 10-3 may instead be a car navigation device that is installed in a moving body and provides a car navigation function.

The server device 100 is an information processing device (computer) that acquires information about voices from the terminal device 10. The server device 100 is also a learning device that trains a model used for speech recognition by machine learning using the acquired information about voices. The server device 100 transmits the model to the terminal device 10. The server device 100 also transmits, to the terminal device 10, information indicating conditions that specify the data it wants the terminal device 10 to transmit. Note that the server device 100 may have only the function of acquiring information about voices from the terminal device 10 and accumulating the acquired information. In that case, the information processing system 1 may include a device (learning device) that trains the model using the information accumulated by the server device 100. That is, in the information processing system 1, the device that receives and accumulates information about voices, such as voice data, from the terminal device 10 (for example, the server device 100) may be separate from the device that holds the speech recognition model, trains or updates it with the collected data, and transmits (distributes) it to the terminal device 10 (for example, the learning device). In that case, the learning device may hold the speech recognition model, train or update the model using data acquired from the server device 100, and transmit the model to the terminal device 10 used by the user. For example, in the information processing system 1, the information about voices collected by the server device 100, such as voice data, may separately be listened to manually and given correct transcripts, or may be mechanically screened, and then used to train or update a new speech recognition model. In the information processing system 1, the model may also be distributed by bundling it with the application binary and delivering it as a version upgrade to the terminal device 10 used by the user via an application download service such as an app store. The configuration of the information processing system 1 is not limited to the above; any configuration can be adopted as long as the server device 100 acquires information about voices from the terminal device 10.

[3. Configuration example of terminal device]
Next, the configuration of the terminal device 10 will be described with reference to FIG. 3. FIG. 3 is a diagram showing a configuration example of the terminal device 10. As shown in FIG. 3, the terminal device 10 has a communication unit 11, an input unit 12, an output unit 13, a storage unit 14, a control unit 15, and a sensor unit 16. Note that the terminal device 10 may have any device configuration as long as it can collect data and provide it to the server device 100. For example, as long as the terminal device 10 has the communication unit 11 that communicates with the server device 100 and the control unit 15 that performs processing for collecting data, the rest of its configuration is arbitrary. Depending on the type of the terminal device 10, for example, the terminal device 10 need not have one or more of the input unit 12, the output unit 13, the storage unit 14, and the sensor unit 16.

The terminal device 10 is not limited to the above and may have any configuration depending on how it is implemented. For example, when the terminal device 10 is a moving body, the terminal device 10 may have a mechanism for realizing movement, such as a drive unit (motor).

(Communication unit 11)
The communication unit 11 is realized by, for example, a NIC, a communication circuit, or the like. The communication unit 11 is connected to the network N (the Internet or the like) by wire or wirelessly, and transmits and receives information to and from other devices such as the server device 100 via the network N.

(Input unit 12)
The input unit 12 accepts various inputs. The input unit 12 accepts user operations. For example, the input unit 12 accepts the user's voice input via a voice sensor 161 such as a microphone. The input unit 12 accepts various operations made by the user's utterances.

The input unit 12 may also accept, as operation input by the user, operations on the terminal device 10 (user operations) other than the user's utterances (voices). The input unit 12 may accept, via the communication unit 11, information about user operations made with a remote controller. The input unit 12 may also have buttons provided on the terminal device 10, or a keyboard or mouse connected to the terminal device 10.

For example, the input unit 12 may have a touch panel capable of realizing functions equivalent to those of a remote controller, keyboard, or mouse. In this case, various information is input to the input unit 12 via the display (output unit 13). Using the touch panel function realized by various sensors, the input unit 12 accepts various operations from the user via the display screen. That is, the input unit 12 accepts various operations from the user via the display (output unit 13) of the terminal device 10. For example, the input unit 12 accepts the user's operations via the display (output unit 13) of the terminal device 10.

(Output unit 13)
The output unit 13 outputs various information. The output unit 13 has a function of displaying information. The output unit 13 is provided in the terminal device 10 and displays various information. The output unit 13 is realized by, for example, a liquid crystal display or an organic EL (Electro-Luminescence) display. The output unit 13 may have a function of outputting sound. For example, the output unit 13 has a speaker that outputs sound.

(Storage unit 14)
The storage unit 14 is realized by, for example, a semiconductor memory element such as a RAM or flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 14 stores various information necessary for collecting data. The storage unit 14 has a model information storage unit 141 and a collected information storage unit 142.

(Model information storage unit 141)
The model information storage unit 141 according to the embodiment stores information (model data) indicating the structure of a model (network). FIG. 4 is a diagram showing an example of the model information storage unit. In the example shown in FIG. 4, the model information storage unit 141 includes items such as "model ID", "use", and "model data".

"Model ID" indicates identification information for identifying a model. "Use" indicates the use of the corresponding model. "Model data" indicates the data of the model. FIG. 4 shows an example in which conceptual information such as "MDT1" is stored in "model data", but in practice it contains the various pieces of information that make up the model, such as information about the network included in the model and its functions.

In the example shown in FIG. 4, the model identified by the model ID "M1" (model M1) has the use "speech recognition". That is, the model M1 is a model used for speech recognition. The model data of the model M1 is model data MDT1.

The model information storage unit 141 is not limited to the above and may store various information depending on the purpose. For example, the model information storage unit 141 stores parameter information of a model trained (generated) by the learning process.
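The table in FIG. 4 can be pictured as a small keyed store. The following is a minimal sketch only: the class and method names are hypothetical, and representing model data as raw bytes is an assumption, since the description says only that model data contains the information that makes up the model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelRecord:
    model_id: str      # "model ID", e.g. "M1"
    use: str           # "use", e.g. "speech recognition"
    model_data: bytes  # "model data" (network structure, parameters, ...)


class ModelInfoStore:
    """In-memory stand-in for the model information storage unit 141."""

    def __init__(self) -> None:
        self._records: Dict[str, ModelRecord] = {}

    def register(self, record: ModelRecord) -> None:
        # A newly received model replaces any record with the same ID.
        self._records[record.model_id] = record

    def get(self, model_id: str) -> ModelRecord:
        return self._records[model_id]

    def find_by_use(self, use: str) -> List[ModelRecord]:
        return [r for r in self._records.values() if r.use == use]
```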

(Collected information storage unit 142)
The collected information storage unit 142 according to the embodiment stores information about the user's utterances (voices) collected by the terminal device 10. FIG. 5 is a diagram showing an example of the collected information storage unit. In the example shown in FIG. 5, the collected information storage unit 142 includes items such as "voice ID", "voice", and "recognition result".

"Voice ID" indicates identification information for identifying a collected voice. "Voice" indicates the collected voice. FIG. 5 shows an example in which conceptual information such as "ADT1" is stored in "voice", but in practice it contains various information about the voice, such as the collected voice data (for example, waveform data of the voice). "Recognition result" indicates the recognition result of the corresponding voice. FIG. 5 shows an example in which conceptual information such as "RS1" is stored in "recognition result", but in practice it contains the recognition result of the voice, for example, the result of converting the voice data into character data (a character string) or information indicating the content of the voice.

In the example shown in FIG. 5, the recognition result of the voice identified by the voice ID "AD1" (voice AD1) is recognition result RS1.

The collected information storage unit 142 is not limited to the above and may store various information depending on the purpose. For example, the collected information storage unit 142 stores each voice in association with information indicating the circumstances under which the voice was detected. For example, the collected information storage unit 142 stores each voice in association with the SN ratio of that voice.
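For reference, an SN ratio of the kind stored with each voice is commonly expressed in decibels as ten times the base-10 logarithm of the ratio of signal power to noise power. The sketch below uses that standard definition; the function name and the assumption that separate signal and noise sample sequences are available (for example, a speech segment and a non-speech segment) are illustrative and not taken from the embodiment.

```python
import math


def snr_db(signal_samples, noise_samples):
    """Signal-to-noise ratio in dB from mean power of two sample sequences."""
    p_signal = sum(s * s for s in signal_samples) / len(signal_samples)
    p_noise = sum(n * n for n in noise_samples) / len(noise_samples)
    if p_noise == 0:
        raise ValueError("noise power is zero; SNR is undefined")
    return 10.0 * math.log10(p_signal / p_noise)
```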

(Control unit 15)
Returning to FIG. 3, the description continues. The control unit 15 is a controller and is realized by, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like executing a program stored in the terminal device 10 (for example, an information processing program such as a transmission program) using a RAM or the like as a work area. The control unit 15 may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

As shown in FIG. 3, the control unit 15 has a receiving unit 151, a speech recognition unit 152, a collection unit 153, a notification unit 154, a reception unit 155, a determination unit 156, and a transmission unit 157, and realizes or executes the information processing functions and operations described below. The internal configuration of the control unit 15 is not limited to the configuration shown in FIG. 3 and may be any other configuration that performs the information processing described later.

(Receiving unit 151)
The receiving unit 151 receives various information. The receiving unit 151 receives various information from external information processing devices. The receiving unit 151 receives various information from other information processing devices such as the server device 100.

The receiving unit 151 receives, from the server device 100, information indicating the conditions on the voices for which the server device 100 requests information.

The receiving unit 151 receives, from the server device 100, a model trained by the server device 100. The receiving unit 151 receives a speech recognition model from the server device 100. The receiving unit 151 receives the model M1 from the server device 100.

(Speech recognition unit 152)
The speech recognition unit 152 executes various processes related to speech recognition. The speech recognition unit 152 executes speech recognition processing using the information stored in the storage unit 14. The speech recognition unit 152 executes speech recognition processing using a speech recognition model. The speech recognition unit 152 executes speech recognition processing using the speech recognition model received by the receiving unit 151. The speech recognition unit 152 executes speech recognition processing using the model M1 received by the receiving unit 151. For example, the speech recognition unit 152 executes speech recognition processing using the model M1 stored in the model information storage unit 141.

The speech recognition unit 152 uses the model M1 to convert the user's utterance (voice) into character information (character data), thereby turning the voice of the user's utterance into text. The speech recognition unit 152 also analyzes the content of the user's utterance. The speech recognition unit 152 estimates the content of the user's utterance by analyzing the utterance using various conventional techniques as appropriate. For example, the speech recognition unit 152 may analyze the content of the user's utterance using natural language understanding (NLU) and automatic speech recognition (ASR) functions.

(Collection unit 153)
The collection unit 153 collects various information. The collection unit 153 decides on the collection of various information. The collection unit 153 collects various information based on information from external information processing devices. The collection unit 153 collects various information based on the information stored in the storage unit 14. The collection unit 153 collects data by sensing using the model M1 stored in the model information storage unit 141.

The collection unit 153 collects, in the storage unit 14 of its own device, the voice uttered by the user and the recognition result obtained by speech recognition of the utterance in association with each other. The collection unit 153 stores a voice and the recognition result of that voice in the collected information storage unit 142 in association with each other. The collection unit 153 registers the voice uttered by the user and the result of speech recognition by the speech recognition unit 152 in the collected information storage unit 142 in association with each other.
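The association between a voice and its recognition result can be sketched as a record keyed by voice ID, mirroring the items of FIG. 5. This is an illustrative sketch: the class names, the sequential "AD" ID scheme, and holding audio as raw bytes are assumptions rather than details given in the description.

```python
import itertools
from dataclasses import dataclass
from typing import Dict


@dataclass
class CollectedVoice:
    voice_id: str            # "voice ID", e.g. "AD1"
    audio: bytes             # "voice" (e.g. waveform data)
    recognition_result: str  # "recognition result" (e.g. transcribed text)


class CollectedInfoStore:
    """In-memory stand-in for the collected information storage unit 142."""

    def __init__(self) -> None:
        self._records: Dict[str, CollectedVoice] = {}
        self._counter = itertools.count(1)

    def collect(self, audio: bytes, recognition_result: str) -> str:
        """Store a voice together with its recognition result; return its ID."""
        voice_id = f"AD{next(self._counter)}"
        self._records[voice_id] = CollectedVoice(voice_id, audio,
                                                 recognition_result)
        return voice_id

    def get(self, voice_id: str) -> CollectedVoice:
        return self._records[voice_id]

    def __len__(self) -> int:
        return len(self._records)
```

Everything stays on the device at this stage; transmission is a separate step that depends on the conditions and the user's permission.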

(Notification unit 154)
The notification unit 154 executes processing related to notifying the user. The notification unit 154 notifies the user of information. The notification unit 154 notifies the user of information via the output unit 13.

The notification unit 154 notifies the user of information about the voices that are candidates for transmission to the server device 100. The notification unit 154 displays, on the output unit 13, list information of the candidates for transmission to the server device 100.

The notification unit 154 notifies the user of information indicating the incentive to be provided to the user when information about voices is transmitted to the server device 100. The notification unit 154 displays, on the output unit 13, information indicating the incentive to be provided to the user when information about voices is transmitted to the server device 100.

(Reception unit 155)
The reception unit 155 accepts various information. The reception unit 155 accepts various operations by the user. For example, the reception unit 155 accepts various operations by the user via the input unit 12.

The reception unit 155 accepts permission from the user. The reception unit 155 accepts the user's permission for the candidates for transmission to the server device 100 that were notified by the notification unit 154.

(Determination unit 156)
The determination unit 156 makes various determinations. For example, the determination unit 156 makes determinations based on various information received from external devices by the receiving unit 151. For example, the determination unit 156 makes determinations based on the information stored in the storage unit 14. For example, the determination unit 156 makes determinations using information, stored in the storage unit 14, that indicates conditions regarding data collection.

The determination unit 156 makes determinations regarding the information to be transmitted to the server device 100. The determination unit 156 determines whether a voice satisfies the conditions on the information to be transmitted to the server device 100. The determination unit 156 determines the timing for transmitting information to the server device 100.

判定部１５６は、収集部１５３により収集された音声が所定の条件を満たすか否かを判定する。判定部１５６は、収集部１５３により収集された音声の数が所定数以上であるか否かを判定する。判定部１５６は、収集部１５３により収集された音声の音声認識に関するスコアが所定の条件を満たすか否かを判定する。判定部１５６は、収集部１５３により収集された音声がノイズに関する条件を満たすか否かを判定する。判定部１５６は、収集部１５３により収集された音声に対応する発話が所定の内容を含むか否かを判定する。 The determination unit 156 determines whether or not the voice collected by the collection unit 153 satisfies a predetermined condition. The determination unit 156 determines whether or not the number of voices collected by the collection unit 153 is equal to or greater than a predetermined number. The determination unit 156 determines whether or not the score regarding voice recognition of the voice collected by the collection unit 153 satisfies a predetermined condition. The determination unit 156 determines whether or not the sound collected by the collection unit 153 satisfies the noise condition. The determination unit 156 determines whether or not the utterance corresponding to the voice collected by the collection unit 153 includes predetermined content.
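
As an illustrative sketch only (not part of the specification), the checks performed by the determination unit 156 could be combined as follows. The field names, the threshold values, the direction of each comparison (for example, treating low recognition scores as the ones worth sending), and the choice to require all conditions at once are assumptions; the embodiment only states that conditions on the number of voices, the recognition score, noise, and utterance content may be used.

```python
from dataclasses import dataclass


@dataclass
class CollectedUtterance:
    """One collected voice and its on-device recognition result (hypothetical layout)."""
    audio: bytes
    recognized_text: str
    recognition_score: float  # assumed: confidence in [0, 1]
    noise_level: float        # assumed: 0 (clean) to 1 (very noisy)


@dataclass
class CollectionConditions:
    """Assumed thresholds; the specification leaves the concrete values open."""
    min_count: int = 3
    max_score: float = 0.6
    max_noise: float = 0.5
    keywords: tuple = ()


def meets_per_voice_conditions(u, cond):
    """Check the score, noise, and content conditions for a single voice."""
    if u.recognition_score > cond.max_score:
        return False
    if u.noise_level > cond.max_noise:
        return False
    if cond.keywords and not any(k in u.recognized_text for k in cond.keywords):
        return False
    return True


def select_candidates(utterances, cond):
    """Return candidate voices, or an empty list if fewer than min_count were collected."""
    if len(utterances) < cond.min_count:
        return []
    return [u for u in utterances if meets_per_voice_conditions(u, cond)]
```

The returned list corresponds to the candidates that would then be presented to the user for permission.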

判定部１５６は、サーバ装置１００により指定された所定の条件を満たすか否かを判定する。判定部１５６は、サーバ装置１００から受信した所定の条件を満たすであるか否かを判定する。 The determination unit 156 determines whether or not a predetermined condition designated by the server device 100 is satisfied. The determination unit 156 determines whether or not the predetermined condition received from the server device 100 is satisfied.

判定部１５６は、通知部１５４により通知された候補のうち、ユーザが許諾した音声に関する情報をサーバ装置１００に送信する。判定部１５６は、サーバ装置１００へ情報を送信するタイミングであるか否かを判定する。判定部１５６は、通信環境が所定の通信条件を満たしているか否かを判定する。判定部１５６は、ユーザによる端末装置の利用率が低いタイミングであるか否かを判定する。 The determination unit 156 transmits to the server device 100 information about the voices approved by the user among the candidates notified by the notification unit 154. The determination unit 156 determines whether or not it is time to transmit information to the server device 100. The determination unit 156 determines whether or not the communication environment satisfies predetermined communication conditions. The determination unit 156 determines whether or not the current timing is one at which the user's usage rate of the terminal device is low.

（送信部１５７）
送信部１５７は、外部の情報処理装置へ各種情報を送信する。例えば、送信部１５７は、サーバ装置１００等の他の情報処理装置へ各種情報を送信する。送信部１５７は、記憶部１４に記憶された情報を送信する。 (transmitting unit 157)
The transmission unit 157 transmits various types of information to an external information processing device. For example, the transmission unit 157 transmits various information to other information processing devices such as the server device 100 . Transmitter 157 transmits information stored in storage 14 .

送信部１５７は、サーバ装置１００等の他の情報処理装置からの情報に基づいて、各種情報を送信する。送信部１５７は、記憶部１４に記憶された情報に基づいて、各種情報を送信する。 The transmission unit 157 transmits various types of information based on information from other information processing devices such as the server device 100 . The transmitting section 157 transmits various information based on the information stored in the storage section 14 .

送信部１５７は、判定部１５６による判定結果に応じて、サーバ装置１００に情報を送信する。送信部１５７は、判定部１５６により情報送信の条件をみたすと判定された場合、サーバ装置１００に情報を送信する。送信部１５７は、収集部１５３により収集された音声が所定の条件を満たす場合、ユーザの許諾に応じて、音声に関する情報を音声認識に関するモデルを学習するサーバ装置１００に送信する。 The transmission unit 157 transmits information to the server device 100 according to the determination result by the determination unit 156 . If the determination unit 156 determines that the information transmission condition is satisfied, the transmission unit 157 transmits the information to the server device 100 . When the voice collected by the collecting unit 153 satisfies a predetermined condition, the transmitting unit 157 transmits information about the voice to the server device 100 that learns a model related to voice recognition according to user's permission.

送信部１５７は、音声のデータをサーバ装置１００に送信する。送信部１５７は、音声の波形データをサーバ装置１００に送信する。送信部１５７は、音声のデータを圧縮したデータをサーバ装置１００に送信する。送信部１５７は、音声から抽出した特徴情報をサーバ装置１００に送信する。 The transmission unit 157 transmits the voice data to the server device 100 . The transmitting unit 157 transmits the voice waveform data to the server device 100 . The transmission unit 157 transmits data obtained by compressing the voice data to the server device 100 . The transmission unit 157 transmits the feature information extracted from the voice to the server device 100 .
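
The three forms of transmitted information named above (waveform data, compressed data, and extracted feature information) can be sketched as follows. This is a minimal illustration under stated assumptions: 16-bit PCM samples, zlib as a stand-in for whatever codec an implementation would actually use, and per-frame energy as a toy example of feature information. None of these choices come from the specification.

```python
import zlib
from array import array


def waveform_bytes(samples):
    """Pack 16-bit PCM samples (assumed format) into raw bytes."""
    return array("h", samples).tobytes()


def compressed_waveform(samples):
    """Losslessly compress the waveform; zlib is a stand-in for any codec."""
    return zlib.compress(waveform_bytes(samples))


def frame_energies(samples, frame_size=4):
    """Toy feature extraction: mean squared amplitude per frame."""
    feats = []
    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        feats.append(sum(s * s for s in frame) / len(frame))
    return feats
```

Compressed data and feature information are typically smaller than the raw waveform, which is why they help limit the amount of communication.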

送信部１５７は、収集部１５３により収集された音声の数が所定数以上である場合、許諾に応じて音声に関する情報をサーバ装置１００に送信する。送信部１５７は、収集部１５３により収集された音声の音声認識に関するスコアが所定の条件を満たす場合、許諾に応じて音声に関する情報をサーバ装置１００に送信する。送信部１５７は、収集部１５３により収集された音声がノイズに関する条件を満たす場合、許諾に応じて音声に関する情報をサーバ装置１００に送信する。送信部１５７は、収集部１５３により収集された音声に対応する発話が所定の内容を含む場合、許諾に応じて音声に関する情報をサーバ装置１００に送信する。送信部１５７は、サーバ装置１００により指定された所定の条件を満たす場合、許諾に応じて音声に関する情報をサーバ装置１００に送信する。 When the number of sounds collected by the collection unit 153 is equal to or greater than a predetermined number, the transmission unit 157 transmits information about the sound to the server device 100 in accordance with the permission. If the score regarding voice recognition of the voice collected by the collecting unit 153 satisfies a predetermined condition, the transmitting unit 157 transmits information regarding the voice to the server device 100 in response to permission. If the sound collected by the collection unit 153 satisfies the noise condition, the transmission unit 157 transmits information on the sound to the server device 100 in response to permission. When the utterance corresponding to the voice collected by the collecting unit 153 includes predetermined content, the transmitting unit 157 transmits information about the voice to the server device 100 in accordance with the permission. If a predetermined condition specified by server device 100 is satisfied, transmission unit 157 transmits information about audio to server device 100 in response to permission.

送信部１５７は、通知部１５４により通知された候補のうち、ユーザが許諾した音声に関する情報をサーバ装置１００に送信する。送信部１５７は、受付部１５５によりユーザの許諾が受け付けられた後、所定のタイミングで音声に関する情報をサーバ装置１００に送信する。送信部１５７は、通信環境が所定の通信条件を満たしている間に、音声に関する情報をサーバ装置１００に送信する。送信部１５７は、ユーザによる端末装置の利用率が低いタイミングで、音声に関する情報をサーバ装置１００に送信する。送信部１５７は、サーバ装置１００から受信した所定の条件を満たす音声に関する情報を、ユーザの許諾に応じて、サーバ装置１００に送信する。 Transmitter 157 transmits to server device 100 information about the voices that have been approved by the user among the candidates notified by notifier 154 . After the user's permission is accepted by the accepting unit 155, the transmitting unit 157 transmits the information about the voice to the server device 100 at a predetermined timing. Transmitter 157 transmits information about voice to server device 100 while the communication environment satisfies a predetermined communication condition. The transmission unit 157 transmits the information about the voice to the server device 100 at a timing when the usage rate of the terminal device by the user is low. Transmitter 157 transmits information about audio that satisfies a predetermined condition received from server device 100 to server device 100 in accordance with the user's permission.
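
The combination of user permission and transmission timing described above might look like the following sketch. The dictionary layout of a candidate, the `on_wifi` flag as the communication condition, and the `max_usage` threshold are all hypothetical; the specification only requires that approved candidates be sent while a predetermined communication condition holds and while the usage rate of the terminal device is low.

```python
def approved_items(candidates, approved_ids):
    """Keep only the candidates the user explicitly permitted."""
    return [c for c in candidates if c["id"] in approved_ids]


def is_good_timing(on_wifi, usage_rate, max_usage=0.2):
    """Assumed timing rule: suitable connection and low device usage."""
    return on_wifi and usage_rate <= max_usage


def transmit_if_ready(candidates, approved_ids, on_wifi, usage_rate, send):
    """Send approved items via the `send` callable; return the number sent."""
    if not is_good_timing(on_wifi, usage_rate):
        return 0
    items = approved_items(candidates, approved_ids)
    for item in items:
        send(item)
    return len(items)
```

Passing `send` as a callable keeps the sketch independent of any particular network library.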

（センサ部１６）
センサ部１６は、様々なセンサ情報を検知するセンサを有する。図３の例では、センサ部１６は、音声センサ１６１を有する。 (Sensor unit 16)
The sensor unit 16 has sensors that detect various sensor information. In the example of FIG. 3 , the sensor section 16 has a voice sensor 161 .

（音声センサ１６１）
音声センサ１６１は、例えばマイク等であり、音声を検知する。例えば、音声センサ１６１は、ユーザの発話を検知する。なお、音声センサ１６１は、処理に必要なユーザの発話情報を検知可能であれば、どのような構成であってもよい。 (Audio sensor 161)
The audio sensor 161 is, for example, a microphone or the like, and detects audio. For example, the audio sensor 161 detects user speech. Note that the voice sensor 161 may have any configuration as long as it can detect the user's speech information necessary for processing.

なお、センサ部１６は、上記に限らず、種々のセンサを有してもよい。センサ部１６は、画像センサ、位置センサ、加速度センサ、ジャイロセンサ、温度センサ、湿度センサ、照度センサ、圧力センサ、近接センサ、ニオイや汗や心拍や脈拍や脳波等の生体情報を受信のためのセンサ等の種々のセンサを有してもよい。また、センサ部１６における上記の各種情報を検知するセンサは共通のセンサであってもよいし、各々異なるセンサにより実現されてもよい。 Note that the sensor unit 16 is not limited to the above and may have various sensors. The sensor unit 16 may have various sensors such as an image sensor, a position sensor, an acceleration sensor, a gyro sensor, a temperature sensor, a humidity sensor, an illuminance sensor, a pressure sensor, a proximity sensor, and sensors for receiving biometric information such as odor, sweat, heartbeat, pulse, and brain waves. The sensors that detect the above various kinds of information in the sensor unit 16 may be a common sensor, or may each be implemented by a different sensor.

〔４．サーバ装置の構成例〕
次に、図６を用いて、実施形態に係るサーバ装置１００の構成について説明する。図６は、実施形態に係るサーバ装置１００の構成例を示す図である。図６に示すように、サーバ装置１００は、通信部１１０と、記憶部１２０と、制御部１３０とを有する。 [4. Configuration example of server device]
Next, the configuration of the server device 100 according to the embodiment will be described with reference to FIG. 6. FIG. 6 is a diagram showing a configuration example of the server device 100 according to the embodiment. As shown in FIG. 6, the server device 100 has a communication unit 110, a storage unit 120, and a control unit 130.

（通信部１１０）
通信部１１０は、例えば、ＮＩＣ（Network Interface Card）等によって実現される。また、通信部１１０は、ネットワークＮ（図２参照）と有線又は無線で接続される。 (Communication unit 110)
The communication unit 110 is realized by, for example, a NIC (Network Interface Card) or the like. Also, the communication unit 110 is connected to the network N (see FIG. 2) by wire or wirelessly.

（記憶部１２０）
記憶部１２０は、例えば、ＲＡＭ（Random Access Memory）、フラッシュメモリ（Flash Memory）等の半導体メモリ素子、又は、ハードディスク、光ディスク等の記憶装置によって実現される。図６に示すように、記憶部１２０は、モデル情報記憶部１２１と、学習用データ情報記憶部１２２とを有する。なお、記憶部１２０は、上記に限らず、様々な情報を記憶する。記憶部１２０は、情報の送信に関する様々な条件を示す情報を記憶する。例えば、記憶部１２０は、サーバ装置１００へ送信する情報の条件を示す情報（情報条件情報）を記憶する。例えば、記憶部１２０は、サーバ装置１００へ送信するタイミングの条件を示す情報（タイミング条件情報）を記憶する。 (storage unit 120)
The storage unit 120 is realized by, for example, a semiconductor memory device such as a RAM (Random Access Memory) or flash memory, or a storage device such as a hard disk or an optical disk. As shown in FIG. 6 , the storage unit 120 has a model information storage unit 121 and a learning data information storage unit 122 . Note that the storage unit 120 stores various information, not limited to the above. The storage unit 120 stores information indicating various conditions regarding transmission of information. For example, the storage unit 120 stores information (information condition information) indicating conditions of information to be transmitted to the server device 100 . For example, the storage unit 120 stores information (timing condition information) indicating a condition of timing for transmission to the server device 100 .

（モデル情報記憶部１２１）
実施形態に係るモデル情報記憶部１２１は、モデルに関する情報を記憶する。例えば、モデル情報記憶部１２１は、ユーザやコミュニティを対象として学習した共通モデル（グローバルモデル）を記憶する。図７は、実施形態に係るモデル情報記憶部の一例を示す図である。図７に示すモデル情報記憶部１２１は、「モデルＩＤ」、「用途」、「モデルデータ」といった項目が含まれる。 (Model information storage unit 121)
The model information storage unit 121 according to the embodiment stores information about models. For example, the model information storage unit 121 stores a common model (global model) learned for users and communities. FIG. 7 is a diagram illustrating an example of the model information storage unit according to the embodiment. The model information storage unit 121 shown in FIG. 7 includes items such as "model ID", "usage", and "model data".

「モデルＩＤ」は、モデルを識別するための識別情報を示す。「用途」は、対応するモデルの用途を示す。「モデルデータ」は、モデルのデータを示す。図７では「モデルデータ」に「ＭＤＴ１」といった概念的な情報が格納される例を示したが、実際には、モデルに含まれるネットワークに関する情報や関数等、そのモデルを構成する種々の情報が含まれる。 "Model ID" indicates identification information for identifying a model. "Usage" indicates the use of the corresponding model. "Model data" indicates the data of the model. Although FIG. 7 shows an example in which conceptual information such as "MDT1" is stored in "model data", in practice it contains the various kinds of information that make up the model, such as information about the network included in the model and functions.

図７に示す例では、モデルＩＤ「Ｍ１」により識別されるモデル（モデルＭ１）は、用途が「音声認識」であることを示す。モデルＭ１は、音声認識に用いられるモデルであることを示す。また、モデルＭ１のモデルデータは、モデルデータＭＤＴ１であることを示す。 In the example shown in FIG. 7, the model identified by the model ID "M1" (model M1) has the usage "speech recognition". That is, model M1 is a model used for speech recognition. The model data of model M1 is model data MDT1.
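
For readers who prefer a concrete data structure, the table of FIG. 7 could be modelled roughly as below. The class name, the in-memory dictionary, and the lookup helper are hypothetical, and the string "MDT1" is only the conceptual placeholder used in the figure, not real model data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRecord:
    """One row of the model information storage unit (assumed layout)."""
    model_id: str
    usage: str
    model_data: str  # placeholder for the real network parameters


# The single row described for FIG. 7.
MODEL_STORE = {"M1": ModelRecord("M1", "speech recognition", "MDT1")}


def models_for(usage):
    """Return all stored models registered for the given usage."""
    return [m for m in MODEL_STORE.values() if m.usage == usage]
```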

なお、モデル情報記憶部１２１は、上記に限らず、目的に応じて種々の情報を記憶してもよい。例えば、モデル情報記憶部１２１は、学習処理により学習（生成）されたモデルのパラメータ情報を記憶する。なお、モデルの学習を他の装置（学習装置等）が行う場合は、サーバ装置１００は、モデル情報記憶部１２１を有しなくてもよい。 Note that the model information storage unit 121 may store various types of information, not limited to the above, depending on the purpose. For example, the model information storage unit 121 stores parameter information of the model learned (generated) by the learning process. Note that the server device 100 may not have the model information storage unit 121 if another device (learning device or the like) performs model learning.

（学習用データ情報記憶部１２２）
実施形態に係る学習用データ情報記憶部１２２は、学習に用いるデータに関する各種情報を記憶する。学習用データ情報記憶部１２２は、学習に用いるデータセットを記憶する。図８は、実施形態に係る学習用データ情報記憶部の一例を示す図である。例えば、学習用データ情報記憶部１２２は、学習に用いる学習用データや精度評価（算出）に用いる評価用データ等の種々のデータに関する各種情報を記憶する。図８に、実施形態に係る学習用データ情報記憶部１２２の一例を示す。図８の例では、学習用データ情報記憶部１２２は、「データセットＩＤ」、「データＩＤ」、「データ」、「ラベル」、「日時」といった項目が含まれる。 (Learning data information storage unit 122)
The learning data information storage unit 122 according to the embodiment stores various types of information regarding data used for learning. The learning data information storage unit 122 stores data sets used for learning. FIG. 8 is a diagram illustrating an example of the learning data information storage unit according to the embodiment. For example, the learning data information storage unit 122 stores various information related to various data such as learning data used for learning and evaluation data used for accuracy evaluation (calculation). FIG. 8 shows an example of the learning data information storage unit 122 according to the embodiment. In the example of FIG. 8, the learning data information storage unit 122 includes items such as "data set ID", "data ID", "data", "label", and "date and time".

「データセットＩＤ」は、データセットを識別するための識別情報を示す。「データＩＤ」は、データを識別するための識別情報を示す。また、「データ」は、データＩＤにより識別されるデータに対応するデータを示す。 "Dataset ID" indicates identification information for identifying a data set. "Data ID" indicates identification information for identifying data. "Data" indicates data corresponding to the data identified by the data ID.

「ラベル」は、対応するデータに付されるラベル（正解ラベル）を示す。例えば、「ラベル」は、対応するデータ（音声）の認識結果を示す情報（正解情報）であってもよい。例えば、「ラベル」は、ユーザの発話を示す音声データを文字データ（文字列）に変換した結果を示す正解情報である。 "Label" indicates a label (correct label) attached to corresponding data. For example, the "label" may be information (correct answer information) indicating the recognition result of the corresponding data (voice). For example, "label" is correct answer information indicating the result of converting voice data indicating user's utterance into character data (character string).

また、「日時」は、対応するデータに関する時間（日時）を示す。なお、図８の例では、「ＤＡ１」等で図示するが、「日時」には、「２０２１年８月８日１５時５２分１４秒」等の具体的な日時であってもよいし、「バージョンＸＸもモデル学習から使用開始」等、そのデータがどのモデルの学習から使用が開始されたかを示す情報が記憶されてもよい。 "Date and time" indicates the time (date and time) related to the corresponding data. Although the example of FIG. 8 shows abstract values such as "DA1", the "date and time" item may hold a specific date and time such as "15:52:14 on August 8, 2021", or may hold information indicating from which model's learning the data started to be used, such as "used starting from the learning of model version XX".

図８の例では、データセットＩＤ「ＤＳ１」により識別されるデータセット（データセットＤＳ１）には、データＩＤ「ＤＩＤ１」、「ＤＩＤ２」、「ＤＩＤ３」等により識別される複数のデータが含まれることを示す。例えば、データＩＤ「ＤＩＤ１」、「ＤＩＤ２」、「ＤＩＤ３」等により識別される各データ（学習用データ）は、モデルの学習に用いられる音声情報（音声データ）等である。 The example of FIG. 8 shows that the data set identified by the data set ID "DS1" (data set DS1) includes a plurality of data items identified by the data IDs "DID1", "DID2", "DID3", and so on. For example, each data item (learning data) identified by the data IDs "DID1", "DID2", "DID3", and so on is speech information (speech data) or the like used for model learning.

例えば、データＩＤ「ＤＩＤ１」により識別されるデータＤＴ１は、ラベルＬＢ１が付されたラベル有りデータであり、日時ＤＡ１でのモデルの学習から使用が開始されたことを示す。また、例えば、データＩＤ「ＤＩＤ４」により識別されるデータＤＴ４は、ラベル無しデータとして取集され、予測ラベルであるラベルＬＢ４が付されたデータであり、日時ＤＡ４でのモデルの学習から使用が開始されたことを示す。 For example, data DT1 identified by the data ID "DID1" is labeled data to which label LB1 is attached, and its use started from the model learning at date and time DA1. Also, for example, data DT4 identified by the data ID "DID4" was collected as unlabeled data and then given label LB4, which is a predicted label, and its use started from the model learning at date and time DA4.
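
The rows of FIG. 8 discussed above could be represented as in the following sketch. The class, the `predicted_label` flag, and the helper function are hypothetical additions for illustration. Only the two rows whose contents are stated in the text are included, and placing DID4 in data set DS1 is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearningRecord:
    """One row of the learning data information storage unit (assumed layout)."""
    dataset_id: str
    data_id: str
    data: str
    label: str
    date: str
    predicted_label: bool = False  # True if the label was predicted, not given


RECORDS = [
    LearningRecord("DS1", "DID1", "DT1", "LB1", "DA1"),
    # Assumption: DID4 is placed in DS1; the text does not say which data set holds it.
    LearningRecord("DS1", "DID4", "DT4", "LB4", "DA4", predicted_label=True),
]


def originally_labeled(records):
    """Return the records that were collected with a label already attached."""
    return [r for r in records if not r.predicted_label]
```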

なお、学習用データ情報記憶部１２２は、上記に限らず、目的に応じて種々の情報を記憶してもよい。例えば、学習用データ情報記憶部１２２は、各データが学習用データであるか、評価用データであるか等を特定可能に記憶してもよい。例えば、学習用データ情報記憶部１２２は、学習用データと評価用データとを区別可能に記憶する。学習用データ情報記憶部１２２は、各データが学習用データや評価用データであるかを識別する情報を記憶してもよい。サーバ装置１００は、学習用データとして用いられる各データと正解情報とに基づいて、モデルを学習する。サーバ装置１００は、評価用データとして用いられる各データと正解情報とに基づいて、モデルの精度を算出する。サーバ装置１００は、評価用データを入力した場合にモデルが出力する出力結果と、正解情報とを比較した結果を収集することにより、モデルの精度を算出する。 Note that the learning data information storage unit 122 may store various types of information, not limited to the above, depending on the purpose. For example, the learning data information storage unit 122 may store data such as whether each data is learning data or evaluation data so as to be identifiable. For example, the learning data information storage unit 122 stores learning data and evaluation data in a distinguishable manner. The learning data information storage unit 122 may store information for identifying whether each data is learning data or evaluation data. The server device 100 learns a model based on each data used as learning data and the correct answer information. The server device 100 calculates the accuracy of the model based on each data used as the evaluation data and the correct answer information. The server device 100 calculates the accuracy of the model by collecting the result of comparing the output result output by the model when the evaluation data is input with the correct answer information.
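
The accuracy calculation described above, in which model outputs for evaluation data are compared with the correct answer information, can be sketched as follows. The `role` field used to distinguish learning data from evaluation data and the use of exact string matching are assumptions; a practical speech recognition system would more likely use a measure such as word or character error rate.

```python
def split_by_role(records):
    """Separate records into learning data and evaluation data by their 'role' field."""
    learning = [r for r in records if r["role"] == "learning"]
    evaluation = [r for r in records if r["role"] == "evaluation"]
    return learning, evaluation


def accuracy(model, evaluation):
    """Fraction of evaluation records whose model output equals the correct label."""
    if not evaluation:
        raise ValueError("no evaluation data")
    hits = sum(1 for r in evaluation if model(r["data"]) == r["label"])
    return hits / len(evaluation)
```

Here `model` is any callable that maps a data item to a recognition result.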

（制御部１３０）
図６に戻り、説明を続ける。制御部１３０は、コントローラ（Controller）であり、例えば、ＣＰＵ、ＭＰＵ、ＡＳＩＣ、ＦＰＧＡ等によって、サーバ装置１００の内部の記憶装置に記憶されている各種プログラム（情報処理プログラムの一例に相当）がＲＡＭ等の記憶領域を作業領域として実行されることにより実現される。 (control unit 130)
Returning to FIG. 6, the description continues. The control unit 130 is a controller, and is realized by, for example, a CPU, MPU, ASIC, FPGA, or the like executing various programs (corresponding to an example of an information processing program) stored in a storage device inside the server device 100, using a storage area such as a RAM as a work area.

図６に示す例では、制御部１３０は、取得部１３１と、決定部１３２と、学習部１３３と、送信部１３４とを有する。なお、制御部１３０の内部構成は、図６に示した構成に限られず、後述する情報処理を行う構成であれば他の構成であってもよい。 In the example shown in FIG. 6, the control unit 130 has an acquisition unit 131, a determination unit 132, a learning unit 133, and a transmission unit 134. Note that the internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 6, and may be another configuration as long as it performs the information processing described later.

（取得部１３１）
取得部１３１は、通信部１１０を介して、外部の情報処理装置から各種情報を受信する。取得部１３１は、端末装置１０から各種情報を受信する。取得部１３１は、端末装置１０から受信した音声に関する情報を記憶部１２０へ格納する。取得部１３１は、端末装置１０から受信した音声に関する情報を、モデルの学習に用いるデータ（学習データ）として学習用データ情報記憶部１２２に登録する。 (Acquisition unit 131)
Acquisition unit 131 receives various types of information from an external information processing device via communication unit 110 . The acquisition unit 131 receives various information from the terminal device 10 . Acquisition unit 131 stores information about the audio received from terminal device 10 in storage unit 120 . The acquisition unit 131 registers the information about the voice received from the terminal device 10 in the learning data information storage unit 122 as data (learning data) used for model learning.

取得部１３１は、各種情報を取得する。取得部１３１は、記憶部１２０から各種情報を取得する。取得部１３１は、モデル情報記憶部１２１や学習用データ情報記憶部１２２から各種情報を取得する。 Acquisition unit 131 acquires various types of information. Acquisition unit 131 acquires various types of information from storage unit 120 . The acquisition unit 131 acquires various types of information from the model information storage unit 121 and the learning data information storage unit 122 .

（決定部１３２）
決定部１３２は、各種情報を決定する。例えば、決定部１３２は、取得部１３１により外部装置から取得された各種情報に基づいて、各種情報を決定する。例えば、決定部１３２は、端末装置１０から取得された各種情報に基づいて、各種情報を決定する。 (Determination unit 132)
The determination unit 132 determines various types of information. For example, the determination unit 132 determines various information based on various information acquired from the external device by the acquisition unit 131 . For example, the determination unit 132 determines various information based on various information acquired from the terminal device 10 .

決定部１３２は、端末装置１０に提供を要求する情報に関する条件を決定する。例えば、決定部１３２は、記憶部１２０に記憶された情報に基づいて、端末装置１０に提供を要求する情報に関する条件を決定する。 The determination unit 132 determines conditions regarding the information that the terminal device 10 is requested to provide. For example, the determination unit 132 determines the conditions regarding the information that the terminal device 10 is requested to provide, based on the information stored in the storage unit 120.

（学習部１３３）
学習部１３３は、各種情報を学習する。学習部１３３は、外部の情報処理装置からの情報や記憶部１２０に記憶された情報に基づいて、各種情報を学習する。学習部１３３は、学習用データ情報記憶部１２２に記憶された情報に基づいて、各種情報を学習する。学習部１３３は、学習により生成したモデルをモデル情報記憶部１２１に格納する。学習部１３３は、学習により更新したモデルをモデル情報記憶部１２１に格納する。 (Learning unit 133)
The learning unit 133 learns various types of information. The learning unit 133 learns various types of information based on information from an external information processing device and information stored in the storage unit 120 . The learning unit 133 learns various information based on the information stored in the learning data information storage unit 122 . The learning unit 133 stores the model generated by learning in the model information storage unit 121 . The learning unit 133 stores the model updated by learning in the model information storage unit 121 .

学習部１３３は、学習処理を行う。学習部１３３は、各種学習を行う。学習部１３３は、取得部１３１により取得された情報に基づいて、各種情報を学習する。学習部１３３は、モデルを学習（生成）する。学習部１３３は、モデル等の各種情報を学習する。学習部１３３は、学習によりモデルを生成する。学習部１３３は、種々の機械学習に関する技術を用いて、モデルを学習する。例えば、学習部１３３は、モデル（ネットワーク）のパラメータを学習する。学習部１３３は、種々の機械学習に関する技術を用いて、モデルを学習する。 The learning unit 133 performs learning processing. The learning unit 133 performs various types of learning. The learning unit 133 learns various information based on the information acquired by the acquisition unit 131 . The learning unit 133 learns (generates) a model. The learning unit 133 learns various information such as models. The learning unit 133 generates a model through learning. The learning unit 133 learns the model using various machine learning techniques. For example, the learning unit 133 learns model (network) parameters. The learning unit 133 learns the model using various machine learning techniques.

学習部１３３は、モデルＭ１を生成する。学習部１３３は、ネットワークのパラメータを学習する。例えば、学習部１３３は、モデルＭ１のネットワークのパラメータを学習する。 Learning unit 133 generates model M1. The learning unit 133 learns network parameters. For example, the learning unit 133 learns network parameters of the model M1.

学習部１３３は、学習用データ情報記憶部１２２に記憶された学習用データ（教師データ）に基づいて、学習処理を行う。学習部１３３は、学習用データ情報記憶部１２２に記憶された学習用データを用いて、学習処理を行うことにより、モデルＭ１を生成する。例えば、学習部１３３は、音声認識に用いられるモデルを生成する。学習部１３３は、モデルＭ１のネットワークのパラメータを学習することにより、モデルＭ１を生成する。 The learning unit 133 performs learning processing based on the learning data (teacher data) stored in the learning data information storage unit 122 . The learning unit 133 generates the model M1 by performing learning processing using the learning data stored in the learning data information storage unit 122 . For example, the learning unit 133 generates models used for speech recognition. The learning unit 133 generates the model M1 by learning parameters of the network of the model M1.

学習部１３３による学習の手法は特に限定されないが、例えば、ラベルとデータ（画像）とを紐づけた学習用データを用意し、その学習用データを多層ニューラルネットワークに基づいた計算モデルに入力して学習してもよい。また、例えばＣＮＮ（Convolutional Neural Network）、３Ｄ－ＣＮＮ等のＤＮＮ（Deep Neural Network）に基づく手法が用いられてもよい。学習部１３３は、音声等のような時系列データを対象とする場合、再帰型ニューラルネットワーク（Recurrent Neural Network：ＲＮＮ）やＲＮＮを拡張したＬＳＴＭ（Long Short-Term Memory units）に基づく手法を用いてもよい。なお、モデルの学習を他の装置（学習装置等）が行う場合は、サーバ装置１００は、学習部１３３を有しなくてもよい。 The learning method used by the learning unit 133 is not particularly limited. For example, learning data in which labels are associated with data (images) may be prepared, and the learning data may be input to a computational model based on a multilayer neural network for learning. Also, for example, a technique based on a DNN (Deep Neural Network) such as a CNN (Convolutional Neural Network) or a 3D-CNN may be used. When targeting time-series data such as voice, the learning unit 133 may use a technique based on a recurrent neural network (RNN) or on LSTM (Long Short-Term Memory units), which is an extension of the RNN. Note that the server device 100 does not need to have the learning unit 133 when another device (learning device or the like) performs model learning.

（送信部１３４）
送信部１３４は、通信部１１０を介して、各種情報を端末装置１０へ送信する。送信部１３４は、端末装置１０が自装置内で音声認識を行うために用いる音声認識モデルを端末装置１０へ送信する。送信部１３４は、モデルＭ１を端末装置１０に提供する。 (Sending unit 134)
The transmission unit 134 transmits various types of information to the terminal device 10 via the communication unit 110 . The transmission unit 134 transmits to the terminal device 10 a speech recognition model used by the terminal device 10 to perform speech recognition within itself. The transmitter 134 provides the terminal device 10 with the model M1.

送信部１３４は、端末装置１０に情報送信を要求する音声を指定する所定の条件を示す情報を、端末装置１０に送信する。送信部１３４は、決定部１３２により決定された条件を示す情報を、端末装置１０に送信する。送信部１３４は、他の装置（学習装置等）に収集したデータを送信してもよい。 The transmission unit 134 transmits to the terminal device 10 information indicating a predetermined condition that specifies the voices for which the terminal device 10 is requested to transmit information. The transmission unit 134 transmits information indicating the conditions determined by the determination unit 132 to the terminal device 10. The transmission unit 134 may transmit the collected data to another device (learning device or the like).
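
To make this exchange concrete, the condition determined on the server side might be serialized and sent to the terminal device 10 as a small message. The JSON layout and the field names below (`min_count`, `max_score`, `keywords`) are purely hypothetical; the specification does not define a message format.

```python
import json


def encode_condition(min_count, max_score, keywords):
    """Server side: serialize an assumed condition set as a JSON string."""
    condition = {
        "min_count": min_count,
        "max_score": max_score,
        "keywords": list(keywords),
    }
    return json.dumps(condition, ensure_ascii=False)


def decode_condition(message):
    """Terminal side: parse the message and reject incomplete conditions."""
    condition = json.loads(message)
    missing = {"min_count", "max_score", "keywords"} - set(condition)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return condition
```

Validating the message on receipt means the terminal never applies a partially specified condition.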

〔５．処理手順〕
次に、図９を用いて実施形態に係る端末装置１０による処理手順について説明する。図９は、実施形態に係る処理手順を示すフローチャートである。 [5. Processing procedure]
Next, a processing procedure performed by the terminal device 10 according to the embodiment will be described with reference to FIG. 9 . FIG. 9 is a flow chart showing a processing procedure according to the embodiment.

図９に示すように、端末装置１０は、ユーザが発話した音声と発話の音声認識による認識結果とを対応付けて自装置内の記憶部１４に収集する（ステップＳ１０１）。 As shown in FIG. 9, the terminal device 10 associates the voice uttered by the user with the recognition result obtained by speech recognition of the utterance and collects them in the storage unit 14 within the device itself (step S101).

そして、端末装置１０は、収集した音声が所定の条件を満たす場合、ユーザの許諾に応じて、音声に関する情報をサーバ装置１００に送信する（ステップＳ１０２）。 Then, when the collected voice satisfies a predetermined condition, the terminal device 10 transmits information about the voice to the server device 100 according to the user's permission (step S102).
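
The two steps of FIG. 9 can be sketched as a small class. Everything here is a hypothetical stand-in: the recognizer, the permission prompt, and the upload function are passed in as callables, the "predetermined condition" is reduced to a minimum count, and clearing the local storage after a successful upload is an assumption that the specification does not state.

```python
class TerminalSketch:
    """Toy stand-in for the terminal device; every name here is hypothetical."""

    def __init__(self, recognize, ask_permission, send, min_count=2):
        self.recognize = recognize            # on-device speech recognizer
        self.ask_permission = ask_permission  # returns True if the user permits
        self.send = send                      # uploads one item to the server
        self.min_count = min_count            # assumed "predetermined condition"
        self.storage = []                     # the terminal's own storage

    def collect(self, audio):
        """Step S101: store the voice together with its recognition result."""
        text = self.recognize(audio)
        self.storage.append((audio, text))
        return text

    def maybe_transmit(self):
        """Step S102: transmit only if the condition holds and the user permits."""
        if len(self.storage) < self.min_count:
            return False
        if not self.ask_permission(self.storage):
            return False
        for audio, text in self.storage:
            self.send({"audio": audio, "text": text})
        self.storage.clear()  # assumption: sent items are not re-sent
        return True
```

Note that nothing leaves the device unless both the condition check and the permission check succeed.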

〔６．効果〕
上述してきたように、本願に係る端末装置１０は、ユーザに利用され、自装置で音声認識を行う端末装置１０であり、収集部１５３と、送信部１５７とを有する。収集部１５３は、ユーザが発話した音声と発話の音声認識による認識結果とを対応付けて自装置内の記憶部１４に収集する。送信部１５７は、収集部１５３により収集された音声が所定の条件を満たす場合、ユーザの許諾に応じて、音声に関する情報をサーバ装置１００に送信する。 [6. Effects]
As described above, the terminal device 10 according to the present application is a terminal device 10 that is used by a user and performs speech recognition by itself, and has the collection unit 153 and the transmission unit 157 . The collection unit 153 collects the speech uttered by the user and the recognition result of the speech recognition in the storage unit 14 in the own device in association with each other. When the voice collected by the collecting unit 153 satisfies a predetermined condition, the transmitting unit 157 transmits information about the voice to the server device 100 in accordance with the user's permission.

このように、端末装置１０は、自装置内でユーザの発話の音声の音声認識行い、その音声に関する情報を、条件を満たしかつユーザが許諾した場合にサーバ装置１００に送信する。これにより、端末装置１０は、端末装置１０が音声認識を行う場合であっても、端末装置１０で収集される音声に関する情報を他の装置へ適切に送信することができる。 In this way, the terminal device 10 performs voice recognition of the voice uttered by the user within itself, and transmits information about the voice to the server device 100 when the conditions are satisfied and the user permits. As a result, even when the terminal device 10 performs voice recognition, the terminal device 10 can appropriately transmit information about voice collected by the terminal device 10 to other devices.

また、端末装置１０において、送信部１５７は、音声のデータをサーバ装置１００に送信する。このように、端末装置１０は、音声のデータをサーバ装置１００へ送信することにより、端末装置１０で収集される音声に関する情報を他の装置へ適切に送信することができる。 Also, in the terminal device 10 , the transmission unit 157 transmits voice data to the server device 100 . In this way, the terminal device 10 can appropriately transmit the information about the voice collected by the terminal device 10 to another device by transmitting voice data to the server device 100 .

また、端末装置１０において、送信部１５７は、音声の波形データをサーバ装置１００に送信する。このように、端末装置１０は、音声の波形データをサーバ装置１００へ送信することにより、端末装置１０で収集される音声に関する情報を他の装置へ適切に送信することができる。 Also, in the terminal device 10 , the transmission unit 157 transmits voice waveform data to the server device 100 . In this way, the terminal device 10 can appropriately transmit the information on the voice collected by the terminal device 10 to another device by transmitting the waveform data of the voice to the server device 100 .

また、端末装置１０において、送信部１５７は、音声のデータを圧縮したデータをサーバ装置１００に送信する。このように、端末装置１０は、音声のデータを圧縮したデータをサーバ装置１００へ送信することにより、通信量の増大を抑制しつつ、端末装置１０で収集される音声に関する情報を他の装置へ適切に送信することができる。 Also, in the terminal device 10, the transmission unit 157 transmits data obtained by compressing the voice data to the server device 100. In this way, by transmitting the compressed voice data to the server device 100, the terminal device 10 can appropriately transmit the information about the voice collected by the terminal device 10 to another device while suppressing an increase in the amount of communication.

また、端末装置１０において、送信部１５７は、音声から抽出した特徴情報をサーバ装置１００に送信する。このように、端末装置１０は、音声から抽出した特徴情報をサーバ装置１００へ送信することにより、端末装置１０で収集される音声に関する情報を他の装置へ適切に送信することができる。 Also, in the terminal device 10, the transmission unit 157 transmits feature information extracted from the voice to the server device 100. In this way, by transmitting the feature information extracted from the voice to the server device 100, the terminal device 10 can appropriately transmit the information about the voice collected by the terminal device 10 to other devices.

また、端末装置１０において、送信部１５７は、収集部１５３により収集された音声の数が所定数以上である場合、許諾に応じて音声に関する情報をサーバ装置１００に送信する。このように、端末装置１０は、収集した音声の数が所定数以上になった場合に、サーバ装置１００へ音声に関する情報を送信することにより、端末装置１０で収集される音声に関する情報を他の装置へ適切に送信することができる。 Further, in the terminal device 10, when the number of voices collected by the collection unit 153 is equal to or greater than a predetermined number, the transmission unit 157 transmits information about the voices to the server device 100 in accordance with the permission. In this way, by transmitting information about the voices to the server device 100 when the number of collected voices reaches or exceeds the predetermined number, the terminal device 10 can appropriately transmit the information about the voice collected by the terminal device 10 to other devices.

また、端末装置１０において、送信部１５７は、収集部１５３により収集された音声の音声認識に関するスコアが所定の条件を満たす場合、許諾に応じて音声に関する情報をサーバ装置１００に送信する。このように、端末装置１０は、収集した音声の音声認識に関するスコアが所定の条件を満たす場合に、サーバ装置１００へ音声に関する情報を送信することにより、端末装置１０で収集される音声に関する情報を他の装置へ適切に送信することができる。 Further, in the terminal device 10, when the score regarding speech recognition of the voice collected by the collection unit 153 satisfies a predetermined condition, the transmission unit 157 transmits information about the voice to the server device 100 in accordance with the permission. In this way, by transmitting information about the voice to the server device 100 when the speech recognition score of the collected voice satisfies the predetermined condition, the terminal device 10 can appropriately transmit the information about the voice collected by the terminal device 10 to other devices.

また、端末装置１０において、送信部１５７は、収集部１５３により収集された音声がノイズに関する条件を満たす場合、許諾に応じて音声に関する情報をサーバ装置１００に送信する。このように、端末装置１０は、収集した音声がノイズに関する条件を満たす場合に、サーバ装置１００へ音声に関する情報を送信することにより、端末装置１０で収集される音声に関する情報を他の装置へ適切に送信することができる。 Further, in the terminal device 10, when the voice collected by the collection unit 153 satisfies the condition regarding noise, the transmission unit 157 transmits information about the voice to the server device 100 in accordance with the permission. In this way, by transmitting information about the voice to the server device 100 when the collected voice satisfies the condition regarding noise, the terminal device 10 can appropriately transmit the information about the voice collected by the terminal device 10 to other devices.

Further, in the terminal device 10, when the utterance corresponding to a voice collected by the collection unit 153 includes predetermined content, the transmission unit 157 transmits information about the voice to the server device 100 in accordance with the permission. By transmitting the information to the server device 100 when the utterance corresponding to the voice includes the predetermined content, the terminal device 10 can appropriately transmit the information about the voices it collects to another device.

Further, in the terminal device 10, when a predetermined condition specified by the server device 100 is satisfied, the transmission unit 157 transmits information about the voice to the server device 100 in accordance with the permission. By transmitting the information to the server device 100 when the condition specified by the server device 100 is satisfied, the terminal device 10 can appropriately transmit the information about the voices it collects to another device.
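The transmission conditions described above (number of collected voices, recognition score, noise, utterance content, and user permission) can be sketched as a single filter. This is a minimal illustration, not the patented implementation: the names `CollectedVoice`, `TransmissionCondition`, and `select_for_transmission` are hypothetical, and the direction of each comparison (a low score, a high noise level) is an assumption, since the text only says that a "predetermined condition" is satisfied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CollectedVoice:
    """One collected utterance paired with its on-device recognition result."""
    audio: bytes
    transcript: str
    score: float        # hypothetical recognition score
    noise_level: float  # hypothetical noise estimate


@dataclass
class TransmissionCondition:
    """Hypothetical condition set; a field left as None is not checked."""
    min_count: Optional[int] = None
    max_score: Optional[float] = None        # assumption: low scores are of interest
    min_noise_level: Optional[float] = None  # assumption: noisy audio is of interest
    required_text: Optional[str] = None


def select_for_transmission(voices: List[CollectedVoice],
                            cond: TransmissionCondition,
                            permitted: bool) -> List[CollectedVoice]:
    """Return the voices that may be sent; nothing is sent without permission."""
    if not permitted:
        return []
    # Count condition: wait until enough voices have been collected.
    if cond.min_count is not None and len(voices) < cond.min_count:
        return []
    selected = []
    for v in voices:
        if cond.max_score is not None and v.score > cond.max_score:
            continue
        if cond.min_noise_level is not None and v.noise_level < cond.min_noise_level:
            continue
        if cond.required_text is not None and cond.required_text not in v.transcript:
            continue
        selected.append(v)
    return selected
```

Checking permission first keeps the permission requirement independent of whichever condition is configured, which matches the text's statement that transmission always occurs "in accordance with the permission".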

The terminal device 10 also has a reception unit 155. The reception unit 155 receives permission from the user. By receiving the user's permission and transmitting information about the voice only when that permission has been given, the terminal device 10 can appropriately transmit the information about the voices it collects to another device.

Further, in the terminal device 10, the transmission unit 157 transmits information about the voice to the server device 100 at a predetermined timing after the user's permission has been received by the reception unit 155. By transmitting the information to the server device 100 at a predetermined timing after the user's permission, the terminal device 10 can transmit the information about the voices it collects to another device at an appropriate timing.

Further, in the terminal device 10, the transmission unit 157 transmits information about the voice to the server device 100 while the communication environment satisfies a predetermined communication condition. By transmitting the information to the server device 100 while the communication environment satisfies the predetermined communication condition, the terminal device 10 can transmit the information about the voices it collects to another device at an appropriate timing.

Further, in the terminal device 10, the transmission unit 157 transmits information about the voice to the server device 100 at a timing when the user's usage rate of the terminal device 10 is low. By transmitting the information to the server device 100 at a timing when the usage rate is low, the terminal device 10 can transmit the information about the voices it collects to another device at an appropriate timing.
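The timing rules above (transmit only after permission, while the communication environment is suitable, and while the device is little used) can be combined into one check. The sketch below is illustrative only: the function name, the use of a network type string as the "communication condition", and the 0.2 usage threshold are all assumptions that do not come from the text.

```python
from typing import Tuple


def can_transmit_now(permitted: bool,
                     network_type: str,
                     usage_rate: float,
                     allowed_networks: Tuple[str, ...] = ("wifi",),
                     max_usage_rate: float = 0.2) -> bool:
    """Decide whether the current moment is suitable for transmission.

    permitted:     whether the user's permission has already been received
    network_type:  hypothetical label for the current communication environment
    usage_rate:    hypothetical fraction (0.0 to 1.0) describing how heavily
                   the user is currently using the terminal device
    """
    if not permitted:
        return False
    # Communication condition (assumption: only certain network types qualify).
    if network_type not in allowed_networks:
        return False
    # Usage condition: transmit only while the device is lightly used.
    return usage_rate <= max_usage_rate
```

A caller might evaluate this check periodically and hold the collected information in the storage unit until it returns `True`.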

The terminal device 10 also has a notification unit 154. The notification unit 154 notifies the user of information about voices that are candidates for transmission to the server device 100. By notifying the user of these candidates, the terminal device 10 allows the user to recognize what kind of information will be transmitted to the server device 100.

Further, in the terminal device 10, the transmission unit 157 transmits to the server device 100 information about the voices that the user has approved from among the candidates notified by the notification unit 154. By notifying the user of the candidates and transmitting to the server device 100 only the information about the voices the user has approved, the terminal device 10 can appropriately transmit the information about the voices it collects to another device.

Further, in the terminal device 10, the notification unit 154 notifies the user of information indicating the incentive that is provided to the user when information about the voice is transmitted to the server device 100. By informing the user of the incentive offered in exchange for providing information, the terminal device 10 can motivate the user to provide information, which can increase the amount of information transmitted from the terminal device 10 to other devices.
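The notification and approval flow described above can be sketched as two small helpers: one that builds the message shown to the user (candidates plus incentive) and one that keeps only the candidates the user approved. The function names, the message wording, and the use of string identifiers for voices are hypothetical choices made for illustration.

```python
from typing import Dict, Iterable


def build_notification(candidates: Dict[str, str], incentive: str) -> str:
    """Build the text shown to the user.

    candidates maps a hypothetical voice identifier to its recognized text.
    """
    lines = ["The following utterances are candidates for transmission:"]
    for voice_id, transcript in candidates.items():
        lines.append(f"  [{voice_id}] {transcript}")
    lines.append(f"Incentive if sent: {incentive}")
    return "\n".join(lines)


def approved_subset(candidates: Dict[str, str],
                    approved_ids: Iterable[str]) -> Dict[str, str]:
    """Keep only the candidates the user approved; unknown ids are ignored."""
    approved = set(approved_ids)
    return {vid: text for vid, text in candidates.items() if vid in approved}
```

Only the result of `approved_subset` would then be handed to the transmission step, so a candidate the user did not approve never leaves the device.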

As described above, the information processing system 1 according to the present application includes the terminal device 10, which is used by a user and performs speech recognition on its own device, and the server device 100. The server device 100 transmits to the terminal device 10 information indicating a predetermined condition that specifies the voices for which the terminal device 10 is requested to transmit information. The terminal device 10 transmits information about the voices that satisfy the predetermined condition received from the server device 100 to the server device 100 in accordance with the user's permission.

In this way, in the information processing system 1, the terminal device 10 used by the user performs speech recognition of the user's utterances within its own device, and the terminal device 10 transmits to the server device 100 information about the voices that satisfy the condition specified by the server device 100. As a result, even when the terminal device 10 performs the speech recognition, the information processing system 1 can appropriately transmit the information about the voices collected by the terminal device 10 to another device.
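The exchange between the server device and the terminal device can be illustrated as a round trip: the server serializes a condition, and the terminal applies it to its locally collected voices before replying. The JSON encoding, the field names `max_score` and `keyword`, and both function names are assumptions made for this sketch; the text does not specify a message format.

```python
import json
from typing import Dict, List


def encode_condition(max_score: float, keyword: str) -> str:
    """Server side: serialize a hypothetical condition for the terminal."""
    return json.dumps({"max_score": max_score, "keyword": keyword})


def handle_condition(message: str,
                     collected: List[Dict],
                     permitted: bool) -> List[Dict]:
    """Terminal side: return the collected voices to send back to the server.

    Each collected item is assumed to be a dict with "transcript" and "score".
    """
    cond = json.loads(message)
    if not permitted:
        return []  # without the user's permission nothing leaves the device
    return [
        v for v in collected
        if v["score"] <= cond["max_score"] and cond["keyword"] in v["transcript"]
    ]
```

Because the filtering runs on the terminal, the server only ever receives voices that both match its request and are covered by the user's permission.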

[7. Hardware configuration]
The terminal device 10 and the server device 100 according to the embodiments described above are implemented by, for example, a computer 1000 configured as shown in FIG. 10. The server device 100 is described below as an example. FIG. 10 is a diagram showing an example of a hardware configuration. The computer 1000 is connected to an output device 1010 and an input device 1020, and has a configuration in which an arithmetic device 1030, a primary storage device 1040, a secondary storage device 1050, an output I/F (Interface) 1060, an input I/F 1070, and a network I/F 1080 are connected by a bus 1090.

The arithmetic device 1030 operates based on programs stored in the primary storage device 1040 or the secondary storage device 1050, programs read from the input device 1020, and the like, and executes various processes. The arithmetic device 1030 is implemented by, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field Programmable Gate Array).

The primary storage device 1040 is a memory device, such as a RAM (Random Access Memory), that temporarily stores data used by the arithmetic device 1030 for various calculations. The secondary storage device 1050 is a storage device in which data used by the arithmetic device 1030 for various calculations and various databases are registered, and is implemented by a ROM (Read Only Memory), an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, or the like. The secondary storage device 1050 may be internal storage or external storage. The secondary storage device 1050 may also be a removable storage medium such as a USB memory or an SD (Secure Digital) memory card, or it may be cloud storage (online storage), a NAS (Network Attached Storage), a file server, or the like.

The output I/F 1060 is an interface for transmitting information to be output to the output device 1010, which outputs various kinds of information and may be a display, a projector, a printer, or the like. The output I/F 1060 is implemented by, for example, a connector conforming to a standard such as USB (Universal Serial Bus), DVI (Digital Visual Interface), or HDMI (registered trademark) (High Definition Multimedia Interface). The input I/F 1070 is an interface for receiving information from various input devices 1020 such as a mouse, a keyboard, a keypad, buttons, and a scanner, and is implemented by, for example, USB.

The output I/F 1060 and the input I/F 1070 may be connected wirelessly to the output device 1010 and the input device 1020, respectively. That is, the output device 1010 and the input device 1020 may be wireless devices.

The output device 1010 and the input device 1020 may be integrated, as in a touch panel. In this case, the output I/F 1060 and the input I/F 1070 may also be integrated as an input/output I/F.

Note that the input device 1020 may be a device that reads information from, for example, an optical recording medium such as a CD (Compact Disc), a DVD (Digital Versatile Disc), or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

The network I/F 1080 receives data from other devices via the network N and sends the data to the arithmetic device 1030, and also transmits data generated by the arithmetic device 1030 to other devices via the network N.

The arithmetic device 1030 controls the output device 1010 and the input device 1020 via the output I/F 1060 and the input I/F 1070. For example, the arithmetic device 1030 loads a program from the input device 1020 or the secondary storage device 1050 onto the primary storage device 1040 and executes the loaded program.

For example, when the computer 1000 functions as the server device 100, the arithmetic device 1030 of the computer 1000 implements the functions of the control unit 130 by executing a program loaded onto the primary storage device 1040. The arithmetic device 1030 may also load a program acquired from another device via the network I/F 1080 onto the primary storage device 1040 and execute the loaded program. Furthermore, the arithmetic device 1030 may cooperate with another device via the network I/F 1080 and use functions, data, and the like by calling them from another program on that device.

[8. Others]
Although embodiments of the present application have been described above, the present invention is not limited by the contents of these embodiments. The components described above include those that a person skilled in the art could easily conceive of, those that are substantially the same, and those within the so-called range of equivalents. The components described above can also be combined as appropriate. Furthermore, various omissions, substitutions, or modifications of the components can be made without departing from the gist of the embodiments described above.

Of the processes described in the above embodiments, all or part of the processes described as being performed automatically can be performed manually, and all or part of the processes described as being performed manually can be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and in the drawings can be changed arbitrarily unless otherwise specified. For example, the various kinds of information shown in each drawing are not limited to the illustrated information.

Each component of each illustrated device is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

For example, the server device 100 described above may be implemented by a plurality of server computers, and, depending on the function, may be implemented by calling an external platform or the like through an API (Application Programming Interface), network computing, or the like. The configuration can thus be changed flexibly.

The embodiments and modifications described above can be combined as appropriate to the extent that their processing contents do not contradict one another.

The term "unit" (section, module, unit) used above can be read as "means", "circuit", or the like. For example, the acquisition unit can be read as acquisition means or an acquisition circuit.

1 Information processing system
10 Terminal device
11 Communication unit
12 Input unit
13 Output unit
14 Storage unit
141 Model information storage unit
142 Collected information storage unit
15 Control unit
151 Receiving unit
152 Speech recognition unit
153 Collection unit
154 Notification unit
155 Reception unit
156 Determination unit
157 Transmission unit
16 Sensor unit
161 Voice sensor
100 Server device
110 Communication unit
120 Storage unit
121 Model information storage unit
122 Learning data information storage unit
130 Control unit
131 Acquisition unit
132 Determination unit
133 Learning unit
134 Transmission unit

Claims

1. A terminal device that is used by a user and performs speech recognition on its own device, the terminal device comprising:
a collection unit that collects, in a storage unit within the own device, a voice uttered by the user and a recognition result of the utterance obtained by the speech recognition in association with each other; and
a transmission unit that transmits information about the voice to a server device in accordance with permission from the user when the voice collected by the collection unit satisfies a predetermined condition.

2. The terminal device according to claim 1, wherein the transmission unit transmits data of the voice to the server device.

3. The terminal device according to claim 1, wherein the transmission unit transmits waveform data of the voice to the server device.

4. The terminal device according to any one of claims 1 to 3, wherein the transmission unit transmits data obtained by compressing the data of the voice to the server device.

5. The terminal device according to any one of claims 1 to 4, wherein the transmission unit transmits feature information extracted from the voice to the server device.

6. The terminal device according to any one of claims 1 to 5, wherein the transmission unit transmits the information about the voice to the server device in accordance with the permission when the number of voices collected by the collection unit is equal to or greater than a predetermined number.

7. The terminal device according to any one of claims 1 to 6, wherein the transmission unit transmits the information about the voice to the server device in accordance with the permission when a score related to the speech recognition of the voice collected by the collection unit satisfies a predetermined condition.

8. The terminal device according to any one of claims 1 to 7, wherein the transmission unit transmits the information about the voice to the server device in accordance with the permission when the voice collected by the collection unit satisfies a condition regarding noise.

9. The terminal device according to any one of claims 1 to 8, wherein the transmission unit transmits the information about the voice to the server device in accordance with the permission when the utterance corresponding to the voice collected by the collection unit includes predetermined content.

10. The terminal device according to any one of claims 1 to 9, wherein the transmission unit transmits the information about the voice to the server device in accordance with the permission when the predetermined condition specified by the server device is satisfied.

11. The terminal device according to any one of claims 1 to 10, further comprising a reception unit that receives the permission from the user.

12. The terminal device according to claim 11, wherein the transmission unit transmits the information about the voice to the server device at a predetermined timing after the permission of the user is received by the reception unit.

13. The terminal device according to claim 12, wherein the transmission unit transmits the information about the voice to the server device while the communication environment satisfies a predetermined communication condition.

14. The terminal device according to claim 12 or 13, wherein the transmission unit transmits the information about the voice to the server device at a timing when the usage rate of the terminal device by the user is low.

15. The terminal device according to any one of claims 1 to 14, further comprising a notification unit that notifies the user of information about the voice that is a candidate for transmission to the server device.

16. The terminal device according to claim 15, wherein the transmission unit transmits to the server device the information about the voice approved by the user from among the candidates notified by the notification unit.

17. The terminal device according to claim 15 or 16, wherein the notification unit notifies the user of information indicating an incentive that is provided to the user when the information about the voice is transmitted to the server device.

18. A transmission method executed by a terminal device that is used by a user and performs speech recognition on its own device, the transmission method comprising:
a collection step of collecting, in a storage unit within the own device, a voice uttered by the user and a recognition result of the utterance obtained by the speech recognition in association with each other; and
a transmission step of transmitting information about the voice to a server device in accordance with permission from the user when the voice collected in the collection step satisfies a predetermined condition.

19. A transmission program executed by a terminal device that is used by a user and performs speech recognition on its own device, the transmission program causing the terminal device to execute:
a collection procedure of collecting, in a storage unit within the own device, a voice uttered by the user and a recognition result of the utterance obtained by the speech recognition in association with each other; and
a transmission procedure of transmitting information about the voice to a server device in accordance with permission from the user when the voice collected in the collection procedure satisfies a predetermined condition.

20. An information processing system comprising:
a terminal device that is used by a user and performs speech recognition on its own device; and
a server device,
wherein the server device transmits to the terminal device information indicating a predetermined condition that specifies a voice for which the terminal device is requested to transmit information, and
the terminal device transmits information about the voice that satisfies the predetermined condition received from the server device to the server device in accordance with permission from the user.