JP7139937B2 - Speech processing system, job generation device, job generation method and job generation program

Info

Publication number: JP7139937B2
Application number: JP2018241775A
Authority: JP
Inventors: 俊和川口; 智章中島; 一美澤柳
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2018-12-25
Filing date: 2018-12-25
Publication date: 2022-09-21
Anticipated expiration: 2038-12-25
Also published as: JP2020102171A

Description

TECHNICAL FIELD The present invention relates to a speech processing system, a job generation device, a job generation method, and a job generation program. More particularly, it relates to a speech processing system that generates a job based on voice, a job generation device that generates a job based on voice, a job generation method executed by the speech processing system or the job generation device, and a job generation program that causes a computer to generate a job based on voice.

DESCRIPTION OF THE RELATED ART Multifunction peripherals (hereinafter "MFPs") are commonly installed in offices. A user has an MFP execute processes such as printing image data, copying a document, reading a document, storing image data, and transmitting image data. In some cases, a requester asks another person, an acceptor, to carry out work that involves having the MFP execute a process. The acceptor then operates the MFP to do the work, and must set operating conditions on the MFP so that it executes the requested work. If the acceptor misunderstands the work conveyed by the requester, or sets the operating conditions on the MFP incorrectly, the MFP may not execute the work the requester asked for.

For example, Japanese Unexamined Patent Application Publication No. 2003-114779 describes a setting processing apparatus for an image forming apparatus, comprising: means for a user to input a free word for an arbitrary setting item; means for setting the setting item according to the input free word; and definition file generating and saving means for generating and saving a definition file according to the set setting item. With that setting processing apparatus, a single operator can set a job. However, when one of a plurality of users requests a job from another user, the other user must re-enter free words into the setting processing apparatus based on the content of the request, which makes the job-setting operation cumbersome.

Japanese Unexamined Patent Application Publication No. 2003-114779 (JP-A-2003-114779)

One object of the present invention is to provide a speech processing system that makes the operation of setting a job easier.

Another object of the present invention is to provide a job generation device that makes the operation of setting a job easier.

Another object of the present invention is to provide a job generation method that makes the operation of setting a job easier.

Another object of the present invention is to provide a job generation program that makes the operation of setting a job easier.

According to one aspect of the present invention, a speech processing system includes a plurality of voice collection devices that collect voice and a job generation device that generates a job to be executed by an image processing apparatus. Any one of the plurality of voice collection devices and the job generation device includes user identification means for identifying the user who uttered a voice. The job generation device includes job generation means for generating, as the job, a requested job that a first user has requested a second user to execute, based on (a) voice collected by a first device among the plurality of voice collection devices, for which the user identification means identifies the first user, and (b) voice collected by a second device among the plurality of voice collection devices that is different from the first device, for which the user identification means identifies the second user, who is different from the first user.

According to this aspect, when the job generation device identifies the first user from the voice collected by the first device, the requested job that the first user has requested the second user to execute is generated, as a job to be executed by the image processing apparatus, based on that voice and on the voice uttered by the second user and collected by the second device. Because the requested job is generated from a conversation between a first user and a second user who are at separate locations, a speech processing system that makes the operation of setting a job easier can be provided.

Preferably, the job generation device further includes device determination means that, in response to detection of user identification information for identifying one of a plurality of pre-registered users in the voice for which the user identification means identifies the first user, determines the user identified by the detected user identification information to be the second user, and determines, from among the one or more other voice collection devices, the voice collection device that collects the voice uttered by the second user to be the second device.

According to this aspect, the user identified by the user identification information detected in the voice for which the first user is identified is determined to be the second user, and the voice collection device that collects the voice uttered by the second user is determined to be the second device. This makes it easy to determine the voice collection device that collects the voice uttered by the second user, who is conversing with the first user.
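The determination described above can be illustrated with a minimal Python sketch. It assumes that the user identification information is a registered name appearing in the recognized text of the first user's utterance, and that each voice collection device reports which user it last identified. The registry contents, the device labels, and the function names `find_second_user` and `find_second_device` are hypothetical and are not taken from this publication.

```python
from typing import Dict, Optional

# Hypothetical registry: user identification information (here, spoken names)
# mapped to the IDs of pre-registered users.
REGISTERED_USERS: Dict[str, str] = {"suzuki": "user_b", "tanaka": "user_c"}


def find_second_user(first_user_text: str, first_user_id: str) -> Optional[str]:
    """Return the registered user named in the first user's utterance, if any."""
    lowered = first_user_text.lower()
    for name, user_id in REGISTERED_USERS.items():
        # The second user must be different from the first user.
        if name in lowered and user_id != first_user_id:
            return user_id
    return None


def find_second_device(second_user_id: str,
                       speaker_by_device: Dict[str, str]) -> Optional[str]:
    """Return a device whose collected voice was identified as the second user."""
    for device_id, speaker_id in speaker_by_device.items():
        if speaker_id == second_user_id:
            return device_id
    return None


second = find_second_user("Suzuki-san, please make ten copies", "user_a")
device = find_second_device(second, {"200": "user_a", "200B": "user_b"})
print(second, device)
```

Substring matching on names is used only to keep the sketch short; a real system would presumably work from the output of its own voice recognition.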

Preferably, when a plurality of voice collection devices collect the voice uttered by the second user, the device determination means determines, from among those voice collection devices, the one that collects the voice at the highest volume to be the second device.

According to this aspect, among the plurality of voice collection devices that collect the voice uttered by the second user, the one that collects the voice at the highest volume is determined to be the pairing device, so the accuracy of voice recognition can be improved.
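As a small illustration of the highest-volume rule, the following Python sketch picks one device from a mapping of device labels to measured volumes. The mapping, the volume scale, and the function name `choose_second_device` are assumptions made for the example; the publication does not specify how volume is measured or compared.

```python
from typing import Dict, Optional


def choose_second_device(volume_by_device: Dict[str, float]) -> Optional[str]:
    """Pick the device that collected the second user's voice at the highest volume.

    `volume_by_device` is a hypothetical mapping from a device label to the
    volume at which that device collected the second user's voice.
    """
    if not volume_by_device:
        return None  # no device collected the second user's voice
    return max(volume_by_device, key=volume_by_device.get)


# Two devices hear the second user; the louder one is selected.
print(choose_second_device({"200B": 0.8, "200C": 0.3}))
```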

Preferably, the job generation means generates the requested job when the voice collected by the second device indicates acceptance.

According to this aspect, no job is generated when the second user does not accept the request from the first user.
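The acceptance condition can be sketched in Python as follows. The example assumes that the second user's reply is already available as recognized text and that acceptance is detected by matching a small phrase list. The phrase list, the job dictionary layout, and the function names are hypothetical; the publication does not state how acceptance is detected.

```python
import re
from typing import Dict, Optional

# Hypothetical phrases treated as acceptance of the request.
ACCEPT_PHRASES = ("ok", "sure", "understood", "will do")


def indicates_acceptance(reply_text: str) -> bool:
    """Return True if the reply contains one of the acceptance phrases as whole words."""
    words = " ".join(re.findall(r"[a-z']+", reply_text.lower()))
    padded = f" {words} "
    return any(f" {phrase} " in padded for phrase in ACCEPT_PHRASES)


def generate_requested_job(settings: Dict[str, str], reply_text: str) -> Optional[dict]:
    """Generate the requested job only when the second user's reply indicates acceptance."""
    if not indicates_acceptance(reply_text):
        return None  # the request was not accepted, so no job is generated
    return {"type": "requested_job", "settings": dict(settings)}


print(generate_requested_job({"process": "copy", "copies": "10"}, "OK, will do."))
print(generate_requested_job({"process": "copy", "copies": "10"}, "Sorry, I am busy."))
```

Whole-word matching is used so that a word such as "look" is not mistaken for "ok".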

Preferably, the job generation device is one of the plurality of voice collection devices.

Preferably, the job generation device is the image processing apparatus.

According to still another aspect of the present invention, a job generation device generates a job to be executed by an image processing apparatus and includes: user identification means for identifying the user who uttered a voice; and job generation means for generating, as the job, a requested job that a first user has requested a second user to execute, based on voice collected by a first device among a plurality of voice collection devices, for which the user identification means identifies the first user, and voice collected by a second device among the plurality of voice collection devices that is different from the first device, for which the user identification means identifies the second user, who is different from the first user.

According to this aspect, when the first user is identified from the voice collected by the first device, the requested job that the first user has requested the second user to execute is generated, as a job to be executed by the image processing apparatus, based on that voice and on the voice uttered by the second user and collected by the second device. Because the requested job is generated from a conversation between a first user and a second user who are at separate locations, a job generation device that makes the operation of setting a job easier can be provided.

Preferably, the job generation device includes device determination means that, in response to detection of user identification information for identifying one of a plurality of pre-registered users in the voice for which the user identification means identifies the first user, determines the user identified by the detected user identification information to be the second user, and determines, from among the plurality of voice collection devices, the voice collection device that collects the voice uttered by the second user to be the second device.

Preferably, when a plurality of voice collection devices collect the voice uttered by the second user, the device determination means determines, from among those voice collection devices, the one that collects the voice at the highest volume to be the second device.

According to this aspect, among the plurality of voice collection devices that collect the voice uttered by the second user, the one that collects the voice at the highest volume is determined to be the second device, so the accuracy of voice recognition can be improved.

Preferably, the job generation means generates the requested job when the voice collected by the second device indicates acceptance.

According to still another aspect of the present invention, a job generation method is a method executed in a speech processing system that includes a plurality of voice collection devices that collect voice and a job generation device that generates a job to be executed by an image processing apparatus. The method causes any one of the plurality of voice collection devices and the job generation device to execute a user identification step of identifying the user who uttered a voice, and causes the job generation device to execute a job generation step of generating, as the job, a requested job that a first user has requested a second user to execute, based on voice collected by a first device among the plurality of voice collection devices, for which the first user is identified in the user identification step, and voice collected by a second device among the plurality of voice collection devices that is different from the first device, for which the second user, who is different from the first user, is identified in the user identification step.

According to this aspect, a job generation method that makes the operation of setting a job easier can be provided.

According to still another aspect of the present invention, a job generation method is executed by a job generation device that generates a job to be executed by an image processing apparatus. The method causes the job generation device to execute: a user identification step of identifying the user who uttered a voice; and a job generation step of generating, as a job to be executed by the image processing apparatus, a requested job that a first user has requested a second user to execute, based on voice collected by a first device among a plurality of voice collection devices, for which the first user is identified in the user identification step, and voice collected by a second device among the plurality of voice collection devices that is different from the first device, for which the second user, who is different from the first user, is identified in the user identification step.

According to still another aspect of the present invention, a job generation program is executed by a computer that controls a job generation device that generates a job to be executed by an image processing apparatus. The program causes the computer to execute: a user identification step of identifying the user who uttered a voice; and a job generation step of generating, as the job, a requested job that a first user has requested a second user to execute, based on voice collected by a first device among a plurality of voice collection devices, for which the first user is identified in the user identification step, and voice collected by a second device among the plurality of voice collection devices that is different from the first device, for which the second user, who is different from the first user, is identified in the user identification step.

According to this aspect, a job generation program that makes the operation of setting a job easier can be provided.

FIG. 1 is a diagram showing an overall overview of a speech processing system in the first embodiment of the present invention.
FIG. 2 is a block diagram showing an example of an overview of the hardware configuration of a smart speaker in the first embodiment.
FIG. 3 is a block diagram showing an overview of the hardware configuration of an MFP in the first embodiment.
FIG. 4 is a block diagram showing an example of the functions of the CPU of the smart speaker in the first embodiment.
FIG. 5 is a diagram showing an example of a keyword table.
FIG. 6 is a block diagram showing an example of the functions of the CPU of the MFP in the first embodiment.
FIG. 7 is a diagram showing an example of a login screen.
FIG. 8 is a flowchart showing an example of the flow of job generation in the first embodiment.
FIG. 9 is a flowchart showing an example of the flow of user identification processing.
FIG. 10 is a flowchart showing an example of the flow of pairing processing.
FIG. 11 is a flowchart showing an example of the flow of job generation sub-processing.
FIG. 12 is a flowchart showing an example of the flow of job execution processing.
FIG. 13 is a flowchart showing an example of the flow of execution instruction processing.
FIG. 14 is a block diagram showing an example of the functions of the CPU of the smart speaker in the second embodiment.
FIG. 15 is a block diagram showing an example of the functions of the CPU of the MFP in the second embodiment.

BEST MODE FOR CARRYING OUT THE INVENTION Embodiments of the present invention are described below with reference to the drawings. In the following description, identical parts are given identical reference numerals. Their names and functions are also the same, so detailed descriptions of them are not repeated.

<First Embodiment>
FIG. 1 is a diagram showing an overall overview of a speech processing system in the first embodiment of the present invention. Referring to FIG. 1, speech processing system 1 includes an MFP (Multi Function Peripheral) 100, smart speakers 200, 200A, 200B, and 200C, and a server 400.

MFP 100 is an example of an image processing apparatus. Smart speakers 200, 200A, 200B, and 200C are examples of job generation devices. Each of smart speakers 200, 200A, 200B, and 200C generates a job that defines a process to be executed by MFP 100, and MFP 100 executes the job. MFP 100 is connected to network 3. Access points (APs) 9 and 9A are connected to network 3. APs 9 and 9A are relay devices with a wireless communication function. Smart speakers 200 and 200A each connect to network 3 by communicating with AP 9, and smart speakers 200B and 200C each connect to network 3 by communicating with AP 9A. Smart speakers 200, 200A, 200B, and 200C can therefore communicate with one another and with MFP 100. Network 3 is, for example, a local area network (LAN), and its connections may be wired or wireless. Network 3 may also be a wide area network (WAN), a public switched telephone network (PSTN), the Internet, or the like. Because smart speakers 200, 200A, 200B, and 200C have the same functions and hardware configuration, smart speaker 200 is described here as an example unless otherwise noted.

Gateway (G/W) device 7 is connected to both network 3 and Internet 5, and relays between network 3 and Internet 5. Server 400 is connected to Internet 5. MFP 100 and smart speakers 200, 200A, 200B, and 200C can therefore each communicate with server 400 via gateway device 7.

Server 400 has an authentication function that authenticates the user who uttered a voice. For example, server 400 stores voiceprint information indicating the voiceprint of each of a plurality of pre-registered users, and uses those voiceprints to identify a user. In the present embodiment, each of smart speakers 200, 200A, 200B, and 200C transmits the voiceprint of the voice it has collected to server 400. Server 400 identifies the user of the voiceprint transmitted from each smart speaker and returns the result to that smart speaker.
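The publication does not say how server 400 compares voiceprints. Purely as an illustration, the following Python sketch assumes that a voiceprint is a fixed-length feature vector and that the server returns the registered user whose vector has the highest cosine similarity above a threshold. The vector values, the threshold, and the names `REGISTERED_VOICEPRINTS` and `identify_user` are all hypothetical.

```python
import math
from typing import Dict, List, Optional

# Hypothetical registered voiceprints: one feature vector per pre-registered user.
REGISTERED_VOICEPRINTS: Dict[str, List[float]] = {
    "user_a": [0.9, 0.1, 0.2],
    "user_b": [0.1, 0.8, 0.3],
}


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_user(voiceprint: List[float], threshold: float = 0.9) -> Optional[str]:
    """Return the best-matching registered user, or None if no match reaches the threshold."""
    best_user, best_score = None, threshold
    for user_id, registered in REGISTERED_VOICEPRINTS.items():
        score = cosine_similarity(voiceprint, registered)
        if score >= best_score:
            best_user, best_score = user_id, score
    return best_user


# A smart speaker would send the voiceprint and receive the identified user in reply.
print(identify_user([0.88, 0.12, 0.21]))
print(identify_user([0.0, 0.0, 1.0]))
```

Returning `None` for an unmatched voiceprint models the case in which the speaker is not one of the pre-registered users.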

FIG. 2 is a block diagram showing an example of an overview of the hardware configuration of the smart speaker in the first embodiment. Referring to FIG. 2, smart speaker 200 includes a central processing unit (CPU) 201 that controls smart speaker 200 as a whole, a ROM (Read Only Memory) 202 that stores programs to be executed by CPU 201, a RAM (Random Access Memory) 203 used as a work area for CPU 201, an EPROM (Erasable Programmable ROM) 204 that stores data in a non-volatile manner, a communication unit 205 that connects CPU 201 to network 3, a display unit 206 that displays information, an operation unit 207 that accepts input of user operations, a microphone 208, a speaker 209, and a serial interface 210.

CPU 201 downloads a program from a computer connected to Internet 5 and stores it in EPROM 204. A program is also stored in EPROM 204 when a computer connected to network 3 writes the program to EPROM 204. CPU 201 loads the program stored in EPROM 204 into RAM 203 and executes it.

Microphone 208 collects voice and converts the collected voice into an electrical signal. Microphone 208 outputs the resulting voice data to CPU 201.

Serial interface 210 is an interface for serial communication with an external device. Here, the serial communication conforms to the USB (Universal Serial Bus) standard. An external device capable of communicating under the USB standard can be connected to serial interface 210, and CPU 201 can access the external device via serial interface 210. External devices include storage devices such as USB memory 211 and a CD drive. Here, the case where the external device is USB memory 211 is described as an example. USB memory 211 includes a semiconductor memory such as an EPROM and a serial communication circuit. CPU 201 loads a program recorded in USB memory 211 attached to serial interface 210 into RAM 203 and executes it.

The medium that stores the program executed by CPU 201 is not limited to USB memory 211, and may be an optical disc (a CD-ROM (Compact Disc ROM), MO (Magneto-Optical disc), MD (Mini Disc), or DVD (Digital Versatile Disc)), an optical card, or a mask ROM. The term "program" here covers not only programs that CPU 201 can execute directly, but also source programs, compressed programs, encrypted programs, and the like.

FIG. 3 is a block diagram showing an overview of the hardware configuration of the MFP in the first embodiment. Referring to FIG. 3, MFP 100 includes a main circuit 110, a document reading unit 130 that reads a document, an automatic document feeder 120 that conveys a document to document reading unit 130, an image forming unit 140 that forms an image on paper or the like based on the image data that document reading unit 130 outputs after reading a document, a paper feed unit 150 that supplies paper to image forming unit 140, a post-processing unit 155 that processes paper on which an image has been formed, and an operation panel 160 serving as a user interface.

Post-processing unit 155 executes sort processing, which rearranges and discharges one or more sheets on which image forming unit 140 has formed images, punch processing, which punches holes, and staple processing, which drives in staples.

Main circuit 110 includes a CPU 111, a communication interface (I/F) unit 112, a ROM 113, a RAM 114, a hard disk drive (HDD) 115 serving as a mass storage device, a facsimile unit 116, and an external storage device 117 into which a CD-ROM 118 is loaded. CPU 111 is connected to automatic document feeder 120, document reading unit 130, image forming unit 140, paper feed unit 150, post-processing unit 155, and operation panel 160, and controls MFP 100 as a whole.

ROM 113 stores a program executed by CPU 111 or data needed to execute that program. RAM 114 is used as a work area when CPU 111 executes a program. RAM 114 also temporarily stores image data sent continuously from document reading unit 130.

Communication I/F unit 112 is an interface for connecting MFP 100 to network 3. CPU 111 communicates with smart speaker 200 via communication I/F unit 112 to transmit and receive data. Communication I/F unit 112 can also communicate, via network 3, with a computer connected to Internet 5.

Facsimile unit 116 is connected to the public switched telephone network (PSTN), and transmits facsimile data to the PSTN or receives facsimile data from the PSTN. Facsimile unit 116 stores received facsimile data in HDD 115 or outputs it to image forming unit 140. Image forming unit 140 prints the facsimile data received by facsimile unit 116 on paper. Facsimile unit 116 also converts data stored in HDD 115 into facsimile data and transmits it to a facsimile machine connected to the PSTN.

CD-ROM 118 is loaded into external storage device 117. CPU 111 can access CD-ROM 118 via external storage device 117. CPU 111 loads a program recorded on CD-ROM 118 loaded into external storage device 117 into RAM 114 and executes it. The medium that stores the program executed by CPU 111 is not limited to CD-ROM 118, and may be an optical disc, an IC card, an optical card, or a semiconductor memory such as a mask ROM or an EPROM.

The program executed by CPU 111 is not limited to a program recorded on CD-ROM 118; a program stored in HDD 115 may be loaded into RAM 114 and executed. In this case, another computer connected to network 3 may rewrite the program stored in HDD 115 of MFP 100, or may add and write a new program. MFP 100 may also download a program from another computer connected to network 3 and store that program in HDD 115. The term "program" here covers not only programs that CPU 111 can execute directly, but also source programs, compressed programs, encrypted programs, and the like.

Operation panel 160 is provided on the top surface of MFP 100 and includes a display unit 161 and an operation unit 163. Display unit 161 is, for example, a liquid crystal display (LCD) or an organic EL (electroluminescence) display, and displays instruction menus for the user, information about acquired image data, and the like. Operation unit 163 includes a touch panel 165 and a hard key unit 167. Touch panel 165 is superimposed on display unit 161, on its upper or lower surface. Hard key unit 167 includes a plurality of hard keys, which are, for example, contact switches. Touch panel 165 detects the position designated by the user on the display surface of display unit 161.

In the voice processing system 1 according to the present embodiment, the smart speakers 200, 200A, 200B, and 200C are arranged at mutually different positions. The smart speakers 200, 200A, 200B, and 200C therefore each collect voices uttered by different users. Here, the description takes as an example a case where user A and user B talk over the telephone, the voice uttered by user A is collected by the smart speaker 200, and the voice uttered by user B is collected by the smart speaker 200B.

FIG. 4 is a block diagram showing an example of the functions of the CPU 201 included in the smart speaker 200 in the first embodiment. The functions shown in FIG. 4 may be implemented in hardware, or may be implemented by the CPU 201 by causing the CPU 201 of the smart speaker 200 to execute a program stored in the ROM 202, the EPROM 204, or a CD-ROM. Here, the case where the CPU 201 of the smart speaker 200 executes the job generation program is described as an example.

Referring to FIG. 4, the CPU 201 included in the smart speaker 200 includes a voice reception unit 251, a voice recognition unit 253, a user identification unit 255, a job generation unit 257, a job transmission unit 259, a response unit 261, a caller determination unit 263, a device determination unit 265, and a voice information acquisition unit 267. The voice reception unit 251 receives the voice data output by the microphone 208. Voice data is input from the microphone 208 only while the user is speaking. The voice reception unit 251 outputs the voice data input from the microphone 208, together with time information indicating the time at which that voice data was input, to the voice recognition unit 253 and the user identification unit 255.

Each time voice data and time information are input from the voice reception unit 251, the voice recognition unit 253 recognizes the voice specified by the voice data. Specifically, the voice recognition unit 253 converts the voice data into voice information composed of characters. Voice recognition techniques are well known and are not described here. The voice recognition unit 253 outputs the pair of the voice information converted from the voice data and the time information to the job generation unit 257, the response unit 261, and the caller determination unit 263.

Each time voice data and time information are input from the voice reception unit 251, the user identification unit 255 identifies the user who uttered the voice specified by the voice data. Specifically, the user identification unit 255 transmits a voiceprint extracted from the voice data to the server 400 and requests the server 400 to identify the user. Techniques for identifying the user who uttered a voice are well known and are not described here. When the server 400 identifies the user of the voice data, it returns user identification information for identifying that user, and the user identification unit 255 outputs the pair of the user identification information of the user identified by the server 400 and the time information to the job generation unit 257 and the response unit 261. Here, the user identification unit 255 identifies user A. If the smart speaker 200 has an authentication function, the user identification unit 255 may itself identify the user who uttered the voice of the voice data.

The user identified by the user identification unit 255 may be the same as the user identified by one of the other smart speakers 200A, 200B, and 200C. For example, when the distance between the smart speaker 200 and the smart speaker 200A is equal to or less than a predetermined distance, the two devices may simultaneously collect the voice uttered by a single user. In this case, the smart speaker 200 and the smart speaker 200A each receive an arbitration instruction from the server 400. The arbitration instruction that the server 400 transmits to the smart speaker 200 includes the device identification information of the smart speaker 200A, and the arbitration instruction that the server 400 transmits to the smart speaker 200A includes the device identification information of the smart speaker 200.

The smart speaker 200 and the smart speaker 200A arrange for the user to be identified by whichever device collected the voice at the greatest volume. Specifically, the user identification unit 255 acquires from the smart speaker 200A the volume of the voice collected by the smart speaker 200A and compares it with the volume of the voice collected by its own device. The user identification unit 255 identifies the user if the volume at its own device is greater than the volume acquired from the smart speaker 200A, and does not identify the user if it is smaller. If the two volumes are equal, the user identification unit 255 coordinates with the smart speaker 200A so that only one of the two devices identifies the user. For example, when its own device is to identify the user, the user identification unit 255 transmits to the smart speaker 200A a prohibition signal that prohibits it from identifying the user; if its own device receives a prohibition signal from the smart speaker 200A before it identifies the user, it does not identify the user.
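The volume comparison described above can be sketched as a small decision function. This is a minimal illustration, not the patented implementation: the names `Decision` and `arbitrate` and the numeric volume scale are assumptions introduced here, and the exchange of volumes and prohibition signals over the network is left out.

```python
from enum import Enum


class Decision(Enum):
    IDENTIFY = "identify"  # this device identifies the user
    SKIP = "skip"          # this device leaves identification to the peer


def arbitrate(own_volume: float, peer_volume: float,
              prohibition_received: bool) -> Decision:
    """Decide whether this device identifies the user (hypothetical helper).

    own_volume / peer_volume: volume of the same utterance as collected by
    this device and by the peer device.
    prohibition_received: True if the peer already sent a prohibition signal.
    """
    if own_volume > peer_volume:
        return Decision.IDENTIFY
    if own_volume < peer_volume:
        return Decision.SKIP
    # Equal volumes: whichever device sends the prohibition signal first
    # identifies the user. On IDENTIFY here, the caller is expected to send
    # a prohibition signal to the peer before identifying.
    return Decision.SKIP if prohibition_received else Decision.IDENTIFY
```

In the tie case the function only reports the outcome; sending the prohibition signal itself would be the responsibility of the surrounding communication code.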

Based on the voice information input from the voice recognition unit 253, the caller determination unit 263 determines, as the caller, the other user with whom the user who uttered the voice is conversing. Normally, when a user talks on the telephone or the like, the user utters the name, form of address, or the like of the other party, for example to confirm who the other party is. When the name or form of address of a pre-registered user is extracted from the voice information, the caller determination unit 263 determines the user with the extracted name or form of address to be the caller. The caller determination unit 263 outputs the user identification information of the user determined to be the caller to the device determination unit 265. Here, the case where user A utters user B's name is described as an example.
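As a rough sketch of this step, caller determination can be modeled as a lookup of registered names in the recognized text. The registry contents, the identifiers such as `user-b`, and the function name `determine_caller` are all hypothetical; the source does not specify how names are registered or matched.

```python
from typing import Optional

# Hypothetical registry: spoken name or form of address -> user identification
# information. The entries are placeholders for illustration only.
REGISTERED_NAMES = {
    "User B": "user-b",
    "Mr. B": "user-b",
    "User C": "user-c",
}


def determine_caller(voice_information: str) -> Optional[str]:
    """Return the user ID of the first registered name found in the text."""
    for name, user_id in REGISTERED_NAMES.items():
        if name in voice_information:
            return user_id
    return None  # no registered name was uttered
```

Plain substring matching is used only to keep the sketch short; a real system would likely need more robust name matching.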

The device determination unit 265 determines, from among the other smart speakers 200A, 200B, and 200C, the device that collects the caller's voice as the pairing device. The device that collected the voice containing the caller's user identification information is the first device, and the device that collects the caller's voice is the second device. For example, the device determination unit 265 identifies the device that collects the caller's voice by querying the server 400. The query to the server 400 includes the caller's user identification information. The server 400 identifies a user for each of the smart speakers 200, 200A, 200B, and 200C based on the voiceprints received from each of them. The server 400 can therefore identify which of the smart speakers 200, 200A, 200B, and 200C is collecting the voice uttered by the user identified by the user identification information. The device determination unit 265 determines the device identified by the server 400 as the pairing device, and outputs device identification information for identifying the pairing device to the voice information acquisition unit 267. Here, because the smart speaker 200B collects user B's voice, the device determination unit 265 determines the smart speaker 200B as the pairing device. The device determination unit 265 requests the smart speaker 200B, which is the pairing device, to transmit voice information.

The response unit 261 functions when its own device is the pairing device. In other words, the response unit 261 functions when the communication unit 205 is requested by one of the other smart speakers 200A, 200B, and 200C to transmit voice information. The response unit 261 receives pairs of voice information and time information from the voice recognition unit 253, and pairs of user identification information and time information from the user identification unit 255. Once the communication unit 205 has been requested by one of the other smart speakers 200A, 200B, and 200C to transmit voice information, the response unit 261 thereafter transmits the time information, the voice information paired with that time information, and the user identification information paired with that time information to the device that made the request.

Alternatively, the device determination unit 265 may control the communication unit 205 to transmit an inquiry command to each of the smart speakers 200A, 200B, and 200C and, when a response is received from one of them, determine the responding device as the pairing device. The inquiry command includes the caller's user identification information.

In this variation, the response unit 261 functions when the communication unit 205 receives an inquiry command from one of the other smart speakers 200A, 200B, and 200C. The response unit 261 receives pairs of voice information and time information from the voice recognition unit 253, and pairs of user identification information and time information from the user identification unit 255. When the communication unit 205 receives an inquiry command from one of the other smart speakers 200A, 200B, and 200C, the response unit 261 responds to the device that sent the inquiry command if the user identification information included in the inquiry command matches the user identification information input from the user identification unit 255. After responding to the inquiry command, the response unit 261 transmits the time information, the voice information paired with that time information, and the user identification information paired with that time information to the device that sent the inquiry command.
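The inquiry-command variation can be sketched as follows. The message layout (`InquiryCommand`, the response dictionary) and the function name `respond_to_inquiry` are assumptions made for illustration; the source only states that the command carries the caller's user identification information and that a device responds when it matches the locally identified user.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InquiryCommand:
    sender_device_id: str  # device that broadcast the inquiry
    caller_user_id: str    # user identification information of the caller


def respond_to_inquiry(command: InquiryCommand,
                       own_device_id: str,
                       locally_identified_user_id: Optional[str]) -> Optional[dict]:
    """Return a response message if this device is collecting the caller's voice.

    A device answers only when the user it has identified locally matches the
    user named in the inquiry command; otherwise it stays silent (None).
    """
    if locally_identified_user_id == command.caller_user_id:
        return {"to": command.sender_device_id, "from": own_device_id}
    return None
```

The inquiring device would then treat the device named in `"from"` as the pairing device.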

The job generation unit 257 receives pairs of voice information and time information from the voice recognition unit 253, and pairs of user identification information and time information from the user identification unit 255. The job generation unit 257 may receive a plurality of pairs of voice information and time information and a plurality of pairs of user identification information and time information. The job generation unit 257 uses the time information to distinguish the pieces of voice information from one another.

The job generation unit 257 associates user identification information with voice information when the time information paired with each is the same. In other words, the job generation unit 257 associates a voice with the user who uttered it. Specifically, the job generation unit 257 generates a voice record that includes the voice information converted from the voice, the user identification information of the user who uttered the voice, and time information indicating the time at which the voice was uttered, and adds the voice record to the voice table stored in the EPROM 204.
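The association by time information amounts to a join on the timestamp. A minimal sketch, assuming the time information is an exact-match key and using a hypothetical `VoiceRecord` type and `build_voice_records` helper:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VoiceRecord:
    time: str               # time information shared by both inputs
    user_id: str            # user identification information
    voice_information: str  # recognized text


def build_voice_records(voice_pairs: List[Tuple[str, str]],
                        user_pairs: List[Tuple[str, str]]) -> List[VoiceRecord]:
    """Join (voice_information, time) pairs with (user_id, time) pairs.

    Only pairs whose time information is identical are associated; voice
    information without a matching user pair is skipped in this sketch.
    """
    user_by_time = {time: user_id for user_id, time in user_pairs}
    return [
        VoiceRecord(time=time, user_id=user_by_time[time], voice_information=text)
        for text, time in voice_pairs
        if time in user_by_time
    ]
```

The returned list stands in for the rows added to the voice table.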

After the device identification information is input from the device determination unit 265, the voice information acquisition unit 267 generates a voice record that includes the time information, the voice information, and the user identification information that the communication unit 205 receives from the pairing device specified by the device identification information, and adds the voice record to the voice table stored in the EPROM 204.

Here, because the pairing device is the smart speaker 200B, the voice table contains voice records corresponding to the voice collected by the smart speaker 200, which is the own device, and voice records corresponding to the voice collected by the smart speaker 200B, which is the pairing device. A voice record corresponding to the voice collected by the smart speaker 200 includes user A's user identification information, and a voice record corresponding to the voice collected by the smart speaker 200B includes user B's user identification information.

Based on the voice information registered in the voice table, the job generation unit 257 generates a job that defines a process to be executed by the MFP 100 and the conditions for executing that process. The job generation unit 257 includes a process determination unit 271, a normal job generation unit 273, a requested job generation unit 275, and a keyword extraction unit 277.

The process determination unit 271 determines setting information based on the voice information. The setting information indicates a process to be executed by the MFP 100 and the conditions for executing that process. When the voice information includes a file name identifying data, the process determination unit 271 determines setting information that designates the data specified by that file name as the data to be processed.

The keyword extraction unit 277 extracts keywords from the voice information using a keyword table stored in advance in the EPROM 204. The keyword table associates each piece of setting information with a keyword related to it. A keyword is one kind of related information associated with setting information.

FIG. 5 is a diagram showing an example of the keyword table. The keyword table includes a plurality of keyword records, each associating a keyword with setting information. The setting information defines a process to be executed by the MFP 100 or a condition under which the MFP 100 executes a process. A keyword record includes a keyword field and a setting information field. The keyword field holds the keyword, and the setting information field holds the name of a process and/or a condition for executing the process. As an example of associating a keyword with a process to be executed by the MFP 100, the keyword record for the keyword "scan" associates that keyword with a scan process that causes the MFP 100 to read a document. The keyword table may associate a single keyword with both a process and a condition for executing it. For example, the keyword record for the keyword "2in1" associates that keyword with a combining process that combines a plurality of images and, as the condition for executing the combining process, with the number of source images being two. The keyword table may also associate a single keyword with a plurality of processes. For example, the keyword record for the keyword "copy" associates that keyword with a copy process. The copy process includes a scan process that causes the MFP 100 to read a document and a print process that forms an image on paper.

As an example of associating a keyword with a condition under which the MFP 100 executes a process, the keyword table associates the keyword "color" with full color as a condition for the print process executed by the MFP 100. As another example, it associates the keyword "Taro" with the e-mail address "taro@aaa.cpm", which is destination information assigned to the user, as a condition for the transmission process in which the MFP 100 transmits data. Besides e-mail addresses, facsimile numbers and IP addresses may be used as destination information. The keyword "Taro" is the name of a pre-registered user.
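The keyword records described for FIG. 5 map naturally onto a dictionary keyed by keyword. The sketch below is one possible in-memory layout, not the format used by the patent: the field names `processes` and `conditions` and the condition keys are assumptions, while the keywords and their associations follow the examples in the text.

```python
# Hypothetical in-memory form of the keyword table. Each record holds the
# processes the keyword implies and the conditions it sets.
KEYWORD_TABLE = {
    # keyword -> process only
    "scan": {"processes": ["scan"], "conditions": {}},
    # keyword -> process plus a condition for that process
    "2in1": {"processes": ["combine"], "conditions": {"source_images": 2}},
    # keyword -> several processes (copy = scan + print)
    "copy": {"processes": ["scan", "print"], "conditions": {}},
    # keyword -> condition only
    "color": {"processes": [], "conditions": {"color_mode": "full_color"}},
    # keyword -> destination information registered for a user
    "Taro": {"processes": [], "conditions": {"destination": "taro@aaa.cpm"}},
}


def lookup(keyword: str) -> dict:
    """Return the setting information for a keyword, or an empty record."""
    return KEYWORD_TABLE.get(keyword, {"processes": [], "conditions": {}})
```

Keeping processes and conditions in separate fields lets a single record carry a process, a condition, or both, which matches the three cases the text distinguishes.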

Returning to FIG. 4, the keyword extraction unit 277 compares the voice information with the plurality of keywords set in the keyword table and, if there is a keyword containing a portion identical or similar to at least part of the voice information, extracts the keyword record in which that keyword is set. The keyword extraction unit 277 determines the setting information set in the extracted keyword record and outputs the determined setting information to the process determination unit 271.

Based on the setting information determined by the keyword extraction unit 277, the process determination unit 271 generates a job that defines the process to be executed by the MFP 100. Specifically, the process determination unit 271 generates a job that causes the MFP 100 to execute the process defined by the setting information under the conditions defined by the setting information.

For example, when the voice information includes the keywords "copy" and "full color", the process determination unit 271 determines, from the keyword table, the setting information defining the scan process and the print process associated with the keyword "copy", and the setting information defining the condition, associated with the keyword "full color", that the scan process and the print process be executed in full color. The process determination unit 271 then determines a job that causes the MFP 100 to execute a scan process that reads a document in full color and a print process that forms, in full color on paper, the image of the full-color image data output by the scan process.

Likewise, when the voice information includes the keywords "send" and "Taro", the keyword table identifies the scan process and the data transmission process associated with the keyword "send", and the setting information indicating the destination associated with the keyword "Taro". In this case, the process determination unit 271 generates a job that causes the MFP 100 to execute a scan process that reads a document in monochrome and a data transmission process that attaches the monochrome image data output by the scan process to an e-mail addressed to the e-mail address registered for the user named Taro and transmits it.
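The two worked examples above ("copy" with "full color", and "send" with "Taro") can be reproduced with a small keyword-matching routine. This is a sketch under stated assumptions: the table layout, the function name `generate_job`, case-insensitive substring matching, and monochrome as the default color mode are all choices made here for illustration (the text only shows that the "send" example yields a monochrome scan).

```python
from typing import Optional

# Hypothetical keyword table: keyword -> (processes, conditions).
TABLE = {
    "copy": (["scan", "print"], {}),
    "full color": ([], {"color_mode": "full_color"}),
    "send": (["scan", "transmit"], {}),
    "Taro": ([], {"destination": "taro@aaa.cpm"}),
}


def generate_job(voice_information: str) -> Optional[dict]:
    """Build a job from the keywords found in the recognized text."""
    processes = []
    conditions = {"color_mode": "monochrome"}  # assumed default
    lowered = voice_information.lower()
    for keyword, (procs, conds) in TABLE.items():
        if keyword.lower() in lowered:
            for proc in procs:
                if proc not in processes:  # keep each process once, in order
                    processes.append(proc)
            conditions.update(conds)
    if not processes:
        return None  # no keyword named a process, so there is nothing to run
    return {"processes": processes, "conditions": conditions}
```

For instance, `generate_job("Please send this to Taro")` yields a scan-and-transmit job with the destination `taro@aaa.cpm` and the assumed monochrome default.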

Although an example has been described in which the process determination unit 271 and the keyword extraction unit 277 determine setting information from voice information using the keyword table, a deep learning technique using a neural network or the like may be employed instead: a model that has learned the relationship between voice information and setting information is generated in advance, and that model is used to determine the setting information from the voice information.

When generating a job, the process determination unit 271 determines the one or more pieces of user identification information paired with the one or more pieces of voice information used to generate that job. For example, when the job is generated based on one or more pieces of voice information paired with the first user's user identification information, it determines the first user's user identification information. When the job is generated based on one or more pieces of voice information paired with the first user's user identification information and one or more pieces of voice information paired with the second user's user identification information, it determines both the first user's and the second user's user identification information. The process determination unit 271 outputs the pair of the job and the determined one or more pieces of user identification information to the requested job generation unit 275 and the normal job generation unit 273. Here, user A, who uttered the voice collected by the smart speaker 200 (the own device), is the first user, and user B, who uttered the voice collected by the smart speaker 200B (the pairing device), is the second user.

The requested job generation unit 275 receives the pair of the job and the one or more pieces of user identification information from the process determination unit 271 and generates a requested job. A requested job is a job that the second user causes the MFP 100 to execute in order to carry out work that the first user, the requester, has requested of the second user, the acceptor. In other words, a requested job is a job whose execution the second user instructs the MFP 100 to perform. A requested job is therefore the job input from the process determination unit 271 with an instruction by the second user set as the start condition for the MFP 100 to execute the process defined by the job. The instruction set as the start condition includes the operation that the second user inputs in order to be authenticated.

The requested job generation unit 275 generates a requested job when the one or more pieces of voice information are associated with a plurality of pieces of user identification information and the one or more pieces of voice information input from the voice recognition unit 253 include voice information containing a request character string. A request character string contains words used when asking another person to do work. Examples include "お願いします" ("I'd like to ask you to"), "して下さい" ("please do"), and "しろ" ("do it"). The request character strings may be defined in advance. Alternatively, the CPU 201 may be made to learn conversations between a plurality of users who use the MFP 100 and determine the request character strings itself.

However, the requested job generation unit 275 does not generate a requested job if none of the one or more pieces of voice information input from the voice recognition unit 253 contains an acceptance character string. An acceptance character string contains words used when accepting a request from another person. Examples include "了解しました" ("understood"), "わかりました" ("I see"), and "引き受けます" ("I'll take it on"). The acceptance character strings may be defined in advance, or may be determined by learning conversations between a plurality of users who use the MFP 100.

When voice information containing a request character string exists, the requested job generation unit 275 identifies the user identification information paired with that voice information as the first user's user identification information. Here, because user A asks user B by telephone to do the work, user A is determined to be the first user. Likewise, when voice information containing an acceptance character string exists, the requested job generation unit 275 identifies the user identification information paired with that voice information as the second user's user identification information. Here, user B is determined to be the second user. The requested job generation unit 275 generates the requested job by setting, in the job input from the process determination unit 271, acceptance of an instruction from the second user as the start condition for the MFP 100 to execute the process defined by the job. Acceptance of the instruction from the second user includes authentication of the second user. The requested job generation unit 275 outputs the pair of the first user's user identification information and the requested job to the job transmission unit 259.

When voice information containing an acceptance character string exists, the requested job generation unit 275 determines the voice information paired with time information indicating a time earlier than the time indicated by the time information paired with that voice information. The requested job generation unit 275 then extracts, from the determined voice information, voice information containing a request character string. Because a requested job needs to be generated only when voice information containing an acceptance character string exists, keywords need not be extracted from all of the voice information, which keeps the processing load small. In this case, keyword extraction and job generation preferably start after it has been confirmed that voice information containing an acceptance character string exists.
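The request-and-acceptance check can be sketched as a scan over time-ordered utterances: find an utterance containing an acceptance (permission) string, then look only at earlier utterances for a request string. The English marker strings, the function name `find_request`, and the requirement that requester and acceptor be different users are assumptions made for this sketch; the source leaves the string lists open and only requires that a plurality of users be involved.

```python
from typing import List, Optional, Tuple

# Hypothetical English stand-ins for the request and acceptance strings.
REQUEST_STRINGS = ("please", "could you")
ACCEPTANCE_STRINGS = ("understood", "got it", "i'll take it on")


def _contains_any(text: str, markers: Tuple[str, ...]) -> bool:
    lowered = text.lower()
    return any(marker in lowered for marker in markers)


def find_request(records: List[Tuple[int, str, str]]) -> Optional[dict]:
    """records: (time, user_id, voice_information) tuples.

    Returns the first (requester) and second (acceptor) users if a requested
    job should be generated, otherwise None.
    """
    ordered = sorted(records)  # order utterances by time
    if len({user_id for _, user_id, _ in ordered}) < 2:
        return None  # a request needs at least two different users
    for index, (_, acceptor, text) in enumerate(ordered):
        if not _contains_any(text, ACCEPTANCE_STRINGS):
            continue
        # Only utterances earlier than the acceptance are searched.
        for _, requester, earlier_text in ordered[:index]:
            if requester != acceptor and _contains_any(earlier_text, REQUEST_STRINGS):
                return {"first_user": requester, "second_user": acceptor}
    return None
```

Starting from the acceptance and searching backwards mirrors the load-reduction point in the text: nothing else is examined until an acceptance has been seen.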

When a job is input from the process determination unit 271 and the requested job generation unit 275 does not generate a requested job, it outputs a normal generation instruction to the normal job generation unit 273. The normal job generation unit 273 receives the pair of the job and the one or more pieces of user identification information from the process determination unit 271. When the normal generation instruction is input from the requested job generation unit 275, the normal job generation unit 273 generates a normal job based on the job and the one or more pieces of user identification information input from the process determination unit 271. A normal job is any job other than a requested job.

Normal job generation unit 273 determines, as the instructing user, the user whose user identification information, among the one or more pieces input from process determination unit 271, is associated with voice information containing an instruction character string. An instruction character string is a word that instructs the content of processing, for example, "want to do" (「したい。」) or "will do" (「する。」). Alternatively, normal job generation unit 273 may determine, as the instructing user, the user identified by the user identification information that has the largest number of keywords among the one or more pieces input from process determination unit 271. In this case, the plural pieces of voice information containing the keywords that process determination unit 271 used to generate the job are used to tally the corresponding keywords for each piece of user identification information, which yields the number of keywords for each piece of user identification information. Normal job generation unit 273 generates a normal job by setting, in the job input from process determination unit 271, an instruction by the instructing user as the start condition for MFP 100 to execute the processing defined by the job. Accepting an instruction from the instructing user includes authenticating the instructing user. Normal job generation unit 273 outputs the normal job to job transmission unit 259, which transmits the normal job to MFP 100. When MFP 100 receives a normal job from smart speaker 200, it stores the normal job in HDD 115, and when the instructing user operates operation unit 163, it sets the normal job to an executable state.
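The alternative selection rule, in which the user with the largest keyword count becomes the instructing user, can be sketched as below. The function name and the `(user_id, keyword)` pair layout are assumptions made for illustration, and the tie-breaking behavior (the first user seen wins) is a choice of this sketch that the description does not specify.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def pick_instructing_user(keyword_hits: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the user ID that contributed the most job keywords.

    keyword_hits holds hypothetical (user_id, keyword) pairs, one pair per
    keyword that was used when the job was generated.
    """
    counts = Counter(user_id for user_id, _keyword in keyword_hits)
    if not counts:
        return None
    # most_common(1) returns the entry with the highest count; on a tie the
    # user encountered first is returned.
    return counts.most_common(1)[0][0]
```

For example, if user A spoke the keywords "copy" and "two copies" and user B spoke only "duplex", user A would be chosen as the instructing user.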

Job transmission unit 259 transmits jobs to MFP 100. When a set of a requested job and the first user's user identification information is input from requested job generation unit 275, job transmission unit 259 transmits that set to MFP 100. When a normal job is input from normal job generation unit 273, job transmission unit 259 transmits the normal job to MFP 100.

Job transmission unit 259 includes operating user notification unit 281. When a set of a requested job and the first user's user identification information is transmitted to MFP 100, operating user notification unit 281 notifies the second user that the requested job can be executed on MFP 100. The second user is identified by the second user's user identification information set as the start condition in the requested job. For example, operating user notification unit 281 controls communication unit 205 to send the second user an e-mail containing a message indicating that an executable requested job exists in MFP 100. The second user's e-mail address is stored in advance. The notification method is not limited to e-mail and may be message transmission. The second user is thereby informed that he or she only needs to input an operation that causes MFP 100 to execute the requested job.

FIG. 6 is a block diagram showing an example of the functions of CPU 111 of MFP 100 in the first embodiment. The functions shown in FIG. 6 may be implemented in hardware, or may be implemented by CPU 111 of MFP 100 executing a job control program stored in ROM 113, HDD 115, or CD-ROM 118. The job control program is part of the job generation program. Referring to FIG. 6, CPU 111 includes operating user identification unit 51, setting unit 53, job control unit 55, and requester notification unit 57.

Job control unit 55 includes job reception unit 81, association unit 83, and job execution unit 85. Job reception unit 81 controls communication I/F unit 112 to receive jobs transmitted by smart speaker 200. When job reception unit 81 receives a set of a requested job and the first user's user identification information from smart speaker 200, it stores the requested job in HDD 115. When job reception unit 81 receives a normal job from smart speaker 200, it stores the normal job in HDD 115.

When communication I/F unit 112 receives a set of a requested job and the first user's user identification information from smart speaker 200, association unit 83 associates the requested job with the first user. Specifically, association unit 83 generates a user record containing job identification information for identifying the requested job stored in HDD 115 and the first user's user identification information, and stores the user record in HDD 115.

Job execution unit 85 controls hardware resources to execute jobs. The hardware resources include communication I/F unit 112, HDD 115, facsimile unit 116, automatic document feeder 120, document reading unit 130, image forming unit 140, paper feed unit 150, post-processing unit 155, and operation panel 160. Jobs include, for example, copy jobs, print jobs, scan jobs, facsimile transmission jobs, and data transmission jobs. The jobs that job execution unit 85 can execute are not limited to these and may include other jobs. A copy job includes scan processing, which causes document reading unit 130 to read a document, and print processing, which causes image forming unit 140 to form an image of the data that document reading unit 130 outputs by reading the document. A print job includes print processing that causes image forming unit 140 to form, on paper, an image of data stored in HDD 115 or of print data that communication I/F unit 112 receives from outside. A scan job includes scan processing, which causes document reading unit 130 to read a document, and output processing, which outputs the image data that document reading unit 130 outputs by reading the document. The output processing includes data storage processing, which stores the data in HDD 115, and data transmission processing, which causes communication I/F unit 112 to transmit the data externally. A facsimile transmission job includes scan processing, which causes document reading unit 130 to read a document, and facsimile transmission processing, which causes facsimile unit 116 to transmit the data that document reading unit 130 outputs by reading the document. A data transmission job includes data transmission processing that controls communication I/F unit 112 to transmit, to another computer, data stored in HDD 115 or data that document reading unit 130 outputs by reading a document.

Operating user identification unit 51 identifies the operating user who operates MFP 100. For example, when a user operating operation panel 160 inputs user identification information to operation unit 163, the user identified by that user identification information is identified as the operating user. When MFP 100 includes a card reader, operating user identification unit 51 accepts an operation in which the user has the card reader read the user identification information stored on a card, and identifies the user identified by the user identification information read by the card reader as the operating user. The card reader may be a magnetic card reader or a wireless communication device that communicates according to the NFC (Near Field Communication) standard. When operating user identification unit 51 identifies the operating user, it outputs the operating user's user identification information to setting unit 53.

When the operating user's user identification information is input from operating user identification unit 51, setting unit 53 determines whether HDD 115 stores a requested job or a normal job whose start condition is an instruction by the user of that user identification information. If such a requested job or normal job is stored in HDD 115, setting unit 53 sets it to an executable state. In particular, when setting a requested job to an executable state, setting unit 53 displays on display unit 161 a setting button for accepting an instruction from the second user. A command for executing the requested job is assigned to the setting button.

FIG. 7 is a diagram showing an example of a login screen. Login screen 500 shown in FIG. 7 is displayed on display unit 161 when the second user operates operation unit 163 and MFP 100 identifies the user. Referring to FIG. 7, login screen 500 includes a plurality of selection buttons 503 for selecting various kinds of processing and setting button 501. Setting button 501 is displayed when the second user operates operation unit 163 and MFP 100 identifies the second user. Accordingly, when no requested job is stored in HDD 115, login screen 500 does not include setting button 501. Setting button 501 contains the character string "work request" and the character string "(Mr./Ms. XX)". The character string "(Mr./Ms. XX)" contains the first user's name, which is the first user's user identification information. The second user who sees setting button 501 can therefore tell that it is the button for making executable the requested job that carries out the work requested by the first user.

Returning to FIG. 6, instead of displaying setting button 501 on display unit 161, setting unit 53 may display on display unit 161, in a selectable manner, job identification information for identifying the one or more jobs stored in HDD 115. In this case, when a requested job or a normal job is stored in HDD 115, the job identification information of the requested job and the normal job is displayed on display unit 161 in a selectable manner. When the user selects the job identification information of the requested job, setting unit 53 sets the requested job or the normal job to an executable state, in the same way as when setting button 501 is designated.

When setting unit 53 accepts an operation to change a setting value of a job that has been set to an executable state, it changes that setting value. The second user can thereby correct a setting value that was set incorrectly in the requested job, and can add a setting value that has not been set.

A requested job whose start condition is an instruction by the second user may remain stored in HDD 115, unexecuted, until a predetermined time has elapsed. In this case, setting unit 53 does not set the requested job to an executable state even when the user identification information of the second user is input from operating user identification unit 51. Specifically, setting unit 53 does not display on display unit 161 setting button 501 for accepting the user's instruction to execute the requested job. A requested job that has not been executed after the predetermined time has most likely become unnecessary, and this handling prevents an unnecessary requested job from remaining stored in HDD 115. Deleting unnecessary data allows storage resources to be used effectively.
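The behavior of setting unit 53 described above (matching stored jobs to the operating user and ignoring requested jobs that have been stored too long) can be sketched as follows. The `StoredJob` type, the field names, and the 24-hour default are assumptions made for illustration; the description only speaks of "a predetermined time".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class StoredJob:
    """Hypothetical record of a job held in the MFP's storage."""
    job_id: str
    kind: str          # "requested" or "normal"
    start_user: str    # user whose instruction is the start condition
    stored_at: datetime


# Assumed retention period; the source only says "a predetermined time".
RETENTION = timedelta(hours=24)


def executable_jobs(jobs: List[StoredJob], operating_user: str,
                    now: datetime,
                    retention: timedelta = RETENTION) -> List[StoredJob]:
    """Return the jobs that may be made executable for the operating user."""
    result = []
    for job in jobs:
        if job.start_user != operating_user:
            continue  # start condition names a different user
        if job.kind == "requested" and now - job.stored_at >= retention:
            continue  # stale requested job: no setting button is shown
        result.append(job)
    return result
```

In this sketch the expiry check applies only to requested jobs, since the passage above describes the time limit for requested jobs only.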

Job execution unit 85, which is included in job control unit 55, executes the requested job when it accepts an execution operation that the second user inputs to operation unit 163 to instruct execution of the requested job. Job execution unit 85 also executes a normal job when it accepts an execution operation for the normal job that the instructing user inputs to operation unit 163. The instructing user is identified by the user identification information that association unit 83 has associated with the job identification information of the normal job.

When the requested job is executed, requester notification unit 57 notifies the first user that the requested job has been executed. The first user is identified by the user identification information that association unit 83 has associated with the job identification information of the requested job. Requester notification unit 57 notifies the first user by the method registered in advance for the first user. For example, requester notification unit 57 sends, by e-mail to the destination registered in advance for the first user, a message indicating that the requested job has been executed by the second user.

FIG. 8 is a flowchart showing an example of the flow of job generation in the first embodiment. Job generation is processing executed by CPU 201 of smart speaker 200 when CPU 201 executes a job control program stored in ROM 202 or EPROM 204. The job control program is part of the job generation program.

Referring to FIG. 8, CPU 201 of smart speaker 200 determines whether a voice has been accepted (step S01). Specifically, CPU 201 determines that a voice has been accepted when it receives voice data output by microphone 208. If voice data has been received from microphone 208, CPU 201 advances the process to step S02; otherwise, it advances the process to step S07. The following description uses the case where user A's voice is accepted as an example.

In step S02, CPU 201 performs voice recognition and advances the process to step S03. Specifically, CPU 201 recognizes the voice specified by the voice data accepted in step S01 and converts the voice into voice information composed of characters. In step S03, the voice information converted from the voice is determined, and the process proceeds to step S04.

In step S04, user identification processing is executed, and the process proceeds to step S05. The user identification processing, described in detail later, identifies the user who uttered the voice specified by the voice data. Here, user A is identified. In step S05, CPU 201 generates a voice record and advances the process to step S06. Specifically, CPU 201 generates a voice record containing the voice information determined in step S03, the user identification information of the user identified in step S04, and the date and time at which the voice data was accepted. A voice record is information that associates the voice information converted from a voice, the user identification information of the user who uttered the voice, and the date and time at which the voice was uttered. In step S06, the voice record is added to the voice table stored in HDD 115, and the process proceeds to step S09.

In step S09, it is determined whether the voice information determined in step S03 contains user identification information. If it does, the process proceeds to step S10; otherwise, the process proceeds to step S11. Here, because user A utters user B's name during the call, user B's name is extracted from the voice information as user identification information. In step S10, pairing processing is executed, and the process proceeds to step S11. The pairing processing, described in detail later, determines as the pairing device the device that collects the voice uttered by the user identified by the user identification information extracted from the voice information in step S09. Here, smart speaker 200B is determined to be the pairing device.

In step S07, on the other hand, it is determined whether a pairing device exists. If step S10 has been executed and a pairing device has been determined, the process proceeds to step S08; otherwise, the process returns to step S01. In step S08, voice information, user identification information, and time information are acquired from the pairing device, and the process returns to step S01. Specifically, CPU 111 requests smart speaker 200B, the pairing device, to transmit voice information. When requested to transmit voice information, smart speaker 200B returns voice information, user identification information, and time information. On receiving them, CPU 111 adds a voice record containing them to the voice table stored in HDD 115.

When the processing from step S11 onward is executed, the voice records registered in the voice table are the processing targets. Specifically, the processing from step S11 onward targets the voice records newly registered in the voice table by the time step S11 is executed. All voice records registered in the voice table are therefore processed. The voice table holds voice records containing voice information determined, through step S03, from the voice uttered by user A and collected by smart speaker 200, and voice records containing voice information determined, through step S08, from the voice uttered by user B and collected by smart speaker 200A, the pairing device.

In step S11, it is determined whether the voice information contains a permission character string. The voice information acquired in step S07, in other words, the voice information determined from the voice collected by smart speaker 200B, the pairing device, is highly likely to contain a permission character string. A permission character string is a predetermined character string containing words used when accepting a request from another person. Specifically, permission character strings include "understood" (「了解」), "I see" (「解る」), and "take on" (「引き受け」). If the voice information acquired in step S08 contains a permission character string, the process proceeds to step S12; otherwise, the process proceeds to step S17.

In step S12, the user of the user identification information paired with the permission character string is determined to be the second user, and the process proceeds to step S13. When the process proceeds to step S13, the voice information being processed contains a permission character string. The second user is therefore the user who uttered the permission character string, that is, the acceptor who accepts the work request from the first user. Here, user B is determined to be the second user.

In step S13, job generation sub-processing is executed, and the process proceeds to step S14. The job generation sub-processing, described in detail later, generates a job based on the one or more voice records contained in the voice table stored in EPROM 204 and determines the first user who requested the work from the second user. Here, user A is determined to be the first user. In step S14, a requested job is generated, and the process proceeds to step S15. The requested job is generated based on the job generated by the job generation sub-processing and the second user determined in step S12. A requested job is a job in which acceptance of an operation by the second user is set as the start condition for MFP 100 to execute the processing defined by the requested job. Specifically, the requested job is generated by setting, in the job generated in step S13, acceptance of an operation by the second user determined in step S12 as the start condition. In the next step S15, CPU 201 controls communication unit 205 to transmit the requested job to MFP 100 and advances the process to step S16. In step S16, the second user is notified that the requested job exists, and the process ends. Specifically, an e-mail is generated that contains a message indicating that an executable requested job is stored in MFP 100 and whose destination address is the e-mail address of user B, the second user, and communication unit 205 transmits the e-mail.

In step S17, on the other hand, it is determined whether the voice information contains an instruction character string. If it does, the process proceeds to step S18; otherwise, the process returns to step S01. In step S18, the user who uttered the instruction character string is determined to be the instructing user, and the process proceeds to step S19. The user identified by the user identification information associated with the voice information being processed is determined to be the instructing user who uttered the instruction character string. When the process proceeds to step S18, the voice information determined in step S03, in other words, the voice information determined from the voice collected by smart speaker 200, is highly likely to contain a permission character string. In step S19, job generation sub-processing is executed, and the process proceeds to step S16.

In step S20, a normal job is generated, and the process proceeds to step S21. A normal job is a job in which acceptance of an operation by the instructing user is set as the start condition for MFP 100 to execute the processing defined by the normal job. Specifically, the normal job is generated by setting, in the job generated in step S19, acceptance of an operation by the instructing user determined in step S18 as the start condition. In the next step S21, communication unit 205 is controlled to transmit the normal job to MFP 100, and the process proceeds to step S22. In step S22, the instructing user is notified that the normal job exists, and the process ends. Specifically, an e-mail is generated that contains a message indicating that an executable normal job is stored in MFP 100 and whose destination address is the instructing user's e-mail address, and communication unit 205 transmits the e-mail.
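The branch between a requested job and a normal job (the permission check of step S11 followed by the instruction check of step S17) can be sketched as follows. This is a simplified illustration under stated assumptions: the `Job` type, the function names, and the English phrase lists are hypothetical, and the job contents produced by the job generation sub-processing are omitted.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical English stand-ins for the predetermined character strings.
PERMISSION_STRINGS = ("understood", "got it", "i'll take it on")
INSTRUCTION_STRINGS = ("i want to", "i will")


@dataclass
class Job:
    """Hypothetical job carrying only the fields relevant to the branch."""
    kind: str                        # "requested" or "normal"
    start_user: str                  # user whose operation starts the job
    requester: Optional[str] = None  # first user, for requested jobs only


def classify_utterance(text: str) -> Optional[str]:
    """Check for a permission string first, then an instruction string."""
    lowered = text.lower()
    if any(p in lowered for p in PERMISSION_STRINGS):
        return "requested"
    if any(s in lowered for s in INSTRUCTION_STRINGS):
        return "normal"
    return None


def build_job(text: str, speaker: str,
              requester: Optional[str] = None) -> Optional[Job]:
    """Set the speaker's operation as the start condition of the new job."""
    kind = classify_utterance(text)
    if kind is None:
        return None  # neither string found: keep listening
    if kind == "requested":
        # The speaker accepted a request, so the speaker is the second user.
        return Job("requested", start_user=speaker, requester=requester)
    return Job("normal", start_user=speaker)
```

Checking the permission strings before the instruction strings mirrors the order of the flowchart, so an utterance that contains both kinds of string is treated as an acceptance in this sketch.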

FIG. 9 is a flowchart showing an example of the flow of the user identification processing. The user identification processing is executed in step S04 of FIG. 8. A voice has already been accepted before the user identification processing is executed. Referring to FIG. 9, CPU 201 queries server 400 about the user who uttered the voice (step S31). CPU 201 extracts a voiceprint from the voice and transmits the extracted voiceprint to server 400, thereby asking which user the voiceprint belongs to. In response to the query, server 400 returns the user identification information of the user identified by the voiceprint. In step S32, CPU 201 therefore waits until it receives user identification information from server 400 (NO in step S32), and when it receives the user identification information (YES in step S32), it advances the process to step S33.

When two or more of smart speakers 200, 200A, 200B, and 200C collect the voice of the same user, server 400 transmits an arbitration instruction to those devices. The arbitration instruction contains device identification information for identifying the plural devices that collect the same user's voice. In step S33, CPU 201 therefore determines whether an arbitration instruction has been received from server 400. If an arbitration instruction has been received, the process proceeds to step S34; otherwise, the process proceeds to step S39.

In step S34, volumes are compared. Specifically, CPU 201 acquires, from every other device among the devices specified by the device identification information included in the arbitration instruction, the volume of the voice corresponding to the voice information. In step S35, CPU 201 determines whether the volume at its own device is the maximum. If it is, the process proceeds to step S39; otherwise, the process proceeds to step S36. In step S36, it is determined whether there is a device with the same volume. If there is, the process proceeds to step S37; otherwise, the process ends.

In step S37, it is determined whether a prohibition signal has been received from any of the other devices. If a prohibition signal has been received, the process ends; otherwise, the process proceeds to step S38. In step S38, a prohibition signal is transmitted to all of the other devices, and the process proceeds to step S39. In step S39, the user identified by the user identification information received in step S32 is specified, and the process returns to the job generation process.
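
The arbitration in steps S34 to S39 can be sketched as a single decision function. This is a minimal sketch under stated assumptions: "maximum" in step S35 is read as strictly louder than every other device, and "same volume" in step S36 is read as a tie for the loudest volume; the description does not spell out either point. The names `arbitrate` and `ArbitrationResult` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ArbitrationResult:
    identify_user: bool      # True: continue to step S39
    send_prohibition: bool   # True: step S38 was taken


def arbitrate(own_volume: float,
              other_volumes: Dict[str, float],
              prohibition_received: bool) -> ArbitrationResult:
    """Decide whether this device keeps handling the user's voice."""
    # S34: volumes reported by the other devices named in the arbitration instruction.
    loudest_other = max(other_volumes.values(), default=float("-inf"))
    # S35 (assumed reading): strictly loudest device proceeds directly to S39.
    if own_volume > loudest_other:
        return ArbitrationResult(identify_user=True, send_prohibition=False)
    # S36 (assumed reading): no tie for the top volume, so this device stops.
    if own_volume < loudest_other:
        return ArbitrationResult(identify_user=False, send_prohibition=False)
    # S37: on a tie, a device that already received a prohibition signal stops.
    if prohibition_received:
        return ArbitrationResult(identify_user=False, send_prohibition=False)
    # S38: otherwise it prohibits the others and proceeds to S39.
    return ArbitrationResult(identify_user=True, send_prohibition=True)
```

With this reading, exactly one of several equally loud devices continues: the first one to send the prohibition signal.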

FIG. 10 is a flowchart showing an example of the flow of the pairing process. The pairing process is executed in step S10 of FIG. 8. In step S09, before the pairing process is executed, user identification information has been extracted from the voice information. Referring to FIG. 10, CPU 201 inquires of server 400 about another device that collects the voice uttered by the user identified by the user identification information extracted from the voice information. Specifically, CPU 201 makes the inquiry by transmitting the user identification information to server 400. In response to the inquiry, if any of smart speakers 200A, 200B, and 200C collects the voice uttered by the user identified by the user identification information, server 400 returns the device identification information of that device. In step S42, CPU 201 determines whether device identification information has been received from server 400. If device identification information has been received, the process proceeds to step S43; otherwise, the process returns to the job generation process. In step S43, the device identified by the device identification information received from server 400 is determined to be the pairing device, and the process returns to the job generation process.

FIG. 11 is a flowchart showing an example of the flow of the job generation sub-process. The job generation sub-process is executed in steps S13 and S19 of FIG. 8. In step S06, before the job generation sub-process is executed, either a voice record whose voice information includes a permission character string or a voice record whose voice information includes an instruction character string has been stored in HDD 115. Hereinafter, a voice record whose voice information includes a permission character string or an instruction character string is referred to as the record to be processed.

Referring to FIG. 11, CPU 201 reads the voice record that immediately precedes the record to be processed in time (step S51). Specifically, from among the voice records included in the voice table stored in EPROM 204, CPU 201 reads the voice record whose time information indicates the latest time earlier than the time indicated by the time information of the record to be processed. In the next step S52, the voice information included in that voice record is specified.
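
The lookup in step S51 amounts to finding the record with the latest time that is still earlier than the record to be processed. A minimal sketch follows; the `VoiceRecord` fields mirror the time information, user identification information, and voice information named in the description, but the class and function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoiceRecord:
    time: float        # time information
    user_id: str       # user identification information
    voice_info: str    # voice information (recognized text)


def previous_record(table: List[VoiceRecord],
                    target: VoiceRecord) -> Optional[VoiceRecord]:
    """Return the record closest in time before `target`, or None if there is none."""
    earlier = [r for r in table if r.time < target.time]
    return max(earlier, key=lambda r: r.time, default=None)
```

The sketch does not require the table to be sorted; a sorted table would allow a binary search instead.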

It is then determined whether the voice information includes a request character string (step S53). A request character string includes a word used when asking another person to do work, and may be determined in advance. If the voice information includes a request character string, the process proceeds to step S54; otherwise, the process proceeds to step S55. In step S54, the user who uttered the request character string is determined to be the first user, and the process proceeds to step S55. Specifically, the user identified by the user identification information included in the voice record read in step S51 is determined to be the first user.

In step S55, it is determined whether setting information has been determined from the voice information. When the voice information includes the file name of the data to be processed, setting information that specifies that data is determined. If setting information has been determined from the voice information, the process proceeds to step S58; otherwise, the process proceeds to step S56.

In step S56, the voice information is compared with keywords, and the process proceeds to step S57. Specifically, the pronunciation of at least part of the voice information is compared with the pronunciation of at least part of each keyword registered in the keyword table stored in EPROM 204. In the next step S57, it is determined from the comparison whether there is a keyword that includes a character string whose pronunciation is identical or similar to at least part of the voice information. If such a keyword exists, the process proceeds to step S58; otherwise, the process proceeds to step S59.
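
The description does not say how "identical or similar" pronunciation is measured in steps S56 and S57. The sketch below substitutes one possible technique, character-level similarity from Python's standard `difflib.SequenceMatcher` applied to romanized pronunciations, with a hypothetical threshold of 0.8. The keyword table contents, the function name, and the word-by-word comparison are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def match_keyword(utterance_words: List[str],
                  keyword_table: Dict[str, dict],
                  threshold: float = 0.8) -> Optional[dict]:
    """Return the setting information of the best-matching keyword, or None."""
    best_setting = None
    best_score = threshold
    for word in utterance_words:                       # part of the voice information
        for keyword, setting in keyword_table.items():
            score = SequenceMatcher(None, word, keyword).ratio()
            if score >= best_score:                    # identical (1.0) or similar enough
                best_setting, best_score = setting, score
    return best_setting


# Hypothetical keyword table: romanized pronunciation -> associated setting information.
KEYWORDS = {"karaa": {"color": "full"}, "ryoumen": {"duplex": True}}
print(match_keyword(["ryomen"], KEYWORDS))
```

A production system would more likely compare phoneme sequences produced by the speech recognizer; string similarity is used here only to keep the sketch self-contained.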

In step S58, setting information is determined, and the process proceeds to step S59. When the process arrives from step S55, the setting information determined in step S55 is adopted; when the process arrives from step S57, the setting information associated with the keyword is set.

In step S59, it is determined whether the processing to be executed is determined by the setting information determined in step S58. If it is, the flow proceeds to step S60; otherwise, the flow returns to step S51.

In step S60, a job is generated that causes MFP 100 to execute the processing defined by the setting information determined in step S58, under the conditions defined by that setting information, and the process proceeds to step S61. In step S61, it is determined whether the job is complete. A job may define multiple processes, and the job is judged complete when all of them have been determined. If the job is complete, the process returns to the job generation process; otherwise, the process returns to step S51.

FIG. 12 is a flowchart showing an example of the flow of the job execution process. The job execution process is a function realized by CPU 111 of MFP 100 when CPU 111 executes a job execution program stored in ROM 113, HDD 115, or CD-ROM 118. The job execution program is part of the job generation program. Referring to FIG. 12, CPU 111 of MFP 100 determines whether a requested job has been received (step S71). CPU 111 controls communication I/F unit 112 and determines whether a requested job and the user identification information of the first user have been received from smart speaker 200. If a requested job has been received, the process proceeds to step S72; otherwise, the process proceeds to step S73. In step S72, the requested job and the first user's user identification information received with it are stored in HDD 115, and the process proceeds to step S73. The user identification information is stored in association with the requested job.

In step S73, it is determined whether a normal job has been received. CPU 111 controls communication I/F unit 112 and determines whether a normal job has been received from smart speaker 200. If a normal job has been received, the process proceeds to step S74; otherwise, the process proceeds to step S75. In step S74, the normal job is stored in HDD 115, and the process proceeds to step S75.

In step S75, it is determined whether operation unit 163 has accepted an operation input by a user. If an operation has been accepted, the process proceeds to step S76; otherwise, the process returns to step S71. In step S76, the user operating operation unit 163 is specified, and the process proceeds to step S77. When the user inputs user identification information to operation unit 163, the user identified by that user identification information is specified as the operating user. Likewise, when an operation in which the user has a card reader read user identification information stored on a card is accepted, the user identified by the user identification information read by the card reader is specified as the operating user.

In step S77, it is determined whether a requested job corresponding to the specified user exists. That is, it is determined whether HDD 115 stores a requested job whose start condition is set to an instruction by the specified user. If such a requested job exists, the process proceeds to step S78; otherwise, the process proceeds to step S81.

In step S78, it is determined whether the time elapsed since the requested job was received is within a predetermined time. If it is within the predetermined time, the process proceeds to step S79; if the predetermined time has elapsed, the process ends. The predetermined time is a value set in advance. Once the predetermined time has elapsed since a requested job was received, the requested job is likely no longer needed. This check therefore prevents requested jobs that are no longer needed from being executed.

In step S79, the execution instruction process is executed, and the process proceeds to step S80. The details of the execution instruction process are described later. In step S80, the first user is notified that the requested job has been executed, and the process ends. The first user is specified using the user identification information stored in HDD 115 in step S72 together with the requested job specified in step S77. For example, an e-mail is generated that contains a message indicating that the requested job has been executed by the second user and whose destination includes the first user's e-mail address, and the e-mail is transmitted via communication I/F unit 112.

In step S81, it is determined whether a normal job corresponding to the user specified in step S76 exists. That is, it is determined whether HDD 115 stores a normal job whose start condition is set to an instruction by the specified user. If such a normal job exists, the process proceeds to step S82; otherwise, the process ends. In step S82, it is determined whether the time elapsed since the normal job was received is within the predetermined time. If it is within the predetermined time, the process proceeds to step S83; otherwise, the process ends. Once the predetermined time has elapsed since a normal job was received, the normal job is likely no longer needed. This check therefore prevents normal jobs that are no longer needed from being executed. In step S83, the execution instruction process is executed, and the process ends. When the execution instruction process is executed here, the normal job is the processing target, and the normal job is executed.
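
The selection and expiry checks of steps S77, S78, S81, and S82 can be sketched as one function. The `StoredJob` record and `select_job` are hypothetical names; the description states only that jobs are stored in HDD 115, that the start condition names a user, and that a job older than a predetermined time is not executed. When several matching jobs are stored, the sketch simply takes the first, which is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoredJob:
    kind: str           # "requested" or "normal"
    start_user: str     # user whose operation is the start condition
    received_at: float  # time the job was received, in seconds


def select_job(jobs: List[StoredJob], operating_user: str,
               now: float, max_age: float) -> Optional[StoredJob]:
    """Return the job to hand to the execution instruction process, or None."""
    # Requested jobs are checked first (S77), then normal jobs (S81).
    for kind in ("requested", "normal"):
        candidates = [j for j in jobs
                      if j.kind == kind and j.start_user == operating_user]
        if candidates:
            job = candidates[0]
            # S78 / S82: an expired job ends the process without falling through.
            return job if now - job.received_at <= max_age else None
    return None
```

Note that, as in the flowchart, an expired requested job ends the process even if a valid normal job for the same user is stored.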

FIG. 13 is a flowchart showing an example of the flow of the execution instruction process. The execution instruction process is executed in step S79 or S83 of FIG. 12. The requested job has been determined before the execution instruction process is executed. Referring to FIG. 13, CPU 111 determines whether a requested job exists (step S91). That is, it is determined whether there is a requested job corresponding to the user specified as the user operating MFP 100, in other words whether HDD 115 stores a requested job whose start condition is set to an instruction by the user operating MFP 100. If a requested job exists, the process proceeds to step S92; otherwise, the process proceeds to step S94.

In step S92, setting button 501 is displayed on display unit 161, and the process proceeds to step S93. Setting button 501 is a button for accepting an operation that sets the requested job to an executable state, and is associated with a command that sets the requested job to an executable state. In step S93, it is determined whether setting button 501 has been selected. The process waits until setting button 501 is selected (NO in step S93); when setting button 501 is selected (YES in step S93), the process proceeds to step S95. Note that the process may end if setting button 501 is not selected within a waiting time after it is displayed. The waiting time is a predetermined time.

In step S94, on the other hand, it is determined whether a normal job has been selected. This determination is made while a job selection screen is displayed that lists the jobs associated with the user operating MFP 100. The process waits until a normal job is selected; when a normal job is selected, the process proceeds to step S95. Note that the process may also proceed to step S95 when a job other than a normal job is selected. The process may also end if no normal job is selected within a waiting time after the job selection screen is displayed. The waiting time is a predetermined time.

In step S95, a setting screen for setting the setting values is displayed, and the process proceeds to step S96. In step S96, the process branches according to the operation input by the user. If the operation is an execution operation, the process proceeds to step S97; if it is a setting operation, the process proceeds to step S98. An execution operation is an operation that instructs execution of the job. A setting operation is an operation that sets a setting value of the job. In step S98, the setting value is set according to the setting operation, and the process returns to step S96. In step S97, the requested job or the normal job is executed, and the process returns to the job execution process.
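
The loop formed by the operation branch (step S96), the setting operation (step S98), and job execution (step S97) can be sketched as below. The operation tuples and the function name are hypothetical; the description defines only the two kinds of operation, a setting operation that changes a setting value and an execution operation that starts the job.

```python
from typing import Dict, Iterable, Optional, Tuple


def run_setting_screen(initial_settings: Dict[str, object],
                       operations: Iterable[Tuple]) -> Optional[Dict[str, object]]:
    """Apply setting operations until an execution operation arrives.

    Returns the settings the job would run with, or None if the user
    never issued an execution operation.
    """
    settings = dict(initial_settings)          # values prefilled from the generated job
    for op in operations:                      # S96: branch on the kind of operation
        if op[0] == "set":                     # S98: apply the setting, then loop again
            _, key, value = op
            settings[key] = value
        elif op[0] == "execute":               # S97: execute with the current settings
            return settings
    return None


final = run_setting_screen({"copies": 1, "duplex": False},
                           [("set", "copies", 3), ("execute",)])
print(final)
```

Because the settings are prefilled from the generated job, the operating user can start the job with a single execution operation or adjust values first.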

As described above, speech processing system 1 according to the first embodiment includes a plurality of smart speakers 200, 200A, 200B, and 200C. Each of them, for example smart speaker 200, identifies the user who uttered the voice collected by microphone 208. Based on the voice from which the first user is identified and on the voice uttered by the second user and collected by another of the smart speakers, for example smart speaker 200A, it generates, as a job to be executed by MFP 100, a requested job that the first user has asked the second user to execute. The requested job is thus generated based on the voice collected by microphone 208 and the voice uttered by the second user and collected by smart speaker 200A. Because the requested job is generated from the conversation between the first user and the second user, who are at locations apart from each other, the operation for generating a job can be simplified.

In addition, each of smart speakers 200, 200A, 200B, and 200C, for example smart speaker 200, upon detecting the user identification information of any of a plurality of pre-registered users in the voice from which the first user is identified, determines the user identified by the detected user identification information to be the second user. It then determines, as the pairing device, the voice collection device among the other smart speakers 200A, 200B, and 200C that collects the voice uttered by the second user, for example smart speaker 200B. Smart speaker 200B, which collects the voice uttered by the second user conversing with the first user, can therefore be determined easily.

When smart speakers 200B and 200C both collect the voice uttered by the second user, smart speakers 200B and 200C themselves determine only one of the two to be the device that collects the voice uttered by the second user. Consequently, when smart speaker 200 determines the voice collection device that collects the voice uttered by the second user while smart speakers 200B and 200C are both collecting that voice, the one determined by smart speakers 200B and 200C is determined to be the voice collection device. The accuracy of voice recognition can therefore be improved.

In addition, smart speaker 200 generates the requested job when the voice collected by smart speaker 200B, the pairing device, indicates acceptance. A job is therefore not generated when the second user does not accept the first user's request.

<Second Embodiment>
MFP 100 in the second embodiment functions as an image processing apparatus, as does MFP 100 in the first embodiment, and also functions as a job generation device. The overall outline of speech processing system 1 in the second embodiment is the same as that of speech processing system 1 in the first embodiment shown in FIG. 1. The hardware configuration of MFP 100 in the second embodiment is the same as the hardware configuration shown in FIG. 3. The hardware configuration of each of smart speakers 200, 200A, 200B, and 200C in the second embodiment is the same as the hardware configuration shown in FIG. 2. Descriptions thereof are therefore not repeated.

FIG. 14 is a block diagram showing an example of the functions of CPU 201 of smart speaker 200 in the second embodiment. The functions shown in FIG. 14 differ from those shown in FIG. 4 in that job generation unit 257, job transmission unit 259, caller determination unit 263, device determination unit 265, and voice information acquisition unit 267 are removed and voice information transmission unit 291 is added. The other functions are the same as those shown in FIG. 4, so their description is not repeated here. Voice information transmission unit 291 receives pairs of voice information and time information from voice recognition unit 253 and pairs of user identification information and time information from user identification unit 255. Voice information transmission unit 291 transmits to MFP 100 the user identification information, the voice information, and the time information for which the paired time information is the same. Specifically, voice information transmission unit 291 controls communication unit 205 to transmit the user identification information, the voice information, and the time information to MFP 100.

FIG. 15 is a block diagram showing an example of the functions of CPU 111 of MFP 100 in the second embodiment. The functions shown in FIG. 15 differ from those shown in FIG. 6 in that voice information acquisition unit 71, job generation unit 257, caller determination unit 263, device determination unit 265, and operating user notification unit 281 are added and job reception unit 81 is changed to job acceptance unit 81A. The other functions are the same as those shown in FIG. 6, so their description is not repeated here.

Voice information acquisition unit 71 in the second embodiment acquires voice information, user identification information, and time information from any of smart speakers 200, 200A, 200B, and 200C. Specifically, it controls communication I/F unit 112 and acquires the voice information, user identification information, and time information that communication I/F unit 112 receives from any of smart speakers 200, 200A, 200B, and 200C. Voice information acquisition unit 71 stores in HDD 115 a voice table corresponding to each of smart speakers 200, 200A, 200B, and 200C. Accordingly, it adds a voice record containing the voice information, user identification information, and time information received from any of the smart speakers, for example smart speaker 200, to the voice table stored in HDD 115 that corresponds to smart speaker 200.

When device determination unit 265 specifies a device, voice information acquisition unit 71 in the second embodiment acquires voice records from the voice table corresponding to the specified device, among the voice tables stored in HDD 115 for each of smart speakers 200, 200A, 200B, and 200C.
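
The per-device voice tables handled by voice information acquisition unit 71 can be pictured as a mapping from device identification information to a list of voice records. The following is a minimal in-memory sketch; the class and method names are assumptions, and persistence to HDD 115 is omitted.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class VoiceRecord(NamedTuple):
    time: float        # time information
    user_id: str       # user identification information
    voice_info: str    # voice information


class VoiceTables:
    """One voice table per smart speaker, keyed by device identification information."""

    def __init__(self) -> None:
        self._tables: Dict[str, List[VoiceRecord]] = defaultdict(list)

    def add(self, device_id: str, record: VoiceRecord) -> None:
        # Append a received record to the table of the device that sent it.
        self._tables[device_id].append(record)

    def records_for(self, device_id: str) -> List[VoiceRecord]:
        # Read the table of the device specified by the device determination unit.
        return list(self._tables.get(device_id, []))


tables = VoiceTables()
tables.add("200", VoiceRecord(1.0, "user-1", "please print the report"))
tables.add("200B", VoiceRecord(2.0, "user-2", "sure"))
print(tables.records_for("200B"))
```

Keeping the tables separate per device lets the MFP pair the first user's utterances from one speaker with the second user's replies from another.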

Job generation unit 257, caller determination unit 263, device determination unit 265, and operating user notification unit 281 in the second embodiment have the same functions as job generation unit 257, caller determination unit 263, device determination unit 265, and operating user notification unit 281, respectively, of CPU 201 of smart speaker 200 in the first embodiment. That is, job generation unit 257 in the second embodiment generates a requested job and determines the first user based on the voice information, user identification information, and time information registered in the voice table corresponding to any of smart speakers 200, 200A, 200B, and 200C. Job generation unit 257 in the second embodiment outputs the requested job and the user identification information of the first user to job control unit 55. Job generation unit 257 in the second embodiment also generates a normal job based on the voice information, user identification information, and time information registered in the voice table corresponding to any of smart speakers 200, 200A, 200B, and 200C, and outputs the normal job to job control unit 55. Job acceptance unit 81A of job control unit 55 in the second embodiment accepts the requested job and the user identification information of the first user when job generation unit 73 generates a requested job, and accepts the normal job when job generation unit 73 in the second embodiment generates a normal job.

Operating user notification unit 281 in the second embodiment has the same function as operating user notification unit 281 of CPU 201 of smart speaker 200 in the first embodiment. That is, when job generation unit 257 in the second embodiment generates a requested job, operating user notification unit 281 in the second embodiment notifies the second user that the requested job can be executed by MFP 100.

In speech processing system 1 according to the second embodiment, MFP 100 has some of the functions that smart speaker 200 has in speech processing system 1 according to the first embodiment. Each of smart speakers 200, 200A, 200B, and 200C in the second embodiment can therefore have fewer functions, and the system configuration can be simplified.

<First Modification>
Server 400 in the first modification has some of the functions of MFP 100 in the second embodiment. Server 400 in the first modification functions as a job generation device. That is, server 400 in the first modification functions as an AI (Artificial Intelligence) assistant for smart speakers 200, 200A, 200B, and 200C. The AI assistant provided by server 400 is registered in advance in each of smart speakers 200, 200A, 200B, and 200C. Each of smart speakers 200, 200A, 200B, and 200C includes at least microphone 208 and communication unit 205, converts the voice collected by microphone 208 into voice data, which is electronic data, and transmits the voice data to server 400. Based on the voice data received from each of smart speakers 200, 200A, 200B, and 200C, server 400 performs voice recognition and identifies the user, and transmits the voice information, user identification information, and time information to MFP 100.

In the voice processing system 1 according to the first modification, the smart speakers 200, 200A, 200B, and 200C need only limited functionality, and the system configuration can be simplified.

Furthermore, the MFP 100 may be provided with the functions of the server 400. In this case the server 400 is unnecessary, so the system configuration can be simplified.

<Second modification>
The MFP 100 in the second embodiment generates a job in real time each time it receives voice information. Alternatively, it may accumulate voice information in the voice table and generate jobs by batch processing at a predetermined timing. For example, the MFP 100 may generate jobs at predetermined time intervals, or may generate jobs each time a predetermined number of voice records have been added to the voice table.
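The batch variant described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation in the specification: the names `VoiceRecord`, `BatchingVoiceTable`, `batch_size`, and `interval_s` are hypothetical, both triggers (record count and elapsed time) are combined in one class for brevity, and elapsed time is judged from the time information of the incoming record rather than from a separate timer.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VoiceRecord:
    user_id: str      # user identification information
    text: str         # voice information (recognized speech)
    timestamp: float  # time information, in seconds


@dataclass
class BatchingVoiceTable:
    """Accumulates voice records and hands them to job generation in batches."""
    generate_jobs: Callable[[List[VoiceRecord]], None]
    batch_size: int = 3       # hypothetical "predetermined number" of records
    interval_s: float = 60.0  # hypothetical "predetermined time interval"
    _pending: List[VoiceRecord] = field(default_factory=list, init=False)
    _last_flush: float = field(default=0.0, init=False)

    def add(self, record: VoiceRecord) -> None:
        """Store a record; flush when either trigger condition is met."""
        self._pending.append(record)
        due_by_count = len(self._pending) >= self.batch_size
        due_by_time = record.timestamp - self._last_flush >= self.interval_s
        if due_by_count or due_by_time:
            self.flush(record.timestamp)

    def flush(self, now: float) -> None:
        """Pass a copy of the pending records to job generation and reset."""
        if self._pending:
            self.generate_jobs(list(self._pending))
            self._pending.clear()
        self._last_flush = now
```

In this sketch the real-time behaviour of the second embodiment corresponds to `batch_size=1`, where every added record is passed to job generation immediately.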

As described above, in the voice processing system 1 according to the second embodiment, any one of the smart speakers 200, 200A, 200B, and 200C and the MFP 100 identifies the user who uttered a voice. When the first user is identified from a voice collected by a first device among the smart speakers 200, 200A, 200B, and 200C, the MFP 100 generates, as a job, a requested job that the first user has requested the second user to execute, based on that voice and on a voice uttered by the second user and collected by a second device among the smart speakers 200, 200A, 200B, and 200C. Because the requested job is generated from a conversation between a first user and a second user who are at locations apart from each other, the operation for generating the job can be simplified.

Further, when user identification information of any one of a plurality of pre-registered users is detected in the voice from which the first user is identified, the MFP 100 in the second embodiment determines the user identified by the detected user identification information as the second user, and determines, as the second device, the device among the smart speakers 200, 200A, 200B, and 200C that collects the voice uttered by the second user, for example, the smart speaker 200B. Therefore, the smart speaker 200A that collects the voice uttered by the second user conversing with the first user can be determined easily.
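The determination of the second user and the second device described above might look like the following sketch. It assumes, hypothetically, that user identification information appears in the recognized text as a spoken name, that `registered_users` maps each name to a user ID, and that `heard_on` records which device most recently collected each user's voice; none of these structures is specified in the source.

```python
from typing import Dict, Optional, Tuple


def determine_second_user_and_device(
    first_user_text: str,
    first_user_id: str,
    registered_users: Dict[str, str],  # spoken name -> user ID (hypothetical)
    heard_on: Dict[str, str],          # user ID -> device that collected the voice
) -> Optional[Tuple[str, str]]:
    """Return (second user ID, second device), or None if no one is addressed."""
    for name, user_id in registered_users.items():
        # The first user cannot also be the second user.
        if user_id == first_user_id:
            continue
        # Detect a registered user's identification information in the utterance.
        if name in first_user_text:
            device = heard_on.get(user_id)
            if device is not None:
                return user_id, device
    return None
```

A plain substring match is used only to keep the example short; a real system would more plausibly rely on the keyword extraction performed during voice recognition.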

Further, when a plurality of voice collection devices collect the voice uttered by the second user, for example, the smart speakers 200B and 200C, the MFP 100 in the second embodiment determines, as the second device, whichever of the smart speakers 200B and 200C collects the voice at the highest volume. This can improve the accuracy of voice recognition.
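As a minimal sketch of this selection rule, assume each candidate device reports a single volume value for the second user's utterance; the function name `pick_loudest_device` and the dictionary layout are hypothetical and chosen only for illustration.

```python
from typing import Dict


def pick_loudest_device(volumes: Dict[str, float]) -> str:
    """Return the ID of the device that collected the voice at the highest volume."""
    if not volumes:
        raise ValueError("no device collected the second user's voice")
    # max() over the keys, ranked by the measured volume of each device.
    return max(volumes, key=lambda device_id: volumes[device_id])
```

For example, `pick_loudest_device({"200B": 0.4, "200C": 0.7})` selects `"200C"` as the second device.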

Further, the MFP 100 in the second embodiment generates the requested job when the voice collected by the second device indicates acceptance, so that no job is generated when the second user does not accept the request from the first user.
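The acceptance check can be illustrated with the sketch below. The word list `ACCEPT_WORDS`, the function name, and the dictionary representation of a requested job are all assumptions made for the example; the source does not state how acceptance is recognized or how a job is represented.

```python
import re
from typing import Dict, Optional

# Hypothetical words treated as acceptance; the source does not list any.
ACCEPT_WORDS = {"yes", "ok", "okay", "sure", "understood"}


def generate_requested_job(
    request_text: str,
    reply_text: str,
    first_user_id: str,
    second_user_id: str,
) -> Optional[Dict[str, str]]:
    """Create a requested job only if the second user's reply indicates acceptance."""
    reply_words = set(re.findall(r"[a-z']+", reply_text.lower()))
    if not reply_words & ACCEPT_WORDS:
        return None  # the second user did not accept, so no job is generated
    return {
        "type": "requested",
        "requester": first_user_id,   # first user, who asked for the work
        "operator": second_user_id,   # second user, who will operate the MFP
        "instruction": request_text,
    }
```

Returning `None` rather than raising an error reflects the described behaviour: a declined request simply produces no job.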

The embodiments disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present invention is indicated by the claims rather than by the above description, and is intended to include all modifications within the meaning and range of equivalency of the claims.

<Appendix>
(1) The job generation device according to claim 7, further comprising voice information acquisition means for acquiring, from each of the plurality of voice collection devices, user identification information for identifying the user who uttered the voice collected by that voice collection device and voice information obtained by recognizing the voice collected by that voice collection device,
wherein the user identification means identifies the user based on the user identification information acquired from each of the plurality of voice collection devices by the voice information acquisition means.

1 voice processing system, 100 MFP, 200, 200A, 200B, 200C smart speaker, 400 server, 3 network, 5 Internet, 7 gateway device, 110 main circuit, 111 CPU, 112 communication I/F unit, 113 ROM, 114 RAM, 115 HDD, 116 facsimile unit, 117 external storage device, 118 CD-ROM, 120 automatic document feeder, 130 document reading unit, 140 image forming unit, 150 paper feed unit, 155 post-processing unit, 160 operation panel, 161 display unit, 163 operation unit, 165 touch panel, 167 hard key unit, 201 CPU, 202 ROM, 203 RAM, 204 EPROM, 205 communication unit, 206 display unit, 207 operation unit, 208 microphone, 209 speaker, 210 serial interface, 51 operating user identification unit, 53 setting unit, 55 job control unit, 57 requester notification unit, 71 voice information acquisition unit, 73 job generation unit, 81 job receiving unit, 81A job accepting unit, 83 association unit, 85 job execution unit, 251 voice accepting unit, 253 voice recognition unit, 255 user identification unit, 257 job generation unit, 259 job transmission unit, 261 response unit, 263 caller determination unit, 265 device determination unit, 267 voice information acquisition unit, 271 process determination unit, 273 normal job generation unit, 275 requested job generation unit, 277 keyword extraction unit, 279 caller determination unit, 281 operating user notification unit, 291 voice information transmission unit.

Claims

A voice processing system comprising:
a plurality of voice collection devices that collect voice; and
a job generation device that generates a job to be executed by an image processing device,
wherein any one of the plurality of voice collection devices and the job generation device comprises user identification means for identifying the user who uttered a voice, and
the job generation device comprises job generating means for generating, as the job, a requested job that a first user requests a second user to execute, based on a voice that is collected by a first device among the plurality of voice collection devices and from which the first user is identified by the user identification means, and a voice that is collected by a second device among the plurality of voice collection devices different from the first device and from which the second user, different from the first user, is identified by the user identification means.

2. The voice processing system according to claim 1, wherein the job generation device further comprises device determination means for, in response to detection of user identification information for identifying any one of a plurality of pre-registered users from the voice from which the first user is identified by the user identification means, determining the user identified by the detected user identification information as the second user, and determining, as the second device, the voice collection device that collects the voice uttered by the second user among the other one or more voice collection devices.

3. The voice processing system according to claim 2, wherein, when a plurality of the voice collection devices collect the voice uttered by the second user, the device determination means determines, as the second device, the voice collection device that collects the voice at the highest volume among the plurality of voice collection devices that collect the voice uttered by the second user.

4. The voice processing system according to claim 2, wherein said job generating means generates said requested job when the voice collected by said second device indicates acceptance.

5. The speech processing system according to any one of claims 1 to 4, wherein said job generation device is one of said plurality of speech collection devices.

5. The voice processing system according to claim 1, wherein said job generation device is said image processing device.

A job generation device that generates a job to be executed by an image processing device, comprising:
user identification means for identifying the user who uttered a voice; and
job generating means for generating, as the job, a requested job that a first user requests a second user to execute, based on a voice that is collected by a first device among a plurality of voice collection devices and from which the first user is identified by the user identification means, and a voice that is collected by a second device among the plurality of voice collection devices different from the first device and from which the second user, different from the first user, is identified by the user identification means.

8. The job generation device according to claim 7, further comprising device determination means for, in response to detection of user identification information for identifying any one of a plurality of pre-registered users from the voice from which the first user is identified by the user identification means, determining the user identified by the detected user identification information as the second user, and determining, as the second device, the voice collection device that collects the voice uttered by the second user among the plurality of voice collection devices.

9. The job generation device according to claim 8, wherein, when a plurality of the voice collection devices collect the voice uttered by the second user, the device determination means determines, as the second device, the voice collection device that collects the voice at the highest volume among the plurality of voice collection devices that collect the voice uttered by the second user.

10. The job generation device according to claim 7, wherein said job generating means generates said requested job when the voice collected by said second device indicates acceptance.

A job generation method executed in a voice processing system comprising a plurality of voice collection devices that collect voice and a job generation device that generates a job to be executed by an image processing device, the method comprising:
causing any one of the plurality of voice collection devices and the job generation device to execute a user identification step of identifying the user who uttered a voice; and
causing the job generation device to execute a job generation step of generating, as the job, a requested job that a first user requests a second user to execute, based on a voice that is collected by a first device among the plurality of voice collection devices and from which the first user is identified in the user identification step, and a voice that is collected by a second device among the plurality of voice collection devices different from the first device and from which the second user, different from the first user, is identified in the user identification step.

A job generation method executed by a job generation device that generates a job to be executed by an image processing device, the method causing the job generation device to execute:
a user identification step of identifying the user who uttered a voice; and
a job generation step of generating, as the job, a requested job that a first user requests a second user to execute, based on a voice that is collected by a first device among a plurality of voice collection devices and from which the first user is identified in the user identification step, and a voice that is collected by a second device among the plurality of voice collection devices different from the first device and from which the second user, different from the first user, is identified in the user identification step.

A job generation program executed by a computer that controls a job generation device that generates a job to be executed by an image processing device, the program causing the computer to execute:
a user identification step of identifying the user who uttered a voice; and
a job generation step of generating, as the job, a requested job that a first user requests a second user to execute, based on a voice that is collected by a first device among a plurality of voice collection devices and from which the first user is identified in the user identification step, and a voice that is collected by a second device among the plurality of voice collection devices different from the first device and from which the second user, different from the first user, is identified in the user identification step.