JP7151428B2

JP7151428B2 - Information processing system, program and information processing method

Info

Publication number: JP7151428B2
Application number: JP2018226145A
Authority: JP
Inventors: 隆之井上; 駿吉見; 基至勝又; 裕中村; かおり大関
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2022-10-12
Anticipated expiration: 2038-11-30
Also published as: JP2020088830A

Description

The present invention relates to an information processing system, a program, and an information processing method.

Description of the Related Art. Operation methods in which instructions are given by voice to an image forming apparatus, such as a multifunction peripheral (MFP), are conventionally known. For example, Patent Document 1 discloses an image forming apparatus that can be operated by voice.

However, when an interactive method is adopted for giving instructions to an external device by voice, the more complicated the settings are, the longer it takes before the job is executed.

SUMMARY OF THE INVENTION. The present invention has been made in view of the above, and an object thereof is to provide an information processing system, a program, and an information processing method that allow settings to be confirmed through feedback, and jobs to be set up, intuitively.

To solve the above problem and achieve the object, the present invention provides an information processing system including an information processing device and an external device. The system includes: an acquisition unit that acquires voice information including a setting instruction for operating the external device; a voice recognition unit that recognizes the voice information; a notification unit that presents, on a screen of the information processing device, operation information based on the result of recognition of the voice information by the voice recognition unit; and an output unit that outputs the operation information to the external device. The notification unit displays, on the screen of the information processing device, a finished image that shows the expected finished output based on the settings in the operation information, and, when a predetermined operation is performed on the finished image, stores the corresponding settings together with the finished image.
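The storage behaviour of the notification unit can be sketched as a small preset store. This is a minimal illustration only, assuming the finished image is an opaque byte string and that a long press is the "predetermined operation"; the names `Preset`, `PresetStore`, `PREDETERMINED_OPERATION`, and the setting keys are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical: the patent only says "a predetermined operation"; a long press is assumed here.
PREDETERMINED_OPERATION = "long_press"


@dataclass
class Preset:
    """A finished (preview) image stored together with the settings that produce it."""
    finished_image: bytes        # assumed to be an encoded preview image
    settings: Dict[str, str]     # e.g. {"print_side": "duplex"} (hypothetical keys)


@dataclass
class PresetStore:
    presets: List[Preset] = field(default_factory=list)

    def on_image_operation(self, operation: str, finished_image: bytes,
                           settings: Dict[str, str]) -> bool:
        """Store image and settings only when the predetermined operation occurs."""
        if operation != PREDETERMINED_OPERATION:
            return False
        # Copy the dict so later edits to the live settings do not alter the stored preset.
        self.presets.append(Preset(finished_image, dict(settings)))
        return True

    def settings_for(self, finished_image: bytes) -> Dict[str, str]:
        """Return the settings stored with a given finished image."""
        for preset in self.presets:
            if preset.finished_image == finished_image:
                return dict(preset.settings)
        raise KeyError("no settings stored for this finished image")
```

Keeping the settings keyed by their preview image means a user can later pick a job configuration by its appearance rather than by re-dictating each setting.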

According to the present invention, displaying a finished image that shows the expected finished output based on the settings makes it possible to confirm settings through feedback, and to set up jobs, intuitively.

FIG. 1 is a system configuration diagram of a voice operation system according to the first embodiment.
FIG. 2 is a hardware configuration diagram of the MFP.
FIG. 3 is a hardware configuration diagram of the mobile terminal device.
FIG. 4 is a hardware configuration diagram of the voice recognition server device.
FIG. 5 is a hardware configuration diagram of the AI assistant server device.
FIG. 6 is a functional block diagram of the mobile terminal device.
FIG. 7 is a functional block diagram of the voice recognition server device.
FIG. 8 is a functional block diagram of the AI assistant server device.
FIG. 9 is a sequence diagram showing the overall flow of voice operation in the voice operation system.
FIG. 10 is a diagram showing an example of entity information used to interpret the user's input speech.
FIG. 11 is a diagram showing entity information registered based on uttered phrases.
FIG. 12 is a diagram showing the flow of interactive input operations.
FIG. 13 is a diagram showing an example of the screen display when the process shown in FIG. 12 is executed.
FIG. 14 is a sequence diagram showing the first half of the interactive input operation.
FIG. 15 is a sequence diagram showing the second half of the interactive input operation.
FIG. 16 is a diagram illustrating how stamps are stored.
FIG. 17 is a diagram showing a modification of the stamp.
FIG. 18 is a system configuration diagram of a voice operation system according to the second embodiment.
FIG. 19 is a hardware configuration diagram of a smart speaker.
FIG. 20 is a hardware configuration diagram of a cloud service device.
FIG. 21 is a schematic diagram showing the overall functions of the cloud.
FIG. 22 is a diagram showing an example of the functional block configuration of the smart speaker.
FIG. 23 is a diagram showing an example of the configuration of each function of the cloud service.
FIG. 24 is a sequence diagram showing the flow of operations at startup.
FIG. 25 is a sequence diagram showing the flow of interactive operations after startup.
FIG. 26 is a sequence diagram showing the flow of interactive operations after startup.
FIG. 27 is a sequence diagram showing the flow of interactive operations after startup.
FIG. 28 is a diagram showing an example of a screen display.

Exemplary embodiments of an information processing system, a program, and an information processing method are described in detail below with reference to the accompanying drawings.

(First embodiment)
(System configuration)
FIG. 1 is a system configuration diagram of a voice operation system according to the first embodiment. As shown in FIG. 1, the voice operation system of the first embodiment, which is an information processing system, is formed by interconnecting a multifunction peripheral (MFP) 1 (an example of the external device), a mobile terminal device 2 such as a smartphone or tablet terminal (an example of the information processing device), a voice recognition server device 3, and an AI (artificial intelligence) assistant server device 4 via a predetermined network 5 such as a LAN (Local Area Network). The external device is not limited to an MFP, however; it may be any of various electronic devices, including office equipment such as an electronic whiteboard or a projector.

The mobile terminal device 2 accepts voice input from the user for operating the MFP 1 by voice, and feeds the accepted operation back to the user by voice or on-screen display. The mobile terminal device 2 also relays data communication (the exchange of text data described later) between the voice recognition server device 3 and the AI assistant server device 4. The voice recognition server device 3 analyzes the voice data received from the mobile terminal device 2 and converts it into text data; it corresponds to a first server device. The AI assistant server device 4 analyzes the text data, converts it into a pre-registered user intent (a job execution command for the MFP 1), and transmits it to the mobile terminal device 2.

The AI assistant server device 4 corresponds to a second server device. The MFP 1 executes the job execution command transmitted from the mobile terminal device 2. Communication between the mobile terminal device 2 and the MFP 1 may be wireless or wired; that is, the mobile terminal device 2 may be an operation terminal fixedly connected to the MFP 1.

In this example, two server devices are provided, the voice recognition server device 3 and the AI assistant server device 4; however, the server devices 3 and 4 may be physically combined into a single server device. Alternatively, each of the server devices 3 and 4 may be implemented by multiple server devices.

(MFP hardware configuration)
FIG. 2 is a hardware configuration diagram of the MFP 1 provided in the voice operation system. The MFP 1 has multiple functions, such as a printer function and a scanner function. As shown in FIG. 2, the MFP 1 includes a controller 19, a communication unit 15, an operation unit 16, a scanner engine 17, and a printer engine 18.

The controller 19 includes a CPU 10, an ASIC (Application Specific Integrated Circuit) 11, a memory 12, an HDD (Hard Disk Drive) 13, and a timer 14. The CPU 10 through the timer 14 are communicably interconnected via a bus line.

The communication unit 15 is connected to the network 5 and, as described later, acquires job execution commands, such as scan or print instructions, that were input by voice using the mobile terminal device 2.

The operation unit 16 is a so-called touch panel in which a liquid crystal display (LCD) and a touch sensor are integrated. To instruct execution of a desired operation with the operation unit 16, the operator specifies that operation by touching an operation button (software key) displayed on the operation unit 16.

The scanner engine 17 controls the scanner unit to optically read a document. The printer engine 18 controls the image writing unit to print an image on, for example, transfer paper. The CPU 10 performs overall control of the image forming apparatus. The ASIC 11 is a so-called large-scale integrated circuit (LSI) that performs the various image processing required for images handled by the scanner engine 17 and the printer engine 18. The scanner engine 17 and the printer engine 18, which execute job execution commands acquired from the mobile terminal device 2, correspond to functional units.

The memory 12 stores various applications executed by the CPU 10 and various data used when executing them. The HDD 13 stores image data, various programs, font data, various files, and the like. An SSD (Solid State Drive) may be provided instead of, or in addition to, the HDD 13.

(Hardware configuration of the mobile terminal device)
FIG. 3 is a hardware configuration diagram of the mobile terminal device 2 provided in the voice operation system. As shown in FIG. 3, the mobile terminal device 2 is formed by interconnecting a CPU 21, a RAM 22, a nonvolatile ROM 23, an interface unit (I/F unit) 24, and a communication unit 25 via a bus line 26. The RAM 22 stores an address book containing the e-mail addresses of users who are destinations for e-mails, scanned images, and the like. The RAM 22 also stores files, such as image data to be printed.

The ROM 23 stores an operation voice processing program. By executing this program, the CPU 21 enables voice input operation of the MFP 1.

A touch panel 27, a speaker unit 28, and a microphone unit 29 are connected to the I/F unit 24. In addition to call audio, the microphone unit 29 collects (acquires) spoken job execution commands for the MFP 1. The input voice is transmitted via the communication unit 25 to the voice recognition server device 3 and converted into text data.

(Hardware configuration of the voice recognition server device)
FIG. 4 is a hardware configuration diagram of the voice recognition server device 3 provided in the voice operation system. As shown in FIG. 4, the voice recognition server device 3 is formed by interconnecting a CPU 31, a RAM 32, a ROM 33, an HDD (Hard Disk Drive) 34, an interface unit (I/F unit) 35, and a communication unit 36 via a bus line 37. A display unit 38 and an operation unit 39 are connected to the I/F unit 35. The HDD 34 stores an operation voice conversion program for converting voice data into text data. By executing this program, the CPU 31 converts voice data transmitted from the mobile terminal device 2 into text data and returns the text data to the mobile terminal device 2.

(Hardware configuration of the AI assistant server device)
FIG. 5 is a hardware configuration diagram of the AI assistant server device 4 provided in the voice operation system. As shown in FIG. 5, the AI assistant server device 4 is formed by interconnecting a CPU 41, a RAM 42, a ROM 43, an HDD 44, an interface unit (I/F unit) 45, and a communication unit 46 via a bus line 47. A display unit 48 and an operation unit 49 are connected to the I/F unit 45. The HDD 44 stores an operation interpretation program for interpreting the job instructed by the user. By executing this program, the CPU 41 interprets the user's instructed job from the text data generated (converted) by the voice recognition server device 3. The interpretation result is transmitted to the mobile terminal device 2, which converts it into a job command and supplies it to the MFP 1. The MFP 1 can thus be operated by voice input through the mobile terminal device 2.

(Functions of the mobile terminal device)
FIG. 6 is a functional block diagram of the mobile terminal device 2 provided in the voice operation system. By executing the operation voice processing program stored in the ROM 23, the CPU 21 of the mobile terminal device 2 functions as an acquisition unit 51, a communication control unit 52, a feedback unit 55, a processing capability acquisition unit 56, an execution determination unit 57, and a search unit 58, as shown in FIG. 6.

The acquisition unit 51 is an example of the acquisition unit; it acquires the user's spoken instruction for operating the MFP 1, collected via the microphone unit 29. The communication control unit 52 is an example of the output unit; it controls communication between the mobile terminal device 2 and the MFP 1, between the mobile terminal device 2 and the voice recognition server device 3, and between the mobile terminal device 2 and the AI assistant server device 4. The interpretation result conversion unit 53 converts the AI assistant server device 4's interpretation of the text data of the user's spoken instruction into a job execution command for the MFP 1. The execution instructing unit 54 transmits the job execution command to the MFP 1 to instruct execution of the job.

The feedback unit 55 is an example of the notification unit. To realize interactive voice input operation, it provides voice or on-screen feedback, for example prompting the user for input that supplies missing data, or confirming the input. The processing capability acquisition unit 56 acquires processing capabilities of the MFP 1, such as the maximum number of pixels it can process. The execution determination unit 57 compares the capabilities of the MFP 1 with the job specified by the user to determine whether the MFP 1 can execute that job. The search unit 58 searches a memory such as the RAM 22 for a destination, file, or the like specified by the user's voice.
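The comparison performed by the execution determination unit 57 can be sketched as a simple capability check. This is an illustration under stated assumptions: the maximum pixel count is treated as a per-page limit, and the `duplex` flag, `MfpCapability`, `JobRequest`, and `can_execute` are hypothetical names, not interfaces defined in the patent.

```python
from dataclasses import dataclass


@dataclass
class MfpCapability:
    max_pixels: int   # maximum processable pixel count (assumed to be per page)
    duplex: bool      # hypothetical: whether double-sided output is supported


@dataclass
class JobRequest:
    width_px: int
    height_px: int
    duplex: bool


def can_execute(cap: MfpCapability, job: JobRequest) -> bool:
    """Return True only when every requested setting is within the MFP's capability."""
    if job.width_px * job.height_px > cap.max_pixels:
        return False
    if job.duplex and not cap.duplex:
        return False
    return True
```

When the check fails, the feedback unit 55 would be the natural place to tell the user, rather than sending a job the MFP 1 cannot run.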

In this example, the acquisition unit 51 through the search unit 58 are implemented by software, but some or all of them may be implemented by hardware such as an IC (Integrated Circuit). The functions of the acquisition unit 51 through the search unit 58 may be realized by the operation voice processing program alone, or part of the processing may be executed by another program or performed indirectly using another program. For example, another program may acquire information such as the processing capability of the MFP 1, and the processing capability acquisition unit 56 can then obtain the information held by the MFP 1 indirectly, by acquiring the information that program obtained.

(Functions of the voice recognition server device)
FIG. 7 is a functional block diagram of the voice recognition server device 3 provided in the voice operation system. By executing the operation voice conversion program stored in the HDD 34, the CPU 31 of the voice recognition server device 3 functions as an acquisition unit 61, a text conversion unit 62, and a communication control unit 63, as shown in FIG. 7. The acquisition unit 61 acquires the voice data input by the user and transmitted from the mobile terminal device 2. The text conversion unit 62 is an example of the voice recognition unit and converts the voice data input by the user into text data. The communication control unit 63 controls the communication unit 36 to receive the voice data input by the user, transmit text data to the mobile terminal device 2, and so on.

In this example, the acquisition unit 61 through the communication control unit 63 are implemented by software, but some or all of them may be implemented by hardware such as an IC (Integrated Circuit). The functions of the acquisition unit 61 through the communication control unit 63 may be realized by the operation voice conversion program alone, or part of the processing may be executed by another program or performed indirectly using another program.

(Functions of the AI assistant server device)
FIG. 8 is a functional block diagram of the AI assistant server device 4 provided in the voice operation system. By executing the operation interpretation program stored in the HDD 44, the CPU 41 of the AI assistant server device 4 functions as an acquisition unit 71, an interpretation unit 72, and a communication control unit 73, as shown in FIG. 8. The acquisition unit 71 acquires the text data of the user's voice input, transmitted from the mobile terminal device 2. The interpretation unit 72 interprets the user's operation instruction based on the text data. The communication control unit 73 controls the communication unit 46 to transmit interpretation results to the user's mobile terminal device 2, receive the text data of the user's voice input, and so on.

In this example, the acquisition unit 71 through the communication control unit 73 are implemented by software, but some or all of them may be implemented by hardware such as an IC (Integrated Circuit). The functions of the acquisition unit 71 through the communication control unit 73 may be realized by the operation interpretation program alone, or part of the processing may be executed by another program or performed indirectly using another program.

The operation voice processing program, the operation voice conversion program, and the operation interpretation program may be provided as files in an installable or executable format recorded on a computer-readable recording medium such as a CD-ROM or a flexible disk (FD). They may also be provided recorded on a computer-readable recording medium such as a CD-R, a DVD (Digital Versatile Disk), a Blu-ray Disc (registered trademark), or a semiconductor memory. Alternatively, they may be provided for installation via a network such as the Internet, or preinstalled in a ROM or the like in the device.

(Overall voice input operation)
Next, the overall voice input operation in the voice operation system of the embodiment is described. FIG. 9 is a sequence diagram showing the overall flow of voice operation in the voice operation system. In the example of FIG. 9, the double-sided copy function of the MFP 1 is operated by voice input via the mobile terminal device 2. The user starts the operation voice processing program on the mobile terminal device 2 and says, for example, "copy on both sides." The user's voice is collected by the microphone unit 29 of the mobile terminal device 2 and acquired by the acquisition unit 51 (step S1). The communication control unit 52 of the mobile terminal device 2 controls the communication unit 25 to transmit the voice data "copy on both sides" to the voice recognition server device 3 together with a text conversion request (step S2).

The text conversion unit 62 of the voice recognition server device 3 converts the voice data "copy on both sides" into text data. The communication control unit 63 then controls the communication unit 36 to transmit the converted text data to the mobile terminal device 2 (step S3). The communication control unit 52 of the mobile terminal device 2 transmits the text data "copy on both sides" to the AI assistant server device 4 (step S4).
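Seen from the mobile terminal device 2, steps S2 through S6 amount to a two-hop relay. A minimal sketch follows, assuming the two servers can be modelled as plain callables; in the actual system they are separate devices reached over the network 5, and `relay_utterance` and the stub lambdas are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical stand-ins: server 3 turns audio into text, server 4 turns text into a result.
SpeechToText = Callable[[bytes], str]
Interpreter = Callable[[str], Dict[str, object]]


def relay_utterance(audio: bytes, speech_server: SpeechToText,
                    ai_assistant: Interpreter) -> Dict[str, object]:
    """Mobile-terminal side of steps S2-S6: audio out, text back, text out, result back."""
    text = speech_server(audio)   # S2-S3: text conversion request and reply
    return ai_assistant(text)     # S4-S6: interpretation request and reply


if __name__ == "__main__":
    # Stubs: treat the "audio" bytes as an already-transcribed utterance.
    result = relay_utterance(
        b"copy on both sides",
        speech_server=lambda audio: audio.decode("utf-8"),
        ai_assistant=lambda text: {"action": "COPY_EXECUTE", "text": text},
    )
    print(result)
```

Injecting the servers as parameters keeps the relay logic testable and reflects the text's point that servers 3 and 4 may be merged or split without changing the terminal's role.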

In this example, the interpretation unit 72 of the AI assistant server device 4 interprets the action requested of the MFP 1 as "copy" (Action: COPY_EXECUTE) and the print side as double-sided (print side = double-sided) (step S5). In this manner, the interpretation unit 72 generates, from the text data, an interpretation result indicating the type (action) and content (parameters) of the job specified by the user. The communication control unit 73 of the AI assistant server device 4 transmits this interpretation result to the mobile terminal device 2 via the communication unit 46 (step S6).
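The shape of an interpretation result (one action plus a set of parameters) can be illustrated with a toy keyword matcher. This is not the AI assistant's method, which relies on registered intent and entity information; the keyword tables, the parameter name `print_side`, and the values `duplex`/`simplex` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InterpretationResult:
    action: str                                               # job type, e.g. "COPY_EXECUTE"
    parameters: Dict[str, str] = field(default_factory=dict)  # job setting values


# Hypothetical keyword tables standing in for registered action and entity information.
ACTION_KEYWORDS = {"copy": "COPY_EXECUTE", "scan": "SCAN_EXECUTE",
                   "print": "PRINT_EXECUTE", "fax": "FAX_EXECUTE"}
PARAMETER_KEYWORDS = {"both sides": ("print_side", "duplex"),
                      "one side": ("print_side", "simplex")}


def interpret(text: str) -> Optional[InterpretationResult]:
    """Pick the first matching action keyword and collect any matching parameters."""
    lowered = text.lower()
    action = next((a for k, a in ACTION_KEYWORDS.items() if k in lowered), None)
    if action is None:
        return None  # no recognizable job type in the utterance
    params = {name: value
              for k, (name, value) in PARAMETER_KEYWORDS.items() if k in lowered}
    return InterpretationResult(action, params)
```

Plain substring matching is used only to keep the example small; it ignores word order and nuance, which is exactly what the intent information described later is meant to handle.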

The interpretation result conversion unit 53 of the mobile terminal device 2 converts the interpretation result received from the AI assistant server device 4 into a job command for the MFP 1 (step S7). Table 1 shows examples of interpretation results and the job commands converted from them. To perform this conversion, the interpretation result conversion unit 53 may store information corresponding to Table 1 in a storage unit (the ROM 23) of the mobile terminal device 2 and refer to it.

In Table 1, "COPY_EXECUTE", "SCAN_EXECUTE", "PRINT_EXECUTE", and "FAX_EXECUTE" are shown as examples of actions, and "print side", "number of copies", and the like are shown as examples of parameters. The parameters include every parameter that can be specified as a job setting value.

The interpretation result conversion unit 53 of the mobile terminal device 2 converts the interpretation result "COPY_EXECUTE" into the MFP 1 job command "execute copy job". Likewise, it converts "SCAN_EXECUTE" into "execute scan job", "PRINT_EXECUTE" into "execute print job", and "FAX_EXECUTE" into "execute FAX job".

When the interpretation result includes a "print side" parameter, the interpretation result conversion unit 53 of the mobile terminal device 2 forms a job command for the MFP 1 to "change the print side setting". Likewise, when the interpretation result includes a "number of copies" parameter, it forms a job command for the MFP 1 to "change the number-of-copies setting".

In other words, the interpretation result conversion unit 53 of the mobile terminal device 2 determines the type of job for the MFP 1 to execute from the information in the "Action" of the interpretation result, treats the values in "Parameter" as job setting values, and converts the interpretation result into a job command accordingly.
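The Table 1 style conversion can be sketched as a lookup: the action selects the job type and each parameter becomes a setting change. The action names come from the text; the job command strings and the dictionary layout of the output are hypothetical, since the patent does not specify the MFP's command format.

```python
from typing import Dict, List

# Action names follow Table 1; the command strings on the right are hypothetical.
ACTION_TO_JOB = {
    "COPY_EXECUTE": "execute_copy_job",
    "SCAN_EXECUTE": "execute_scan_job",
    "PRINT_EXECUTE": "execute_print_job",
    "FAX_EXECUTE": "execute_fax_job",
}


def to_job_command(action: str, parameters: Dict[str, str]) -> Dict[str, object]:
    """Map the Action to a job type and each Parameter to a setting-change entry."""
    if action not in ACTION_TO_JOB:
        raise ValueError(f"unsupported action: {action}")
    settings: List[Dict[str, str]] = [
        {"change_setting": name, "value": value} for name, value in parameters.items()
    ]
    return {"job": ACTION_TO_JOB[action], "settings": settings}
```

Keeping the mapping as data mirrors the text's suggestion of storing Table 1 in the ROM 23 and consulting it, so supporting a new action means adding a table row rather than new branching code.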

The communication control unit 52 of the mobile terminal device 2 controls the communication unit 25 to transmit the job command formed in this way to the MFP 1 (step S8). In this example, the job command "execute copy job (print side = double-sided)" is transmitted to the MFP 1, and the MFP 1 performs double-sided printing.

(Details of the interpretation operation in the AI assistant server device)
The AI storage unit 40 in the HDD 44 of the AI assistant server device 4 stores AI assistant service information. This information is used to interpret the job that the user instructs by voice input, and it consists of entity information, action information, and intent information:
- Entity information associates job parameters with natural language. Multiple synonyms can be registered for a single parameter.
- Action information indicates the type of job.
- Intent information associates the user's uttered phrases (natural language) with entity information, and also associates them with action information.

Because of the intent information, parameters are interpreted correctly even when the order or nuance of the utterance varies somewhat. The intent information also makes it possible to generate response text (the interpretation result) from the input content.

FIG. 10 shows an example of the entity information used to interpret the user's input speech, specifically the entity information for print color (PrintColor). In FIG. 10, "PrintColor" is the entity name. The entries in the left column, such as "auto_color", "monochrome", and "color", are parameter names. The entries in the right column, such as "auto_color", "monochrome, black and white", and "color, full color", are their synonyms.

As FIG. 10 shows, entity information stores parameters together with their synonyms. Registering synonyms alongside parameters means the same parameter can be set from different wordings. For example, when instructing a monochrome copy, the user can say either "Please copy by black and white" or "Please copy by monochrome".
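
The synonym lookup can be sketched in Python as follows. The sketch assumes the FIG. 10 entity is held as a dict from parameter name to synonym list and uses plain substring matching. The table layout and `resolve_entity` are hypothetical simplifications of what the AI assistant server device 4 does.

```python
# Hypothetical in-memory form of the FIG. 10 entity: parameter -> synonyms.
PRINT_COLOR_ENTITY = {
    "auto_color": ["auto_color"],
    "monochrome": ["monochrome", "black and white"],
    "color": ["color", "full color"],
}


def resolve_entity(utterance: str, entity: dict):
    """Return (parameter, matched synonym) for the longest synonym found, or None."""
    text = utterance.lower()
    best = None
    for parameter, synonyms in entity.items():
        for synonym in synonyms:
            # Prefer the longest match so "auto_color" wins over its substring "color".
            if synonym in text and (best is None or len(synonym) > len(best[1])):
                best = (parameter, synonym)
    return best


print(resolve_entity("Please copy by black and white", PRINT_COLOR_ENTITY))
# ('monochrome', 'black and white')
```

Naive substring matching would also fire on words such as "colorful". A production matcher would need word boundaries or the intent information described above.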

FIG. 11 shows entity information registered on the basis of uttered phrases. FIG. 11(a) shows an example of a user's uttered phrase, FIG. 11(b) shows an action name, and FIG. 11(c) shows entity information. As shown in FIGS. 11(a) to 11(c), the user's utterance is dragged by operating the operation unit 49 on a screen displayed on the display unit 48 of the AI assistant server device 4. Alternatively, the utterance can be dragged on another device that has accessed the AI assistant server device 4 via the network, by operating that device's operation unit on a screen shown on its display unit.

This makes it possible to select the entity information to be associated. Setting a value (VALUE) for the selected entity information changes the parameter returned in the response. For example, suppose the user says "Please copy by black and white":
- With the value "$printColor", the return value is "printColor=monochrome".
- With the value "$printColor.original", the return value is "printColor=black and white".

In other words, "$printColor.original" returns the user's own wording unchanged as the response parameter.
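
A small Python sketch illustrates the difference between the two VALUE settings. The `render_value` helper and the (parameter, original wording) tuple are hypothetical. The sketch only reproduces the two return values given in the example above.

```python
def render_value(template: str, entity: str, match: tuple) -> str:
    """Fill a VALUE template such as "$printColor" or "$printColor.original".

    `match` is (canonical parameter, words the user actually said).
    """
    parameter, original = match
    if template == f"${entity}":
        return f"{entity}={parameter}"   # canonical parameter name
    if template == f"${entity}.original":
        return f"{entity}={original}"    # user's wording, unchanged
    raise ValueError(f"unknown VALUE template: {template}")


match = ("monochrome", "black and white")
print(render_value("$printColor", "printColor", match))           # printColor=monochrome
print(render_value("$printColor.original", "printColor", match))  # printColor=black and white
```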

(Interactive operation)
The voice operation system of the embodiment is interactive: it responds to whatever the user inputs. Besides the fixed phrases needed for any dialogue, the system gives two kinds of responses specific to operating the MFP 1, "input insufficient feedback" and "input confirmation feedback". These two responses are what make interactive MFP operation possible.

"Input insufficient feedback" is a response output when the information required to execute a job is incomplete. This happens either when the user's input could not be recognized or when a required parameter is missing. Parameters that are not required do not trigger input insufficient feedback, even if the user has not specified them. Beyond parameters, this feedback may also include confirming which function to use, such as the copy function or the scan function.

The functions and parameters the user is asked to confirm may also vary with the type of external device to which the mobile terminal device 2 is connected. The processing flow in that case is as follows:
- At a predetermined timing after communication with the external device is established, the processing capability acquisition unit 56 acquires information indicating the device's type and functions.
- Based on that information, a unit such as the feedback unit 55 decides which functions and parameters to ask the user to confirm.

For example, when the external device is the MFP 1, the user can be asked about the MFP 1's functions, such as copy, print, scan, and FAX. The question about which function to use may also be limited to those of copy, print, scan, and FAX that the MFP 1 actually has.
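
The following Python sketch shows how the question might be narrowed using the acquired capability information. The capability set, the function names, and the prompt wording are assumptions for illustration. They are not the format actually returned by the processing capability acquisition unit 56.

```python
# Functions the voice UI knows how to ask about, in prompt order (hypothetical).
KNOWN_FUNCTIONS = ["copy", "print", "scan", "fax"]


def build_function_prompt(device_functions: set) -> tuple:
    """Limit the "which function?" question to what the connected device supports."""
    available = [f for f in KNOWN_FUNCTIONS if f in device_functions]
    if not available:
        return available, "The connected device offers no supported functions."
    return available, "Which function do you want to use: " + ", ".join(available) + "?"


# e.g. capability info reported by a device that can only copy and scan
print(build_function_prompt({"scan", "copy"}))
# (['copy', 'scan'], 'Which function do you want to use: copy, scan?')
```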

"Input confirmation feedback" is a response output once all the information needed to execute the job has been gathered, that is, once every required parameter has been specified. It prompts the user to choose between executing the job with the current setting values and changing them. To support this choice, the system can read out by voice every parameter the user specified, whether required or not.
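
The rule that separates the two feedback types reduces to a set difference over the required parameters. The minimal Python sketch below assumes parameters arrive as a dict. As in the FIG. 12 example, it also assumes the staple position is the only required parameter for copying. The table and the return values are hypothetical.

```python
# Hypothetical required-parameter table, as stored on the AI assistant server.
REQUIRED_PARAMETERS = {"copy": {"staple_position"}}


def choose_feedback(function, parameters: dict, required=REQUIRED_PARAMETERS):
    """Return which of the two MFP-specific responses the system should give."""
    if function is None:
        # The function itself (copy, scan, ...) has not been chosen yet.
        return "input_insufficient", ["function"]
    missing = sorted(required.get(function, set()) - set(parameters))
    if missing:
        return "input_insufficient", missing
    # All required parameters are present: read back everything the user set,
    # required or not, and ask whether to run the job or change settings.
    return "input_confirmation", dict(parameters)


print(choose_feedback("copy", {"color_mode": "color"}))
# ('input_insufficient', ['staple_position'])
```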

FIG. 12 shows the flow of interactive input operations, including each of these types of feedback, between the system and the user. In the FIG. 12 example, the MFP 1 is operated to make double-sided (top-and-bottom opening) color copies stapled at two places at the top. The staple position is the required parameter in this example. Required parameters are not limited to the staple position, however. They may include several parameters, such as monochrome/color or paper size.

FIG. 13 shows an example of the screen display when the processing of FIG. 12 is executed. The mobile terminal device 2 displays two things on the screen of the touch panel 27: what the user said (the recognition result) and what the AI assistant server device 4 fed back (the operation information). In FIG. 13:
- Speech balloons on the right side of the touch panel 27 screen show what the user said to the mobile terminal device 2.
- Balloons on the left side show comments or images (stamps) fed back by the AI assistant server device 4 in response to the user's utterances.

When the mobile terminal device 2 receives feedback from the AI assistant server device 4, it presents that feedback both by voice output and on the touch panel 27 screen. The voice output may be omitted.

Which parameters are required can be stored in advance in a storage unit of the AI assistant server device 4. The user may also change which parameters are required, either by operating the operation unit 49 or by accessing the AI assistant server device 4 via the network.

FIG. 12 uses three visual styles:
- Hatched utterances are the user's.
- Unhatched utterances are the system's.
- Shaded exchanges are shown on the screen of the mobile terminal device 2 or spoken by the system.

The dialogue begins when the system outputs the voice message "Do you want to copy? Do you want to scan?". The user replies "copy" to select the copy function. The system then asks for the copy setting values by having the mobile terminal device 2 output the voice message "Please input the setting values."

Suppose the user then says "color, double-sided, top-and-bottom opening, staple at two places at the top". As noted above, the staple position is the required parameter in this example, so the system judges the input to be sufficient. It displays an image (stamp) of the finished result on the mobile terminal device 2 and prompts the user to instruct the start of copying. This is the "input confirmation feedback" output once all the information needed to execute the job is available.

When the user replies "OK", the system responds "Executing the job" and executes the job the user instructed.

(Interactive operation flow)
FIGS. 14 and 15 are sequence diagrams showing the flow of this interactive operation. FIG. 14 covers the first half and FIG. 15 covers the second half.

First, the user launches the operation voice processing program on the mobile terminal device 2 (step S11). The feedback unit 55 then gives the feedback "Do you want to copy? Do you want to scan?" by voice and on screen (step S12).

The mobile terminal device 2 displays the comment "Do you want to copy? Do you want to scan?" on the touch panel 27 screen alongside the voice feedback of step S12. This comment is text data stored in advance, for example in the ROM 23 of the mobile terminal device 2.

When the user says "copy" (step S13), the communication control unit 52 of the mobile terminal device 2 sends the voice data "copy" to the voice recognition server device 3 with a text conversion request (step S14). The text conversion unit 62 of the voice recognition server device 3 converts the voice data into text data and sends it back to the mobile terminal device 2 (step S15).

The mobile terminal device 2 displays the comment "copy" on the touch panel 27 screen as soon as it receives the text data from the voice recognition server device 3 in step S15. It may also echo "copy" by voice at this point, or omit the voice echo.

The acquisition unit 51 of the mobile terminal device 2 acquires this text data, and the communication control unit 52 sends it to the AI assistant server device 4 (step S16). The interpretation unit 72 of the AI assistant server device 4 interprets the action and parameters from the user's uttered phrase in the text data, as described with reference to FIGS. 10 and 11. Because the user said only "copy", the staple position and other settings are still unknown (insufficient input).

The interpretation unit 72 therefore forms an interpretation result that consists of the action "Copy_Parameter_Setting" plus the response (Response) "Please input the setting values" (step S17). The communication control unit 73 of the AI assistant server device 4 sends this interpretation result to the mobile terminal device 2 (step S18). The feedback unit 55 of the mobile terminal device 2 outputs "Please input the setting values" as voice through the speaker unit 28 and also shows it as text on the touch panel 27 (step S19: input insufficient feedback).

In response to the input insufficient feedback, the user says, for example, "color, double-sided, top-and-bottom opening, staple at two places at the top" (step S20). The communication control unit 52 of the mobile terminal device 2 sends this voice data to the voice recognition server device 3 with a text conversion request (step S21). The text conversion unit 62 of the voice recognition server device 3 converts the voice data into text data and sends it to the mobile terminal device 2 (step S22).

The acquisition unit 51 of the mobile terminal device 2 acquires this text data, and the communication control unit 52 sends it to the AI assistant server device 4 (step S23). The interpretation unit 72 of the AI assistant server device 4 interprets the action and parameters from the user's uttered phrase in the text data (step S24). The user has now said both "copy" and "color, double-sided, top-and-bottom opening, staple at two places at the top", so the copy job no longer lacks any required parameter.

The interpretation unit 72 therefore forms an interpretation result that consists of the action "Copy_Confirm" plus the following parameters:
- "color/monochrome = color"
- "print side = double-sided"
- "opening direction = top-and-bottom"
- "staple position = two places at the top"

The communication control unit 73 of the AI assistant server device 4 sends this interpretation result to the mobile terminal device 2 (step S25).
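
Across steps S16 to S25, the server in effect accumulates parameters over two utterances. The simplified Python sketch below assumes a session dict kept per conversation and exact comma-separated phrases. The real interpretation unit 72 matches phrases through the entity and intent information, not through this hypothetical lookup table.

```python
# Hypothetical phrase table: spoken phrase -> (parameter, value).
PHRASES = {
    "color": ("color_mode", "color"),
    "monochrome": ("color_mode", "monochrome"),
    "double-sided": ("print_side", "double_sided"),
    "top-and-bottom opening": ("opening_direction", "top_bottom"),
    "staple at two places at the top": ("staple_position", "top_two"),
}
REQUIRED_FOR_COPY = {"staple_position"}


def interpret_copy_turn(session: dict, text: str) -> dict:
    """Merge one utterance into the session and build the interpretation result."""
    for phrase in (p.strip().lower() for p in text.split(",")):
        if phrase in PHRASES:
            key, value = PHRASES[phrase]
            session[key] = value
    if REQUIRED_FOR_COPY - set(session):
        # Still missing a required parameter: ask for setting values (step S17).
        return {"action": "Copy_Parameter_Setting",
                "response": "Please input the setting values"}
    # Everything required is known: confirm with all parameters (step S24).
    return {"action": "Copy_Confirm", "parameters": dict(session)}


session = {}
print(interpret_copy_turn(session, "copy")["action"])  # Copy_Parameter_Setting
result = interpret_copy_turn(
    session, "color, double-sided, top-and-bottom opening, staple at two places at the top")
print(result["action"])  # Copy_Confirm
```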

The required parameters are now complete and copying is ready to start. The feedback unit 55 of the mobile terminal device 2 therefore does not display the fed-back analysis result as a comment. Instead, it generates an image (stamp) of the finished result from the Response in the interpretation result, as shown in FIG. 13 (step S27).

If required parameters are still found to be missing, further comments and voice output can prompt the user to set them.

When the "Action" of the fed-back analysis result is "Copy_confirm", the feedback unit 55 of the mobile terminal device 2 refers to the "Parameter". It searches for an image (stamp) of a finished result that matches the "Parameter" values and displays the image found on the touch panel 27.

When "Parameter" contains multiple setting values, the search looks for an image (stamp) that satisfies all of them. For this purpose, the ROM 23 of the mobile terminal device 2 stores images (stamps) in association with setting values, for example.

FIG. 16 shows an example of how stamps are stored. As shown there, the images (stamps) can be stored as table data. If no image (stamp) satisfies all the setting values, the closest image (stamp) may be displayed instead.
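
Below is a minimal Python sketch of this lookup. The table rows and stamp IDs are hypothetical stand-ins for FIG. 16. "Closest" is assumed to mean the stamp that reproduces the most requested setting values, because the text does not define the measure.

```python
# Hypothetical stamp table in the spirit of FIG. 16: each row pairs a stamp
# with the setting values it depicts.
STAMP_TABLE = [
    ("stamp_duplex_color_staple", {"color_mode": "color", "print_side": "double_sided",
                                   "opening_direction": "top_bottom",
                                   "staple_position": "top_two"}),
    ("stamp_duplex_mono", {"color_mode": "monochrome", "print_side": "double_sided"}),
    ("stamp_simplex_color", {"color_mode": "color", "print_side": "single_sided"}),
]


def find_stamp(parameters: dict, table=STAMP_TABLE) -> str:
    """Pick a stamp whose settings satisfy every Parameter value, else the closest one."""
    def score(settings):
        # Number of requested setting values this stamp reproduces.
        return sum(settings.get(k) == v for k, v in parameters.items())

    for stamp, settings in table:
        if score(settings) == len(parameters):  # all setting values satisfied
            return stamp
    # No exact match: fall back to the stamp sharing the most setting values
    # (max keeps the first row on ties).
    return max(table, key=lambda row: score(row[1]))[0]


print(find_stamp({"color_mode": "monochrome", "print_side": "double_sided"}))
# stamp_duplex_mono
```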

The table data shown in FIG. 16 need not reside in the mobile terminal device 2. It may instead be stored in any external device the mobile terminal device 2 can access, such as the MFP 1 it is communicating with or a server connected via the network 5. In that case, the operation voice processing program accesses the MFP 1 or the server and sends the setting values from the analysis result. The MFP 1 or the server then returns an image (stamp) that satisfies those setting values.

The description above covers the case where the mobile terminal device 2 searches for the image (stamp) based on feedback from the AI assistant server device 4. The search may instead be performed by the AI assistant server device 4 itself. In that case, the AI assistant server device 4 sends the mobile terminal device 2 the analysis result "Action: Copy_confirm", "Parameter: color/monochrome = color, print side = double-sided, opening direction = top-and-bottom, staple = two places at the top". It also sends an image (stamp) of the finished result, which the mobile terminal device 2 displays on the touch panel 27.

Here, the AI assistant server device 4 searches for an image (stamp) of a finished result that matches the "Parameter" values in the analysis result. It can search for and obtain the image (stamp) by querying its own HDD 44 or a server it can access.

FIG. 17 shows a variation of the stamp. Suppose the analysis result fed back from the AI assistant server device 4 is "Action: Copy_confirm", "Parameter: print side = double-sided, copies = 2". The image (stamp) of the finished result can then be displayed as in FIG. 17, with the number "2" (the number of copies) shown alongside the finished image.

Along with the image (stamp) of the finished result, the mobile terminal device 2 may give voice feedback such as "Making 2 double-sided copies. Is that OK?", or it may skip the voice feedback. Likewise, it may display that comment next to the finished image or omit the comment.

The image (stamp) of the finished result can be displayed so that the user can select it on the touch panel 27 of the mobile terminal device 2. For example, the mobile terminal device 2 can store in its ROM 23 the comments and images displayed during past job executions. Those comments and images are then shown when the operation voice processing program starts, as in FIG. 13.

If the user touches the touch panel 27 of the mobile terminal device 2 to select an image fed back during a past job, the mobile terminal device 2 (operation voice processing program) can apply that image's setting values to the current job.

When an image is selected, the mobile terminal device 2 (operation voice processing program) redisplays it as shown in FIG. 13 (like the display at 17:00). It then instructs the MFP 1 to execute the job, using the setting values linked to that image as the "Parameter". The processing is the same as on receiving the interpretation result "Action: Copy_execute", "Parameter: color/monochrome = color, print side = double-sided, opening direction = top-and-bottom, staple = two places at the top" from the AI assistant server device 4. In this case, the job runs without involving either the voice recognition server device 3 or the AI assistant server device 4.

To make this possible, the mobile terminal device 2 stores in its ROM 23 the image (stamp) of the finished result linked to its setting values, that is, the "Parameter" values fed back from the AI assistant server device 4.

The mobile terminal device 2 (operation voice processing program) stores the image (stamp) of the finished result so that it can be reused in later jobs. To do so, the program issues the instruction to save the image (stamp) displayed on the touch panel 27.

For example, the user may keep touching the image (stamp) of the finished result for a predetermined time (a long press). The mobile terminal device 2 (operation voice processing program) then displays a screen asking whether to save the image. If the user chooses to save it, the program stores the image (stamp) in the ROM 23 of the mobile terminal device 2, linked to its setting values.

Images stored this way can be recalled whenever the user asks for them. For example, touching the icon I1 at the lower left of FIG. 13 makes the mobile terminal device 2 (operation voice processing program) display a list of previously stored images. When the user picks an image from the list, the program displays it on the touch panel 27 as shown in FIG. 13 (like the display at 17:00). The setting values linked to that image are then applied to the current job.
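
The save-and-reuse behavior amounts to keeping each stamp keyed together with its setting values. The minimal Python sketch below assumes an in-memory dict stands in for the ROM 23 storage and that stamps are identified by a string ID. `StampStore` and its methods are hypothetical.

```python
class StampStore:
    """Hypothetical local store linking saved stamps to their setting values."""

    def __init__(self):
        self._saved = {}  # stamp id -> setting values ("Parameter")

    def save(self, stamp_id: str, parameters: dict) -> None:
        # Called after the user long-presses a stamp and confirms saving it.
        self._saved[stamp_id] = dict(parameters)

    def list_stamps(self) -> list:
        # What the list opened from icon I1 would show.
        return list(self._saved)

    def reuse(self, stamp_id: str) -> dict:
        """Build the same result as a server-sent Copy_execute, with no server round trip."""
        return {"action": "Copy_execute", "parameters": dict(self._saved[stamp_id])}


store = StampStore()
store.save("stamp_duplex_color_staple",
           {"color_mode": "color", "print_side": "double_sided",
            "opening_direction": "top_bottom", "staple_position": "top_two"})
print(store.list_stamps())  # ['stamp_duplex_color_staple']
print(store.reuse("stamp_duplex_color_staple")["action"])  # Copy_execute
```

`reuse` returns the same shape as an interpretation result. The existing conversion to a job command can therefore handle it unchanged, which is what allows the job to run without the voice recognition and AI assistant servers.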

Returning to FIG. 14, the mobile terminal device 2 next gives the input confirmation feedback described above, for example by displaying the image (stamp) of the finished result (step S27). The user responds by voice, instructing either a change of setting values or the start of copying.

Which of the parameters are required can be stored in advance in the storage unit of the AI assistant server device 4. The interpretation unit 72 then uses this stored information to check whether the parameters obtained from the mobile terminal device 2 cover every required parameter. If any required parameter is unset, it can prompt the user, via the mobile terminal device 2, to set it.

In this way, the operation voice processing program displays comments on the touch panel 27 screen of the mobile terminal device 2 from three sources:
- text data stored in advance in the mobile terminal device 2,
- text data received from the voice recognition server device 3, and
- the Response received from the AI assistant server device 4.

Steps S35 to S42 in FIG. 15 show the flow when the user changes setting values by voice. When the user speaks a change to a setting value (step S35), the text conversion unit 62 of the voice recognition server device 3 generates text data for the changed value. This text data is then sent to the AI assistant server device 4 via the mobile terminal device 2 (steps S36 to S38). From the user's uttered phrase in the received text data, the AI assistant server device 4 generates an interpretation result indicating the changed setting value (step S39) and sends it to the mobile terminal device 2 (step S40).

The feedback unit 55 of the mobile terminal device 2 generates feedback text from the Response in the interpretation result (step S41). It then gives the input confirmation feedback described above, for example "Copying in monochrome, 2 copies, double-sided. Is that OK?", to confirm whether copying may start with the changed setting values (step S42).
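
Steps S39 to S42 amount to merging the changed values into the current settings and reading the result back. The Python sketch below rests on several assumptions: the setting keys, the spoken labels, and the sentence template are all hypothetical. In the described system, the merge happens on the AI assistant server device 4 while the feedback unit 55 builds the text.

```python
# Hypothetical spoken labels for setting values used in the read-back.
LABELS = {
    ("color_mode", "monochrome"): "monochrome",
    ("color_mode", "color"): "color",
    ("print_side", "double_sided"): "double-sided",
    ("print_side", "single_sided"): "single-sided",
}


def apply_change(current: dict, changes: dict) -> tuple:
    """Merge changed setting values and build the input confirmation text."""
    updated = {**current, **changes}  # changed values override the old ones
    parts = []
    if "color_mode" in updated:
        parts.append(LABELS[("color_mode", updated["color_mode"])])
    if "copies" in updated:
        n = updated["copies"]
        parts.append(f"{n} cop{'y' if n == 1 else 'ies'}")
    if "print_side" in updated:
        parts.append(LABELS[("print_side", updated["print_side"])])
    return updated, "Copying in " + ", ".join(parts) + ". Is that OK?"


settings, text = apply_change({"color_mode": "color", "print_side": "double_sided"},
                              {"color_mode": "monochrome", "copies": 2})
print(text)  # Copying in monochrome, 2 copies, double-sided. Is that OK?
```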

図１５のシーケンス図のステップＳ４３～ステップＳ５０が、コピーの開始を指示した際の各部の動作の流れである。すなわち、上述の入力確認フィードバックにより、ユーザが「はい」と応答すると（ステップＳ４３）、テキスト化され、携帯端末装置２を介してＡＩアシスタントサーバ装置４に送信される（ステップＳ４４～ステップＳ４６）。ＡＩアシスタントサーバ装置４は、受信したテキストデータに基づいてコピー開始指示を認識すると、「Ｃｏｐｙ＿Ｅｘｅｃｕｔｅ」としたアクションに、「印刷面＝両面」及び「部数＝１部」とのパラメータを付加した解釈結果を形成し、これを携帯端末装置２に送信する（ステップＳ４７～ステップＳ４８）。 Steps S43 to S50 in the sequence diagram of FIG. 15 show the operation flow of each unit when the user instructs the start of copying. When the user answers "yes" to the input confirmation feedback (step S43), the answer is converted to text and sent to the AI assistant server device 4 via the mobile terminal device 2 (steps S44 to S46). When the AI assistant server device 4 recognizes a copy start instruction in the received text data, it forms an interpretation result in which the parameters "print side = both sides" and "number of copies = 1" are added to the action "Copy_Execute", and transmits it to the mobile terminal device 2 (steps S47 and S48).

携帯端末装置２の解釈結果変換部５３は、解釈結果をＭＦＰ１のジョブ命令に変換処理し（ステップＳ４９）、ＭＦＰ１に送信する（ステップＳ５０）。これにより、音声入力操作により、ＭＦＰ１をコピー制御することができる。 The interpretation result conversion unit 53 of the mobile terminal device 2 converts the interpretation result into a job command for the MFP 1 (step S49) and transmits the job command to the MFP 1 (step S50). In this way, copying on the MFP 1 can be controlled through voice input operations.
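The conversion in step S49 can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the patent's implementation: the `InterpretationResult` shape, the parameter keys `print_side` and `copies`, and the dictionary used as a "job command" are all hypothetical, since the actual job command format of the MFP 1 is not given in this document.

```python
from dataclasses import dataclass, field


@dataclass
class InterpretationResult:
    """Hypothetical shape of an interpretation result: action, parameters, response."""
    action: str
    parameters: dict = field(default_factory=dict)
    response: str = ""


# Hypothetical mapping from executable actions to MFP job types.
ACTION_TO_JOB = {"Copy_Execute": "copy"}


def to_job_command(result: InterpretationResult) -> dict:
    """Convert an executable interpretation result into a (hypothetical) job command."""
    job_type = ACTION_TO_JOB.get(result.action)
    if job_type is None:
        # Only *_Execute actions become job commands; other actions drive feedback.
        raise ValueError(f"not an executable action: {result.action}")
    params = result.parameters
    return {
        "job": job_type,
        # "both sides" maps to duplex printing; anything else is treated as simplex.
        "duplex": params.get("print_side") == "both sides",
        # Spoken numbers may arrive as strings, so normalise to int (default 1).
        "copies": int(params.get("copies", 1)),
    }


if __name__ == "__main__":
    result = InterpretationResult(
        "Copy_Execute", {"print_side": "both sides", "copies": "1"}
    )
    print(to_job_command(result))
```

Rejecting non-executable actions keeps confirmation and parameter-setting results from ever being sent to the MFP 1 as jobs.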

なお、携帯端末装置２がＭＦＰ１へジョブ命令を送信した後、ＭＦＰ１において連結コピーモードＯＮで、且つ、ステープルモードＯＮの場合に、親機である携帯端末装置２は子機であるＭＦＰ１に対して、フィニッシャなどの周辺機器の接続状況を要求する。ＭＦＰ１に周辺機器が接続されていない場合は、携帯端末装置２のタッチパネル２７に連結コピーができない旨の表示を行う。 After the mobile terminal device 2 transmits the job command to the MFP 1, if both linked copy mode and staple mode are ON at the MFP 1, the mobile terminal device 2, acting as the master device, requests from the MFP 1, acting as the slave device, the connection status of peripheral devices such as a finisher. If no peripheral device is connected to the MFP 1, the touch panel 27 of the mobile terminal device 2 displays a message indicating that linked copying cannot be performed.
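The peripheral check above reduces to a small decision. In the sketch below, the `query_peripherals` callable is a hypothetical stand-in for the master-to-slave status request, and the message wording is illustrative only.

```python
from typing import Callable, Iterable, Optional


def linked_copy_warning(
    linked_copy_on: bool,
    staple_on: bool,
    query_peripherals: Callable[[], Iterable[str]],
) -> Optional[str]:
    """Return a message for the touch panel 27, or None if no warning is needed."""
    if not (linked_copy_on and staple_on):
        # The peripheral status is only requested when both modes are ON.
        return None
    # Hypothetical request to the MFP (slave) for its connected peripherals,
    # e.g. ["finisher"].
    if not list(query_peripherals()):
        return "Linked copy cannot be performed: no peripheral device is connected."
    return None
```

Passing the status request as a callable means the MFP is only queried when both modes are actually ON.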

（ＡＩアシスタントサーバ装置４からフィードバックされる情報の例） (Example of information fed back from AI assistant server device 4)
以下の表２に、ＡＩアシスタントサーバ装置４から携帯端末装置２にフィードバックされる解釈結果の一例を示す。 Table 2 below shows an example of the interpretation results fed back from the AI assistant server device 4 to the mobile terminal device 2.

この表２に示すように、例えばジョブの設定値の入力促すための「Ｃｏｐｙ＿Ｐａｒａｍｅｔｅｒ＿Ｓｅｔｔｉｎｇ」、ジョブの設定値の確認を促すための「Ｃｏｐｙ＿Ｃｏｎｆｉｒｍ」、ジョブの実行開始を伝えるための「Ｃｏｐｙ＿Ｅｘｅｃｕｔｅ」等のアクションが、解釈結果に含められて携帯端末装置２にフィードバックされる。 As shown in Table 2, actions such as "Copy_Parameter_Setting" for prompting input of job setting values, "Copy_Confirm" for prompting confirmation of job setting values, and "Copy_Execute" for notifying the start of job execution are included in the interpretation result and fed back to the mobile terminal device 2.

フィードバック部５５は、解釈結果に含まれるアクション、パラメータ、レスポンスに応じて、ユーザに対するフィードバックを判断することができる。フィードバック部５５は、フィードバックする内容を決定するために、表２に相当する情報を携帯端末装置２の記憶部に記憶し、参照できる構成としても良い。なお、表２では、コピーの場合を例に説明したが、プリント、スキャン、ＦＡＸも表２と同様にアクションとして、ジョブの設定値の入力促すための「Ｐａｒａｍｅｔｅｒ＿Ｓｅｔｔｉｎｇ」、ジョブの設定値の確認を促すための「Ｃｏｎｆｉｒｍ」が用いられても良い。 The feedback unit 55 can determine the feedback to give the user according to the action, parameters, and response included in the interpretation result. To decide what to feed back, the feedback unit 55 may store information corresponding to Table 2 in the storage unit of the mobile terminal device 2 and refer to it. Although Table 2 takes copying as an example, print, scan, and fax jobs may likewise use the actions "Parameter_Setting" for prompting input of job setting values and "Confirm" for prompting confirmation of job setting values.
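One way the feedback unit 55 could consult a stored equivalent of Table 2 is a lookup keyed on the part of the action name after the job prefix. Table 2 itself is not reproduced in this text, so the feedback descriptions below are hypothetical placeholders. Applying "Execute" to every job type is also an assumption; the text only names "Parameter_Setting" and "Confirm" for print, scan, and fax.

```python
# Hypothetical stand-in for Table 2 (its contents are not reproduced here):
# the part of an action name after the job prefix selects the kind of feedback.
FEEDBACK_BY_SUFFIX = {
    "Parameter_Setting": "prompt for job setting values",
    "Confirm": "ask the user to confirm job setting values",
    "Execute": "announce the start of job execution",  # assumed for all job types
}

JOB_PREFIXES = {"Copy", "Print", "Scan", "FAX"}


def feedback_for(action: str) -> str:
    """Look up the feedback kind for an action such as 'Copy_Confirm'."""
    # "Copy_Parameter_Setting".partition("_") -> ("Copy", "_", "Parameter_Setting")
    prefix, _, suffix = action.partition("_")
    if prefix not in JOB_PREFIXES or suffix not in FEEDBACK_BY_SUFFIX:
        raise KeyError(f"unknown action: {action}")
    return FEEDBACK_BY_SUFFIX[suffix]
```

Splitting on the first underscore lets one table serve every job type, which matches the text's remark that print, scan, and fax can reuse the same action kinds.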

また、例えば両面又は片面等の印刷面の設定値、又は、コピー部数等のパラメータが、解釈結果に含められて携帯端末装置２にフィードバックされる。さらに、必須パラメータが不足している場合、不足するパラメータの入力を促すメッセージが、レスポンスとして解釈結果に含められて携帯端末装置２にフィードバックされる。 In addition, parameters such as the print-side setting (double-sided or single-sided) or the number of copies are included in the interpretation result and fed back to the mobile terminal device 2. Furthermore, when a required parameter is missing, a message prompting input of the missing parameter is included in the interpretation result as the response and fed back to the mobile terminal device 2.
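The missing-parameter rule can be sketched as follows. This is a minimal sketch that assumes the number of copies is the only required parameter for a copy job, as in the copy example given later in this description. The table, the prompt wording, and the dictionary shape of the result are hypothetical.

```python
# Hypothetical required-parameter table; the later copy example in this
# description treats the number of copies as the required parameter.
REQUIRED_PARAMETERS = {"Copy": ["copies"]}
PROMPTS = {"copies": "How many copies would you like?"}


def build_interpretation_result(job: str, parameters: dict) -> dict:
    """Return a Parameter_Setting result if a required parameter is missing,
    otherwise a Confirm result that echoes the current settings."""
    missing = [p for p in REQUIRED_PARAMETERS.get(job, []) if p not in parameters]
    if missing:
        return {
            "action": f"{job}_Parameter_Setting",
            "parameters": dict(parameters),
            # The response prompts for the first missing parameter.
            "response": PROMPTS.get(missing[0], f"Please specify {missing[0]}."),
        }
    settings = ", ".join(f"{k}={v}" for k, v in parameters.items())
    return {
        "action": f"{job}_Confirm",
        "parameters": dict(parameters),
        "response": f"{job} with {settings}. Is that OK?",
    }
```

For example, `build_interpretation_result("Copy", {"print_side": "both sides"})` yields a `Copy_Parameter_Setting` result whose response asks for the number of copies.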

このように本実施の形態によれば、設定に基づく仕上がりイメージを示す仕上がり画像を表示することで、直感的に、設定確認のためのフィードバックやジョブ設定を行うことができる。 As described above, according to this embodiment, displaying a finished image that shows how the output will look under the current settings makes it possible to give feedback for setting confirmation and to set up jobs intuitively.

なお、上述の実施の形態の説明では、音声認識サーバ装置３でユーザの発話に対応するテキストデータを生成し、ＡＩアシスタントサーバ装置４でテキストデータに基づいて、ユーザの意図している操作を解釈した。しかし、携帯端末装置２側に、このような音声認識機能及び解釈機能を設け、携帯端末装置２で、ユーザの発話から意図する操作を解釈してもよい。これにより、音声認識サーバ装置３及びＡＩアシスタントサーバ装置４を不要とすることができ、システム構成を簡素化できる。 In the description of the above embodiment, the voice recognition server device 3 generates text data corresponding to the user's utterance, and the AI assistant server device 4 interprets the operation intended by the user based on that text data. However, the mobile terminal device 2 may itself be provided with such voice recognition and interpretation functions and interpret the intended operation from the user's utterance. This eliminates the need for the voice recognition server device 3 and the AI assistant server device 4 and simplifies the system configuration.

（第２の実施の形態） (Second embodiment)
次に、第２の実施の形態について説明する。 Next, a second embodiment will be described.

第２の実施の形態は、携帯端末装置２に代えてスマートスピーカーを適用する点が、第１の実施の形態と異なる。以下、第２の実施の形態の説明では、第１の実施の形態と同一部分の説明については省略し、第１の実施の形態と異なる箇所について説明する。 The second embodiment differs from the first embodiment in that a smart speaker is applied instead of the mobile terminal device 2 . Hereinafter, in the description of the second embodiment, the description of the same portions as those of the first embodiment will be omitted, and the portions different from those of the first embodiment will be described.

図１８は、第２の実施の形態の音声操作システムのシステム構成図である。この図１８に示すように、第２の実施の形態の音声操作システムは、図１で説明した携帯端末装置２に代えてスマートスピーカー５０（情報処理装置の一例）を適用したものである。スマートスピーカーとは、ＡＩスピーカーとも呼ばれ、対話型の音声操作に対応したＡＩアシスタント機能を持つスピーカーである。 FIG. 18 is a system configuration diagram of a voice operation system according to the second embodiment. As shown in FIG. 18, the voice operation system of the second embodiment employs a smart speaker 50 (an example of an information processing device) in place of the mobile terminal device 2 described in FIG. A smart speaker, also called an AI speaker, is a speaker with an AI assistant function that supports interactive voice operations.

音声操作システムは、外部装置の一例であるＭＦＰ１、スマートスピーカー５０（情報処理装置の一例）、クラウドサービス装置６０を、例えばＬＡＮ（Local Area Network）等の所定のネットワーク５を介して相互に接続することで形成されている。ただし、外部装置は複合機には限定されず、電子黒板やプロジェクタなどのオフィス機器を含む、種々の電子機器であっても良い。 The voice operation system is formed by interconnecting an MFP 1 (an example of an external device), a smart speaker 50 (an example of an information processing device), and a cloud service device 60 via a predetermined network 5 such as a LAN (Local Area Network). However, the external device is not limited to an MFP and may be any of various electronic devices, including office equipment such as an electronic whiteboard or a projector.

スマートスピーカー５０は、ＭＦＰ１を音声操作するための、ユーザからの音声入力を受け付ける。スマートスピーカー５０は、ＭＦＰ１に近接して設置される。また、スマートスピーカー５０とＭＦＰ１とは、１対１で対応する。したがって、スマートスピーカー５０は、基本的に、ＭＦＰ１の前で操作しているユーザを対象として機能提供を行う。ただし、これに限定されず、スマートスピーカー５０は複数のＭＦＰ１及び他の電子機器と対応しても良い。 The smart speaker 50 accepts voice input from the user for operating the MFP 1 by voice. The smart speaker 50 is installed close to the MFP 1, and the smart speaker 50 and the MFP 1 correspond one-to-one. The smart speaker 50 therefore basically serves the user who is operating the MFP 1 in front of it. However, the configuration is not limited to this, and the smart speaker 50 may correspond to a plurality of MFPs 1 and other electronic devices.

クラウドサービス装置６０は、物理的に一つのサーバ装置としてもよいし、複数のサーバ装置で実現してもよい。クラウドサービス装置６０は、音声データをテキストデータに変換し、更にユーザの意図を解釈するための操作音声変換プログラムがインストールされている制御装置である。また、クラウドサービス装置６０は、ＭＦＰ１を管理するための管理プログラムがインストールされている制御装置である。したがって、クラウドサービス装置６０は、第１の実施の形態の音声認識サーバ装置３やＡＩアシスタントサーバ装置４と同様の機能を発揮する。 The cloud service device 60 may physically be one server device, or may be realized by a plurality of server devices. The cloud service device 60 is a control device installed with an operation voice conversion program for converting voice data into text data and interpreting user's intentions. Cloud service device 60 is a control device in which a management program for managing MFP 1 is installed. Therefore, the cloud service device 60 exhibits functions similar to those of the speech recognition server device 3 and the AI assistant server device 4 of the first embodiment.

操作音声変換プログラムは、ＭＦＰ１に対する操作用の音声辞書と操作を作成／登録する。管理プログラムは、スマートスピーカー５０やＭＦＰ１のアカウント／デバイスを紐付け、システム全体を管理する。 The operation voice conversion program creates and registers a voice dictionary and operations for operating the MFP 1. The management program links the accounts and devices of the smart speaker 50 and the MFP 1 and manages the entire system.

（スマートスピーカー５０のハードウェア構成） (Hardware configuration of smart speaker 50)
図１９は、音声操作システムに設けられているスマートスピーカー５０のハードウェア構成図である。図１９に示すように、スマートスピーカー５０は、図３で説明した携帯端末装置２と同様に、ＣＰＵ２１、ＲＡＭ２２、ＲＯＭ２３、インタフェース部（Ｉ／Ｆ部）２４及び通信部２５を、バスライン２６を介して相互に接続して形成されている。 FIG. 19 is a hardware configuration diagram of the smart speaker 50 provided in the voice operation system. As shown in FIG. 19, like the mobile terminal device 2 described with reference to FIG. 3, the smart speaker 50 is formed by interconnecting a CPU 21, a RAM 22, a ROM 23, an interface unit (I/F unit) 24, and a communication unit 25 via a bus line 26.

Ｉ／Ｆ部２４には、タッチパネル２７、スピーカ部２８及びマイクロホン部２９が接続されている。マイクロホン部２９は、通話音声の他、ＭＦＰ１に対するジョブの実行命令の入力音声を集音（取得）する。入力音声は、通信部２５を介してクラウドサービス装置６０に送信され、テキストデータに変換される。 A touch panel 27 , a speaker section 28 and a microphone section 29 are connected to the I/F section 24 . The microphone unit 29 collects (obtains) the input voice of a job execution command to the MFP 1 in addition to the call voice. The input speech is transmitted to the cloud service device 60 via the communication unit 25 and converted into text data.

（クラウドサービス装置６０のハードウェア構成） (Hardware configuration of cloud service device 60)
図２０は、音声操作システムに設けられているクラウドサービス装置６０のハードウェア構成図である。なお、図２０においては、クラウドサービス装置６０は、物理的に一つのサーバ装置で構成されているものとする。図２０に示すように、クラウドサービス装置６０は、図４で説明した音声認識サーバ装置３と同様に、ＣＰＵ３１、ＲＡＭ３２、ＲＯＭ３３、ＨＤＤ（Hard Disk Drive）３４、インタフェース部（Ｉ／Ｆ部）３５及び通信部３６を、バスライン３７を介して相互に接続して形成されている。Ｉ／Ｆ部３５には、表示部３８及び操作部３９が接続されている。ＨＤＤ３４には、ＭＦＰ１に対する操作用の音声辞書と操作を作成／登録するための操作音声変換プログラムが記憶されている。また、ＨＤＤ３４には、スマートスピーカー５０やＭＦＰ１のアカウント／デバイスを紐付け、システム全体を管理する管理プログラムが記憶されている。ＣＰＵ３１は、操作音声変換プログラムや管理プログラムを実行することで、携帯端末装置２から送信された音声データに基づいて、ＭＦＰ１を操作可能とする。 FIG. 20 is a hardware configuration diagram of the cloud service device 60 provided in the voice operation system. In FIG. 20, the cloud service device 60 is assumed to consist of a single physical server device. As shown in FIG. 20, like the voice recognition server device 3 described with reference to FIG. 4, the cloud service device 60 is formed by interconnecting a CPU 31, a RAM 32, a ROM 33, an HDD (Hard Disk Drive) 34, an interface unit (I/F unit) 35, and a communication unit 36 via a bus line 37. A display unit 38 and an operation unit 39 are connected to the I/F unit 35. The HDD 34 stores an operation voice conversion program for creating and registering a voice dictionary and operations for operating the MFP 1. The HDD 34 also stores a management program that links the accounts and devices of the smart speaker 50 and the MFP 1 and manages the entire system. By executing the operation voice conversion program and the management program, the CPU 31 enables the MFP 1 to be operated based on the voice data transmitted from the mobile terminal device 2.

（全体の機能構成） (Overall functional configuration)
図２１は、全体の機能の概要説明図である。図２１には、クラウドサービスを提供する主な機能を示している。主な機能の詳細や、図２１に示したスマートスピーカー５０についての機能の説明については、図２２～図２３を参照して後に説明する。 FIG. 21 is a schematic explanatory diagram of the overall functions, showing the main functions that provide the cloud service. Details of the main functions and of the functions of the smart speaker 50 shown in FIG. 21 are described later with reference to FIGS. 22 and 23.

クラウド１００の機能は、１つのクラウドサービス装置６０、あるいは複数のクラウドサービス装置６０により実現される。これらの機能は１つまたは複数のクラウドサービス装置６０に適宜設定されるものであり、１つのクラウドサービス装置６０でもよいし、複数のクラウドサービス装置６０でもよい。 The functions of the cloud 100 are implemented by one or more cloud service devices 60. They may be assigned as appropriate to a single cloud service device 60 or distributed across a plurality of cloud service devices 60.

クラウドサービス装置６０のＣＰＵ３１はＨＤＤ３４の操作音声変換プログラムをＲＡＭ３２に読み出して実行することにより操作音声変換部３１０として機能する。操作音声変換部３１０は、音声データをテキストデータに変換する機能を有する。更に、操作音声変換部３１０は、テキストデータを予め定義された辞書情報と一致するか否かを判断する機能を有する。更に、操作音声変換部３１０は、マッチした場合にはテキストデータをユーザの意図を示すアクションおよびジョブ条件などの変数を示すパラメータに変換する機能を有する。 The CPU 31 of the cloud service device 60 functions as an operation voice conversion unit 310 by reading the operation voice conversion program from the HDD 34 into the RAM 32 and executing it. The operation voice conversion unit 310 has three functions: converting voice data into text data, determining whether the text data matches predefined dictionary information, and, when there is a match, converting the text data into an action indicating the user's intention and parameters indicating variables such as job conditions.
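The dictionary-matching step can be illustrated with the minimal sketch below, which turns already-transcribed text into an action and parameters. Everything in it is hypothetical: the English phrases, the action and parameter names, and the regular expression for copy counts. The actual system matches Japanese utterances against dictionary information held by the voice assistant unit 320.

```python
import re
from typing import Optional, Tuple

# Hypothetical dictionary information; real entries would be far richer.
ACTION_WORDS = {"copy": "Copy", "copies": "Copy", "print": "Print", "scan": "Scan"}
PARAMETER_WORDS = {
    "double-sided": ("print_side", "both sides"),
    "single-sided": ("print_side", "one side"),
    "monochrome": ("color_mode", "monochrome"),
    "color": ("color_mode", "color"),
}
COPIES_PATTERN = re.compile(r"\b(\d+)\s+(?:copy|copies)\b")


def _contains(phrase: str, text: str) -> bool:
    """Whole-word match so that, e.g., 'print' does not match 'blueprint'."""
    return re.search(rf"\b{re.escape(phrase)}\b", text) is not None


def interpret(text: str) -> Optional[Tuple[str, dict]]:
    """Return (action, parameters) if the text matches the dictionary, else None."""
    lowered = text.lower()
    action = next((a for w, a in ACTION_WORDS.items() if _contains(w, lowered)), None)
    if action is None:
        return None  # no match with the dictionary information
    parameters = {
        name: value
        for phrase, (name, value) in PARAMETER_WORDS.items()
        if _contains(phrase, lowered)
    }
    match = COPIES_PATTERN.search(lowered)
    if match:
        parameters["copies"] = int(match.group(1))
    return action, parameters
```

Returning `None` on no match mirrors the text's condition that conversion into an action and parameters happens only "when there is a match".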

また、クラウドサービス装置６０のＣＰＵ３１はＨＤＤ３４の音声アシスタントプログラムをＲＡＭ３２に読み出して実行することにより音声アシスタント部３２０として機能する。音声アシスタント部３２０は、辞書情報を保持する機能を有する。 Further, the CPU 31 of the cloud service device 60 functions as the voice assistant unit 320 by reading the voice assistant program of the HDD 34 to the RAM 32 and executing it. Voice assistant unit 320 has a function of holding dictionary information.

また、クラウドサービス装置６０のＣＰＵ３１はＨＤＤ３４の管理プログラムをＲＡＭ３２に読み出して実行することにより管理部３３０として機能する。管理部３３０は、アクションとパラメータに基づいてＭＦＰ１が解釈可能な形式であるジョブ実行指示に変換した上で登録されたＭＦＰ１へ送信する機能を有する。 In addition, the CPU 31 of the cloud service device 60 functions as a management unit 330 by reading the management program of the HDD 34 to the RAM 32 and executing it. The management unit 330 has a function of converting the action and parameters into a job execution instruction in a format interpretable by the MFP 1 and transmitting the instruction to the registered MFP 1 .

このようにクラウド１００は、少なくとも操作音声変換部３１０、音声アシスタント部３２０、および管理部３３０の機能によりクラウドサービス３００を提供する。 As described above, the cloud 100 provides the cloud service 300 by at least the functions of the operation voice conversion unit 310 , the voice assistant unit 320 and the management unit 330 .

クラウドサービス３００は、ＭＦＰ１や情報処理装置との通信に基づき、各種の情報をＤＢに記憶する。一例として、管理部３３０が、管理ＤＢ３４０や、紐づけ用ＤＢ３５０や、機器情報ＤＢ３６０などを使用して各種情報を管理する。 The cloud service 300 stores various types of information in the DB based on communication with the MFP 1 and information processing devices. As an example, the management unit 330 manages various information using the management DB 340, the linking DB 350, the device information DB 360, and the like.

管理ＤＢ３４０は、テキストデータ、画像データ、音声データなど、クラウドサービス３００が提供するコンテンツにかかるデータを記憶するデータベースである。 The management DB 340 is a database that stores data related to content provided by the cloud service 300, such as text data, image data, and audio data.

紐づけ用ＤＢ３５０は、情報処理装置と紐づける外部装置を記憶するデータベースである。紐づけ用ＤＢ３５０は、本例では、情報処理装置として使用するスマートスピーカー５０のデバイスＩＤと、そのスマートスピーカー５０と対応付ける外部装置（本例ではＭＦＰ１）のＩＤとを対応付けて記憶する。なお、スマートスピーカー５０と外部装置は一対一で紐づけられていても良いが、スマートスピーカー５０と複数の外部装置を紐づけても良い。つまり、デバイスＩＤと紐づく外部装置の種類と個数は限定されない。また、外部装置とスマートスピーカー５０の紐づけの方法についても上記の方法に限定されない。つまり、ユーザアカウントやユーザＩＤなどのユーザを特定する情報と外部装置とを紐づける構成であっても良い。この場合、デバイスＩＤなどのスマートスピーカー５０からクラウドへ送信されるスマートスピーカー５０を特定する情報と、ユーザを特定する情報とをクラウド１００の紐づけ用ＤＢなどに記憶しておき、管理部３３０はデバイスＩＤと紐づくユーザを特定する情報に基づいて外部装置を特定する構成であっても良い。若しくは、スマートスピーカー５０からデバイスＩＤに代えてユーザを特定する情報を送信しても良い。また、ユーザを特定するための情報に代えて、部署や企業などの組織を特定する情報、又は部屋や建物などの場所を特定する情報と、外部装置とを紐づける構成であっても良く、この場合は１以上のスマートスピーカー５０と１以上の外部装置を紐づけても良い。 The linking DB 350 is a database that stores the external devices linked to each information processing device. In this example, the linking DB 350 stores the device ID of the smart speaker 50 used as the information processing device in association with the ID of the external device (the MFP 1 in this example) linked to that smart speaker 50. The smart speaker 50 may be linked one-to-one with an external device, or it may be linked with a plurality of external devices; that is, the type and number of external devices associated with a device ID are not limited. Nor is the method of linking external devices to the smart speaker 50 limited to the one above. For example, information identifying a user, such as a user account or user ID, may be linked to an external device. In this case, information identifying the smart speaker 50 that the smart speaker 50 sends to the cloud, such as its device ID, is stored together with the user-identifying information in the linking DB or the like of the cloud 100, and the management unit 330 may identify the external device based on the user-identifying information associated with the device ID. Alternatively, the smart speaker 50 may transmit the user-identifying information instead of the device ID. Instead of user-identifying information, information identifying an organization such as a department or company, or information identifying a location such as a room or building, may be linked to external devices; in this case, one or more smart speakers 50 may be linked to one or more external devices.
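The two linking schemes described above (device ID → external devices, or device ID → user → external devices) can be sketched with an in-memory store. The class, its field names, and the ID strings are hypothetical. The same user-keyed mapping could equally be keyed on an organization or location ID, as the text allows.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class LinkingDB:
    """Hypothetical in-memory stand-in for the linking DB 350."""
    device_to_external: Dict[str, Set[str]] = field(default_factory=dict)
    device_to_user: Dict[str, str] = field(default_factory=dict)
    # Could equally be keyed on an organization or location ID.
    user_to_external: Dict[str, Set[str]] = field(default_factory=dict)

    def link_device(self, device_id: str, external_id: str) -> None:
        # One speaker may be linked to several external devices.
        self.device_to_external.setdefault(device_id, set()).add(external_id)

    def link_user(self, user_id: str, external_id: str) -> None:
        self.user_to_external.setdefault(user_id, set()).add(external_id)

    def resolve(self, device_id: Optional[str] = None,
                user_id: Optional[str] = None) -> Set[str]:
        """Return the external devices linked directly or via the speaker's user."""
        if device_id in self.device_to_external:
            return set(self.device_to_external[device_id])
        if user_id is None and device_id is not None:
            user_id = self.device_to_user.get(device_id)
        return set(self.user_to_external.get(user_id, set()))
```

`resolve` also covers the variant in which the speaker sends user-identifying information instead of a device ID: call it with `user_id` only.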

機器情報ＤＢ３６０は、ＭＦＰ１を含む各外部装置のＩＤとそれぞれの機器情報とを対応付けて記憶するデータベースである。 The device information DB 360 is a database that associates and stores the ID of each external device including the MFP 1 and the respective device information.

（スマートスピーカー５０の機能） (Functions of smart speaker 50)
図２２は、スマートスピーカー５０の機能ブロックの構成の一例を示す図である。スマートスピーカー５０のＣＰＵ２１は、ＲＯＭ２３に記憶されている操作処理プログラムを実行することで、図２２に示すように取得部２１１、通信制御部２１２、フィードバック部２１３として機能する。 FIG. 22 is a diagram showing an example of the functional block configuration of the smart speaker 50. By executing the operation processing program stored in the ROM 23, the CPU 21 of the smart speaker 50 functions as an acquisition unit 211, a communication control unit 212, and a feedback unit 213, as shown in FIG. 22.

取得部２１１は、マイクロホン部２９（図３参照）を介して集音された、ＭＦＰ１を音声操作するためのユーザの指示音声を、取得する。なお、取得部２１１は、タッチパネル２７（図３参照）や物理スイッチ（不図示）などを介してユーザの操作を取得してもよい。通信制御部２１２は、クラウド１００との間の通信を制御する。通信制御部２１２は、クラウド１００と通信し、取得部２１１が取得した情報をクラウド１００へ送信したり、クラウド１００からテキストデータや画像データ、音声データを取得したりする。また、通信制御部２１２は、取得部２１１が取得した情報をクラウド１００へ送信する場合、スマートスピーカー５０を特定するデバイスＩＤを共に送信してもよい。 Acquisition unit 211 acquires a user's instruction voice for voice operation of MFP 1, which is collected through microphone unit 29 (see FIG. 3). Note that the acquisition unit 211 may acquire the user's operation via the touch panel 27 (see FIG. 3), a physical switch (not shown), or the like. The communication control unit 212 controls communication with the cloud 100 . The communication control unit 212 communicates with the cloud 100 , transmits information acquired by the acquisition unit 211 to the cloud 100 , and acquires text data, image data, and audio data from the cloud 100 . When transmitting the information acquired by the acquisition unit 211 to the cloud 100 , the communication control unit 212 may also transmit a device ID that identifies the smart speaker 50 .

フィードバック部２１３は、対話型の音声入力操作を実現すべく、例えば不足するデータを補う入力を促す音声や、入力を確認する音声などをユーザ側にフィードバックする。また、フィードバック部２１３は、タッチパネル２７のディスプレイ表示を制御することによって、テキストまたは画像としてユーザに対してフィードバックを行ってもよい。 The feedback unit 213 feeds back to the user side, for example, a voice prompting an input to compensate for missing data, a voice confirming the input, and the like, in order to realize an interactive voice input operation. Further, the feedback unit 213 may give feedback to the user in the form of text or images by controlling the display of the touch panel 27 .

なお、この例では、取得部２１１～フィードバック部２１３をソフトウェアで実現することとしたが、これらのうちの一部または全部をＩＣ（Integrated Circuit）等のハードウェアで実現してもよい。また、取得部２１１～フィードバック部２１３の各機能は、操作処理プログラム単体で実現してもよいし、他のプログラムに処理の一部を実行させる、または他のプログラムを用いて間接的に処理を実行させてもよい。 In this example, the acquisition unit 211 through the feedback unit 213 are implemented by software, but some or all of them may be implemented by hardware such as an IC (Integrated Circuit). Each function of the acquisition unit 211 through the feedback unit 213 may be realized by the operation processing program alone, part of the processing may be executed by another program, or the processing may be executed indirectly using another program.

（クラウドサービスの機能の詳細） (Details of cloud service functions)
図２３は、クラウドサービスの各機能の構成の一例を示す図である。操作音声変換部３１０は、図２３に示すように、取得部３１１や、テキスト変換部３１２や、解釈部３１３や、出力部３１４などの機能を含む。取得部３１１は、スマートスピーカー５０から送信される音声データ（ユーザにより入力された音声データ）を取得する。また、取得部３１１は、スマートスピーカー５０のタッチパネル２７や物理スイッチ（ボタンなども含む）などに対してユーザが行った操作を示すデータを取得してもよい。テキスト変換部３１２は、音声データ（スマートスピーカー５０において入力されたユーザの音声データ）をテキストデータに変換するＳＴＴ（Speech To Text）を含む。解釈部３１３は、テキスト変換部３１２により変換されたテキストデータに基づいてユーザの指示の内容を解釈する。具体的に、解釈部３１３は、テキスト変換部３１２により変換されたテキストデータに含まれる単語などが、音声アシスタント部３２０が提供する辞書情報にマッチしているか否かを確認し、マッチしている場合に、ジョブの種類を示すアクションと、ジョブ条件などの変数を示すパラメータとに変換する。そして、解釈部３１３は、例えば音声データの取得元であるスマートスピーカー５０を特定するデバイスＩＤなどと共に、アクションおよびパラメータを管理部３３０に対して送信する。出力部３１４は、テキストデータを音声データに合成するＴＴＳ（Text To Speech）を含む。出力部３１４は、通信部３６（図４参照）を通信制御し、スマートスピーカー５０にテキストデータ、音声データ、画像データなどのデータの送信等を行う。 FIG. 23 is a diagram showing an example of the configuration of each function of the cloud service. As shown in FIG. 23, the operation voice conversion unit 310 includes functions such as an acquisition unit 311, a text conversion unit 312, an interpretation unit 313, and an output unit 314. The acquisition unit 311 acquires audio data transmitted from the smart speaker 50 (audio data input by the user). The acquisition unit 311 may also acquire data indicating operations performed by the user on the touch panel 27 of the smart speaker 50, physical switches (including buttons), and the like. The text conversion unit 312 includes STT (Speech To Text), which converts audio data (the user's speech input at the smart speaker 50) into text data. The interpretation unit 313 interprets the content of the user's instruction based on the text data converted by the text conversion unit 312. Specifically, the interpretation unit 313 checks whether words and the like contained in that text data match the dictionary information provided by the voice assistant unit 320 and, if they match, converts the text data into an action indicating the job type and parameters indicating variables such as job conditions. The interpretation unit 313 then transmits the action and parameters to the management unit 330 together with, for example, a device ID that identifies the smart speaker 50 from which the audio data was acquired. The output unit 314 includes TTS (Text To Speech), which synthesizes speech data from text data. The output unit 314 controls the communication unit 36 (see FIG. 4) to transmit data such as text data, audio data, and image data to the smart speaker 50.
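The flow through the operation voice conversion unit 310 can be pictured as the sketch below: acquired audio goes through STT, the text is interpreted against the dictionary, and the result is tagged with the speaker's device ID for the management unit 330. The STT and interpretation steps are injected as callables because no real speech API is specified here. The function name, the `ToManagementUnit` record, and the fake stand-ins in the usage note are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class ToManagementUnit:
    """What the interpretation unit 313 hands to the management unit 330 (hypothetical)."""
    device_id: str
    action: Optional[str]
    parameters: dict = field(default_factory=dict)


def convert_operation_voice(
    audio: bytes,
    device_id: str,
    stt: Callable[[bytes], str],
    interpret: Callable[[str], Optional[Tuple[str, dict]]],
) -> ToManagementUnit:
    """Acquisition -> STT -> dictionary interpretation, tagged with the device ID."""
    text = stt(audio)            # text conversion unit 312 (STT)
    matched = interpret(text)    # interpretation unit 313 (dictionary matching)
    if matched is None:
        # No dictionary match, so no action; a real system would prompt again.
        return ToManagementUnit(device_id, None)
    action, parameters = matched
    return ToManagementUnit(device_id, action, parameters)
```

In a quick test, `stt` can be `lambda audio: audio.decode("utf-8")` and `interpret` any function returning `(action, parameters)` or `None`.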

なお、この例では、取得部３１１～出力部３１４をソフトウェアで実現することとしたが、これらのうちの一部または全部をＩＣ（Integrated Circuit）等のハードウェアで実現してもよい。また、取得部３１１～出力部３１４が実現する各機能は、操作音声変換プログラム単体で実現してもよいし、他のプログラムに処理の一部を実行させる、または他のプログラムを用いて間接的に処理を実行させてもよい。また、操作音声変換プログラムの解釈部３１３の機能の一部または全てを音声アシスタントプログラムに実行させてもよい。この場合、例えばテキストデータに含まれる単語などが辞書情報にマッチしているか否かの確認、マッチしている場合にユーザの意図を示すアクションとジョブ条件などの変数を示すパラメータへの変換は、音声アシスタント部３２０が行う。解釈部３１３はアクションおよびパラメータを音声アシスタント部３２０から取得するだけでよい。 In this example, the acquisition unit 311 through the output unit 314 are implemented by software, but some or all of them may be implemented by hardware such as an IC (Integrated Circuit). Each function realized by the acquisition unit 311 through the output unit 314 may be realized by the operation voice conversion program alone, part of the processing may be executed by another program, or the processing may be executed indirectly using another program. Some or all of the functions of the interpretation unit 313 of the operation voice conversion program may also be executed by the voice assistant program. In this case, the voice assistant unit 320 performs, for example, the check of whether words contained in the text data match the dictionary information and, when they match, the conversion into an action indicating the user's intention and parameters indicating variables such as job conditions. The interpretation unit 313 then only needs to obtain the action and parameters from the voice assistant unit 320.

音声アシスタント部３２０は、図２３に示すように提供部３２１の機能を含む。提供部３２１は、テキストデータとアクションおよびパラメータの関係を予め定義した辞書情報を管理し、操作音声変換部３１０に辞書情報を提供する。なお、音声アシスタント部３２０は、操作音声変換部３１０からテキストデータを受け付けて、そのテキストデータからユーザの操作指示を解釈してもよい。例えば、音声アシスタント部３２０は、解釈部３１３からテキストデータを取得し、テキストデータに含まれる単語などが辞書情報にマッチしているか否かを確認し、マッチしている場合にテキストデータをアクションとパラメータに変換する。その後、アクションおよびパラメータを解釈部３１３に提供する。 As shown in FIG. 23, the voice assistant unit 320 includes the function of a providing unit 321. The providing unit 321 manages dictionary information that predefines the relationship between text data and actions and parameters, and provides that dictionary information to the operation voice conversion unit 310. The voice assistant unit 320 may also receive text data from the operation voice conversion unit 310 and interpret the user's operation instruction from it. For example, the voice assistant unit 320 acquires text data from the interpretation unit 313, checks whether words contained in the text data match the dictionary information, and, if they match, converts the text data into an action and parameters, which it then provides to the interpretation unit 313.

なお、この例では、音声アシスタント部３２０（提供部３２１を含む）をソフトウェアで実現することとしたが、そのうちの一部または全部をＩＣ（Integrated Circuit）等のハードウェアで実現してもよい。また、提供部３２１などの機能は、音声アシスタントプログラム単体で実現してもよいし、他のプログラムに処理の一部を実行させる、または他のプログラムを用いて間接的に処理を実行させてもよい。 In this example, the voice assistant unit 320 (including the providing unit 321) is implemented by software, but part or all of it may be implemented by hardware such as an IC (Integrated Circuit). Functions such as that of the providing unit 321 may be realized by the voice assistant program alone, part of the processing may be executed by another program, or the processing may be executed indirectly using another program.

管理部３３０は、図２３に示すように、取得部３３１や、解釈結果変換部３３２や、実行指示部３３３や、機器情報取得部３３４や、実行判定部３３５や、通知部３３６や、ＤＢ管理部３３７などの機能を含む。 As shown in FIG. 23, the management unit 330 includes an acquisition unit 331, an interpretation result conversion unit 332, an execution instruction unit 333, a device information acquisition unit 334, an execution determination unit 335, a notification unit 336, and a DB management unit. 337 and other functions.

取得部３３１は、解釈部３１３から解釈結果を取得する。 Acquisition unit 331 acquires an interpretation result from interpretation unit 313 .

解釈結果変換部３３２は、操作音声変換部３１０で変換されたアクションおよびパラメータなどの解釈結果を、ＭＦＰ１が解釈可能なジョブの実行命令に変換する。 The interpretation result conversion unit 332 converts the interpretation result of the action and parameters converted by the operation voice conversion unit 310 into a job execution command that can be interpreted by the MFP 1 .

実行指示部３３３は、ジョブの実行命令をＭＦＰ１に送信することによりジョブの実行を指示する。具体的に、実行指示部３３３は、アクションおよびパラメータと共に、ユーザが音声指示したスマートスピーカー５０のデバイスＩＤを取得する。実行指示部３３３は、取得したデバイスＩＤに対応するＭＦＰ１を紐づけ用ＤＢ３５０（図２１参照）から検索し、検索により得られたＭＦＰ１に対してジョブ実行命令を送信する。 The execution instruction unit 333 instructs execution of a job by transmitting a job execution command to the MFP 1. Specifically, the execution instruction unit 333 acquires, together with the action and parameters, the device ID of the smart speaker 50 to which the user gave the voice instruction. It searches the linking DB 350 (see FIG. 21) for the MFP 1 corresponding to the acquired device ID and transmits the job execution command to the MFP 1 found by the search.

機器情報取得部３３４は、登録されている各外部装置（この例ではＭＦＰ１）から機器情報を取得する。例えば、機器情報取得部３３４は、処理可能な最大画素数等の処理能力を示す情報を取得する。また、機器情報取得部３３４は、ＭＦＰ１との間で、通信接続が確立されているか否かを示す接続状態、ＭＦＰ１の電源のＯＮ／ＯＦＦまたはスリープモードであるかを示す電力状態、エラーの有無とエラーの種類、用紙やトナーなどの消耗品の残余状況、ユーザのログイン状態、ログインユーザに使用が許可された機能を示す権限情報、などを含む機器状態を示す情報も設定に応じて適宜取得する。 The device information acquisition unit 334 acquires device information from each registered external device (the MFP 1 in this example). For example, it acquires information indicating processing capability, such as the maximum number of pixels that can be processed. Depending on the settings, the device information acquisition unit 334 also acquires, as appropriate, information indicating the device state, including: the connection state, indicating whether a communication connection with the MFP 1 has been established; the power state, indicating whether the MFP 1 is on, off, or in sleep mode; whether an error has occurred and, if so, its type; the remaining amount of consumables such as paper and toner; the user's login state; and authority information indicating the functions that the logged-in user is permitted to use.

なお、機器情報取得部３３４は、複数のＭＦＰ１から処理能力などの機器情報を取得した場合、機器情報ＤＢ３６０（図２１参照）において、各外部装置を特定するＩＤなどの情報と紐づけてそれぞれの機器情報を管理する。 When the device information acquisition unit 334 acquires device information such as processing capability from a plurality of MFPs 1, it manages each device's information in the device information DB 360 (see FIG. 21) in association with information, such as an ID, that identifies the corresponding external device.

実行判定部３３５は、ＭＦＰ１の処理能力と、ユーザから指定されたジョブ（即ち、操作音声変換部３１０で生成されたアクションおよびパラメータ）とを比較することで、ユーザから指定されたジョブをＭＦＰ１で実行可能か否か判定する。ユーザから指定されたジョブ実行が実行可能と判断した場合はＭＦＰ１に対してジョブ実行命令を送信する。なお、実行不可能と判断した場合は通知部３３６により操作音声変換部３１０を介してスマートスピーカー５０に対してエラーメッセージなどをレスポンス情報としてフィードバックさせてもよい。 The execution determination unit 335 determines whether the job specified by the user can be executed on the MFP 1 by comparing the processing capability of the MFP 1 with that job (that is, with the action and parameters generated by the operation voice conversion unit 310). If it determines that the job can be executed, it transmits a job execution command to the MFP 1. If it determines that the job cannot be executed, it may cause the notification unit 336 to feed back an error message or the like as response information to the smart speaker 50 via the operation voice conversion unit 310.
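A feasibility check of this kind can be sketched as follows. The sketch combines the processing capability with some of the device-state items listed for the device information acquisition unit 334. The `DeviceInfo` fields, the order of the checks, the treatment of sleep mode as recoverable, and the messages are all assumptions; the text only states that capability is compared with the requested job and that an error message may be fed back.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass
class DeviceInfo:
    """Hypothetical subset of the information kept in the device information DB 360."""
    max_pixels: int                                   # processing capability
    connected: bool                                   # connection state
    power: str                                        # "on", "off" or "sleep"
    error: Optional[str] = None                       # error type, if any
    allowed_functions: FrozenSet[str] = frozenset()   # authority information


def can_execute(function: str, required_pixels: int,
                info: DeviceInfo) -> Tuple[bool, str]:
    """Return (True, "") if the job can run, else (False, message for feedback)."""
    if not info.connected:
        return False, "The MFP is not connected."
    if info.power == "off":
        # Assumption: sleep mode is treated as recoverable; power-off is not.
        return False, "The MFP is powered off."
    if info.error is not None:
        return False, f"The MFP reports an error: {info.error}."
    if function not in info.allowed_functions:
        return False, f"You are not permitted to use {function}."
    if required_pixels > info.max_pixels:
        return False, "The requested job exceeds the MFP's processing capability."
    return True, ""
```

When the check fails, the returned message is the kind of response information the notification unit 336 could relay to the smart speaker 50.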

通知部３３６は、ユーザのジョブ実行指示への応答としてテキストデータ、音声データ、画像データなどを操作音声変換部３１０へ送信する。また、ジョブの実行するためのジョブ条件を示すパラメータが不足している場合には、操作音声変換部３１０を介してスマートスピーカー５０に対してフィードバックすることでユーザにパラメータの更なる指示を促す。ここで、不足しているパラメータを確認するために必要な情報として、パラメータ情報を送信してもよいし、ユーザにパラメータの指定を促すために必要な情報としてテキストデータ、音声データ、画像データを送信してもよい。 The notification unit 336 transmits text data, audio data, image data, and the like to the operation voice conversion unit 310 as a response to the user's job execution instruction. When a parameter indicating a job condition required to execute the job is missing, the notification unit 336 gives feedback to the smart speaker 50 via the operation voice conversion unit 310 to prompt the user to specify the parameter. For this purpose, it may transmit parameter information as the information needed to identify the missing parameter, or it may transmit text data, audio data, or image data as the information needed to prompt the user to specify the parameter.

ＤＢ管理部３３７は、管理ＤＢ３４０、紐づけ用ＤＢ３５０、および機器情報ＤＢ３６０を管理する。具体的には、各種テーブルの設定や、各種テーブルに対してのデータの登録、検索、削除、更新などを行う。例えば、ＤＢ管理部３３７は、ＭＦＰ１、スマートスピーカー５０、またはクラウドサービス装置６０のクライアントデバイスに入力された情報および指示に基づいて、スマートスピーカー５０のデバイスＩＤとＭＦＰ１のＩＤとを紐づけて紐づけ用ＤＢ３５０に登録する。紐づけ用ＤＢ３５０は、スマートスピーカー５０のデバイスＩＤとＭＦＰ１のＩＤとを紐づけた情報をテーブルデータなどで保持する。 The DB management unit 337 manages a management DB 340 , a linking DB 350 and a device information DB 360 . Specifically, it sets various tables, and registers, searches, deletes, and updates data in various tables. For example, the DB management unit 337 associates the device ID of the smart speaker 50 with the ID of the MFP 1 based on information and instructions input to the client device of the MFP 1, the smart speaker 50, or the cloud service apparatus 60. Register in DB 350 for use. The linking DB 350 holds information linking the device ID of the smart speaker 50 and the ID of the MFP 1 as table data or the like.

(Interactive operation flow)

Figs. 24 to 27 show an example of the overall operation when the user operates the MFP by interacting with the voice operation system. Fig. 24 shows the flow at startup, and Figs. 25 to 27 show the interactive flow after startup. Operating through dialogue with the system requires management of the dialogue session, which is described later. As an example, the following describes the operation when the user instructs, via the smart speaker 50, two copies of a color image, double-sided with top-bottom opening and stapled at two places on top. In this example the number of copies (two) is a required parameter; however, the required parameters are not limited to the number of copies and may include several parameters, such as monochrome/color or paper size.

First, after the user starts the smart speaker 50 (operation processing program), the user speaks an activation word to the smart speaker 50 (step S1'). By uttering the activation word of a voice assistant program, the user can start the desired voice assistant program. The smart speaker 50 (communication control unit 212) transmits the voice data of the activation word to the cloud 100 (operation voice conversion unit 310) (step S2').

In the cloud 100, the operation voice conversion unit 310 (acquisition unit 311) acquires the data transmitted from the smart speaker 50, and the operation voice conversion unit 310 (text conversion unit 312) converts the voice data into text data (step S3').

The operation voice conversion unit 310 (interpretation unit 313) requests dictionary information from the voice assistant unit 320 (providing unit 321) and obtains it (step S4').

The operation voice conversion unit 310 (interpretation unit 313) then interprets the text using the obtained dictionary information (step S5').

The operation voice conversion unit 310 (interpretation unit 313) then passes the interpretation result to the management unit 330 (step S6').

The management unit 330 then, as needed, searches the linking DB (step S71), checks the connection state (step S72), checks the application state (step S73), and acquires device information (step S74). The order of these processes may be changed as appropriate, and any of them may be omitted here if it is performed at another time.

In the linking DB search (step S71), the management unit 330 (DB management unit 337) searches the linking DB 350 for the MFP 1 (the ID of the MFP 1) associated with the acquired device ID (the ID of the smart speaker 50). If the search returns no MFP 1 ID associated with the device ID, the management unit 330 (notification unit 336) notifies the user, via the operation voice conversion unit 310 (output unit 314), that the smart speaker 50 is not linked to a communication target. For example, the management unit 330 (notification unit 336) generates response information containing the response "This device is not associated with any equipment." It may also include in the response a method for linking the device with a communication target. Step S71 may be performed at any other time after the device ID has been acquired.
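Step S71 can be pictured as a lookup that either yields the linked MFP ID or a response text for the user. This is a hypothetical sketch: the function `search_linked_mfp`, the plain dictionary standing in for the linking DB 350, and the guidance wording in `HOW_TO_LINK` are assumptions; the patent only says a linking method "may" be included.

```python
from typing import Dict, Optional, Tuple

NOT_LINKED = "This device is not associated with any equipment."
# Hypothetical guidance text appended when the linking method is included in the response.
HOW_TO_LINK = " Please link this smart speaker to an MFP first."


def search_linked_mfp(linking_table: Dict[str, str], device_id: str,
                      explain_linking: bool = False) -> Tuple[Optional[str], Optional[str]]:
    """Step S71 sketch: return (mfp_id, None) if linked, else (None, response text)."""
    mfp_id = linking_table.get(device_id)
    if mfp_id is not None:
        return mfp_id, None
    response = NOT_LINKED + (HOW_TO_LINK if explain_linking else "")
    return None, response
```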

In the connection state check (step S72), the management unit 330 checks the device state of the communication target (the MFP 1 in this example). For example, the DB management unit 337 refers to device information acquired in advance and stored in the device information DB 360. Alternatively, the device information acquisition unit 334 may acquire the device information from the target MFP 1. The check determines, for example, whether communication with the target MFP 1 is possible and whether the MFP 1 is usable. If no connection has been established with the MFP 1 linked to the device ID (the MFP 1 being checked), or if that MFP 1 cannot be used because it is still starting up, for example, the management unit 330 (notification unit 336) notifies the user via the operation voice conversion unit 310 (output unit 314). For example, it generates and sends response information containing the response "The device is offline" or "The device is being prepared." It may also include a remedy in the response. The device state may also be checked at any other time after the action, parameters, and device ID have been acquired from the operation voice conversion unit 310 (interpretation unit 313).

In the application state check (step S73), the management unit 330 checks the state of the application on the target MFP 1 that executes the function specified by the user. For example, the DB management unit 337 refers to device information acquired in advance and stored in the device information DB 360. Alternatively, the device information acquisition unit 334 may acquire the device information from the target MFP 1. The check determines, for example, whether the application is installed and whether it is in an executable state. If the function to be executed is copying and the copy application is not installed on the MFP 1 linked to the device ID, or cannot be used because it is starting up, for example, the management unit 330 (notification unit 336) notifies the user via the operation voice conversion unit 310 (output unit 314). For example, it generates and sends response information containing the response "The application is not installed" or "The application is currently unavailable." It may also include a remedy in the response. The application state may also be checked at any other time after the action, parameters, and device ID have been acquired from the operation voice conversion unit 310 (interpretation unit 313).
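The two checks of steps S72 and S73 can be sketched together as a function that returns either nothing (all clear) or one of the response messages quoted above. The `DeviceInfo` record, its fields, and the application state strings are hypothetical stand-ins for whatever the device information DB 360 actually stores.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DeviceInfo:
    """Hypothetical device record, as might be held in the device information DB 360."""
    online: bool  # communication with the MFP is possible
    ready: bool   # False while the MFP is still starting up
    apps: Dict[str, str] = field(default_factory=dict)  # app name -> "available" / "busy"


def check_device_and_app(info: DeviceInfo, function: str) -> Optional[str]:
    """Return a response message for the user, or None if the MFP and app are usable."""
    # Step S72: connection / device state
    if not info.online:
        return "The device is offline."
    if not info.ready:
        return "The device is being prepared."
    # Step S73: state of the application that executes the requested function
    state = info.apps.get(function)
    if state is None:
        return "The application is not installed."
    if state != "available":
        return "The application is currently unavailable."
    return None
```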

In device information acquisition (step S74), the management unit 330 acquires the device information of the communication target (the MFP 1 in this example). For example, the DB management unit 337 retrieves device information acquired in advance from the device information DB 360. Alternatively, the device information acquisition unit 334 may acquire the device information from the target MFP 1. The device information acquired here is used, for example, to determine whether the target MFP 1 can execute the job type and job conditions specified by the user.

Once these processes have been completed at any time after startup, the management unit 330 (execution determination unit 335) checks whether any required parameters are missing (step S75). In this check, it determines, from the action and parameters in the interpretation result, whether all conditions required to execute the job are present.
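The missing-parameter check of step S75 might look like the sketch below. The table `REQUIRED_PARAMS` is an assumption that mirrors the example in the text, where the copy count is the required parameter of a copy job. An empty action stands for the state right after startup, when no job type has been given yet.

```python
from typing import Dict, List, Set

# Hypothetical required parameters per action; the text's example requires the copy count.
REQUIRED_PARAMS: Dict[str, Set[str]] = {
    "Copy_Execute": {"copies"},
}


def missing_required(action: str, params: Dict[str, str]) -> List[str]:
    """Return the names of missing required items (sorted); an empty list means the job can proceed."""
    if not action:
        # Only the activation word was spoken: the job type itself is still missing.
        return ["action"]
    required = REQUIRED_PARAMS.get(action, set())
    return sorted(name for name in required if name not in params)
```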

If the job type and all required setting conditions were already specified when the user instructed the voice assistant program to start, the "input feedback" steps described below may be skipped and the MFP 1 may be instructed to execute the job directly.

At this stage, the user has only instructed startup by voice and has not specified any of the actions or parameters available on the MFP 1, so the management unit 330 (execution determination unit 335) determines that the required parameters are not satisfied. It likewise determines that they are not satisfied when a required condition was omitted from the startup instruction. The management unit 330 (notification unit 336) therefore creates response information and transmits it to the smart speaker 50 via the operation voice conversion unit 310 (output unit 314) (steps S76 and S77).

The management unit 330 (DB management unit 337) manages the communication session with the smart speaker 50 in the management DB 340. When transmitting response information to the smart speaker 50, the management unit 330 (notification unit 336) can also transmit state information indicating that the session is continuing. Although this is not repeated in the descriptions of later steps, whenever the cloud 100 sends a query to the smart speaker 50, it includes this state information.
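The session handling can be illustrated with a small in-memory store keyed by device ID, where each response carries a flag saying whether the session continues. `SessionStore`, `build_response`, and the `session_continues` field are hypothetical names; the patent does not specify the format of the state information.

```python
from typing import Any, Dict


class SessionStore:
    """Hypothetical stand-in for the session records kept in the management DB 340."""

    def __init__(self) -> None:
        # device ID -> accumulated interpretation result for the ongoing dialogue
        self._sessions: Dict[str, Dict[str, Any]] = {}

    def get_or_start(self, device_id: str) -> Dict[str, Any]:
        return self._sessions.setdefault(device_id, {"action": None, "params": {}})

    def end(self, device_id: str) -> None:
        self._sessions.pop(device_id, None)

    def is_active(self, device_id: str) -> bool:
        return device_id in self._sessions


def build_response(store: SessionStore, device_id: str, text: str) -> Dict[str, Any]:
    """Attach state information telling the smart speaker whether the session continues."""
    return {"text": text, "session_continues": store.is_active(device_id)}
```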

The response information can include text data, voice data, and image data as the content of the query to the user. In this example, voice data saying "Do you want to copy or scan?" is transmitted, and the smart speaker 50 (feedback unit 213) gives the voice feedback "Do you want to copy or scan?" (step S78).

The feedback is not limited to this content, as long as the message prompts the user to enter a job type or job setting conditions. The feedback may also be given not only by voice output but also by displaying text or images on the touch panel, in which case text data, image data (display information), and the like are transmitted to the smart speaker 50.

If the user says "copy" after step S78 (or had already said "copy" when instructing the voice assistant program to start), processing proceeds as follows. The smart speaker 50 (acquisition unit 211) acquires the user's speech as voice data (step S1-1). The smart speaker 50 (communication control unit 212) transmits this "copy" voice data to the cloud 100 (step S2-1), together with the device ID that identifies the smart speaker 50.

In the cloud 100, the operation voice conversion unit 310 (acquisition unit 311) acquires the voice data. The operation voice conversion unit 310 then performs the processing up to text interpretation in the same way as steps S3' to S5' (steps S3-1 to S5-1) and passes the interpretation result to the management unit 330 (step S6-1). Here, the action "Copy_Execute", corresponding to "copy", is passed as the interpretation result.
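Mapping the recognized text "copy" to the action "Copy_Execute" using dictionary information (steps S4' and S5') can be approximated with a phrase table. The dictionary contents and `interpret_action` are hypothetical; a real voice assistant would supply its own dictionary information and matching logic.

```python
from typing import Dict, Optional

# Hypothetical dictionary information: spoken phrase -> action name.
ACTION_DICTIONARY: Dict[str, str] = {
    "make a copy": "Copy_Execute",
    "copy": "Copy_Execute",
    "scan": "Scan_Execute",
}


def interpret_action(text: str, dictionary: Dict[str, str] = ACTION_DICTIONARY) -> Optional[str]:
    """Return the action for the first dictionary phrase found in the text, or None."""
    normalized = " ".join(text.lower().split())
    # Try longer phrases first so a more specific phrase wins over a shorter one.
    for phrase in sorted(dictionary, key=len, reverse=True):
        if phrase in normalized:
            return dictionary[phrase]
    return None
```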

The management unit 330 (execution determination unit 335) then checks again for missing required parameters (step S75-1). In this example the user has said only "copy", so the values of required parameters, such as the number of copies, are unknown.

The cloud 100 therefore asks the smart speaker 50 for the missing parameters. Specifically, because setting values are missing at this stage, the management unit 330 (notification unit 336) generates response information containing "Please enter the setting values" and transmits voice data saying "Please enter the setting values" to the smart speaker 50 via the operation voice conversion unit 310 (output unit 314) (steps S75-1 to S77-1). The smart speaker 50 (feedback unit 213) then outputs the voice message "Please enter the setting values" (step S78-1). Here too, in addition to the voice output, the touch panel 27 may display text such as "Please enter the setting values."

In response to this missing-input feedback, the user says, for example, "Color, double-sided, top-bottom opening, stapled at two places on top." The smart speaker 50 (acquisition unit 211) acquires the speech as voice data (step S1-2). The smart speaker 50 (communication control unit 212) transmits this voice data to the cloud 100 (step S2-2), together with the device ID that identifies the smart speaker 50.

In the cloud 100, the operation voice conversion unit 310 (acquisition unit 311) acquires the voice data. The operation voice conversion unit 310 then performs the processing up to text interpretation in the same way as steps S3' to S5' (steps S3-2 to S5-2) and passes the interpretation result to the management unit 330 (step S6-2).

Here, the operation voice conversion unit 310 (interpretation unit 313) generates, as the interpretation result, the parameters "Parameter: color/monochrome = color, printed sides = double-sided, opening direction = top-bottom, staple = two places on top" and passes this result to the management unit 330.
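Extracting these parameters from the utterance can likewise be sketched as a phrase-to-(parameter, value) lookup. The English phrases stand in for the original Japanese wording, and the parameter names and values (`color_mode`, `print_side`, `top_two`, and so on) are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical phrase dictionary: phrase -> (parameter name, value).
PARAM_DICTIONARY: Dict[str, Tuple[str, str]] = {
    "color": ("color_mode", "color"),
    "monochrome": ("color_mode", "monochrome"),
    "double-sided": ("print_side", "duplex"),
    "top-bottom opening": ("open_direction", "top_bottom"),
    "stapled at two places on top": ("staple", "top_two"),
}


def interpret_params(text: str) -> Dict[str, str]:
    """Collect every parameter whose phrase appears in the utterance text."""
    normalized = " ".join(text.lower().split())
    params: Dict[str, str] = {}
    for phrase in sorted(PARAM_DICTIONARY, key=len, reverse=True):
        if phrase in normalized:
            name, value = PARAM_DICTIONARY[phrase]
            params.setdefault(name, value)  # the longer phrase, matched first, takes priority
    return params
```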

Specifically, the management unit 330 (DB management unit 337) combines the interpretation result of the previous utterance with that of the current utterance to complete the action and parameters. In this example, this yields the action "Copy_Execute" and the parameters "Parameter: color/monochrome = color, printed sides = double-sided, opening direction = top-bottom, staple = two places on top." The management unit 330 (execution determination unit 335) then checks again for missing required parameters based on the combined result. In this example, the user's utterance of "two copies" resolves the missing required parameter for the copy job.
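Combining the previous and current interpretation results can be sketched as a dictionary merge in which values from the newer utterance win. The result layout (`action` plus a `params` dictionary) and `merge_interpretations` are hypothetical.

```python
from typing import Any, Dict


def merge_interpretations(previous: Dict[str, Any], current: Dict[str, Any]) -> Dict[str, Any]:
    """Combine two interpretation results without mutating either input.

    The newer action is used if present; for parameters, values from the
    current utterance override those from the previous one.
    """
    action = current.get("action") or previous.get("action")
    params = {**previous.get("params", {}), **current.get("params", {})}
    return {"action": action, "params": params}
```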

In this example, the management unit 330 (notification unit 336) then provides input confirmation feedback. It generates response information saying "Copying in color, double-sided, top-bottom opening, stapled at two places on top. Is that OK?" and transmits the corresponding voice data to the smart speaker 50 via the operation voice conversion unit 310 (output unit 314) (steps S75-3 to S77-3). The smart speaker 50 (feedback unit 213) then outputs the voice message "Making two double-sided copies. Is that OK?" (step S78-3). Here too, in addition to the voice output, the touch panel 27 may display text such as "Copying in color, double-sided, top-bottom opening, stapled at two places on top. Is that OK?" Instead of outputting the text data or voice data contained in the response information as is, the smart speaker 50 may generate the output by combining text data stored in its storage unit, based on the information in the response information.
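Generating the confirmation question from the completed parameters, for instance when the smart speaker combines locally stored text as described above, might look like this sketch. The read-back wording and parameter names in `READBACK` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical read-back wording: (parameter name, {value: spoken phrase}), in speaking order.
READBACK: List[Tuple[str, Dict[str, str]]] = [
    ("color_mode", {"color": "color", "monochrome": "monochrome"}),
    ("print_side", {"duplex": "double-sided", "simplex": "single-sided"}),
    ("open_direction", {"top_bottom": "top-bottom opening", "left_right": "left-right opening"}),
    ("staple", {"top_two": "stapled at two places on top"}),
]


def confirmation_message(params: Dict[str, str]) -> str:
    """Build the input-confirmation question from the completed parameters."""
    phrases = [words[params[name]] for name, words in READBACK
               if params.get(name) in words]
    settings = ", ".join(phrases) if phrases else "the default settings"
    return f"Copying with {settings}. Is that OK?"
```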

The user then responds to this input confirmation feedback by telling the terminal either to change a setting value or to start copying. When the user says something that changes a setting value, voice data for the change is transmitted from the smart speaker 50 to the cloud 100, the cloud 100 changes the setting value, and the smart speaker 50 gives voice feedback that the setting value has been changed. This voice feedback asks whether copying may start with the changed settings, for example "Copying with XX settings. Is that OK?"

This procedure is repeated each time the user says something that changes a setting value. Thus, after the voice output "Copying in color, double-sided, top-bottom opening, stapled at two places on top. Is that OK?", the procedure is repeated as many times (k times) as the user speaks a setting change.

When the user instructs copying to start, for example by answering "Yes," the n-th procedure shown in Fig. 27 is performed. The smart speaker 50 (acquisition unit 211) acquires the user's speech as voice data (step S1-n). The smart speaker 50 (communication control unit 212) transmits this "Yes" voice data to the cloud 100 (step S2-n), together with the device ID that identifies the smart speaker 50.

In the cloud 100, the operation voice conversion unit 310 (acquisition unit 311) acquires the voice data. The operation voice conversion unit 310 then performs the processing up to text interpretation in the same way as steps S3' to S5' (steps S3-n to S5-n) and passes the interpretation result to the management unit 330 (step S6-n).

When the operation voice conversion unit 310 (interpretation unit 313) recognizes the instruction to start copying, it passes the interpretation result to the management unit 330, and the management unit 330 (execution determination unit 335) determines that the final confirmation is OK (step S75-n).

The management unit 330 (interpretation result conversion unit 332) then converts the interpretation result into a job command for the MFP 1 (step S76), and the management unit 330 (execution instruction unit 333) transmits the converted execution instruction information to the MFP 1 (step S8). In this way, the MFP 1 can be made to copy through voice input.
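The conversion of an interpretation result into a job command (step S76) could be sketched as a key mapping followed by serialization. The JSON layout and all key names (`jobType`, `colorMode`, and so on) are assumptions for illustration only; the actual job command format of the MFP 1 is not given in the text.

```python
import json
from typing import Any, Dict

# Hypothetical mappings from interpretation results to MFP job-command fields.
ACTION_TO_JOB_TYPE = {"Copy_Execute": "copy", "Scan_Execute": "scan"}
PARAM_TO_JOB_KEY = {
    "color_mode": "colorMode",
    "print_side": "printSide",
    "open_direction": "openDirection",
    "staple": "staple",
    "copies": "copies",
}


def to_job_command(action: str, params: Dict[str, str], mfp_id: str) -> str:
    """Build a JSON job command (format assumed) for the linked MFP."""
    if action not in ACTION_TO_JOB_TYPE:
        raise ValueError(f"unsupported action: {action}")
    settings: Dict[str, Any] = {}
    for name, value in params.items():
        key = PARAM_TO_JOB_KEY.get(name)
        if key is None:
            continue  # parameters without a job-command field are dropped
        settings[key] = int(value) if name == "copies" else value
    command = {"target": mfp_id, "jobType": ACTION_TO_JOB_TYPE[action], "settings": settings}
    return json.dumps(command, sort_keys=True)
```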

Fig. 28 is a front view of the screen displayed on the display unit of the smart speaker 50. As shown in Fig. 28, this screen is the same as the screen displayed on the mobile terminal device 2 shown in Fig. 13.

Speech input to the smart speaker 50 and the feedback are processed in the same way as in the first embodiment. In outline, the smart speaker 50 outputs what the user said and the response information received from the cloud service device 60 (operation voice conversion program). The response information contains at least one of text data, voice data, and image data.

In Fig. 28, the comments shown in speech balloons from the right side of the screen of the touch panel 27 of the smart speaker 50 show what the user said to the smart speaker 50. The comments and images shown in balloons from the left side of the screen are comments showing the voice feedback that the cloud service device 60 returned in response to the user's utterances, or comments or images (stamps) showing other feedback from the cloud service device 60. In other words, when the smart speaker 50 receives feedback information from the cloud service device 60, it gives the feedback to the user by voice output and, at the same time, by screen display. The voice output may, however, be omitted.

Referring to Figs. 24 to 27, the comment "Do you want to copy or scan?" is displayed on the screen of the touch panel 27 of the smart speaker 50 together with the voice feedback of step S78.

The operation voice processing program of the smart speaker 50 may generate the text to be displayed based on the response information from the cloud service device 60, or may display text data stored in advance in the ROM 23 or the like of the smart speaker 50. It may also display the text data and voice data contained in the response information as they are.

For the comment "copy", the operation voice processing program of the smart speaker 50 can receive, as response information, the text data into which the cloud service device 60 (operation voice conversion program) converted the voice data, and display it on the screen of the touch panel 27 of the smart speaker 50.

The cloud service device 60 (operation voice conversion program) can transmit response information at any time. For example, it may generate the "copy" response information as soon as it converts the voice data into text data and transmit it to the smart speaker 50 (in this case, only "copy" is displayed).

Alternatively, the cloud service device 60 (management program) may generate the "copy" response information at the same time as the "Please enter the setting values" response information and transmit both to the smart speaker 50 via the operation voice conversion program (in this case, "copy" and "Please enter the setting values" appear on the touch panel 27 of the smart speaker 50 almost simultaneously).

When transmitting the intent "Copy_Execute" to the management program as the interpretation result, the operation voice conversion program may also transmit the information needed to generate the "copy" response information.

Alternatively, the operation voice conversion program may create the "copy" response information itself and send it to the smart speaker 50 together with the "Please enter the setting values" response information, when the management program transmits the latter via the operation voice conversion program.

The operation voice processing program of the smart speaker 50 displays the comment "Please enter the setting values" on the screen of the touch panel 27 together with the voice feedback of step S78-1 in Figs. 24 to 27. That is, it displays the comment based on the response information received from the cloud service device 60 (management program).

The operation voice processing program of the smart speaker 50 can receive the comment "color, double-sided, top-bottom opening, stapled at two places on top", that is, the text data into which the cloud service device 60 (operation voice conversion program) converted the voice data, and display it on the screen. The display method is the same as for "copy".

In addition, when the operation voice processing program of the smart speaker 50 determines that no input is missing, it displays an image (stamp) showing the finished output on the touch panel 27 and prompts the user to instruct the start of copying.

In this way, the smart speaker 50 displays comments on the screen of its touch panel 27 based on text data stored in advance in the smart speaker 50, or on text data or response information received from the cloud service device 60.

The following describes the case where the specific example described in the first embodiment is applied to this embodiment.

As response information, the cloud service device 60 (management program) can transmit text data or voice data for feedback by voice output, and text data or image data (stamps) for feedback by display. The response information may also include information indicating the job type and job setting conditions, such as the intent and parameters. In this case, the cloud service device 60 can include in the response information the intent and parameters obtained as the analysis result from the operation voice conversion program.
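One way to picture the response information is as a record with optional voice, display, and job fields. The class `ResponseInfo` and its field names are hypothetical; the only rule taken from the text is that at least one of text, voice, or image data is present.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ResponseInfo:
    """Hypothetical container for the response information described above."""
    speech_text: Optional[str] = None     # text for voice-output feedback
    speech_audio: Optional[bytes] = None  # voice data for voice-output feedback
    display_text: Optional[str] = None    # text for display feedback (speech balloon)
    stamp_image: Optional[str] = None     # identifier of a finished-image stamp
    intent: Optional[str] = None          # e.g. "Copy_Execute"
    params: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The response must carry at least one of text, voice, or image data.
        if not any([self.speech_text, self.speech_audio, self.display_text, self.stamp_image]):
            raise ValueError("response information needs text, voice, or image data")
```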

このとき、スマートスピーカー５０の操作音声処理プログラムは、クラウドサービス装置６０からフィードバックされた解析結果をコメント表示する代わりに、図２８に示すように、解析結果に基づく仕上がりイメージを画像（スタンプ）として表示することができる。 At this time, instead of displaying the analysis result fed back from the cloud service device 60 as a comment, the operation voice processing program of the smart speaker 50 can display a finished image based on the analysis result as an image (stamp), as shown in FIG. 28.

スマートスピーカー５０の操作音声処理プログラムは、解析結果の「インテント」が「Copy_execute」であった場合に、「パラメータ」を参照する。そして、スマートスピーカー５０の操作音声処理プログラムは、「パラメータ」の値に一致する仕上がりイメージを示す画像（スタンプ）を検索し、検索した画像（スタンプ）をタッチパネル２７に表示させる。ここで、「パラメータ」に複数の設定値が設定されている場合は、全ての設定値を満足する画像（スタンプ）を検索する。例えば、スマートスピーカー５０のＲＯＭ２３には、設定値と対応付けて画像（スタンプ）が記憶されている。例えば、図１６に示すようにテーブルデータとして記憶することができる。なお、全ての設定値を満足する画像（スタンプ）がない場合は、最も近い一の画像（スタンプ）を表示しても良い。 When the "intent" in the analysis result is "Copy_execute", the operation voice processing program of the smart speaker 50 refers to the "parameters". It then searches for an image (stamp) showing a finished result that matches the parameter values and displays the retrieved image (stamp) on the touch panel 27. If the parameters contain multiple setting values, it searches for an image (stamp) that satisfies all of them. For example, the ROM 23 of the smart speaker 50 stores images (stamps) in association with setting values, for instance as table data as shown in FIG. 16. If no image (stamp) satisfies all the setting values, the single closest image (stamp) may be displayed instead.
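The stamp search can be sketched as a lookup over a table like the one in FIG. 16. In this minimal Python sketch, the table contents, setting keys such as `sides`, and the rule for "closest" (the stamp that reproduces the most requested values) are assumptions; the document does not define how closeness is measured.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stamp table in the spirit of FIG. 16: setting values -> stamp file.
STAMP_TABLE: List[Tuple[Dict[str, str], str]] = [
    ({"color": "color", "sides": "duplex"}, "color_duplex.png"),
    ({"color": "color", "sides": "duplex", "opening": "top_bottom", "staple": "top_2"},
     "color_duplex_top_staple2.png"),
    ({"color": "mono", "sides": "simplex"}, "mono_simplex.png"),
]


def match_count(entry: Dict[str, str], params: Dict[str, str]) -> int:
    """How many of the requested setting values this stamp entry reproduces."""
    return sum(1 for key, value in params.items() if entry.get(key) == value)


def find_stamp(intent: str, params: Dict[str, str]) -> Optional[str]:
    """Return the stamp for a Copy_execute result, or None for other intents."""
    if intent != "Copy_execute":
        return None
    # Stamps that satisfy every requested setting value.
    full = [(entry, stamp) for entry, stamp in STAMP_TABLE
            if match_count(entry, params) == len(params)]
    if full:
        # Among those, prefer the one with the fewest extra settings.
        return min(full, key=lambda pair: len(pair[0]))[1]
    # Otherwise show the single closest stamp (assumed: most matching values).
    return max(STAMP_TABLE, key=lambda pair: match_count(pair[0], params))[1]
```

Preferring the entry with the fewest extra settings keeps a two-setting request from being shown a stamp that also depicts staples the user never asked for.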

なお、図１６に示すテーブルデータは、スマートスピーカー５０ではなく、スマートスピーカー５０がアクセス可能な外部装置に記憶されていても良い。例えば、ネットワーク５を介して接続されたサーバに記憶されていても良い。この場合、スマートスピーカー５０の操作音声処理プログラムは、サーバにアクセスして解析結果に含まれる設定値を送信し、サーバからの応答として該設定値を満たす画像（スタンプ）を取得することができる。これに限定されずスタンプはクラウドサービス装置６０のＨＤＤ３４に記憶しても良い。 The table data shown in FIG. 16 may be stored not in the smart speaker 50 but in an external device that the smart speaker 50 can access, for example a server connected via the network 5. In this case, the operation voice processing program of the smart speaker 50 can access the server, transmit the setting values included in the analysis result, and receive, as the server's response, an image (stamp) that satisfies those setting values. Alternatively, the stamps may be stored in the HDD 34 of the cloud service device 60.

以上では、クラウドサービス装置６０からのフィードバックに基づいてスマートスピーカー５０が画像（スタンプ）を検索する場合について説明したが、これに限定されず、クラウドサービス装置６０で画像（スタンプ）を検索しても良い。この場合、レスポンス情報に、仕上がりイメージを示す画像（スタンプ）を含めてスマートスピーカー５０へ送信する。ここで、クラウドサービス装置６０（操作音声変換プログラム）が画像（スタンプ）を検索した上でスマートスピーカー５０へ送信しても良いが、他のプログラム（例えば操作音声変換プログラム）が検索及び送信しても良い。 The case in which the smart speaker 50 searches for the image (stamp) based on feedback from the cloud service device 60 has been described above, but the search may instead be performed by the cloud service device 60. In this case, the image (stamp) showing the finished result is included in the response information sent to the smart speaker 50. The cloud service device 60 (operation voice conversion program) may search for the image (stamp) and send it to the smart speaker 50, or another program may perform the search and transmission.

スマートスピーカー５０は、受信した画像をタッチパネル２７に表示させる。このとき、クラウドサービス装置６０（管理プログラム）は、操作音声変換プログラムから取得した解析結果に含まれる「パラメータ」の値に一致する仕上がりイメージを示す画像（スタンプ）を検索する。クラウドサービス装置６０（管理プログラム）は、クラウドサービス装置６０が有するＨＤＤ４４、又はクラウドサービス装置６０がアクセス可能なサーバに問い合わせることで、イメージ画像を示す画像（スタンプ）を検索、取得することができる。 The smart speaker 50 displays the received image on the touch panel 27. In this case, the cloud service device 60 (management program) searches for an image (stamp) showing a finished result that matches the "parameter" values included in the analysis result obtained from the operation voice conversion program. The cloud service device 60 (management program) can search for and acquire the image (stamp) by querying the HDD 44 of the cloud service device 60 or a server that the cloud service device 60 can access.

また、解析結果が「インテント:Copy_confirm」、「パラメータ:印刷面＝両面、部数＝２」である場合、仕上がりイメージを示す画像（スタンプ）としては、図１７のように表示することができる。図１７に示す例では、部数を示す数字である「２」を仕上がりイメージと共に表示する。 When the analysis result is "intent: Copy_confirm" and "parameters: print side = double-sided, copies = 2", the image (stamp) showing the finished result can be displayed as shown in FIG. 17. In the example of FIG. 17, the number "2", indicating the number of copies, is displayed together with the finished image.

スマートスピーカー５０は、仕上がりイメージを示す画像（スタンプ）を表示することに加えて、「両面で２部コピーします。よろしいですか？」という音声フィードバックを行っても良いし、音声フィードバックは省略しても良い。また、仕上がりイメージを表示することに加えて、「両面で２部コピーします。よろしいですか？」というコメント表示しても良いし、コメント表示は省略しても良い。 In addition to displaying the image (stamp) showing the finished result, the smart speaker 50 may give voice feedback such as "Two double-sided copies will be made. Is that OK?", or the voice feedback may be omitted. Likewise, in addition to the finished image, the same message may be displayed as a comment, or the comment may be omitted.
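The confirmation message and the copy-count badge of FIG. 17 could be derived from the parameters roughly as follows. This is a minimal Python sketch; the English wording, the `SIDES_LABEL` table, and the parameter keys `sides` and `copies` are hypothetical, and the source gives only the single message for two double-sided copies.

```python
from typing import Dict, Optional

# Hypothetical labels for the print-side parameter.
SIDES_LABEL = {"duplex": "double-sided", "simplex": "single-sided"}


def confirmation_text(params: Dict[str, str]) -> str:
    """Build the spoken or displayed confirmation for a copy job."""
    copies = int(params.get("copies", "1"))
    sides = SIDES_LABEL.get(params.get("sides", ""), "")
    noun = "copy" if copies == 1 else "copies"
    words = [str(copies), sides, noun] if sides else [str(copies), noun]
    return " ".join(words) + " will be made. Is that OK?"


def copies_badge(params: Dict[str, str]) -> Optional[str]:
    """Number drawn next to the stamp, like the "2" in FIG. 17 (None if unspecified)."""
    return params.get("copies")
```

Because the same string serves both the voice feedback and the comment, either output can be dropped without affecting the other, as the text allows.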

なお、仕上がりイメージを示す画像（スタンプ）は、スマートスピーカー５０のタッチパネル２７によって選択可能に表示することができる。例えば、スマートスピーカー２は、過去のジョブ実行時に表示されたコメント及び画像を、スマートスピーカー２のＲＯＭ２３に記憶しておくことができる。 The image (stamp) showing the finished result can be displayed on the touch panel 27 of the smart speaker 50 so that it is selectable. For example, the smart speaker 2 can store, in its ROM 23, the comments and images displayed when past jobs were executed.

または、スマートスピーカー２は、過去のジョブ実行時に表示されたコメント及び画像を、クラウドサービス装置６０に記憶しておいても良い。操作音声処理プログラムが起動した場合、又はクラウドサービス装置６０のプログラムが呼び出されたタイミングで、操作音声変換プログラムが、又は管理プログラムが操作音声変換プログラムを介して、所定のタイミングで記憶情報をスマートスピーカー２へ送信しても良い。 Alternatively, the smart speaker 2 may store the comments and images displayed when past jobs were executed in the cloud service device 60. In that case, when the operation voice processing program starts, or when a program of the cloud service device 60 is called, the operation voice conversion program, or the management program via the operation voice conversion program, may send the stored information to the smart speaker 2 at a predetermined timing.

これにより、操作音声処理プログラムが起動すると、図２８に示すように、過去のジョブ実行時に表示されたコメント及び画像を表示することができる。 As a result, when the operation voice processing program starts, the comments and images displayed when past jobs were executed can be displayed, as shown in FIG. 28.

ここで、過去のジョブ実行時にフィードバックされた画像を、ユーザが、スマートスピーカー５０のタッチパネル２７をタッチすることで選択した場合、携帯端末装置２（操作音声処理プログラム）は、該画像に対応する設定値を今回のジョブの設定値として反映させることができる。 When the user selects an image that was fed back during a past job by touching the touch panel 27 of the smart speaker 50, the mobile terminal device 2 (operation voice processing program) can apply the setting values associated with that image as the setting values of the current job.

また、携帯端末装置２（操作音声処理プログラム）は、画像が選択された場合、図２８に示すように選択された画像を（１７：００での表示のように）再度表示させるとともに、該画像に紐づくジョブの種類及び設定値に基づくジョブの実行指示をクラウドサービス装置６０を介してＭＦＰ１に対して指示する。 When an image is selected, the mobile terminal device 2 (operation voice processing program) also displays the selected image again, as shown in FIG. 28 (like the display at 17:00), and instructs the MFP 1, via the cloud service device 60, to execute a job based on the job type and setting values associated with the image.

これにより、クラウドサービス装置６０（操作音声変換プログラム）は、「インテント:Copy_execute」、「パラメータ:カラー／モノクロ＝カラー、印刷面＝両面、開き方向＝上下開き、ステープル＝上二か所」の解釈結果を管理プログラムに対して送信することができる。 As a result, the cloud service device 60 (operation voice conversion program) can send the management program the interpretation result "intent: Copy_execute", "parameters: color/monochrome = color, print side = double-sided, opening direction = top-to-bottom, staple = two at the top".

すなわち、スマートスピーカー５０は、画像に紐づけられた、ジョブの種類及びジョブの設定値の情報をクラウドサービス装置６０（操作音声変換プログラム）へ送信し、クラウドサービス装置６０（操作音声変換プログラム）は取得したジョブの種類及び設定値に基づいて解釈結果を生成して管理プログラムへ送信する。 That is, the smart speaker 50 sends the job type and job setting values linked to the image to the cloud service device 60 (operation voice conversion program), which generates an interpretation result from them and sends it to the management program.

管理プログラムは、解釈結果に基づいてジョブ実行命令をＭＦＰ１に対して送信する。ここで、ジョブの種類及びジョブの設定値の情報は、レスポンス情報に含まれるインテント及びパラメータであっても良いし、レスポンス情報に含まれるテキストデータであっても良い。インテント及びパラメータの場合は、操作音声変換プログラムはテキスト化及び解釈結果の生成を行う必要なく、取得したインテント及びパラメータを管理プログラムへ送信する。また、テキストデータの場合は、操作音声変換プログラムはテキスト化は行わずに解釈結果の生成のみを行って、生成した解釈結果を管理プログラムへ送信する。 The management program sends a job execution command to the MFP 1 based on the interpretation result. The job type and job setting values may be the intent and parameters included in the response information, or the text data included in the response information. In the case of intent and parameters, the operation voice conversion program needs neither text conversion nor interpretation and sends the acquired intent and parameters directly to the management program. In the case of text data, the operation voice conversion program skips text conversion, only generates the interpretation result, and sends it to the management program.
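The two cases handled by the operation voice conversion program can be sketched as one function. In this minimal Python sketch, the name `to_interpretation`, the dictionary layout of the interpretation result, and the keyword table used to interpret stored text are hypothetical; a real system would reuse its existing interpretation logic, and the sketch simply assumes a copy job whenever text is given.

```python
from typing import Any, Dict, Union

# Hypothetical keyword table for interpreting stored comment text.
KEYWORDS = {
    "color": ("color_mode", "color"),
    "double-sided": ("print_side", "duplex"),
    "top-to-bottom opening": ("opening", "top_bottom"),
    "two staples at the top": ("staple", "top_2"),
}


def to_interpretation(job_info: Union[Dict[str, Any], str]) -> Dict[str, Any]:
    """Turn the job information linked to a selected image into an interpretation result."""
    if isinstance(job_info, dict):
        # Intent and parameters were stored: no text conversion or
        # interpretation is needed, so forward them as they are.
        return {"intent": job_info["intent"], "params": dict(job_info["params"])}
    # Text data was stored: skip speech-to-text and only interpret it.
    params: Dict[str, str] = {}
    for phrase in (part.strip() for part in job_info.split(",")):
        if phrase in KEYWORDS:
            key, value = KEYWORDS[phrase]
            params[key] = value
    # Assumption: the stored comment always describes a copy job.
    return {"intent": "Copy_execute", "params": params}
```

Storing intent and parameters rather than text avoids the interpretation step entirely, which is the shorter path the text describes.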

この場合、スマートスピーカー５０は、仕上がりイメージを示す画像（スタンプ）と、該画像（スタンプ）に対応する設定値（つまり、クラウドサービス装置６０からフィードバックされた「インテント」及び「パラメータ」の値、又はレスポンス情報に含まれるテキストデータ）とをスマートスピーカー５０のＲＯＭ２３に紐づけて記憶しておく。 In this case, the smart speaker 50 stores in its ROM 23 the image (stamp) showing the finished result in association with the corresponding setting values (that is, the "intent" and "parameter" values fed back from the cloud service device 60, or the text data included in the response information).

スマートスピーカー５０（操作音声処理プログラム）は、仕上がりイメージを示す画像（スタンプ）を、次回以降のジョブ実行時に使用できるように、スマートスピーカー５０に記憶しておく。つまり、スマートスピーカー５０（操作音声処理プログラム）は、スマートスピーカー５０のタッチパネル２７に表示される、仕上がりイメージを示す画像（スタンプ）を、保存するように指示する。 The smart speaker 50 (operation voice processing program) stores the image (stamp) showing the finished result in the smart speaker 50 so that it can be reused for subsequent jobs. In other words, the smart speaker 50 (operation voice processing program) issues an instruction to save the image (stamp) displayed on its touch panel 27.

例えば、ユーザが仕上がりイメージを示す画像（スタンプ）を所定時間タッチし続けた場合（長押しした場合）、スマートスピーカー５０（操作音声処理プログラム）は、該画像を保存するか否かの選択を受け付ける画面を表示させる。スマートスピーカー５０（操作音声処理プログラム）は、ユーザが画像の保存を指示した場合、該画像をスマートスピーカー５０のＲＯＭ２３に記憶させる。このとき、スマートスピーカー５０（操作音声処理プログラム）は、画像（スタンプ）と、該画像（スタンプ）に対応する設定値とを紐づけてＲＯＭ２３に記憶する。 For example, when the user keeps touching the image (stamp) showing the finished result for a predetermined time (a long press), the smart speaker 50 (operation voice processing program) displays a screen asking whether to save the image. When the user chooses to save it, the smart speaker 50 (operation voice processing program) stores the image in its ROM 23, associating the image (stamp) with the corresponding setting values.

このように記憶した画像は、ユーザの指示によって呼び出すことができる。例えば、スマートスピーカー５０（操作音声処理プログラム）は、図２８の左下に示すアイコンＩ１をユーザがタッチすると、予め記憶した画像の一覧を表示する。スマートスピーカー５０（操作音声処理プログラム）は、該一覧から所望の画像をユーザが指定した場合、図２８に示すように該画像が（１７：００での表示のように）タッチパネル２７に表示する。これにより、該画像に対応する設定値を、今回のジョブ設定値として反映させることができる。 Images stored in this way can be recalled on the user's instruction. For example, when the user touches the icon I1 at the lower left of FIG. 28, the smart speaker 50 (operation voice processing program) displays a list of stored images. When the user picks an image from the list, the smart speaker 50 displays it on the touch panel 27, as shown in FIG. 28 (like the display at 17:00), and the setting values corresponding to that image are applied as the setting values of the current job.
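The save-and-recall behavior described above can be sketched as a small in-memory store. The names `StampStore` and `SavedStamp`, the one-second `LONG_PRESS_SECONDS` threshold, and the integer IDs are assumptions for illustration only; the document stores the data in ROM 23 and does not specify the predetermined press time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

LONG_PRESS_SECONDS = 1.0  # hypothetical value for the "predetermined time"


@dataclass
class SavedStamp:
    stamp_id: int
    image: str              # finished-image stamp (a file name here, for simplicity)
    intent: str
    params: Dict[str, str]


@dataclass
class StampStore:
    items: List[SavedStamp] = field(default_factory=list)

    def on_press(self, image: str, intent: str, params: Dict[str, str],
                 press_seconds: float, user_confirms_save: bool) -> Optional[SavedStamp]:
        """Save the stamp with its settings after a long press and a 'save' answer."""
        if press_seconds < LONG_PRESS_SECONDS or not user_confirms_save:
            return None
        item = SavedStamp(len(self.items) + 1, image, intent, dict(params))
        self.items.append(item)
        return item

    def list_images(self) -> List[str]:
        """Images listed when the user touches icon I1."""
        return [saved.image for saved in self.items]

    def select(self, stamp_id: int) -> Optional[SavedStamp]:
        """Return the saved stamp whose settings become the current job settings."""
        return next((saved for saved in self.items if saved.stamp_id == stamp_id), None)
```

Copying `params` on save keeps a later edit to the live job settings from silently changing a stored stamp.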

なお、以上では、スマートスピーカー５０側に画像（スタンプ）と、該画像に紐づくアクション及びパラメータを記憶する場合を例に説明したが、これに限定されず、クラウドサービス装置６０側に画像（スタンプ）と、該画像に紐づくインテント及びパラメータを記憶しても良い。これにより、過去のジョブ実行時にフィードバックされた画像を、ユーザがタッチパネル２７をタッチすることで選択した場合、該画像に対応する設定値を今回のジョブの設定値として反映させることができる。 The above describes, as an example, the case where the images (stamps) and their associated actions and parameters are stored on the smart speaker 50 side. Alternatively, the images (stamps) and their associated intents and parameters may be stored on the cloud service device 60 side. In this case too, when the user selects an image fed back during a past job by touching the touch panel 27, the setting values corresponding to that image can be applied as the setting values of the current job.

スマートスピーカー５０のタッチパネル２７の画面上で画像（スタンプ）が選択された場合、スマートスピーカー５０はいずれの画像が選択されたかをクラウドサービス装置６０へ通知する。例えば、画像のＩＤ情報などをクラウドサービス装置６０へ通知しても良い。 When an image (stamp) is selected on the screen of the touch panel 27 of the smart speaker 50, the smart speaker 50 notifies the cloud service device 60 of which image was selected, for example by sending the image's ID information to the cloud service device 60.

クラウドサービス装置６０の操作音声変換プログラムは、いずれの画像が選択されたかを示す情報に基づいて、該画像に紐づくインテント及びパラメータ（又はテキストデータ）をクラウドサービス装置６０におけるＨＤＤ４４などの記憶部から読み出す。つまり、クラウドサービス装置６０の記憶部には、画像とインテント及びパラメータとが紐づいて管理されている。これにより、操作音声変換プログラムは、画像と紐づくインテント及びパラメータを管理プログラムに対して送信することができる。 Based on the information indicating which image was selected, the operation voice conversion program of the cloud service device 60 reads the intent and parameters (or text data) associated with that image from a storage unit such as the HDD 44 of the cloud service device 60. That is, the storage unit of the cloud service device 60 manages images in association with intents and parameters. The operation voice conversion program can therefore send the intent and parameters associated with the image to the management program.
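On the cloud side, the lookup from a selected image's ID to its intent and parameters might look like the following minimal Python sketch. The registry contents, the ID format `stamp-001`, and the `send_to_management` callback standing in for the hand-off to the management program are hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

Interpretation = Dict[str, Any]

# Hypothetical stand-in for the storage unit (e.g. HDD 44): image ID -> (intent, parameters).
STAMP_REGISTRY: Dict[str, Tuple[str, Dict[str, str]]] = {
    "stamp-001": ("Copy_execute", {"color_mode": "color", "print_side": "duplex",
                                   "opening": "top_bottom", "staple": "top_2"}),
}


def on_stamp_selected(image_id: str,
                      send_to_management: Callable[[Interpretation], None]) -> bool:
    """Look up the settings linked to the selected image and forward them."""
    entry = STAMP_REGISTRY.get(image_id)
    if entry is None:
        return False  # unknown ID: nothing to forward
    intent, params = entry
    send_to_management({"intent": intent, "params": dict(params)})
    return True
```

Only the image ID travels from the smart speaker, so the settings themselves never need to be kept on the speaker in this variant.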

最後に、上述の各実施の形態は、一例として提示したものであり、本発明の範囲を限定することは意図していない。この新規な実施の形態は、その他の様々な形態で実施されることが可能であり、発明の要旨を逸脱しない範囲で、種々の省略、置き換え、変更を行うことも可能である。 Finally, each embodiment described above is presented as an example and is not intended to limit the scope of the present invention. This novel embodiment can be embodied in various other forms, and various omissions, replacements, and modifications can be made without departing from the scope of the invention.

このような各実施の形態及び各実施の形態の変形は、発明の範囲や要旨に含まれると共に、特許請求の範囲に記載された発明とその均等の範囲に含まれるものである。 These embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

なお、上記実施の形態では、本発明の画像形成装置を、コピー機能、プリンタ機能、スキャナ機能およびファクシミリ機能のうち少なくとも２つの機能を有する複合機に適用した例を挙げて説明するが、複写機、プリンタ、スキャナ装置、ファクシミリ装置等の画像形成装置であればいずれにも適用することができる。 In the above embodiments, the image forming apparatus of the present invention has been described as applied to a multifunction peripheral having at least two of a copy function, a printer function, a scanner function, and a facsimile function, but it can be applied to any image forming apparatus, such as a copier, a printer, a scanner, or a facsimile machine.

２、５０情報処理装置 2, 50 information processing device
１外部装置 1 external device
５１、２１１取得部 51, 211 acquisition unit
５２、２１２出力部 52, 212 output unit
５５、２１３報知部 55, 213 notification unit
６２音声認識部 62 voice recognition unit

特開２０１４－２０３０２４号公報 JP 2014-203024 A

Claims

An information processing system including an information processing device and an external device, the information processing system comprising:
an acquisition unit that acquires voice information including a setting instruction for operating the external device;
a voice recognition unit that recognizes the voice information;
a notification unit that presents, on a screen of the information processing device, operation information based on a result of recognition of the voice information by the voice recognition unit; and
an output unit that outputs the operation information to the external device,
wherein the notification unit displays, on the screen of the information processing device, a finished image showing the finished result based on settings related to the operation information, and, when a predetermined operation is performed on the finished image, stores the corresponding settings together with the finished image.

The information processing system according to claim 1, wherein the notification unit displays a list of the stored finished images on the screen of the information processing device.

A program for causing a computer that controls an information processing device to function as:
an acquisition unit that acquires voice information including a setting instruction for operating an external device;
a notification unit that presents, on a screen of the information processing device, operation information based on a result of recognition of the voice information by a voice recognition unit that recognizes the voice information; and
an output unit that outputs the operation information to the external device,
wherein the notification unit displays, on the screen of the information processing device, a finished image showing the finished result based on settings related to the operation information, and, when a predetermined operation is performed on the finished image, stores the corresponding settings together with the finished image.

An information processing method in an information processing system including an information processing device and an external device, the method comprising:
an acquisition step of acquiring voice information including a setting instruction for operating the external device;
a voice recognition step of recognizing the voice information;
a notification step of presenting, on a screen of the information processing device, operation information based on a result of recognition of the voice information in the voice recognition step; and
an output step of outputting the operation information to the external device,
wherein the notification step displays, on the screen of the information processing device, a finished image showing the finished result based on settings related to the operation information, and, when a predetermined operation is performed on the finished image, stores the corresponding settings together with the finished image.