JP2021113899A - Information processing system, information processing method, and program

Info

Publication number: JP2021113899A
Application number: JP2020006467A
Authority: JP
Inventors: Yutaka Nakamura; Keisuke Terasaki
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2020-01-20
Filing date: 2020-01-20
Publication date: 2021-08-05

Abstract

To simplify the voice operation needed to execute jobs when an image forming device is made to execute a plurality of successive jobs by voice. SOLUTION: An information processing device receives first voice data transmitted at a first timing by a voice acquisition device that collects voice to obtain voice data, converts the first voice data into a reading instruction for reading an original based on predetermined reading conditions, and transmits the reading instruction to an image reading device that reads an image of the original at least once. The information processing device transmits the reading instruction to the image reading device again when second voice data received at a second timing after the first timing has content that permits continued execution of the reading instruction based on the first voice data. SELECTED DRAWING: Figure 8b

Description

The present invention relates to an information processing system, an information processing method, and a program.

AI (Artificial Intelligence) voice assistants that operate devices by voice are now well known. Smart home appliances that can be operated by voice are also known, and the field of voice operation is expected to keep growing.

For example, as one form of voice operation, a system has been disclosed in which a server interprets what is spoken to a terminal device, and an image forming apparatus connected via a network executes a job based on the interpretation result (see Patent Document 1).

However, the invention disclosed in Patent Document 1 neither discloses nor suggests the case where a user wants the image forming apparatus to execute a plurality of consecutive jobs by voice input, so the conditions for executing a job had to be entered and set by voice each time.

The present invention has been made in view of the above problem, and its object is to simplify the voice operation for executing jobs when an image forming apparatus is made to execute a plurality of consecutive jobs by voice.

To solve the above problem and achieve the object, the present invention is an information processing system comprising: a voice acquisition device that collects voice to obtain voice data; an image reading device that reads an image of a document at least once; and an information processing device that receives first voice data transmitted by the voice acquisition device at a first timing, converts the first voice data into a reading instruction for reading the document based on predetermined reading conditions, and transmits the reading instruction based on the predetermined reading conditions to the image reading device, wherein the information processing device retransmits the reading instruction to the image reading device when second voice data received at a second timing later than the first timing has content that permits continued execution of the reading instruction based on the first voice data.

According to the embodiments of the present invention, when an image forming apparatus is made to execute a plurality of consecutive jobs by voice, the voice operation for executing the jobs can be simplified.

A diagram showing an example of the configuration of the voice operation system according to the present embodiment.
A diagram showing an example of the hardware configuration of the smart speaker.
A diagram showing an example of the hardware configuration of the voice recognition server device.
A diagram showing an example of the hardware configuration of the AI assistant server device.
An example of various table data.
A diagram showing an example of the hardware configuration of the MFP.
A diagram showing an example of the functional blocks of the devices constituting the voice operation system.
A sequence diagram showing an example of reading processing based on a user's utterance in the first embodiment.
A sequence diagram showing an example of reading processing based on a user's utterance in the first embodiment.
A flowchart showing an example of information completion and inquiry processing in the first embodiment.
A flowchart showing an example of conversion and transmission of the reading instruction in the first embodiment.
A sequence diagram showing an example of reading processing based on a user's utterance in the second embodiment.
A sequence diagram showing an example of reading processing based on a user's utterance in the second embodiment.
A flowchart showing an example of reading instruction execution processing in the second embodiment.

A voice operation system is described below as an application example of the information processing system, the information processing method, and the program.

[System overview]
FIG. 1 is a diagram showing an example of the configuration of the voice operation system according to the present embodiment. As shown in FIG. 1, the voice operation system 1 of the present embodiment is formed by interconnecting one or more smart speakers 2, a voice recognition server device 3, an AI assistant server device 4, and one or more multifunction peripherals 6 (MFP: Multifunction Peripheral; hereinafter also simply called the MFP 6) via a network 7 such as a LAN (Local Area Network).

Here, the voice operation system 1 is an example of an information processing system. The smart speaker 2 is a well-known, remotely operable smart speaker device that collects voice with a built-in microphone to obtain voice data. The smart speaker 2 is a device equipped with artificial intelligence that, in response to voice input, lets the user view or listen to various content such as music and video, weather, and news; it refers, for example, to a speaker with an AI assistant function that supports interactive voice operation. The smart speaker 2 also has a function for remotely controlling various devices, such as lighting and home appliances, by voice.

The smart speaker 2 functions as an example of the voice acquisition device in the voice operation system 1. For example, it accepts a voice operation spoken by a user and, based on the voice data (also called voice information) obtained by the voice operation, executes the processing (hereinafter also called predetermined processing) associated with various instructions (jobs) for the MFP 6, such as an instruction to read a document (hereinafter, a reading instruction). As described above, one or more smart speakers 2 may be provided in the voice operation system 1. The smart speaker 2 transmits the voice data based on the accepted voice operation to the voice recognition server device 3 (or the cloud service device 5) via the network 7. The smart speaker 2 may further have a microphone function, a camera function, and the like for giving feedback to the user in order to supplement the voice data obtained from the user's spoken voice operation.

The voice recognition server device 3 has a function of receiving the voice data obtained by the smart speaker 2 and converting it into text data. The AI assistant server device 4 has a function of processing the voice data obtained by the smart speaker 2 in cooperation with the voice recognition server device 3. The voice recognition server device 3 and the AI assistant server device 4 are connected to each other via the network 7 and together also function as a cloud service device 5. The cloud service device 5, for example, generates a reading instruction and transmits it to the MFP 6.

At least one of the voice recognition server device 3 and the AI assistant server device 4 constituting the cloud service device 5 described above, or both, is an example of the information processing device.

Based on the user's intention as converted by the voice recognition server device 3, the AI assistant server device 4 produces a reading instruction that the MFP 6 can interpret. The AI assistant server device 4 transmits the converted reading instruction and the like to the MFP 6 via the network 7. Here, the reading instruction is generated based on, for example, an instruction to read a document that the user gives to the smart speaker 2 by voice operation (hereinafter, a document reading instruction). The document reading instruction is an example of an information processing request.

The AI assistant server device 4 also includes a management database 401 (hereinafter, the management DB 401) and a linking database 402 (hereinafter, the linking DB 402) in a storage unit such as the HDD 44. The management DB 401 and the linking DB 402 can use, for example, a storage unit such as an HDD that the cloud service device 5 provides on the network 7. Alternatively, one or both of the management DB 401 and the linking DB 402 may be stored in another server device that the cloud service device 5 can access via the network 7.

The management DB 401 stores, for example, text data, image data, voice data, and the like as content (data) provided by the AI assistant server device 4.

The information managed in the management DB 401 can be newly added or changed by, for example, the MFP 6 connected via the network 7. Although FIG. 1 shows the management DB 401 and the MFP 6 as separate units, they may be configured as a server having the same functions. In this case, the management program described later may acquire the various information managed by the management DB 401 by transmitting a reading instruction for the MFP 6 to the management DB 401.

The linking DB 402, on the other hand, stores, for example, a device ID for identifying each smart speaker 2 (voice acquisition device) (hereinafter also simply called the device ID) in association with the equipment ID of the MFP 6 (MFP_#1, MFP_#2, and so on) serving as the information processing device associated with that smart speaker 2. The linking DB 402 is described in detail later.
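The association held in the linking DB can be pictured as a simple lookup from a smart speaker identifier to the MFP identifiers linked to it. The following is a minimal sketch under that assumption only; the dictionary `LINKING_DB`, the speaker identifiers, and the function `mfps_for_speaker` are hypothetical names chosen for illustration and do not appear in the publication.

```python
# Hypothetical in-memory stand-in for the linking DB 402.
# Keys are smart speaker device IDs; values are the IDs of the associated MFPs.
LINKING_DB = {
    "speaker-001": ["MFP_#1"],
    "speaker-002": ["MFP_#1", "MFP_#2"],
}


def mfps_for_speaker(device_id: str) -> list[str]:
    """Return the MFP IDs linked to a smart speaker, or an empty list if none."""
    # Copy the list so callers cannot modify the stored association by accident.
    return list(LINKING_DB.get(device_id, []))
```

A lookup keyed by the speaker's device ID lets a server resolve the target MFP without the user naming it in every utterance, which is the role the text assigns to the linking DB.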

In the present embodiment, the management DB 401 and the linking DB 402 are illustrated as being included in the AI assistant server device 4, but each may be provided separately from the AI assistant server device 4, or one may be included in the AI assistant server device 4 while the other is provided separately.

In the present embodiment, the two server devices, the voice recognition server device 3 and the AI assistant server device 4, are described together as a single cloud service device 5. However, each of the voice recognition server device 3 and the AI assistant server device 4 may be further divided into and implemented by a plurality of server devices.

Furthermore, in the present embodiment, the smart speaker 2 or the MFP 6 may have some or all of the functions of the cloud service device 5. When the smart speaker 2 or the MFP 6 has all of the functions of the cloud service device 5, the voice operation system 1 need not include the cloud service device 5. In that case, the smart speaker 2 may communicate with the MFP 6 without going through the cloud service device 5, and the voice operation system 1 may form an input response system 8 that combines the smart speaker 2 and the MFP 6.

The cloud service device 5 has been described above as including the voice recognition server device 3 and the AI assistant server device 4, but the AI assistant server device 4 may have some or all of the functions of the voice recognition server device 3, and the voice recognition server device 3 may have some or all of the functions of the AI assistant server device 4. In other words, the voice recognition server device 3 and the AI assistant server device 4 may complement each other's functions. The cloud service device 5 may also consist of a single server or of three or more servers.

With the configurations described above, in the voice operation system 1 the smart speaker 2 collects the voice that the user utters concerning the reading process, obtains voice data, and transmits the voice data to the cloud service device 5. The cloud service device 5 generates a reading instruction based on the voice data received from the smart speaker 2 and transmits the generated reading instruction to the MFP 6 via the network 7. The MFP 6 that receives the reading instruction then executes it. Here, the MFP 6 is an example of an image reading device. The network 7 described above may be either a wired LAN or a wireless LAN.
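The flow from voice data to an executed reading instruction can be sketched as follows. This is only an illustration under stated assumptions: the classes `ReadingInstruction`, `Mfp`, and `CloudService` are hypothetical, speech recognition is replaced by decoding the bytes as UTF-8 text, and the only reading condition derived is a color mode.

```python
from dataclasses import dataclass, field


@dataclass
class ReadingInstruction:
    """Hypothetical reading instruction: a target MFP plus reading conditions."""
    mfp_id: str
    conditions: dict


@dataclass
class Mfp:
    """Stand-in for the image reading device; it only records what it is asked to do."""
    mfp_id: str
    executed: list = field(default_factory=list)

    def execute(self, instruction: ReadingInstruction) -> None:
        self.executed.append(instruction)


class CloudService:
    """Stand-in for the cloud service: voice data in, reading instruction out."""

    def __init__(self, mfps):
        self.mfps = {m.mfp_id: m for m in mfps}

    def recognize(self, voice_data: bytes) -> str:
        # Assumption: real speech recognition is replaced by UTF-8 decoding.
        return voice_data.decode("utf-8")

    def handle_voice_data(self, voice_data: bytes, mfp_id: str) -> ReadingInstruction:
        text = self.recognize(voice_data).lower()
        # Assumption: a single, simplified reading condition.
        conditions = {"color": "monochrome" if "monochrome" in text else "color"}
        instruction = ReadingInstruction(mfp_id, conditions)
        self.mfps[mfp_id].execute(instruction)  # transmit to the MFP, which executes it
        return instruction
```

For example, `CloudService([Mfp("MFP_#1")]).handle_voice_data(b"scan in monochrome", "MFP_#1")` produces an instruction whose conditions are `{"color": "monochrome"}` and records it on the MFP stand-in.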

[Hardware configuration]
Next, the hardware configurations of the smart speaker 2, the voice recognition server device 3, the AI assistant server device 4, and the MFP 6 of the present embodiment are described in detail with reference to FIGS. 2 to 6.

<Hardware configuration of the smart speaker>
FIG. 2 is a diagram showing an example of the hardware configuration of the smart speaker. In the smart speaker 2, which is an example of the voice acquisition device, hardware resources including a CPU 21, a RAM 22, a ROM 23, an interface unit (I/F unit) 24, and a communication unit 25 are interconnected via an internal bus 26, as shown in FIG. 2.

The CPU 21 is a control device that performs overall control of the smart speaker 2.

The RAM 22 serves as a work area into which, for example, the various programs stored in the ROM 23 and elsewhere are loaded and in which the CPU 21 executes various processing.

The ROM 23 stores the data constituting various programs, including the operation voice processing program. By executing these processing programs, the CPU 21 enables processing on the MFP 6 by voice operation. The CPU 21 also controls the display of data acquired from the cloud service device 5 on the touch panel 27, voice output for feedback via the speaker unit 28, image output, and so on.

A touch panel 27, a speaker unit 28, a microphone unit 29, and an imaging unit (camera unit) 30 are connected to the I/F unit 24.

The communication unit 25 transmits the information obtained by the user's voice operation to the voice recognition server device 3 via the network 7. When communicating with other devices via the network 7, the communication unit 25 can use either wired or wireless communication.

The internal bus 26 is a general-purpose bus that connects the CPU 21, the RAM 22, the ROM 23, the I/F unit 24, and the communication unit 25. Any type of bus may be used for the internal bus 26 as long as it is commonly used in general-purpose devices such as smart speakers.

The touch panel 27 is, for example, a liquid crystal display (LCD) formed integrally with a touch sensor. A desired operation is specified on the touch panel 27 when the user touches a touch key or similar element arranged on the liquid crystal display.

The speaker unit 28 gives the user spoken feedback, for example to prompt the user to supply missing information.

The microphone unit 29 acquires the voice data given by the user's speech, for example in order to have the MFP 6 read a document by voice operation. The acquired voice data is transmitted to the voice recognition server device 3 via the communication unit 25 and converted into text data by the voice recognition server device 3.

The imaging unit (camera unit) 30 captures images of the user of the smart speaker 2 and of other subjects. The captured images are transmitted to the voice recognition server device 3 via the communication unit 25 as moving image data or still image data (hereinafter simply called image data).

<Hardware configuration of the voice recognition server device>
FIG. 3 is a diagram showing an example of the hardware configuration of the voice recognition server device.
As shown in FIG. 3, in the voice recognition server device 3, hardware resources including a CPU 31, a RAM 32, a ROM 33, an HDD (Hard Disk Drive) 34, an interface unit (I/F unit) 35, and a communication unit 36 are interconnected via an internal bus 37. A display unit 38 and an operation unit 39 are connected to the I/F unit 35.

The HDD 34 stores the data constituting the operation voice conversion program described below. The operation voice conversion program, for example, converts the voice data received from the smart speaker 2 into text data. It then determines whether the converted text data matches predefined dictionary information. If the text data matches the dictionary information, the program converts it into an intent (Intent) indicating the user's intention and parameters indicating variables such as the execution conditions of the predetermined processing. The program then transmits the intent and the parameters to the AI assistant server device 4.
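The dictionary matching step described above can be sketched as a small function that turns recognized text into an intent and parameters. This is a simplified illustration, not the program described in the publication: the intent names (`SCAN_EXECUTE`, `SCAN_CONTINUE`), the regular expressions standing in for dictionary information, and the function `interpret` are all assumptions.

```python
import re
from typing import Optional

# Hypothetical dictionary information: intent name -> pattern that signals it.
INTENT_PATTERNS = {
    "SCAN_EXECUTE": re.compile(r"\b(scan|read)\b"),
    "SCAN_CONTINUE": re.compile(r"\b(next|continue)\b"),
}

# Hypothetical parameter patterns: parameter name -> pattern with one capture group.
PARAMETER_PATTERNS = {
    "color": re.compile(r"\b(color|monochrome)\b"),
    "resolution": re.compile(r"\b(\d+)\s*dpi\b"),
}


def interpret(text: str) -> Optional[dict]:
    """Return {"intent": ..., "parameters": {...}}, or None if nothing matches."""
    lowered = text.lower()
    # The first intent whose pattern matches wins (dict order is insertion order).
    intent = next(
        (name for name, pattern in INTENT_PATTERNS.items() if pattern.search(lowered)),
        None,
    )
    if intent is None:
        return None  # the text does not match the dictionary information
    parameters = {}
    for name, pattern in PARAMETER_PATTERNS.items():
        match = pattern.search(lowered)
        if match:
            parameters[name] = match.group(1)
    return {"intent": intent, "parameters": parameters}
```

Returning `None` for unmatched text keeps the "no match" case explicit, so a caller can decide whether to ask the user for clarification.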

The CPU 31 executes various programs, including the operation voice conversion program described above. In other words, the voice recognition server device 3 functions as a device that receives voice data and analyzes the voice. The operation voice conversion program, the operation image conversion program, and the voice assistant program may be executed by a single server device or by separate server devices. These programs may also be executed by a plurality of server devices working together.

The RAM 32 serves as a work area into which, for example, the various programs stored in a storage unit such as the ROM 33 are loaded and in which the CPU 31 executes various processing.

The ROM 33 stores the data constituting programs other than the various programs stored in the HDD 34. By executing the programs stored in the ROM 33, the CPU 31 may perform control between the smart speaker 2 and the AI assistant server device 4.

The display unit 38 and the operation unit 39 are connected to the I/F unit 35.

The communication unit 36 receives from the smart speaker 2 the voice data obtained by the voice operation that accompanies the user's utterance. When communicating with other devices via the network 7, the communication unit 36 can use either wired or wireless communication.

The internal bus 37 is a general-purpose bus that connects the CPU 31, the RAM 32, the ROM 33, the HDD 34, the I/F unit 35, and the communication unit 36. Any type of bus may be used for the internal bus 37 as long as it allows the voice recognition server device 3 to function as a server device.

The display unit 38 consists of, for example, a liquid crystal display (LCD) and displays, for example, the various states of the voice recognition server device 3.

The operation unit 39 is, for example, a so-called touch panel in which a liquid crystal display and a touch sensor are formed integrally. To issue an execution command for a desired operation using the operation unit 39, the operator (user) specifies the operation by touching an operation button (software key) or the like displayed on the operation unit 39.

The operation voice processing program may be provided as a file in an installable or executable format recorded on a computer-readable recording medium such as a CD-ROM or a flexible disk (FD). It may also be provided recorded on a computer-readable recording medium such as a CD-R, a DVD (Digital Versatile Disk), a Blu-ray Disc (registered trademark), or a semiconductor memory. It may further be provided by installation via a network such as the Internet, or provided pre-installed in a storage unit such as the ROM of the voice recognition server device 3.

<Hardware configuration of the AI assistant server device>
FIG. 4 is a diagram showing an example of the hardware configuration of the AI assistant server device. In the AI assistant server device 4, hardware resources including a CPU 41, a RAM 42, a ROM 43, an HDD 44, an interface unit (I/F unit) 45, and a communication unit 46 are interconnected via an internal bus 47. A display unit 48 and an operation unit 49 are connected to the I/F unit 45.

The AI storage unit 40 of the HDD 44 stores dictionary information for interpreting the reading instruction that the user gives by voice input. This dictionary information includes entity (Entity) information, action (Action) information, and intent information, which are described later. The HDD 44 also stores the user management table 402a and the device management table 402b described below. These tables are given predetermined setting values in advance, but entries may be added and changed as appropriate. An outline of the user management table 402a and the device management table 402b follows.

(Various tables)
FIG. 5 shows examples of the various tables. In a storage unit such as the HDD 44 of the AI assistant server device 4, the linking DB 402 is built from the user management table 402a shown in FIG. 5(a), the device management table 402b shown in FIG. 5(b), and the instruction management table 402c shown in FIG. 5(c). The instruction management table 402c may, however, be stored in the MFP 6. In that case the instruction management table 402c is used by only one MFP 6, so it need not include the image reading device name or the apparatus ID of the image reading device. The user management table 402a manages information including a user name and a user ID in association with each device ID of a voice acquisition device. The device management table 402b manages, for each voice acquisition device name or device ID of a voice acquisition device, various information including the device name of the MFP 6 (image reading device), an apparatus ID for identifying the image reading device (hereinafter also simply called the apparatus ID), and connection information for the image reading device. The instruction management table 402c manages, for each image reading device name or apparatus ID, the document size, file format, resolution, color/monochrome, single page/multipage, destination, and continuous processing flag.
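The three tables can be pictured as record types keyed by the identifiers named in the text. The sketch below is one possible modeling, not the layout of FIG. 5: all class names, field names, field types, and the helper `find_reader` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserRecord:
    """One row of the user management table: speaker ID -> user."""
    speaker_device_id: str
    user_name: str
    user_id: str


@dataclass
class DeviceRecord:
    """One row of the device management table: speaker ID -> image reading device."""
    speaker_device_id: str
    reader_name: str
    reader_id: str
    connection_info: str  # for example, address information


@dataclass
class InstructionRecord:
    """One row of the instruction management table: reading conditions per reader."""
    reader_id: str
    document_size: str
    file_format: str
    resolution_dpi: int
    color_mode: str       # "color" or "monochrome"
    page_mode: str        # "single" or "multi"
    destination: str
    continuous_flag: int = 0  # initial value 0, as the text allows


def find_reader(devices: list, speaker_device_id: str) -> Optional[DeviceRecord]:
    """Return the first image reading device linked to the given speaker, if any."""
    for record in devices:
        if record.speaker_device_id == speaker_device_id:
            return record
    return None
```

Keeping the reading conditions in their own record, separate from the speaker-to-reader link, mirrors the note in the text that the instruction management table may live on the MFP itself.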

The device ID of the voice acquisition device used in the user management table 402a and the device management table 402b is, as described above, an example of identification information for identifying the voice acquisition device, here the smart speaker 2. Instead of or in addition to the device ID of the voice acquisition device, a voice acquisition device name indicating the name of the voice acquisition device may be managed.

また、装置管理テーブル４０２ｂで管理される画像読取装置名は、ユーザが使用するスマートスピーカ２に対する発話によって読取命令等が実行される画像読取装置の装置名である。この画像読取装置名には、上述したＭＦＰ６、単体で稼働するスキャナ等の装置名が与えられる。 The image reading device name managed in the device management table 402b is the name of the image reading device that executes a reading command or the like in response to an utterance to the smart speaker 2 used by the user. The image reading device name is, for example, the name of the above-mentioned MFP 6 or of a standalone scanner.

一方、装置ＩＤは、画像読取装置を識別するための装置識別情報の一例である。また、装置ＩＤは、ＭＦＰ６を識別するための情報である。 On the other hand, the device ID of the image reading device is an example of device identification information for identifying the image reading device. It is also the information that identifies the MFP 6.

また、ＨＤＤ４４には画像読取装置（ＭＦＰ）毎に接続情報が割り振られて記憶されている。ここで、接続情報はそれぞれのＭＦＰと通信接続するために必要な情報であり、例えばアドレス情報が与えられる。 Further, connection information is assigned and stored in the HDD 44 for each image reading device (MFP). Here, the connection information is information necessary for communicating with each MFP, and for example, address information is given.

なお、装置管理テーブル４０２ｂは、未登録の新たな使用者のユーザＩＤ及びその使用者が使用する音声取得装置のデバイスＩＤ並びにその使用者が指定した装置ＩＤをそれぞれ関連付けて、新たに追加登録されるようにしてもよい。 Note that a new entry may be additionally registered in the device management table 402b by associating the user ID of a new, unregistered user, the device ID of the voice acquisition device used by that user, and the device ID of the image reading device specified by that user.

さらに、命令管理テーブル４０２ｃの連続処理フラグは、原稿が複数ページからなる書籍等の場合に、ユーザが発する所定の発話内容に応じて、原稿の読取り処理を継続するか否かを判断するためのフラグとして管理される。この連続処理フラグは、ユーザが最初の原稿の読取りを指定した後、「次」、「続けて」等の発話内容が検出された場合に、例えば、『１』の値が設定されて管理される。一方、原稿が１枚だけの場合では、ユーザから次の原稿の読取りを示唆する発話はされないため、この連続処理フラグは、例えば、『０』の値が設定されて管理される。なお、連続処理フラグは、初期設定値として『０』が与えられてもよい。 Further, the continuous processing flag in the instruction management table 402c is managed as a flag for determining, when the document is a book or the like consisting of a plurality of pages, whether to continue the document reading process according to predetermined utterance content from the user. For example, the flag is set to "1" when utterance content such as "next" or "continue" is detected after the user has specified reading of the first document. On the other hand, when there is only one document, the user makes no utterance suggesting that a next document should be read, so the flag is set to, for example, "0". The continuous processing flag may be given "0" as its initial value.

なお、ユーザから発話された「終了」、「以上」又は「これで最後」等の発話内容に応じて複数ページからなる原稿の最終ページの読取りが完了した場合、又は１ページのみの原稿の読取りが完了した場合には、命令管理テーブル４０２ｃの命令は削除される。ただし、連続処理フラグが『１』の場合は『０』に変更又は設定され、『０』の場合はその値が維持されるようにしてもよい。この連続処理フラグに係る設定処理については、後ほど詳細に説明する。 Note that the instruction in the instruction management table 402c is deleted when reading of the final page of a multi-page document is completed in response to utterance content from the user such as "end", "that's all", or "this is the last one", or when reading of a single-page document is completed. However, the continuous processing flag may be changed or set to "0" if it is "1", and its value may be maintained if it is "0". The setting process for the continuous processing flag is described in detail later.
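
The flag handling described in the two preceding paragraphs can be sketched as follows. This is a simplified illustration under stated assumptions: the table is modeled as a plain dictionary keyed by a hypothetical image reading device ID, the English keyword sets are stand-ins for the utterances named in the text, and the end-of-reading handling is applied as soon as the utterance is processed rather than after the final page has actually been read.

```python
# Hypothetical keyword sets standing in for the utterances named in the text.
CONTINUE_WORDS = {"next", "continue"}
END_WORDS = {"end", "that's all", "this is the last one"}


def apply_utterance(table: dict, reader_id: str, utterance: str,
                    delete_on_end: bool = True) -> None:
    """Update the instruction entry for reader_id according to the utterance.

    table maps an image reading device ID to a dict of reading parameters
    that includes a "continuous_flag" key (an assumed layout).
    """
    entry = table.get(reader_id)
    if entry is None:
        return  # nothing registered for this device
    text = utterance.strip().lower()
    if text in CONTINUE_WORDS:
        entry["continuous_flag"] = 1  # keep reading the next page
    elif text in END_WORDS:
        if delete_on_end:
            del table[reader_id]          # the instruction is deleted
        else:
            entry["continuous_flag"] = 0  # variant: reset the flag only


table = {"mfp-001": {"file_format": "PDF", "continuous_flag": 0}}
apply_utterance(table, "mfp-001", "Next")
flag_after_next = table["mfp-001"]["continuous_flag"]
apply_utterance(table, "mfp-001", "end", delete_on_end=False)
flag_after_reset = table["mfp-001"]["continuous_flag"]
apply_utterance(table, "mfp-001", "that's all")
```

The `delete_on_end` switch covers both behaviors the text allows: deleting the instruction, or keeping it with the flag returned to 0.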

さらに、命令管理テーブル４０２ｃも同様に、未登録の新たなユーザのユーザＩＤ及びそのユーザが使用する音声取得装置のデバイスＩＤ並びにそのユーザが指定した装置ＩＤをそれぞれ関連付けて、新たに追加登録されるようにしてもよい。例えば、ＡＩアシスタントサーバ装置４は、ユーザの発話に基づいて命令を生成するタイミングやＭＦＰ６に対して読取命令を送信するタイミングにおいて、命令管理テーブル４０２ｃに命令が含まれているか否かを確認し、含まれていない場合は命令管理テーブル４０２ｃに登録することができる。このとき、命令を送信する対象となるＭＦＰ６を特定するための情報として画像読取装置名又は画像読取装置の装置ＩＤと、ユーザによって指定された各種パラメータとを関連付けて、連続処理フラグは０として登録する。 Further, a new entry may likewise be additionally registered in the instruction management table 402c by associating the user ID of a new, unregistered user, the device ID of the voice acquisition device used by that user, and the device ID of the image reading device specified by that user. For example, at the timing of generating an instruction based on the user's utterance, or at the timing of transmitting a reading instruction to the MFP 6, the AI assistant server device 4 can check whether the instruction is included in the instruction management table 402c and, if it is not, register it there. At this time, the image reading device name or the device ID of the image reading device, which serves as information for identifying the MFP 6 to which the instruction is to be transmitted, is associated with the various parameters specified by the user, and the entry is registered with the continuous processing flag set to 0.
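
The check-then-register step described above might look like the following sketch. The dictionary layout, the function name `ensure_registered`, and the parameter names are assumptions made for illustration; the publication does not specify how the table is stored.

```python
def ensure_registered(table: dict, reader_id: str, params: dict) -> dict:
    """Return the instruction entry for reader_id, registering it if absent.

    A newly registered entry combines the user-specified parameters with a
    continuous processing flag of 0. An existing entry is left untouched.
    """
    if reader_id not in table:
        table[reader_id] = {**params, "continuous_flag": 0}
    return table[reader_id]


instructions = {}
first = ensure_registered(
    instructions, "mfp-001", {"document_size": "A4", "file_format": "PDF"}
)
first["continuous_flag"] = 1  # e.g. the user later said "next"
second = ensure_registered(
    instructions, "mfp-001", {"document_size": "A3", "file_format": "TIFF"}
)
```

Because the second call finds an existing entry, it returns that entry unchanged, so a flag raised during a multi-page scan is not reset by a repeated check.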

ＣＰＵ４１は、音声認識サーバ装置３で生成（変換）された解釈結果をＭＦＰ６に対する読取命令等のデータに変換してネットワーク７を介してＭＦＰ６に送信する。なお、ユーザから指示された意図は、例えば、ＭＦＰ６への読取命令及び各種命令のための指示を含む。このようにして、スマートスピーカ２で取得された音声データにより、ＭＦＰ６を操作することができる。 The CPU 41 converts the interpretation result generated (converted) by the voice recognition server device 3 into data such as a reading instruction for the MFP 6 and transmits the data to the MFP 6 via the network 7. The intention indicated by the user includes, for example, a reading instruction for the MFP 6 and instructions for various other commands. In this way, the MFP 6 can be operated by the voice data acquired by the smart speaker 2.

ＲＡＭ４２は、例えば、ＨＤＤ４４等の記憶部に記憶された各種プログラムがダウンロードされ、ＣＰＵ４１によって各種処理が実行されるワークエリアとしての機能を有する。 The RAM 42 has a function as a work area where various programs stored in a storage unit such as the HDD 44 are downloaded and various processes are executed by the CPU 41, for example.

ＲＯＭ４３には、例えば、ＨＤＤ４４に記憶されたプログラム以外の各種プログラムを構成するデータが記憶されている。 The ROM 43 stores, for example, data constituting various programs other than the programs stored in the HDD 44.

ＨＤＤ４４には、上述したように管理ＤＢ４０１及び紐づけ用ＤＢ４０２が構築されている。管理ＤＢ４０１には、例えば、ＡＩアシスタントサーバ装置４がクラウドサービス装置５として提供するコンテンツを示すテキストデータ、画像データ及び音声データ等が記憶されている。また、紐づけ用ＤＢ４０２には、例えば、スマートスピーカ２が複数用いられることを想定して、以下の情報が記憶されている。その情報とは、例えば、各スマートスピーカ２を特定する各デバイスＩＤと、各スマートスピーカ２への音声操作によって読取命令等が実行されるＭＦＰ６の装置ＩＤとが関連付けられた情報である。すなわち、紐づけ用ＤＢ４０２には、各スマートスピーカ２に対する音声操作により使用可能なＭＦＰ６を特定できるように、各スマートスピーカ２のデバイスＩＤとＭＦＰ６の機器ＩＤとが関連付けられて装置管理テーブル４０２ｂとして記憶されている。 As described above, the management DB 401 and the associating DB 402 are constructed in the HDD 44. The management DB 401 stores, for example, text data, image data, voice data, and the like representing the content that the AI assistant server device 4 provides as the cloud service device 5. The associating DB 402 stores the following information on the assumption that, for example, a plurality of smart speakers 2 are used. This information associates the device ID that identifies each smart speaker 2 with the device ID of the MFP 6 that executes a reading command or the like in response to a voice operation on that smart speaker 2. That is, the device ID of each smart speaker 2 and the device ID of the MFP 6 are associated with each other and stored in the associating DB 402 as the device management table 402b, so that the MFP 6 usable by voice operation on each smart speaker 2 can be identified.
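
To make the association concrete, the lookup from a smart speaker's device ID to the MFP it can operate could be sketched as below. The identifiers, the `Link` type, and the in-memory dictionary are hypothetical; the address is taken from a range reserved for documentation.

```python
from typing import NamedTuple, Optional


class Link(NamedTuple):
    """Association kept per smart speaker (an assumed layout)."""
    reader_device_id: str  # device ID of the MFP (image reading device)
    connection_info: str   # e.g. address information for reaching the MFP


# Several smart speakers may be associated with the same MFP.
DEVICE_TABLE = {
    "speaker-01": Link("mfp-001", "192.0.2.10"),
    "speaker-02": Link("mfp-001", "192.0.2.10"),
}


def resolve_reader(speaker_device_id: str) -> Optional[Link]:
    """Return the MFP associated with the speaker, or None if unregistered."""
    return DEVICE_TABLE.get(speaker_device_id)


link = resolve_reader("speaker-02")
unknown = resolve_reader("speaker-99")
```

Returning `None` for an unregistered speaker leaves the caller free to decide how to respond, for example by prompting registration.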

Ｉ／Ｆ部４５には、表示部４８及び操作部４９が接続される。 A display unit 48 and an operation unit 49 are connected to the I / F unit 45.

通信部４６は、音声認識サーバ装置３及びＭＦＰ６に対するデータの送受信を、ネットワーク７を介して行う。また、通信部４６は、ネットワーク７を介して他の装置と通信を行う際、有線、無線いずれの通信形態でも通信を行うことが可能である。 The communication unit 46 transmits / receives data to / from the voice recognition server device 3 and the MFP 6 via the network 7. Further, when communicating with another device via the network 7, the communication unit 46 can perform communication in either a wired or wireless communication form.

内部バス４７は、ＣＰＵ４１、ＲＡＭ４２、ＲＯＭ４３、ＨＤＤ４４、Ｉ／Ｆ部４５及び通信部４６を接続する汎用バスである。この内部バス４７は、ＡＩアシスタントサーバ装置４が情報処理装置の機能を実現するものであれば、その種類は問わない。 The internal bus 47 is a general-purpose bus that connects the CPU 41, the RAM 42, the ROM 43, the HDD 44, the I / F unit 45, and the communication unit 46. The type of the internal bus 47 does not matter as long as the AI assistant server device 4 realizes the function of the information processing device.

表示部４８は、例えば、液晶表示部（ＬＣＤ：Liquid Crystal Display）で構成され、例えば、ＡＩアシスタントサーバ装置４の各種状態を表示する。 The display unit 48 is composed of, for example, a liquid crystal display (LCD), and displays, for example, various states of the AI assistant server device 4.

操作部４９は、例えば、液晶表示部とタッチセンサとが一体的に形成された、いわゆるタッチパネルである。操作者（ユーザ）は、操作部３９を用いて所望の動作の実行命令を行う場合、操作部４９に表示された操作ボタン（ソフトウェアキー）等を接触操作することで、所望の動作を指定する。 The operation unit 49 is, for example, a so-called touch panel in which a liquid crystal display unit and a touch sensor are integrally formed. When issuing an execution command for a desired operation using the operation unit 49, the operator (user) specifies the desired operation by touching an operation button (software key) or the like displayed on the operation unit 49.

＜クラウドサービス装置のハードウェア構成＞ <Hardware configuration of cloud service device>
クラウドサービス装置５は、上述したように、例えば、音声認識サーバ装置３及びＡＩアシスタントサーバ装置４を纏めたもので、スマートスピーカ２及びＭＦＰ６とそれぞれネットワーク７を介して接続される。クラウドサービス装置５を構成するハードウェア構成は、音声認識サーバ装置３及びＡＩアシスタントサーバ装置４で説明したとおりである。 As described above, the cloud service device 5 is, for example, a combination of the voice recognition server device 3 and the AI assistant server device 4, and is connected to the smart speaker 2 and the MFP 6 via the network 7, respectively. The hardware configuration of the cloud service device 5 is as described for the voice recognition server device 3 and the AI assistant server device 4.

＜ＭＦＰのハードウェア構成＞ <Hardware configuration of MFP>
図６は、ＭＦＰのハードウェア構成の一例を示す図である。ＭＦＰ６は、コントローラ６００、近距離無線通信回路６２０、エンジン制御部６３０、操作パネル６４０、ネットワークＩ／Ｆ６５０を備えている。 FIG. 6 is a diagram showing an example of the hardware configuration of the MFP. The MFP 6 includes a controller 600, a short-range wireless communication circuit 620, an engine control unit 630, an operation panel 640, and a network I/F 650.

これらのうち、コントローラ６００は、例えば、操作パネル６４０からの入力等を制御する。また、コントローラ６００は、ＭＦＰ６の全体制御を行う制御部としてのＣＰＵ６０１、システムメモリ（ＭＥＭ−Ｐ）６０２、ノースブリッジ（ＮＢ）６０３、サウスブリッジ（ＳＢ）６０４、ＡＳＩＣ（Application Specific Integrated Circuit）６０６、記憶部としてのローカルメモリ（ＭＥＭ−Ｃ）６０７、ＨＤＤコントローラ６０８及び記憶部としてのＨＤＤ６０９を有する。さらに、ＮＢ６０３とＡＳＩＣ６０６との間は、ＡＧＰ（Accelerated Graphics Port）バス６２１で接続される。 Of these, the controller 600 controls, for example, the input from the operation panel 640. Further, the controller 600 includes a CPU 601 as a control unit that controls the entire MFP 6, a system memory (MEM-P) 602, a north bridge (NB) 603, a south bridge (SB) 604, and an ASIC (Application Specific Integrated Circuit) 606. It has a local memory (MEM-C) 607 as a storage unit, an HDD controller 608, and an HDD 609 as a storage unit. Further, the NB 603 and the ASIC 606 are connected by an AGP (Accelerated Graphics Port) bus 621.

ＮＢ６０３は、ＣＰＵ６０１と、ＭＥＭ−Ｐ６０２、ＳＢ６０４及びＡＳＩＣ６０６とを接続するためのブリッジ回路である。ＮＢ６０３は、ＭＥＭ−Ｐ６０２に対する読み書きなどを制御するメモリコントローラと、ＰＣＩ（Peripheral Component Interconnect）マスタ及びＡＧＰターゲットとを有する。 The NB 603 is a bridge circuit for connecting the CPU 601 to the MEM-P602, SB 604, and ASIC 606. The NB 603 has a memory controller that controls reading and writing to the MEM-P602, a PCI (Peripheral Component Interconnect) master, and an AGP target.

ＭＥＭ−Ｐ６０２は、コントローラ６００の各機能を実現させるプログラム及びデータの格納用メモリであるＲＯＭ６０２ａ、プログラム及びデータの展開並びに原稿スキャン時のストレージ用メモリ及びメモリ印刷時の描画用メモリなどとして用いるＲＡＭ６０２ｂを備える。なお、ＲＡＭ６０２ｂに記憶されているプログラムは、インストール可能な形式又は実行可能な形式のファイルで、ＣＤ−ＲＯＭ、ＣＤ−Ｒ、ＤＶＤ等のコンピュータで読み取り可能な記録媒体に記録して提供するように構成してもよい。 The MEM-P 602 includes a ROM 602a, which is a memory for storing the programs and data that realize the functions of the controller 600, and a RAM 602b, which is used for loading programs and data, as storage memory during document scanning, as drawing memory during memory printing, and so on. The program stored in the RAM 602b may be provided as a file in an installable or executable format recorded on a computer-readable recording medium such as a CD-ROM, CD-R, or DVD.

ＳＢ６０４は、ＮＢ６０３とＰＣＩデバイス、周辺デバイスとを接続するためのブリッジ回路である。 The SB 604 is a bridge circuit for connecting the NB 603 to a PCI device and peripheral devices.

ＡＳＩＣ６０６は、画像処理用のハードウェア要素を有する画像処理用途向けのＩＣ（Integrated Circuit）である。その役割は、ＡＧＰバス６２１、ＰＣＩバス６２２、ＨＤＤコントローラ６０８及びＭＥＭ−Ｃ６０７をそれぞれ接続するブリッジ回路である。また、ＡＳＩＣ６０６は、ＰＣＩターゲット及びＡＧＰマスタ、ＡＳＩＣ６０６に接続される他のデバイスの動作及びタイミングを調停するアービタ（ＡＲＢ）、ＭＥＭ−Ｃ６０７を制御するメモリコントローラ、ＤＭＡ制御を司るＤＭＡＣ（Direct Memory Access Controller）、スキャナ部６３１及びプリンタ部６３２との間でＰＣＩバス６２２を介したデータ転送を行うＰＣＩユニットを有する。 The ASIC 606 is an IC (Integrated Circuit) for image processing applications that has hardware elements for image processing. It serves as a bridge circuit that connects the AGP bus 621, the PCI bus 622, the HDD controller 608, and the MEM-C 607 to one another. The ASIC 606 includes a PCI target and an AGP master, an arbiter (ARB) that arbitrates the operation and timing of other devices connected to the ASIC 606, a memory controller that controls the MEM-C 607, a DMAC (Direct Memory Access Controller) that handles DMA control, and a PCI unit that transfers data to and from the scanner unit 631 and the printer unit 632 via the PCI bus 622.

なお、ＡＳＩＣ６０６には、ＵＳＢ（Universal Serial Bus）のインターフェイス、及び、ＩＥＥＥ１３９４（Institute of Electrical and Electronics Engineers 1394）のインターフェイスを接続するようにしてもよい。 A USB (Universal Serial Bus) interface and an IEEE 1394 (Institute of Electrical and Electronics Engineers 1394) interface may be connected to the ASIC 606.

ＭＥＭ−Ｃ６０７は、コピー用画像バッファ及び符号バッファとして用いるローカルメモリである。 The MEM-C607 is a local memory used as a copy image buffer and a code buffer.

ＨＤＤ６０９は、画像データの蓄積、読み取られた原稿の印刷時に用いるフォントデータの蓄積、フォームの蓄積等を行うためのストレージである。ＨＤＤコントローラ６０８は、ＣＰＵ６０１の制御にしたがってＨＤＤ６０９に対するデータの読出し又は書込みを制御する。 The HDD 609 is a storage for accumulating image data, accumulating font data used when printing a read original, accumulating forms, and the like. The HDD controller 608 controls reading or writing of data to the HDD 609 according to the control of the CPU 601.

ＡＧＰバス６２１は、グラフィック処理を高速化するために提案されたグラフィックスアクセラレータカード用のバスインターフェイスである。ＡＧＰバス６２１は、ＭＥＭ−Ｐ６０２に高スループットで直接アクセスすることにより、グラフィックスアクセラレータカードを高速にすることができる。 The AGP bus 621 is a bus interface for a graphics accelerator card proposed to speed up graphics processing. The AGP bus 621 can speed up the graphics accelerator card by directly accessing the MEM-P602 with high throughput.

近距離無線通信回路６２０は、近距離無線通信を行うための回路であり、近距離無線通信回路用アンテナ６２０ａを備える。近距離無線通信回路６２０は、例えば、ＮＦＣ（Near Field Communication）、Ｂｌｕｅｔｏｏｔｈ（登録商標）等の無線通信回路である。 The short-range wireless communication circuit 620 is a circuit for performing short-range wireless communication, and includes an antenna 620a for the short-range wireless communication circuit. The short-range wireless communication circuit 620 is, for example, a wireless communication circuit such as NFC (Near Field Communication) or Bluetooth (registered trademark).

エンジン制御部６３０は、スキャナ部６３１及びプリンタ部６３２によって構成される。スキャナ部６３１及びプリンタ部６３２には、誤差拡散及びガンマ変換などの画像処理部分が含まれる。 The engine control unit 630 is composed of a scanner unit 631 and a printer unit 632. The scanner unit 631 and the printer unit 632 include image processing portions such as error diffusion and gamma conversion.

（スキャナ部の構成） (Configuration of scanner unit)
操作部１１の一部としての操作パネル６４０は、ＭＦＰ６に搭載又は接続可能であり、パネル表示部６４０ａ及びパネル操作部６４０ｂを含む。本実施形態では、一例としてＭＦＰ６に接続可能な状態を示している。パネル表示部６４０ａは、現在の設定値及び選択画面等を表示させ、操作者からの入力を受け付けるタッチパネル等を備える。また、パネル操作部６４０ｂは、原稿サイズ、ファイル形式、解像度等で与えられる原稿の読取りに係る属性情報（各種条件ともいう）の入力を受け付けるテンキー及びコピー開始指示を受け付けるスタートキー等を備える。原稿の読取りに係る属性情報は、具体的には、命令管理テーブル４０２ｃに例示した、原稿サイズ、ファイル形式、解像度、カラー／モノクロ、シングルページ／マルチページ、宛先及び連続処理フラグ等が与えられる。 The operation panel 640, which is part of the operation unit 11, can be mounted on or connected to the MFP 6 and includes a panel display unit 640a and a panel operation unit 640b. This embodiment shows, as an example, a state in which the operation panel 640 is connectable to the MFP 6. The panel display unit 640a includes a touch panel or the like that displays the current setting values, selection screens, and so on, and accepts input from the operator. The panel operation unit 640b includes a numeric keypad that accepts input of attribute information (also referred to as various conditions) for reading a document, such as the document size, file format, and resolution, and a start key that accepts a copy start instruction. Specifically, the attribute information for reading a document includes the document size, file format, resolution, color/monochrome, single page/multipage, destination, continuous processing flag, and so on, as exemplified in the instruction management table 402c.

ネットワークＩ／Ｆ６５０は、通信ネットワークを利用してデータ通信をするためのインターフェイスである。近距離無線通信回路６２０及びネットワークＩ／Ｆ６５０は、ＰＣＩバス６２２を介して、ＡＳＩＣ６０６に電気的に接続される。 The network I / F 650 is an interface for performing data communication using a communication network. The short-range wireless communication circuit 620 and the network I / F 650 are electrically connected to the ASIC 606 via the PCI bus 622.

なお、ＭＦＰ６は、パネル表示部６４０ａに表示される又はパネル操作部６４０ｂが備えるアプリケーション切替キーにより、ドキュメントボックス機能、コピー機能、プリンタ機能及びファクシミリ機能を切り替えて選択することが可能となる。つまり、ＭＦＰ６は、ドキュメントボックス機能の選択時にはドキュメントボックスモードとなり、コピー機能の選択時にはコピーモードとなり、プリンタ機能の選択時にはプリンタモードとなり、ファクシミリ機能の選択時にはファクシミリモードとなる。 Note that the MFP 6 can switch among and select the document box function, the copy function, the printer function, and the facsimile function by means of an application switching key that is displayed on the panel display unit 640a or provided on the panel operation unit 640b. That is, the MFP 6 enters the document box mode when the document box function is selected, the copy mode when the copy function is selected, the printer mode when the printer function is selected, and the facsimile mode when the facsimile function is selected.

〔機能構成〕 [Functional configuration]
＜音声操作システムの機能構成＞ <Functional configuration of voice operation system>
図７は、音声操作システムを構成する各装置の機能ブロックの一例を示す図である。音声操作システム１は、図１に示したように、スマートスピーカ２、音声認識サーバ装置３、ＡＩアシスタントサーバ装置４、クラウドサービス装置５（音声認識サーバ装置３及びＡＩアシスタントサーバ装置４を纏めたもの）及びＭＦＰ６がそれぞれネットワーク７を介して接続されている。 FIG. 7 is a diagram showing an example of the functional blocks of each device constituting the voice operation system. In the voice operation system 1, as shown in FIG. 1, the smart speaker 2, the voice recognition server device 3, the AI assistant server device 4, the cloud service device 5 (a combination of the voice recognition server device 3 and the AI assistant server device 4), and the MFP 6 are each connected via the network 7.

＜スマートスピーカの機能構成＞ <Functional configuration of smart speaker>
スマートスピーカ２は、クラウドサービス装置５を構成する音声認識サーバ装置３及びＡＩアシスタントサーバ装置４との間で、例えば音声データ、画像データ及びテキストデータ等のデータ通信を行う。 The smart speaker 2 performs data communication such as voice data, image data, and text data with the voice recognition server device 3 and the AI assistant server device 4 constituting the cloud service device 5.

図２に示したスマートスピーカ２のＣＰＵ２１は、ＲＯＭ２３等の記憶部に記憶された操作音声処理プログラムをＲＡＭ２２に展開して実行することで、例えば、通信制御部２５１、取得部２５２、フィードバック部２５３、記憶・読出処理部２５４（以下、通信制御部２５１〜記憶・読出処理部２５４とも記載する）として機能又は機能する手段を構成する。 The CPU 21 of the smart speaker 2 shown in FIG. 2 loads the operation voice processing program stored in a storage unit such as the ROM 23 into the RAM 22 and executes it, thereby functioning as, or constituting means that function as, for example, a communication control unit 251, an acquisition unit 252, a feedback unit 253, and a storage/reading processing unit 254 (hereinafter also referred to as the communication control unit 251 to the storage/reading processing unit 254).

＜スマートスピーカの各機能構成＞ <Each functional configuration of smart speaker>
次に、スマートスピーカ２の各機能構成について説明する。通信制御部２５１は、ネットワーク７を介してスマートスピーカ２と音声認識サーバ装置３又はクラウドサービス装置５との間の通信を制御し、各種データ又は情報の送受信を行う。その際、通信制御部２５１は、スマートスピーカ２の通信部２５を制御して各種データ又は情報の送受信を行う。通信制御部２５１は、次に説明する取得部２５２が取得した当該スマートスピーカ２に対してユーザが行った所定の操作及び指示等に基づく情報を音声認識サーバ装置３（又はクラウドサービス装置５）に送信する。また、通信制御部２５１は、フィードバックのために、クラウドサービス装置５からテキストデータ、画像データ、音声データ等を取得する。さらに、通信制御部２５１は、ユーザが行った所定の操作及び指示等に係る情報を音声認識サーバ装置３（又はクラウドサービス装置５）に送信する際に、スマートスピーカ２を特定するデバイスＩＤもあわせて送信する。 Next, each functional configuration of the smart speaker 2 will be described. The communication control unit 251 controls communication between the smart speaker 2 and the voice recognition server device 3 or the cloud service device 5 via the network 7, and transmits and receives various data or information. In doing so, the communication control unit 251 controls the communication unit 25 of the smart speaker 2. The communication control unit 251 transmits, to the voice recognition server device 3 (or the cloud service device 5), information that the acquisition unit 252 described below has acquired and that is based on predetermined operations, instructions, and the like performed by the user on the smart speaker 2. The communication control unit 251 also acquires text data, image data, voice data, and the like from the cloud service device 5 for feedback. Further, when transmitting information on the predetermined operations, instructions, and the like performed by the user to the voice recognition server device 3 (or the cloud service device 5), the communication control unit 251 also transmits the device ID that identifies the smart speaker 2.
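
The point that the device ID always travels together with the user's instruction can be illustrated with a small sketch. The JSON layout, the key names, and the hex encoding of the audio are assumptions chosen only to keep the example self-contained; the publication does not define a message format.

```python
import json


def build_request(device_id: str, audio: bytes) -> str:
    """Bundle captured voice data with the smart speaker's device ID.

    The server side can then use the device ID to look up which MFP the
    speaker is associated with.
    """
    payload = {
        "device_id": device_id,    # identifies the smart speaker
        "audio_hex": audio.hex(),  # voice data, hex-encoded for JSON
    }
    return json.dumps(payload)


message = build_request("speaker-01", b"\x01\x02")
decoded = json.loads(message)
```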

上述したように、通信制御部２５１は、スマートスピーカ２がＬＡＮ等のネットワーク７を介して接続される他の装置との通信を制御する。この通信を行う際の通信方式は、例えば、一般的にＬＡＮで使用されるＥｔｈｅｒｎｅｔ（登録商標）等の通信プロトコルが用いられる。この通信制御部については、後述する音声認識サーバ装置３、ＡＩアシスタントサーバ装置４、クラウドサービス装置５及びＭＦＰ６が有する各通信制御部についても同様の機能を有する。 As described above, the communication control unit 251 controls communication between the smart speaker 2 and other devices connected via the network 7 such as a LAN. For this communication, a communication protocol generally used in LANs, such as Ethernet (registered trademark), is used, for example. The communication control units of the voice recognition server device 3, the AI assistant server device 4, the cloud service device 5, and the MFP 6, which are described later, have the same function as this communication control unit.

取得部２５２は、音声データ取得手段の一例である。取得部２５２は、マイクロホン部２９を介して集音されたユーザの音声操作に伴う指示音声を取得する。また、取得部２５２は、ユーザによるタップ操作又は物理スイッチの押下などの機械操作を含む指示操作を取得してもよい。つまり、取得部２５２は、指示音声及び指示操作を含む指示を表す情報のうち少なくとも一つを取得する。ここで、上述した指示を表す情報は、指示情報に相当する。なお、ユーザの指示音声には、例えば、ＭＦＰ６等に原稿の読取りを実行させるための読取命令及び各種命令を実行するための処理実行命令に変換するための情報が含まれる。 The acquisition unit 252 is an example of the voice data acquisition means. The acquisition unit 252 acquires the instruction voice accompanying the user's voice operation collected through the microphone unit 29. Further, the acquisition unit 252 may acquire an instruction operation including a machine operation such as a tap operation or a physical switch press by the user. That is, the acquisition unit 252 acquires at least one of the information representing the instruction including the instruction voice and the instruction operation. Here, the information representing the above-mentioned instruction corresponds to the instruction information. The user's instruction voice includes, for example, information for converting into a reading command for causing the MFP 6 or the like to read the document and a processing execution command for executing various commands.

取得部２５２は、上述した操作音声処理プログラムを実行することで、ユーザの発話によって与えられた音声データを取得して音声認識サーバ装置３（又はクラウドサービス装置５）に送信する。さらに取得部２５２は、フィードバック部２５３と協働して、クラウドサービス装置５から取得したデータ（音声データ、画像データ及びテキストデータ等）を、タッチパネル２７に表示するか、又はスピーカ部２８を介した音声をユーザに通知する。なお、タッチパネル２７は、スマートスピーカ２と一体で構成されていてもよいし、別々に構成されていてもよい。スマートスピーカ２と別々に構成される場合、タッチパネル２７は、スマートスピーカ２と行う無線通信等に必要な無線通信インターフェイスを備えておけばよい。 By executing the operation voice processing program described above, the acquisition unit 252 acquires the voice data given by the user's utterance and transmits it to the voice recognition server device 3 (or the cloud service device 5). Further, in cooperation with the feedback unit 253, the acquisition unit 252 presents the data acquired from the cloud service device 5 (voice data, image data, text data, etc.) to the user, either by displaying it on the touch panel 27 or by outputting it as sound through the speaker unit 28. The touch panel 27 may be configured integrally with the smart speaker 2 or separately from it. When configured separately, the touch panel 27 only needs to have the wireless communication interface required for wireless communication or the like with the smart speaker 2.

フィードバック部２５３は、ユーザの発話によって与えられた音声データに基づいてＭＦＰ６で実行される原稿の読取り及び所定の処理において、必要に応じてスマートスピーカ２がユーザに対して応答するように機能する。このフィードバック部２５３によって、本実施形態はユーザとの間での対話型システムを実現している。また、この対話型システムにおける音声操作を実現するため、フィードバック部２５３は、例えば、ユーザの指示音声に対して不足するデータを補うために音声のフィードバックを行う。さらに、フィードバック部２５３は、タッチパネル２７の画面への表示により、フィードバック対象のテキスト、音声又は画像をユーザに提供してもよい。なお、フィードバック部２５３による対話型動作及びフィードバックの詳細については、後述する。 The feedback unit 253 functions so that the smart speaker 2 responds to the user as necessary during the document reading and predetermined processing executed by the MFP 6 based on the voice data given by the user's utterance. With this feedback unit 253, the present embodiment realizes an interactive system with the user. To realize voice operation in this interactive system, the feedback unit 253 provides voice feedback, for example, to fill in data that is missing from the user's spoken instruction. The feedback unit 253 may also provide the text, voice, or image to be fed back to the user by displaying it on the screen of the touch panel 27. The details of the interactive operation and the feedback by the feedback unit 253 will be described later.
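
One way to picture feedback that fills in missing data is a check of the reading parameters gathered so far against a list of required ones. The sketch below is an assumption-laden illustration: which parameters are mandatory, and the wording of the prompts, are not specified at this point in the publication.

```python
# Hypothetical set of parameters that must be known before a scan starts.
REQUIRED = ("document_size", "file_format", "resolution")


def missing_parameters(params: dict) -> list:
    """Return the required parameter names that the user has not yet given."""
    return [name for name in REQUIRED if name not in params]


def feedback_text(params: dict) -> str:
    """Build the response that the smart speaker would speak or display."""
    missing = missing_parameters(params)
    if not missing:
        return "Starting the scan."
    return "Please specify: " + ", ".join(missing)


prompt = feedback_text({"file_format": "PDF"})
ready = feedback_text(
    {"document_size": "A4", "file_format": "PDF", "resolution": 300}
)
```

The same text could be sent to the speaker unit as synthesized voice or shown on the touch panel, matching the two feedback channels the paragraph mentions.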

記憶・読出処理部２５４は、例えば、ＲＯＭ２３に各種データを記憶したり、ＲＯＭ２３に記憶された操作音声処理プログラム等の各種データを読み出したりする処理を行う。 The storage / reading processing unit 254 performs processing such as storing various data in the ROM 23 and reading various data such as an operation voice processing program stored in the ROM 23.

なお、本実施形態では、通信制御部２５１〜記憶・読出処理部２５４は、ソフトウェアで実現されてもよい。また、通信制御部２５１〜記憶・読出処理部２５４は、他のプログラムに処理の一部を実行させ、又は他のプログラムを用いて間接的に処理を実行させてもよい。さらに、通信制御部２５１〜記憶・読出処理部２５４は、一部又は全部を、ＩＣ（Integrated Circuit）等のハードウェアで実現されてもよい。 In this embodiment, the communication control unit 251 to the storage / reading processing unit 254 may be realized by software. Further, the communication control unit 251 to the storage / reading processing unit 254 may cause another program to execute a part of the processing, or may indirectly execute the processing using the other program. Further, the communication control unit 251 to the storage / reading processing unit 254 may be partially or wholly realized by hardware such as an IC (Integrated Circuit).

＜音声認識サーバ装置の機能構成＞ <Functional configuration of voice recognition server device>
音声認識サーバ装置３は、スマートスピーカ２から受信した音声データを解析し、テキストデータへ変換する。また、テキストデータと事前登録されている辞書情報とに基づいてユーザの意図を解釈し、解釈結果をＡＩアシスタントサーバ装置４に送信する。 The voice recognition server device 3 analyzes the voice data received from the smart speaker 2 and converts it into text data. Further, the user's intention is interpreted based on the text data and the pre-registered dictionary information, and the interpretation result is transmitted to the AI assistant server device 4.

音声認識サーバ装置３のＣＰＵ３１は、スマートスピーカ２を介してユーザによって与えられた音声データに応じて、ＨＤＤ３４等の記憶部に記憶された操作音声変換プログラム等をＲＡＭ３２に展開して実行する。この操作音声変換プログラムが実行されることにより、ＣＰＵ３１は、例えば、通信制御部３５１、取得部３５２、テキスト変換部３５３、解釈部３５４、出力部３５５、提供部３５６及び記憶・読出処理部３５７（以下、通信制御部３５１〜記憶・読出処理部３５７とも記載する）として機能又は機能する手段を構成する。 In response to the voice data given by the user via the smart speaker 2, the CPU 31 of the voice recognition server device 3 loads the operation voice conversion program or the like stored in a storage unit such as the HDD 34 into the RAM 32 and executes it. By executing this operation voice conversion program, the CPU 31 functions as, or constitutes means that function as, for example, a communication control unit 351, an acquisition unit 352, a text conversion unit 353, an interpretation unit 354, an output unit 355, a providing unit 356, and a storage/reading processing unit 357 (hereinafter also referred to as the communication control unit 351 to the storage/reading processing unit 357).

＜音声認識サーバ装置の各機能構成＞ <Each functional configuration of voice recognition server device>
次に、音声認識サーバ装置３の各機能構成について説明する。通信制御部３５１は、ネットワーク７を介してスマートスピーカ２又はＡＩアシスタントサーバ装置４との間の通信を制御し、各種データ又は情報の送受信を行う。具体的には、通信制御部３５１は、ユーザによって与えられた音声データの受信及びスマートスピーカ２に対するテキストデータの送信等を行うように、音声認識サーバ装置３の通信部３６を制御する。 Next, each functional configuration of the voice recognition server device 3 will be described. The communication control unit 351 controls communication with the smart speaker 2 or the AI assistant server device 4 via the network 7, and transmits / receives various data or information. Specifically, the communication control unit 351 controls the communication unit 36 of the voice recognition server device 3 so as to receive the voice data given by the user, transmit the text data to the smart speaker 2, and the like.

取得部３５２は、スマートスピーカ２から送信される所定の操作及び指示等に基づく情報を取得する。また、取得部３５２は、スマートスピーカ２のタッチパネル、ボタン又はスイッチ等のユーザ操作に基づく情報を取得してもよい。 The acquisition unit 352 acquires information based on predetermined operations, instructions, and the like transmitted from the smart speaker 2. In addition, the acquisition unit 352 may acquire information based on user operations such as the touch panel, buttons, or switches of the smart speaker 2.

テキスト変換部３５３は、取得部３５２で取得した情報、すなわち音声データをテキストデータに変換する。 The text conversion unit 353 converts the information acquired by the acquisition unit 352, that is, the voice data into text data.

解釈部３５４は、テキスト変換部３５３で変換されたテキストデータに基づいて、ユーザからの指示を解釈する。具体的には、解釈部３５４は、音声アシスタントプログラムから提供された辞書情報に基づいて、テキストデータに含まれる単語などが辞書情報と一致しているか否かを判断する。そして、辞書情報と一致している場合には、解釈部３５４は、ユーザの意図を示すインテントと所定の処理の実行条件などの変数を示すパラメータに変換する。解釈部３５４は、インテント及びパラメータを、通信制御部３５１を介してＡＩアシスタントサーバ装置４で実行される管理プログラムに送信する。このとき、解釈部３５４は、スマートスピーカ２のデバイスＩＤもインテント及びパラメータと共に通信制御部３５１を介してＡＩアシスタントサーバ装置４で実行される管理プログラムに送信する。 The interpretation unit 354 interprets the instruction from the user based on the text data converted by the text conversion unit 353. Specifically, based on the dictionary information provided by the voice assistant program, the interpretation unit 354 determines whether words and the like included in the text data match the dictionary information. If they match, the interpretation unit 354 converts the text data into an intent, which indicates the user's intention, and parameters, which indicate variables such as the execution conditions of a predetermined process. The interpretation unit 354 transmits the intent and the parameters, via the communication control unit 351, to the management program executed by the AI assistant server device 4. At this time, the interpretation unit 354 also transmits the device ID of the smart speaker 2 together with the intent and the parameters.
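
The dictionary-matching step can be sketched as a word lookup that yields one intent plus a set of parameters. Everything concrete here is hypothetical: the dictionary entries, the intent name `SCAN_EXECUTE`, the whitespace tokenization, and the shape of the result are illustrative assumptions, not the format used by the voice assistant program.

```python
from typing import Optional

# Hypothetical dictionary information: word -> (slot, value).
# The slot "intent" marks words that express what the user wants done.
DICTIONARY = {
    "scan": ("intent", "SCAN_EXECUTE"),
    "a4": ("document_size", "A4"),
    "pdf": ("file_format", "PDF"),
    "color": ("color_mode", "color"),
}


def interpret(text: str, device_id: str) -> Optional[dict]:
    """Convert recognized text into an intent and parameters.

    Returns None when no word in the text matches an intent entry.
    The smart speaker's device ID is passed along with the result.
    """
    intent = None
    params = {}
    for raw in text.lower().split():
        entry = DICTIONARY.get(raw.strip(".,"))
        if entry is None:
            continue  # word is not in the dictionary information
        slot, value = entry
        if slot == "intent":
            intent = value
        else:
            params[slot] = value
    if intent is None:
        return None
    return {"device_id": device_id, "intent": intent, "parameters": params}


result = interpret("Scan this in A4 as PDF.", "speaker-01")
no_match = interpret("hello there", "speaker-01")
```

A real system would need morphological analysis rather than whitespace splitting, particularly for Japanese utterances; the split is used here only to keep the sketch short.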

出力部３５５は、スマートスピーカ２に対するテキストデータ、音声データ、画像データ等のデータの送信を行うように、通信部３６を制御する。 The output unit 355 controls the communication unit 36 so as to transmit data such as text data, voice data, and image data to the smart speaker 2.

さらに、ＣＰＵ３１は、ＨＤＤ３４等の記憶部に記憶された音声アシスタントプログラムを実行することで、提供部３５６として機能する。 Further, the CPU 31 functions as the providing unit 356 by executing the voice assistant program stored in the storage unit such as the HDD 34.

提供部３５６は、ＨＤＤ３４等の記憶部に記憶されているテキストデータ、インテント及びパラメータの関係を予め定義した辞書情報を管理し、操作音声変換プログラムに対して提供する。また、提供部３５６は、テキスト変換部３５３で変換したテキストデータに基づいて、ユーザからの発話内容を変換、解釈してもよい。すなわち、提供部３５６は、テキスト変換部３５３及び解釈部３５４の機能を併せ持っていてもよい。具体的には、提供部３５６は、まず操作音声変換プログラムからテキストデータを取得し、テキストデータに含まれる単語などが辞書情報と一致しているか否かを判断する。その判断の結果、辞書情報と一致している場合には、提供部３５６は、テキストデータをインテントとパラメータに変換する。その後、提供部３５６は、インテント及びパラメータを操作音声変換プログラムに対して提供する。 The providing unit 356 manages the dictionary information in which the relationship between the text data, the intent, and the parameters stored in the storage unit such as the HDD 34 is defined in advance, and provides the dictionary information to the operation voice conversion program. Further, the providing unit 356 may convert and interpret the utterance content from the user based on the text data converted by the text conversion unit 353. That is, the providing unit 356 may also have the functions of the text conversion unit 353 and the interpretation unit 354. Specifically, the providing unit 356 first acquires text data from the operation voice conversion program, and determines whether or not the words or the like included in the text data match the dictionary information. As a result of the determination, if the information matches the dictionary information, the providing unit 356 converts the text data into an intent and a parameter. After that, the providing unit 356 provides the intent and the parameter to the operation voice conversion program.

記憶・読出処理部３５７は、例えば、ＲＯＭ３３に記憶された操作音声変換プログラム等の各種プログラムを構成するデータの読出し処理を行う。 The storage / reading processing unit 357 performs reading processing of data constituting various programs such as an operation voice conversion program stored in the ROM 33, for example.

なお、本実施形態では、通信制御部３５１〜記憶・読出処理部３５７は、ソフトウェアで実現されてもよい。また、通信制御部３５１〜記憶・読出処理部３５７は、他のプログラムに処理の一部を実行させ、又は他のプログラムを用いて間接的に処理を実行させてもよい。例えば、操作音声変換プログラムの解釈部３５４の機能の一部又は全てを音声アシスタントプログラムに実行させてもよい。さらに、操作画像変換プログラムの解釈部３５４の機能の一部又は全てを画像アシスタントプログラムに実行させてもよい。これらの場合、例えば、テキストデータに含まれる単語などが辞書情報と一致しているか否かの判断、及び辞書情報と一致している場合にユーザの意図を示すインテントと所定の処理の実行条件などの変数を示すパラメータへの変換は、音声アシスタントアプリ、画像アシスタントアプリ等に実行させてもよい。さらに、解釈部３５４は、インテント及びパラメータを音声アシスタントプログラム等から取得するものとしてもよい。さらに、通信制御部３５１〜記憶・読出処理部３５７のうち、一部又は全部を、ＩＣ（Integrated Circuit）等のハードウェアで実現してもよい。 In this embodiment, the communication control unit 351 to the storage / reading processing unit 357 may be realized by software. The communication control unit 351 to the storage / reading processing unit 357 may also cause another program to execute part of the processing, or may execute the processing indirectly by using another program. For example, the voice assistant program may execute some or all of the functions of the interpretation unit 354 of the operation voice conversion program. Likewise, the image assistant program may execute some or all of the functions of the interpretation unit 354 of the operation image conversion program. In these cases, for example, the voice assistant application, the image assistant application, or the like may determine whether the words and the like included in the text data match the dictionary information and, if they match, perform the conversion into an intent indicating the user's intention and parameters indicating variables such as the execution conditions of a predetermined process. The interpretation unit 354 may also acquire the intent and the parameters from the voice assistant program or the like. Further, some or all of the communication control unit 351 to the storage / reading processing unit 357 may be realized by hardware such as an IC (Integrated Circuit).

また、上述した例では、提供部３５６をソフトウェアで実現することとしたが、これらのうち、一部又は全部を、ＩＣ（Integrated Circuit）等のハードウェアで実現してもよいこと等は、上述の他のプログラムと同様である。 In the above example, the providing unit 356 is realized by software; however, as with the other programs described above, part or all of it may be realized by hardware such as an IC (Integrated Circuit).

＜ＡＩアシスタントサーバ装置の機能構成＞ <Functional configuration of the AI assistant server device>
ＡＩアシスタントサーバ装置４は、例えば、音声認識サーバ装置３で実行された操作音声変換プログラムによって得られたインテント、パラメータ及びスマートスピーカ２のデバイスＩＤ等を取得して、後述する各機能の処理を行う。 The AI assistant server device 4 acquires, for example, the intent, the parameters, the device ID of the smart speaker 2, and the like obtained by the operation voice conversion program executed by the voice recognition server device 3, and performs the processing of each function described later.

また、ＡＩアシスタントサーバ装置４は、音声認識サーバ装置３から受信した解釈結果を、ＭＦＰ６に対する読取命令等のデータに変換する。その後、ＡＩアシスタントサーバ装置４は、変換した読取命令等のデータをＭＦＰ６に送信する。ＭＦＰ６では、ＡＩアシスタントサーバ装置４から送信される読取命令等にしたがって所定の処理が実行される。なお、ＡＩアシスタントサーバ装置４は、ＭＦＰ６に読取命令等を送信する以外に、例えば、ＭＦＰ６を管理する他のサーバ装置が存在すれば、ＭＦＰ６で実行される他の実行命令等を他のサーバ装置に送信してもよい。 Further, the AI assistant server device 4 converts the interpretation result received from the voice recognition server device 3 into data such as a read command for the MFP 6. The AI assistant server device 4 then transmits the converted data, such as the read command, to the MFP 6. The MFP 6 executes a predetermined process according to the read command or the like transmitted from the AI assistant server device 4. In addition to transmitting a read command or the like to the MFP 6, the AI assistant server device 4 may, for example, transmit other execution commands to be executed by the MFP 6 to another server device, if another server device that manages the MFP 6 exists.

ＡＩアシスタントサーバ装置４のＣＰＵ４１は、ネットワーク７を介して音声認識サーバ装置３のＨＤＤ３４等の記憶部に記憶された管理プログラムを取得し、ＲＡＭ４２に展開して実行する。ＣＰＵ４１は、この管理プログラムを実行することで、例えば、通信制御部４５１、取得部４５２、解釈結果変換部４５３、実行判定部４５４、補完部４５５、実行指示部４５６、機器情報取得部４５７、通知部４５８、管理部４５９、検索部４６０及び記憶・読出処理部４６１（以下、通信制御部４５１〜記憶・読出処理部４６１とも記載する）として機能又は機能する手段を構成する。 The CPU 41 of the AI assistant server device 4 acquires, via the network 7, the management program stored in a storage unit such as the HDD 34 of the voice recognition server device 3, loads it into the RAM 42, and executes it. By executing this management program, the CPU 41 functions as, or constitutes means functioning as, for example, a communication control unit 451, an acquisition unit 452, an interpretation result conversion unit 453, an execution determination unit 454, a complement unit 455, an execution instruction unit 456, a device information acquisition unit 457, a notification unit 458, a management unit 459, a search unit 460, and a storage / reading processing unit 461 (hereinafter also referred to as the communication control unit 451 to the storage / reading processing unit 461).

＜ＡＩアシスタントサーバ装置の各機能構成＞ <Each functional configuration of the AI assistant server device>
次に、ＡＩアシスタントサーバ装置４の各機能構成について説明する。通信制御部４５１は、ユーザのスマートスピーカ２に対する解釈結果の送信、及びユーザによって与えられた音声データに係るテキストデータの受信等を行うように通信部４６を制御する。 Next, each functional configuration of the AI assistant server device 4 will be described. The communication control unit 451 controls the communication unit 46 so as to transmit the interpretation result to the user's smart speaker 2, receive the text data related to the voice data given by the user, and the like.

取得部４５２は、音声認識サーバ装置３から送信されるインテント、パラメータ及びスマートスピーカ２のデバイスＩＤ等を取得する。 The acquisition unit 452 acquires the intent, the parameters, the device ID of the smart speaker 2, and the like transmitted from the voice recognition server device 3.

解釈結果変換部４５３は、操作音声変換プログラムで変換されたインテント及びパラメータなどの解釈結果を、ＭＦＰ６が解釈可能な読取命令等に変換する。この解釈結果変換部４５３は、ＡＩアシスタントサーバ装置４（又はクラウドサービス装置５）で実行される管理プログラムの機能の一つであり、読取命令変換手段の機能を担う。また、ＭＦＰ６が解釈可能な読取命令は、当該ＭＦＰ６における原稿の読取り処理（以下、読取処理と記載する）を実行するための情報（処理情報）の一例である。 The interpretation result conversion unit 453 converts the interpretation result such as the intent and the parameter converted by the operation voice conversion program into a reading command or the like that can be interpreted by the MFP 6. The interpretation result conversion unit 453 is one of the functions of the management program executed by the AI assistant server device 4 (or the cloud service device 5), and has a function of a read instruction conversion means. Further, the scanning instruction that can be interpreted by the MFP 6 is an example of information (processing information) for executing the document scanning process (hereinafter, referred to as scanning process) in the MFP 6.

実行判定部４５４は、取得した機器情報で示されるＭＦＰ６の状態と、ユーザから指定された原稿読取指示及び印刷指示等を比較することで、ユーザから指定された原稿読取指示及び印刷指示等に基づく各処理をＭＦＰ６で実行することが可能か否かを判断する。ユーザから指定された原稿の読取り及び印刷に係る内容は、例えば、ユーザから指示された時間帯に当該ＭＦＰ６が使用可能か否かの判断処理、当該ＭＦＰ６の電源状態の変更処理、当該ＭＦＰ６に対する原稿の読取り処理及び印刷処理である。また、ユーザから指定された原稿読取指示及び印刷指示等に基づく各処理が実行可能と判断された場合、実行判定部４５４は、解釈結果変換部４５３に対して、ＭＦＰ６に出力要求の一例としての読取命令及び印刷命令等に変換するよう判定する。一方、実行不可能と判断した場合、実行判定部４５４は、操作音声変換プログラム等の実行の下、スマートスピーカ２に対してエラーメッセージ等のレスポンス情報をフィードバックする。 The execution determination unit 454 compares the state of the MFP 6 indicated by the acquired device information with the document reading instruction, the print instruction, and the like specified by the user, and thereby determines whether each process based on those instructions can be executed by the MFP 6. The contents related to reading and printing of the document specified by the user include, for example, a process of determining whether the MFP 6 can be used during the time period instructed by the user, a process of changing the power state of the MFP 6, and document reading and printing processes on the MFP 6. When it is determined that each process based on the document reading instruction, the print instruction, and the like specified by the user can be executed, the execution determination unit 454 decides that the interpretation result conversion unit 453 is to convert the instructions into a read command, a print command, and the like, which are examples of output requests to the MFP 6. On the other hand, when it determines that execution is not possible, the execution determination unit 454 feeds back response information such as an error message to the smart speaker 2 under the execution of the operation voice conversion program or the like.
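The comparison made by the execution determination unit 454 can be pictured with a minimal sketch. It assumes the device information is reduced to a connection flag, a power flag, and a set of supported operations. The names `DeviceInfo`, `judge_execution`, and `supported_operations` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass
class DeviceInfo:
    """Hypothetical snapshot of the device information held for one MFP."""
    connected: bool                  # communication connection established
    power_on: bool                   # power usage state
    supported_operations: frozenset  # assumed field, e.g. {"read", "print"}


def judge_execution(device, operation):
    """Return (executable, error_message) for the requested operation."""
    if not device.connected:
        return False, "The device is not connected."
    if not device.power_on:
        return False, "The device is powered off."
    if operation not in device.supported_operations:
        return False, f"The device cannot perform '{operation}'."
    # Executable: the caller would go on to convert the instruction into a command.
    return True, ""
```

When the first element is `False`, the message would play the role of the error response fed back to the smart speaker 2.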

補完部４５５は、解釈結果変換部４５３によって変換される読取命令及び印刷命令等の各種実行命令に対して、装置管理テーブル４０２ｂ及び命令管理テーブル４０２ｃを参照して、ＭＦＰ６における処理に必要な情報を補完する機能を有する。この処理に必要な情報とは、例えば、ＭＦＰ６に対する読取命令及び印刷命令等への変換に必要な情報である。この補完部４５５は補完手段の一例である。 The complement unit 455 refers to the device management table 402b and the instruction management table 402c for various execution instructions such as a read instruction and a print instruction converted by the interpretation result conversion unit 453, and provides information necessary for processing in the MFP 6. It has a complementary function. The information required for this processing is, for example, information required for conversion into a read command, a print command, or the like for the MFP 6. The complementary unit 455 is an example of complementary means.
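As a rough illustration of the complementing step, the sketch below fills any setting the user left unspecified from a table of defaults. The contents of the device management table 402b and the command management table 402c are not given in this section, so `DEFAULT_READ_SETTINGS` and its keys are purely hypothetical stand-ins.

```python
# Hypothetical defaults standing in for what the management tables might supply.
DEFAULT_READ_SETTINGS = {"resolution_dpi": 300, "color_mode": "auto", "sides": "single"}


def complement(command, defaults=DEFAULT_READ_SETTINGS):
    """Return a new command with every unspecified setting filled from defaults."""
    completed = dict(defaults)
    # Values the user actually specified (not None) take precedence over defaults.
    completed.update({k: v for k, v in command.items() if v is not None})
    return completed
```

The input dictionary is left unmodified, so the original interpretation result remains available for confirmation feedback.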

実行指示部４５６は、解釈結果変換部４５３で変換されたＭＦＰ６への読取命令及び印刷命令等の実行を指示する。また、実行指示部４５６は、ユーザが使用したスマートスピーカ２を特定するデバイスＩＤに関連付けられているＭＦＰ６を紐づけ用ＤＢ４０２から検索し、ＭＦＰ６に対して、インテント及びパラメータと共に読取命令及び印刷命令等を送信する。 The execution instruction unit 456 instructs the MFP 6 to execute the read command, the print command, and the like converted by the interpretation result conversion unit 453. The execution instruction unit 456 also searches the linking DB 402 for the MFP 6 associated with the device ID that identifies the smart speaker 2 used by the user, and transmits the read command, the print command, and the like to that MFP 6 together with the intent and the parameters.

機器情報取得部４５７は、例えば、ＭＦＰ６との通信接続が確立されているか否かを示す接続状態、ＭＦＰ６の電源のオン／オフ等に係る電力使用状態、ＭＦＰ６への電力供給状態（通常モード、省エネモード等）の機器情報を取得する。なお、機器情報取得部４５７は、ＭＦＰ６から取得した機器情報を、ＭＦＰ６を特定する装置ＩＤ等と関連付けてＨＤＤ４４等の記憶部に記憶して管理する。この機器情報の記憶先は、紐づけ用ＤＢ４０２を構築する後述する装置管理テーブル４０２ｂでもよい。さらに、機器情報取得部４５７は、装置管理テーブル４０２ｂを参照して、ＭＦＰ６で実行される読取命令の生成に関連する情報を補完する機能も有する。 The device information acquisition unit 457 acquires device information such as a connection state indicating whether a communication connection with the MFP 6 is established, a power usage state such as whether the MFP 6 is powered on or off, and a power supply state of the MFP 6 (normal mode, energy saving mode, etc.). The device information acquisition unit 457 stores and manages the device information acquired from the MFP 6 in a storage unit such as the HDD 44, in association with the device ID or the like that identifies the MFP 6. The device information may be stored in the device management table 402b, described later, which makes up the linking DB 402. Further, the device information acquisition unit 457 also has a function of complementing information related to the generation of the read command executed by the MFP 6, with reference to the device management table 402b.

通知部４５８は、ユーザによる原稿読取指示及び印刷指示等への応答としてテキストデータ、音声データ及び画像データ等を操作音声変換プログラム等に通知する。また、ＭＦＰ６に対する読取命令及び印刷命令等の実行条件を示すパラメータが不足している場合には、通知部４５８は、操作音声変換プログラム等を介してスマートスピーカ２に対してフィードバックを行う。つまり、通知部４５８は、ユーザに対して不足しているパラメータの入力を促す。ここで、通知部４５８は、不足しているパラメータを確認するために必要な情報として、所定のパラメータ情報をスマートスピーカ２に送信してもよいし、パラメータの指定を促すために必要な情報としてテキストデータ、音声データ及び画像データ等をスマートスピーカ２に送信してもよい。上述した処理によって、ユーザは、どんな情報が不足しているかをスマートスピーカ２から発生される音声等によって確認することができる。 The notification unit 458 notifies the operation voice conversion program or the like of text data, voice data, image data, etc. as a response to the document reading instruction, the print instruction, and the like by the user. Further, when the parameters indicating the execution conditions such as the read command and the print command for the MFP 6 are insufficient, the notification unit 458 provides feedback to the smart speaker 2 via the operation voice conversion program or the like. That is, the notification unit 458 prompts the user to input the missing parameters. Here, the notification unit 458 may transmit predetermined parameter information to the smart speaker 2 as information necessary for confirming the missing parameter, or as information necessary for prompting the specification of the parameter. Text data, audio data, image data and the like may be transmitted to the smart speaker 2. By the above-mentioned processing, the user can confirm what kind of information is lacking by voice or the like generated from the smart speaker 2.

管理部４５９は、スマートスピーカ２又はクラウドサービス装置５に接続されたクライアントデバイスに対して入力された情報に基づいて、スマートスピーカ２のデバイスＩＤとＭＦＰ６の装置ＩＤとを関連付けて、紐づけ用ＤＢ４０２に登録する。つまり、紐づけ用ＤＢ４０２では、スマートスピーカ２のデバイスＩＤとＭＦＰ６の装置ＩＤとを関連付けた情報が、装置管理テーブル４０２ｂとして記憶され、管理される。 The management unit 459 associates the device ID of the smart speaker 2 with the device ID of the MFP 6 based on the information input to the smart speaker 2 or to a client device connected to the cloud service device 5, and registers the association in the linking DB 402. That is, in the linking DB 402, the information associating the device ID of the smart speaker 2 with the device ID of the MFP 6 is stored and managed as the device management table 402b.

検索部４６０は、デバイスＩＤ及びユーザＩＤ（使用者ＩＤ）に基づいてＭＦＰ６を検索し、特定する。なお、検索部４６０は、上述した管理部４５９と合わせて一つの機能ユニットとして機能してもよい。 The search unit 460 searches and identifies the MFP 6 based on the device ID and the user ID (user ID). The search unit 460 may function as one functional unit together with the management unit 459 described above.

記憶・読出処理部４６１は、ＡＩアシスタントサーバ装置４のＨＤＤ４４等の記憶部に記憶された各種データの読み出し、ＨＤＤ４４等の記憶部への各種データの書き込み等の各処理を行う。 The storage / reading processing unit 461 performs each processing such as reading various data stored in a storage unit such as HDD 44 of the AI assistant server device 4 and writing various data to the storage unit such as HDD 44.

上述した通信制御部４５１〜記憶・読出処理部４６１のそれぞれの機能は一例であり、どの機能ユニットがどのような処理を行うかは、音声操作システム１のソフトウェア構成により適宜変えてもよい。 Each function of the communication control unit 451 to the storage / reading processing unit 461 described above is an example, and which functional unit performs what kind of processing may be appropriately changed depending on the software configuration of the voice operation system 1.

なお、本実施形態では、通信制御部４５１〜記憶・読出処理部４６１をソフトウェアで実現することとしたが、これらのうち、一部又は全部を、ＩＣ（Integrated Circuit）等のハードウェアで実現してもよい。また、通信制御部４５１〜記憶・読出処理部４６１が実現する機能は、音声認識サーバ装置３のＨＤＤ３４等の記憶部に記憶された他のプログラムに処理の一部を実行させる、又は他のプログラムを用いて間接的に処理を実行させてもよい。 In the present embodiment, the communication control unit 451 to the storage / reading processing unit 461 are realized by software, but some or all of them are realized by hardware such as an IC (Integrated Circuit). You may. Further, the function realized by the communication control unit 451 to the storage / reading processing unit 461 is to cause another program stored in the storage unit such as the HDD 34 of the voice recognition server device 3 to execute a part of the processing, or another program. May be used to indirectly execute the process.

（クラウドサービス装置による解釈動作の詳細） (Details of the interpretation operation by the cloud service device)
ここで、クラウドサービス装置５による解釈動作の詳細について説明する。クラウドサービス装置５は、上述したように音声認識サーバ装置３及びＡＩアシスタントサーバ装置４を一つに纏めた装置であり、一つのサーバ装置としても機能するものである。操作音声変換プログラムは、ユーザの発話に基づく各種指示を解釈するための辞書情報に基づいてインテント及びパラメータを生成する。より具体的には、操作音声変換プログラムは、ユーザの発話によって与えられた音声データから変換されたテキストデータに含まれる単語などが辞書情報と一致するか否かを判断し、一致する場合は辞書情報に定義されているインテント及びパラメータを含む解釈結果を生成する。 Here, the details of the interpretation operation by the cloud service device 5 will be described. As described above, the cloud service device 5 is a device that combines the voice recognition server device 3 and the AI assistant server device 4, and also functions as a single server device. The operation voice conversion program generates intents and parameters based on dictionary information for interpreting various instructions based on the user's utterance. More specifically, the operation voice conversion program determines whether the words and the like included in the text data converted from the voice data given by the user's utterance match the dictionary information and, if they match, generates an interpretation result that includes the intent and the parameters defined in the dictionary information.

上述した辞書情報は、インテント及びパラメータを生成することができるものであればどのような形態であってもよい。一例として、辞書情報は、エンティティ情報、インテント情報及び関連付け情報を含んで構成される。エンティティ情報は、ＭＦＰ６が所定の処理を実行するためのパラメータと自然言語を関連付ける情報である。また、一つのパラメータには、複数の類義語が登録可能である。インテント情報は、上述したように所定の処理の種類を示す情報である。関連付け情報は、ユーザが発話した発話フレーズ（自然言語）及びエンティティ情報、並びに、発話フレーズ及びインテント情報を、それぞれ関連付ける情報である。この関連付け情報により、ＡＩアシスタントサーバ装置４（又はクラウドサービス装置５）は、パラメータの発話順序又はニュアンスが多少変わっても、正しい解釈が可能となる。また、関連付け情報は、発話された内容に基づいてレスポンスのテキスト（解釈結果）を生成してもよい。なお、辞書情報は、上述したＡＩアシスタントサービス情報と一部機能を共通にする。 The above-mentioned dictionary information may be in any form as long as it can generate intents and parameters. As an example, dictionary information is configured to include entity information, intent information, and association information. The entity information is information that associates a parameter for the MFP 6 to execute a predetermined process with a natural language. In addition, a plurality of synonyms can be registered in one parameter. The intent information is information indicating a predetermined type of processing as described above. The association information is information that associates the utterance phrase (natural language) and entity information uttered by the user, and the utterance phrase and intent information, respectively. With this association information, the AI assistant server device 4 (or cloud service device 5) can correctly interpret even if the utterance order or nuance of the parameters is slightly changed. Further, the association information may generate a response text (interpretation result) based on the uttered content. The dictionary information shares some functions with the AI assistant service information described above.

さらに、エンティティ情報には、関連付け情報の一例としてのパラメータに係る類義語も関連付けられて記憶されている。この類義語には、例えば、「スキャン」や「スキャンして」といった発話内容に対して、「読み取る」、「読取り」、「読み取って」等がＭＦＰ６に対する同じ命令及び処理を与えるものとして対応付けられている。このような類義語を登録することで、クラウドサービス装置５は、例えば、ＭＦＰ６を用いて原稿を読み取る場合に、「これ１０００ｄｐｉでスキャンして」と発話しても、「これ１０００ｄｐｉで読み取って」と発話しても、同様の処理を行うパラメータとして設定することができる。つまり、クラウドサービス装置５は、同様の処理として解釈をすることができる。 Further, synonyms for parameters, as an example of the association information, are also stored in association with the entity information. For example, for utterances such as "scan" and "scan it", the synonyms "read", "reading", "read it", and the like are associated as giving the same command and processing to the MFP 6. By registering such synonyms, when a document is read using the MFP 6, the cloud service device 5 can set parameters for the same processing whether the user utters "scan this at 1000 dpi" or "read this at 1000 dpi". That is, the cloud service device 5 can interpret both utterances as the same process.
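The dictionary matching with synonyms described above can be sketched as follows. This is a simplified illustration that works on English text. The intent name `SCAN_EXECUTE`, the parameter name `resolution_dpi`, and the function `interpret` are assumptions made for the example and are not taken from the specification.

```python
import re

# Hypothetical entity information: each intent lists the phrases that map to it.
INTENT_SYNONYMS = {
    "SCAN_EXECUTE": ["scan", "read"],
}


def interpret(text):
    """Return (intent, parameters), or (None, {}) when nothing matches the dictionary."""
    lowered = text.lower()
    intent = None
    for name, phrases in INTENT_SYNONYMS.items():
        # Whole-word match so that, e.g., "ready" does not count as "read".
        if any(re.search(r"\b" + re.escape(p) + r"\b", lowered) for p in phrases):
            intent = name
            break
    if intent is None:
        return None, {}
    params = {}
    match = re.search(r"(\d+)\s*dpi", lowered)
    if match:
        params["resolution_dpi"] = int(match.group(1))
    return intent, params
```

With this table, "Scan this at 1000 dpi" and "Read this at 1000 dpi" yield the same intent and parameters, which is the effect the registered synonyms are meant to achieve.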

（対話型動作） (Interactive operation)
本実施形態の音声操作システム１では、ユーザの発話によって与えられた音声データに基づいてシステムが応答する対話型システムによる対話型動作を実現している。この対話型動作は、上述したように、スマートスピーカ２のフィードバック部２５３によって実行される動作の一つである。また、音声操作システム１は、対話等に必要な定型文を応答する以外に、ＭＦＰ６における原稿の読取りに係る特有の応答として、「入力不足フィードバック」及び「入力確認フィードバック」の、２種類の応答をする。これによって、音声操作システム１は、対話によるＭＦＰ６における読取処理及び印刷処理等を可能とする対話型の画像読取操作システムを実現している。 The voice operation system 1 of the present embodiment realizes an interactive operation in which the system responds based on the voice data given by the user's utterance. As described above, this interactive operation is one of the operations executed by the feedback unit 253 of the smart speaker 2. In addition to responding with the fixed phrases required for dialogue, the voice operation system 1 gives two types of responses specific to reading a document on the MFP 6: "insufficient input feedback" and "input confirmation feedback". In this way, the voice operation system 1 realizes an interactive image reading operation system that enables reading processing, printing processing, and the like on the MFP 6 through dialogue.

「入力不足フィードバック」は、ＭＦＰ６における原稿の読取りを実行するために必要な情報が揃っていない場合にスマートスピーカ２から出力される応答である。さらに、「入力不足フィードバック」は、ユーザの発話によって与えられた音声データの入力内容を認識できなかった場合、又は、音声操作による入力内容に必要な項目（以下、必須パラメータという）が不足している場合にスマートスピーカ２から出力される。換言すれば、必須パラメータ以外の項目（以下、単にパラメータともいう）については、ユーザから指示されていない場合であっても入力不足フィードバックを行う必要はない。一方で、「入力不足フィードバック」は、パラメータ以外にも、ＭＦＰ６における原稿の読取りにおいて必要な機能を確認する処理を含んでもよい。 The “insufficient input feedback” is a response output from the smart speaker 2 when the information necessary for executing the reading of the document in the MFP 6 is not prepared. Further, "insufficient input feedback" is when the input content of the voice data given by the user's utterance cannot be recognized, or the items required for the input content by voice operation (hereinafter referred to as essential parameters) are insufficient. If so, it is output from the smart speaker 2. In other words, for items other than the required parameters (hereinafter, also simply referred to as parameters), it is not necessary to provide insufficient input feedback even if the user does not instruct. On the other hand, the "insufficient input feedback" may include a process of confirming a function required for reading a document in the MFP 6 in addition to the parameters.

対話型動作では、フィードバック部２５３は、クラウドサービス装置５が通信接続中の画像読取装置の種類に応じて、ユーザに確認する機能及びパラメータを変更してもよい。この場合、ＡＩアシスタントサーバ装置４の機器情報取得部４５７が、画像読取装置との通信が確立した後の所定のタイミングで画像読取装置の種類及び機能を示す情報を取得する。その後、機器情報取得部４５７は、取得した情報に基づいて、フィードバック部２５３がユーザに確認する機能及びパラメータを決定してもよい。 In the interactive operation, the feedback unit 253 may change the function and parameters to be confirmed by the user according to the type of the image reading device to which the cloud service device 5 is connected to the communication. In this case, the device information acquisition unit 457 of the AI assistant server device 4 acquires information indicating the type and function of the image reading device at a predetermined timing after the communication with the image reading device is established. After that, the device information acquisition unit 457 may determine the function and parameters that the feedback unit 253 confirms with the user based on the acquired information.

例えば、画像読取装置がＭＦＰ６である場合、フィードバック部２５３は、ＭＦＰ６での原稿の読取りに必要な項目（使用者名、使用日時、等）をユーザに確認できる。更に、フィードバック部２５３は、ＭＦＰ６で使用される備品リソース等の情報をユーザに確認してもよい。また、機器情報取得部４５７は、ユーザから指定された設定条件に応じて必須パラメータを変更してもよい。例えば、ユーザが指定した原稿の読取りの条件が見開きページ読取りの場合は、機器情報取得部４５７は、原稿の読取りに必要な具体的な条件（例えば、ＡＤＦによる原稿の読取りか原稿台による原稿の読取りか、等）を必須パラメータとして設定してもよい。 For example, when the image reading device is the MFP 6, the feedback unit 253 can confirm with the user the items (user name, date and time of use, etc.) necessary for reading the original by the MFP 6. Further, the feedback unit 253 may confirm with the user information such as equipment resources used in the MFP 6. Further, the device information acquisition unit 457 may change the essential parameters according to the setting conditions specified by the user. For example, when the reading condition of the document specified by the user is a spread page reading, the device information acquisition unit 457 has a specific condition required for reading the document (for example, reading the document by ADF or reading the document by the platen). Read or etc.) may be set as a required parameter.

「入力確認フィードバック」は、ＭＦＰ６での原稿の読取りを実行するために必要な情報が揃った場合に出力される応答である。つまり、「入力確認フィードバック」は、全ての必須パラメータについて指示された場合に行われる。また、「入力確認フィードバック」は、現在の設定値で読取処理を実行するか、又は、設定値を変更するかの選択をユーザに促すために行われる。なお、「入力確認フィードバック」が行われることによって、現在の設定値で読取処理を実行するか否かを確認するために、ユーザにより指示された全てのパラメータ（必須パラメータか必須パラメータ以外のパラメータかに関わらず）を、ユーザに確認することができる。 The "input confirmation feedback" is a response that is output when all the information necessary for reading the document on the MFP 6 is available. That is, the "input confirmation feedback" is given when all the required parameters have been specified. The "input confirmation feedback" prompts the user to choose whether to execute the reading process with the current settings or to change the settings. Through the "input confirmation feedback", all the parameters specified by the user (whether required or not) can be confirmed with the user in order to check whether the reading process is to be executed with the current settings.
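The choice between the two feedback types can be sketched as a single check against the required parameters. Which parameters are actually required depends on the device and the settings, so the names in `REQUIRED_PARAMETERS` and the response format are assumptions made for illustration.

```python
# Hypothetical set of required parameters; the real set depends on the device.
REQUIRED_PARAMETERS = ("resolution_dpi", "color_mode")


def choose_feedback(params):
    """Pick 'insufficient input' or 'input confirmation' feedback for the given parameters."""
    missing = [name for name in REQUIRED_PARAMETERS if name not in params]
    if missing:
        # Prompt only for required parameters; optional ones never trigger this branch.
        return {"type": "insufficient_input", "missing": missing}
    # All required parameters are present: echo every setting back for confirmation.
    return {"type": "input_confirmation", "settings": dict(params)}
```

Optional parameters such as a file name appear in the confirmation response but never cause insufficient input feedback, matching the distinction drawn in the text.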

（ＡＩアシスタントサーバ装置からフィードバックされる情報の例） (Example of information fed back from the AI assistant server device)
上述の説明では、スマートスピーカ２のフィードバック部２５３はレスポンス情報に含まれるテキストデータ及び音声データを出力することとして説明した。しかし、フィードバック部２５３は、スマートスピーカ２のＲＯＭ２３等の記憶部に記憶されたテキストデータに基づいて、レスポンス情報に対応するテキストデータを形成し、フィードバック出力（音声出力及びテキスト出力のうち少なくとも一つ）を行ってもよい。なお、具体的なフィードバックの内容は後述する。 In the above description, the feedback unit 253 of the smart speaker 2 has been described as outputting text data and voice data included in the response information. However, the feedback unit 253 may form text data corresponding to the response information based on the text data stored in a storage unit such as the ROM 23 of the smart speaker 2, and perform feedback output (at least one of audio output and text output). The specific content of the feedback will be described later.

次に、紐づけ用ＤＢ４０２の具体例について図５を用いて説明する。図５は、情報処理システムの一例としての音声操作システム１で用いられる紐づけ用ＤＢ４０２で管理されるデータテーブルの一例である。例えば、本実施形態では、デバイスＩＤとして「ｕｄ１００１」を有するスマートスピーカ２から原稿読取指示が与えられた画像読取装置の名称は、「ＭＦＰ＿＃１」であり、「ＭＦＰ＿＃１」の装置ＩＤは、「ｄ０００１」である。以下、詳細な説明は省略するが、図５に示した紐づけ用ＤＢ４０２の装置管理テーブル４０２ｂは、音声取得装置名毎に、音声取得装置のデバイスＩＤ、画像読取装置名及び装置ＩＤとが関連付けられている。すなわち、紐づけ用ＤＢ４０２には、各スマートスピーカ２とＭＦＰ６とを特定できるように、各スマートスピーカ２のデバイスＩＤとＭＦＰ６の装置ＩＤとがそれぞれ関連付けられて記憶されている。なお、図５に示したそれぞれのＩＤの種類及び値は一例であり、上述した内容に限らない。 Next, a specific example of the linking DB 402 will be described with reference to FIG. FIG. 5 is an example of a data table managed by the linking DB 402 used in the voice operation system 1 as an example of the information processing system. For example, in the present embodiment, the name of the image reading device to which the document reading instruction is given from the smart speaker 2 having "ud1001" as the device ID is "MFP_ # 1", and the device ID of "MFP_ # 1" is , "D0001". Although detailed description thereof will be omitted below, the device management table 402b of the linking DB 402 shown in FIG. 5 is associated with the device ID of the voice acquisition device, the image reading device name, and the device ID for each voice acquisition device name. Has been done. That is, in the associating DB 402, the device ID of each smart speaker 2 and the device ID of the MFP 6 are stored in association with each other so that the smart speaker 2 and the MFP 6 can be identified. The types and values of the respective IDs shown in FIG. 5 are examples, and are not limited to the above-mentioned contents.
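A lookup against the device management table 402b can be sketched as below, using only the example values quoted from FIG. 5 (`ud1001`, `MFP_#1`, `d0001`). The column names and the function `find_reader` are hypothetical, and an in-memory list stands in for the actual database.

```python
# One row per smart speaker, mirroring the association described for FIG. 5.
DEVICE_MANAGEMENT_TABLE = [
    {"speaker_device_id": "ud1001", "reader_name": "MFP_#1", "reader_device_id": "d0001"},
]


def find_reader(speaker_device_id, table=DEVICE_MANAGEMENT_TABLE):
    """Return (reader_name, reader_device_id) for the speaker, or None if unregistered."""
    for row in table:
        if row["speaker_device_id"] == speaker_device_id:
            return row["reader_name"], row["reader_device_id"]
    return None
```

Returning `None` for an unregistered speaker lets the caller respond with an error instead of sending a command to an unknown device.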

＜ＭＦＰの機能構成＞ <Functional configuration of the MFP>
ＭＦＰ６のＣＰＵ６０１は、クラウドサービス装置５（又はＡＩアシスタントサーバ装置４）から送信された読取命令に基づいて、ＨＤＤ６０９等の記憶手段に記憶された原稿の読取りに係る実行プログラムをＲＡＭ６０２ｂに展開して実行する。ＣＰＵ６０１は、この読取命令を実行することで、例えば、通信制御部６５１、命令受信部６５２、判断部６５３、読取実行部６５４、通知部６５５及び記憶・読出処理部６５６として機能又は機能する手段を構成する。 Based on the read command transmitted from the cloud service device 5 (or the AI assistant server device 4), the CPU 601 of the MFP 6 loads the execution program for reading a document, stored in storage means such as the HDD 609, into the RAM 602b and executes it. By executing this read command, the CPU 601 functions as, or constitutes means functioning as, for example, a communication control unit 651, a command receiving unit 652, a determination unit 653, a reading execution unit 654, a notification unit 655, and a storage / reading processing unit 656.

＜ＭＦＰの各機能構成＞ <Each functional configuration of the MFP>
次に、ＭＦＰ６の各機能構成について説明する。通信制御部６５１は、ＡＩアシスタントサーバ装置４の通信制御部４５１とネットワーク７を介して通信を行う。但し、クラウドサービス装置５（又はＡＩアシスタントサーバ装置４）と直接通信を行ってもよい。 Next, each functional configuration of the MFP 6 will be described. The communication control unit 651 communicates with the communication control unit 451 of the AI assistant server device 4 via the network 7. However, direct communication may be performed with the cloud service device 5 (or the AI assistant server device 4).

命令受信部６５２は、ＭＦＰ６で実行される読取命令等の各種命令を、クラウドサービス装置５（又はＡＩアシスタントサーバ装置４）から受信する。つまり、命令受信部６５２は、クラウドサービス装置５（又はＡＩアシスタントサーバ装置４）から読取命令等の各種命令を受信する受信手段の機能を担う。 The instruction receiving unit 652 receives various instructions such as a reading instruction executed by the MFP 6 from the cloud service device 5 (or the AI assistant server device 4). That is, the instruction receiving unit 652 functions as a receiving means for receiving various instructions such as reading instructions from the cloud service device 5 (or AI assistant server device 4).

判断部６５３は、命令受信部６５２が読取命令を受信した場合、読取命令に係る情報（画像読取装置名、画像読取装置の装置ＩＤ、ユーザ名及びユーザＩＤ、等）に基づいて、ＨＤＤ６４等の記憶部に記憶された各種情報の検索を行い、読取命令の実行対象となるファイルを特定し、クラウドサービス装置５（又はＡＩアシスタントサーバ装置４）に対して読取命令又は所定の処理要求を生成する。 When the command receiving unit 652 receives the reading command, the determination unit 653 determines that the HDD 64 or the like is based on the information related to the reading command (image reading device name, device ID of the image reading device, user name, user ID, etc.). It searches various information stored in the storage unit, identifies the file to be executed by the read instruction, and generates a read instruction or a predetermined processing request to the cloud service device 5 (or AI assistant server device 4). ..

読取実行部６５４は、命令受信部６５２で受信した読取命令に基づいて、ＭＦＰ６において読取処理を実行する。また、読取実行部６５４は、例えば、命令受信部６５２が読取命令を受信した場合、読取命令に含まれる上述の各種情報に基づいて、ＨＤＤ６０９等の記憶部に記憶された原稿の読取状況を更新する。一方、ＭＦＰ６が何らかの原因で原稿の読取処理ができない場合は、ＭＦＰ６からのステータス信号等を受信して、外部にエラーを通知してもよい。その際、エラー通知はＭＦＰ６から直接スマートスピーカ２に送信される。また、エラー通知の受信に伴い、原稿の読取りに係る取消要求を取得した場合は、読取実行部６５４は、条件に一致するＭＦＰ６の読取処理を記憶部から削除する。 The reading execution unit 654 executes the reading process on the MFP 6 based on the read command received by the command receiving unit 652. For example, when the command receiving unit 652 receives a read command, the reading execution unit 654 updates the document reading status stored in a storage unit such as the HDD 609, based on the above-mentioned information included in the read command. On the other hand, if the MFP 6 cannot read the document for some reason, a status signal or the like from the MFP 6 may be received and the error reported externally. In that case, the error notification is transmitted directly from the MFP 6 to the smart speaker 2. Further, when a cancellation request for the document reading is acquired following the reception of the error notification, the reading execution unit 654 deletes the matching reading process of the MFP 6 from the storage unit.

上述したように、読取実行部６５４は、スマートスピーカ２に対してユーザが行う音声操作によって与えられた、ＭＦＰ６に対する読取命令及び所定の処理の指示等の内容に基づく読取処理等を実行する読取制御手段の機能を担う。本実施形態では、読取実行部６５４はＭＦＰ６における読取処理を例に説明したが、実行される処理が読取処理に加えて外部装置へのファイル送信及びストレージへの保存等を行う画像読取装置の場合は、画像読取装置で受信したそれぞれのファイル及びデータを所定の出力要求に含まれる出力形式で出力（送信）するなどの出力処理が可能である。 As described above, the reading execution unit 654 executes the reading process or the like based on the contents of the reading command to the MFP 6 and the instruction of the predetermined processing given by the voice operation performed by the user to the smart speaker 2. Responsible for the function of means. In the present embodiment, the reading execution unit 654 has described the reading process in the MFP 6 as an example, but in the case of an image reading device that performs file transmission to an external device, storage in storage, etc. in addition to the reading process. Is capable of output processing such as outputting (transmitting) each file and data received by the image reader in an output format included in a predetermined output request.

通知部６５５は、ＭＦＰ６の状態をスマートスピーカ２に通知する。通知される内容は、例えば、当該装置の原稿の読取り及びその他の動作に係る情報、並びに当該装置の起動又はログイン等に関する情報である。なお、通知部６５５は、ユーザから与えられた原稿読取指示を受け付けた時点で、上述した各種情報をスマートスピーカ２に通知してもよい。一方で通知部６５５は、受信した読取命令に含まれる原稿の読取りの開始時刻になったら上述した各種情報をスマートスピーカ２に通知してもよい。また、読取命令に含まれる原稿の読取りの内容に重複があった場合、又は原稿の読取りの開始時刻の所定時間前（例えば、１０分前）に当該装置に故障等が発生した場合は、通知部６５５は、通信制御部６５１を介してスマートスピーカ２に対して、メール、画像配信等で読取処理に係る内容の重複及び故障等に関する通知を行ってもよい。 The notification unit 655 notifies the smart speaker 2 of the state of the MFP 6. The notified contents are, for example, information on document reading and other operations of the device, and information on startup of or login to the device. The notification unit 655 may notify the smart speaker 2 of this information when the document reading instruction given by the user is accepted. Alternatively, the notification unit 655 may notify the smart speaker 2 of this information when the document reading start time included in the received read command is reached. Further, if the document reading contents included in the read command overlap, or if a failure or the like occurs in the device a predetermined time (for example, 10 minutes) before the document reading start time, the notification unit 655 may notify the smart speaker 2 of the overlap, the failure, or the like via the communication control unit 651 by e-mail, image distribution, or the like.

The storage/reading processing unit 656 controls a storage unit such as the HDD 609 to read and write various data.

In the present embodiment, the communication control unit 651 through the storage/reading processing unit 656 are implemented in software; however, some or all of them may be implemented in hardware such as an IC (Integrated Circuit).

<Overview of voice operation system processing>
The voice operation system 1 according to the present embodiment includes: the smart speaker 2, which collects sound to obtain voice data; the MFP 6, which reads an image of a document at least once; and the cloud service device 5 (or the AI assistant server device 4), which receives the voice data transmitted by the smart speaker 2, converts the received voice data into a reading command for reading the document based on predetermined reading conditions, and transmits the command to the MFP 6. The cloud service device 5 (or the AI assistant server device 4) determines whether voice data based on an utterance that the user subsequently gives to the smart speaker 2 has content for which the reading command transmitted immediately before can continue to be executed, that is, whether reading conditions exist under which document reading is to continue. When the cloud service device 5 (or the AI assistant server device 4) determines that such reading conditions exist, it retransmits to the MFP 6 a reading command that inherits those reading conditions, and the MFP 6 that receives the command continues reading the document based on the retransmitted reading command. This configuration is described in detail below.
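The continuation judgment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `ReadingCommand` class, the `next_command` function, and the `CONTINUE_PHRASES` set are hypothetical names, and the source does not specify which utterances count as a request to continue.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical phrases treated as "continue reading"; the source lists none.
CONTINUE_PHRASES = {"next", "continue", "scan the next page"}


@dataclass
class ReadingCommand:
    device: str                                      # e.g. "MFP_#1"
    conditions: dict = field(default_factory=dict)   # e.g. resolution, destination


def next_command(previous: Optional[ReadingCommand],
                 utterance_text: str) -> Optional[ReadingCommand]:
    """Return a command inheriting the previous reading conditions, or None."""
    if previous is None:
        return None  # nothing was sent before, so there is nothing to continue
    if utterance_text.strip().lower() not in CONTINUE_PHRASES:
        return None  # the second utterance does not ask to continue
    # Copy the conditions so the retransmitted command is independent of the first.
    return ReadingCommand(previous.device, dict(previous.conditions))
```

With a first command carrying a resolution and a destination, a second utterance such as "Next" yields a new command with the same conditions, while an unrelated utterance yields `None` and would be handled as a fresh request.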

[Processing or operation of the embodiment]
<First Embodiment>
FIGS. 8a and 8b are sequence diagrams showing an example of reading processing based on a user's utterance in the first embodiment. The first embodiment describes processing in which the AI assistant server device 4 transmits a document reading request to the MFP 6 and converts it into a reading command for the case where documents are read consecutively. Specifically, the first embodiment illustrates a case where a reading command is transmitted from the AI assistant server device 4 to MFP_#1 in response to the user's utterance, and the MFP 6 reads the document and performs continued reading processing. Each process in the sequence diagrams is described below.

For the voice operation system 1 of the present embodiment, a state in which the smart speaker 2 is placed next to or near the MFP 6 used by the user is described as an example. In this state, the user first opens, for example, a bound document to the desired page and places it on the platen of the MFP 6, which is in an operable state. At this time, the user may hold the bound document against the platen by hand. The user then says to the smart speaker 2, "Scan to Mr. Tanaka at 1000 dpi". In response, the acquisition unit 252 of the smart speaker 2 acquires voice data based on the spoken utterance "Scan to Mr. Tanaka at 1000 dpi", using, for example, the microphone unit 29 shown in FIG. 3 (step S101).

If the content uttered by the user is simply "Scan" and does not include the resolution, destination, and so on for reading the document, the complement unit 455 of the AI assistant server device 4 uses the feedback processing described later to have the smart speaker 2 ask by voice questions such as "At what dpi should I scan?" or "To whom should I send the scanned document?". These inquiries include information for complementing the various parameters. In other words, the voice operation system 1 of the present embodiment assumes that one question is asked for one item uttered by the user (one-to-one feedback processing). However, the voice operation system 1 may instead be controlled so that one question is asked for multiple items uttered by the user (many-to-one feedback processing), or so that multiple questions are asked for a single utterance by the user (one-to-many feedback processing).

The utterance for causing the MFP 6 to scan a document is not limited to the content described above. For example, the utterance may include content specifying scan settings, that is, the various settings for reading the document.

Subsequently, the communication control unit 251 of the smart speaker 2 transmits the acquired voice data to the voice recognition server device 3. The timing at which this voice data is transmitted is an example of the first timing, and the voice data transmitted at the first timing is an example of the first voice data. At this time, the communication control unit 251 also transmits the device ID of the smart speaker 2 to the voice recognition server device 3 (step S102).

The device ID is an example of information identifying the smart speaker 2 associated with the user, as shown in the user management table 402a. Instead of or in addition to the device ID, the communication control unit 251 may transmit, for example, position information of the smart speaker 2, or information identifying an individual, such as a user ID identifying the individual user of the smart speaker 2, a user name, or the organization to which the user belongs.

Subsequently, the acquisition unit 352 of the voice recognition server device 3 acquires, via the communication control unit 351, the voice data transmitted from the smart speaker 2 together with the device ID, and the voice data is converted into text (step S103).

The acquisition unit 352 may also take on the function of the acquisition unit 252 provided in the smart speaker 2 for obtaining voice data. In that case, the acquisition unit 352 has a function of acquiring, for example, the user's spoken instruction collected through the microphone unit 29 together with the device ID of the smart speaker 2 and the user ID of the user. That is, it also performs the functions of steps S101 and S102 described above. Like the acquisition unit 252 of the smart speaker 2, such an acquisition unit 352 may function as an example of voice data acquisition means. In other words, the voice recognition server device 3 may function as an example of a server device provided with voice data acquisition means.

As a specific example of the text conversion, the text conversion unit 353 of the voice recognition server device 3 converts the acquired voice data into text. This processing converts, for example, information based on the voice operation "Scan to Mr. Tanaka at 1000 dpi" into text data.

Subsequently, the operation voice conversion program transmits a request for dictionary information, addressed to the voice assistant program running on the AI assistant server device 4, to the AI assistant server device 4 via the communication control unit 351 (step S104).

The acquisition unit 452 of the AI assistant server device 4 acquires the request for dictionary information from the voice recognition server device 3 via the communication control unit 451. In response to the acquired request, the AI assistant server device 4, which has acquired the text-converted voice data, provides the dictionary information to the operation voice conversion program running on the voice recognition server device 3 (step S105).

Subsequently, the interpretation unit 354 generates an intent and parameters from the text-converted voice data (step S106). As a specific example of step S106, the interpretation unit 354 determines, based on the dictionary information acquired from the voice assistant program, whether the words and the expressions with a predetermined meaning contained in the text data match the dictionary information. That is, it interprets the text. When they match, the interpretation unit 354 converts them into an intent indicating the operation instructed by the user and parameters indicating variables such as the execution conditions of the various processes. The processing of the interpretation unit 354 described above may instead be performed by the providing unit 356.

In the present embodiment, the intent is, for example, information indicating the type of job requested of the MFP 6, that is, information indicating execution of the reading process requested of the MFP 6. The parameters are, for example, information indicating the settings of the job that the AI assistant server device 4 transmits to the MFP 6, that is, information indicating various settings such as the resolution for reading the document and the destination of the data in the reading process. The intent obtained by the conversion is, for example, "Intent: SCANEXECUTE" (corresponding to "Action" in Table 1). The parameters are, for example, "Resolution: 1000 dpi" and "Destination: Tanaka". The parameters are not limited to these examples and may include information on other reading-related settings to be transmitted to the MFP 6 (reading size, color/monochrome, and so on).

When e-mail is sent to another device, or when the document to be read is identified by the cloud service device 5, the parameters may be file-related information such as the address of the e-mail destination device, the device ID of an external cloud device, the file name of the file to be transmitted, and a network address indicating where the file is stored.

More specifically, the interpretation unit 354 generates "Intent: SCANEXECUTE" as the intent information generated when document reading is executed on MFP_#1 operated by the user. Further, the interpretation unit 354 generates, for example, "Image reading device name: MFP_#1" as parameter information for the reading process executed on MFP_#1. In this way, based on the acquired text data, the interpretation unit 354 generates an interpretation result indicating, for example, the document reading instruction given by the user, the type of the predetermined processing (intent), and the content related to the predetermined processing (parameters).
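As an illustration of step S106, the following sketch derives an intent and parameters from a text-converted utterance. It rests on assumptions that are not in the source: the `DICTIONARY` contents, the `interpret` function, the parameter key names, and the use of regular expressions on an English rendering of the utterance are all hypothetical. The embodiment itself only states that words are matched against dictionary information.

```python
import re

# Hypothetical dictionary information: keyword -> intent name.
DICTIONARY = {"scan": "SCANEXECUTE"}


def interpret(text: str):
    """Return (intent, parameters) for a text-converted utterance, or (None, {})."""
    lowered = text.lower()
    tokens = re.findall(r"[a-z0-9]+", lowered)
    # Pick the first dictionary keyword that appears as a whole word.
    intent = next((name for word, name in DICTIONARY.items() if word in tokens), None)
    if intent is None:
        return None, {}

    params = {}
    resolution = re.search(r"(\d+)\s*dpi", lowered)
    if resolution:
        params["resolution_dpi"] = int(resolution.group(1))
    # Assumed phrasing "to <name>", optionally preceded by a title.
    destination = re.search(r"\bto\s+(?:mr\.|ms\.)?\s*(\w+)", text, flags=re.IGNORECASE)
    if destination:
        params["destination"] = destination.group(1)
    return intent, params
```

For the utterance used in the embodiment, the sketch produces the intent `SCANEXECUTE` with a resolution of 1000 and the destination `Tanaka`; for a bare "Scan" it produces the intent with no parameters, which is the case the complement processing has to handle.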

Subsequently, the interpretation unit 354 transmits the generated intent, the parameters, and the device ID of the smart speaker 2 to the management program running on the AI assistant server device 4 (step S107).

<Information complement processing>
Next, an example of the information complement processing executed by the cloud service device 5 (or the AI assistant server device 4) is described.

First, the interpretation result conversion unit 453 of the AI assistant server device 4 converts the intent, the parameters, the device ID of the smart speaker 2, and so on acquired by the acquisition unit 452 into data indicating a reading command for MFP_#1. At this time, the intent is given "MFP_#1" or the like, which represents the name of the image reading device that reads the document. MFP_#1 is used below as an example of the image reading device, but any image reading device such as those illustrated in the device management table 402b may be used. Likewise, the parameters may be of any type as long as their content is of the kind illustrated in the device management table 402b and the instruction management table 402c.

Along with the data conversion by the interpretation result conversion unit 453, the search unit 460 identifies MFP_#1, which has a scanner for reading documents, based on the device ID of the smart speaker 2, the user name, and the user ID shown in the user management table 402a, and on the information managed in the device management table 402b. When identifying MFP_#1, the search unit 460 identifies the image reading device based on the various information stored and managed in the device management table 402b. That is, it looks up the device ID of the image reading device from the device ID of the voice acquisition device and thereby identifies the image reading device. However, the smart speaker 2 may for some reason have been moved away from MFP_#1, so that it no longer matches the information stored and managed in the device management table 402b. In that case, the search unit 460 may acquire at least one of the device ID of the smart speaker 2 and the user ID stored and managed in the user management table 402a, then acquire position information or the like indicating the installation positions of the smart speaker 2 and MFP_#1, and check the validity of the device management table 402b from their positional relationship. If it determines that the installation positions of the smart speaker 2 and MFP_#1 are apart by a predetermined amount, the search unit 460 may give feedback to the smart speaker 2 so that the smart speaker 2 tells the user by voice that no MFP on which the document can be read is near the user.
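The lookup and the positional validity check described above could be sketched as below. The table contents, coordinates, the `find_reader` function, and the distance threshold are hypothetical; the source states only that a predetermined displacement triggers feedback and does not give a value or a distance metric.

```python
import math

# Hypothetical contents standing in for the device management table 402b.
DEVICE_TABLE = {"speaker-001": "MFP_#1"}
# Hypothetical installation positions in metres.
POSITIONS = {"speaker-001": (0.0, 0.0), "MFP_#1": (1.0, 1.0)}
MAX_DISTANCE_M = 5.0  # assumed threshold; the source gives no value


def find_reader(speaker_id, positions=POSITIONS):
    """Return (reader_name, None) on success or (None, feedback_message)."""
    reader = DEVICE_TABLE.get(speaker_id)
    if reader is None:
        return None, "No image reading device is registered for this speaker."
    # Validity check: the registered pair should still be installed close together.
    distance = math.dist(positions[speaker_id], positions[reader])
    if distance > MAX_DISTANCE_M:
        return None, "No MFP for reading the document is near you."
    return reader, None
```

Returning a feedback message rather than raising an error mirrors the description, in which the mismatch is reported to the user by voice through the smart speaker.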

Further, the complement unit 455 refers to the device management table 402b and the instruction management table 402c stored in the linking DB 402 and complements, for the voice data given by the user's utterance, the information needed to convert (generate) the reading command executed by MFP_#1 (step S108). If the information needed to generate the essential parameters of the reading command still cannot be complemented even by referring to the device management table 402b and the instruction management table 402c, the complement unit 455 may give feedback to the user via the smart speaker 2 and prompt the user to input the information needed to generate the essential parameters. The complement processing is performed by the complement unit 455, which corresponds to complement means.

At this time, the management unit 459 can register the device ID, the user ID, and the information processing device name (MFP_#1 or the like) in association with one another in the linking DB 402, as the user management table 402a and the device management table 402b.

FIG. 9 is a flowchart showing an example of the information complement and inquiry processing in the first embodiment.

The acquisition unit 452 of the AI assistant server device 4 acquires the intent, the parameters, the device ID, and so on from the voice recognition server device 3 as a result of step S107 (step S1001).

Subsequently, the interpretation result conversion unit 453 determines, from the acquired data such as the intent, the parameters, and the device ID, whether the essential parameters are satisfied (step S1002). One way to make this determination is, for example, for the interpretation result conversion unit 453 to check whether the user name, the user ID, the information needed to read the document, and so on are contained in the acquired data. This determination is made by the interpretation result conversion unit 453 referring to, for example, the user management table 402a, the device management table 402b, and the instruction management table 402c stored in the linking DB 402.

When it is determined from the acquired data such as the intent, the parameters, and the device ID that the essential parameters are satisfied (Yes in step S1002), the interpretation result conversion unit 453 converts the received data into a reading command for MFP_#1 (MFP 6) and exits this flow (step S1003).

On the other hand, when it is determined from the acquired data such as the intent, the parameters, and the device ID that the essential parameters are not satisfied (No in step S1002), the interpretation result conversion unit 453 determines whether the essential parameters can be satisfied with the information in the tables stored and managed in the linking DB 402 (the user management table 402a, the device management table 402b, and the instruction management table 402c) (step S1004).

When it is determined that the information in the tables satisfies the essential parameters (Yes in step S1004), the interpretation result conversion unit 453 converts the data into a reading command for MFP_#1 based on the supplemented content and exits this flow (step S1005).

On the other hand, when it is determined that the information in the tables does not satisfy the essential parameters (No in step S1004), the interpretation result conversion unit 453 performs feedback processing that has the user input the necessary information again, as an inquiry for the essential parameters, and exits this flow (step S1006). The above is an example of the information complement processing executed by the interpretation result conversion unit 453.
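The branch structure of FIG. 9 (steps S1002 to S1006) can be sketched as a single function. The `REQUIRED` tuple, the `complement` function, and the returned tags are hypothetical names chosen for illustration; which parameters are essential is configurable in the embodiment, so the two used here are only an assumption.

```python
# Assumed essential parameters; the embodiment lets these be configured.
REQUIRED = ("resolution_dpi", "destination")


def complement(params: dict, table_defaults: dict):
    """Return ("command", parameters) or ("ask_user", missing_names)."""
    # S1002: do the received parameters already cover the essential ones?
    missing = [name for name in REQUIRED if name not in params]
    if not missing:
        return "command", dict(params)          # S1003

    # S1004: try to fill only the missing entries from the management tables.
    merged = dict(params)
    for name in missing:
        if name in table_defaults:
            merged[name] = table_defaults[name]
    still_missing = [name for name in REQUIRED if name not in merged]
    if not still_missing:
        return "command", merged                # S1005

    return "ask_user", still_missing            # S1006: feedback to the user
```

Values spoken by the user take precedence over table values because only missing entries are filled, which is one reasonable reading of the flow rather than something the source states explicitly.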

One way to identify the user ID described above is as follows. When a user uses the smart speaker 2, the user speaks his or her name toward the microphone unit 29 of that smart speaker 2. On receiving the spoken name, the acquisition unit 352 of the voice recognition server device 3 converts the name into text. The interpretation result conversion unit 453 of the AI assistant server device 4 then checks it against the user names stored and managed in the user management table 402a described above and identifies the user ID of the user who spoke. Instead of a name, the user may speak his or her e-mail address or the like. Further, the imaging unit (camera unit) 30 of the smart speaker 2 may be used to photograph the user's face or the like, and the captured image may be matched against the user ID.

As another example, when the smart speaker 2 and the user of that smart speaker 2 change, and information on document reading or the like on MFP_#1 is given with a new combination of user ID and device ID, the management unit 459 may update the instruction management table 402c by adding that information to the MFP_#1 entry of the instruction management table 402c, which is stored and managed in a storage unit such as the HDD 44 of the AI assistant server device 4.

Subsequently, the device information acquisition unit 457 determines, based on the acquired intent and parameters, whether the essential parameters needed to read the document on MFP_#1 are satisfied. The essential parameters are, for example, those of the received parameters that specify attribute information for reading the document to be read. In other words, any conditions can be set as essential parameters, such as the file format of the file generated after the document is read, the resolution at reading, the color/monochrome setting, and the destination for file transmission.

Further, the essential parameters can be stored in advance, as the instruction management table 402c for MFP_#1 described above, in a storage unit such as the HDD 44 of the AI assistant server device 4, and set as appropriate. Essential parameters and ordinary parameters can also be interchanged as appropriate according to, for example, the combination of user and image reading device. That is, a parameter defined as essential under one condition may be managed as an ordinary parameter under another.

From the above description, the device information acquisition unit 457 has the following features. The device information acquisition unit 457 refers to the device management table 402b and the instruction management table 402c stored in the linking DB 402 and complements the information related to generating the essential parameters. Such information includes information for identifying the document to be read (a one-page document, a bound document containing multiple pages, and so on) and the image reading device (MFP_#1 and so on). Specifically, it is information such as "Mr. Tanaka" and "1000 dpi". However, when the information needed to generate the essential parameters still cannot be complemented even by referring to the parameter-related information, the device management table 402b, and the instruction management table 402c, the device information acquisition unit 457 transmits a request for the missing parameters to the smart speaker 2 as an inquiry for complementing the essential parameters (step S109).

Further, the acquisition unit 252 of the smart speaker 2, having received the parameter request from the device information acquisition unit 457 in step S109, forwards the request to the feedback unit 253. The feedback unit 253 converts the information corresponding to the parameter request into speech, gives feedback to the user via the communication control unit 251, and thereby prompts the user to input the information needed to generate the essential parameters (step S110). Steps S109 and S110 correspond to step S1006 of the flowchart in FIG. 9 described above. When execution of step S1006 is determined to be unnecessary in FIG. 9, steps S109 and S110 are not performed (are omitted).

Subsequently, the execution determination unit 454 determines, based on the complement processing described above, whether the essential parameters are satisfied. If it determines that the essential parameters are still not satisfied even with the complemented content, it generates response information for asking about the essential parameters. The notification unit 458 then transmits the generated response information to the smart speaker 2, and the user is informed by voice output from the smart speaker 2 or the like.

When the essential parameters are not satisfied, the execution determination unit 454 generates response information asking for the parameters to be specified and continues to query the user via the smart speaker 2 until the essential parameters are satisfied. For the information complement and essential-parameter inquiry processing executed in this way, the interpretation result conversion unit 453 and the execution determination unit 454 may cooperate to function as an acquisition control unit 462 that complements information related to the voice data given by the user's utterance.

Further, the required parameters may be changed based on at least one of the device ID and the user ID acquired from the smart speaker 2 in step S102. The required parameters preferably include information on the user who uses MFP_#1 (the user name, the user's user ID, and so on). However, if the user has not set the user name, user ID, or the like of the MFP_#1 user by voice operation or other means, the execution determination unit 454 determines whether the user can be identified from at least one of the device ID and the user ID acquired from the smart speaker 2 in step S102. For example, a given smart speaker 2 may be used exclusively by one user. The execution determination unit 454 therefore determines whether a user associated with the device ID of the smart speaker 2 and the user ID is registered in the association DB 402. In other words, the execution determination unit 454 has a function of searching for and identifying a user based on the device ID and the user ID.

If the user can be identified, the execution determination unit 454 can set the identified user in the parameters as the user of MFP_#1. If the user cannot be identified, the execution determination unit 454 may ask the user, via the smart speaker 2, to set the user information. That is, in order to generate data representing a given processing request (a reading instruction or the like), the execution determination unit 454 may communicate with the smart speaker 2 via the notification unit 458 and the communication control unit 451 and request the user to input complementary information.

The parameters may also include information on the user of MFP_#1. However, if the user does not set the user-related information (that is, the user name, user ID, and so on) by voice operation, the execution determination unit 454 determines whether MFP_#1 can be identified from at least one of the device ID and the user ID acquired from the smart speaker 2 in step S102.

Based on this determination, the search unit 460 searches for and identifies the MFP_#1 to be used for reading the document. If MFP_#1 can be identified, the search unit 460 sets it in the parameters as the image reading device to be used for reading the document. If MFP_#1 cannot be identified, the search unit 460 may, in cooperation with the notification unit 458, ask the user via the smart speaker 2 to set MFP_#1.

Even when the user sets MFP_#1 by voice operation, there may be a plurality of image reading devices whose names contain the same name as the set MFP_#1. The execution determination unit 454 may therefore determine whether MFP_#1 can be identified from at least one of the device ID and the user ID, in addition to the name of MFP_#1 set by the voice operation. That is, the execution determination unit 454 determines whether an MFP_#1 associated with the device ID and the user ID is registered in the association DB 402. The search unit 460 then searches for MFP_#1 based on the device ID and the user ID in addition to the name set by the voice operation, and identifies the target MFP_#1 from the search results.
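As a rough sketch of this disambiguation, the lookup can be modeled as successive filters over association records. The record layout `Association`, its field names, and the function `resolve_apparatus` are hypothetical; the actual schema of the association DB 402 is not given in this passage.

```python
from typing import Iterable, NamedTuple, Optional


class Association(NamedTuple):
    """Hypothetical row of the association DB: speaker, user, and image reading device."""
    speaker_device_id: str
    user_id: str
    apparatus_id: str
    apparatus_name: str


def resolve_apparatus(
    records: Iterable[Association],
    spoken_name: Optional[str] = None,
    speaker_device_id: Optional[str] = None,
    user_id: Optional[str] = None,
) -> Optional[str]:
    """Return the single matching apparatus ID, or None if zero or several match."""
    candidates = list(records)
    if spoken_name is not None:
        candidates = [r for r in candidates if spoken_name in r.apparatus_name]
    if speaker_device_id is not None:
        candidates = [r for r in candidates if r.speaker_device_id == speaker_device_id]
    if user_id is not None:
        candidates = [r for r in candidates if r.user_id == user_id]
    apparatus_ids = {r.apparatus_id for r in candidates}
    return next(iter(apparatus_ids)) if len(apparatus_ids) == 1 else None
```

A `None` result corresponds to the case in which the user would be asked, via the smart speaker, to specify the device.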

Here, the Action and Parameter items used as table data in the present embodiment are described using the specific examples shown in Table 1. To convert the interpretation result produced by the voice recognition server device 3 into data representing a reading instruction for MFP_#1, the interpretation result conversion unit 453 of the AI assistant server device 4 may, for example, store the information shown in Table 1, detailed below, in a storage unit such as the HDD 44 of the AI assistant server device 4 and refer to it.

The AI assistant server device 4 stores, in a storage unit such as the HDD 44, table data containing the reading instructions for the image reading device shown in Table 1. Alternatively, the information corresponding to Table 1 may be stored in a storage unit such as the HDD 609 of the MFP 6, so that the interpretation result conversion unit 453 of the AI assistant server device 4 can refer to it when converting the interpretation result obtained by the voice recognition server device 3 into a reading instruction.

In the example of Table 1, "SCANEXECUTE", "EMAILEXECUTE", "STOREEXECUTE", and the like are shown as examples of actions or intents, and "1000DPI", "Tanaka", and "ADDRESS" are shown as examples of parameters. The parameters include every parameter that can be specified as a setting value for a reading instruction or the like sent to the MFP 6.

In the present embodiment, for example, the interpretation result conversion unit 453 converts the interpretation result "SCANEXECUTE" into an instruction indicating "execute document reading" for MFP_#1. Similarly, it converts the interpretation result "EMAILEXECUTE" into an instruction indicating "send email" for MFP_#1, and the interpretation result "STOREEXECUTE" into an instruction indicating "save to a storage service" for MFP_#1.

That is, the interpretation result conversion unit 453 of the AI assistant server device 4 determines the type of reading instruction for MFP_#1 from the information contained in the action or intent of the interpretation result, treats the values contained in the parameters as setting values for the reading instruction, and converts the interpretation result into a reading instruction.
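The conversion just described amounts to a table lookup on the action plus a pass-through of the parameters. The sketch below illustrates that idea under assumptions: the dictionary shape of the interpretation result, the key names `action`, `parameters`, `type`, and `settings`, and the function `to_instruction` are invented for illustration, and only the three action names and their meanings come from the text.

```python
from typing import Any, Dict

# Action (intent) names from Table 1 mapped to the instruction type they denote.
INSTRUCTION_TYPES: Dict[str, str] = {
    "SCANEXECUTE": "execute document reading",
    "EMAILEXECUTE": "send email",
    "STOREEXECUTE": "save to storage service",
}


def to_instruction(interpretation: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an interpretation result into an instruction for the MFP.

    The action selects the instruction type; the parameters are copied
    unchanged as the setting values of that instruction.
    """
    action = interpretation["action"]
    if action not in INSTRUCTION_TYPES:
        raise ValueError(f"unknown action: {action}")
    return {
        "type": INSTRUCTION_TYPES[action],
        "settings": dict(interpretation.get("parameters", {})),
    }
```

For example, an interpretation result with action `SCANEXECUTE` and a resolution parameter of `1000DPI` would yield an "execute document reading" instruction carrying that resolution as a setting.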

The execution determination unit 454 may store, in a storage unit such as the HDD 44, table data containing execution instructions for predetermined processes other than those in Table 1, and may use that table data to feed back to the smart speaker 2 the interpretation result produced by the interpretation result conversion unit 453.

<Reading instruction conversion process>
Next, the conversion into a reading instruction executed by the AI assistant server device 4 is described. The interpretation result conversion unit 453 converts the information complemented in step S108 into, for example, a reading instruction to be executed by MFP_#1, and transmits it to MFP_#1 via the communication control unit 451 (step S111). The reading instruction at this point is an example of the first reading request. In this case, a reading instruction corresponding to what the user instructed by utterance or the like, for example "scan this", "scan this to Mr. Tanaka", or "scan this at 1000 dpi", is transmitted from the cloud service device 5 (or the AI assistant server device 4) to MFP_#1 via the communication control unit 451. The communication control unit 451 is an example of communication means.

FIG. 10 is a flowchart showing an example of the conversion and transmission of a reading instruction in the first embodiment.

In FIG. 10, the interpretation result conversion unit 453 and the execution determination unit 454 perform a series of processes. Based on information including the voice data given by the user's utterance and acquired by the smart speaker 2, the device ID identifying the smart speaker 2, the attribute information relating to document reading, and the apparatus ID identifying the MFP 6, they determine whether a reading condition for continuing the reading of the document exists, and they perform conversion into a reading instruction according to the result of that determination.

First, the interpretation result conversion unit 453 acquires the satisfied required parameters and converts them into a reading instruction (step S1101). For example, if the utterance contains content instructing reading, such as "scan", it is converted into a reading instruction. Furthermore, even when the utterance contains no content that explicitly instructs reading, as with "next" or "continue", it can be converted into a reading instruction if the immediately preceding instruction was a reading instruction. For example, the execution determination unit 454 identifies, from the device management table 402b, the apparatus ID of the image reading device associated with the device ID of the voice acquisition device acquired from the smart speaker 2. If an instruction containing the identified apparatus ID is present in the instruction management table 402c while the utterance does not explicitly include a job type, the execution determination unit 454 can determine that the utterance is a reading instruction.

Subsequently, the execution determination unit 454 determines whether the converted reading instruction is for the first reading of the document. Whether the reading is the first one for the transferred reading instruction is determined, for example, by checking the value of the continuous processing flag stored and managed in the instruction management table 402c. Specifically, the execution determination unit 454 determines whether the value of the continuous processing flag is "0" or "1". To do so, it identifies, from the device management table 402b, the apparatus ID of the image reading device associated with the device ID of the voice acquisition device acquired from the smart speaker 2. It then identifies, from the instruction management table 402c, the instruction containing the identified apparatus ID and checks the value of the continuous processing flag included in that instruction. In this process, the execution determination unit 454 checks whether the value of the continuous processing flag is "0". If the value is "0", the execution determination unit 454 determines that this is the first reading of the document and performs processing for reading the document with the attribute information relating to document reading. If the value is "1", the execution determination unit 454 determines that this is the second or a later reading of the document (step S1102). The value of the continuous processing flag is therefore an example of a reading condition for continuing the reading of the document. The flag may be given an initial value of "0" in the initial state in which reading of a given document starts. Thus, when a voice instruction given by the user is converted into a reading instruction, a continuous processing flag indicating that document reading is to be executed continuously may be included as a parameter.

When checking the value of the continuous processing flag shows that this is the first reading of the document (Yes in step S1102), that is, when the flag is confirmed to be "0", the execution determination unit 454 transmits to MFP_#1, via the communication control unit 451, a reading instruction for executing the first reading of the document based on the attribute information relating to document reading described above (step S1103). The execution determination unit 454 then changes the value of the continuous processing flag from "0" to "1".

On the other hand, when checking the value of the continuous processing flag shows that this is not the first reading of the document (No in step S1102), that is, when the flag is confirmed to be "1", the execution determination unit 454 determines whether the converted reading instruction has content that ends the reading of the document (step S1104). In this step, the execution determination unit 454 looks in the reading instruction for a word meaning that document reading should end, such as "end" (終了) or "that's all" (以上). Giving such an utterance corresponds to pressing the "#" symbol arranged or displayed on the operation unit of a well-known image forming apparatus or the like to indicate the final document or final page when executing a copy, print, or scan function. If a reading request for the document is made within a predetermined time after the previous voice data was acquired, the execution determination unit 454 may determine that it is a request to continue reading that document.

If the reading instruction does not contain a word meaning that document reading should end, such as "end" or "that's all" (No in step S1104), the execution determination unit 454 resends the most recently transmitted reading instruction to MFP_#1 and returns to step S1101, repeating this until the user instructs the end of document reading (step S1105). In step S1105, when the user's instruction is a reading instruction, that is, when the voice recognition server device 3 receives voice data such as "next" or "continue", the execution determination unit 454 determines that the required parameters for document reading are satisfied as long as the continuous processing flag is "1", even if the parameters generated from the received voice data do not include them. In other words, in step S1105 the execution determination unit 454 resends to MFP_#1 the most recently transmitted reading instruction together with the reading conditions, such as the various parameters, that were transmitted with it. The timing at which voice data is acquired between step S1105 and the next step S1101 is an example of the second timing, which is later than the first timing described above. The voice data acquired at the second timing is an example of the second voice data. Within the second voice data, voice data such as "next" or "continue", whose meaning allows the immediately preceding reading instruction to be executed again, is an example of content that enables execution of a reading instruction based on the predetermined reading condition. That is, voice data such as "next" or "continue" is also an example of content that carries over the predetermined reading condition.

However, for a continued reading of the document, the execution determination unit 454 need not transmit the various parameters. That is, it may send only the instruction to execute reading. In this case, the MFP 6 executes the reading based on the parameters it acquired earlier.

The execution determination unit 454 may also determine that a reading is a continued reading of the document when it acquires the same intent within a predetermined time after receiving the previous intent. Further, when the utterance does not explicitly include a job type, as with "next" or "continue", the interpretation result conversion unit 453 can generate "JOB_EXECUTE" as the intent of the interpretation result. In this case, the execution determination unit 454 may also determine that the reading is a continued reading of the document when it receives an intent that does not specify a job type, such as "JOB_EXECUTE".

On the other hand, if the reading instruction contains a word meaning that document reading should end, such as "end" or "that's all" (Yes in step S1104), the execution determination unit 454 generates an end request for the document reading, deletes the corresponding reading instruction from the instruction management table 402c, sets the value of the continuous processing flag to "0", and exits this flow (step S1106). In step S1106, the execution determination unit 454 may simply delete the corresponding reading instruction, or may set the continuous processing flag to "0" and then delete the corresponding reading instruction. The execution determination unit 454 may also generate an end request when no instruction has been received from the user for a predetermined time or longer. A reading instruction containing a word meaning that document reading should end, such as "end" or "that's all", is an example of the second reading request.
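Steps S1102 to S1106 of FIG. 10 can be summarized as a small state machine keyed on the continuous processing flag. The following is a minimal sketch under assumptions: `InstructionEntry` is a hypothetical stand-in for one entry of the instruction management table 402c, the English strings in `END_WORDS` stand in for the end words in the text, and `handle_utterance` and its return labels are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

END_WORDS = ("end", "that's all")  # assumed English stand-ins for the end words


@dataclass
class InstructionEntry:
    """Hypothetical entry of the instruction management table for one MFP."""
    continuous_flag: int = 0
    last_instruction: Optional[dict] = None


def handle_utterance(
    entry: InstructionEntry,
    text: str,
    instruction: Optional[dict],
    send: Callable[[dict], None],
) -> str:
    """Run one pass of steps S1102 to S1106 for a single utterance."""
    if entry.continuous_flag == 0:
        # S1102 Yes -> S1103: first reading; send with the spoken conditions.
        if instruction is None:
            raise ValueError("first reading needs a full reading instruction")
        send(instruction)
        entry.last_instruction = instruction
        entry.continuous_flag = 1
        return "first"
    if text.strip().lower() in END_WORDS:
        # S1104 Yes -> S1106: delete the stored instruction and reset the flag.
        entry.last_instruction = None
        entry.continuous_flag = 0
        return "end"
    # S1104 No -> S1105: resend the previous instruction with its conditions.
    send(entry.last_instruction)
    return "continue"
```

With this shape, an utterance such as "next" needs no parameters of its own: the flag being 1 is what allows the previous reading conditions to be reused.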

In the present embodiment, if the speech uttered by the user contains content whose meaning is unrelated to document reading, the execution determination unit 454 of the AI assistant server device 4 may, before executing the flowchart of FIG. 10, cooperate with the interpretation result conversion unit 453 to execute the feedback process for the user (step S108 in FIG. 8b).

In the present embodiment, for readings of the same document, a continuous processing flag value of "0" indicates the first reading and "1" indicates the second or a later reading, but the flag is not limited to this. For example, "1" may indicate the first reading and "0" the second or a later reading. Further, the flag is not limited to the values "0" and "1"; other values, character strings, symbols, and the like may be used for the determination.

Further, when a second or later reading is performed for the document, it is assumed that the next document has been placed on the platen with its page updated by the user (a page turned, the orientation of the document changed, and so on). Under this assumption, when the execution determination unit 454 checks the value of the continuous processing flag and determines that this is not the first reading (No in step S1102), that is, when the flag is confirmed to be "1", the execution determination unit 454 transmits to MFP_#1 via the communication control unit 451, as described for step S1105, the same instruction as the reading instruction used for the first reading, while retaining the attribute information of the immediately preceding reading. The execution determination unit 454 also keeps the value of the continuous processing flag at "1".

As for the value of the continuous processing flag, "0" has been described as indicating the first execution of reading for the same document and "1" as indicating the second or a later execution, but the flag is not limited to this. For example, "1" may indicate the first execution and "0" the second or a later execution. Further, the flag is not limited to the values "0" and "1"; other values, character strings, symbols, and the like may be used for the determination.

<Document reading and continuation process>
Returning to the sequence diagram of FIG. 8b, the reading instruction executed by MFP_#1 is described. MFP_#1 receives, at the network I/F 650, the reading instruction transmitted from the AI assistant server device 4 in step S111; the CPU 601 generates various control signals corresponding to the content of the instruction and sends them to the engine control unit 630. Under the control of the reading execution unit 654, the control signals sent to the engine control unit 630 control the various drive systems with which the scanner unit 631 reads the document. In this way, document reading and continued reading of the document are performed based on the received reading instruction (step S112). When an instruction to end document reading is received in step S112, the communication control unit 651 of MFP_#1 stores the scan data generated by the reading in a storage unit such as the HDD 609 of MFP_#1, or transmits it to the destination included in the reading instruction. If the reading instruction contains no destination, MFP_#1 may accept designation of the destination of the scan data through a user operation on its operation unit.

As described for steps S1102 to S1106 of FIG. 10, the reading execution unit 654 of MFP_#1 performs the reading process according to the content of the reading instruction transmitted from the AI assistant server device 4, handling both the case in which reading ends after a single reading of the document and the case in which the document is read two or more times in succession.

When the predetermined reading process in the reading execution unit 654 is finished, the notification unit 655 transmits an end notification for the end request to the AI assistant server device 4 (step S113). The end request for the reading process may be accepted by the operation unit of MFP_#1 in response to a user operation, or, as described above, the process may be ended by the user uttering, for example, "end" to the smart speaker 2. In response to the utterance "end", the communication control unit 451 of the AI assistant server device 4 acquires an intent instructing the end of the reading process, such as "SCAN_END" or "JOB_END", generated by the operation voice conversion program, converts it into a reading instruction, and transmits it to MFP_#1. On receiving the reading instruction, the reading execution unit 654 of MFP_#1 can combine the plurality of scan data generated by reading the document into a single file consisting of a plurality of pages, and can store and save that file in storage means such as the HDD 609 via the storage and reading processing unit. Further, the reading execution unit 654 can cooperate with the communication control unit 651 to transmit the generated multi-page file to an external device, for example by email.
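The page accumulation on the MFP side can be pictured as a job object that collects one scan per reading and emits a single multi-page file when the end instruction arrives. This is only an illustrative sketch: the class `ScanJob`, its methods, and the dictionary used to represent the resulting file are assumptions, not structures described in the embodiment.

```python
from typing import Dict, List


class ScanJob:
    """Hypothetical accumulator for the pages read during one continued job."""

    def __init__(self) -> None:
        self._pages: List[bytes] = []

    def read_page(self, scan_data: bytes) -> None:
        """Record the scan data produced by one reading of the document."""
        self._pages.append(scan_data)

    def finish(self) -> Dict[str, object]:
        """Bundle all pages into one multi-page file and reset the job."""
        document = {"page_count": len(self._pages), "pages": list(self._pages)}
        self._pages.clear()
        return document
```

The returned structure is what would then be stored locally or sent to the destination named in the reading instruction.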

Subsequently, the AI assistant server device 4, having received the end notification from MFP_#1, transmits from the communication control unit 451 to the smart speaker 2 a determination of whether a continuation instruction exists and an utterance request (step S114).

Further, the acquisition unit 252 and the feedback unit 253 of the smart speaker 2, having received the determination of whether a continuation instruction exists and the utterance request, give spoken feedback to the user of MFP_#1, and the series of processes ends (step S115).

The flowchart shown in FIG. 10 is an example, and the processing executed by the execution determination unit 454 is not limited to it. For example, the content of the flowchart may be changed as appropriate according to the environment in which the information processing system of the present embodiment is placed, the use of the system, and so on.

With the configuration of the first embodiment described above, a user who wants to keep reading a document does not have to repeat the full reading utterance every time the page or the orientation of the document is changed; a simplified utterance is enough to continue the reading.

<Second embodiment>
FIGS. 11a and 11b are sequence diagrams showing an example of reading processing based on a user's utterance in the second embodiment. The difference from the first embodiment is that, after the MFP 6 receives a reading command, the MFP 6 itself determines whether a reading condition exists for continuing the document reading based on the reading command transmitted from the AI assistant server device 4 and, if such a condition exists, reads the document while carrying over that reading condition. Specifically, the following illustrates a case in which MFP_#1 receives a reading command converted from a document reading instruction given by the user's utterance, interprets and judges it on its own, and performs both the initial reading and the continued reading of the document. Each process in the sequence diagrams is described below.

<Document reading and continuation processing>
In the sequence diagrams of FIGS. 11a and 11b, steps S101 to S110 are the same as in FIGS. 8a and 8b, so a detailed description is omitted.

As in the first embodiment, the interpretation result conversion unit 453 of the AI assistant server device 4 converts the information complemented in step S108 into, for example, a reading command to be executed by MFP_#1, and transmits it to MFP_#1 via the communication control unit 451 (step S211). In this case, a reading command corresponding to what the user instructed by utterance or the like, such as "scan this", "scan this to Mr. Tanaka", or "scan this at 1000 dpi", is transmitted from the cloud service device 5 (or the AI assistant server device 4) to MFP_#1 via the communication control unit 451.
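The conversion in step S211 can be pictured with a minimal Python sketch. The source does not specify the data format of a reading command, so the `ReadCommand` class, the `to_read_command` function, the parameter keys, and the 300 dpi default are all hypothetical; only the intent name "JOB_EXECUTE" and the example utterances come from the description.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default applied when the utterance omits the resolution
# (stands in for the complementing done in step S108).
DEFAULT_RESOLUTION_DPI = 300


@dataclass(frozen=True)
class ReadCommand:
    """Hypothetical shape of a reading command sent to MFP_#1."""
    intent: str
    resolution_dpi: int
    destination: Optional[str] = None


def to_read_command(intent: str, params: dict) -> ReadCommand:
    """Convert an interpreted intent and its parameters into a reading command."""
    return ReadCommand(
        intent=intent,
        resolution_dpi=int(params.get("resolution_dpi", DEFAULT_RESOLUTION_DPI)),
        destination=params.get("destination"),
    )


# "Scan this to Mr. Tanaka" and "scan this at 1000 dpi", already interpreted.
to_tanaka = to_read_command("JOB_EXECUTE", {"destination": "Tanaka"})
high_res = to_read_command("JOB_EXECUTE", {"resolution_dpi": 1000})
```

Keeping the command immutable (`frozen=True`) is one way to make it safe to store and re-execute unchanged when the user later says only "next".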

Having received the reading command from the AI assistant server device 4 in step S211, MFP_#1 performs the document reading process, and any continued reading of that document, based on the received reading command (step S212).

In the second embodiment as well, depending on the content of the reading command transmitted from the AI assistant server device 4, the reading execution unit 654 of MFP_#1 handles both the case in which the document is read once and the case in which it is read two or more times in succession.

FIG. 12 is a flowchart showing an example of the execution of a reading command in the second embodiment. It shows an example in which the reading command transmitted in step S111 of FIG. 8b described above, from the interpretation result conversion unit 453 of the AI assistant server device 4 via the communication control unit 451, is executed by MFP_#1 as a document reading process.

First, the command receiving unit 652 of MFP_#1 (MFP 6) receives the reading command transmitted from the AI assistant server device 4 (step S1201). In this embodiment, the command receiving unit 652 functions as an example of reading command receiving means.

Subsequently, the determination unit 653 of MFP_#1 determines whether the reading command transferred from the command receiving unit 652 is for the first reading of the document (step S1202). It does so, for example, by checking the value of the continuous processing flag stored and managed in the command management table 402c held in the MFP 6. If the flag is "0", the determination unit 653 judges that this is the first reading, and the document is read based on the attribute information for the reading. If the flag is "1", the determination unit 653 judges that this is the second or a later reading. The value of the continuous processing flag is therefore an example of a reading condition for continuing the document reading. The flag may be given the initial value "0" in the initial state, before a given document is first read. In this way, the reading command converted from the user's voice instruction may include, as a parameter, a continuous processing flag indicating that document reading is to be executed continuously.

If checking the continuous processing flag shows that this is the first reading (Yes in step S1202), that is, if the flag is "0", the determination unit 653 executes the first document reading process based on the attribute information described above (step S1203). The determination unit 653 then changes the continuous processing flag from "0" to "1".

If, on the other hand, checking the continuous processing flag shows that this is not the first reading (No in step S1202), that is, if the flag is "1", the determination unit 653 determines whether the converted reading command is one that ends reading of the document (step S1204). In this step, the determination unit 653 looks in the reading command for words that mean ending the reading, such as "end" or "that's all". Uttering such a word corresponds to pressing the "#" key arranged or displayed on the operation unit of a well-known image forming apparatus, when executing copy, print, or scan functions, to indicate the final document or final page.

If the reading command does not contain a word that means ending the reading, such as "end" or "that's all" (No in step S1204), the determination unit 653 has MFP_#1 execute the immediately preceding reading command again and returns to step S1201, repeating this until the user instructs the end of reading (step S1205). In step S1205, even when the parameters obtained from voice data such as "next" or "continue" do not include the required parameters, the determination unit 653 judges that the required parameters are satisfied on the condition that the continuous processing flag is "1". The determination unit 653 may also judge the reading to be a continued reading of the document when it acquires the same intent within a predetermined time of receiving the previous intent. It may likewise judge the reading to be a continued reading when it receives an intent indicating document reading, such as "JOB_EXECUTE".

If, on the other hand, the reading command contains a word that means ending the reading, such as "end" or "that's all" (Yes in step S1204), the reading execution unit 654 executes the immediately preceding reading command, and the determination unit 653 generates an end request for the document reading, deletes the corresponding reading command from the command management table 402c, sets the continuous processing flag to "0", and exits the flow (step S1206). In step S1206, the determination unit 653 may simply delete the corresponding reading command, or may set the flag to "0" before deleting it. A reading command containing a word that means ending the reading, such as "end" or "that's all", is an example of a second reading request.
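The flow of FIG. 12 (steps S1201 to S1206) can be summarized in a minimal Python sketch. This is an illustration under stated assumptions rather than the implementation in the MFP: `ScanSession`, `handle_read_command`, `scan`, the dictionary layout of a command, and the exact-match test against `END_WORDS` are all hypothetical, and the in-memory session object stands in for the command management table 402c.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Stand-ins for the end words in the description ("end", "that's all").
END_WORDS = {"end", "that's all"}


@dataclass
class ScanSession:
    """Hypothetical in-memory stand-in for the command management table."""
    continuous_flag: int = 0             # 0: first reading, 1: second or later
    last_command: Optional[dict] = None  # command to re-execute on "next"
    pages: List[str] = field(default_factory=list)


def scan(command: dict) -> str:
    """Placeholder for the reading done by the reading execution unit."""
    return f"page@{command.get('resolution_dpi')}dpi"


def handle_read_command(session: ScanSession, command: dict) -> str:
    """Process one received reading command (S1201) and report the branch taken."""
    if session.continuous_flag == 0:            # S1202: first reading?
        session.pages.append(scan(command))     # S1203: read with given attributes
        session.last_command = command
        session.continuous_flag = 1
        return "first"

    # S1204: does the command ask to end the reading?
    is_end = command.get("utterance", "").strip().lower() in END_WORDS

    # S1205 and S1206 both re-execute the immediately preceding command,
    # carrying over its attributes (resolution and so on).
    session.pages.append(scan(session.last_command))

    if is_end:                                  # S1206: clean up and leave the flow
        session.last_command = None
        session.continuous_flag = 0
        return "finished"
    return "continued"                          # S1205: wait for the next command


session = ScanSession()
first = handle_read_command(
    session, {"utterance": "scan this at 300 dpi", "resolution_dpi": 300}
)
second = handle_read_command(session, {"utterance": "next"})
last = handle_read_command(session, {"utterance": "end"})
```

Note that, following the description of step S1206, the sketch performs one more reading when the end word arrives, then resets the flag so the next command starts a new document.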

In this embodiment as well, no constraint is placed on how the value of the continuous processing flag maps to the reading count for a given document. For example, "1" may indicate the first reading and "0" the second or a later reading. The flag is also not limited to "0" and "1"; other values, character strings, symbols, and the like may be used.

When a document is read a second or later time, it is assumed that the user has turned to the next page and placed the document on the platen. On that assumption, if the determination unit 653 checks the continuous processing flag and judges that this is not the first reading (No in step S1202), that is, if the flag is "1", the reading execution unit 654 executes, as described for step S1205, the same command as the reading command used for the first reading, keeping the attribute information of the immediately preceding reading. The determination unit 653 keeps the continuous processing flag at "1".

As described above, a document that is read continuously is assumed to be placed on the platen with the desired page turned to (opened) by the user. The processing performed when the next reading command is executed by MFP_#1 without the page having been turned is described later (see "Processing for duplicate reading").

When reading a document based on a reading command received from the AI assistant server device 4, MFP_#1 may display on its own operation unit a screen showing the settings for the reading and the destination of the scan data. It may also accept changes to the print conditions for the read document, and may transmit the file of the read document to an external device on condition that the user's permission has been received.

Further, MFP_#1 can also generate the pages read from the document as a single file and transmit the generated file to an external device via the communication control unit 651. In this case, the communication control unit 651 serves as file transmission means. Although the command receiving unit 652 was described above as functioning as reading command receiving means that receives a reading command from the AI assistant server device 4, the communication control unit 651 may also serve as the reading command receiving means.

Returning to the sequence diagram of FIG. 11b, steps S213 to S215 are the same as steps S113 to S115 of FIG. 8b, so a detailed description is omitted.

Here, MFP_#1 may display on its operation unit a screen showing, for example, the settings for the document reading, the name of the file obtained by the reading process, the setting for storage in the device's internal storage, the conditions for file transmission to an external device, and an indication that printing will be executed. At this time, printing of the read document may be executed on condition that the user's permission has been received.

MFP_#1 may also, in response to a reading command transmitted from the cloud service device 5 (or the AI assistant server device 4), power on, bring up its network settings, and start reading the document.

According to this embodiment, MFP_#1 can execute a job immediately without accepting any operation on the operation unit. The user can thus instruct continuous document reading by voice alone.

The flowchart shown in FIG. 12 is only an example, and the processing executed by the determination unit 653 is not limited to it. For example, the contents of the flowchart may be changed as appropriate according to the environment in which the information processing system of this embodiment is installed, the purpose of the system, and so on.

(Conditions under which reading is judged to be continued)
As described above, MFP_#1 in this embodiment may judge a reading to be a continued reading of the document in the following cases. The first is when a parameter contained in the voice data from the user's utterance, such as "next" or "continue", can be interpreted as meaning that document reading should continue.

The second is when voice data for reading the document is acquired within a predetermined time of acquiring the voice data for the previous reading.

The third covers the following two cases, and the reading may be judged to be a continued reading when at least one of them holds. In one, after a user logs in to MFP_#1 and the command receiving unit 652 receives a reading command (or the interpretation result conversion unit 453 converts an utterance into one), the command receiving unit 652 receives the next reading command (or the interpretation result conversion unit 453 produces the next one) while that user is still logged in. In the other, the command receiving unit 652 receives the next reading command within a predetermined time of receiving a reading command.
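The three conditions above can be combined into a single predicate, sketched here in Python. The function name `is_continuation`, its parameters, the keyword set, and the 60-second window are assumptions made for illustration; the source only says "a predetermined time" and gives "next" and "continue" as example words.

```python
from typing import Optional

CONTINUE_WORDS = {"next", "continue"}  # example words from the description
CONTINUE_WINDOW_SEC = 60.0             # hypothetical "predetermined time"


def is_continuation(
    utterance: str,
    now: float,
    last_command_time: Optional[float],
    same_user_logged_in: bool = False,
) -> bool:
    """Return True when a reading should be treated as a continued reading."""
    # Condition 1: the utterance itself means "keep reading".
    if utterance.strip().lower() in CONTINUE_WORDS:
        return True
    # Conditions 2 and 3 both need an earlier reading command to continue from.
    if last_command_time is None:
        return False
    # Condition 3 (first case): the same user is still logged in.
    if same_user_logged_in:
        return True
    # Condition 2 / condition 3 (second case): within the predetermined time.
    return (now - last_command_time) <= CONTINUE_WINDOW_SEC
```

Times are plain seconds here (for example from `time.monotonic()`), so the predicate stays easy to test without a real clock.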

These conditions are only examples, however; no particular restriction is placed on the conditions for judging a reading to be continued, as long as they do not depart from the gist of the invention.

(Processing for duplicate reading)
Each time it reads a document, MFP_#1 may identify the page number of the page just read using existing character recognition technology. MFP_#1 can notify the AI assistant server device 4 of the identified page number. The AI assistant server device 4 stores the notified information in the management DB 401, the linking DB 402, or the like, in association with the device ID, the apparatus ID, the user ID, and so on. This allows the AI assistant server device 4 to notify the user of the reading status via the smart speaker 2.

For example, when it judges that a page with the same page number has been read two or more times, the notification unit 458 of the AI assistant server device 4 can warn the user, by voice or screen display via the smart speaker 2, that a duplicate reading has occurred. Likewise, when it judges that pages with consecutive page numbers have been read but a particular page number is missing, the notification unit 458 can warn the user, by voice or screen display via the smart speaker 2, that the page in question was not read.
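The duplicate and missing-page checks can be sketched in a few lines of Python, assuming the page numbers recognized so far are available as a list of integers. The function name `check_pages` and its return shape are hypothetical; the source does not describe how the comparison is carried out.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def check_pages(page_numbers: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Return (duplicates, missing) for the page numbers read so far.

    duplicates: page numbers that were read two or more times.
    missing: page numbers absent between the lowest and highest page read.
    """
    counts = Counter(page_numbers)
    if not counts:
        return [], []
    duplicates = sorted(page for page, n in counts.items() if n >= 2)
    first, last = min(counts), max(counts)
    missing = [page for page in range(first, last + 1) if page not in counts]
    return duplicates, missing


duplicates, missing = check_pages([1, 2, 2, 4])
```

A non-empty `duplicates` list would trigger the duplicate-reading warning, and a non-empty `missing` list the skipped-page warning.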

When the user asks the smart speaker 2 by utterance about the reading status, the AI assistant server device 4 can report the page numbers already read to the user by voice or screen display via the smart speaker 2. In this case, for example, the interpretation unit 354, which functions through execution of the operation voice conversion program, generates an intent that inquires about the reading status, such as "SCAN_PAGECONFIRM". The execution instruction unit 456, which functions through execution of the management program, then checks the reading status either by querying MFP_#1 for the page numbers already read or on the basis of the page numbers already notified by MFP_#1. The notification unit 458 can then notify the smart speaker 2 of the page numbers already read via the operation voice conversion program.
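As a rough illustration of the status inquiry, the following Python sketch builds the feedback text for a "SCAN_PAGECONFIRM" intent from the page numbers already reported. Only the intent name comes from the description; the function `page_status_message` and the wording of the messages are hypothetical.

```python
from typing import List


def page_status_message(intent: str, completed_pages: List[int]) -> str:
    """Build the feedback text for a reading-status inquiry."""
    if intent != "SCAN_PAGECONFIRM":
        raise ValueError(f"unsupported intent: {intent}")
    if not completed_pages:
        return "No pages have been read yet."
    # Report each page once, in ascending order.
    pages = ", ".join(str(page) for page in sorted(set(completed_pages)))
    return f"Pages read so far: {pages}."


message = page_status_message("SCAN_PAGECONFIRM", [3, 1, 2, 2])
```

The resulting string would then be handed to the smart speaker for voice output or screen display.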

In this embodiment, the voice operation system 1 may be configured to apply machine learning to the voice data from the user's utterances, drawing on, for example, the past reading history and usage history of MFP_#1 and the information in the management tables that make up the linking DB 402 of FIGS. 5 and 6, so that related processing is automatically executed in addition to the reading process on MFP_#1.

With the configuration of the second embodiment described above, a user who wants to keep reading a document no longer has to speak the full reading utterance every time the page is turned or the document is reoriented; a simplified utterance is enough to continue reading.

[Effects of the embodiments]
As is clear from the above description, in the voice operation system 1 according to the embodiments, a program including the operation voice processing program, which serves as a platform application program, is installed on the smart speaker 2, and this platform application program communicates with the cloud service device 5. When the user performs a voice operation toward the microphone unit 29 of the smart speaker 2, the cloud service device 5 analyzes the content of the user's utterance and operates an image reading device such as the MFP 6 so that the processes based on the document reading instruction and the instruction to execute predetermined processing given by the user are performed.

With this configuration, multiple document reading processes can be performed continuously just by giving simplified voice instructions. In other words, when an image forming apparatus is made to execute a series of jobs by voice, the operation needed to execute each job can be simplified.

This makes operation through a GUI (Graphical User Interface) such as the touch panel 27 unnecessary, so even users accustomed to the operation can make inputs more quickly and easily. In addition, operation support through dialogue and the like removes the need for, for example, complicated network settings, advanced processing settings, or installing new applications. As a result, even elderly users or users unfamiliar with operating machines can quickly and easily carry out the operation they want, which improves convenience. Furthermore, when a document has to be held down by hand while it is read, operating the operation unit can become awkward. With the voice operation system 1 according to the embodiments, however, once the document to be read is placed on the platen, a minimal utterance is enough to continue reading, so improved operability can be expected.

According to this embodiment, the analysis of the user's intention based on text data obtained from the user's utterance can also be judged and processed on the cloud service device 5 (or AI assistant server device 4) side.

The image reading device is not limited to an image forming apparatus (MFP), as long as it has a communication function and is capable of repeated processing. That is, the image reading device may be, for example, a PJ (projector), an IWB (Interactive White Board: a whiteboard with an electronic blackboard function capable of mutual communication), an output device such as digital signage, a HUD (Head Up Display) device, an industrial machine, an imaging device, a sound collecting device, a medical device, a networked home appliance, an automobile (connected car), a notebook PC (Personal Computer), a mobile phone, a smartphone, a tablet terminal, a game console, a PDA (Personal Digital Assistant), a digital camera, a wearable PC, or a desktop PC.

Each function of the embodiments described above can be implemented by one or more processing circuits. Here, a "processing circuit" in this specification includes a processor programmed by software to execute each function, such as a processor implemented by electronic circuitry, as well as devices designed to execute the functions described above, such as an ASIC (Application Specific Integrated Circuit), a DSP (Digital Signal Processor), an FPGA (Field Programmable Gate Array), and conventional circuit modules.

The voice acquisition device is likewise not limited to a smart speaker, as long as it has a microphone function, an imaging function, a speaker function, a display function, an operation function, a communication function, and the like. The voice acquisition device may be, for example, a notebook PC (Personal Computer), a mobile phone, a smartphone, a tablet terminal, a game console, a PDA (Personal Digital Assistant), a digital camera, a wearable PC, a desktop PC, or an earphone-type transceiver. An earphone-type transceiver here means, for example, a communication device that, while worn on the user's ear, receives (acquires) the user's own spoken voice, converts it into voice data, transmits the voice data to a predetermined server device, and receives (acquires) feedback results and the like from that server device.

Similarly, the image reading device may be of any type other than the MFP described above, as long as it can communicate with the server devices and the voice acquisition device via a network and can read documents such as bound book documents. For example, the image reading device may be an electronic device such as a standalone scanner.

Finally, the embodiments described above are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. For example, in the description of the first embodiment, the voice recognition server device 3 generates text data corresponding to the user's utterance, and the AI assistant server device 4 interprets the operation the user intends based on the generated text data. However, such voice recognition and interpretation functions may instead be provided on the voice acquisition device side, so that the smart speaker 2 interprets the intended operation from the user's utterance. This makes the voice recognition server device 3 and the AI assistant server device 4 unnecessary and simplifies the system configuration.

These embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

1 Information processing system
2 Smart speaker (an example of a voice acquisition device)
3 Voice recognition server device
4 AI assistant server device (an example of an information processing device)
6 MFP (an example of an image reading device)
252 Acquisition unit (an example of voice data acquisition means)
451 Communication control unit (an example of communication means)
453 Interpretation result conversion unit (an example of reading command conversion means)
455 Complementing unit (an example of complementing means)
651 Communication control unit (an example of file transmission means)
652 Command receiving unit (an example of reading command receiving means)
654 Reading execution unit (an example of reading control means)

Japanese Unexamined Patent Publication No. 2014-203024

Claims

An information processing system comprising:
a voice acquisition device that collects voice and obtains voice data;
an image reading device that reads an image of a document at least once; and
an information processing device that receives first voice data transmitted by the voice acquisition device at a first timing, converts the first voice data into a reading command for reading the document based on a predetermined reading condition, and transmits the reading command to the image reading device based on the predetermined reading condition,
wherein, when second voice data received at a second timing after the first timing has content that enables execution of the reading command based on the first voice data to be continued, the information processing device transmits the reading command to the image reading device again.

The information processing system according to claim 1, wherein the image reading device generates, as one file, the result obtained by executing the reading command transmitted from the information processing device, and stores the file in the image reading device or transmits the file to an external device.

An information processing device connected to a voice acquisition device that collects voice and obtains voice data, the information processing device comprising:
receiving means for receiving first voice data transmitted by the voice acquisition device at a first timing;
conversion means for converting the first voice data into a reading command for reading a document based on a predetermined reading condition; and
transmission means for transmitting the reading command to an image reading device that executes the reading command,
wherein, when second voice data received at a second timing after the first timing has content that enables execution of the reading command based on the first voice data to be continued, the conversion means converts the second voice data into the reading command, and
the transmission means transmits the converted reading command to the image reading device again.

The information processing device according to claim 3, further comprising complementing means for complementing information related to the reading command when converting to the reading command.

The information processing device according to claim 4, wherein the complementing means prompts the voice acquisition device to acquire voice for complementing the information related to the reading command.

An information processing method executed by an information processing system comprising:
a voice acquisition device that collects voice and obtains voice data;
an image reading device that reads a document at least once; and
an information processing device that receives first voice data transmitted by the voice acquisition device at a first timing, converts the first voice data into a reading command for reading the document based on a predetermined reading condition, and transmits the reading command to the image reading device based on the predetermined reading condition,
the method comprising a step, performed by the information processing device, of transmitting the reading command to the image reading device again when second voice data received at a second timing after the first timing has content that enables execution of the reading command based on the first voice data to be continued.

A program that causes a computer to execute the information processing method according to claim 6.
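For illustration only, the flow recited in the claims (convert the first voice data into a reading command, then send the same command again when later voice data indicates continuation) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed in the embodiments: the voice data is assumed to have already been converted to text, and the names `ReadCommand`, `ScanSession`, `convert`, the phrase sets, and the default reading conditions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical phrase sets; the source does not list concrete utterances.
CONTINUE_PHRASES = {"next", "continue"}
FINISH_PHRASES = {"done", "finish"}


@dataclass(frozen=True)
class ReadCommand:
    """A reading command together with its reading conditions (assumed fields)."""
    resolution_dpi: int = 300  # assumed default used when the user says nothing
    color: str = "color"       # assumed default


def convert(utterance: str) -> Optional[ReadCommand]:
    """Convert recognized text into a reading command, or None if it is not a scan request."""
    if "scan" not in utterance:
        return None
    color = "monochrome" if "monochrome" in utterance else "color"
    return ReadCommand(color=color)


@dataclass
class ScanSession:
    """Keeps the command from the first timing so it can be sent again later."""
    send: Callable[[ReadCommand], None]  # transport to the image reading device
    last_command: Optional[ReadCommand] = None

    def on_voice_text(self, text: str) -> str:
        utterance = text.strip().lower()
        if self.last_command is None:
            # First timing: convert the utterance and transmit the command.
            command = convert(utterance)
            if command is None:
                return "ignored"
            self.last_command = command
            self.send(command)
            return "sent"
        if utterance in CONTINUE_PHRASES:
            # Second timing: the content allows the reading to continue,
            # so the same command is transmitted again.
            self.send(self.last_command)
            return "resent"
        if utterance in FINISH_PHRASES:
            self.last_command = None
            return "finished"
        return "ignored"
```

In this sketch, a user who says "scan in monochrome" and then "next" causes the same `ReadCommand` to be sent twice, without repeating the reading conditions. Combining the scanned pages into one file, as in claim 2, would be handled on the image reading device side and is not modeled here.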