JP7447458B2

JP7447458B2 - Control device, control system and control program

Info

Publication number: JP7447458B2
Application number: JP2019225082A
Authority: JP
Inventors: 宏樹田島
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2019-12-13
Filing date: 2019-12-13
Publication date: 2024-03-12
Anticipated expiration: 2039-12-13
Also published as: JP2021096493A

Description

The present invention relates to a control device, a control system, and a control program that control an image processing device in accordance with voice input by a user.

Conventionally, a technique has been proposed that allows an image processing device such as an MFP (Multifunction Peripherals) to be operated without relying on the user's vision (for example, Patent Document 1). This conventional image processing device includes a voice recognition section and is configured to make settings for various setting items based on voice input by the user. It is also configured to output the voice recognition result of the voice recognition section from a voice output section as a repeat-back voice. By listening to the repeat-back voice output from the image processing device, the user can therefore tell whether the voice he or she uttered was correctly recognized by the image processing device.

Patent Document 1: Japanese Patent Application Publication No. 2006-235040

The conventional image processing device described above outputs the repeat-back voice once it has identified a single setting item corresponding to the voice uttered by the user.

However, this type of image processing device has a wide variety of setting items, so a single setting item corresponding to the user's voice cannot always be identified; multiple setting candidates may instead be extracted as settings corresponding to the user's voice. For example, when the user says "duplex," the copy function has both a duplex setting for the original to be read and a duplex setting for the copied output, and the scan function also has a duplex setting for the original to be read, so all of these duplex settings are extracted as candidates.
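
As a non-limiting illustration of the "duplex" example above, the following Python sketch shows how a single spoken keyword can yield several setting candidates. The table layout, the function name `extract_candidates`, and the "staple" entry are assumptions made for illustration only; they are not taken from the embodiment.

```python
# Hypothetical keyword table: one spoken keyword may map to several settings.
# Each setting is written as a (function, setting item) pair.
KEYWORD_TABLE = {
    "duplex": [
        ("copy", "original duplex"),  # duplex setting for the original to be read
        ("copy", "output duplex"),    # duplex setting for the copied output
        ("scan", "original duplex"),  # duplex setting for the original to be read
    ],
    "staple": [("copy", "staple")],   # assumed unambiguous entry, for contrast
}


def extract_candidates(recognized_text):
    """Return every (function, setting item) pair whose keyword occurs in the text."""
    text = recognized_text.lower()
    candidates = []
    for keyword, settings in KEYWORD_TABLE.items():
        if keyword in text:
            candidates.extend(settings)
    return candidates
```

With this table, an utterance containing "duplex" produces three candidates, which is the situation the conventional device does not handle.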

In the conventional image processing device described above, no countermeasure is taken when multiple setting candidates are extracted as settings corresponding to the user's voice. Consequently, when multiple setting candidates exist for the voice the user uttered, the user cannot select the desired setting candidate from among them.

Furthermore, because this type of image processing device is shared by multiple users, it is desirable to keep the time for which any one user occupies it as short as possible. The conventional image processing device, however, does not let the user efficiently select the desired setting candidate from among multiple setting candidates, so there is also the problem that a single user occupies the image processing device for a long time.

The present invention has been made to solve the conventional problems described above. Its object is to provide a control device, a control system, and a control program for an image processing device that allow the user to efficiently select a desired setting candidate even when multiple setting candidates are extracted based on the voice uttered by the user.

To achieve the above object, the invention according to claim 1 is a control device that controls an image processing device, comprising: setting specifying means for specifying a setting to be reflected in the image processing device based on a voice recognition result of voice uttered by a user; guidance information output means for generating guidance information for voice guidance representing the content of the setting specified by the setting specifying means, and causing predetermined voice output means to output voice guidance based on the guidance information; and control means for presenting a plurality of setting candidates when the setting specifying means determines that a plurality of setting candidates corresponding to the voice recognition result exist, wherein, when the setting specifying means determines that the number of setting candidates is less than a predetermined number, the guidance information output means outputs the guidance information for voice guidance of the setting candidates, which number fewer than the predetermined number.
The invention according to claim 2 is a control device that controls an image processing device, comprising: setting specifying means for specifying a setting to be reflected in the image processing device based on a voice recognition result of voice uttered by a user; guidance information output means for generating guidance information for voice guidance representing the content of the setting specified by the setting specifying means, and causing predetermined voice output means to output voice guidance based on the guidance information; and control means for presenting a plurality of setting candidates when the setting specifying means determines that a plurality of setting candidates corresponding to the voice recognition result exist, wherein, when the setting specifying means determines that a predetermined number or more of setting candidates exist, the guidance information output means outputs the guidance information for voice guidance of the predetermined number or more of setting candidates.
The invention according to claim 3 is a control device that controls an image processing device, comprising: setting specifying means for specifying a setting to be reflected in the image processing device based on a voice recognition result of voice uttered by a user; guidance information output means for generating guidance information for voice guidance representing the content of the setting specified by the setting specifying means, and causing predetermined voice output means to output voice guidance based on the guidance information; and control means for presenting a plurality of setting candidates when the setting specifying means determines that a plurality of setting candidates corresponding to the voice recognition result exist, wherein, when the setting specifying means determines that a predetermined number or more of setting candidates exist, the guidance information output means outputs the guidance information for voice guidance that prompts the user to check display means provided on the image processing device.
The invention according to claim 4 is a control device that controls an image processing device, comprising: setting specifying means for specifying a setting to be reflected in the image processing device based on a voice recognition result of voice uttered by a user; guidance information output means for generating guidance information for voice guidance representing the content of the setting specified by the setting specifying means, and causing predetermined voice output means to output voice guidance based on the guidance information; and control means for presenting a plurality of setting candidates when the setting specifying means determines that a plurality of setting candidates corresponding to the voice recognition result exist, wherein, when the setting specifying means determines that a predetermined number or more of setting candidates exist, the guidance information output means does not output the guidance information.
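
As a non-limiting illustration, the four guidance behaviors recited in claims 1 to 4 can be sketched in Python as a single decision on the number of setting candidates. The enum `ManyMode`, the function `build_guidance`, the message wording, and the handling of zero candidates are all assumptions introduced for this sketch; the claims do not prescribe them.

```python
from enum import Enum


class ManyMode(Enum):
    """Assumed switch for the 'predetermined number or more' case."""
    READ_ALL = "read_all"            # claim 2: read out the candidates
    CHECK_DISPLAY = "check_display"  # claim 3: prompt the user to check the display
    SILENT = "silent"                # claim 4: output no guidance information


def build_guidance(candidates, threshold, many_mode):
    """Return the guidance text to be spoken, or None for no voice guidance."""
    if not candidates:
        return None  # assumption: nothing recognized, nothing to announce
    if len(candidates) == 1:
        # A single setting was specified: repeat it back to the user.
        return f"Setting {candidates[0]}."
    listing = ", ".join(candidates)
    if len(candidates) < threshold:
        # Claim 1: fewer candidates than the predetermined number are read out.
        return f"Which one do you mean: {listing}?"
    if many_mode is ManyMode.READ_ALL:
        return f"Which one do you mean: {listing}?"
    if many_mode is ManyMode.CHECK_DISPLAY:
        return "Several settings match. Please check the display."
    return None  # ManyMode.SILENT
```

For example, with a threshold of 3, two candidates are read out by voice, whereas three candidates are read out, redirected to the display, or left unannounced depending on the selected mode.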

The invention according to claim 5 is the control device according to any one of claims 1 to 4, wherein the control means presents the plurality of setting candidates when the setting specifying means determines that a predetermined number or more of setting candidates corresponding to the voice recognition result exist.

The invention according to claim 6 is the control device according to any one of claims 1 to 5, wherein the image processing device has display means, and the control means presents the plurality of setting candidates by causing the display means to display them.

The invention according to claim 7 is the control device according to any one of claims 1 to 6, wherein the control means presents the plurality of setting candidates in accordance with a predetermined priority order.

The invention according to claim 8 is the control device according to claim 7, wherein the priority order is set in descending order of the frequency with which the user makes each setting.

The invention according to claim 9 is the control device according to claim 7, wherein the priority order is determined in advance based on the hierarchy level of the operation screen that contains the setting item corresponding to each setting candidate.
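
As a non-limiting illustration, the two priority orders of claims 8 and 9 can be sketched in Python as two sort keys. The function names, the dictionaries `usage_counts` and `screen_depth`, and the choice to rank shallower screens first are assumptions; claim 9 states only that the order is based on the screen hierarchy.

```python
def order_by_frequency(candidates, usage_counts):
    """Claim 8 style: most frequently set candidates first.

    Candidates missing from usage_counts are treated as never used.
    """
    return sorted(candidates, key=lambda c: usage_counts.get(c, 0), reverse=True)


def order_by_screen_depth(candidates, screen_depth):
    """Claim 9 style (assumed direction): shallower operation screens first.

    Candidates with an unknown depth are placed last.
    """
    return sorted(candidates, key=lambda c: screen_depth.get(c, float("inf")))
```

Because Python's `sorted` is stable, candidates with equal frequency or equal depth keep their original relative order.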

The invention according to claim 10 is the control device according to claim 6, wherein the control means causes the display means to display a thumbnail image corresponding to each of the plurality of setting candidates.

The invention according to claim 11 is the control device according to claim 10, wherein the control means changes the image size of the thumbnail image corresponding to each of the plurality of setting candidates in accordance with a predetermined priority order.

The invention according to claim 12 is the control device according to claim 6, 10, or 11, wherein the control means does not cause the display means to display the plurality of setting candidates when the setting specifying means determines that the number of setting candidates is less than a predetermined number.

The invention according to claim 13 is the control device according to any one of claims 1 to 12, wherein, when the setting specifying means determines that a plurality of setting candidates corresponding to the voice recognition result exist, it excludes from the plurality of setting candidates any setting candidate that satisfies a prohibition condition with respect to the current setting state.
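
As a non-limiting illustration, the exclusion recited in claim 13 can be sketched in Python as a filter over the extracted candidates. Representing a prohibition condition as an (active setting, candidate) pair, as well as the names `exclude_prohibited` and `rules` and the example settings, are assumptions made for this sketch only.

```python
def exclude_prohibited(candidates, current_settings, rules):
    """Drop candidates that conflict with any currently active setting.

    rules is a set of (active_setting, candidate) pairs that must not be
    combined; this pair representation is an assumption of the sketch.
    """
    return [
        c for c in candidates
        if not any((active, c) in rules for active in current_settings)
    ]


# Hypothetical example: duplex output is assumed to be unavailable on postcards.
rules = {("output paper: postcard", "copy output duplex")}
remaining = exclude_prohibited(
    ["copy output duplex", "scan original duplex"],
    {"output paper: postcard"},
    rules,
)
```

Filtering before presentation shortens the list the user has to choose from, and may even reduce it to a single setting.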

The invention according to claim 14 is the control device according to any one of claims 1 to 13, wherein the control device is a server capable of communicating with the image processing device.

The invention according to claim 15 is the control device according to any one of claims 1 to 13, wherein the control device is provided in the image processing device.

The invention according to claim 16 is a control system comprising: an image processing device; an audio input device that receives voice for operating the image processing device by voice; and a control device that controls the image processing device based on the voice input to the audio input device, wherein the control device comprises: setting specifying means for specifying a setting to be reflected in the image processing device based on a voice recognition result of the voice input to the audio input device; guidance information output means for generating guidance information for voice guidance representing the content of the setting specified by the setting specifying means, and causing predetermined voice output means to output voice guidance based on the guidance information; and control means for presenting a plurality of setting candidates when the setting specifying means determines that a plurality of setting candidates corresponding to the voice recognition result exist, and wherein, when the setting specifying means determines that the number of setting candidates is less than a predetermined number, the guidance information output means outputs guidance information for voice guidance of the setting candidates, which number fewer than the predetermined number.
The invention according to claim 17 is a control system comprising: an image processing device; an audio input device that receives voice for operating the image processing device by voice; and a control device that controls the image processing device based on the voice input to the audio input device, wherein the control device comprises: setting specifying means for specifying a setting to be reflected in the image processing device based on a voice recognition result of the voice input to the audio input device; guidance information output means for generating guidance information for voice guidance representing the content of the setting specified by the setting specifying means, and causing predetermined voice output means to output voice guidance based on the guidance information; and control means for presenting a plurality of setting candidates when the setting specifying means determines that a plurality of setting candidates corresponding to the voice recognition result exist, and wherein, when the setting specifying means determines that a predetermined number or more of setting candidates exist, the guidance information output means outputs guidance information for voice guidance of the predetermined number or more of setting candidates.
The invention according to claim 18 is a control system comprising: an image processing device; an audio input device that receives voice for operating the image processing device by voice; and a control device that controls the image processing device based on the voice input to the audio input device, wherein the control device comprises: setting specifying means for specifying a setting to be reflected in the image processing device based on a voice recognition result of the voice input to the audio input device; guidance information output means for generating guidance information for voice guidance representing the content of the setting specified by the setting specifying means, and causing predetermined voice output means to output voice guidance based on the guidance information; and control means for presenting a plurality of setting candidates when the setting specifying means determines that a plurality of setting candidates corresponding to the voice recognition result exist, and wherein, when the setting specifying means determines that a predetermined number or more of setting candidates exist, the guidance information output means outputs guidance information for voice guidance that prompts the user to check display means provided on the image processing device.
The invention according to claim 19 is a control system comprising: an image processing device; an audio input device that receives voice for operating the image processing device by voice; and a control device that controls the image processing device based on the voice input to the audio input device, wherein the control device comprises: setting specifying means for specifying a setting to be reflected in the image processing device based on a voice recognition result of the voice input to the audio input device; guidance information output means for generating guidance information for voice guidance representing the content of the setting specified by the setting specifying means, and causing predetermined voice output means to output voice guidance based on the guidance information; and control means for presenting a plurality of setting candidates when the setting specifying means determines that a plurality of setting candidates corresponding to the voice recognition result exist, and wherein, when the setting specifying means determines that a predetermined number or more of setting candidates exist, the guidance information output means does not output guidance information.

The invention according to claim 20 is a control program that controls an image processing device by being executed by a processor, the control program causing the processor to execute: a setting specifying step of specifying a setting to be reflected in the image processing device based on a voice recognition result of voice uttered by a user; a guidance information output step of generating guidance information for voice guidance representing the content of the setting specified in the setting specifying step, and causing predetermined voice output means to output voice guidance based on the guidance information; and a control step of presenting a plurality of setting candidates when it is determined in the setting specifying step that a plurality of setting candidates corresponding to the voice recognition result exist, wherein, when it is determined in the setting specifying step that the number of setting candidates is less than a predetermined number, the guidance information output step outputs the guidance information for voice guidance of the setting candidates, which number fewer than the predetermined number.
The invention according to claim 21 is a control program that controls an image processing device by being executed by a processor, the control program causing the processor to execute: a setting specifying step of specifying a setting to be reflected in the image processing device based on a voice recognition result of voice uttered by a user; a guidance information output step of generating guidance information for voice guidance representing the content of the setting specified in the setting specifying step, and causing predetermined voice output means to output voice guidance based on the guidance information; and a control step of presenting a plurality of setting candidates when it is determined in the setting specifying step that a plurality of setting candidates corresponding to the voice recognition result exist, wherein, when it is determined in the setting specifying step that a predetermined number or more of setting candidates exist, the guidance information output step outputs the guidance information for voice guidance of the plurality of setting candidates.
The invention according to claim 22 is a control program that controls an image processing device by being executed by a processor, the control program causing the processor to execute: a setting specifying step of specifying a setting to be reflected in the image processing device based on a voice recognition result of voice uttered by a user; a guidance information output step of generating guidance information for voice guidance representing the content of the setting specified in the setting specifying step, and causing predetermined voice output means to output voice guidance based on the guidance information; and a control step of presenting a plurality of setting candidates when it is determined in the setting specifying step that a plurality of setting candidates corresponding to the voice recognition result exist, wherein, when it is determined in the setting specifying step that a predetermined number or more of setting candidates exist, the guidance information output step outputs the guidance information for voice guidance that prompts the user to check display means provided on the image processing device.
The invention according to claim 23 is a control program that controls an image processing device by being executed by a processor, the control program causing the processor to execute: a setting specifying step of specifying a setting to be reflected in the image processing device based on a voice recognition result of voice uttered by a user; a guidance information output step of generating guidance information for voice guidance representing the content of the setting specified in the setting specifying step, and causing predetermined voice output means to output voice guidance based on the guidance information; and a control step of presenting a plurality of setting candidates when it is determined in the setting specifying step that a plurality of setting candidates corresponding to the voice recognition result exist, wherein, when it is determined in the setting specifying step that a predetermined number or more of setting candidates exist, the guidance information output step does not output the guidance information.

The invention according to claim 24 is the control program according to any one of claims 20 to 23, wherein the control step presents the predetermined number or more of setting candidates when it is determined in the setting specifying step that a predetermined number or more of setting candidates corresponding to the voice recognition result exist.

The invention according to claim 25 is the control program according to any one of claims 20 to 24, wherein the image processing device has display means, and the control step presents the plurality of setting candidates by causing the display means to display them.

The invention according to claim 26 is the control program according to any one of claims 20 to 25, wherein the control step presents the plurality of setting candidates in accordance with a predetermined priority order.

The invention according to claim 27 is the control program according to claim 26, wherein the priority order is set in descending order of the frequency with which the user makes each setting.

The invention according to claim 28 is the control program according to claim 26, wherein the priority order is determined in advance based on the hierarchy level of the operation screen that contains the setting item corresponding to each setting candidate.

The invention according to claim 29 is the control program according to claim 25, wherein the control step causes the display means to display a thumbnail image corresponding to each of the plurality of setting candidates.

The invention according to claim 30 is the control program according to claim 29, wherein the control step changes the image size of the thumbnail image corresponding to each of the plurality of setting candidates in accordance with a predetermined priority order.

The invention according to claim 31 is the control program according to claim 25, 29, or 30, wherein the control step does not cause the display means to display the plurality of setting candidates when it is determined in the setting specifying step that the number of setting candidates is less than a predetermined number.

The invention according to claim 32 is the control program according to any one of claims 20 to 31, wherein, when the setting specifying step determines that a plurality of setting candidates corresponding to the voice recognition result exist, it excludes from the plurality of setting candidates any setting candidate that satisfies a prohibition condition with respect to the current setting state.

According to the present invention, when it is determined that a plurality of setting candidates corresponding to the voice recognition result exist, those setting candidates are presented, so the user can efficiently select the desired setting candidate from among the presented candidates.

FIG. 1 is a diagram showing an example of a control system for controlling an image processing device.
FIG. 2 is a diagram showing an outline of the operation of the control system.
FIG. 3 is a block diagram showing an example of the hardware configuration and functional configuration of the image processing device.
FIG. 4 is a diagram showing the hardware configuration and functional configuration of the control device.
FIG. 5 is a diagram showing an example of keyword information.
FIG. 6 is a diagram showing the flow when the control device is able to specify, based on the voice recognition result, the setting to be reflected in the image processing device.
FIG. 7 is a diagram showing the flow when the control device extracts a predetermined number or more of setting candidates based on the voice recognition result.
FIG. 8 is a flowchart showing an example of the processing procedure performed in the control device.
FIG. 9 is a flowchart showing an example of the detailed processing procedure of the setting candidate extraction processing.
FIG. 10 is a flowchart showing an example of the detailed processing procedure of the setting candidate presentation processing.
FIG. 11 is a flowchart showing an example of the processing procedure performed in the image processing device.
FIG. 12 is a diagram showing an example of a selection screen displayed on the display section of the operation panel.
FIG. 13 is a diagram showing an example of a selection screen different from that in FIG. 12.

以下、本発明に関する好ましい実施形態について図面を参照しつつ詳細に説明する。尚、以下に説明する実施形態において互いに共通する要素には同一符号を付しており、それらについての重複する説明は省略する。 Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the drawings. In the embodiments described below, common elements are given the same reference numerals, and redundant explanations thereof will be omitted.

図１は、本実施形態において画像処理装置２を制御するための制御システム１の一例を示す図である。この制御システム１は、ユーザーのローカル環境に設置される画像処理装置２と、ローカル環境に設置されて音声の入出力を行う音声入出力装置３と、ＬＡＮ（Local Area Network）などのローカルネットワーク４と、インターネット上のクラウド５に設置される音声認識装置７と、クラウド５に設置されて画像処理装置２を制御する制御装置８と、クラウド５上で音声認識装置７と制御装置８とを通信可能に接続するネットワーク９とを備える。 FIG. 1 is a diagram showing an example of a control system 1 for controlling an image processing device 2 in this embodiment. The control system 1 includes the image processing device 2 installed in a user's local environment, a voice input/output device 3 installed in the local environment to input and output voice, a local network 4 such as a LAN (Local Area Network), a voice recognition device 7 installed in a cloud 5 on the Internet, a control device 8 installed in the cloud 5 to control the image processing device 2, and a network 9 that communicably connects the voice recognition device 7 and the control device 8 on the cloud 5.

画像処理装置２は、例えばＭＦＰ（Multifunction Peripherals）などと呼ばれる装置であって、コピー機能、スキャン機能、プリント機能、ＦＡＸ機能などの複数の機能を有する装置である。画像処理装置２は、装置本体の上部にスキャナ部１７を備え、スキャナ部１７の下部にプリンタ部１８を備えている。また、画像処理装置２は、内部にＦＡＸ部を備えている。画像処理装置２は、それらスキャナ部１７、プリンタ部及びＦＡＸ部を適宜動作させることにより、ユーザーによって指定されたジョブを実行する。また、画像処理装置２は、装置本体の正面側にユーザーが操作可能な操作パネル１１を備えている。この操作パネル１１は、ユーザーが手動操作を行うことにより、各種ジョブの設定などを行うことができるユーザーインタフェースである。 The image processing device 2 is a device called, for example, an MFP (Multifunction Peripherals), and has multiple functions such as a copy function, a scan function, a print function, and a FAX function. The image processing device 2 includes a scanner section 17 at the top of the device body, and a printer section 18 at the bottom of the scanner section 17. The image processing device 2 also includes a FAX section inside. The image processing device 2 executes the job specified by the user by appropriately operating the scanner section 17, printer section, and FAX section. The image processing device 2 also includes an operation panel 11 that can be operated by a user on the front side of the device main body. The operation panel 11 is a user interface that allows the user to perform settings for various jobs through manual operations.

音声入出力装置３は、例えばスマートスピーカーやＡＩスピーカーなどと呼ばれる装置であって、ユーザーと対話を行うことができる装置である。この音声入出力装置３は、ユーザーが画像処理装置２を音声で操作できるようにするために、画像処理装置２と同じ環境に設置される。尚、図１では、音声入出力装置３が画像処理装置２と別体として構成される場合を例示しているが、これに限られるものではない。例えば、音声入出力装置３は、画像処理装置２に内蔵されたものであっても構わない。 The voice input/output device 3 is a device called, for example, a smart speaker or an AI speaker, and can interact with a user. The voice input/output device 3 is installed in the same environment as the image processing device 2 so that the user can operate the image processing device 2 by voice. Although FIG. 1 illustrates a case where the voice input/output device 3 is configured separately from the image processing device 2, the configuration is not limited to this. For example, the voice input/output device 3 may be built into the image processing device 2.

ローカルネットワーク４は、画像処理装置２と音声入出力装置３とを通信可能に接続する。またローカルネットワーク４は、インターネットにも接続されている。そのため、画像処理装置２及び音声入出力装置３は、クラウド５に設置されている音声認識装置７や制御装置８と通信を行うことが可能である。 The local network 4 communicably connects the image processing device 2 and the voice input/output device 3. The local network 4 is also connected to the Internet. Therefore, the image processing device 2 and the voice input/output device 3 can communicate with the voice recognition device 7 and the control device 8 installed in the cloud 5.

音声認識装置７は、音声を解析して音声をテキスト化する装置である。例えば、音声認識装置７には、人工知能（ＡＩ）による音声認識機能が搭載されており、入力する音声情報を高速かつ高精度にテキスト化することができる。本実施形態の音声認識装置７は、図１に示すようにクラウド５上にサーバーとして設置される。そのため、音声認識装置７による音声認識機能を、複数のローカル環境から利用することができる。 The voice recognition device 7 is a device that analyzes voice and converts it into text. For example, the voice recognition device 7 is equipped with a voice recognition function based on artificial intelligence (AI), and can convert input voice information into text at high speed and with high accuracy. The voice recognition device 7 of this embodiment is installed as a server on the cloud 5, as shown in FIG. 1. Therefore, the voice recognition function of the voice recognition device 7 can be used from a plurality of local environments.

制御装置８は、ローカル環境に設置される画像処理装置２を遠隔制御するための装置である。この制御装置８もクラウド５上にサーバーとして設置されている。そのため、制御装置８は、複数のローカル環境に設置されている複数の画像処理装置２を個別に制御することができる。 The control device 8 is a device for remotely controlling the image processing device 2 installed in the local environment. This control device 8 is also installed on the cloud 5 as a server. Therefore, the control device 8 can individually control a plurality of image processing devices 2 installed in a plurality of local environments.

上記のような制御システム１は、ローカル環境においてユーザーが画像処理装置２を操作するためのキーワードを音声で発した場合に、ユーザーの音声に応じて画像処理装置２が動作するように構成される。 The control system 1 described above is configured so that, when a user in the local environment speaks a keyword for operating the image processing device 2, the image processing device 2 operates according to the user's voice.

図２は、制御システム１における概略動作を示す図である。音声入出力装置３は、画像処理装置２を音声操作するための装置であり、予め画像処理装置２の装置モデル（装置タイプ）やアドレスに関する装置情報を保持している。この音声入出力装置３は、画像処理装置２と同じローカル環境に設置されることにより、画像処理装置２の近傍位置でユーザーが発する音声を入力することができる。そしてユーザーが画像処理装置２を操作するための音声を発すると、音声入出力装置３は、その音声を入力し、音声情報Ｄ１を生成する。この音声情報Ｄ１は、音声入出力装置３からクラウド５上の音声認識装置７へと送信される。このとき、音声入出力装置３は、画像処理装置２の装置情報を付加した音声情報Ｄ１を音声認識装置７へ送信する。 FIG. 2 is a diagram schematically showing the operation of the control system 1. The voice input/output device 3 is a device for operating the image processing device 2 by voice, and holds in advance device information on the device model (device type) and address of the image processing device 2. Because the voice input/output device 3 is installed in the same local environment as the image processing device 2, it can capture voice uttered by a user near the image processing device 2. When the user utters a voice for operating the image processing device 2, the voice input/output device 3 captures that voice and generates voice information D1. The voice information D1 is transmitted from the voice input/output device 3 to the voice recognition device 7 on the cloud 5. At this time, the voice input/output device 3 adds the device information of the image processing device 2 to the voice information D1 before transmitting it to the voice recognition device 7.

音声認識装置７は、音声入出力装置３から音声情報Ｄ１を受信すると、音声情報Ｄ１を解析することにより、ユーザーが発した音声をテキストＤ２に変換する。すなわち、音声認識装置７は、音声認識結果として、ユーザーの音声に対応するテキストＤ２を生成する。この音声認識装置７は、音声入出力装置３から音声情報Ｄ１を受信してテキストＤ２を生成すると、そのテキストＤ２をクラウド５内で制御装置８へ送信する。このとき、音声認識装置７は、音声情報Ｄ１に付加されていた装置情報を、テキストＤ２に付加した状態で制御装置８へ送信する。 When the voice recognition device 7 receives the voice information D1 from the voice input/output device 3, it analyzes the voice information D1 and converts the voice uttered by the user into text D2. That is, the voice recognition device 7 generates the text D2 corresponding to the user's voice as the voice recognition result. After generating the text D2, the voice recognition device 7 transmits it to the control device 8 within the cloud 5. At this time, the voice recognition device 7 attaches the device information that was added to the voice information D1 to the text D2 before transmitting it to the control device 8.

制御装置８は、音声認識装置７の音声認識結果に基づいて画像処理装置２を制御するコマンドＤ４を生成する。すなわち、制御装置８は、音声認識装置７から受信するテキストＤ２に基づいてコマンドＤ４を生成するのである。このとき、制御装置８は、テキストＤ２に付加されている装置情報に基づき画像処理装置２の装置モデルを特定し、その装置モデルに対応したコマンドＤ４を生成する。また、制御装置８は、テキストＤ２に付加されている装置情報に基づき、コマンドＤ４の送信先となる画像処理装置２のアドレスを特定する。そして制御装置８は、コマンドＤ４を、画像処理装置２へ送信することにより、画像処理装置２を制御する。 The control device 8 generates a command D4 to control the image processing device 2 based on the voice recognition result of the voice recognition device 7. That is, the control device 8 generates the command D4 based on the text D2 received from the speech recognition device 7. At this time, the control device 8 identifies the device model of the image processing device 2 based on the device information added to the text D2, and generates a command D4 corresponding to the device model. Furthermore, the control device 8 identifies the address of the image processing device 2 to which the command D4 is to be sent, based on the device information added to the text D2. The control device 8 then controls the image processing device 2 by transmitting the command D4 to the image processing device 2.
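
The way the device information travels with the voice information D1, the text D2, and the command D4 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiment: the class names, the field names, and the `transcribe` callback that stands in for the actual recognition engine are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DeviceInfo:
    model: str    # device model (device type) of the image processing device 2
    address: str  # address of the image processing device 2


@dataclass(frozen=True)
class VoiceInfo:  # corresponds to voice information D1
    audio: bytes
    device: DeviceInfo


@dataclass(frozen=True)
class RecognizedText:  # corresponds to text D2
    text: str
    device: DeviceInfo


@dataclass(frozen=True)
class Command:  # corresponds to command D4
    destination: str  # where the command is sent
    model: str        # model the command format is generated for
    body: str


def recognize(d1: VoiceInfo, transcribe: Callable[[bytes], str]) -> RecognizedText:
    """Voice recognition device 7: convert D1 to D2 and carry the device information over."""
    return RecognizedText(text=transcribe(d1.audio), device=d1.device)


def build_command(d2: RecognizedText) -> Command:
    """Control device 8: choose the destination and model from the device information on D2."""
    return Command(destination=d2.device.address, model=d2.device.model, body=d2.text)
```

In this sketch the control device never needs to look up the target device separately, because the model and address arrive attached to the recognized text.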

例えば、ユーザーが画像処理装置２に対するジョブの設定を音声で指示した場合、制御装置８は、音声認識装置７から受信するテキストＤ２に基づいて画像処理装置２に反映すべき設定を特定し、その設定を画像処理装置２に反映させるためのコマンドＤ４を生成する。そして制御装置８は、そのコマンドＤ４を画像処理装置２へ送信することにより、ジョブの設定を反映させる。 For example, when a user instructs job settings for the image processing device 2 by voice, the control device 8 specifies the settings to be reflected in the image processing device 2 based on the text D2 received from the voice recognition device 7, and generates a command D4 for reflecting those settings in the image processing device 2. The control device 8 then transmits the command D4 to the image processing device 2 so that the job settings are reflected.

また、ユーザーが画像処理装置２に対してジョブの実行を音声で指示した場合、制御装置８は、音声認識装置７から受信するテキストＤ２に基づいて画像処理装置２にジョブを実行させるためのコマンドＤ４を生成する。そして制御装置８は、そのコマンドＤ４を画像処理装置２へ送信することにより、画像処理装置２にジョブを実行させる。 Further, when the user instructs the image processing device 2 by voice to execute a job, the control device 8 generates, based on the text D2 received from the voice recognition device 7, a command D4 for causing the image processing device 2 to execute the job. The control device 8 then transmits the command D4 to the image processing device 2, thereby causing the image processing device 2 to execute the job.

このように制御装置８は、音声認識装置７による音声認識結果に基づいて画像処理装置２に行わせる処理を特定し、その特定した処理に対応するコマンドＤ４を画像処理装置２へ送信することにより、画像処理装置２を制御する。 In this way, the control device 8 specifies the process to be performed by the image processing device 2 based on the voice recognition result of the voice recognition device 7, and controls the image processing device 2 by transmitting the command D4 corresponding to the specified process to the image processing device 2.

また、制御装置８は、音声認識結果に基づいて特定した処理に基づく音声案内を行うための案内情報Ｄ３を生成する。この案内情報Ｄ３は、テキストデータとして生成される。そして制御装置８は、テキストデータとして生成した案内情報Ｄ３を音声認識装置７へ送信する。音声認識装置７は、制御装置８から音声案内のための案内情報Ｄ３を受信すると、案内情報Ｄ３をテキストデータから音声情報に変換する。そして音声認識装置７は、音声情報に変換した案内情報Ｄ３を、音声情報Ｄ１の送信元である音声入出力装置３へ送信する。これにより、音声入出力装置３では、案内情報Ｄ３に基づく音声出力が行われる。例えば、ユーザーがジョブの設定操作を音声で行った場合、制御装置８は、その設定の内容をテキストデータで表した案内情報Ｄ３を生成して音声認識装置７へ送信する。音声認識装置７は、テキストデータを音声情報に変換することにより、ユーザーによる設定の内容を音声出力するための案内情報Ｄ３を生成する。そして音声認識装置７は、音声情報に変換した案内情報Ｄ３を音声入出力装置３へ送信する。音声入出力装置３は、音声認識装置７から案内情報Ｄ３を受信すると、その案内情報Ｄ３に基づく音声出力を行う。したがって、ユーザーは、自身で発した音声がどのように認識されたかを音声入出力装置３からの出力音声で把握することができる。 Further, the control device 8 generates guidance information D3 for providing voice guidance on the process specified from the voice recognition result. The guidance information D3 is generated as text data. The control device 8 then transmits the guidance information D3 generated as text data to the voice recognition device 7. When the voice recognition device 7 receives the guidance information D3 for voice guidance from the control device 8, it converts the guidance information D3 from text data into voice information. The voice recognition device 7 then transmits the guidance information D3 converted into voice information to the voice input/output device 3 that was the source of the voice information D1. As a result, the voice input/output device 3 performs voice output based on the guidance information D3. For example, when a user performs a job setting operation by voice, the control device 8 generates guidance information D3 expressing the contents of the setting as text data and transmits it to the voice recognition device 7. The voice recognition device 7 converts the text data into voice information, thereby generating guidance information D3 for outputting the contents of the user's setting by voice. The voice recognition device 7 then transmits the guidance information D3 converted into voice information to the voice input/output device 3. When the voice input/output device 3 receives the guidance information D3 from the voice recognition device 7, it performs voice output based on that guidance information D3. Therefore, the user can understand from the voice output by the voice input/output device 3 how his or her utterance was recognized.

このような制御システム１によれば、ユーザーは、画像処理装置２の操作パネル１１を手動で操作しなくても、音声操作を行うことが可能である。そのため、例えばユーザーが荷物を抱えていて両手を使うことができない場合や、画像処理装置２から数メートル程度離れた位置にいる場合であっても、画像処理装置２を音声で操作することができるので、利便性が高い。 According to such a control system 1, the user can perform voice operations without manually operating the operation panel 11 of the image processing device 2. Therefore, even if the user is carrying luggage and cannot use both hands, or is located several meters away from the image processing device 2, the image processing device 2 can be operated by voice. Therefore, it is highly convenient.

図３は、画像処理装置２のハードウェア構成及び機能構成の一例を示すブロック図である。画像処理装置２は、そのハードウェア構成として、制御部１０と、操作パネル１１と、通信インタフェース１４と、記憶部１５と、スキャナ部１７と、プリンタ部１８と、ＦＡＸ部１９とを備えている。制御部１０は、図示を省略するＣＰＵとメモリとを備えている。制御部１０は、そのＣＰＵにおいて所定のプログラムが実行されることにより、各部の動作を制御する。 FIG. 3 is a block diagram showing an example of the hardware configuration and functional configuration of the image processing device 2. The image processing device 2 includes, as its hardware configuration, a control unit 10, an operation panel 11, a communication interface 14, a storage unit 15, a scanner unit 17, a printer unit 18, and a FAX unit 19. The control unit 10 includes a CPU and a memory (not shown). The control unit 10 controls the operation of each unit by having its CPU execute a predetermined program.

操作パネル１１は、表示部１２と、操作部１３とを備えている。表示部１２は、例えばカラー液晶ディスプレイで構成され、ユーザーが操作可能な各種の操作画面を表示する。操作部１３は、例えばタッチパネルキーや押しボタンキーなどによって構成され、ユーザーによる手動操作を受け付ける。 The operation panel 11 includes a display section 12 and an operation section 13. The display unit 12 is configured with, for example, a color liquid crystal display, and displays various operation screens that can be operated by the user. The operation unit 13 includes, for example, touch panel keys, push button keys, etc., and accepts manual operations by the user.

通信インタフェース１４は、画像処理装置２をローカルネットワーク４に接続するためのものである。画像処理装置２は、この通信インタフェース１４を介してローカルネットワーク４に接続されている様々な機器と通信を行うことができる。また、画像処理装置２は、この通信インタフェース１４を介して、制御装置８から送信されるコマンドＤ４を受信する。 The communication interface 14 is for connecting the image processing device 2 to the local network 4. The image processing device 2 can communicate with various devices connected to the local network 4 via this communication interface 14. The image processing device 2 also receives a command D4 sent from the control device 8 via this communication interface 14.

記憶部１５は、ハードディスクドライブ（ＨＤＤ）やソリッドステートドライブ（ＳＳＤ）などによって構成される不揮発性の記憶デバイスである。この記憶部１５には、操作パネル１１の表示部１２に表示するための画面情報１６が記憶される。この画面情報１６は、表示部１２に表示するための複数の操作画面に関する情報や、各操作画面に含まれる設定項目等に関する情報、複数の操作画面を階層構造として定義した情報などが含まれる。尚、記憶部１５には、この他にも制御部１０のＣＰＵによって実行されるプログラムや各種データなどが記憶される。 The storage unit 15 is a nonvolatile storage device configured with a hard disk drive (HDD), solid state drive (SSD), or the like. This storage section 15 stores screen information 16 to be displayed on the display section 12 of the operation panel 11. This screen information 16 includes information regarding a plurality of operation screens to be displayed on the display unit 12, information regarding setting items included in each operation screen, information defining a plurality of operation screens as a hierarchical structure, and the like. Note that the storage unit 15 also stores programs executed by the CPU of the control unit 10, various data, and the like.

スキャナ部１７は、ユーザーによってセットされる原稿を光学的に読み取って画像データを生成する。スキャナ部１７は、制御部１０によって設定されるジョブ設定（原稿の読み取り設定など）に基づいて原稿の読み取り動作を行う。例えば、ジョブ設定において原稿の両面読み取りが指定されている場合、スキャナ部１７は、原稿の表裏両面に対する読み取り動作を行う。 The scanner unit 17 optically reads a document set by a user and generates image data. The scanner unit 17 performs a document reading operation based on job settings (document reading settings, etc.) set by the control unit 10. For example, if double-sided reading of a document is specified in the job settings, the scanner unit 17 reads both the front and back sides of the document.

プリンタ部１８は、入力する画像データに基づいて印刷用紙などのシート材に画像形成を行うことにより印刷出力を行う。プリンタ部１８は、制御部１０によって設定されるジョブ設定（原稿の読み取り設定など）に基づく印刷出力を行う。例えば、ジョブ設定において両面印刷が指定されている場合、プリンタ部１８は、シート材の表裏両面に対して画像形成を行う。 The printer unit 18 performs printout by forming an image on a sheet material such as printing paper based on input image data. The printer unit 18 performs print output based on job settings (document reading settings, etc.) set by the control unit 10. For example, if double-sided printing is specified in the job settings, the printer unit 18 forms images on both the front and back sides of the sheet material.

ＦＡＸ部は、図示を省略する公衆電話網を介してＦＡＸデータの送受信を行うものである。 The FAX unit transmits and receives FAX data via a public telephone network (not shown).

制御部１０は、パネル制御部２０とジョブ制御部２１とを備えている。操作パネル１１に関する制御を行うとき、制御部１０は、パネル制御部２０を機能させる。また、ジョブの設定又は実行に関する制御を行うとき、制御部１０は、ジョブ制御部２１を機能させる。 The control section 10 includes a panel control section 20 and a job control section 21. When controlling the operation panel 11, the control section 10 causes the panel control section 20 to function. Furthermore, when controlling job settings or execution, the control unit 10 causes the job control unit 21 to function.

パネル制御部２０は、操作パネル１１の表示部１２に表示する操作画面を制御したり、操作部１３に対して行われるユーザーの手動操作を受け付けたりする。例えば、パネル制御部２０は、記憶部１５に記憶されている画面情報１６に基づき、表示部１２に表示している操作画面をユーザーの操作に基づいて更新したり、遷移させたりする。尚、パネル制御部２０は、ユーザーが操作パネル１１を手動操作していない状態が所定時間以上継続すると、表示部１２に対する給電を停止し、操作パネル１１を省電力モードへと移行させる。 The panel control unit 20 controls the operation screen displayed on the display unit 12 of the operation panel 11 and receives manual operations performed by the user on the operation unit 13. For example, the panel control unit 20 updates or transitions the operation screen displayed on the display unit 12 based on the user's operation based on the screen information 16 stored in the storage unit 15. Note that if the user does not manually operate the operation panel 11 for a predetermined period of time or more, the panel control section 20 stops power supply to the display section 12 and shifts the operation panel 11 to the power saving mode.

ジョブ制御部２１は、ジョブの設定及び実行を統括的に制御する。ジョブ制御部２１は、ジョブ設定部２２を備えている。ジョブ設定部２２は、ジョブの設定を行う処理部である。すなわち、ジョブ設定部２２は、コピー機能などの複数の機能のうちのユーザーが使用しようとする機能を特定し、その特定した機能に対する各種設定項目の設定値をデフォルト値からユーザーによって指定された値に変更し、ジョブの設定を行う。 The job control unit 21 comprehensively controls job settings and execution. The job control section 21 includes a job setting section 22. The job setting section 22 is a processing section that performs job settings. That is, the job setting unit 22 identifies the function that the user intends to use among multiple functions such as the copy function, and changes the setting values of various setting items for the specified function from default values to values specified by the user. and configure the job settings.
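
The behavior of the job setting unit 22 described above, starting from default values for the selected function and changing only the items the user specified, can be illustrated with a short sketch. The setting items and default values below are hypothetical examples, not values taken from the embodiment.

```python
# Hypothetical default values per function; the embodiment does not list them.
DEFAULTS = {
    "copy": {"sides": "single-sided", "color": "auto", "copies": 1},
    "scan": {"sides": "single-sided", "format": "PDF"},
}


def configure_job(function: str, user_values: dict, defaults: dict = DEFAULTS) -> dict:
    """Return the job settings for one function: defaults overridden by user-specified values."""
    if function not in defaults:
        raise KeyError(f"unknown function: {function!r}")
    unknown = set(user_values) - set(defaults[function])
    if unknown:
        raise ValueError(f"unknown setting items for {function!r}: {sorted(unknown)}")
    # Build a new dict so the default table itself is never modified.
    return {**defaults[function], **user_values}
```

For instance, `configure_job("copy", {"sides": "double-sided"})` keeps the default color and copy count and changes only the sides setting.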

例えば、ユーザーが操作パネル１１に対する手動操作を行ってジョブの設定を行う場合、操作パネル１１の表示部１２には、パネル制御部２０によって各種の操作画面が表示される。ユーザーはその操作画面に対する操作を順次行っていくことで所望のジョブ設定を行う。このようにユーザーが操作パネル１１に対する手動操作を行っているとき、ジョブ設定部２２は、パネル制御部２０から出力される操作情報に基づいてジョブの設定を行う。 For example, when a user manually operates the operation panel 11 to set a job, the panel control section 20 displays various operation screens on the display section 12 of the operation panel 11. The user performs desired job settings by sequentially performing operations on the operation screen. When the user performs manual operations on the operation panel 11 in this way, the job setting section 22 performs job settings based on the operation information output from the panel control section 20.

また、ユーザーが画像処理装置２を操作するための音声を発した場合、制御部１０は、通信インタフェース１４を介して制御装置８から送信されるコマンドＤ４を受信する。制御部１０は、そのコマンドＤ４を解析して制御装置８からの指示を特定する。その指示がジョブの設定を反映させるための設定反映指示である場合、ジョブ設定部２２は、その設定反映指示に基づいてジョブの設定を行う。 Further, when the user makes a sound to operate the image processing device 2, the control unit 10 receives a command D4 transmitted from the control device 8 via the communication interface 14. The control unit 10 analyzes the command D4 and identifies the instruction from the control device 8. If the instruction is a setting reflection instruction for reflecting job settings, the job setting unit 22 performs job settings based on the setting reflection instruction.

またジョブ制御部２１は、ユーザーによってジョブの実行が指示された場合、スキャナ部１７、プリンタ部１８及びＦＡＸ部１９のそれぞれを駆動し、ユーザーによって指定されたジョブを実行する。 Further, when the job execution is instructed by the user, the job control unit 21 drives each of the scanner unit 17, printer unit 18, and FAX unit 19 to execute the job specified by the user.
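
How the image processing device 2 might branch on a received command D4, either reflecting settings or executing a job, can be sketched as follows. The dictionary-based command format and the `run_job` callback are assumptions made for illustration; the embodiment does not specify how the command D4 is encoded.

```python
from typing import Callable


def handle_command(command: dict, job_settings: dict, run_job: Callable[[dict], None]) -> None:
    """Apply one command D4 to the current job settings, or start the job."""
    kind = command["type"]
    if kind == "reflect_settings":
        # Setting reflection instruction: update the job settings in place.
        job_settings.update(command["settings"])
    elif kind == "execute_job":
        # Job execution instruction: run the job with a snapshot of the current settings.
        run_job(dict(job_settings))
    else:
        raise ValueError(f"unsupported instruction: {kind!r}")
```

Passing a copy of the settings to `run_job` keeps a running job unaffected by later setting changes, which is a design choice of this sketch rather than something stated in the text.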

次に図４は、制御装置８のハードウェア構成及び機能構成を示す図である。制御装置８は、図４（ａ）に示すように、そのハードウェア構成として、制御部４０と、記憶部４１と、通信インタフェース４２とを備えている。 Next, FIG. 4 is a diagram showing the hardware configuration and functional configuration of the control device 8. As shown in FIG. 4(a), the control device 8 includes, as its hardware configuration, a control unit 40, a storage unit 41, and a communication interface 42.

制御部４０は、ＣＰＵ４３とメモリ４４とを備えている。ＣＰＵ４３は、記憶部４１に記憶されているプログラム２５を読み出して実行する演算処理ユニット（プロセッサー）である。メモリ４４は、ＣＰＵ４３がプログラム２５を実行することに伴って発生する一時的なデータなどを記憶する。 The control unit 40 includes a CPU 43 and a memory 44. The CPU 43 is an arithmetic processing unit (processor) that reads and executes the program 25 stored in the storage section 41. The memory 44 stores temporary data generated as the CPU 43 executes the program 25.

記憶部４１は、ハードディスクドライブ（ＨＤＤ）やソリッドステートドライブ（ＳＳＤ）などで構成される不揮発性の記憶デバイスである。この記憶部４１には、画像処理装置２を制御するためのプログラム２５と、キーワード情報２６と、優先順位情報２７とが予め記憶される。 The storage unit 41 is a non-volatile storage device such as a hard disk drive (HDD) or solid state drive (SSD). The storage unit 41 stores in advance a program 25 for controlling the image processing device 2, keyword information 26, and priority information 27.

通信インタフェース４２は、制御装置８が他の機器と通信を行うためのものである。例えば、制御部４０は、この通信インタフェース４２を介して音声認識装置７から出力されるテキストＤ２を受信することができる。また、制御部４０は、通信インタフェース４２を介して案内情報Ｄ３を音声認識装置７へ送信したり、また、コマンドＤ４をローカル環境に設置されている画像処理装置２へ送信したりすることができる。 The communication interface 42 allows the control device 8 to communicate with other devices. For example, the control unit 40 can receive the text D2 output from the voice recognition device 7 via the communication interface 42. The control unit 40 can also transmit the guidance information D3 to the voice recognition device 7 and transmit the command D4 to the image processing device 2 installed in the local environment via the communication interface 42.

制御部４０のＣＰＵ４３は、記憶部４１のプログラム２５を読み出して実行することにより、図４（ｂ）に示すように、制御部４０を、設定特定部３１、案内情報出力部３２及び装置制御部３３として機能させる。 By reading and executing the program 25 in the storage unit 41, the CPU 43 of the control unit 40 causes the control unit 40 to function as a setting specifying unit 31, a guidance information output unit 32, and a device control unit 33, as shown in FIG. 4(b).

設定特定部３１は、音声認識装置７から出力されるテキストＤ２に基づいて画像処理装置２に反映すべき設定を特定する処理部である。設定特定部３１は、設定候補抽出部３４を備えている。設定候補抽出部３４は、テキストＤ２に対応する設定候補を抽出する。設定候補抽出部３４は、テキストＤ２を受信することに伴って機能し、記憶部４１に記憶されているキーワード情報２６に基づいて設定候補を抽出する。 The setting specifying unit 31 is a processing unit that specifies settings to be reflected in the image processing device 2 based on the text D2 output from the speech recognition device 7. The setting specifying section 31 includes a setting candidate extracting section 34. The setting candidate extraction unit 34 extracts setting candidates corresponding to the text D2. The setting candidate extraction unit 34 functions upon receiving the text D2, and extracts setting candidates based on the keyword information 26 stored in the storage unit 41.

図５は、キーワード情報２６の一例を示す図である。キーワード情報２６は、図５に示すように、画像処理装置２に対して設定可能な設定項目及び設定値に対してキーワードが対応付けられた情報であり、画像処理装置２の装置モデルごとに定義される情報である。キーワードは、画像処理装置２に対して各種の設定を行うことが可能なワードであり、例えば、各設定項目の設定値に対応している。ただし、図５に示すように、キーワード情報２６には、同じキーワードが異なる設定項目に登録されていることもある。例えば、図５に示すキーワード情報２６では、「リョウメン」というキーワードが、コピー機能設定時にスキャナ部１７に対して原稿の両面読み取りを指示するキーワードと、コピー機能設定時にプリンタ部１８に対して両面印刷を指示するキーワードと、スキャン機能設定時にスキャナ部１７に対して原稿の両面読み取りを指示するキーワードと、プリント機能設定時にプリンタ部１８に対して両面印刷を指示するキーワードとの４つの設定項目に対して登録されている。 FIG. 5 is a diagram showing an example of the keyword information 26. As shown in FIG. 5, the keyword information 26 associates keywords with the setting items and setting values that can be set for the image processing device 2, and is defined for each device model of the image processing device 2. A keyword is a word with which a setting can be made for the image processing device 2, and corresponds to, for example, a setting value of a setting item. However, as shown in FIG. 5, the same keyword may be registered for different setting items in the keyword information 26. For example, in the keyword information 26 shown in FIG. 5, the keyword "Ryomen" (リョウメン, "double-sided") is registered for four setting items: as a keyword that instructs the scanner unit 17 to read both sides of a document when the copy function is being set, as a keyword that instructs the printer unit 18 to print on both sides when the copy function is being set, as a keyword that instructs the scanner unit 17 to read both sides of a document when the scan function is being set, and as a keyword that instructs the printer unit 18 to print on both sides when the print function is being set.

設定候補抽出部３４は、ユーザーの音声に基づいて変換されたテキストＤ２に基づいてキーワード情報２６を検索することにより、テキストＤ２に一致するキーワードがキーワード情報２６に登録されているか否かを判断する。そして設定候補抽出部３４は、テキストＤ２に一致するキーワードがキーワード情報２６に登録されている場合、テキストＤ２に一致するキーワードが登録されている設定項目と設定値との組み合わせを設定候補として抽出する。そのため、設定候補抽出部３４は、テキストＤ２に対応する設定候補として、１つの設定候補を抽出することもあれば、また複数の設定候補を抽出することもある。例えば、ユーザーが「リョウメン」という音声を発した場合、設定候補抽出部３４は、図５に示すキーワード情報２６を参照すると、４つの設定候補を抽出することになる。また、設定候補抽出部３４は、テキストＤ２に対応する設定候補を１つも抽出することができないこともある。 The setting candidate extraction unit 34 searches the keyword information 26 using the text D2 converted from the user's voice, and determines whether a keyword matching the text D2 is registered in the keyword information 26. If a matching keyword is registered, the setting candidate extraction unit 34 extracts, as a setting candidate, each combination of a setting item and a setting value for which that keyword is registered. The setting candidate extraction unit 34 may therefore extract one setting candidate or a plurality of setting candidates for the text D2. For example, when the user utters "Ryomen," the setting candidate extraction unit 34 refers to the keyword information 26 shown in FIG. 5 and extracts four setting candidates. The setting candidate extraction unit 34 may also be unable to extract any setting candidate corresponding to the text D2.
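
The lookup performed by the setting candidate extraction unit 34 can be sketched as a dictionary search that returns every setting item and setting value registered for the recognized text. FIG. 5 is not reproduced here, so the table below is a hypothetical excerpt whose four entries mirror the "Ryomen" example; the item and value labels are assumptions.

```python
from typing import NamedTuple


class SettingCandidate(NamedTuple):
    function: str  # function being set (copy, scan, print)
    item: str      # setting item
    value: str     # setting value


# Hypothetical excerpt of keyword information 26 for one device model.
KEYWORD_INFO = {
    "リョウメン": [
        SettingCandidate("copy", "document reading", "double-sided"),
        SettingCandidate("copy", "printing", "double-sided"),
        SettingCandidate("scan", "document reading", "double-sided"),
        SettingCandidate("print", "printing", "double-sided"),
    ],
}


def extract_candidates(text: str, keyword_info: dict = KEYWORD_INFO) -> list[SettingCandidate]:
    """Return all setting candidates registered for the text D2 (possibly none)."""
    return list(keyword_info.get(text, []))
```

With this table, `extract_candidates("リョウメン")` yields four candidates, while an unregistered word yields an empty list, which corresponds to the case where no setting candidate can be extracted.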

また、設定候補抽出部３４は、画像処理装置２に対する現在の設定状態を把握しており、上記のようにして抽出した設定候補が、現在の設定状態に対する禁則条件を満たす場合には、その設定候補を除外する。例えば、画像処理装置２がスキャン機能において原稿を読み取って生成した画像データをコンパクトＰＤＦとして出力する際にモノクロ出力を選択することができない設定となっている場合において、ユーザーが既に「モノクロ出力」を設定している状態で「ＰＤＦ」という音声を発したと仮定する。この場合、設定候補抽出部３４は、設定候補として、例えば、ＰＤＦ、暗号化ＰＤＦ、ＰＤＦＡ、サーチャブルＰＤＦ、コンパクトＰＤＦといった５つの設定候補を抽出する。しかし、ユーザーが既に「モノクロ出力」を設定しているため、コンパクトＰＤＦが禁則条件を満たす設定候補となり、設定候補抽出部３４は、コンパクトＰＤＦを設定候補から除外する。その結果、設定候補抽出部３４は、ＰＤＦ、暗号化ＰＤＦ、ＰＤＦＡ、及び、サーチャブルＰＤＦの４つの設定候補を抽出することになる。 Further, the setting candidate extraction unit 34 keeps track of the current setting state of the image processing device 2, and if a setting candidate extracted as described above satisfies a prohibition condition with respect to the current setting state, it excludes that setting candidate. For example, suppose that the image processing device 2 does not allow monochrome output to be selected when image data generated by reading a document with the scan function is output as a compact PDF, and that the user utters "PDF" after having already set "monochrome output." In this case, the setting candidate extraction unit 34 first extracts five setting candidates, for example PDF, encrypted PDF, PDFA, searchable PDF, and compact PDF. However, because the user has already set "monochrome output," compact PDF is a setting candidate that satisfies the prohibition condition, and the setting candidate extraction unit 34 excludes it from the setting candidates. As a result, the setting candidate extraction unit 34 extracts four setting candidates: PDF, encrypted PDF, PDFA, and searchable PDF.
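
The exclusion of candidates that satisfy a prohibition condition can be sketched as a simple filter. Representing each prohibition as a pair of a candidate and a conflicting current setting is an assumption of this sketch; the embodiment does not say how prohibition conditions are stored.

```python
# Hypothetical prohibition table: (candidate, conflicting current setting).
PROHIBITIONS = {("compact PDF", "monochrome output")}


def exclude_prohibited(candidates, current_settings, prohibitions=PROHIBITIONS):
    """Drop every candidate that conflicts with a setting already in effect."""
    return [
        candidate
        for candidate in candidates
        if not any((candidate, setting) in prohibitions for setting in current_settings)
    ]


candidates = ["PDF", "encrypted PDF", "PDFA", "searchable PDF", "compact PDF"]
remaining = exclude_prohibited(candidates, {"monochrome output"})
```

Here `remaining` holds the four candidates other than compact PDF, matching the example in the text; with no current settings, all five candidates would be kept.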

設定特定部３１は、設定候補抽出部３４によって設定候補が抽出されなかった場合、案内情報出力部３２及び装置制御部３３を機能させることなく処理を終了する。ただし、この場合、設定特定部３１は、案内情報出力部３２を機能させ、案内情報出力部３２に「設定項目を特定することができませんでした」という案内情報Ｄ３を生成させ、音声認識装置７を介して音声入出力装置３へ出力させるようにしても構わない。 If the setting candidate extraction unit 34 extracts no setting candidate, the setting specifying unit 31 ends the process without causing the guidance information output unit 32 and the device control unit 33 to function. In this case, however, the setting specifying unit 31 may instead cause the guidance information output unit 32 to function, have it generate guidance information D3 stating "The setting item could not be specified," and have that guidance information output to the voice input/output device 3 via the voice recognition device 7.

一方、設定候補抽出部３４によって１つの設定候補が抽出された場合、設定特定部３１は、その１つの設定候補を、画像処理装置２に反映すべき設定として特定することができる。この場合、設定特定部３１は、案内情報出力部３２及び装置制御部３３を機能させ、案内情報出力部３２及び装置制御部３３のそれぞれに対して画像処理装置２に反映すべき設定を通知する。 On the other hand, when one setting candidate is extracted by the setting candidate extraction unit 34, the setting specifying unit 31 can specify that one setting candidate as the setting to be reflected in the image processing device 2. In this case, the setting specifying unit 31 causes the guidance information output unit 32 and the device control unit 33 to function, and notifies each of them of the setting to be reflected in the image processing device 2.

また、設定候補抽出部３４によって複数の設定候補が抽出された場合にも、設定特定部３１は、案内情報出力部３２及び装置制御部３３を機能させる。そして設定特定部３１は、設定候補抽出部３４によって抽出された複数の設定候補を、案内情報出力部３２及び装置制御部３３のそれぞれへ通知する。このとき、設定特定部３１は、設定候補抽出部３４によって抽出された設定候補の数が所定数（例えば、「３」）以上であるか否かを判断する。そして所定数以上の設定候補が抽出されている場合、設定特定部３１は、案内情報出力部３２及び装置制御部３３のそれぞれに対して設定候補が所定数以上であることを通知する。尚、所定数は、「３」に限られるものではなく、２以上の値であれば良い。 Further, even when a plurality of setting candidates are extracted by the setting candidate extraction section 34, the setting specifying section 31 causes the guidance information output section 32 and the device control section 33 to function. Then, the setting specifying unit 31 notifies each of the guidance information output unit 32 and the device control unit 33 of the plurality of setting candidates extracted by the setting candidate extracting unit 34. At this time, the setting specifying unit 31 determines whether the number of setting candidates extracted by the setting candidate extracting unit 34 is greater than or equal to a predetermined number (for example, “3”). If a predetermined number or more of setting candidates have been extracted, the setting specifying unit 31 notifies each of the guidance information output unit 32 and the device control unit 33 that the number of setting candidates is the predetermined number or more. Note that the predetermined number is not limited to "3" and may be any value of 2 or more.
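
The branching of the setting specifying unit 31 on the number of extracted candidates can be summarized in a small sketch. The enum names are hypothetical; the default threshold of 3 and the requirement that the predetermined number be 2 or more follow the text.

```python
from enum import Enum


class Outcome(Enum):
    NO_CANDIDATE = 0           # nothing extracted
    SPECIFIED = 1              # exactly one candidate: the setting is specified
    BELOW_THRESHOLD = 2        # several candidates, fewer than the predetermined number
    AT_OR_ABOVE_THRESHOLD = 3  # the predetermined number of candidates or more


def classify(candidate_count: int, predetermined_number: int = 3) -> Outcome:
    """Classify the extraction result by the number of setting candidates."""
    if predetermined_number < 2:
        raise ValueError("the predetermined number must be 2 or more")
    if candidate_count == 0:
        return Outcome.NO_CANDIDATE
    if candidate_count == 1:
        return Outcome.SPECIFIED
    if candidate_count >= predetermined_number:
        return Outcome.AT_OR_ABOVE_THRESHOLD
    return Outcome.BELOW_THRESHOLD
```

Note that when the predetermined number is 2, every result with a plurality of candidates falls into the last category, so the "fewer than the predetermined number" branch is never taken.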

案内情報出力部３２は、ユーザーに対して音声案内を行うための各種の案内情報Ｄ３をテキストデータとして生成し、その案内情報Ｄ３を音声認識装置７へ送信する処理部である。例えば、案内情報出力部３２は、設定特定部３１から出力される設定候補を音声案内のためのテキストデータとして表現した案内情報Ｄ３を生成する。設定候補抽出部３４によって１つの設定候補が抽出された場合、案内情報出力部３２は、その１つの設定候補をテキストデータで表した案内情報Ｄ３を生成する。また、設定候補抽出部３４によって複数の設定候補が抽出されており、それら設定候補の数が所定数未満である場合、案内情報出力部３２は、それら複数の設定候補を１つずつ順番に音声出力するために各設定候補のテキストデータを配列した案内情報Ｄ３を生成する。 The guidance information output unit 32 is a processing unit that generates various types of guidance information D3 for providing voice guidance to the user as text data, and transmits the guidance information D3 to the voice recognition device 7. For example, the guidance information output unit 32 generates guidance information D3 that represents the setting candidates output from the setting identification unit 31 as text data for voice guidance. When one setting candidate is extracted by the setting candidate extraction unit 34, the guidance information output unit 32 generates guidance information D3 representing the one setting candidate in text data. Further, if a plurality of setting candidates have been extracted by the setting candidate extraction unit 34 and the number of these setting candidates is less than a predetermined number, the guidance information output unit 32 sequentially selects the plurality of setting candidates one by one by voice. Guidance information D3 in which text data of each setting candidate is arranged is generated for output.

さらに、設定候補抽出部３４によって複数の設定候補が抽出されており、それら設定候補の数が所定数以上である場合、案内情報出力部３２は、設定モードに応じた処理を行う。設定モードには、案内情報Ｄ３を生成しない第１のモードと、複数の設定候補を１つずつ順番に音声出力するための案内情報Ｄ３を生成する第２のモードと、ユーザーに操作パネル１１の確認を促すための案内情報Ｄ３を生成する第３のモードとの３つのモードがある。案内情報出力部３２には、それら３つのモードのうちから、ユーザーによって予め選択されたモードが設定されている。例えば、第１のモードが設定されている場合、案内情報出力部３２は、案内情報Ｄ３を出力しない。また、第２のモードが設定されている場合、案内情報出力部３２は、所定数未満の設定候補が抽出された場合と同様に、所定数以上の設定候補を１つずつ順番に音声出力するための案内情報Ｄ３を生成して音声認識装置７へ出力する。さらに第３のモードが設定されている場合、案内情報出力部３２は、例えば「操作パネルの表示画面を確認してください」といったテキストデータの案内情報Ｄ３を生成して音声認識装置７へ出力する。 Furthermore, if a plurality of setting candidates have been extracted by the setting candidate extraction unit 34 and the number of these setting candidates is the predetermined number or more, the guidance information output unit 32 performs processing according to the setting mode. There are three setting modes: a first mode in which no guidance information D3 is generated, a second mode in which guidance information D3 for outputting the plurality of setting candidates by voice one at a time in sequence is generated, and a third mode in which guidance information D3 prompting the user to check the operation panel 11 is generated. The mode selected in advance by the user from among these three modes is set in the guidance information output unit 32. For example, when the first mode is set, the guidance information output unit 32 does not output the guidance information D3. When the second mode is set, the guidance information output unit 32 generates guidance information D3 for outputting the predetermined number or more of setting candidates by voice one at a time in sequence, as in the case where fewer than the predetermined number of setting candidates are extracted, and outputs it to the voice recognition device 7. Furthermore, when the third mode is set, the guidance information output unit 32 generates guidance information D3 in the form of text data such as "Please check the display screen of the operation panel" and outputs it to the voice recognition device 7.

尚、第２のモードと第３のモードは、案内情報出力部３２において同時に設定されていても構わない。第２のモードと第３のモードとが同時に設定されている場合、案内情報出力部３２は、例えば「操作パネルの表示画面を確認してください」といったテキストデータの次に複数の設定候補のそれぞれに対応するテキストデータを配列した案内情報Ｄ３を生成する。 Note that the second mode and the third mode may be set at the same time in the guidance information output unit 32. When the second mode and the third mode are set at the same time, the guidance information output unit 32 generates guidance information D3 in which text data such as "Please check the display screen of the operation panel" is followed by the text data corresponding to each of the plurality of setting candidates.
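The mode-dependent generation of the guidance information D3 described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the names `Mode`, `PANEL_PROMPT` and `build_guidance` are hypothetical, the guidance information is modelled as a plain list of text segments, and the rule that the first mode suppresses all output even when combined with other modes is an assumption, since the description only addresses the combination of the second and third modes.

```python
from enum import Flag, auto


class Mode(Flag):
    """Hypothetical representation of the three setting modes."""
    FIRST = auto()   # no guidance information D3 is generated
    SECOND = auto()  # candidates are read aloud one by one
    THIRD = auto()   # the user is asked to check the operation panel


PANEL_PROMPT = "Please check the display screen of the operation panel"


def build_guidance(candidates, mode, threshold=3):
    """Return the ordered text segments of guidance information D3.

    An empty list means that no guidance information is output.
    """
    # One candidate, or fewer candidates than the threshold: read them all.
    if len(candidates) < threshold:
        return list(candidates)
    # Threshold reached: behaviour depends on the mode chosen in advance.
    if Mode.FIRST in mode:
        return []  # assumption: the first mode overrides the others
    segments = []
    if Mode.THIRD in mode:
        segments.append(PANEL_PROMPT)
    if Mode.SECOND in mode:
        segments.extend(candidates)
    return segments
```

For example, `build_guidance(["a", "b", "c"], Mode.SECOND | Mode.THIRD)` places the panel prompt first and the three candidates after it, which matches the ordering described for the case where both modes are set.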

装置制御部３３は、画像処理装置２に対して各種のコマンドＤ４を送信することにより、画像処理装置２を制御する処理部である。この装置制御部３３は、表示制御部３５と、設定反映部３６とを備えている。表示制御部３５は、画像処理装置２の操作パネル１１に表示される操作画面を制御するものである。設定反映部３６は、画像処理装置２に対するユーザーが所望する設定を反映させる処理部である。 The device control unit 33 is a processing unit that controls the image processing device 2 by transmitting various commands D4 to the image processing device 2. This device control section 33 includes a display control section 35 and a setting reflection section 36. The display control unit 35 controls an operation screen displayed on the operation panel 11 of the image processing device 2. The settings reflection unit 36 is a processing unit that reflects settings desired by the user on the image processing device 2.

装置制御部３３は、設定特定部３１において画像処理装置２に反映すべき設定が特定された場合、設定反映部３６を機能させる。すなわち、設定反映部３６は、音声認識結果に基づいて１つの設定候補だけが抽出された場合に機能する。そして設定反映部３６は、設定特定部３１において特定された設定を画像処理装置２に反映させるためのコマンドＤ４を生成し、画像処理装置２へ送信する。画像処理装置２は、そのコマンドＤ４を受信すると、設定反映部３６によって指定された設定を反映させる。つまり、画像処理装置２は、ユーザーの音声に対応する設定を自動的に行うのである。そのため、ユーザーは、操作パネル１１に対する操作を行わなくても、画像処理装置２に対して所望の設定を行うことができる。 The device control unit 33 causes the setting reflection unit 36 to function when the setting specifying unit 31 has specified the setting to be reflected in the image processing device 2. That is, the setting reflection unit 36 functions when only one setting candidate is extracted based on the voice recognition result. The setting reflection unit 36 then generates a command D4 for reflecting the setting specified by the setting specifying unit 31 in the image processing device 2, and transmits it to the image processing device 2. When the image processing device 2 receives the command D4, it reflects the setting designated by the setting reflection unit 36. In other words, the image processing device 2 automatically makes the setting corresponding to the user's voice. Therefore, the user can make desired settings for the image processing device 2 without operating the operation panel 11.

一方、設定候補抽出部３４によって複数の設定候補が抽出され、設定特定部３１において画像処理装置２に反映すべき設定を特定することができなかった場合、設定反映部３６は、コマンドＤ４を生成しない。 On the other hand, if the setting candidate extraction unit 34 extracts a plurality of setting candidates and the setting specifying unit 31 is unable to specify the setting to be reflected in the image processing device 2, the setting reflection unit 36 does not generate the command D4.

また、装置制御部３３は、設定特定部３１において所定数以上の設定候補が抽出された場合、表示制御部３５を機能させる。表示制御部３５は、設定特定部３１において音声認識結果に対応する複数の設定候補が存在すると判定された場合に、ユーザーに対してそれら複数の設定候補を提示する制御手段である。表示制御部３５は、設定特定部３１において所定数以上の設定候補が抽出された場合に、それら所定数以上の設定候補を画像処理装置２の操作パネル１１に表示させるためのコマンドＤ４を生成し、画像処理装置２へ送信する。画像処理装置２は、このコマンドＤ４を受信すると、音声認識結果に対応して抽出された所定数以上の設定候補を表示した選択画面を生成し、その選択画面を操作パネル１１の表示部１２に表示する。このとき、画像処理装置２において生成される選択画面には、制御装置８において抽出された所定数以上の設定候補のそれぞれに対応する設定項目が含まれる。そのため、ユーザーは、画像処理装置２の操作パネル１１に表示される選択画面を確認することにより、自身が発した音声に対応する複数の設定項目を簡単且つ速やかに把握することができ、それらの複数の設定項目のうちから所望の設定項目を効率的に選択することができる。 Furthermore, when the predetermined number or more of setting candidates have been extracted in the setting specifying unit 31, the device control unit 33 causes the display control unit 35 to function. The display control unit 35 is a control means that presents a plurality of setting candidates to the user when the setting specifying unit 31 determines that there are a plurality of setting candidates corresponding to the voice recognition result. When the predetermined number or more of setting candidates have been extracted in the setting specifying unit 31, the display control unit 35 generates a command D4 for displaying those setting candidates on the operation panel 11 of the image processing device 2, and transmits the command D4 to the image processing device 2. Upon receiving this command D4, the image processing device 2 generates a selection screen showing the predetermined number or more of setting candidates extracted in accordance with the voice recognition result, and displays the selection screen on the display unit 12 of the operation panel 11. At this time, the selection screen generated by the image processing device 2 includes the setting items corresponding to each of the predetermined number or more of setting candidates extracted by the control device 8. Therefore, by checking the selection screen displayed on the operation panel 11 of the image processing device 2, the user can easily and quickly grasp the plurality of setting items corresponding to the voice he or she uttered, and can efficiently select a desired setting item from among them.

また、表示制御部３５は、所定数以上の設定候補が抽出された場合、それら所定数以上の設定項目を所定の優先順位に従って表示させるためのコマンドＤ４を生成する。そのため、表示制御部３５は、所定数以上の設定候補が抽出された場合、優先順位情報２７を参照する。優先順位情報２７は、複数の設定項目に対する優先順位が予め定められた情報である。例えば、優先順位は、ユーザーによる設定頻度が高い順に予め定められる。また、優先順位は、各設定候補に対応する設定項目が含まれる操作画面の階層に基づいて予め定められたものであっても構わない。この場合、例えば、トップ画面からの階層が浅い設定候補の優先順位が高くなり、トップ画面からの階層が深い設定項目の優先順位が低くなるように予め定められる。 Further, when a predetermined number or more setting candidates are extracted, the display control unit 35 generates a command D4 for displaying the predetermined number or more setting items according to a predetermined priority order. Therefore, the display control unit 35 refers to the priority information 27 when a predetermined number or more setting candidates are extracted. The priority information 27 is information in which priorities for a plurality of setting items are determined in advance. For example, the priority order is predetermined in descending order of frequency of setting by the user. Further, the priority order may be predetermined based on the hierarchy of the operation screen that includes setting items corresponding to each setting candidate. In this case, for example, it is predetermined in advance that setting candidates with a shallow hierarchy from the top screen have a high priority, and setting candidates with a deep hierarchy from the top screen have a low priority.
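The two ways of predetermining the priority order mentioned above (setting frequency and screen hierarchy depth) can be illustrated with a short sketch. The function names, the dictionary layout standing in for the priority information 27, and the sample setting names are all hypothetical and are used only for illustration.

```python
def rank_by_frequency(usage_counts):
    """Build priority ranks (0 = highest) from how often each item was set."""
    ordered = sorted(usage_counts, key=lambda item: -usage_counts[item])
    return {item: rank for rank, item in enumerate(ordered)}


def rank_by_depth(screen_depths):
    """Build priority ranks (0 = highest) from the depth below the top screen."""
    ordered = sorted(screen_depths, key=lambda item: screen_depths[item])
    return {item: rank for rank, item in enumerate(ordered)}


def order_candidates(candidates, ranks):
    """Sort extracted candidates by rank; unknown candidates go last."""
    lowest = len(ranks)
    return sorted(candidates, key=lambda c: ranks.get(c, lowest))


# Hypothetical usage counts standing in for the priority information 27.
usage = {"copy duplex": 40, "scan duplex": 12, "fax duplex": 3}
ranks = rank_by_frequency(usage)
ordered = order_candidates(["fax duplex", "copy duplex", "scan duplex"], ranks)
```

Because `sorted` is stable, candidates that share a rank keep the order in which they were extracted.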

また、表示制御部３５は、選択画面として、所定数以上の設定候補のそれぞれに対応するサムネイル画像を配置した画面を画像処理装置２に生成させるようにしても良い。この場合、表示制御部３５は、優先順位情報２７において規定された優先順位に基づいて選択画面に表示するサムネイル画像の画像サイズを変化させるようにしても良い。例えば、優先順位の高い設定候補に対応するサムネイル画像の画像サイズを、優先順位の低い設定候補に対応するサムネイル画像の画像サイズよりも大きいサイズとすることで、ユーザーにとって優先順位の高い設定候補を選択しやすい画面とすることができる。 Further, the display control unit 35 may cause the image processing device 2 to generate, as the selection screen, a screen in which thumbnail images corresponding to each of the predetermined number or more of setting candidates are arranged. In this case, the display control unit 35 may change the image size of the thumbnail images displayed on the selection screen based on the priority order defined in the priority information 27. For example, by making the thumbnail image corresponding to a high-priority setting candidate larger than the thumbnail image corresponding to a low-priority setting candidate, the selection screen can be made one on which the user can easily select high-priority setting candidates.
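One simple way to vary the thumbnail size with the priority order is a linear step-down with a lower bound, sketched below. The description does not state how the sizes are computed, so the function name `thumbnail_sizes` and the pixel values are purely hypothetical assumptions.

```python
def thumbnail_sizes(ordered_candidates, largest=160, step=30, smallest=70):
    """Assign an edge length in pixels to each candidate.

    `ordered_candidates` is expected to be sorted from highest to lowest
    priority; the size shrinks by `step` per rank, never below `smallest`.
    """
    sizes = {}
    for rank, candidate in enumerate(ordered_candidates):
        sizes[candidate] = max(smallest, largest - step * rank)
    return sizes
```

The lower bound keeps low-priority thumbnails large enough to remain selectable by touch.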

尚、案内情報出力部３２は、第２のモードが設定されている場合において、所定数以上の設定候補を１つずつ順番に音声出力するための案内情報Ｄ３を生成するとき、上記と同様に、優先順位情報２７に規定された優先順位に基づき、優先順位の高い設定候補から順に音声出力されるようにテキストデータを配列した案内情報Ｄ３を生成するようにしても良い。 Note that, when the second mode is set and the guidance information output unit 32 generates guidance information D3 for outputting the predetermined number or more of setting candidates by voice one at a time in sequence, it may, in the same manner as described above, generate guidance information D3 in which the text data is arranged so that the setting candidates are output by voice in descending order of priority, based on the priority order specified in the priority information 27.

次に、制御装置８及び画像処理装置２における連携動作の流れについて説明する。まず図６は、制御装置８が音声認識結果に基づいて画像処理装置２に反映すべき設定を特定することができた場合の流れを示す図である。制御装置８は、音声認識装置７から出力されるテキストＤ２を取得すると、テキストＤ２に基づいて画像処理装置２に対する設定候補を抽出する処理を行う（プロセスＰ１０）。このとき、テキストＤ２に基づいて１つの設定候補が抽出することができると、制御装置８は、その１つの設定候補が画像処理装置２に反映すべき設定内容であると特定することができる（プロセスＰ１１）。そして制御装置８は、特定した設定を含む設定反映指示Ｄ４１を、コマンドＤ４として画像処理装置２へ送信する。 Next, the flow of cooperative operations in the control device 8 and the image processing device 2 will be explained. First, FIG. 6 is a diagram showing the flow when the control device 8 is able to specify settings to be reflected in the image processing device 2 based on the voice recognition result. When the control device 8 acquires the text D2 output from the speech recognition device 7, it performs a process of extracting setting candidates for the image processing device 2 based on the text D2 (process P10). At this time, if one setting candidate can be extracted based on the text D2, the control device 8 can specify that the one setting candidate is the setting content that should be reflected in the image processing device 2 ( Process P11). Then, the control device 8 transmits a setting reflection instruction D41 including the specified settings to the image processing device 2 as a command D4.

画像処理装置２は、制御装置８から設定反映指示Ｄ４１を受信すると、その設定反映指示Ｄ４１に基づき、設定反映処理を行う（プロセスＰ１２）。すなわち、画像処理装置２は、ユーザーが音声で指示した設定を装置内部に反映させるのである。したがって、ユーザーは、操作パネル１１に対する手動操作を行うことなく、画像処理装置２に対する各種設定を行うことができる。 When the image processing device 2 receives the setting reflection instruction D41 from the control device 8, it performs a setting reflection process based on the setting reflection instruction D41 (process P12). That is, the image processing device 2 reflects the settings given by the user's voice inside the device. Therefore, the user can perform various settings for the image processing apparatus 2 without manually operating the operation panel 11.

次に図７は、制御装置８が音声認識結果に基づいて所定数以上の設定候補を抽出した場合の流れを示す図である。制御装置８は、音声認識装置７から出力されるテキストＤ２を取得すると、テキストＤ２に基づいて画像処理装置２に対する設定候補を抽出する処理を行う（プロセスＰ１０）。このとき、テキストＤ２に基づいて所定数以上の設定候補を抽出すると、制御装置８は、画像処理装置２に反映すべき設定を特定することができない。そのため、制御装置８は、所定数以上の設定候補のうちからユーザーが所望する設定候補の選択を促すための選択画面表示指示Ｄ４２を、コマンドＤ４として画像処理装置２へ送信する。この選択画面表示指示Ｄ４２には、音声認識結果に基づいて抽出された所定数以上の設定候補を示す情報が含まれる。また、選択画面表示指示Ｄ４２には、上述したように、所定数以上の設定候補を所定の優先順位に従って表示させるための指示が含まれていても良い。 Next, FIG. 7 is a diagram showing a flow when the control device 8 extracts a predetermined number or more of setting candidates based on the voice recognition results. When the control device 8 acquires the text D2 output from the speech recognition device 7, it performs a process of extracting setting candidates for the image processing device 2 based on the text D2 (process P10). At this time, if a predetermined number or more of setting candidates are extracted based on the text D2, the control device 8 cannot specify the settings to be reflected in the image processing device 2. Therefore, the control device 8 transmits a selection screen display instruction D42 to the image processing device 2 as a command D4 to prompt the user to select a desired setting candidate from a predetermined number or more of setting candidates. This selection screen display instruction D42 includes information indicating a predetermined number or more of setting candidates extracted based on the voice recognition result. Furthermore, as described above, the selection screen display instruction D42 may include an instruction to display a predetermined number or more of setting candidates according to a predetermined priority order.

画像処理装置２は、制御装置８から選択画面表示指示Ｄ４２を受信すると、その選択画面表示指示Ｄ４２に基づき、所定数以上の設定候補を配置した選択画面を生成し、操作パネル１１の表示部１２に選択画面を表示する（プロセスＰ１３）。これにより、ユーザーは、自身で発した音声に対応する設定候補が所定数以上存在する場合であっても、操作パネル１１の表示部１２に表示される選択画面を見ることにより、所定数以上の設定候補のうちから所望の設定候補を速やかに選択することができる。そのため、仮に音声入出力装置３が所定数以上の設定候補を順番に音声出力している状態であっても、ユーザーは、音声入出力装置３による音声出力の途中で所望の設定候補を選択することが可能であり、画像処理装置２に対する設定に要する時間を短縮することができる。また、ユーザーは、操作パネル１１に表示される選択画面を確認した後、所望の設定候補を手動操作によって選択することができるし、また音声操作によって選択することもできる。 When the image processing device 2 receives the selection screen display instruction D42 from the control device 8, it generates, based on the selection screen display instruction D42, a selection screen on which the predetermined number or more of setting candidates are arranged, and displays the selection screen on the display unit 12 of the operation panel 11 (process P13). As a result, even if there are a predetermined number or more of setting candidates corresponding to the voice the user uttered, the user can quickly select a desired setting candidate from among them by looking at the selection screen displayed on the display unit 12 of the operation panel 11. Therefore, even if the voice input/output device 3 is in the middle of outputting the predetermined number or more of setting candidates by voice in sequence, the user can select a desired setting candidate partway through that voice output, and the time required for making settings on the image processing device 2 can be shortened. Further, after checking the selection screen displayed on the operation panel 11, the user can select a desired setting candidate either by manual operation or by voice operation.

次に、制御装置８において行われる処理手順について詳しく説明する。図８乃至図１０は、制御装置８において行われる処理手順の一例を示すフローチャートである。この処理は、制御装置８のＣＰＵ４３がプログラム２５を実行することによって行われる処理であり、ＣＰＵ４３によって繰り返し実行される処理である。 Next, the processing procedure performed in the control device 8 will be explained in detail. 8 to 10 are flowcharts showing an example of a processing procedure performed in the control device 8. This process is a process performed by the CPU 43 of the control device 8 executing the program 25, and is a process repeatedly executed by the CPU 43.

制御装置８は、この処理を開始すると、図８に示すように、音声認識装置７による音声認識結果であるテキストＤ２を受信したか否かを判断する（ステップＳ１０）。音声認識結果を受信していない場合（ステップＳ１０でＮＯ）、制御装置８による処理は終了する。これに対し、音声認識結果を受信した場合（ステップＳ１０でＹＥＳ）、制御装置８は、テキストＤ２に付加されている装置情報に基づき、画像処理装置２の装置モデルを特定し（ステップＳ１１）、その装置モデルに対応するキーワード情報２６を読み出す（ステップＳ１２）。そして制御装置８は、設定候補抽出処理を実行する（ステップＳ１３）。 When the control device 8 starts this process, as shown in FIG. 8, the control device 8 determines whether or not the text D2, which is the result of speech recognition by the speech recognition device 7, has been received (step S10). If the voice recognition result has not been received (NO in step S10), the process by the control device 8 ends. On the other hand, when the voice recognition result is received (YES in step S10), the control device 8 identifies the device model of the image processing device 2 based on the device information added to the text D2 (step S11), Keyword information 26 corresponding to the device model is read out (step S12). The control device 8 then executes setting candidate extraction processing (step S13).

図９は、設定候補抽出処理（ステップＳ１３）の詳細な処理手順の一例を示すフローチャートである。制御装置８は、設定候補抽出処理を開始すると、ステップＳ１２で読み出したキーワード情報２６を参照し、テキストＤ２に対応する設定候補を全て抽出する（ステップＳ２０）。このとき、１つの設定候補だけが抽出されることもあれば、複数の設定候補が抽出されることもある。また、設定候補が１つも抽出されないこともある。設定候補が１つも抽出されなかった場合、制御装置８による処理はその時点で終了する。 FIG. 9 is a flowchart illustrating an example of a detailed processing procedure of the setting candidate extraction process (step S13). When the control device 8 starts the setting candidate extraction process, it refers to the keyword information 26 read out in step S12 and extracts all setting candidates corresponding to the text D2 (step S20). At this time, only one setting candidate may be extracted, or a plurality of setting candidates may be extracted. Furthermore, there may be cases where no setting candidates are extracted. If no setting candidates are extracted, the processing by the control device 8 ends at that point.

ステップＳ２０において少なくとも１つの設定候補が抽出された場合、制御装置８は、ユーザーによって既に指定された現在の設定状態を確認し（ステップＳ２１）、禁則判定を行う（ステップＳ２２）。すなわち、制御装置８は、ステップＳ２０で抽出した少なくとも１つの設定候補の中に、現在の設定状態に対する禁則条件を満たす設定候補が存在するか否かを判定する。その結果、禁則条件に合致する設定候補が存在する場合（ステップＳ２３でＹＥＳ）、制御装置８は、禁則条件に合致する設定候補を除外する（ステップＳ２４）。尚、禁則条件に合致する設定候補を除外した結果、設定候補が１つも残らないこととなった場合、制御装置８による処理はその時点で終了する。一方、禁則条件に合致する設定候補が存在しない場合（ステップＳ２３でＮＯ）、制御装置８は、ステップＳ２０で抽出した設定候補を、有効な設定候補として認定する。以上で、設定候補抽出処理が終了する。 If at least one setting candidate is extracted in step S20, the control device 8 checks the current setting state already designated by the user (step S21), and makes a prohibition determination (step S22). That is, the control device 8 determines whether the at least one setting candidate extracted in step S20 includes a setting candidate that satisfies a prohibition condition with respect to the current setting state. If there is a setting candidate that matches a prohibition condition (YES in step S23), the control device 8 excludes that setting candidate (step S24). Note that if no setting candidate remains after the candidates matching the prohibition conditions are excluded, the processing by the control device 8 ends at that point. On the other hand, if there is no setting candidate that matches a prohibition condition (NO in step S23), the control device 8 recognizes the setting candidates extracted in step S20 as valid setting candidates. This completes the setting candidate extraction process.
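Steps S20 to S24 amount to a keyword lookup followed by a filter against the current setting state, which can be sketched as follows. The data layout is an assumption made for illustration: the keyword information 26 is modelled as a mapping from a keyword to candidate names, and the prohibition conditions as a set of (current setting, candidate) pairs. The function name and the sample setting names are likewise hypothetical.

```python
def extract_candidates(text, keyword_info, current_settings, prohibited_pairs):
    """Return the valid setting candidates for a recognized text.

    An empty result corresponds to the case in which processing ends.
    """
    # Step S20: collect every candidate whose keyword appears in the text.
    matched = []
    for keyword, candidates in keyword_info.items():
        if keyword in text:
            for candidate in candidates:
                if candidate not in matched:
                    matched.append(candidate)
    # Steps S21 to S24: drop candidates that conflict with a current setting.
    return [
        candidate
        for candidate in matched
        if not any((s, candidate) in prohibited_pairs for s in current_settings)
    ]


# Hypothetical sample data.
keyword_info = {"duplex": ["copy: duplex print", "scan: duplex original"]}
prohibited = {("booklet", "copy: duplex print")}
valid = extract_candidates("duplex", keyword_info, {"booklet"}, prohibited)
```

In this sample, `copy: duplex print` is excluded because it conflicts with the `booklet` setting that is already in effect, leaving a single valid candidate.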

図８のフローチャートに戻り、次に制御装置８は、設定候補抽出処理（ステップＳ１３）において１つの設定候補だけが抽出されたか否かを判断する（ステップＳ１４）。１つの設定候補だけが抽出された場合（ステップＳ１４でＹＥＳ）、制御装置８は、その１つの設定候補を、画像処理装置２に反映すべき設定として特定することができる。この場合、制御装置８は、画像処理装置２に対して設定反映指示Ｄ４１を送信する（ステップＳ１５）。つまり、制御装置８は、音声認識結果に基づいて特定した設定を画像処理装置２に反映させるのである。そして制御装置８は、画像処理装置２に反映させた設定をユーザーに音声案内するための案内情報Ｄ３をテキストデータとして生成し（ステップＳ１６）、その案内情報Ｄ３を音声認識装置７へ送信する（ステップＳ１７）。これにより、音声認識装置７は、テキストデータで表現された案内情報Ｄ３を音声情報に変換し、音声情報に変換した案内情報Ｄ３を音声入出力装置３へ送信する。そして音声入出力装置３は、音声情報に変換された案内情報Ｄ３に基づく音声出力を行うので、画像処理装置２に反映された設定をユーザーに報知することができる。ユーザーは、音声入出力装置３から出力される音声を聞くことにより、自身で発した音声が正しく認識され、所望の設定がなされたか否かを確認することができる。 Returning to the flowchart of FIG. 8, the control device 8 next determines whether only one setting candidate has been extracted in the setting candidate extraction process (step S13) (step S14). If only one setting candidate has been extracted (YES in step S14), the control device 8 can specify that one setting candidate as the setting to be reflected in the image processing device 2. In this case, the control device 8 transmits a setting reflection instruction D41 to the image processing device 2 (step S15). In other words, the control device 8 causes the image processing device 2 to reflect the setting specified based on the voice recognition result. The control device 8 then generates, as text data, guidance information D3 for providing voice guidance to the user about the setting reflected in the image processing device 2 (step S16), and transmits the guidance information D3 to the voice recognition device 7 (step S17). The voice recognition device 7 thereby converts the guidance information D3 expressed as text data into voice information, and transmits the converted guidance information D3 to the voice input/output device 3. Since the voice input/output device 3 outputs voice based on the guidance information D3 converted into voice information, the user can be notified of the setting reflected in the image processing device 2. By listening to the voice output from the voice input/output device 3, the user can confirm whether the voice he or she uttered was correctly recognized and the desired setting was made.

一方、設定候補抽出処理（ステップＳ１３）において複数の設定候補が抽出された場合（ステップＳ１４でＮＯ）、制御装置８は、設定候補提示処理を行う（ステップＳ１８）。図１０は、その設定候補提示処理（ステップＳ１８）の詳細な処理手順の一例を示すフローチャートである。制御装置８は、設定候補提示処理を開始すると、まず音声認識結果に基づいて抽出された設定候補の数を確認する（ステップＳ３０）。そして制御装置８は、所定数以上の設定候補が抽出されたか否かを判断する（ステップＳ３１）。 On the other hand, if a plurality of setting candidates are extracted in the setting candidate extraction process (step S13) (NO in step S14), the control device 8 performs a setting candidate presentation process (step S18). FIG. 10 is a flowchart showing an example of a detailed processing procedure of the setting candidate presentation process (step S18). When the control device 8 starts the setting candidate presentation process, it first checks the number of setting candidates extracted based on the voice recognition result (step S30). Then, the control device 8 determines whether or not a predetermined number or more of setting candidates have been extracted (step S31).

所定数以上の設定候補が抽出された場合（ステップＳ３１でＹＥＳ）、制御装置８は、優先順位情報２７を読み出し（ステップＳ３２）、優先順位情報２７に予め定められている優先順位に基づいて所定数以上の設定候補の優先順位を決定する（ステップＳ３３）。そして制御装置８は、決定した優先順位に基づいて所定数以上の設定候補を表示させるための選択画面表示指示Ｄ４２を生成し、その選択画面表示指示Ｄ４２を画像処理装置２へ送信する（ステップＳ３４）。また、制御装置８は、設定モードに応じてユーザーに対する音声案内を行うための案内情報Ｄ３をテキストデータとして生成し（ステップＳ３５）、その案内情報Ｄ３を音声認識装置７へ送信する（ステップＳ３６）。尚、制御装置８における設定モードが第１のモードであれば、ステップＳ３５、Ｓ３６の処理は行われず、音声入出力装置３による音声出力は行われない。 If the predetermined number or more of setting candidates have been extracted (YES in step S31), the control device 8 reads the priority information 27 (step S32), and determines the priority order of the predetermined number or more of setting candidates based on the priority order predetermined in the priority information 27 (step S33). The control device 8 then generates a selection screen display instruction D42 for displaying the predetermined number or more of setting candidates based on the determined priority order, and transmits the selection screen display instruction D42 to the image processing device 2 (step S34). The control device 8 also generates, as text data, guidance information D3 for providing voice guidance to the user according to the setting mode (step S35), and transmits the guidance information D3 to the voice recognition device 7 (step S36). Note that if the setting mode in the control device 8 is the first mode, the processing in steps S35 and S36 is not performed, and the voice input/output device 3 does not output voice.

一方、所定数以上の設定候補が抽出されていない場合（ステップＳ３１でＮＯ）、制御装置８は、抽出された複数の設定候補を順番に音声出力するための案内情報Ｄ３をテキストデータとして生成し（ステップＳ３７）、その案内情報Ｄ３を音声認識装置７へ送信する（ステップＳ３８）。つまり、本実施形態における制御装置８は、音声認識結果に基づいて抽出された設定候補の数が所定数未満であれば、複数の設定候補を順番に音声出力したとしても、音声出力が終了するまでにそれ程長い時間を要しないため、画像処理装置２の操作パネル１１には選択画面を表示させないようにしている。そのため、ユーザーが画像処理装置２から離れた場所で音声操作を行っている場合には、操作パネル１１に対して無駄な表示を行わなくて済み、操作パネル１１が省電力モードであれば、その省電力モードを継続することができる。 On the other hand, if the predetermined number or more of setting candidates have not been extracted (NO in step S31), the control device 8 generates, as text data, guidance information D3 for outputting the plurality of extracted setting candidates by voice in sequence (step S37), and transmits the guidance information D3 to the voice recognition device 7 (step S38). In other words, if the number of setting candidates extracted based on the voice recognition result is less than the predetermined number, outputting the plurality of setting candidates by voice in sequence does not take very long to finish, so the control device 8 in the present embodiment does not cause the selection screen to be displayed on the operation panel 11 of the image processing device 2. Therefore, if the user is performing voice operations at a location away from the image processing device 2, no unnecessary display is made on the operation panel 11, and if the operation panel 11 is in the power saving mode, the power saving mode can be continued.
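The branching of steps S14 to S18 and S30 to S38 can be summarized in a small decision function. This is a sketch under stated assumptions: the function name `decide_actions`, the tuple format `(destination, action, payload)`, the `ranks` dictionary standing in for the priority information 27, and the boolean `voice_when_many` (a simplification of the setting mode) are all hypothetical.

```python
def decide_actions(candidates, ranks, threshold=3, voice_when_many=True):
    """Return the list of (destination, action, payload) tuples to send."""
    if not candidates:
        return []  # nothing was extracted: processing ends
    if len(candidates) == 1:
        # Steps S15 to S17: reflect the setting, then announce it by voice.
        return [
            ("device", "reflect_setting", candidates),
            ("speech", "announce_setting", candidates),
        ]
    if len(candidates) >= threshold:
        # Steps S32 to S36: order by priority and show the selection screen.
        lowest = len(ranks)
        ordered = sorted(candidates, key=lambda c: ranks.get(c, lowest))
        actions = [("device", "show_selection_screen", ordered)]
        if voice_when_many:  # skipped when the first mode is set
            actions.append(("speech", "read_candidates", ordered))
        return actions
    # Steps S37 and S38: few candidates, voice only, panel left untouched.
    return [("speech", "read_candidates", list(candidates))]
```

Note that the branch for fewer candidates than the threshold sends nothing to the device, which is what allows the operation panel to remain in its power saving mode.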

尚、上記においては、制御装置８が、画像処理装置２に対するコマンドＤ４として、設定反映指示Ｄ４１と選択画面表示指示Ｄ４２を送信する場合を例示したが、これらは単なる一例である。すなわち、制御装置８は、設定反映指示Ｄ４１及び選択画面表示指示Ｄ４２以外にも、様々なコマンドＤ４を生成することができる。例えば、制御装置８は、ユーザーによる指示がジョブの実行指示である場合には、ジョブ実行指示をコマンドＤ４として生成する。 In addition, although the case where the control device 8 transmits the setting reflection instruction D41 and the selection screen display instruction D42 as the command D4 to the image processing device 2 has been described above, these are merely examples. That is, the control device 8 can generate various commands D4 in addition to the setting reflection instruction D41 and the selection screen display instruction D42. For example, if the user's instruction is a job execution instruction, the control device 8 generates the job execution instruction as the command D4.

次に画像処理装置２における処理手順について説明する。図１１は、画像処理装置２において行われる処理手順の一例を示すフローチャートである。尚、図１１では、画像処理装置２が制御装置８からコマンドを受信した場合の処理手順のみを示している。画像処理装置２は、この処理を開始すると、制御装置８からコマンドＤ４を受信したか否かを判断する（ステップＳ４０）。制御装置８からコマンドＤ４を受信していない場合（ステップＳ４０でＮＯ）、画像処理装置２による処理は終了する。一方、制御装置８からコマンドＤ４を受信している場合（ステップＳ４０でＹＥＳ）、画像処理装置２は、受信したコマンドＤ４が設定反映指示Ｄ４１であるか否かを判断する（ステップＳ４１）。コマンドＤ４が設定反映指示Ｄ４１である場合（ステップＳ４１でＹＥＳ）、画像処理装置２は、ジョブ設定部２２を機能させ、設定反映指示Ｄ４１に基づく設定を反映させたジョブ設定を行う（ステップＳ４２）。 Next, the processing procedure in the image processing device 2 will be explained. FIG. 11 is a flowchart showing an example of a processing procedure performed in the image processing device 2. Note that FIG. 11 shows only the processing procedure for the case where the image processing device 2 receives a command from the control device 8. When the image processing device 2 starts this process, it determines whether or not it has received the command D4 from the control device 8 (step S40). If the command D4 has not been received from the control device 8 (NO in step S40), the processing by the image processing device 2 ends. On the other hand, if the command D4 has been received from the control device 8 (YES in step S40), the image processing device 2 determines whether the received command D4 is a setting reflection instruction D41 (step S41). If the command D4 is the setting reflection instruction D41 (YES in step S41), the image processing device 2 causes the job setting unit 22 to function and performs job setting that reflects the setting based on the setting reflection instruction D41 (step S42).

また、コマンドＤ４が設定反映指示Ｄ４１でなかった場合（ステップＳ４１でＮＯ）、画像処理装置２は、コマンドＤ４が選択画面表示指示Ｄ４２であるか否かを判断する（ステップＳ４３）。コマンドＤ４が選択画面表示指示Ｄ４２である場合（ステップＳ４３でＹＥＳ）、画像処理装置２は、パネル制御部２０を機能させ、所定数以上の設定候補を表示するための選択画面を生成する（ステップＳ４４）。このとき、パネル制御部２０は、選択画面表示指示Ｄ４２において所定数以上の設定候補のそれぞれに対して優先順位が設定されていれば、その優先順位に基づいて所定数以上の設定候補をレイアウトした選択画面を生成する。そしてパネル制御部２０は、選択画面表示指示Ｄ４２に基づいて生成した選択画面を操作パネル１１の表示部１２に表示する（ステップＳ４５）。例えば、制御装置８からコマンドＤ４を受信した時点において操作パネル１１が省電力モードである場合、パネル制御部２０は、表示部１２への給電を開始して選択画面を表示する。この場合、パネル制御部２０は、選択画面の表示開始から所定時間が経過するまでの間、表示部１２を点滅表示させることでユーザーが操作パネル１１を注目できるように注意喚起を行うようにしても良い。尚、このような注意喚起は、例えばスピーカーから所定のビープ音などを発することで実現しても良い。 Further, if the command D4 is not a setting reflection instruction D41 (NO in step S41), the image processing device 2 determines whether the command D4 is a selection screen display instruction D42 (step S43). If the command D4 is a selection screen display instruction D42 (YES in step S43), the image processing device 2 causes the panel control unit 20 to function and generates a selection screen for displaying the predetermined number or more of setting candidates (step S44). At this time, if a priority order is set for each of the predetermined number or more of setting candidates in the selection screen display instruction D42, the panel control unit 20 generates a selection screen in which those setting candidates are laid out based on the priority order. The panel control unit 20 then displays the selection screen generated based on the selection screen display instruction D42 on the display unit 12 of the operation panel 11 (step S45). For example, if the operation panel 11 is in the power saving mode at the time the command D4 is received from the control device 8, the panel control unit 20 starts supplying power to the display unit 12 and displays the selection screen. In this case, the panel control unit 20 may blink the display unit 12 until a predetermined time has elapsed from the start of displaying the selection screen, thereby alerting the user so that the user pays attention to the operation panel 11. Note that such an alert may also be realized, for example, by emitting a predetermined beep sound from a speaker.

図１２は、操作パネル１１の表示部１２に表示される選択画面Ｇ１０の一例を示す図である。制御装置８においてユーザーの音声認識結果に基づく設定候補として所定数以上の設定候補が抽出された場合、画像処理装置２は、操作パネル１１の表示部１２に対して図１２に示すような選択画面Ｇ１０を表示する。図１２では、ユーザーが「リョウメン」という音声を発した場合の選択画面Ｇ１０の一例を示している。この選択画面Ｇ１０には、所定数以上の設定候補を表示するための設定候補表示領域Ｒ１が含まれており、その設定候補表示領域Ｒ１に、所定数以上の設定候補のそれぞれに対応するサムネイル画像６１，６２，６３，６４が表示される。これらのサムネイル画像６１，６２，６３，６４の画像サイズは、制御装置８において決定された優先順位に基づくサイズとなっている。ここで、優先順位がユーザーによる設定頻度の高いものから順に設定されているとすると、ユーザーは、画像サイズの最も大きいサムネイル画像６２が最も設定頻度の高い設定項目であることを把握することができる。つまり、選択画面Ｇ１０は、ユーザーにとって設定頻度の高い設定項目を選択しやすい画面となっているのである。したがって、ユーザーは、音声入出力装置３から出力される音声案内を全て聞かなくても、設定頻度の高い設定項目を効率的に選択することが可能である。 FIG. 12 is a diagram showing an example of the selection screen G10 displayed on the display unit 12 of the operation panel 11. When the control device 8 extracts the predetermined number or more of setting candidates based on the user's voice recognition result, the image processing device 2 displays a selection screen G10 such as that shown in FIG. 12 on the display unit 12 of the operation panel 11. FIG. 12 shows an example of the selection screen G10 when the user utters the voice "Ryoumen" (duplex). This selection screen G10 includes a setting candidate display area R1 for displaying the predetermined number or more of setting candidates, and thumbnail images 61, 62, 63, and 64 corresponding to each of those setting candidates are displayed in the setting candidate display area R1. The image sizes of these thumbnail images 61, 62, 63, and 64 are based on the priority order determined by the control device 8. Here, if the priority order is set in descending order of the user's setting frequency, the user can understand that the thumbnail image 62, which has the largest image size, is the setting item that is set most frequently. In other words, the selection screen G10 is a screen on which the user can easily select setting items that are set frequently. Therefore, the user can efficiently select setting items that are set frequently without having to listen to all of the voice guidance output from the voice input/output device 3.

図１２に示す選択画面Ｇ１０が表示されているとき、ユーザーは、所望の設定候補を選択する操作として操作パネル１１に対する手動操作を行うことができる。例えば、ユーザーは、複数のサムネイル画像６１，６２，６３，６４のうちから所望の設定候補に対応するサムネイル画像をタッチし、選択画面Ｇ１０内の操作ボタンＢ１を操作することにより、一の設定候補を選択することができる。尚、ユーザーは、手動操作だけでなく、音声を発することによって所望の設定候補を選択することもできる。 When the selection screen G10 shown in FIG. 12 is displayed, the user can manually operate the operation panel 11 to select a desired setting candidate. For example, the user can select one setting candidate by touching the thumbnail image corresponding to the desired setting candidate from among the plurality of thumbnail images 61, 62, 63, and 64 and then operating the operation button B1 in the selection screen G10. Note that the user can select a desired setting candidate not only by manual operation but also by voice.

FIG. 13 is a diagram showing an example of a selection screen G11 different from that of FIG. 12. When the control device 8 extracts a predetermined number or more of setting candidates based on the user's voice recognition result, the image processing device 2 may instead display the selection screen G11 shown in FIG. 13 on the display unit 12 of the operation panel 11. FIG. 13 also shows an example for the case where the user utters "Ryoumen." The selection screen G11 likewise includes a setting candidate display area R1 for displaying the predetermined number or more of setting candidates, and the setting candidates are displayed in list format in the setting candidate display area R1. The list is ordered, for example, so that candidates with higher priority appear nearer the top. Therefore, the user can efficiently select a high-priority setting item without having to listen to all of the voice guidance output from the audio input/output device 3. Note that FIG. 13 illustrates a case where the user has selected, from among the setting candidates, turning on double-sided printing for the printer section 18 of the copy function.

Returning to the flowchart of FIG. 11, if the received command D4 is not the selection screen display instruction D42 (NO in step S43), the image processing device 2 determines whether the command is a job execution instruction (step S46). If the command D4 is a job execution instruction (YES in step S46), the image processing device 2 causes the job control unit 21 to function and starts executing the job specified by the user (step S47).

On the other hand, if the received command D4 is not a job execution instruction either (NO in step S46), the image processing device 2 executes, based on the received command D4, processing other than that described above (step S48). This completes the processing by the image processing device 2.
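The branching in steps S43 and S46 to S48 can be sketched as a simple dispatcher. This is only an illustration under stated assumptions: the `CommandKind` enumeration and the returned strings are hypothetical, and how the real command D4 encodes its type is not specified here.

```python
from enum import Enum, auto


class CommandKind(Enum):
    """Hypothetical classification of the received command D4."""
    SELECTION_SCREEN = auto()  # selection screen display instruction D42
    JOB_EXECUTION = auto()     # job execution instruction
    OTHER = auto()             # any other command


def handle_command(kind: CommandKind) -> str:
    """Return a label for the action the image processing device would take."""
    if kind is CommandKind.SELECTION_SCREEN:  # YES in step S43
        return "display selection screen"
    if kind is CommandKind.JOB_EXECUTION:     # YES in step S46
        return "start job"                    # step S47
    return "other processing"                 # step S48


if __name__ == "__main__":
    for kind in CommandKind:
        print(kind.name, "->", handle_command(kind))
```

The checks are ordered as in the flowchart: the selection screen instruction is tested first, then the job execution instruction, and everything else falls through to the other processing.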

As described above, the control system 1 of this embodiment is configured such that the control device 8 installed on the cloud 5 remotely controls the image processing device 2 based on the voice uttered by the user. When the control device 8 of this embodiment extracts a predetermined number or more of setting candidates corresponding to the voice uttered by the user, it causes the operation panel 11 of the image processing device 2 to display those setting candidates. Therefore, when a predetermined number or more of setting candidates are extracted, the control device 8 lets the user view them instead of having the user listen to them read aloud one after another, so that the user can efficiently select the desired setting candidate. This shortens the time until the user selects the desired setting candidate, and thus shortens the time during which the image processing device 2 is occupied by a single user.
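The threshold decision summarized above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the string labels, and the default value of `predetermined_number` are hypothetical, since the text does not give a concrete value for the predetermined number.

```python
from typing import List, Tuple


def choose_presentation(
    candidates: List[str],
    predetermined_number: int = 3,  # hypothetical threshold value
) -> Tuple[str, List[str]]:
    """Decide how extracted setting candidates are presented to the user.

    Returns ("panel", candidates) when the predetermined number or more of
    candidates were extracted, so they are shown on the operation panel,
    and ("voice", candidates) otherwise, so they are read out by voice.
    """
    if len(candidates) >= predetermined_number:
        return ("panel", list(candidates))
    return ("voice", list(candidates))


if __name__ == "__main__":
    print(choose_presentation(["copy: double-sided print"]))
    print(choose_presentation(["a", "b", "c", "d"]))
```

Keeping the threshold as a parameter reflects that the predetermined number is a design choice; a smaller value moves more interactions from voice guidance to the panel display.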

Although preferred embodiments of the present invention have been described above, the present invention is not limited to the contents described in those embodiments, and various modifications are applicable.

For example, the above embodiments illustrate a case where the voice recognition device 7 and the control device 8 are separate units. However, the present invention is not limited to this configuration. For example, the control device 8 may be configured integrally with the voice recognition device 7.

Further, the above embodiment describes a case where the control device 8 is installed as a server on the cloud 5, but the present invention is not limited to this. For example, the control device 8 may be a server installed in a local environment. The control device 8 may also be provided inside the image processing device 2. When the control device 8 is provided in the image processing device 2, the control device 8 may further include the functions of the voice recognition device 7, as described above.

Further, the above embodiment illustrates a case where the audio input/output device 3 and the image processing device 2 are separate units. However, the present invention is not limited to this either. That is, the image processing device 2 may have the functions of the audio input/output device 3 described above.

Further, the above embodiment illustrates a case where the audio input/output device 3 is a device that both inputs and outputs audio. However, the audio input/output device 3 may be a device that only inputs audio. In this case, the audio input/output device 3 described above functions only as an audio input device.

Further, the above embodiment illustrates a case where the program 25 is installed in advance in the storage unit 41 of the control device 8. However, the program 25 may instead be installed in the control device 8 via, for example, the communication interface 42. In this case, the program 25 is provided in a form that can be downloaded via the Internet or the like. Alternatively, the program 25 may be provided in a form recorded on a computer-readable recording medium such as a CD-ROM or a USB memory.

1 Control system
2 Image processing device
3 Audio input/output device
7 Voice recognition device
8 Control device
25 Program
31 Setting specifying unit (setting specifying means)
32 Guidance information output unit (guidance information output means)
33 Device control unit
34 Setting candidate extraction unit
35 Display control unit (control means)
36 Setting reflection unit

Claims

A control device that controls an image processing device, comprising:
setting specifying means for specifying settings to be reflected in the image processing device based on a voice recognition result of a voice uttered by a user;
guidance information output means for generating guidance information for voice guidance representing the content of the settings specified by the setting specifying means, and causing a predetermined voice output means to output voice guidance based on the guidance information; and
control means for presenting a plurality of setting candidates when the setting specifying means determines that there are a plurality of setting candidates corresponding to the voice recognition result,
wherein the guidance information output means outputs the guidance information for voice guidance of the setting candidates when the setting specifying means determines that the number of setting candidates is less than a predetermined number.

A control device that controls an image processing device, comprising:
setting specifying means for specifying settings to be reflected in the image processing device based on a voice recognition result of a voice uttered by a user;
guidance information output means for generating guidance information for voice guidance representing the content of the settings specified by the setting specifying means, and causing a predetermined voice output means to output voice guidance based on the guidance information; and
control means for presenting a plurality of setting candidates when the setting specifying means determines that there are a plurality of setting candidates corresponding to the voice recognition result,
wherein the guidance information output means outputs the guidance information for voice guidance of the predetermined number or more of setting candidates when the setting specifying means determines that a predetermined number or more of setting candidates exist.

A control device that controls an image processing device, comprising:
setting specifying means for specifying settings to be reflected in the image processing device based on a voice recognition result of a voice uttered by a user;
guidance information output means for generating guidance information for voice guidance representing the content of the settings specified by the setting specifying means, and causing a predetermined voice output means to output voice guidance based on the guidance information; and
control means for presenting a plurality of setting candidates when the setting specifying means determines that there are a plurality of setting candidates corresponding to the voice recognition result,
wherein the guidance information output means outputs the guidance information for voice guidance that prompts the user to check a display means provided in the image processing device when the setting specifying means determines that a predetermined number or more of setting candidates exist.

A control device that controls an image processing device, comprising:
setting specifying means for specifying settings to be reflected in the image processing device based on a voice recognition result of a voice uttered by a user;
guidance information output means for generating guidance information for voice guidance representing the content of the settings specified by the setting specifying means, and causing a predetermined voice output means to output voice guidance based on the guidance information; and
control means for presenting a plurality of setting candidates when the setting specifying means determines that there are a plurality of setting candidates corresponding to the voice recognition result,
wherein the guidance information output means does not output the guidance information when the setting specifying means determines that a predetermined number or more of setting candidates exist.

5. The control means presents the plurality of setting candidates when the setting specifying means determines that there are a predetermined number or more of setting candidates corresponding to the voice recognition result . The control device according to any one of .

The image processing device has a display means,
6. The control device according to claim 1, wherein the control means presents the plurality of setting candidates by displaying them on the display means.

7. The control device according to claim 1, wherein the control means presents the plurality of setting candidates according to a predetermined priority order.

8. The control device according to claim 7, wherein the priority order is determined in descending order of frequency of setting by the user.

8. The control device according to claim 7, wherein the priority order is determined in advance based on a hierarchy of an operation screen that includes a setting item corresponding to the setting candidate.

7. The control device according to claim 6, wherein the control means causes the display means to display thumbnail images corresponding to each of the plurality of setting candidates.

11. The control device according to claim 10, wherein the control means changes the image size of the thumbnail image corresponding to each of the plurality of setting candidates according to a predetermined priority order.

11. The control means does not display the plurality of setting candidates on the display means when the setting specifying means determines that the number of setting candidates is less than a predetermined number. 12. The control device according to 11 .

The control device according to any one of claims 1 to 12, wherein, when the setting specifying means determines that there are a plurality of setting candidates corresponding to the voice recognition result, the setting specifying means excludes from the plurality of setting candidates any setting candidate that satisfies a prohibition condition for the current setting state.

14. The control device according to claim 1, wherein the control device is a server capable of communicating with the image processing device.

14. The control device according to claim 1, wherein the control device is provided in the image processing device.

A control system comprising:
an image processing device;
a voice input device for inputting voice for operating the image processing device by voice; and
a control device that controls the image processing device based on the voice input to the voice input device,
wherein the control device includes:
setting specifying means for specifying settings to be reflected in the image processing device based on a voice recognition result of the voice input to the voice input device;
guidance information output means for generating guidance information for voice guidance representing the content of the settings specified by the setting specifying means, and causing a predetermined voice output means to output voice guidance based on the guidance information; and
control means for presenting a plurality of setting candidates when the setting specifying means determines that there are a plurality of setting candidates corresponding to the voice recognition result,
and wherein the guidance information output means outputs the guidance information for voice guidance of the setting candidates when the setting specifying means determines that the number of setting candidates is less than a predetermined number.

A control system comprising:
an image processing device;
a voice input device for inputting voice for operating the image processing device by voice; and
a control device that controls the image processing device based on the voice input to the voice input device,
wherein the control device includes:
setting specifying means for specifying settings to be reflected in the image processing device based on a voice recognition result of the voice input to the voice input device;
guidance information output means for generating guidance information for voice guidance representing the content of the settings specified by the setting specifying means, and causing a predetermined voice output means to output voice guidance based on the guidance information; and
control means for presenting a plurality of setting candidates when the setting specifying means determines that there are a plurality of setting candidates corresponding to the voice recognition result,
and wherein the guidance information output means outputs the guidance information for voice guidance of the predetermined number or more of setting candidates when the setting specifying means determines that a predetermined number or more of setting candidates exist.

A control system comprising:
an image processing device;
a voice input device for inputting voice for operating the image processing device by voice; and
a control device that controls the image processing device based on the voice input to the voice input device,
wherein the control device includes:
setting specifying means for specifying settings to be reflected in the image processing device based on a voice recognition result of the voice input to the voice input device;
guidance information output means for generating guidance information for voice guidance representing the content of the settings specified by the setting specifying means, and causing a predetermined voice output means to output voice guidance based on the guidance information; and
control means for presenting a plurality of setting candidates when the setting specifying means determines that there are a plurality of setting candidates corresponding to the voice recognition result,
and wherein the guidance information output means outputs the guidance information for voice guidance that prompts the user to check a display means provided in the image processing device when the setting specifying means determines that a predetermined number or more of setting candidates exist.

A control system comprising:
an image processing device;
a voice input device for inputting voice for operating the image processing device by voice; and
a control device that controls the image processing device based on the voice input to the voice input device,
wherein the control device includes:
setting specifying means for specifying settings to be reflected in the image processing device based on a voice recognition result of the voice input to the voice input device;
guidance information output means for generating guidance information for voice guidance representing the content of the settings specified by the setting specifying means, and causing a predetermined voice output means to output voice guidance based on the guidance information; and
control means for presenting a plurality of setting candidates when the setting specifying means determines that there are a plurality of setting candidates corresponding to the voice recognition result,
and wherein the guidance information output means does not output the guidance information when the setting specifying means determines that a predetermined number or more of setting candidates exist.

A control program that is executed by a processor to control an image processing device, the program causing the processor to execute:
a setting specifying step of specifying settings to be reflected in the image processing device based on a voice recognition result of a voice uttered by a user;
a guidance information output step of generating guidance information for voice guidance representing the content of the settings specified in the setting specifying step, and causing a predetermined voice output means to output voice guidance based on the guidance information; and
a control step of presenting a plurality of setting candidates when it is determined in the setting specifying step that there are a plurality of setting candidates corresponding to the voice recognition result,
wherein the guidance information output step outputs the guidance information for voice guidance of the setting candidates when it is determined in the setting specifying step that the number of setting candidates is less than a predetermined number.

A control program that is executed by a processor to control an image processing device, the program causing the processor to execute:
a setting specifying step of specifying settings to be reflected in the image processing device based on a voice recognition result of a voice uttered by a user;
a guidance information output step of generating guidance information for voice guidance representing the content of the settings specified in the setting specifying step, and causing a predetermined voice output means to output voice guidance based on the guidance information; and
a control step of presenting a plurality of setting candidates when it is determined in the setting specifying step that there are a plurality of setting candidates corresponding to the voice recognition result,
wherein the guidance information output step outputs the guidance information for voice guidance of the plurality of setting candidates when it is determined in the setting specifying step that a predetermined number or more of setting candidates exist.

A control program that is executed by a processor to control an image processing device, the program causing the processor to execute:
a setting specifying step of specifying settings to be reflected in the image processing device based on a voice recognition result of a voice uttered by a user;
a guidance information output step of generating guidance information for voice guidance representing the content of the settings specified in the setting specifying step, and causing a predetermined voice output means to output voice guidance based on the guidance information; and
a control step of presenting a plurality of setting candidates when it is determined in the setting specifying step that there are a plurality of setting candidates corresponding to the voice recognition result,
wherein the guidance information output step outputs the guidance information for voice guidance that prompts the user to check a display means provided in the image processing device when it is determined in the setting specifying step that a predetermined number or more of setting candidates exist.

A control program that is executed by a processor to control an image processing device, the program causing the processor to execute:
a setting specifying step of specifying settings to be reflected in the image processing device based on a voice recognition result of a voice uttered by a user;
a guidance information output step of generating guidance information for voice guidance representing the content of the settings specified in the setting specifying step, and causing a predetermined voice output means to output voice guidance based on the guidance information; and
a control step of presenting a plurality of setting candidates when it is determined in the setting specifying step that there are a plurality of setting candidates corresponding to the voice recognition result,
wherein the guidance information output step does not output the guidance information when it is determined in the setting specifying step that a predetermined number or more of setting candidates exist.

20. The control step presents the predetermined number or more setting candidates when it is determined in the setting specifying step that there are a predetermined number or more setting candidates corresponding to the voice recognition result. 24. The control program according to any one of 23 to 23 .

The image processing device has a display means,
25. The control program according to claim 20, wherein the control step presents the plurality of setting candidates by displaying them on the display means.

26. The control program according to claim 20, wherein the control step presents the plurality of setting candidates according to a predetermined priority order.

27. The control program according to claim 26, wherein the priority order is determined in descending order of frequency of setting by the user.

27. The control program according to claim 26, wherein the priority order is determined in advance based on a hierarchy of an operation screen that includes a setting item corresponding to the setting candidate.

26. The control program according to claim 25, wherein the control step causes the display means to display thumbnail images corresponding to each of the plurality of setting candidates.

30. The control program according to claim 29, wherein the control step changes the image size of the thumbnail image corresponding to each of the plurality of setting candidates according to a predetermined priority order.

29. The control step is characterized in that, when it is determined in the setting specifying step that the number of setting candidates is less than a predetermined number, the plurality of setting candidates are not displayed on the display means. The control program according to No. 30 .

The control program according to any one of claims 20 to 31, wherein, when it is determined in the setting specifying step that there are a plurality of setting candidates corresponding to the voice recognition result, the setting specifying step excludes from the plurality of setting candidates any setting candidate that satisfies a prohibition condition for the current setting state.