JP2014203024A - Control device, image forming apparatus, terminal device, control method, and control program

Info

Publication number: JP2014203024A
Application number: JP2013081052A
Authority: JP
Inventors: Kaitaku Ozawa; Kenichi Takahashi; Nobuhiro Mishima; Shuji Yoneda; Masami Yamada; Yuki Asai
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2013-04-09
Filing date: 2013-04-09
Publication date: 2014-10-27
Anticipated expiration: 2033-04-09
Also published as: JP6171511B2

Abstract

PROBLEM TO BE SOLVED: To provide a control device capable of easily improving the precision of voice instructions in an image forming apparatus that is operable by voice. SOLUTION: In an image processing system, when an input of a first voice signal representing voice inputted from a microphone connected to an MFP 100 is received (S1), voice recognition based on the first voice signal is performed (S3). When the voice recognition is unsuccessful (S5), an input of a second voice signal representing voice inputted from a microphone connected to a terminal device 300 is received (S7). Voice recognition based on the second voice signal is executed (S9), and the MFP 100 executes image processing associated with a result of the voice recognition (S11).

Description

The present invention relates to a control device, an image forming apparatus, a terminal device, a control method, and a control program, and more particularly to a control device, an image forming apparatus, a terminal device, a control method, and a control program for an image forming apparatus that can be instructed by voice.

One way of operating an image forming apparatus such as a printer, a facsimile machine, or an MFP (Multi-Functional Peripheral) that combines them is to give instructions by voice. As an example, the image forming apparatus includes a microphone, recognizes the voice input through that microphone using a voice recognition function, converts it into a command, and performs image forming processing.

Japanese Patent Laid-Open No. 2004-226489

However, the accuracy of voice recognition may decrease depending on, for example, the environment in which the image forming apparatus is installed. If the image forming apparatus is installed in a noisy environment such as a factory or an event venue, or is in the middle of an operation that produces sound, noise is input together with the voice instructing image formation, and the accuracy of voice recognition decreases.

Regarding sound generated by the image forming apparatus itself, Japanese Patent Laid-Open No. 2004-226489 (Patent Document 1) discloses a technique that prevents a decrease in the voice recognition rate by interrupting the printing operation when a request for voice recognition is received. However, because the job being printed is interrupted, work efficiency decreases.

The present invention has been made in view of these problems, and its object is to provide a control device, an image forming apparatus, a terminal device, a control method, and a control program that can easily improve the accuracy of voice instructions in an image forming apparatus operable by voice.

To achieve the above object, according to one aspect of the present invention, a control device is a control device for an image forming apparatus and includes: first voice input means for receiving input of a first voice signal representing voice input from a microphone connected to the image forming apparatus; communication means for communicating with a terminal device; second voice input means for communicating with the terminal device and receiving input of a second voice signal representing voice input from a microphone connected to the terminal device; voice recognition means for performing voice recognition based on a voice signal; control means for identifying the image processing associated with the recognition result of the voice recognition means and controlling the image forming apparatus to execute that image processing; and determination means for determining whether the control means controls the image forming apparatus using the result of voice recognition based on the first voice signal or using the result of voice recognition based on the second voice signal. The control means controls the image forming apparatus using the result of voice recognition based on the first voice signal when that voice recognition is successful, and using the result of voice recognition based on the second voice signal when it is unsuccessful.

Preferably, the determination means further determines, before the voice recognition means performs voice recognition based on the first voice signal, whether executing voice recognition based on the first voice signal is appropriate. When it is determined that executing voice recognition based on the first voice signal is not appropriate, the control means controls the image forming apparatus using the result of voice recognition based on the second voice signal.

More preferably, the determination means determines whether executing voice recognition based on the first voice signal is appropriate by determining whether the noise included in the first voice signal is equal to or greater than a prescribed amount.
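The noise-based suitability check described above can be illustrated with a minimal sketch. The text does not say how the noise amount is measured, so this sketch assumes (hypothetically) that the noise is estimated as the RMS amplitude of a background segment of the first voice signal; the names `NOISE_THRESHOLD`, `rms`, and `first_signal_is_suitable` are illustrative and not taken from the source.

```python
import math
from typing import Sequence

# Hypothetical stand-in for the "prescribed amount" of noise (normalized amplitude).
NOISE_THRESHOLD = 0.05


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a sample sequence (0.0 for an empty one)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def first_signal_is_suitable(noise_samples: Sequence[float],
                             threshold: float = NOISE_THRESHOLD) -> bool:
    """Return False when the estimated noise is at or above the threshold.

    Assumption: noise_samples is a background-only portion of the first
    voice signal; the source does not define how noise is extracted.
    """
    return rms(noise_samples) < threshold
```

When the function returns False, the flow would skip recognition of the first voice signal and use the terminal device's signal instead.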

Preferably, the determination means determines whether executing voice recognition based on the first voice signal is appropriate based on the image processing being executed in the image forming apparatus when the microphone provided in the image forming apparatus receives the voice input.
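As a rough illustration of the job-based check, the following sketch treats certain running jobs as too loud for the apparatus's own microphone. Which jobs count as noisy is not stated in the text; the set `NOISY_JOBS` and the job names are assumptions made for the example.

```python
from typing import Iterable

# Hypothetical set of jobs assumed to produce mechanical noise while running.
NOISY_JOBS = frozenset({"print", "copy"})


def mfp_microphone_is_suitable(running_jobs: Iterable[str]) -> bool:
    """Return True when none of the currently running jobs is assumed noisy."""
    return NOISY_JOBS.isdisjoint(running_jobs)
```

For example, `mfp_microphone_is_suitable(["print"])` is False under these assumptions, so the second voice signal would be used.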

Preferably, when the determination means determines that the control means is to control the image forming apparatus using the result of voice recognition based on the second voice signal, the communication means establishes communication with the terminal device, and the second voice input means receives input of the second voice signal.

Preferably, when the determination means determines that the control means is to control the image forming apparatus using the result of voice recognition based on the second voice signal, it further determines whether the noise included in the second voice signal is equal to or greater than a prescribed amount, and the communication means transmits a message stored in advance to the terminal device when the noise included in the second voice signal is equal to or greater than the prescribed amount.
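A minimal sketch of this second-signal check follows. The wording of the stored message, the function name `check_second_signal`, and the choice to report the signal as unusable after sending the message are all assumptions for illustration; the text only states that a pre-stored message is sent when the noise is at or above the prescribed amount.

```python
from typing import Callable

# Hypothetical pre-stored message; the source does not give its wording.
STORED_MESSAGE = "Please move to a quieter place and give the instruction again."


def check_second_signal(noise_level: float,
                        threshold: float,
                        send_to_terminal: Callable[[str], None]) -> bool:
    """Send the stored message when the second signal is too noisy.

    Returns True when the noise is below the threshold. Returning False
    after sending the message is an assumption, not something the source states.
    """
    if noise_level >= threshold:
        send_to_terminal(STORED_MESSAGE)
        return False
    return True
```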

According to another aspect of the present invention, an image forming apparatus includes: first voice input means for receiving input of a first voice signal representing voice input from a connected microphone; communication means for communicating with a terminal device; second voice input means for communicating with the terminal device and receiving input of a second voice signal representing voice input from a microphone connected to the terminal device; voice recognition means for performing voice recognition based on a voice signal; execution means for identifying the image processing associated with the recognition result of the voice recognition means and executing the identified image processing; and determination means for determining whether the execution means executes the image processing identified from the result of voice recognition based on the first voice signal or the image processing identified from the result of voice recognition based on the second voice signal. The execution means executes the image processing identified from the result of voice recognition based on the first voice signal when that voice recognition is successful, and executes the image processing identified from the result of voice recognition based on the second voice signal when it is unsuccessful.

According to still another aspect of the present invention, a terminal device is a terminal device capable of communicating with an image forming apparatus and includes: voice recognition means for performing voice recognition based on a voice signal representing voice input from a connected microphone; and control means for identifying the image processing associated with the recognition result of the voice recognition means and outputting a control signal to the image forming apparatus so that it executes the image processing.

Preferably, the terminal device further includes: voice input means for communicating with the image forming apparatus and receiving input of a first voice signal representing voice input from a microphone connected to the image forming apparatus; and determination means for determining, with the voice signal representing voice input from the microphone connected to the terminal device taken as a second voice signal, whether the control means controls the image forming apparatus using the result of voice recognition based on the first voice signal or using the result of voice recognition based on the second voice signal. The control means controls the image forming apparatus using the result of voice recognition based on the first voice signal when that voice recognition is successful, and using the result of voice recognition based on the second voice signal when it is unsuccessful.

According to still another aspect of the present invention, a control method is a method of controlling an image forming apparatus using voice and includes the steps of: receiving input of a first voice signal representing voice input from a microphone connected to the image forming apparatus; performing voice recognition based on the first voice signal; when the voice recognition based on the first voice signal is successful, identifying the image processing associated with the result of that voice recognition and causing the image forming apparatus to execute the image processing; when the voice recognition based on the first voice signal is unsuccessful, receiving input of a second voice signal representing voice input from a microphone connected to a terminal device; performing voice recognition based on the second voice signal; and identifying the image processing associated with the result of the voice recognition based on the second voice signal and causing the image forming apparatus to execute the image processing.

According to still another aspect of the present invention, a control program is a program for causing a computer to control an image forming apparatus, and causes the computer to execute the steps of: receiving input of a first voice signal representing voice input from a microphone connected to the image forming apparatus; performing voice recognition based on the first voice signal; when the voice recognition based on the first voice signal is successful, identifying the image processing associated with the result of that voice recognition and causing the image forming apparatus to execute the image processing; when the voice recognition based on the first voice signal is unsuccessful, communicating with a terminal device and receiving input of a second voice signal representing voice input from a microphone connected to the terminal device; performing voice recognition based on the second voice signal; and identifying the image processing associated with the result of the voice recognition based on the second voice signal and causing the image forming apparatus to execute the image processing.

According to still another aspect of the present invention, a control program is a program for causing a computer to control a terminal device, and causes the computer to execute the steps of: receiving input of a voice signal representing voice input from a microphone connected to the terminal device; performing voice recognition based on the voice signal; and identifying the image processing associated with the result of the voice recognition and outputting a control signal to an image forming apparatus so that it executes the image processing.

Preferably, the control program further causes the computer to execute the steps of: communicating with the image forming apparatus and receiving input of a first voice signal representing voice input from a microphone connected to the image forming apparatus; and determining, with the voice signal representing voice input from the microphone connected to the terminal device taken as a second voice signal, whether to control the image forming apparatus using the result of voice recognition based on the first voice signal or using the result of voice recognition based on the second voice signal. In the step of outputting the control signal, the image forming apparatus is controlled using the result of voice recognition based on the first voice signal when that voice recognition is successful, and using the result of voice recognition based on the second voice signal when it is unsuccessful.

According to the present invention, the accuracy of voice instructions can be easily improved in an image forming apparatus operable by voice.

A diagram showing a specific example of the configuration of the image processing system according to the embodiment.
A diagram showing a specific example of the hardware configuration of the MFP (Multi-Functional Peripheral) included in the image processing system.
A diagram showing a specific example of the hardware configuration of the terminal device included in the image processing system.
A diagram showing an outline of the operation of the image processing system.
A block diagram showing a specific example of the functional configuration of the MFP.
A flowchart showing a specific example of the flow of operation in the MFP.
A flowchart showing a specific example of the flow of operation in the MFP.
A flowchart showing a specific example of the process of switching the voice input in step S123 of FIG. 6.
A flowchart showing a first example of the operation of the MFP according to a modification.
A flowchart showing a first example of the operation of the MFP according to a modification.
A flowchart showing a second example of the operation of the MFP according to a modification.
A flowchart showing a second example of the operation of the MFP according to a modification.

Embodiments of the present invention will be described below with reference to the drawings. In the following description, identical parts and components are denoted by the same reference numerals. Their names and functions are also the same, so their description will not be repeated.

<System configuration>
FIG. 1 is a diagram showing a specific example of the configuration of the image processing system according to the present embodiment.

Referring to FIG. 1, the image processing system according to the present embodiment includes an MFP (Multi-Function Peripheral) 100 as an example of an image forming apparatus and a terminal device 300 as an example of a terminal device. These are connected by a network such as a LAN (Local Area Network). As illustrated, the image processing system may further include a server 500.

The network may be wired or wireless. As one example, as shown in FIG. 1, the MFP 100 and the server 500 are connected to a wired LAN, and the terminal device 300 is connected wirelessly.

The MFP 100 includes a configuration for realizing an image forming function and a voice input device (microphone) as a configuration for realizing a function of receiving voice input (FIG. 2). The image forming apparatus included in the image processing system according to the present embodiment is not limited to an MFP as long as it has at least these functions. The configuration for realizing the function of receiving voice input is not limited to a voice input device and also covers a processing device that processes a voice signal input from a connected voice input device. The same applies to the terminal device 300 described later.

The terminal device 300 may be a mobile phone or a mobile terminal such as a so-called smartphone. The terminal device 300 includes a voice input device (microphone) as a configuration for realizing a function of receiving voice input, and a display device (touch panel) and a voice output device (speaker) as a configuration for realizing a function of outputting information (FIG. 3). The terminal device included in the image processing system according to the present embodiment may be any device that has at least these functions and is small enough for the user to carry.

<Configuration of MFP>
FIG. 2 is a diagram showing a specific example of the hardware configuration of the MFP 100.

Referring to FIG. 2, the MFP 100 includes: a CPU (Central Processing Unit) 10, which is an arithmetic device that functions as the control device of the MFP 100; as memory, a ROM (Read Only Memory) 11 for storing programs executed by the CPU 10 and the like, a RAM (Random Access Memory) 12 that functions as a work area when the CPU 10 executes a program, and an HD (hard disk) 16 for storing image data and the like; a scanner 13 for optically reading a document placed on a document table (not shown) to obtain image data; a printer 14 for fixing image data onto printing paper; an operation panel 15, which is a display device and an input device; a network controller 17 for controlling communication via the network; and a microphone 18.

The operation panel 15 includes a touch panel and a group of operation keys (not shown). The touch panel is formed by overlaying a display device such as a liquid crystal display and a position pointing device such as an optical or capacitive touch panel; it displays an operation screen and identifies the indicated position on that screen. The CPU 10 displays the operation screen on the touch panel based on data stored in advance for displaying screens.

An operation signal indicating the identified position on the touch panel (the touched position) or the pressed key is input to the CPU 10. The CPU 10 identifies the operation content from the pressed key, or from the displayed operation screen and the indicated position, and executes processing accordingly.

<Configuration of terminal device>
FIG. 3 is a diagram showing a specific example of the hardware configuration of the terminal device 300.

Referring to FIG. 3, the terminal device 300 includes: a CPU 30, which is an arithmetic device for overall control; as memory, a ROM 31 for storing programs executed by the CPU 30 and the like, and a RAM 32 that functions as a work area when the CPU 30 executes a program; a microphone 33; a speaker 34; a touch panel 35, which is a display device and an input device; and a network controller 36 for controlling communication via the network.

When the terminal device 300 has a telephone function, as with the mobile phone or smartphone mentioned above, it further includes a configuration for realizing the telephone function in addition to the device configuration of FIG. 3.

<Server configuration>
The server 500 can be realized by an ordinary computer such as a personal computer. Its hardware configuration can therefore be the same as that of an ordinary computer, and a detailed description of the configuration is omitted here.

<Overview of operation>
When causing the MFP 100 to execute image processing using the image processing system according to the present embodiment, the user gives the instruction by voice. That is, as a first step, the user instructs the MFP 100 by voice, for example by saying "start copying". The MFP 100 executes image processing in accordance with the input "start copying" voice.

When the MFP 100 is installed in a noisy environment such as a factory or an event venue, or is in the middle of an operation that produces sound, noise may be input together with the "start copying" voice. If the "start copying" voice is not properly recognized because of the noise, the MFP 100 does not execute the image processing the user instructed.

Therefore, in the image processing system according to the present embodiment, the terminal device 300, which is a device other than the MFP 100, also accepts voice input. The voice received by the MFP 100 is referred to as the first voice (first voice signal), and the voice received by the terminal device 300 is referred to as the second voice (second voice signal). Specifically, the image processing system according to the present embodiment treats voice recognition as unsuccessful when the recognition result of the first voice input to the MFP 100 is not valid as an image processing instruction. In that case, the image processing system makes the second voice available for input and accepts the second voice. The MFP 100 then executes image processing according to the second voice.
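The fallback from the first voice to the second voice can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the embodiment: the recognizer and the terminal connection are passed in as callables, and the names `select_command`, `recognize`, and `receive_second_signal` are hypothetical.

```python
from typing import Callable, Optional, Tuple


def select_command(
    first_signal: bytes,
    recognize: Callable[[bytes], Optional[str]],
    receive_second_signal: Callable[[], bytes],
) -> Tuple[Optional[str], str]:
    """Return (command, source) following the first-then-second voice order.

    recognize() is assumed to return a command name, or None when the
    recognition result is not a valid image processing instruction.
    """
    command = recognize(first_signal)
    if command is not None:
        # Recognition of the first voice (MFP microphone) succeeded.
        return command, "mfp"
    # Recognition failed: accept the second voice from the terminal device.
    second_signal = receive_second_signal()
    return recognize(second_signal), "terminal"
```

Passing the recognizer as a parameter keeps the sketch neutral about where recognition runs, which matches the text's statement that the server, the MFP, or the terminal device may provide it.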

To perform the above operation, the image processing system according to the present embodiment has a voice recognition function. As one example, the server 500 provides the voice recognition function. Of course, the MFP 100 or the terminal device 300 may provide the voice recognition function instead of the server 500.

FIG. 4 is a diagram showing an outline of the operation of the image processing system. Referring to FIG. 4, when the MFP 100 receives voice input (the first voice) through the microphone 18 (step S1), it transmits a voice signal representing that voice (the first voice signal) to the server 500 (step S2). The server 500 executes voice recognition based on the first voice signal (step S3) and returns the result of the voice recognition to the MFP 100 (step S4). As one example, the server 500 converts the voice into text in step S3 and transmits that text to the MFP 100 as the result of the voice recognition in step S4.

The MFP 100 determines whether to execute image processing using the result of voice recognition based on the first voice signal (the above text) or using the result of voice recognition based on the second voice signal, which represents the voice received by the terminal device 300 (step S5). For each executable image processing, the MFP 100 stores keywords in advance in association with the command for executing that image processing. For example, in association with the command for executing copy processing, the MFP 100 stores keywords such as 「コピー開始」 ("start copying"), 「コピー実行」 ("execute copy"), 「コピースタート」 ("copy start"), and 「複写開始」 ("start duplication"). In step S5, the MFP 100 searches for a stored keyword that matches the received text. If there is a matching keyword, the MFP 100 determines that the voice recognition was successful. In this case, the MFP 100 determines that image processing is to be executed using the result of voice recognition based on the first voice signal (the above text).
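The keyword search in step S5 amounts to a lookup from recognized text to a command. The sketch below uses the four keywords listed in the text; the command identifier `"COPY"`, the table name `KEYWORD_TO_COMMAND`, and the whitespace trimming are assumptions for illustration.

```python
from typing import Optional

# Keywords from the text, each mapped to a hypothetical command identifier.
KEYWORD_TO_COMMAND = {
    "コピー開始": "COPY",
    "コピー実行": "COPY",
    "コピースタート": "COPY",
    "複写開始": "COPY",
}


def lookup_command(recognized_text: str) -> Optional[str]:
    """Return the command for an exactly matching keyword, or None.

    A None result corresponds to voice recognition being judged unsuccessful.
    Trimming surrounding whitespace is an assumption, not part of the source.
    """
    return KEYWORD_TO_COMMAND.get(recognized_text.strip())
```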

If no keyword matches the received text in step S5, the MFP 100 determines that the voice recognition was unsuccessful. In this case, the MFP 100 determines that image processing is to be executed using the result of voice recognition based on the second voice signal (the recognition result of the voice received by the terminal device 300). Preferably, when the MFP 100 determines that the voice recognition was unsuccessful, it accepts voice input (the first voice) through the microphone 18 again. Then, preferably, the MFP 100 determines that image processing is to be executed using the result of voice recognition based on the second voice signal once voice recognition has been unsuccessful a predetermined number of times.
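The preferred retry behavior can be sketched as a bounded loop. The default of three attempts and the names `recognize_on_mfp`, `capture`, and `recognize` are assumptions; the text only says the number of attempts is predetermined.

```python
from typing import Callable, Optional


def recognize_on_mfp(
    capture: Callable[[], bytes],
    recognize: Callable[[bytes], Optional[str]],
    max_attempts: int = 3,  # hypothetical value for the predetermined count
) -> Optional[str]:
    """Retry first-voice recognition up to max_attempts times.

    Returns the command on the first success, or None after max_attempts
    failures, at which point the caller would switch to the second voice.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for _ in range(max_attempts):
        command = recognize(capture())
        if command is not None:
            return command
    return None
```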

Upon deciding to execute image processing using the voice recognition result based on the second voice signal, MFP 100 automatically establishes communication with terminal device 300 (step S6). At this time, MFP 100 may turn microphone 18 off so that it no longer accepts voice input, or it may leave microphone 18 able to accept voice input and simply not process the voice signal from it.

More specifically, in step S6, MFP 100 initiates communication with terminal device 300 and waits for terminal device 300 to transmit a voice signal. The communication from MFP 100 to terminal device 300 may be a signal (request) sent to terminal device 300 over the network described above, or, when both devices have a telephone function, a call placed from MFP 100 to terminal device 300. The communication preferably includes a message prompting the user of terminal device 300 to provide voice input. The message may be audio or text. For example, MFP 100 may send terminal device 300 audio data such as "Please give the instruction by voice again from the terminal device," which is then output from speaker 34 of terminal device 300.

Communication between MFP 100 and terminal device 300 may be established by a request from MFP 100 to terminal device 300, as described above, or in the opposite direction. For example, upon deciding to execute image processing using the voice recognition result based on the second voice signal, MFP 100 may send terminal device 300 audio data such as "Please call the MFP" and then end the communication. Communication between MFP 100 and terminal device 300 may then be established when terminal device 300 calls MFP 100.

When terminal device 300 accepts voice input (the second voice) through microphone 33 (step S7), it transmits a voice signal representing that voice (the second voice signal) to server 500 (step S8). Server 500 executes voice recognition based on the second voice signal in the same manner as in step S4 (step S9) and returns the recognition result (text) to MFP 100 (step S10). MFP 100 executes image processing using the voice recognition result based on the second voice signal received from server 500, that is, the recognition result for the voice received by terminal device 300 (step S11).

Preferably, MFP 100 also determines, in the same manner as in step S5, whether the voice recognition based on the second voice signal (the recognition of the voice received by terminal device 300) succeeded. If it determines that this recognition was unsuccessful, MFP 100 repeats the processing, starting again from accepting voice input (the second voice) from terminal device 300.

<Functional configuration>
FIG. 5 is a block diagram showing a specific example of the functional configuration of MFP 100 for performing the operations described above. Each function in FIG. 5 is realized mainly by CPU 10 of MFP 100 reading and executing a program stored in ROM 11. However, at least some of the functions may be realized by the hardware configuration shown in FIG. 2 or by other hardware, such as electric circuits, not shown in FIG. 2.

Referring to FIG. 5, HD 16 includes command storage unit 161, a storage area that stores the keywords associated with each command for executing image processing.

Referring further to FIG. 5, CPU 10 includes: first voice input unit 101, which accepts input of the first voice signal representing voice input from the microphone of MFP 100; second voice input unit 102, which accepts input of the second voice signal representing voice input from microphone 33 by communicating with terminal device 300 via network controller 17; voice recognition unit 103, which performs voice recognition based on a voice signal; control unit 108, which includes specifying unit 109 for specifying the image processing associated with the recognition result of voice recognition unit 103 and which controls scanner 13 and printer 14 to execute that image processing; determination unit 106, which determines whether control unit 108 performs control using the voice recognition result based on the first voice signal or the one based on the second voice signal; and communication unit 107, which establishes communication with terminal device 300 when determination unit 106 determines that control is to use the voice recognition result based on the second voice signal. When voice recognition unit 103 succeeds in voice recognition based on the first voice signal, control unit 108 controls scanner 13 and printer 14 to execute image processing using that recognition result; when the recognition is unsuccessful, control unit 108 performs the control using the voice recognition result based on the second voice signal.

As shown in FIG. 4, when server 500 provides the voice recognition function and MFP 100 uses it, voice recognition unit 103 includes transmission unit 104, which transmits the input voice to server 500, and recognition result input unit 105, which accepts the recognition result from server 500. When MFP 100 itself has a voice recognition function and uses it, voice recognition unit 103 performs the voice recognition processing.

<Operation flow>
FIGS. 6 and 7 are flowcharts showing a specific example of the flow of operations in MFP 100. The operations shown in these flowcharts are realized by CPU 10 of MFP 100 reading the program stored in ROM 11 into RAM 12, executing it, and thereby providing the functions of FIG. 5.

Referring to FIG. 6, when user authentication succeeds (YES in step S101) and the mode for controlling MFP 100 by voice recognition is set (YES in step S103), CPU 10 turns on microphone 18, which is connected to CPU 10, to enable voice input from microphone 18. CPU 10 also initializes a counter that counts the number of unsuccessful voice recognitions (step S105).

CPU 10 transmits the first voice signal, representing the voice input from microphone 18, to server 500 (step S107) and receives the recognition result for that voice from server 500 (step S109). If voice recognition succeeded (YES in step S111), that is, if the text obtained as the recognition result matches any of the keywords stored in association with commands for executing image processing, CPU 10 turns off the mode for controlling MFP 100 by voice recognition and resets the counter (step S113). Then, referring to FIG. 7, CPU 10 executes image processing using the voice recognition result based on the first voice signal; that is, it executes image processing according to the command associated with the text recognized from the first voice (step S137).

Returning to FIG. 6, if voice recognition based on the first voice signal was unsuccessful (NO in step S111), that is, if the recognized text matches none of the keywords stored in association with commands for executing image processing, CPU 10 increments the counter of unsuccessful recognitions by one (step S115). If the number of unsuccessful recognitions has not reached the predetermined number (NO in step S117), CPU 10 displays a notice on operation panel 15 informing the user that voice recognition was unsuccessful and prompting re-input (step S119). CPU 10 then accepts the re-input voice and repeats the operations from step S107.

When the number of unsuccessful recognitions reaches the predetermined number (YES in step S117), CPU 10 resets the counter (step S121) and then performs processing to switch voice input from microphone 18 to terminal device 300 (step S123). CPU 10 may turn off microphone 18 when it resets the counter.
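The retry logic of steps S105 to S123 can be summarized as a loop with a failure counter. The following is a rough sketch, not the embodiment's actual program: the function name, the injected `recognize` and `find_command` callables, the default of three failures, and the `"no_input"` outcome for an exhausted input stream are assumptions introduced for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_voice_phase(
    utterances: Iterable[str],
    recognize: Callable[[str], str],
    find_command: Callable[[str], Optional[str]],
    max_failures: int = 3,  # hypothetical value for the "predetermined number"
) -> Tuple[str, Optional[str]]:
    """Process voice input from the MFP microphone until success or fallback."""
    failures = 0  # S105: initialize the failure counter
    for signal in utterances:
        text = recognize(signal)          # S107, S109: obtain the recognized text
        command = find_command(text)      # S111: does the text match a keyword?
        if command is not None:
            return ("execute", command)   # S113, then S137
        failures += 1                     # S115: count the unsuccessful attempt
        if failures >= max_failures:      # S117: predetermined number reached
            return ("switch_to_terminal", None)  # S121, S123
        # S119: here the user would be notified and prompted to speak again
    # Assumption of this sketch: the input stream ended before either outcome.
    return ("no_input", None)
```

Passing the recognizer as a callable keeps the sketch independent of whether recognition runs on server 500 or on MFP 100 itself.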

Referring to FIG. 7, when CPU 10 accepts input of the second voice signal from terminal device 300 (YES in step S125), it transmits the second voice signal to server 500 (step S127) and receives the recognition result for that voice from server 500 (step S129). If voice recognition succeeded (YES in step S131), that is, if the recognized text matches any of the keywords stored in association with commands for executing image processing, CPU 10 executes image processing using the voice recognition result based on the second voice signal; that is, it executes image processing according to the command associated with the text recognized from the second voice (step S137). CPU 10 may disconnect the communication with terminal device 300 either before executing the image processing or afterward (step S135).

If voice recognition based on the second voice signal was unsuccessful (NO in step S131), that is, if the recognized text matches none of the keywords stored in association with commands for executing image processing, CPU 10 sends terminal device 300 a notice that voice recognition was unsuccessful together with a message prompting voice re-input (step S133). Preferably, CPU 10 sends these as voice guidance. CPU 10 then accepts the re-input voice from terminal device 300 and repeats the operations from step S127.

If no second voice signal is input from terminal device 300 (NO in step S125), CPU 10 turns off the mode for controlling MFP 100 by voice recognition (step S139) and ends the series of operations.
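The handling of the second voice signal in steps S125 to S139 can be sketched in the same style. This is an illustration only: all names are hypothetical, and treating a missing re-input after a failed attempt in the same way as NO in step S125 is an assumption, since the description does not state what happens in that case.

```python
from typing import Callable, Optional


def second_voice_phase(
    receive_signal: Callable[[], Optional[str]],
    recognize: Callable[[str], str],
    find_command: Callable[[str], Optional[str]],
    notify_failure: Callable[[], None],
) -> Optional[str]:
    """Return the command to execute, or None if the voice mode should end."""
    signal = receive_signal()             # S125: was a second voice signal input?
    while signal is not None:
        text = recognize(signal)          # S127, S129: obtain the recognized text
        command = find_command(text)      # S131: does the text match a keyword?
        if command is not None:
            return command                # S135 (disconnect), S137 (execute)
        notify_failure()                  # S133: report failure, prompt re-input
        signal = receive_signal()         # wait for the re-input voice
    return None                           # S139: turn the voice mode off and end
```

The caller is expected to execute the returned command, or to leave the voice control mode when the result is `None`.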

Preferably, CPU 10 captures, through microphone 33 of terminal device 300, the ambient sound present before the user's voice input is accepted from terminal device 300, and treats it as noise. CPU 10 then compares that noise with a level stored in advance. If it determines that the ambient noise is at or above the specified amount, CPU 10 preferably sends terminal device 300 a message stored in advance, such as "Please move to a quiet place." This can improve the accuracy of voice recognition based on the second voice signal. As described later, the noise level may be registered for each login user, because, depending on the user's voice quality, recognition may be easy even with some noise or difficult even with little noise. Furthermore, CPU 10 may learn the predetermined level from the results of the recognition determination in step S111.

FIG. 8 is a flowchart showing a specific example of the processing for switching voice input in step S123. Referring to FIG. 8, if user information stored in memory, such as a telephone directory, includes access information (for example, a telephone number) for terminal device 300 associated with the login user (YES in step S201), CPU 10 starts communication with terminal device 300, for example by calling that telephone number (step S203). As another example, CPU 10 may send terminal device 300 an e-mail containing its own access information (such as a URL or telephone number).

Before starting this communication, CPU 10 may display on operation panel 15 a notice such as "Communication with A's terminal device 090-****-**** will start," together with buttons such as "OK" and "NG" for approving or rejecting it, and accept the login user's instruction. CPU 10 may then start communication with terminal device 300 when the "OK" button is pressed on this screen.

If terminal device 300 responds to the communication from MFP 100 (YES in step S207), for example when the user of terminal device 300 answers the call, or when terminal device 300 accesses MFP 100 using the access information in the e-mail, CPU 10 sends terminal device 300 a message requesting voice input (step S209). When the communication is by telephone, CPU 10 may send voice guidance; when it is an access to a URL or the like, CPU 10 may send text data. CPU 10 then receives the voice signal that terminal device 300 sends in response to the request (step S211).

If the user information does not include access information for terminal device 300 associated with the login user (NO in step S201), CPU 10 executes processing to acquire that access information (step S205). In step S205, for example, CPU 10 may display a prompt such as "Please enter the telephone number of A's terminal device" on operation panel 15 and accept direct input of the access information. Alternatively, CPU 10 may display on operation panel 15 a message containing the access information of MFP 100 (for example, its telephone number), such as "Please call MFP ***-***-**** from A's terminal device," to prompt access to MFP 100. In this case, CPU 10 may treat an access to itself received within a predetermined time after this display as an access from terminal device 300 and store the originator as the access information of terminal device 300. CPU 10 may accept voice input over this access from terminal device 300 to MFP 100. Preferably, however, CPU 10 does not use the communication initiated by terminal device 300; instead it disconnects that communication once and starts a new communication based on the acquired access information. This keeps down the communication charges billed to terminal device 300.
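The switching processing of FIG. 8 (steps S201 to S211) can be outlined as follows. This is a simplified sketch under several assumptions: the function and parameter names are hypothetical, the communication steps are injected as callables rather than implemented, and returning `None` when no access information is obtained or when the terminal does not respond is a choice made for this sketch, since the description does not specify those branches.

```python
from typing import Callable, Dict, Optional


def switch_to_terminal_input(
    login_user: str,
    user_info: Dict[str, str],
    acquire_access_info: Callable[[str], Optional[str]],
    start_communication: Callable[[str], bool],
    request_voice_input: Callable[[str], Optional[bytes]],
) -> Optional[bytes]:
    """Obtain a second voice signal from the login user's terminal device."""
    access = user_info.get(login_user)            # S201: access information known?
    if access is None:
        access = acquire_access_info(login_user)  # S205: ask for or capture it
        if access is None:
            return None                           # assumption: give up without it
    if not start_communication(access):           # S203 start, S207 response?
        return None                               # assumption: no response ends it
    return request_voice_input(access)            # S209 request, S211 receive
```

Here `acquire_access_info` stands in for either variant of step S205: direct entry on the operation panel, or capturing the originator of an incoming access.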

<Effects of the embodiment>
When MFP 100 according to this embodiment executes the control described above, and voice recognition based on the voice signal from microphone 18 of MFP 100 fails because of ambient noise while the user is instructing image processing by voice, the voice input at terminal device 300, such as the user's mobile phone, can be used instead. This improves the accuracy of voice-recognition-based image processing instructions without impairing user convenience.

<Modification>
More preferably, before performing voice recognition based on the first voice signal representing the voice input through microphone 18, CPU 10 determines whether voice recognition based on the first voice signal is appropriate. If it determines that it is not, CPU 10 executes image processing using the result of voice recognition based on the second voice signal, which is the voice signal from terminal device 300.

As a first example of this determination, CPU 10 bases it on the image processing in progress when microphone 18 is to accept voice input. For example, voice recognition is likely to fail if, at that time, a job that generates loud sound is running, such as print processing, finisher processing (stapling or the like), or processing that involves audio output such as background music or sound effects. CPU 10 therefore determines that voice recognition based on the first voice signal is not appropriate if such processing is in progress when microphone 18 is to accept voice input. To this end, as one example, CPU 10 stores in advance the types of jobs that involve such loud processing and makes the determination by checking whether the running job is one of those types.

FIGS. 9 and 10 are flowcharts showing a first example of the operation of MFP 100 according to the modification. They add the determination of step S104 and the processing of step S120 after step S103 of the flowcharts of FIGS. 6 and 7. Referring to FIG. 9, in the first example of the modification, after setting the mode for controlling MFP 100 by voice recognition and before turning on microphone 18 to accept voice input, CPU 10 determines whether the running job is one of the loud job types stored in advance (step S104). If it is (YES in step S104), CPU 10 performs the processing to switch to input at terminal device 300 without accepting the first voice input (step S123). Preferably, CPU 10 then displays on operation panel 15 a notice that input is being switched to voice input from terminal device 300 (step S120). With this display, the login user can switch smoothly to voice input using terminal device 300 without attempting voice input through microphone 18.
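The determination of step S104 amounts to a membership test against the stored list of loud job types. A minimal sketch follows; the job-type labels and the function name are hypothetical and are not taken from the embodiment.

```python
from typing import Iterable

# Hypothetical labels for the loud job types named in the description:
# printing, finisher processing (stapling etc.), and audio output.
NOISY_JOB_TYPES = frozenset({"print", "finisher", "audio_output"})


def first_voice_input_suitable(running_jobs: Iterable[str]) -> bool:
    """Step S104: input through the MFP microphone is treated as unsuitable
    while any running job belongs to a stored loud job type."""
    return not any(job in NOISY_JOB_TYPES for job in running_jobs)
```

When this check returns `False`, the flow would proceed to the switching processing of step S123 rather than turning on microphone 18.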

As a second example of this determination, CPU 10 determines whether the noise contained in the first voice signal is at or above a specified amount stored in advance. If the noise representing ambient sound is loud before voice is input through microphone 18, voice recognition is likely to fail. CPU 10 therefore determines that voice recognition based on the first voice signal is not appropriate if the ambient noise is at or above a predetermined level before microphone 18 accepts voice input. To this end, CPU 10 stores in advance the predetermined level that serves as the noise threshold. This level may be registered for each login user, because, depending on the user's voice quality, recognition may be easy even with some noise or difficult even with little noise. Furthermore, CPU 10 may learn the predetermined level from the results of the recognition determination in step S111.

FIGS. 11 and 12 are flowcharts showing a second example of the operation of MFP 100 according to the modification. They add the processing of step S106-1, the determination of step S106-2, and the processing of step S120 after step S105 of the flowcharts of FIGS. 6 and 7. Referring to FIG. 11, in the second example of the modification, after turning on microphone 18 in step S105, CPU 10 captures through microphone 18, as noise, the ambient sound present before the user's voice input is accepted (step S106-1). CPU 10 then compares the noise with the level stored in advance and determines whether noise higher than the predetermined level is present (step S106-2).

If the noise is higher than the predetermined level stored in advance (YES in step S106-2), CPU 10 performs the processing to switch to input at terminal device 300 without accepting the first voice input (step S123). Preferably, CPU 10 then displays on operation panel 15 a notice that input is being switched to voice input from terminal device 300 (step S120). With this display, the login user can switch smoothly to voice input using terminal device 300 without attempting voice input through microphone 18.
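The noise check of steps S106-1 and S106-2, including the optional per-user level, might look like the sketch below. The description does not say how the noise level is measured, so the use of a root-mean-square (RMS) value over normalized samples is an assumption of this sketch, as are the threshold values, the user name, and all identifiers.

```python
import math
from typing import Dict, Optional, Sequence

DEFAULT_NOISE_THRESHOLD = 0.1  # hypothetical stored level
USER_NOISE_THRESHOLDS: Dict[str, float] = {"user_a": 0.2}  # hypothetical per-user levels


def rms_level(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of the captured ambient samples (assumed metric)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_switch_to_terminal(
    ambient_samples: Sequence[float], login_user: Optional[str] = None
) -> bool:
    """S106-2: True if the ambient noise is higher than the stored level.

    A per-user level is used when one is registered, otherwise the default.
    """
    threshold = USER_NOISE_THRESHOLDS.get(login_user, DEFAULT_NOISE_THRESHOLD)
    return rms_level(ambient_samples) > threshold
```

A registered per-user level lets the same ambient noise lead to different decisions for different login users, which reflects the remark that recognition difficulty depends on the user's voice quality.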

By executing the control described above, MFP 100 according to the modification allows image processing to be instructed efficiently using the voice recognition function.

In the examples above, server 500 performs the voice recognition. However, as noted earlier, MFP 100 may perform it instead. Alternatively, terminal device 300 may store in advance the keywords associated with the commands that cause MFP 100 to execute image processing, perform voice recognition based on the voice signal representing the voice received through microphone 33, identify the command from the result, and transmit the command to MFP 100.

Terminal device 300 may also include the functions of server 500, determine whether to control MFP 100 using the voice recognition result based on the first voice signal from MFP 100 or the one based on the second voice signal from microphone 33, and control MFP 100 accordingly. In other words, each function shown in FIG. 5 may reside in any device included in the image processing system.

Furthermore, a program that causes CPU 10 of MFP 100, CPU 30 of terminal device 300, or the like to execute the operations described above can be provided. Such a program can be recorded on a computer-readable recording medium, such as a flexible disk supplied with a computer, a CD-ROM (Compact Disk-Read Only Memory), a ROM, a RAM, or a memory card, and provided as a program product. Alternatively, the program can be provided recorded on a recording medium such as a hard disk built into a computer, or by download over a network.

The program according to the present invention may call the necessary modules, from among the program modules provided as part of a computer's operating system (OS), in a predetermined sequence at predetermined timing to execute the processing. In that case, the program itself does not contain those modules, and the processing is executed in cooperation with the OS. A program that does not contain such modules can also be a program according to the present invention.

The program according to the present invention may also be provided incorporated into part of another program. In that case as well, the program itself does not contain the modules included in the other program, and the processing is executed in cooperation with the other program. A program incorporated into another program in this way can also be a program according to the present invention.

The provided program product is installed in a program storage unit such as a hard disk and executed. The program product includes the program itself and the recording medium on which the program is recorded.

The embodiments disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims rather than by the description above, and is intended to include all modifications within the meaning and scope equivalent to the claims.

10, 30: CPU; 11, 31: ROM; 12, 32: RAM; 13: scanner; 14: printer; 15: operation panel; 16: HD; 17, 36: network controller; 18, 33: microphone; 34: speaker; 35: touch panel; 300: terminal device; 100: MFP; 101: first voice input unit; 102: second voice input unit; 103: voice recognition unit; 104: transmission unit; 105: recognition result input unit; 106: determination unit; 107: communication unit; 108: control unit; 109: specifying unit; 161: command storage unit; 500: server.

Claims

A control device for an image forming apparatus,
First audio input means for receiving input of a first audio signal representing audio input from a microphone connected to the image forming apparatus;
A communication means for communicating with the terminal device;
A second voice input means for communicating with the terminal device and receiving an input of a second voice signal representing a voice inputted from a microphone connected to the terminal device;
Speech recognition means for performing speech recognition based on a speech signal;
Control means for specifying image processing associated with a recognition result by the voice recognition means and controlling the image forming apparatus to execute the image processing;
Determination means for determining whether the control means controls the image forming apparatus using a result of voice recognition based on the first voice signal or using a result of voice recognition based on the second voice signal,
Wherein the control means controls the image forming apparatus using the result of voice recognition based on the first voice signal when voice recognition based on the first voice signal is successful, and controls the image forming apparatus using the result of voice recognition based on the second voice signal when unsuccessful.

The determination means further determines whether or not the voice recognition based on the first voice signal is appropriate prior to the voice recognition based on the first voice signal in the voice recognition means,
The control means controls the image forming apparatus using a result of voice recognition based on the second voice signal when it is determined that voice recognition based on the first voice signal is not appropriate. The control device according to claim 1.

The determination unit determines whether or not speech recognition based on the first audio signal is appropriate by determining whether or not noise included in the first audio signal is greater than or equal to a predetermined amount. The control device described in 1.
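
The noise-based determination in this claim can be illustrated with a minimal sketch. The claim does not say how noise is measured, so the RMS estimate over a noise-only stretch of samples, the function names, and the threshold values are all assumptions made for illustration.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of samples (assumed noise measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def first_signal_is_appropriate(noise_samples: Sequence[float],
                                predetermined_amount: float) -> bool:
    """Return False when the noise is greater than or equal to the
    predetermined amount, i.e. when recognition based on the first
    voice signal is judged not appropriate."""
    return rms(noise_samples) < predetermined_amount
```

With this sketch, a noise level exactly equal to the predetermined amount is treated as not appropriate, matching the "greater than or equal to" wording of the claim.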

The determination unit determines whether voice recognition based on the first voice signal is appropriate based on the image processing being performed in the image forming apparatus when the voice input from the microphone included in the image forming apparatus is received. The control device according to claim 2.
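
A determination based on the image processing in progress might look like the following sketch. The set of job types treated as noisy and the function name are hypothetical; the claim only states that the determination depends on the image processing being performed when the voice is received.

```python
from typing import Optional

# Hypothetical job types assumed loud enough to disturb the built-in microphone.
NOISY_JOBS = frozenset({"print", "copy", "scan"})


def appropriate_given_job(current_job: Optional[str]) -> bool:
    """Judge recognition on the first voice signal appropriate only when
    no job from the assumed noisy set is running (None means idle)."""
    return current_job not in NOISY_JOBS
```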

When the determination unit determines that the control unit controls the image forming apparatus using a result of voice recognition based on the second audio signal, the communication unit establishes communication with the terminal device, The control device according to claim 1, wherein the second voice input unit accepts an input of the second voice signal.

When the determination unit determines that the control unit controls the image forming apparatus using the result of voice recognition based on the second voice signal, the determination unit further determines whether noise included in the second voice signal is greater than or equal to a predetermined amount,
The communication means transmits a message stored in advance to the terminal device when the noise included in the second voice signal is greater than or equal to the predetermined amount. The control device according to any one of claims 1 to 5.
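
The check on the second voice signal can be sketched as follows. The noise level is assumed to be already measured, and the message text and the `send_message` callback are placeholders; the claim only requires that a message stored in advance be transmitted to the terminal device.

```python
from typing import Callable

# Placeholder for the message stored in advance; its content is an assumption.
STORED_MESSAGE = "Too much noise was detected. Please try again."


def check_second_signal(noise_level: float,
                        predetermined_amount: float,
                        send_message: Callable[[str], None]) -> bool:
    """Return True if the second voice signal is usable. Otherwise send the
    stored message to the terminal device and return False."""
    if noise_level >= predetermined_amount:
        send_message(STORED_MESSAGE)
        return False
    return True
```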

An image forming apparatus comprising:
First voice input means for receiving an input of a first voice signal representing voice input from a connected microphone;
Communication means for communicating with a terminal device;
Second voice input means for communicating with the terminal device and receiving an input of a second voice signal representing voice input from a microphone connected to the terminal device;
Voice recognition means for performing voice recognition based on a voice signal;
Execution means for specifying image processing associated with a recognition result by the voice recognition means and executing the specified image processing; and
Determination means for determining whether the execution means executes the image processing specified from a result of voice recognition based on the first voice signal or the image processing specified from a result of voice recognition based on the second voice signal,
Wherein the execution means executes the image processing specified from the result of voice recognition based on the first voice signal when voice recognition based on the first voice signal is successful, and executes the image processing specified from the result of voice recognition based on the second voice signal when unsuccessful.

A terminal device capable of communicating with an image forming apparatus, comprising:
Voice recognition means for performing voice recognition based on a voice signal representing voice input from a connected microphone; and
Control means for specifying image processing associated with a recognition result by the voice recognition means and outputting, to the image forming apparatus, a control signal for executing the image processing.

Voice input means for communicating with the image forming apparatus and receiving an input of a first voice signal representing voice input from a microphone connected to the image forming apparatus; and
Determination means for determining, with the voice signal representing the voice input from the microphone connected to the terminal device taken as a second voice signal, whether the control means controls the image forming apparatus using a result of voice recognition based on the first voice signal or using a result of voice recognition based on the second voice signal,
Wherein the control means controls the image forming apparatus using the result of voice recognition based on the first voice signal when voice recognition based on the first voice signal is successful, and controls the image forming apparatus using the result of voice recognition based on the second voice signal when unsuccessful. The terminal device according to claim 8.

A method of controlling an image forming apparatus by voice, comprising:
Receiving an input of a first audio signal representing audio input from a microphone connected to the image forming apparatus;
Performing speech recognition based on the first speech signal;
A step of specifying, when voice recognition based on the first voice signal is successful, the image processing associated with the result of voice recognition based on the first voice signal and causing the image forming apparatus to execute the image processing;
A step of receiving, when voice recognition based on the first voice signal is unsuccessful, an input of a second voice signal representing voice input from a microphone connected to a terminal device;
Performing speech recognition based on the second speech signal;
And a step of specifying image processing associated with a result of speech recognition based on the second audio signal and causing the image forming apparatus to execute the image processing.
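
The steps of this method can be sketched as a single fallback routine. This is a minimal illustration, not the implementation described in the embodiment: the recognizer, the command table, and the callbacks are hypothetical stand-ins, and "successful" is assumed to mean that the recognizer returns text rather than None.

```python
from typing import Callable, Dict, Optional


def control_by_voice(
    first_signal: bytes,
    receive_second_signal: Callable[[], bytes],
    recognize: Callable[[bytes], Optional[str]],
    commands: Dict[str, str],
    execute: Callable[[str], None],
) -> Optional[str]:
    """Try the apparatus microphone first, then fall back to the terminal device."""
    # Voice recognition based on the first voice signal.
    text = recognize(first_signal)
    if text is None:
        # Unsuccessful: receive the second voice signal and recognize it.
        text = recognize(receive_second_signal())
    if text is None:
        return None
    # Specify the image processing associated with the recognition result.
    job = commands.get(text)
    if job is not None:
        execute(job)
    return job
```

The second voice signal is requested only after the first recognition fails, so the terminal device is not contacted when the apparatus microphone suffices.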

A program for causing a computer to control an image forming apparatus,
Receiving an input of a first audio signal representing audio input from a microphone connected to the image forming apparatus;
Performing speech recognition based on the first speech signal;
A step of specifying, when voice recognition based on the first voice signal is successful, the image processing associated with the result of voice recognition based on the first voice signal and causing the image forming apparatus to execute the image processing;
A step of communicating with a terminal device when voice recognition based on the first voice signal is unsuccessful and receiving an input of a second voice signal representing voice input from a microphone connected to the terminal device;
Performing speech recognition based on the second speech signal;
A control program for causing the computer to execute the step of specifying the image processing associated with the result of speech recognition based on the second audio signal and causing the image forming apparatus to execute the image processing.

A program for causing a computer to control a terminal device,
Receiving an input of an audio signal representing an audio input from a microphone connected to the terminal device;
Performing speech recognition based on the speech signal;
A control program for causing the computer to execute the step of identifying image processing associated with the result of the speech recognition and outputting a control signal to the image forming apparatus to execute the image processing.

Causing the computer to further execute a step of communicating with the image forming apparatus and receiving an input of a first voice signal representing voice input from a microphone connected to the image forming apparatus, and
A step of determining, with the voice signal representing the voice input from the microphone connected to the terminal device taken as a second voice signal, whether to control the image forming apparatus using a result of voice recognition based on the first voice signal or using a result of voice recognition based on the second voice signal,
Wherein, in the step of outputting the control signal, the image forming apparatus is controlled using the result of voice recognition based on the first voice signal when voice recognition based on the first voice signal is successful, and is controlled using the result of voice recognition based on the second voice signal when unsuccessful. The control program according to claim 12.