JP2021092924A

JP2021092924A - Voice operating system, image forming device, voice operating method, voice operating server, and voice operating program

Info

Publication number: JP2021092924A
Application number: JP2019222362A
Authority: JP
Inventors: Keita Ishihara (石原恵太)
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2019-12-09
Filing date: 2019-12-09
Publication date: 2021-06-17

Abstract

PROBLEM TO BE SOLVED: To allow a user to confirm voice operation information that the user wants to keep secret without those nearby hearing it. SOLUTION: A voice operation system S comprises: a microphone 21 that inputs voice; a display unit 13 that displays information; a speaker 22 that outputs voice; and a response control unit 42 that selects, on the basis of the user's voice input to the microphone 21, whether to cause the speaker 22 to output voice or to cause the display unit 13 to display information. SELECTED DRAWING: Figure 1

Description

The present invention relates to a voice operation system, an image forming apparatus, a voice operation method, a voice operation server, and a voice operation program.

In voice operation of a multifunction device, the operation must be completed according to the settings the user speaks. For example, in the case of a scan job, the multifunction device configures the scan job according to the user's voice and executes the job. In voice operation, what the user utters is not always appropriate for the operation, so the user's utterance needs to be confirmed. A multifunction device here means a so-called MFP (Multi Function Peripheral).

One way to confirm settings given by voice instruction is to display the user's voice instruction on the display unit (panel). However, because the settings of a multifunction device are wide-ranging and the display unit can show only a limited amount, it is difficult to show all the settings on one screen.

It is conceivable to limit voice operation to simple setting items and always show them on the display unit alone, but this restricts the available setting items.
To cover the wide range of settings of a multifunction device, another conceivable method is to have the speaker of the multifunction device repeat the settings aloud and ask the user for confirmation.

Patent Document 1: JP-A-2018-194832

However, the settings of a multifunction device may include confidential information that the user wants to keep from those nearby as far as possible, such as scan destinations at external business partners, personal information such as personal file names the user does not want made public, and classified information. When such settings are entered by voice, the multifunction device repeats them aloud, and what the user wanted to keep secret may unintentionally become public.

It is therefore conceivable that, during voice operation, the MFP (multifunction device) controls the output volume according to the volume of the user's voice (Patent Document 1). However, a user speaks information that should not be overheard in a voice even quieter than an ordinary quiet voice. The volume at which the multifunction device repeats it may then be too low for the user to hear.

To keep the output audible to the user, the multifunction device could refrain from lowering the repeat volume below a predetermined value. That, however, weakens the concealment effect, and the repeat may be heard by those nearby.

Accordingly, an object of the present invention is to enable a user of a voice operation system, an image forming apparatus, a voice operation method, a voice operation server, or a voice operation program to confirm voice operation information that the user wants to keep secret without it being heard by those nearby.

That is, the above object of the present invention is achieved by the following configurations.

(1) A voice operation system comprising:
a voice input unit that inputs voice;
a display unit that displays information;
a voice output unit that outputs voice; and
a response control unit that selects, based on the user's voice input to the voice input unit, whether to have the voice output unit repeat the utterance and/or to display a confirmation response of the utterance content on the display unit.

(2) The voice operation system according to (1), wherein the response control unit changes the volume of the repeat by the voice output unit based on the volume of the voice uttered by the user.

(3) The voice operation system according to (1) or (2), wherein the response control unit has the voice output unit repeat the utterance when the volume of the voice uttered by the user exceeds a threshold, and displays a confirmation response of the utterance content on the display unit when the volume of the voice uttered by the user is equal to or lower than the threshold.

(4) The voice operation system according to (3), further comprising a photographing unit that photographs the user,
wherein the response control unit, in response to the user's pose while the voice output unit is repeating the utterance, causes the display unit to display a screen on which the user can select, for the next voice operation, whether to lower the volume of the repeat by the voice output unit or to display a confirmation response of the utterance content on the display unit.

(5) The voice operation system according to (1) or (2), wherein, when causing the display unit to display a confirmation response of the utterance content, the response control unit causes the voice output unit to output a voice prompting the user to check the display unit.

(6) The voice operation system according to (1) or (2), further comprising a distance sensor that detects the distance to the user,
wherein the response control unit further selects, based on the distance to the user detected by the distance sensor, whether to have the voice output unit repeat the utterance or to display a confirmation response of the utterance content on the display unit.

(7) The voice operation system according to (1) or (2), further comprising a photographing unit that photographs the user,
wherein the response control unit further selects, based on the direction of the user's face or line of sight, whether to have the voice output unit repeat the utterance and/or to display a confirmation response of the utterance content on the display unit.

(8) The voice operation system according to (1) or (2), further comprising a photographing unit that photographs the user,
wherein the response control unit further selects, based on the pose of the user photographed by the photographing unit, whether to have the voice output unit repeat the utterance and/or to display a confirmation response of the utterance content on the display unit.

(9) The voice operation system according to (1) or (2), wherein the response control unit further selects, based on the confidentiality of the content uttered by the user, whether to have the voice output unit repeat the utterance and/or to display a confirmation response of the utterance content on the display unit.

(10) The voice operation system according to (1) or (2), wherein the response control unit further selects, based on the screen type of the display unit at the time the user spoke, whether to have the voice output unit repeat the utterance and/or to display a confirmation response of the utterance content on the display unit.

(11) The voice operation system according to (1) or (2), wherein the response control unit has the voice output unit repeat the utterance when the user has a visual impairment.

(12) The voice operation system according to (11), wherein, when the user has a visual impairment, the response control unit has the voice output unit repeat the utterance at the lowest volume instead of displaying a confirmation response of the utterance content on the display unit.

(13) The voice operation system according to claim 1 or 2, further comprising a scanning unit that scans a document,
wherein, when the document placed on the scanning unit is copy-prohibited, the response control unit causes the voice output unit to output a warning at or above a predetermined volume.

(14) An image forming apparatus comprising:
a voice input unit that inputs voice;
a display unit that displays information;
a voice output unit that outputs voice; and
a response control unit that selects, based on the user's voice input to the voice input unit, whether to have the voice output unit repeat the utterance and/or to display a confirmation response of the utterance content on the display unit.

(15) A voice operation method for a device comprising a voice input unit that inputs voice, a display unit that displays information, a voice output unit that outputs voice, and a response control unit, the method comprising:
the voice input unit receiving the user's voice; and
the response control unit selecting, based on the user's voice, whether to have the voice output unit repeat the utterance and/or to display a confirmation response of the utterance content on the display unit.

(16) A voice operation server comprising:
an instruction content recognition unit that recognizes the user's instruction content from text data obtained by recognizing the user's voice input to a voice input device; and
a response control unit that selects, based on the voice, whether to have a voice output device repeat the utterance and/or to have a display device display a confirmation response of the utterance content.

(17) A voice operation program for causing a computer to execute:
a procedure of recognizing the user's instruction content from text data obtained by recognizing the user's voice input to a voice input device; and
a procedure of selecting, based on the voice, whether to have a voice output device repeat the utterance and/or to have a display device display a confirmation response of the utterance content.

According to the present invention, a user can confirm voice operation information that the user wants to keep secret without it being heard by those nearby.

A block diagram showing an outline of the voice operation system in the present embodiment.
A diagram showing an example of the operating condition table in the present embodiment.
A flowchart (part 1) showing the voice operation processing.
A flowchart (part 2) showing the voice operation processing.
A flowchart (part 3) showing the voice operation processing.
An example of a confirmation response screen for a copy-prohibited document.
An example of a confirmation response screen containing information that should be kept secret.
Another example of a confirmation response screen containing information that should be kept secret.
Another example of the instruction screen for the next response.
A flowchart showing the process of detecting the degree of reverberation of the installation environment.
A flowchart showing the process of detecting a pose of speaking secretly.
An example of a pose of speaking secretly.
A flowchart showing the process of detecting a secret pose.
An example of a secret pose.

Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the drawings.
The first point of the present invention is that the multifunction device switches between responding by voice and displaying the response on the display unit according to the volume of the user's voice.
The second point is that, for information that should be kept secret such as a password, the multifunction device does not respond by voice but displays the response on the display unit, regardless of the input volume.

The third point is that, for a visually impaired user, the multifunction device responds by voice at the minimum-threshold volume and does not display the response on the display unit.
The fourth point is that, if a copy-prohibited document is placed on the scanning unit, the multifunction device responds by voice at a predetermined volume regardless of the volume of the user's voice.

FIG. 1 is a block diagram showing an outline of the voice operation system S in the present embodiment.
The voice operation system S comprises a multifunction device 1, a voice processing server 3, and a multifunction device control server 4, connected so that they can communicate with one another over a network (not shown). In the following, operations performed mainly by the multifunction device control server 4 may be described as if performed by the multifunction device 1.
The configuration is not limited to this; the functional units of the voice processing server 3 and of the multifunction device control server 4 may instead be provided inside the multifunction device 1.

The multifunction device 1 is an image forming apparatus having a printing function, a scanning function, and a fax function. It includes a control unit 11, an operation unit 12, a display unit 13, a scanning unit 14, a fax unit 10, a storage unit 15, a printing unit 17, and a card reader 18.
The control unit 11 performs overall control of the multifunction device 1 and includes, for example, a CPU (Central Processing Unit), a RAM (Random Access Memory), and a ROM (Read Only Memory), none of which are shown.

The operation unit 12 is used to input operation information for the multifunction device 1 and is, for example, the touch panel portion of a touch panel display.
The display unit 13 displays information such as the settings of the multifunction device 1 and is, for example, the display portion of a touch panel display.

The scanning unit 14 optically reads a document. The fax unit 10 sends and receives faxes over a telephone line. The storage unit 15 stores the settings and other data of the multifunction device 1. The printing unit 17 forms an image on a recording medium. The card reader 18 reads the user's identification information stored on an ID card. Based on the identification information read by the card reader 18, the multifunction device 1 can identify the current user.

The multifunction device 1 further includes a camera 16, a distance sensor 19, a microphone 21, and a speaker 22. The microphone 21 is a voice input unit that receives the user's voice. Voice input to the microphone 21 is converted into voice data and volume data and sent to the voice processing server 3.

The speaker 22 is a voice output unit that outputs synthesized voice. It plays back the response voice based on the voice data and volume data output from the voice processing server 3.

The camera 16 is a photographing unit that photographs the user. Based on the user image captured by the camera 16, the control unit 11 can extract the user's pose. One software library that extracts human poses from captured images is OpenPose, developed by Zhe Cao et al. at Carnegie Mellon University.
The distance sensor 19 detects the distance between the microphone 21 and the user.

The voice processing server 3 and the multifunction device control server 4 are each a computer including a CPU, a RAM, and a ROM (none shown). Each functional unit is realized by the CPU executing a program.

The voice processing server 3 implements a voice recognition unit 31 and a voice synthesis unit 32. The voice recognition unit 31 recognizes the voice data recorded by the microphone 21 and converts it into text data. The text data output by the voice recognition unit 31 is passed, together with the volume data, to the multifunction device control server 4.
The voice synthesis unit 32 synthesizes voice data from text data. The volume data and the voice data synthesized by the voice synthesis unit 32 are output to the speaker 22.

The multifunction device control server 4 implements an instruction content recognition unit 41, a response control unit 42, and a command conversion unit 43. By having its CPU (not shown) execute a voice operation program stored in a storage unit (not shown), the multifunction device control server 4 functions as a voice operation server that realizes operation by voice.
The instruction content recognition unit 41 recognizes the user's instruction content from the text data produced by the voice recognition unit 31. The recognized instruction content is output to the response control unit 42 and the command conversion unit 43, and the volume data is also output to the response control unit 42.

The response control unit 42 controls what kind of response is returned, based on the user's instruction content and the volume of the user's voice. Based on the user's voice input to the microphone 21, it selects whether to have the speaker 22 repeat the utterance and/or to have the display unit 13 display a confirmation screen for the utterance content.
The response control unit 42 outputs the response text data and volume data to the voice synthesis unit 32, and outputs a message display instruction to the command conversion unit 43.

The command conversion unit 43 instructs the display unit 13 of the multifunction device 1 to show a given display, based on the instruction content recognized by the instruction content recognition unit 41 and the message display instruction output by the response control unit 42.
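The data flow among the units described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `MicInput`, `Response`, `handle_utterance`, and the `fake_*` stand-ins are hypothetical and do not appear in the publication, and the decision rule in `fake_decide` is a placeholder rather than the operating condition table.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class MicInput:
    """What the microphone 21 sends onward: voice data plus volume data."""
    voice_data: bytes
    volume_pct: float


@dataclass(frozen=True)
class Response:
    """What the response control unit 42 emits (hypothetical shape)."""
    speech_text: Optional[str]        # text for the voice synthesis unit 32
    speech_volume_pct: Optional[int]  # volume data for the speaker 22
    display_message: Optional[str]    # display instruction for the command conversion unit 43


def handle_utterance(
    mic: MicInput,
    recognize: Callable[[bytes], str],         # stands in for voice recognition unit 31
    interpret: Callable[[str], str],           # stands in for instruction content recognition unit 41
    decide: Callable[[str, float], Response],  # stands in for response control unit 42
) -> Response:
    text = recognize(mic.voice_data)
    instruction = interpret(text)
    # The volume data is passed through unchanged to the response control step.
    return decide(instruction, mic.volume_pct)


# Toy stand-ins so the flow can be run end to end (assumptions, not real recognizers).
def fake_recognize(voice_data: bytes) -> str:
    return voice_data.decode("utf-8")


def fake_interpret(text: str) -> str:
    return text.strip().lower()


def fake_decide(instruction: str, volume_pct: float) -> Response:
    # Placeholder rule: speak for louder input, display for quieter input.
    if volume_pct > 25:
        return Response(instruction, 50, None)
    return Response(None, None, instruction)


loud = handle_utterance(MicInput(b" Scan to folder ", 40.0),
                        fake_recognize, fake_interpret, fake_decide)
quiet = handle_utterance(MicInput(b" Scan to folder ", 10.0),
                         fake_recognize, fake_interpret, fake_decide)
```

Passing the three stages in as callables mirrors the remark that the functional units may live on separate servers or inside the multifunction device 1 itself.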

FIG. 2 is a diagram showing an example of the operating condition table 151.
The four left-hand columns of the operating condition table 151 list the operating conditions input to the multifunction device 1, and the three right-hand columns list the response conditions output from the multifunction device 1.

The first column gives the input volume level. An input volume level of 100% means that the input volume is above 50% and at most 100%. An input volume level of 50% means that the input volume is above 25% and at most 50%. An input volume level of 25% means that the input volume is 25% or less.
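The three input volume levels amount to a simple bucketing of the measured input volume. A minimal sketch, assuming the volume arrives as a percentage between 0 and 100 (the function name `input_volume_level` is hypothetical):

```python
def input_volume_level(volume_pct: float) -> int:
    """Map a measured input volume (0 to 100 %) to the level label used in the first column."""
    if not 0 <= volume_pct <= 100:
        raise ValueError("volume_pct must be between 0 and 100")
    if volume_pct > 50:
        return 100  # above 50 % and at most 100 %
    if volume_pct > 25:
        return 50   # above 25 % and at most 50 %
    return 25       # 25 % or less
```

Note that the boundaries are inclusive on the upper side, so an input of exactly 50% falls in the 50% level and an input of exactly 25% falls in the 25% level.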

The second column indicates whether the input voice information contains a password: "○" if it does and "-" if it does not.

The third column indicates whether the user is visually impaired: "○" if so and "-" if not.

The fourth column indicates whether a copy-prohibited document has been placed: "○" if one has been placed and "-" if not.

The fifth column gives the output volume level of the speaker 22. The sixth column gives the response content output by the speaker 22. The seventh column gives the response content displayed on the display unit 13.

<< Control according to the input volume of the user's operation voice >>
Rows 1 to 3 of the operating condition table 151 show the operation when the input voice information contains no password, the user is not visually impaired, and no copy-prohibited document has been placed.

Row 1 is the case where the user operates the multifunction device 1 by voice and the input volume is above 50% and at most 100%. The multifunction device 1 then repeats the user's operation voice at an output volume level of 100% and does not display confirmation of the utterance content on the display unit 13.

Row 2 is the case where the user operates the multifunction device 1 by voice and the input volume is above 25% and at most 50%. The multifunction device 1 then repeats the user's operation voice at an output volume level of 50% and does not display confirmation of the utterance content on the display unit 13.

Row 3 is the case where the user operates the multifunction device 1 by voice and the input volume is 25% or less. The multifunction device 1 then outputs the voice message "Please check the display unit" at an output volume level of 50% and displays confirmation of the utterance content on the display unit 13.

That is, when the user speaks quietly, at an input volume level of 25% or less, so as not to be overheard, the multifunction device 1 does not repeat the utterance aloud and instead displays a confirmation response screen on the display unit 13. The response control unit 42 selects, based on the input volume of the user's voice input to the microphone 21, whether to have the speaker 22 repeat the utterance and/or to have the display unit 13 display a confirmation response screen for the utterance content.

This lets the user confirm voice operation information that should be kept secret without it being heard by those nearby. Here, the input volume level of 25% is the threshold T at which the response switches from the synthesized voice of the speaker 22 to the display unit 13.
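Rows 1 to 3 and the threshold T can be sketched as a small decision function. This is an illustrative reading of the table, not the publication's implementation; `Response`, `respond_normal`, and the constant names are hypothetical.

```python
from dataclasses import dataclass

THRESHOLD_T = 25  # input volume (%) at or below which the display replaces the repeat
GUIDANCE = "Please check the display unit"


@dataclass(frozen=True)
class Response:
    output_volume_pct: int   # fifth column: speaker output volume level
    spoken_text: str         # sixth column: what the speaker 22 says
    show_confirmation: bool  # seventh column: whether the display unit 13 shows the confirmation


def respond_normal(input_volume_pct: float, utterance: str) -> Response:
    """Rows 1 to 3: no password, user not visually impaired, no copy-prohibited document."""
    if not 0 <= input_volume_pct <= 100:
        raise ValueError("input_volume_pct must be between 0 and 100")
    if input_volume_pct > 50:
        return Response(100, utterance, False)  # row 1: repeat at 100 %
    if input_volume_pct > THRESHOLD_T:
        return Response(50, utterance, False)   # row 2: repeat at 50 %
    # Row 3: do not repeat the utterance; guide the user to the display instead.
    return Response(50, GUIDANCE, True)
```

In this reading, the spoken guidance in row 3 contains none of the user's words, so nothing confidential is played through the speaker.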

<< Exception when the voice information contains a password >>
Rows 4 to 6 of the operating condition table 151 show the operation when the input voice information contains a password, the user is not visually impaired, and no copy-prohibited document has been placed.

Row 4 is the case where the user operates the multifunction device 1 by voice and the input volume is above 50% and at most 100%. The multifunction device 1 then outputs the voice message "Please check the display unit" at an output volume of 100% and displays confirmation of the utterance content on the display unit 13.

Row 5 is the case where the user operates the multifunction device 1 by voice and the input volume is above 25% and at most 50%. The multifunction device 1 then outputs the voice message "Please check the display unit" at an output volume of 50% and displays confirmation of the utterance content on the display unit 13.

Row 6 is the case where the user operates the multifunction device 1 by voice and the input volume is 25% or less. The multifunction device 1 then outputs the voice message "Please check the display unit" at an output volume level of 50% and displays confirmation of the utterance content on the display unit 13.

That is, when information that should be kept from those nearby, such as a password, is input by voice, the multifunction device 1 always displays the confirmation content on the display unit 13 regardless of the input volume level. For example, if a login screen is displayed on the display unit 13 when the user attempts a voice operation, the input may be judged to be a spoken password.

Because the password is confirmed on the display unit 13 and the speaker 22 does not repeat the utterance, the password can be kept from others. In addition, the user can shorten the time spent looking at the display unit 13.
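The password exception in rows 4 to 6 can be sketched as follows. The screen-type label `"login"` and the function names are assumptions made for illustration; the publication only says that a displayed login screen may be taken to indicate password input.

```python
GUIDANCE = "Please check the display unit"


def looks_like_password_input(screen_type: str) -> bool:
    """Treat voice input as a password when the login screen is showing.

    The label "login" is a hypothetical identifier for that screen type.
    """
    return screen_type == "login"


def password_response(input_volume_pct: float) -> dict:
    """Rows 4 to 6: never repeat the utterance; always show it on the display unit 13."""
    if not 0 <= input_volume_pct <= 100:
        raise ValueError("input_volume_pct must be between 0 and 100")
    # Output volume is 100 % for loud input (row 4) and 50 % otherwise (rows 5 and 6).
    output_volume_pct = 100 if input_volume_pct > 50 else 50
    return {
        "output_volume_pct": output_volume_pct,
        "spoken_text": GUIDANCE,    # guidance only, not the password itself
        "show_confirmation": True,  # confirmation appears on the display
    }
```

Only the guidance volume depends on the input volume here; the choice of display over repeat does not.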

<< Exception when the user is visually impaired >>
Rows 7 to 9 of the operating condition table 151 show the operation when the input voice information contains no password, the user is visually impaired, and no copy-prohibited document has been placed.

Row 7 is the case where the user operates the multifunction device 1 by voice and the input volume is above 50% and at most 100%. The multifunction device 1 then repeats the user's operation voice at an output volume of 100% and does not display confirmation of the utterance content on the display unit 13.

Row 8 is the case where the user operates the multifunction device 1 by voice and the input volume is above 25% and at most 50%. The multifunction device 1 then repeats the user's operation voice at an output volume of 50% and does not display confirmation of the utterance content on the display unit 13.

Row 9 is the case where the user operates the multifunction device 1 by voice and the input volume is 25% or less. The multifunction device 1 then repeats the user's operation voice at an output volume of 50% and does not display confirmation of the utterance content on the display unit 13.

That is, when the user is visually impaired, the multifunction device 1 always repeats the user's operation voice through the speaker 22 regardless of the input volume level, and does not display the confirmation content on the display unit 13. The multifunction device 1 can determine whether the current user is visually impaired from user authentication by the card reader 18 and a user attribute database (not shown).

The determination of whether the user is visually impaired is not limited to looking up the user attribute database. The multifunction device 1 may judge the user to be visually impaired when one or more of the following apply: the user walked up while checking the walking assistance line with a cane; the user started the voice operation after tracing the Braille part by hand; the result of photographing the user's eyes with the camera 16 matches a case of visual impairment; or the user is accompanied by a guide dog.

なお、ユーザが視覚障碍者であった場合に限定されず、ユーザが表示部１３を確認できない場合、入力音量レベルによらず、音声でユーザの操作音声を復唱してもよい。ユーザが表示部１３を確認できない場合とは、例えばユーザが重たいものを持っていて直ぐにパネル前に来られない場合などである。 It should be noted that the present invention is not limited to the case where the user is visually impaired, and when the user cannot confirm the display unit 13, the user's operation voice may be repeated by voice regardless of the input volume level. The case where the user cannot confirm the display unit 13 is, for example, the case where the user has a heavy object and cannot immediately come to the front of the panel.

<< Exception for copy-prohibited documents >>
Rows 10 to 12 of the operating condition table 151 show the operation when the input voice information does not include a password, the user is not visually impaired, and a copy-prohibited document is placed.

第１０行目は、ユーザが複合機１を音声で操作し、かつ入力音量が５０％を超え、１００％以下である場合を示している。このとき、複合機１は、出力音量１００％でユーザの操作音声を復唱し、表示部１３には警告を表示する。 The tenth line shows a case where the user operates the multifunction device 1 by voice and the input volume exceeds 50% and is 100% or less. At this time, the multifunction device 1 repeats the user's operation voice at an output volume of 100%, and displays a warning on the display unit 13.

第１１行目は、ユーザが複合機１を音声で操作し、かつ入力音量が２５％を超え、５０％以下である場合を示している。このとき、複合機１は、出力音量１００％でユーザの操作音声を復唱し、表示部１３には警告を表示する。 The eleventh line shows a case where the user operates the multifunction device 1 by voice and the input volume exceeds 25% and is 50% or less. At this time, the multifunction device 1 repeats the user's operation voice at an output volume of 100%, and displays a warning on the display unit 13.

第１２行目は、ユーザが複合機１を音声で操作し、かつ入力音量が２５％以下である場合を示している。このとき、複合機１は、出力音量１００％でユーザの操作音声を復唱し、表示部１３には警告を表示する。 The twelfth line shows a case where the user operates the multifunction device 1 by voice and the input volume is 25% or less. At this time, the multifunction device 1 repeats the user's operation voice at an output volume of 100%, and displays a warning on the display unit 13.

That is, when a copy-prohibited document is placed on the scanning unit 14, the multifunction device 1 repeats the user's operation voice at a high volume so that the surroundings are made aware, even if the user's operation voice was quiet, and displays a warning on the display unit 13.

When a document is placed on the scanning unit 14, the multifunction device 1 performs a pre-scan, for example for automatic document size detection. From this pre-scan data, the multifunction device 1 can determine whether the document is copy-prohibited.

When a copy-prohibited document is placed on the scanning unit 14 and the user verbally instructs "copy with the XX setting", the multifunction device 1 responds, at a volume audible to the surroundings, "That document is a prohibited document. Please stop scanning." In this way, the multifunction device 1 can deter scanning and copying of copy-prohibited documents.
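
Rows 6 to 12 of the operating condition table 151, as described above, can be read as a lookup keyed on three conditions and an input-volume band. The following Python sketch is illustrative only. The names `Row`, `ROWS`, and `lookup` are hypothetical, rows 1 to 5 lie outside this passage and are not modeled, and treating the visual-impairment and copy-prohibition columns of row 6 as "don't care" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

NEG_INF = float("-inf")


@dataclass(frozen=True)
class Row:
    number: int                        # row number in operating condition table 151
    password: Optional[bool]           # None means "don't care" (assumption)
    visually_impaired: Optional[bool]
    copy_prohibited: Optional[bool]
    above: float                       # input volume must exceed this value (%)
    up_to: float                       # input volume must not exceed this value (%)
    speech: str                        # what speaker 22 outputs
    output_volume: int                 # output volume (%)
    display: str                       # what display unit 13 shows


ROWS = [
    Row(6, True, None, None, NEG_INF, 25, "check_display_prompt", 50, "confirmation"),
    Row(7, False, True, False, 50, 100, "repeat", 100, "none"),
    Row(8, False, True, False, 25, 50, "repeat", 50, "none"),
    Row(9, False, True, False, NEG_INF, 25, "repeat", 50, "none"),
    Row(10, False, False, True, 50, 100, "repeat", 100, "warning"),
    Row(11, False, False, True, 25, 50, "repeat", 100, "warning"),
    Row(12, False, False, True, NEG_INF, 25, "repeat", 100, "warning"),
]


def _matches(expected: Optional[bool], actual: bool) -> bool:
    """A column matches when it is "don't care" or equals the actual value."""
    return expected is None or expected == actual


def lookup(input_volume: float, password: bool,
           visually_impaired: bool, copy_prohibited: bool) -> Optional[Row]:
    """Return the first matching row, or None if no modeled row applies."""
    for row in ROWS:
        if (_matches(row.password, password)
                and _matches(row.visually_impaired, visually_impaired)
                and _matches(row.copy_prohibited, copy_prohibited)
                and row.above < input_volume <= row.up_to):
            return row
    return None
```

For example, a quiet instruction (20%) with a copy-prohibited document on the scanning unit resolves to row 12, that is, repetition at 100% plus a warning on the display.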

FIGS. 3A to 3C are flowcharts showing the voice operation process.
In step S10, when the user performs a voice operation and voice is input to the microphone 21, the voice operation process by the multifunction device control server 4 starts.
The instruction content recognition unit 41 of the multifunction device control server 4 transmits a command inquiring about the distance between the user and the microphone 21 to the multifunction device 1, and the distance sensor 19 detects that distance (S11). The control unit 11 of the multifunction device 1 then transmits the detected distance information to the instruction content recognition unit 41, which thereby obtains the distance between the user and the microphone 21.

The instruction content recognition unit 41 of the multifunction device control server 4 then calculates the voice input volume from the volume data detected by the microphone 21, the distance information between the user and the microphone 21, and the like (S12).
For example, the instruction content recognition unit 41 may calculate the voice input volume Vin by taking into account the volume Vm that actually reached the microphone 21 and a subtraction volume D determined by the distance D1 from the microphone 21, the atmosphere D2, the voice tone D3, and the degree of reverberation D4.

The distance D1 is the distance from the microphone 21 to the user's face and can be measured by, for example, the distance sensor 19.
The atmosphere D2 is, for example, whether the user is speaking with a hand held to the mouth so that the sound does not leak, and can be determined by the process of FIG. 9 described later.

The voice tone D3 is whether the voice sounds like a whispered, confidential conversation, and can be determined by spectral analysis of the input voice. Alternatively, the user's normal voice may be registered in advance, and the determination may be made by whether the voice input to the microphone 21 differs from the registered voice.
The degree of reverberation D4 takes into account ambient sound, such as how quiet the surroundings are, as well as the reverberation of the environment, and can be calculated by the process of FIG. 8 described later.
D1 to D4 may each be a fixed value or may be graded into levels according to degree. The voice input volume Vin is calculated from the volume Vm that actually reached the microphone 21 and the subtraction volume D.
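
The description states only that Vin is calculated from Vm and the subtraction volume D, without giving a formula. The sketch below therefore assumes the simplest reading: D is the sum of the four components D1 to D4, each already expressed as a volume in percent, and Vin is Vm minus D, clamped to the range 0 to 100. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtraction:
    """Components of the subtraction volume D, each in percent (assumed unit)."""
    d1_distance: float       # from the distance between microphone 21 and the user
    d2_atmosphere: float     # from the hand-to-mouth posture (FIG. 9)
    d3_voice_tone: float     # from the whisper-like voice tone
    d4_reverberation: float  # from the reverberation of the environment (FIG. 8)

    def total(self) -> float:
        # Assumption: the four components simply add up to D.
        return (self.d1_distance + self.d2_atmosphere
                + self.d3_voice_tone + self.d4_reverberation)


def input_volume(vm: float, sub: Subtraction) -> float:
    """Vin = Vm - D, clamped to 0..100 (the clamping is an assumption)."""
    return min(100.0, max(0.0, vm - sub.total()))
```

With fixed values of 5, 10, 10, and 5 for D1 to D4, a measured volume Vm of 60% would give Vin = 30% under these assumptions.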

The response control unit 42 of the multifunction device control server 4 transmits a command inquiring about the presence or absence of a copy-prohibited document to the multifunction device 1, and it is determined whether a copy-prohibited document is placed on the scanning unit 14 (S13). The control unit 11 of the multifunction device 1 transmits the determination result of step S13 to the response control unit 42, which thereby obtains the answer as to whether a copy-prohibited document is present.
In step S14, the response control unit 42 determines whether a copy-prohibited document is placed on the scanning unit 14. If so (Yes), the response control unit 42 transmits the text of the output response (repetition) and volume data for the normal volume to the voice processing server 3, and causes the speaker 22 to output the response (repetition) at the normal volume (S15). The response control unit 42 further transmits a warning display command to the multifunction device 1 to display a warning on the display unit 13 (S16). When the process of step S16 is completed, the voice operation process ends.

FIG. 4 shows the warning screen 51 displayed on the display unit 13 in step S16.
The warning screen 51 displays the text "This is a document whose copying is prohibited. Do you want to copy it?", with a "Yes" button 511 and a "No" button 512 below it. When the user taps the "Yes" button 511, the scanning unit 14 executes copying. When the user taps the "No" button 512, the scanning unit 14 does not execute copying.

図３Ａに戻り説明を続ける。ステップＳ１４において、応答制御部４２は、スキャン部１４に複写禁止原稿が置かれていなかったならば（Ｎｏ）、ステップＳ１７に進み、発話内容を判定する。
ステップＳ１８において、応答制御部４２は、発話内容が秘匿情報、例えばパスワードや秘密にしたい宛先情報であったならば（Ｙｅｓ）、図３ＢのステップＳ２８に進む。応答制御部４２は、コマンド変換部４３を介して表示部１３に、発話内容に関する確認応答の画面を表示させると（Ｓ２８）、この音声操作処理を終了する。 The explanation will be continued by returning to FIG. 3A. In step S14, if the copy prohibited document is not placed in the scanning unit 14, the response control unit 42 proceeds to step S17 and determines the utterance content.
In step S18, if the utterance content is confidential information, for example, a password or destination information to be kept secret (Yes), the response control unit 42 proceeds to step S28 of FIG. 3B. When the response control unit 42 causes the display unit 13 to display the confirmation response screen regarding the utterance content via the command conversion unit 43 (S28), the voice operation process ends.

FIGS. 5 and 6 are examples of the confirmation response screen displayed on the display unit 13 in step S28. The confirmation response screen 52 of FIG. 5 displays the text "Are you sure you want to send to Mr. S of Company R?", with a "Yes" button 521 and a "No" button 522 below it. Here, the information on Mr. S of Company R is registered in a database (not shown) of the voice operation system S as information to be kept secret.

ユーザが「はい」ボタン５２１をタップすると、ファックス部１０は、ファックスの送信を実行する。ユーザが「いいえ」ボタン５２２をタップすると、ファックス部１０は、ファックスの送信を実行しない。 When the user taps the "Yes" button 521, the fax unit 10 executes the transmission of the fax. When the user taps the "No" button 522, the fax unit 10 does not execute the fax transmission.

The confirmation response screen 53 of FIG. 6 displays the text "Is the password "tokkyotaro" correct?", with a "Yes" button 531 and a "No" button 532 below it. Because the screen immediately before this one is the password input screen, "tokkyotaro" can be judged to be information to be kept secret.

When the user taps the "Yes" button 531, the screen transitions to the password input screen and login with this password is executed. When the user taps the "No" button 532, login is not executed.

図３Ａに戻り説明を続ける。ステップＳ１８において、応答制御部４２は、発話内容が秘匿情報を含まないならば（Ｎｏ）、ステップＳ１９に進み、表示部確認フラグがセットされているか否かを判定する。この表示部確認フラグは、後記する図７に示す確認応答画面５４にて、「表示部で確認」ボタン５４２をタップすることでセットされる。
応答制御部４２は、表示部確認フラグがセットされていたならば、図３ＢのステップＳ２８に進む。応答制御部４２は、コマンド変換部４３を介して表示部１３に、発話内容に関する応答を表示させ、この音声操作処理を終了する。応答制御部４２は、表示部確認フラグがクリアされていたならば、ステップＳ２０の処理に進む。 The explanation will be continued by returning to FIG. 3A. In step S18, if the utterance content does not include confidential information (No), the response control unit 42 proceeds to step S19 and determines whether or not the display unit confirmation flag is set. This display unit confirmation flag is set by tapping the "confirm on display unit" button 542 on the confirmation response screen 54 shown in FIG. 7, which will be described later.
If the display unit confirmation flag is set, the response control unit 42 proceeds to step S28 in FIG. 3B. The response control unit 42 causes the display unit 13 to display a response regarding the utterance content via the command conversion unit 43, and ends the voice operation process. If the display unit confirmation flag is cleared, the response control unit 42 proceeds to the process of step S20.

ステップＳ２０において、複合機１の制御部１１は、カメラ１６で撮影したユーザ画像を応答制御部４２に送信する。複合機制御サーバ４の応答制御部４２は、カメラ１６で撮影したユーザ画像から、ユーザのポーズを抽出する。なお、ポーズの抽出は複合機１側で行ってもよく、限定されない。
ステップＳ２１において、複合機制御サーバ４の応答制御部４２は、ユーザのポーズがこっそりと話しかけるポーズ、例えば手をメガホンのように口の前に翳すポーズであるか否かを判定する。応答制御部４２は、ユーザがこっそりと話しかけるポーズならば（Ｙｅｓ）、図３ＢのステップＳ２４に進む。応答制御部４２は、ユーザがこっそりと話しかけるポーズでないならば（Ｎｏ）、ステップＳ２２に進む。 In step S20, the control unit 11 of the multifunction device 1 transmits the user image captured by the camera 16 to the response control unit 42. The response control unit 42 of the multifunction device control server 4 extracts the user's pose from the user image taken by the camera 16. The pose extraction may be performed on the multifunction device 1 side, and is not limited.
In step S21, the response control unit 42 of the multifunction device control server 4 determines whether or not the pose of the user is a pose in which the user speaks secretly, for example, a pose in which the hand is held in front of the mouth like a megaphone. The response control unit 42 proceeds to step S24 of FIG. 3B if the user is in a pose to talk secretly (Yes). The response control unit 42 proceeds to step S22 if it is not a pose in which the user talks secretly (No).

ステップＳ２２において、応答制御部４２は、ユーザとマイク２１との距離が閾値以内であるか否かを判定する。こっそりと話しかける場合、ユーザは、マイク２１に近づいて小声で話しかけると考えられるためである。 In step S22, the response control unit 42 determines whether or not the distance between the user and the microphone 21 is within the threshold value. This is because when speaking secretly, the user is considered to approach the microphone 21 and speak in a quiet voice.

応答制御部４２は、ユーザとマイク２１との距離が閾値以内ならば（Ｙｅｓ）、図３ＢのステップＳ２４に進む。応答制御部４２は、ユーザとマイク２１との距離が閾値を超えていたならば（Ｎｏ）、図３ＢのステップＳ２３に進む。 If the distance between the user and the microphone 21 is within the threshold value (Yes), the response control unit 42 proceeds to step S24 in FIG. 3B. If the distance between the user and the microphone 21 exceeds the threshold value (No), the response control unit 42 proceeds to step S23 in FIG. 3B.

ステップＳ２３において、応答制御部４２は、入力音量が閾値以下であるか否かを判定する。応答制御部４２は、入力音量が閾値以下ならば（Ｙｅｓ）、ステップＳ２４に進み、入力音量が閾値を超えていたならば（Ｎｏ）、図３ＣのステップＳ２９に進む。 In step S23, the response control unit 42 determines whether or not the input volume is equal to or less than the threshold value. If the input volume is below the threshold value (Yes), the response control unit 42 proceeds to step S24, and if the input volume exceeds the threshold value (No), proceeds to step S29 in FIG. 3C.

In step S24, the response control unit 42 determines whether the current user is visually impaired. If the current user is visually impaired (Yes), the response control unit 42 transmits the text of the output response (repetition) and volume data for the lowest volume to the voice processing server 3, causes the output response (repetition) to be made at the lowest volume (S25), and ends the voice operation process. If the current user is not visually impaired (No), the response control unit 42 proceeds to step S26.

In step S26, the control unit 11 of the multifunction device 1 transmits the user image captured by the camera 16 to the response control unit 42. The response control unit 42 of the multifunction device control server 4 determines from this user image whether the current user is gazing at the display unit 13. If the current user is not gazing at the display unit 13 (No), the response control unit 42 transmits the text "Please look at the display unit" to the voice processing server 3, causes the speaker 22 to output the guidance voice "Please look at the display unit" (S27), and proceeds to step S28. If the current user is gazing at the display unit 13 (Yes), the response control unit 42 proceeds to step S28.
In step S28, the response control unit 42 transmits, via the command conversion unit 43, a display command for the response regarding the utterance content to the multifunction device 1, causes the display unit 13 of the multifunction device 1 to display that response, and ends the voice operation process.

In step S29 of FIG. 3C, the response control unit 42 makes an output response (repetition) according to the utterance volume. The control unit 11 photographs the user with the camera 16 during the repetition and transmits the captured user image to the response control unit 42. The response control unit 42 of the multifunction device control server 4 extracts the user's pose from the user image captured by the camera 16 (S30). The pose extraction may instead be performed on the multifunction device 1 side; there is no limitation. The details of step S30 are described later with reference to FIG. 11.

ステップＳ３１において、応答制御部４２は、ユーザが内緒のポーズ、例えば唇の前に人差し指を立てるポーズをしているか否かを判定する。後記する図１２は、内緒のポーズの一例である。 In step S31, the response control unit 42 determines whether or not the user is in a secret pose, for example, a pose in which the index finger is raised in front of the lips. FIG. 12, which will be described later, is an example of a secret pose.

If the user has not made the secret pose, the response control unit 42 ends the voice operation process. If the user has made the secret pose (Yes), the response control unit 42 transmits to the multifunction device 1 a display command for choosing whether to lower the volume from the next response or to confirm on the display unit 13. As a result, the display unit 13 of the multifunction device 1 displays the choice between lowering the volume from the next response and confirming on the display unit 13 (S32). When the process of step S32 is completed, the response control unit 42 ends the voice operation process.
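
The branching of FIGS. 3A to 3C from step S14 through step S32 can be condensed into a single decision function. The sketch below is a simplified reading of the flow, not the implementation of the response control unit 42: the `Context` fields, the action strings, and both threshold values are hypothetical, and all sensor and camera judgments are assumed to have been made already.

```python
from dataclasses import dataclass
from typing import List

DISTANCE_THRESHOLD = 0.3   # hypothetical threshold for S22 (metres)
VOLUME_THRESHOLD = 25.0    # hypothetical threshold for S23 (%)


@dataclass
class Context:
    copy_prohibited: bool = False     # S13/S14 result
    secret_content: bool = False      # S17/S18 result
    display_flag: bool = False        # display unit confirmation flag (S19)
    whisper_pose: bool = False        # pose for talking secretly (S20/S21)
    distance: float = 1.0             # user-to-microphone distance (S11)
    input_volume: float = 80.0        # Vin from S12 (%)
    visually_impaired: bool = False   # S24 result
    gazing_at_display: bool = True    # S26 result
    secret_gesture: bool = False      # secret pose seen during repetition (S30/S31)


def decide(ctx: Context) -> List[str]:
    """Return the response actions in the order the flowchart performs them."""
    if ctx.copy_prohibited:                                    # S14: Yes
        return ["S15:repeat_at_normal_volume", "S16:show_warning"]
    if ctx.secret_content or ctx.display_flag:                 # S18 / S19: Yes
        return ["S28:show_response_on_display"]
    quiet = (ctx.whisper_pose                                  # S21
             or ctx.distance <= DISTANCE_THRESHOLD             # S22
             or ctx.input_volume <= VOLUME_THRESHOLD)          # S23
    if not quiet:
        actions = ["S29:repeat_at_volume_matching_utterance"]
        if ctx.secret_gesture:                                 # S31: Yes
            actions.append("S32:show_volume_or_display_choice")
        return actions
    if ctx.visually_impaired:                                  # S24: Yes
        return ["S25:repeat_at_lowest_volume"]
    actions = []
    if not ctx.gazing_at_display:                              # S26: No
        actions.append("S27:say_look_at_display")
    actions.append("S28:show_response_on_display")
    return actions
```

Note that the three "quiet" tests are evaluated in the order S21, S22, S23 in the flowchart; combining them with `or` gives the same outcome because each Yes branch leads to step S24.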

FIG. 7 shows the confirmation response screen 54 displayed on the display unit 13 in step S32.
The confirmation response screen 54 displays the text "Please specify the next response.", with a "decrease volume" button 541 and a "confirm on display unit" button 542 below it. When the user taps the "decrease volume" button 541, the volume of the voice response in the next voice operation is reduced. When the user taps the "confirm on display unit" button 542, the display unit confirmation flag is set. As a result, the response to the next voice operation can be confirmed through what is shown on the display unit 13.

なお、ステップＳ３２において、ユーザが、「もう少し音量下げて」と音声で指示した場合、複合機１は、次の応答からはスピーカ２２の出力音量を下げるとよい。
スピーカ２２の出力音量を下げるにあたり、複合機１は、ユーザの入力音量をＶｉｎとした場合、出力音量Ｖｏｕｔを、入力音量Ｖｉｎよりも５％低い値や、１０％低い値のように、ユーザの入力音量Ｖｉｎを基準として、出力音量Ｖｏｕｔを微調整できるようにするとよい。但し、複合機１は、出力音量Ｖｏｕｔが最低音量の閾値Ｔ（ここでは、２５％）よりも小さくならないようにする。 In step S32, when the user gives a voice instruction to "lower the volume a little more", the multifunction device 1 may lower the output volume of the speaker 22 from the next response.
In lowering the output volume of the speaker 22, when the input volume of the user is Vin, the multifunction device 1 sets the output volume Vout to a value 5% lower or 10% lower than the input volume Vin. It is preferable that the output volume Vout can be finely adjusted based on the input volume Vin. However, the multifunction device 1 makes sure that the output volume Vout does not become smaller than the minimum volume threshold value T (here, 25%).

When an output volume Vout lower than the minimum volume threshold T is desired, the multifunction device 1 may recommend displaying the confirmation response for the utterance content on the display unit 13. Further, when the user selects confirmation on the display unit and the output volume of the previous response was B, the minimum volume threshold T may be changed so that the output volume B is at or below the minimum volume; for example, the threshold T is set to a value 5% higher than the output volume B.

逆に、ユーザが、スピーカ２２の出力音声が聞えない素振りを見せていたら、閾値Ｔを上げるなど、ユーザごとにカスタマイズしてもよい。出力音声が聞えない素振りとは、例えばユーザが耳に手を翳すポーズなどである。 On the contrary, if the user is pretending that the output voice of the speaker 22 cannot be heard, the threshold value T may be increased or the like may be customized for each user. The gesture in which the output voice cannot be heard is, for example, a pose in which the user holds his / her hand over his / her ear.
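
The stepwise lowering of the output volume with a floor at the threshold T can be sketched as follows. The function name and return shape are hypothetical, and interpreting "5% lower" as five percentage points per request is an assumption; the text does not say whether the step is absolute or relative.

```python
from typing import Tuple

MIN_VOLUME_T = 25.0   # minimum volume threshold T from the description (%)
STEP = 5.0            # assumed step per "lower the volume" request (percentage points)


def lower_output_volume(vin: float, requests: int,
                        t: float = MIN_VOLUME_T,
                        step: float = STEP) -> Tuple[float, bool]:
    """Return (Vout, recommend_display) after the given number of requests.

    Vout starts from the input volume Vin and drops by `step` per request,
    but never below T. When the requested value would fall below T, Vout is
    held at T and the caller is told to recommend the display unit instead.
    """
    wanted = vin - step * requests
    if wanted < t:
        return t, True
    return wanted, False
```

Starting from Vin = 40%, three requests reach the floor of 25% exactly, and a fourth request leaves Vout at 25% while signalling that confirmation on the display unit 13 should be recommended.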

図８は、設置環境の反響度合いを検知する処理を示すフローチャートである。
例えば深夜などのように、静寂な環境が期待できる時間帯おいて、制御部１１は処理を開始する。 FIG. 8 is a flowchart showing a process of detecting the degree of reverberation in the installation environment.
The control unit 11 starts processing at a time zone in which a quiet environment can be expected, such as midnight.

ステップＳ４０において、制御部１１は、マイク２１により音を検出する。ステップＳ４１において、制御部１１は、検出した音量が所定値以下であるか否かを判定する。制御部１１は、検出した音量が所定値を超えたならば（Ｎｏ）、図８の処理を終了し、検出した音量が所定値以下ならば（Ｙｅｓ）、ステップＳ４２に進む。 In step S40, the control unit 11 detects the sound by the microphone 21. In step S41, the control unit 11 determines whether or not the detected volume is equal to or less than a predetermined value. If the detected volume exceeds the predetermined value (No), the control unit 11 ends the process of FIG. 8, and if the detected volume is equal to or less than the predetermined value (Yes), the control unit 11 proceeds to step S42.

ステップＳ４２において、制御部１１は、スピーカ２２からキャリブレーション音を出力する。制御部１１は、このキャリブレーション音をマイク２１にて検知する（Ｓ４３）。 In step S42, the control unit 11 outputs a calibration sound from the speaker 22. The control unit 11 detects this calibration sound with the microphone 21 (S43).

Next, the control unit 11 calculates the degree of reverberation of the installation environment from the volume of the calibration sound output from the speaker 22 and the volume detected by the microphone 21. When this calculation is completed, the control unit 11 ends the process of FIG. 8.
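
The description does not state how the degree of reverberation is derived from the two volumes. As a minimal sketch, the function below assumes it is the ratio of the volume detected by the microphone 21 to the volume emitted by the speaker 22, and that the measurement is skipped when the ambient sound exceeds a quiet limit (the check of step S41). The function name, the ratio, and the limit value are all assumptions.

```python
from typing import Optional

QUIET_LIMIT = 10.0  # hypothetical "predetermined value" for step S41


def reverberation_degree(ambient: float, emitted: float, detected: float,
                         quiet_limit: float = QUIET_LIMIT) -> Optional[float]:
    """Return detected/emitted, or None when the environment is not quiet.

    ambient  : volume measured before the calibration sound (S40)
    emitted  : volume of the calibration sound output by the speaker (S42)
    detected : volume of the calibration sound picked up by the microphone (S43)
    """
    if ambient > quiet_limit:          # S41: No, so the process ends
        return None
    if emitted <= 0:
        raise ValueError("emitted volume must be positive")
    return detected / emitted
```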

図９は、こっそりと話しかけるポーズを検知する処理を示すフローチャートである。
カメラ１６は、発話中のユーザを撮像する（Ｓ５０）。これにより、ユーザ画像を得ることができる。そして制御部１１は、このユーザ画像から、ユーザのポーズを抽出する（Ｓ５１）。 FIG. 9 is a flowchart showing a process of detecting a pose in which a person talks secretly.
The camera 16 captures a talking user (S50). Thereby, the user image can be obtained. Then, the control unit 11 extracts the pose of the user from the user image (S51).

The control unit 11 collates the extracted user's pose with a database of poses used when talking secretly (S52) and, in step S53, determines whether it matches any pose in the database. FIG. 10 is an example of a user image when talking secretly. The database stores many such poses.

Returning to FIG. 9, the explanation continues. If the extracted pose matches any of the poses (Yes), the control unit 11 judges the pose of the speaking user to be a pose for talking secretly (S54) and ends the process of FIG. 9. If it matches none of the poses (No), the control unit 11 ends the process of FIG. 9.

図１１は、内緒のポーズを検知する処理を示すフローチャートである。
カメラ１６は、発話中のユーザを撮像する（Ｓ６０）。これにより、ユーザ画像を得ることができる。そして制御部１１は、このユーザ画像から、ユーザのポーズを抽出する（Ｓ６１）。 FIG. 11 is a flowchart showing a process of detecting a secret pose.
The camera 16 captures a talking user (S60). Thereby, the user image can be obtained. Then, the control unit 11 extracts the user's pose from this user image (S61).

The control unit 11 collates the extracted user's pose with a database of secret poses (S62) and, in step S63, determines whether it matches any pose in the database. FIG. 12 is an example of a user image in which the user makes the secret pose. The database stores many such poses.

Returning to FIG. 11, the explanation continues. If the extracted pose matches any of the poses (Yes), the control unit 11 judges the pose of the speaking user to be the secret pose (S64) and ends the process of FIG. 11. If it matches none of the poses (No), the control unit 11 ends the process of FIG. 11.
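
The processes of FIG. 9 and FIG. 11 share one pattern: extract a pose, then check it against a database of stored poses. The description does not say how a pose is represented or what counts as a match, so the sketch below substitutes a simple stand-in: a pose is a flat tuple of normalized keypoint coordinates, and it matches when its Euclidean distance to any stored pose is within a tolerance. The representation, the tolerance, and the function name are hypothetical.

```python
import math
from typing import Sequence

Pose = Sequence[float]  # hypothetical: flattened, normalized keypoint coordinates


def matches_any(pose: Pose, database: Sequence[Pose],
                tolerance: float = 0.1) -> bool:
    """Collation and judgment (S52/S53 or S62/S63) as a nearest-pose check.

    Returns True when the extracted pose lies within `tolerance` of at
    least one pose stored in the database.
    """
    return any(math.dist(pose, ref) <= tolerance for ref in database)
```

The same function would serve both flowcharts by passing either the database of poses for talking secretly or the database of secret poses.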

In the voice operation system S of the present embodiment, the output volume of the response from the speaker 22 is kept at or above a predetermined value, and confirmation information for the voice operation content is displayed on the display unit 13 as necessary. As a result, the user never fails to hear the response of the multifunction device 1, and the information the user wants to keep secret is not heard by those nearby.

音声操作システムＳは、音声操作の際の設定内容の確認方法を、表示部１３で確認が推奨される秘匿項目のみに限定している。これにより、音声操作システムＳは、かんたん音声操作モードのような簡単な設定項目に限定することなく、多岐に渡る設定項目にも対応することができる。 The voice operation system S limits the method of confirming the setting contents at the time of voice operation to only the secret items for which confirmation is recommended on the display unit 13. As a result, the voice operation system S can support a wide variety of setting items without being limited to simple setting items such as the simple voice operation mode.

(Modification examples)
The present invention is not limited to the above embodiment and can be modified without departing from the spirit of the present invention. For example, there are the following modifications (a) to (g).

（ａ）ユーザがこっそりと音声操作しようとしているか否かは、ユーザの声色で判定してもよい。具体的にいうと、ユーザの普段の声色を登録しておき、音声操作における声色が登録した声色と所定値以上違っていたならば、こっそりと音声操作しようとしていると判定してもよい。
（ｂ）ユーザがこっそりと音声操作しようとしているか否かを人工知能などで機械学習しておいて、学習結果に基づいて判断してもよい。また、ユーザごとに学習させておいて、ユーザのクセに対応できるようにしてもよい。 (A) Whether or not the user is secretly trying to operate the voice may be determined by the voice of the user. Specifically, the user's usual voice color may be registered, and if the voice color in the voice operation differs from the registered voice color by a predetermined value or more, it may be determined that the user is secretly trying to perform the voice operation.
(B) Whether or not the user is secretly trying to perform voice operation may be machine-learned by artificial intelligence or the like, and a judgment may be made based on the learning result. In addition, it may be possible to learn for each user so that the user's habit can be dealt with.

(c) The voice operation system may monitor the user with a camera and determine whether the user is performing the voice operation while looking at the display unit. If the user is looking at the display unit, the voice operation system responds with the display unit only. If the user is not looking at the display unit, it responds with both the guidance voice and the display unit. As a result, unnecessary voice operations can be avoided and the surroundings of the multifunction device can be kept quieter.
(d) The voice operation system may set the volume of the guidance voice that prompts the user to look at the display unit to the normal volume (for example, 100%), or may keep it from being lowered below a predetermined threshold (for example, 50%).

(e) The voice operation system may accept the voice input "lower the volume" when the user performs a voice operation. Taking the user's input volume Vin as the reference, the multifunction device 1 finely adjusts the output volume Vout, for example to 5% lower than the input volume, then 10% lower, and so on. However, the output volume Vout is never made smaller than the minimum volume threshold T. If the user wants a volume lower than the minimum volume threshold T, the voice operation system should recommend that the user switch to the display unit response.

(f) The voice operation system is not limited to an image forming apparatus such as a multifunction device, and may be applied to any device, for example a television, a recording device, a camera, a video recorder, a fax machine, a refrigerator, a rice cooker, or a car navigation system.
(g) Instead of the distance sensor, a camera may be used to measure the distance from the user to the microphone.
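
The response selection in modification (c) and the volume adjustment in modification (e) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the implementation described in the specification: the function names, the `VolumeDecision` type, and the fixed 5% step per "lower the volume" request are hypothetical, and the reading that the k-th request sets Vout to Vin reduced by k × 5% is one possible interpretation of "5% lower, then 10% lower".

```python
from dataclasses import dataclass


@dataclass
class VolumeDecision:
    output_volume: float     # Vout, in the same units as Vin
    recommend_display: bool  # True when the request would fall below T


def adjust_output_volume(v_in: float, requests: int, t_min: float,
                         step_percent: int = 5) -> VolumeDecision:
    """Return Vout after `requests` "lower the volume" requests.

    Assumption: the k-th request sets Vout to Vin reduced by
    k * step_percent percent. Vout is never set below the minimum
    volume threshold T (t_min); when the request would go below T,
    Vout is held at T and switching to the display response is
    recommended instead.
    """
    reduction = min(requests * step_percent, 100)
    candidate = v_in * (100 - reduction) / 100
    if candidate < t_min:
        return VolumeDecision(t_min, True)
    return VolumeDecision(candidate, False)


def choose_response(looking_at_display: bool) -> tuple:
    """Modification (c): display only if the user is looking at the
    display unit, otherwise guidance voice plus display."""
    if looking_at_display:
        return ("display",)
    return ("display", "voice")


if __name__ == "__main__":
    print(adjust_output_volume(60.0, 1, 30.0))
    print(adjust_output_volume(60.0, 12, 30.0))
    print(choose_response(False))
```

With Vin = 60 and T = 30, the first request gives Vout = 57, while a twelfth request would give 24, so the sketch holds Vout at 30 and flags the display response as the recommended alternative.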

S Voice operation system
1 Multifunction device (image forming apparatus)
10 Fax unit
11 Control unit
12 Operation unit
13 Display unit
14 Scan unit
15 Storage unit
151 Operating condition table
16 Camera
17 Printing unit
18 Card reader
19 Distance sensor
21 Microphone (voice input unit)
22 Speaker (voice output unit)
3 Voice processing server
31 Voice recognition unit
32 Voice synthesis unit
4 Multifunction device control server (voice operation server)
41 Instruction content recognition unit
42 Response control unit
43 Command conversion unit
51 Warning screen
511 "Yes" button
512 "No" button
52 Confirmation response screen
521 "Yes" button
522 "No" button
53 Confirmation response screen
531 "Yes" button
532 "No" button
54 Confirmation response screen
541 "Lower volume" button
542 "Confirm on display unit" button

Claims

A voice operation system comprising:
a voice input unit that inputs voice;
a display unit that displays information;
a voice output unit that outputs voice; and
a response control unit that selects, based on a user's voice input to the voice input unit, whether to repeat back through the voice output unit and/or display a confirmation response of the utterance content on the display unit.

The voice operation system according to claim 1, wherein the response control unit changes the volume of the repeat-back by the voice output unit based on the volume of the voice uttered by the user.

The voice operation system according to claim 1 or 2, wherein the response control unit repeats back through the voice output unit when the volume of the voice uttered by the user exceeds a threshold, and displays a confirmation response of the utterance content on the display unit when the volume of the voice uttered by the user is equal to or lower than the threshold.

The voice operation system according to claim 3, further comprising an imaging unit that captures an image of the user, wherein, in response to the user's pose during the repeat-back by the voice output unit, the response control unit displays on the display unit a screen for selecting whether, in the next voice operation, to lower the volume of the repeat-back by the voice output unit or to display a confirmation response of the utterance content on the display unit.

The voice operation system according to claim 1 or 2, wherein, when displaying the confirmation response of the utterance content on the display unit, the response control unit causes the voice output unit to output a voice prompting the user to check the display unit.

The voice operation system according to claim 1 or 2, further comprising a distance sensor that detects the distance to the user, wherein the response control unit further selects, based on the distance to the user detected by the distance sensor, whether to repeat back through the voice output unit or display a confirmation response of the utterance content on the display unit.

The voice operation system according to claim 1 or 2, further comprising an imaging unit that captures an image of the user, wherein the response control unit further selects, based on the direction of the user's face or line of sight captured by the imaging unit, whether to repeat back through the voice output unit and/or display a confirmation response of the utterance content on the display unit.

The voice operation system according to claim 1 or 2, further comprising an imaging unit that captures an image of the user, wherein the response control unit further selects, based on the pose of the user captured by the imaging unit, whether to repeat back through the voice output unit and/or display a confirmation response of the utterance content on the display unit.

The voice operation system according to claim 1 or 2, wherein the response control unit further selects, based on the confidentiality of the utterance content of the voice uttered by the user, whether to repeat back through the voice output unit and/or display a confirmation response of the utterance content on the display unit.

The voice operation system according to claim 1 or 2, wherein the response control unit further selects, based on the type of screen shown on the display unit when the user utters the voice, whether to repeat back through the voice output unit and/or display a confirmation response of the utterance content on the display unit.

The voice operation system according to claim 1 or 2, wherein the response control unit repeats back through the voice output unit when the user has a visual impairment.

The voice operation system according to claim 11, wherein, when the user has a visual impairment, the response control unit repeats back through the voice output unit at the lowest volume instead of displaying the confirmation response of the utterance content on the display unit.

The voice operation system according to claim 1 or 2, further comprising a scanning unit that scans a document, wherein, when copying of the document placed on the scanning unit is prohibited, the response control unit causes the voice output unit to output a warning at a predetermined volume or higher.

An image forming apparatus comprising:
a voice input unit that inputs voice;
a display unit that displays information;
a voice output unit that outputs voice; and
a response control unit that selects, based on a user's voice input to the voice input unit, whether to repeat back through the voice output unit and/or display a confirmation response of the utterance content on the display unit.

A voice operation method for a device comprising a voice input unit that inputs voice, a display unit that displays information, a voice output unit that outputs voice, and a response control unit, the method comprising:
inputting, by the voice input unit, a user's voice; and
selecting, by the response control unit and based on the user's voice, whether to repeat back through the voice output unit and/or display a confirmation response of the utterance content on the display unit.

A voice operation server comprising:
an instruction content recognition unit that recognizes a user's instruction content from text data obtained by recognizing the user's voice input to a voice input device; and
a response control unit that selects, based on the voice, whether to repeat back through a voice output device and/or display a confirmation response of the utterance content on a display device.

A voice operation program for causing a computer to execute:
a procedure of recognizing a user's instruction content from text data obtained by recognizing the user's voice input to a voice input device; and
a procedure of selecting, based on the voice, whether to repeat back through a voice output device and/or display a confirmation response of the utterance content on a display device.