JP7409155B2

JP7409155B2 - Image forming device

Info

Publication number: JP7409155B2
Application number: JP2020034980A
Authority: JP
Inventors: 憲三山本
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2020-03-02
Filing date: 2020-03-02
Publication date: 2024-01-09
Anticipated expiration: 2040-03-02
Also published as: JP2021141362A

Description

The present disclosure relates to an image forming apparatus, and more specifically to an improvement in entering job data settings by voice recognition.

In recent years, image forming apparatuses have become increasingly multifunctional. It is said to be difficult to make full use of all their functions through a conventional touch panel display, because the user must open hierarchical menus and correctly enter every item the job requires.

Voice recognition, by contrast, lets the user simply say what the image forming apparatus should do, such as "Copy 2-in-1," which makes operation intuitive. It also accepts many different phrasings, such as "A 2-in-1 copy, please. And staple it too" or "Copy, 2-in-1, please," so operation is stress-free. Voice recognition is therefore expected to become the user interface of a new era.

In a conventional image forming apparatus, the user utters a predetermined word to activate the voice recognition mode. The apparatus then enters that mode and starts capturing spoken instructions. The captured audio is digitized and subjected to signal processing, such as spectrum modulation and Fourier transformation, to obtain phonemes. Groups of phonemes are assembled into words, and the words are classified as nouns, adjectives, and so on, to reveal the syntax of the sentence the user uttered. Once the job content has been recognized in this way, the apparatus executes the job according to the utterance.

Patent Document 1: Japanese Unexamined Patent Application Publication No. H04-343561

However, an image forming apparatus produces loud operating sound. If operating sound occurs after the voice recognition mode has been activated, it acts as noise, and the user's voice is no longer converted into the correct phonemes. Once phoneme conversion fails, the subsequent processing cannot produce a correct sentence specifying the job settings, and the apparatus has no choice but to ask the user to repeat the voice input. The input is being disrupted by the apparatus's own operating sound. Asking for it again and again therefore makes the device seem inconsistent and erodes the user's trust in it.

One option is to stop the apparatus while the user is speaking so that the acoustic environment stays quiet (see Patent Document 1 for details). However, when the image forming apparatus is installed in a workplace with many employees, stopping it during voice input is undesirable. Halting the apparatus to recognize one user's voice reduces the work efficiency of the entire workplace.

An object of the present disclosure is to provide an image forming apparatus that reduces how often a user must repeat an utterance for voice recognition, while keeping any loss of work efficiency in the workplace to a minimum.

The above problem is solved by an image forming apparatus that activates a voice recognition mode in response to a user's utterance and accepts voice instructions regarding image formation. The apparatus comprises three means:

(a) specifying means that, after the voice recognition mode is activated, specifies the state to transition to next based on the current state;

(b) determining means that, before the transition to the next state, determines whether the operating sound produced in that state will obstruct voice recognition in the voice recognition mode, based on the sound pressure level of the voice the user uttered when activating the mode;

(c) control means that, when obstruction is determined, maintains the current state until the content of the image formation instruction is confirmed by voice recognition, and transitions to the next state once the instruction content is confirmed.

When the current state is the sleep state, the specifying means may specify the warm-up state as the next state. The determining means may then decide whether recognition will be obstructed by comparing two levels: the sound pressure level of the operating sound produced in the warm-up state, and the sound pressure level of the user's utterance when the voice recognition mode was activated.

When the current state is a state of waiting to execute an image forming job, the specifying means may specify, as the next state, an execution state in which the waiting job is executed. The determining means may then decide whether the operating sound will obstruct voice recognition by comparing two levels: the sound pressure level of the operating sound produced while the waiting job is executed, and the sound pressure level of the user's utterance when the voice recognition mode was activated.

The apparatus may include a threshold table listing a plurality of sound pressure level thresholds. Each threshold indicates the sound pressure level of the operating sound produced by driving one or more mechanical parts, singly or simultaneously. These parts may be built into the apparatus, located in a post-processing device connected to it, or both. The determining means may then decide whether voice recognition will be obstructed by comparing the sound pressure level of the user's voice with the threshold, among those listed in the table, that corresponds to the next state specified by the specifying means.

The apparatus may include recording means for recording the operating sound produced by driving, singly or simultaneously, any of the mechanical parts built into the apparatus or connected to it. The thresholds in the threshold table may then be set based on the sound pressure levels of the recorded operating sounds.

The mechanical parts built into the apparatus may include at least two of the following:

- an exposure unit that exposes a photoreceptor to form an electrostatic latent image;
- a developing unit that develops the latent image formed on the photoreceptor;
- a conveying unit that conveys sheets;
- a transfer unit that transfers the developed image onto a sheet;
- a fixing unit that fixes the transferred image on the sheet;
- a document reading unit that reads documents.

The apparatus may further include notification means for the following case: voice input from the user has been accepted and recognition attempted without starting the transition to the next state specified by the specifying means, but the content of the utterance still cannot be recognized. In that case, the notification means tells the user that using the operation panel is appropriate.
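The sketch below is a minimal illustration of the notification means. It assumes a recognizer callback that returns a settings dictionary, or None when recognition fails. The function name `recognize_or_fallback`, the `max_attempts` limit, and the prompt wording are hypothetical. The disclosure only states that the user is told the operation panel is appropriate when recognition fails.

```python
from typing import Callable, Optional

# Hypothetical wording for the notification.
PANEL_PROMPT = "Please use the operation panel for this job."


def recognize_or_fallback(
    utterances: list[str],
    recognize: Callable[[str], Optional[dict]],
    max_attempts: int = 2,  # hypothetical retry limit, not from the disclosure
) -> tuple[Optional[dict], Optional[str]]:
    """Try to recognize the instruction while the transition is still held.

    Returns (instruction, None) on success, or (None, PANEL_PROMPT) when no
    attempt within max_attempts yields a recognizable instruction.
    """
    for utterance in utterances[:max_attempts]:
        instruction = recognize(utterance)
        if instruction is not None:
            return instruction, None
    return None, PANEL_PROMPT
```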

The apparatus uses the voice the user utters when activating the voice recognition mode to determine whether the operating sound in the next state will obstruct voice recognition in that mode. If it will, the current state is maintained until the content of the voice instruction for image formation is confirmed, so the user can give the instruction in a good acoustic environment. Recognition is then attempted on voice captured in that environment, so the instruction is more likely to be recognized correctly. This shortens the period during which the apparatus is not operating. If, on the other hand, the operating sound is determined not to obstruct recognition, the transition to the next state can start immediately. As a result, the apparatus can correctly derive the job content from the user's voice while maintaining its operating rate.

FIG. 1 shows the external appearance of the image forming apparatus 1000.
FIG. 2 shows the mechanical parts of the image forming apparatus 1000 that are sources of operating sound.
FIG. 3 shows the configuration of the control system of the image forming apparatus 1000.
FIG. 4 is a table showing the mechanical parts driven in each operation.
FIG. 5 is a flowchart showing the main routine of control performed by the image forming apparatus 1000.
FIG. 6(a) shows an example of a conversation sequence between the user and the smart speaker 1006. FIG. 6(b) shows an example of a conversation sequence that results in repeated utterances.
FIG. 7(a) shows a conversation sequence in which the transition to the warm-up state and the voice instruction take place in swapped order. FIG. 7(b) shows a conversation sequence in which user B requests execution of a PC print after user A utters the wake word V1.
FIG. 8 is a flowchart showing the steps of the conversation sequence performed by the smart speaker 1006 and the natural language processing unit 105.
FIG. 9 shows components added to update the thresholds listed in the threshold table 205.
FIG. 10 shows a conversation sequence that prompts the user to switch to the touch panel display 1001.

Embodiments of the image forming apparatus according to the present disclosure are described below with reference to the drawings.

[1] Image Forming Apparatus
(1-1) External Appearance of the Image Forming Apparatus
FIG. 1 shows the external appearance of the image forming apparatus 1000. The image forming apparatus 1000 is a multifunction peripheral (MFP). It accepts job settings on the touch panel display 1001 and optically reads a document image with the scanner 1004. It then forms an image according to those settings on paper fed from the paper feed cassettes 1002a, 1002b, 1002c, and 1002d.

The smart speaker 1006 is connected to the image forming apparatus 1000 as an optional device. It converses with the user, capturing the user's speech and speaking to the user, so that the apparatus can be operated by voice. When the user utters the wake word, the smart speaker 1006 wakes up and attempts to recognize the user's speech. The operation mode in which the smart speaker 1006 determines the content of the job to be executed through voice recognition is called the voice recognition mode. In this mode, the smart speaker 1006 recognizes the user's speech and outputs a text string transcribing it to the image forming apparatus 1000.

(1-2) Mechanical Parts of the Image Forming Apparatus 1000 That Are Sources of Operating Sound
As shown in FIG. 2, the image forming apparatus 1000 includes the following mechanical parts for electrophotographic image formation:

- an exposure unit 110 that exposes the photoreceptor drum 110D to form an electrostatic latent image;
- a developing unit 111 that develops the latent image on the photoreceptor drum 110D into a toner image;
- a conveying unit 113 that feeds paper from the paper feed cassettes 1002a, 1002b, 1002c, and 1002d and conveys it;
- a transfer unit 112 that transfers the developed toner image onto the intermediate transfer belt 112N, carries it to the secondary transfer position 112P, and transfers it onto the paper there;
- a fixing unit 114 that fixes the toner image onto the paper using the fixing roller 114R.

Through gear mechanisms, these mechanical parts apply the rotational driving force of drive motors (see 110M, 111M, 112M, 113M, and 114M in FIG. 3) to roller members such as the photoreceptor drum 110D and the fixing roller 114R. While this driving force is applied, the operating sound of each mechanical part includes several components: the mechanical noise of gears meshing in the gear mechanism, the mechanical noise of rotating shafts sliding in the roller bearings, and vibration of frame members.

Which mechanical parts are driven differs between the warm-up state and the job execution state, and also depends on the content of the job. The operation example below describes the mechanical parts driven in the warm-up state and in each job in detail.

[2] Control System of the Image Forming Apparatus 1000
FIG. 3 shows the configuration of the control system of the image forming apparatus 1000. The control system has two parts: one for image formation, and one for preserving the acoustic environment in the voice recognition mode. The control system for image formation is described first. As shown in FIG. 3, it includes a panel control unit 101, a queue memory 102, a job data execution unit 103, a mechanical controller 104, a natural language processing unit 105, and a communication control unit 106. It controls the exposure unit 110, the developing unit 111, the transfer unit 112, the conveying unit 113, and the fixing unit 114.

(2-1) Panel Control Unit 101
The panel control unit 101 is an input/output module that displays job setting screens on the touch panel display 1001 and detects user operations on those screens. Once the scanner 1004 has read a document and the user has performed a setting operation on the touch panel display 1001, the panel control unit 101 generates job data. The job data contains the image data read by the scanner 1004 and the content of the setting operation, and is stored in the queue memory 102.

(2-2) Queue Memory 102
The queue memory 102 is a first-in, first-out (FIFO) memory. It stores multiple pieces of job data, each describing a copy to be executed, in execution order.

(2-3) Job Data Execution Unit 103
The job data execution unit 103 is a setting data interpretation module that interprets the setting data contained in each piece of job data stored in the queue memory 102. It takes the job data out of the queue memory 102 one piece at a time and converts each page into image data (an exposure pattern) for printing. It then writes the converted image data into the image memory 103M.

(2-4) Mechanical Controller 104
The mechanical controller 104 consists of an ASIC, a microcomputer system, and the like. Through the pulse width modulation circuit 104W, it adjusts the rotational speeds of the drive motors 110M, 111M, 113M, and 114M. This controls the rotation of the roller members and other parts of the exposure unit 110, the developing unit 111, the transfer unit 112, the conveying unit 113, and the fixing unit 114. Specifically, the mechanical controller 104 controls the following rotations (see FIG. 2):

- the photoreceptor drum 110D and the polygon mirror 110P in the exposure unit 110;
- the agitating screws 111S and 111T and the developing roller 111R in the developing unit 111;
- the drive roller 112R, around which the intermediate transfer belt 112N is stretched;
- the pickup roller 121 and the paper feed roller 122;
- the start of rotation of the timing roller 123;
- the fixing roller 114R.

This control makes the following steps run in sequence: exposure scanning by the exposure unit 110, development of the electrostatic latent image by the developing unit 111, primary and secondary transfer of the toner image by the transfer unit 112, paper conveyance by the conveying unit 113, and heat fixing by the fixing unit 114.

(2-5) Natural Language Processing Unit 105
The natural language processing unit 105 is a text-based conversation module. It has the smart speaker 1006 carry out a conversation sequence such as the one shown in FIG. 6(a), described later. The conversation sequence consists of three steps:

(1) analyzing the semantic content of the text string output from the smart speaker 1006;
(2) determining whether the analyzed content is valid as a job instruction;
(3) responding by passing a text string expressing the analyzed content to the smart speaker 1006, which speaks it so that the user can confirm whether the analysis is correct.

If the user affirms the spoken content, the natural language processing unit 105 generates job data containing the analyzed instruction and the image data read by the scanner 1004, and stores it in the queue memory 102.
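The three steps of the conversation sequence can be sketched as follows. This is an illustration only. It uses a toy keyword matcher in place of real natural language analysis, and the rule table, setting names, and prompt format are hypothetical, not taken from the disclosure.

```python
import re

# Hypothetical keyword rules standing in for semantic analysis.
RULES = {
    r"2\s*-?\s*in\s*-?\s*1": ("layout", "2in1"),
    r"stapl": ("finishing", "staple"),
    r"punch": ("finishing", "punch"),
}


def analyze(text: str) -> dict:
    """Step (1): extract job settings from the recognized text string."""
    lowered = text.lower()
    settings = {}
    for pattern, (key, value) in RULES.items():
        if re.search(pattern, lowered):
            settings[key] = value
    if "copy" in lowered:
        settings["job"] = "copy"
    return settings


def is_valid_job(settings: dict) -> bool:
    """Step (2): the content counts as a job instruction only if it names a job."""
    return "job" in settings


def confirmation_text(settings: dict) -> str:
    """Step (3): text the smart speaker reads back for the user to confirm."""
    extras = [f"{k}={v}" for k, v in sorted(settings.items()) if k != "job"]
    suffix = f" with {', '.join(extras)}" if extras else ""
    return f"{settings['job']}{suffix}. Is that correct?"
```

If the user answers yes, the settings would be packaged with the scanned image data into job data and stored in the queue memory 102. That step is omitted here.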

(2-6) Communication Control Unit 106
The communication control unit 106 is a communication module that carries out layered communication protocols. It receives job data sent over a wireless or wired network from the personal computer terminal 1011 shown in FIG. 1, and stores it in the queue memory 102. The job data sent by the terminal 1011 is a print request from a personal computer, called a PC print.

Voice recognition by the smart speaker 1006 requires the conversation sequence described above and therefore takes time, so a PC print can interrupt it. Suppose the PC print job data from the terminal 1011 is stored in the queue memory 102 after the voice recognition mode is activated, but before the job data generated by the natural language processing unit 105 is stored. The execution order is then swapped, and the PC print runs first.
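The reordering can be illustrated with a plain FIFO. In this sketch, the function name and the arrival times are hypothetical. A voice-instructed job is stored only when its conversation sequence ends, so a PC print that arrives in the meantime is stored, and therefore executed, first.

```python
from collections import deque


def execution_order(arrivals: list[tuple[float, str]]) -> list[str]:
    """Model queue memory 102 as a FIFO: jobs run in the order they are stored.

    `arrivals` lists (time_stored, job_name) pairs; a voice job's time is the
    moment its conversation sequence finished, not when the wake word was said.
    """
    queue = deque()
    for _, job in sorted(arrivals):  # store in time order
        queue.append(job)
    order = []
    while queue:
        order.append(queue.popleft())  # first in, first out
    return order
```

For example, a user says the wake word at t = 0 and confirms the instruction at t = 8, while a PC print arrives at t = 3. The result is `["pc_print", "voice_copy"]`.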

[3] Components for Control in the Voice Recognition Mode
The components described above are those for image formation. The components for preserving the acoustic environment in the voice recognition mode are described next.

(3-1) State Control Unit 201
The state control unit 201 manages the current state register 201R. It also decides whether to have the job data execution unit 103 execute the job data stored in the queue memory 102. The decision depends on whether job data is present in the queue memory 102 and whether a user operation has been made. The current state register 201R is one of the environment variable registers representing the state of the image forming apparatus 1000. It indicates which of four states the apparatus is currently in: the job execution state, the waiting-for-execution state, the sleep state, or the warm-up state.

In the sleep state, the backlight of the touch panel display 1001 is turned off. Power to the exposure unit 110, the developing unit 111, the transfer unit 112, the conveying unit 113, and the fixing unit 114 is cut off, and only the control system shown in FIG. 3 remains powered. When the image forming apparatus 1000 leaves the sleep state, it transitions to the warm-up state.

(3-2) Next State Specifying Unit 202
The next state specifying unit 202 specifies which state to transition to next from the current state indicated by the current state register 201R, following the state transition pattern of the image forming apparatus. When the current state register 201R indicates the sleep state, the warm-up state is specified as the next state. When the register indicates the waiting-for-execution state and new job data has been stored in the queue memory 102, the job execution state is specified as the next state. In this case, the content of the job to be executed in the next state is also specified from the job data stored in the queue memory 102.
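The transition rule described above can be sketched as follows. The sketch assumes that the four states held in the current state register 201R are modeled as an enum, and that the queue is a list with its head at index 0. `DeviceState` and `specify_next_state` are hypothetical names.

```python
from enum import Enum, auto
from typing import Optional


class DeviceState(Enum):
    """The four states the current state register 201R can indicate."""
    SLEEP = auto()
    WARM_UP = auto()
    WAITING = auto()    # waiting for job execution
    EXECUTING = auto()  # job execution


def specify_next_state(
    current: DeviceState, queued_jobs: list[str]
) -> tuple[Optional[DeviceState], Optional[str]]:
    """Return (next_state, job_to_run), or (None, None) if no transition is pending."""
    if current is DeviceState.SLEEP:
        return DeviceState.WARM_UP, None
    if current is DeviceState.WAITING and queued_jobs:
        # The job run next is the head of the FIFO queue memory.
        return DeviceState.EXECUTING, queued_jobs[0]
    return None, None
```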

(3-3) Wake Word Capture Unit 203
The wake word capture unit 203 is an input module that controls the input and output of audio signals. When the wake word is input to the smart speaker 1006 and the voice recognition mode is activated, the unit takes the digital audio data captured at that moment and stores it in the audio waveform memory 203M as the audio data of the wake word.

(3-4) Sound Pressure Level Calculation Unit 204
The sound pressure level calculation unit 204 is a signal processing module that processes audio signals. It obtains, in decibels, the sound pressure level of the audio waveform captured for the wake word. The calculation proceeds as follows:

1. N voltage samples V[n] are taken from the range of the wake-word waveform corresponding to an integer multiple of the fundamental frequency of the user's voice.
2. The samples are averaged to give the effective voltage of the wake word.
3. With the effective voltage Ve [V], the microphone sensitivity F [V/Pa], and the amplifier gain A, the sound pressure level Lx [dB] of the wake word is calculated by Equation 1.
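Equation 1 itself is not reproduced in this text. The sketch below assumes it follows the standard microphone relation: the sound pressure is Ve / (A·F), and the level is referenced to P0 = 20 µPa. Under that assumption, Lx = 20·log10(Ve / (A·F·P0)). The sketch also treats the "effective voltage" as the RMS value of the samples, because a plain arithmetic mean of an audio waveform is close to zero. The function names are hypothetical.

```python
import math

P0 = 20e-6  # reference sound pressure in Pa (20 µPa)


def effective_voltage(samples: list[float]) -> float:
    """RMS of the voltage samples V[n] taken from the wake-word waveform."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def wake_word_spl(samples: list[float], mic_sensitivity_v_per_pa: float,
                  amp_gain: float) -> float:
    """Lx in dB SPL, assuming Lx = 20*log10(Ve / (A*F*P0))."""
    ve = effective_voltage(samples)
    pressure_pa = ve / (amp_gain * mic_sensitivity_v_per_pa)  # undo amp and mic
    return 20 * math.log10(pressure_pa / P0)
```

With an RMS voltage of 0.02 V, a sensitivity of 0.01 V/Pa, and a gain of 100, the sound pressure is 0.02 Pa, which corresponds to 60 dB.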

(3-5) Threshold Table 205
The threshold table 205 consists of a memory or similar device that stores data in a matrix structure. It holds thresholds for the sound pressure levels produced when the image forming apparatus 1000 enters either the warm-up state or the job execution state. The warm-up threshold indicates the sound pressure level the mechanical parts produce when the apparatus is in the warm-up state. The table also holds three job thresholds: one for PC printing, one for PC printing with post-processing, and one for fax reception. Each indicates the sound pressure level the mechanical parts produce when that job is executed.

(3-6) Threshold Comparison Unit 206
The threshold comparison unit 206 consists of two parts: a memory read circuit that accesses the threshold table 205, and a comparator. The comparator compares the threshold read out with the sound pressure level calculated by the sound pressure level calculation unit 204.

When the warm-up state is specified as the next state, the unit reads the warm-up threshold from the threshold table 205. When the job execution state is specified and the job to be executed has also been specified, it reads the threshold for that job. In either case, it then compares the threshold with the sound pressure level of the wake word the user uttered when activating the voice recognition mode.

(3-7) Maintenance Control Unit 207
The maintenance control unit 207 is an instruction module that sends the mechanical controller 104 an instruction signal to transition to the next state. It performs instruction control according to voice recognition by the smart speaker 1006 and the natural language processing unit 105, as follows.

The threshold comparison unit 206 may find that the wake word's sound pressure level at the time of voice input is lower than the threshold read from the threshold table 205. In that case, the maintenance control unit 207 withholds the instruction signal until the conversation sequence of the smart speaker 1006 has identified the content of the user's instruction. Once the content is clear, it sends the instruction signal to the mechanical controller 104, directing it to transition to the next state.

[4] Operation example
The operation of the apparatus configured as above will now be described.

(4-1) Setting of threshold table 205
The manufacturer of the image forming apparatus 1000 sets a threshold for each operation in the threshold table 205 in advance. Let L₁, L₂, L₃, …, Lₙ be the sound pressure levels emitted by the n mechanisms driven in a given operation, and let P₀ be the reference sound pressure (20 μPa). Then L₁, L₂, L₃, …, Lₙ are calculated by Equation 2.

When the image forming apparatus 1000 performs an operation, the sound pressure level of the resulting operating sound is calculated by Equation 3 from the sound pressure levels L₁, L₂, L₃, …, Lₙ of the mechanisms driven in that operation.
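The images of Equations 2 and 3 do not survive in this text. If they are the standard acoustic definitions, which the reference pressure P₀ = 20 μPa suggests, they would read as follows; the symbols Pᵢ (RMS sound pressure of the i-th mechanism) and L_Σ (combined level) are introduced here for illustration and are assumptions:

```latex
\begin{align}
L_i &= 20\log_{10}\frac{P_i}{P_0}\ \mathrm{dB}, \qquad P_0 = 20\ \mu\mathrm{Pa},\quad i = 1, 2, \dots, n \tag{2}\\
L_{\Sigma} &= 10\log_{10}\sum_{i=1}^{n} 10^{L_i/10}\ \mathrm{dB} \tag{3}
\end{align}
```

Equation 3 in this form assumes the sources are incoherent, so their sound intensities, not their pressures, add.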

(4-1-1) Mechanisms driven in the warm-up state and in each job
The mechanisms that are driven differ from operation to operation. FIG. 4 lists, in table form, the mechanisms driven in each operation. In the warm-up state, multiple mechanisms, including the paper feed cassettes 1002a and 1002b, operate simultaneously. In PC printing with post-processing, by contrast, the sheets printed by the image forming apparatus 1000 are switched back, stored in the post-processing tray 1005F shown in FIG. 2, stapled, punched, or otherwise processed, and discharged onto the discharge tray 1005T of the post-processing device 1005. The thresholds for copying with post-processing and for PC printing with post-processing are therefore calculated from the sound pressure level of the post-processing operating sound, such as that of the stapler and punch.

In double-sided PC printing, a sheet that has been switched back at the discharge port 119 is fed by the timing roller 123 to the secondary transfer position 112P. The threshold for double-sided PC printing is therefore calculated from the operating sound generated in the reversing conveyance path 116.

By substituting the noise levels of the mechanisms shown in FIG. 2 into Equation 3 as the sound pressure levels Lᵢ (i = 1, 2, 3, …, n), the combined sound pressure level produced when several mechanisms operate simultaneously is calculated. As an example, the fixing device 114 produces 50 dB, the scan unit 1004S of the scanner 1004 produces 60 dB, the post-processing device 1005 produces 70 dB, and the transport section 113 shown in FIG. 2 produces 60 dB during operation. These values are substituted into Equation 3 to obtain the combined sound pressure level, which is written into the threshold table 205 as the threshold.
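Assuming Equation 3 is the usual energetic sum of levels, the example values above give a threshold of about 70.8 dB. The Python sketch below carries out that arithmetic; the function name `combined_spl_db` is hypothetical.

```python
import math


def combined_spl_db(levels_db):
    """Equation 3 (assumed energetic form): add the intensities of incoherent sources."""
    if not levels_db:
        raise ValueError("at least one level is required")
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


# Example levels quoted in the text, in dB.
example_levels = {
    "fixing device 114": 50.0,
    "scan unit 1004S": 60.0,
    "post-processing device 1005": 70.0,
    "transport section 113": 60.0,
}

threshold_db = combined_spl_db(list(example_levels.values()))
print(f"{threshold_db:.2f} dB")  # 70.83 dB
```

Because levels combine on an energy basis, the loudest source dominates: the 50 dB and two 60 dB sources raise the 70 dB post-processing level by less than 1 dB.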

(4-1-2) Mechanisms driven in the warm-up state
The reasons why the mechanisms shown in FIG. 2 are driven simultaneously in the warm-up state are as follows.

i. While the apparatus is in the sleep state, the paper stacks in the paper feed cassettes 1002a, 1002b, 1002c, and 1002d may have been lowered. In the warm-up state, therefore, the paper feed cassettes 1002a, 1002b, 1002c, and 1002d are lifted up.

ii. At the end of the previous job, the scan unit 1004U of the scanner 1004 (see FIG. 2) may have been left away from its home position and must be returned to it. In the warm-up state, therefore, the scanner 1004 moves the scan unit to the home position.

iii. The fixing roller 114R of the fixing device 114 must be heated to a fixing temperature of, for example, 160 °C, and must be rotated while heating so that heat spreads over its circumferential surface. In the warm-up state, therefore, the fixing device 114 starts heating and starts rotating the fixing roller 114R.

iv. Dirt produced during earlier jobs may remain on the intermediate transfer belt 112N of the transfer device 112. To remove it, the transfer device 112 drives the intermediate transfer belt 112N around its loop during the warm-up state.

v. Because the developer in the stirring tank 111L of the developing device 111 has been left standing, its bulk may have decreased. In the warm-up state, therefore, the developing device 111 stirs the developer accumulated in the stirring tank 111L to restore its bulk.

vi. Dirt produced during earlier jobs may remain on the photosensitive drum 110D. To remove it, the photosensitive drum 110D is rotated during the warm-up state.

In the warm-up state, the paper feed cassettes 1002, the scanner 1004, the fixing roller 114R, the intermediate transfer belt 112N, the stirring screws 111S and 111T, and the photosensitive drum 110D are all driven simultaneously, so the operating sound reaches a considerable sound pressure level. A threshold based on the sound pressure level of this operating sound is recorded for each operation in the threshold table 205.

(4-1-3) Effect of operating sound on the voice recognition mode
This section explains how operating sound, such as that in the warm-up state, affects speech recognition. In the voice recognition mode, the recognition unit of the smart speaker 1006 applies signal processing such as spectrum modulation and Fourier transform to the signal of the user's voice and obtains feature vectors representing the utterance. When the S/N ratio of the input speech is high, the distance in feature space between the feature vector of the user's utterance of one phoneme and the feature vectors of utterances of other phonemes is large. It is then clear which phoneme the feature vector represents, and the user's speech is converted into the correct phonemes. When the S/N ratio is low, however, that distance becomes small, it becomes ambiguous which phoneme the feature vector of the user's speech represents, and recognition accuracy decreases.

Because the S/N ratio of the user's speech strongly affects the accuracy of phoneme conversion and speech recognition, the threshold table 205 is used, before the apparatus moves to the warm-up state or starts executing a job, to determine whether the operating sound of that warm-up or job would hinder recognition of the user's speech.

(4-2) Operation of the image forming apparatus 1000
The flowchart of FIG. 5 expresses, using programmatic notation such as conditional branches and loops, the series of processes performed by the wake word capture unit 203, the sound pressure level calculation unit 204, the next state identification unit 202, the threshold comparison unit 206, and the maintenance control unit 207. The flowchart of FIG. 8 expresses, in the same notation, the conversation sequence performed by the natural language processing unit 105. The operation of the image forming apparatus 1000 is described below with reference to these flowcharts.

Suppose that one user (user A) stands in front of the image forming apparatus 1000 shown in FIG. 1, speaks the wake word to the smart speaker 1006, and instructs it to perform some job (copy, scan, or fax transmission).

The wake word capture unit 203 waits for the wake word to be input by voice to the smart speaker 1006 (step S101). When voice input is received (Yes in step S101), the volume of the wake word utterance is acquired (step S102), and processing moves to the determination steps S103 and S104. Step S103 determines whether the queue memory 102 holds no job data and the apparatus is in the sleep state. Step S104 determines whether, although the queue memory 102 was empty, job data requested from the terminal 1011 has arrived after the wake word was input. If the image forming apparatus 1000 is waiting to execute and no job from another user has been stored in the queue memory 102 after the wake word was input, steps S103 and S104 both result in No, and processing moves to step S105. In step S105, job data including the image data read by the scanner 1004 and the instruction content recognized by the voice recognition of the smart speaker 1006 is generated and stored in the queue memory 102. The apparatus then enters the job execution state (step S106) and executes the job data stored in the queue memory 102 (step S107).

FIG. 6(a) shows an example conversation sequence between the user and the smart speaker 1006. As shown in the figure, the sequence consists of utterance V1, the wake word that activates the voice recognition mode; utterance V2, a specific instruction such as "Copy 2-in-1"; response R1, in which the smart speaker 1006 asks the user to confirm the recognized content, for example "Copy 2-in-1. Is that correct?"; and utterance V3, in which the user gives final confirmation, for example "Start."

(4-3) When the image forming apparatus 1000 is in the sleep state
If the apparatus is in the sleep state, the next state is the warm-up state. If the apparatus moves to the warm-up state after the voice recognition mode has been activated, utterances are repeated as in the conversation sequence of FIG. 6(b): the smart speaker 1006 responds "I can't hear you" (R11, R12, R13), and the user re-issues the job instruction by voice (V13, V14, V15). If the user's speech has a sufficient sound pressure level, such repetition does not occur. However, when people are nearby, a meeting is being held close by, or the office is quiet during a break, having to speak loudly in front of the image forming apparatus is unpleasant for the user. Consideration must also be given to users who are unable to speak up because of their mental state, an illness, a disability, or the like.

Therefore, when the wake word is uttered in the sleep state and the voice recognition mode is activated, step S103 results in Yes, and the processing from step S108 onward is performed. That is, the next state identification unit 202 identifies the warm-up state as the state to move to next. The threshold corresponding to the warm-up state is read from the threshold table 205 and compared with the sound pressure level of the wake word utterance to determine whether that level is equal to or lower than the warm-up threshold (step S108).

If the sound pressure level of the wake word is equal to or lower than the threshold (Yes in step S108), the current state is maintained (step S112) until the instruction content is determined through the conversation sequence of the natural language processing unit 105 (No in step S111).

Once the instruction content has been determined by voice recognition (Yes in step S111), job data including the recognized instruction content and the image data is generated and stored in the queue memory 102 (step S113). The apparatus then moves to the warm-up state (step S114), enters the job execution state (step S106), and executes the job data stored in the queue memory 102 (step S107). In this case, the sequence is as shown in FIG. 7(a): after the wake word V1 is uttered and the voice instruction V2 is given, the transition W1 to the warm-up state begins. Because the voice instruction V2 is given in a good acoustic environment free of operating sound, the repeated smart speaker 1006 responses and repeated voice instructions shown in FIG. 6(b) do not occur.

If the sound pressure level of the user's wake word utterance exceeds the warm-up threshold recorded in the threshold table 205 (No in step S108), the apparatus moves to the warm-up state (step S109). Job data including the image data read by the scanner 1004 and the instruction content recognized by the voice recognition of the smart speaker 1006 is then generated and stored in the queue memory 102 (step S105).

(4-4) When a PC print job interrupts
This section describes the case in which the queue memory 102 was empty but job data requested from the terminal 1011 arrives after the wake word has been input. Specifically, if job data requested by user B arrives in the office described above after the wake word has been input, step S103 in FIG. 5 results in No and step S104 results in Yes. The threshold comparison unit 206 compares the threshold for the operating sound of the job to be executed next, read from the threshold table 205, with the sound pressure level of the wake word (step S115). If the sound pressure level of the wake word is equal to or lower than the threshold (Yes in step S115), the current state is maintained (step S117) until the instruction content is determined through the conversation sequence of the natural language processing unit 105 (step S116). Once the instruction content has been determined by voice recognition (Yes in step S116), job data including the recognized instruction content and the image data is generated and stored in the queue memory 102 (step S105). FIG. 7(b) shows the conversation sequence for the case in which user B requests a PC print after user A has uttered the wake word V1.

In this case, a request for user B's job arrives after the wake word has been input (L1), and its job data is stored in the queue memory 102, but the start of user B's job is deferred until user A has given the job instruction V2 by voice (L2). Because user A's speech is recognized in a quiet acoustic environment, the user does not have to repeat voice input in response to repeated responses.

(4-5) When the voice recognition mode is activated during job execution
If job data was stored in the queue memory 102 before the wake word was uttered and execution of the job has already started, steps S103 and S104 in FIG. 5 both result in No, and the sound pressure level of the user's voice when uttering the wake word is not compared with a threshold (steps S108 and S115 are not performed). That the voice recognition mode has been activated means the user's voice had a sound pressure level sufficient to input the wake word, so continuing to execute the jobs stored in the queue memory 102 without stopping them is considered unproblematic. Even if the user's voice is not loud enough, a job that is already running is not stopped, because the work efficiency of the image forming apparatus 1000 is considered to take priority.
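The branching of FIG. 5 described in sections (4-2) to (4-5) can be traced with the Python sketch below. It is an illustrative model, not the apparatus's firmware: all parameter names are hypothetical, `instruction_confirmed` stands in for the FIG. 8 conversation sequence, and because the text does not describe the No branch of step S115, the sketch assumes processing simply continues to step S105.

```python
from typing import Callable, List


def handle_wake_word(
    wake_word_db: float,
    sleeping: bool,
    queue_empty: bool,
    terminal_job_arrived: bool,
    warmup_threshold_db: float,
    next_job_threshold_db: float,
    instruction_confirmed: Callable[[], bool],
) -> List[str]:
    """Return the FIG. 5 steps visited after the wake word, in order.

    queue_empty describes queue memory 102 at the moment the wake word was captured;
    terminal_job_arrived means a job from terminal 1011 arrived after that moment.
    """
    trace = ["S101", "S102"]  # wake word captured, its level measured

    def hold_until_confirmed(check: str, hold: str) -> None:
        # Keep the current state while the FIG. 8 conversation sequence runs.
        while True:
            trace.append(check)
            if instruction_confirmed():
                return
            trace.append(hold)

    if sleeping and queue_empty:                        # S103: Yes
        trace.append("S103:yes")
        if wake_word_db <= warmup_threshold_db:         # S108: Yes
            trace.append("S108:yes")
            hold_until_confirmed("S111", "S112")
            trace += ["S113", "S114"]                   # queue the job, then warm up
        else:                                           # S108: No
            trace += ["S108:no", "S109", "S105"]        # warm up, then queue the job
    elif queue_empty and terminal_job_arrived:          # S104: Yes
        trace += ["S103:no", "S104:yes"]
        if wake_word_db <= next_job_threshold_db:       # S115: Yes
            trace.append("S115:yes")
            hold_until_confirmed("S116", "S117")
        else:
            trace.append("S115:no")                     # not described; assumed to continue
        trace.append("S105")
    else:                                               # S103: No, S104: No
        trace += ["S103:no", "S104:no", "S105"]         # no level comparison at all
    trace += ["S106", "S107"]                           # job execution state, run the queue
    return trace


print(handle_wake_word(80.0, True, True, False, 70.0, 60.0, lambda: True))
```

Returning a trace of step labels rather than driving hardware keeps the sketch testable; a real controller would replace the list appends with signals to the mechanical controller 104.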

(4-6) Process until the instruction content is identified by voice recognition
FIG. 8 is a flowchart of the conversation sequence performed by the natural language processing unit 105 and the smart speaker 1006. This flowchart is executed after the wake word has been input and the voice recognition mode has been activated. The process waits for the user to speak to the smart speaker 1006 (step S120). When the user speaks (Yes in step S120), the user's speech is converted into phonemes (step S121), and the phonemes are converted into words (step S122). The words are applied to a learning model such as a hidden Markov model to generate an utterance sentence (step S123), and the text string of that sentence is passed to the natural language processing unit 105. The natural language processing unit 105 determines whether the utterance received from the smart speaker 1006 is meaningful as a job setting (step S124). If it is (Yes in step S124), a text string expressing the meaning found by the analysis of the natural language processing unit 105 is passed to the smart speaker 1006, which speaks that meaning aloud as a response to the user (step S127).

It is then determined whether the user replies affirmatively to this response (step S128); if so, this flowchart ends. If the user replies negatively, the process returns to step S120 and again waits for the user to speak. If the utterance is not meaningful as a job (No in step S124), guidance such as "Please speak louder" or "Please move closer to the microphone" is issued to prompt the user to repeat the voice input (step S125), and the process returns to step S120 to wait for the user to speak again. Through this process, the content of the voice instruction is identified.
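The FIG. 8 loop can be sketched in Python as follows, assuming the recognition stages are injected as callables. The names `listen`, `to_job_setting`, `speak`, and `is_affirmative` are hypothetical stand-ins: `listen` covers steps S120 to S123, and `to_job_setting` covers the step S124 check, returning `None` when the utterance is not a meaningful job setting.

```python
from typing import Callable, Optional


def conversation_sequence(
    listen: Callable[[], str],
    to_job_setting: Callable[[str], Optional[str]],
    speak: Callable[[str], None],
    is_affirmative: Callable[[str], bool],
) -> str:
    """Loop of FIG. 8 (steps S120 to S128); returns the confirmed job setting."""
    while True:
        utterance = listen()                 # S120-S123: speech -> phonemes -> words -> text
        setting = to_job_setting(utterance)  # S124: meaningful as a job setting?
        if setting is None:
            speak("Please speak louder or move closer to the microphone.")  # S125
            continue                         # back to S120
        speak(f"{setting}. Is that correct?")  # S127: read the understood setting back
        if is_affirmative(listen()):         # S128
            return setting
        # A negative reply also returns to S120 and waits for a new instruction.


# Scripted usage: one unintelligible utterance, one rejected reading, then a confirmation.
replies = iter(["mumble", "copy 2in1", "no", "copy 2in1 with staple", "yes"])
settings = {"copy 2in1": "2-in-1 copy", "copy 2in1 with staple": "2-in-1 copy, stapled"}
spoken = []
result = conversation_sequence(lambda: next(replies), settings.get, spoken.append,
                               lambda answer: answer == "yes")
print(result)  # 2-in-1 copy, stapled
```

Injecting the stages keeps the control flow of FIG. 8 separate from any particular recognizer or NLP back end.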

[5] Summary
As described above, according to the present embodiment, before the apparatus moves to the warm-up state or starts executing a job, it determines whether the operating sound of that warm-up or job would impair the acoustic environment for speech recognition. If it would, the drive units are not moved to the warm-up state and the job is not executed, which keeps the speech recognition environment favorable. Because a good acoustic environment is maintained so that the speech following the wake word can be heard, a decline in the recognition rate caused by the operating sound of the image forming apparatus 1000 can be avoided.

Whether the operating sound of the warm-up state or of job execution would impair the acoustic environment for speech recognition is determined from the sound pressure level of the wake word uttered by the user and the threshold recorded in the threshold table, so voice input and speech recognition do not have to be repeated. The work efficiency of the office where the image forming apparatus 1000 is installed is therefore not reduced.

[6] Modifications
Although the present invention has been described on the basis of an embodiment, it is of course not limited to that embodiment, and the following modifications are conceivable.

(1) In the above embodiment, the thresholds in the threshold table are written in advance, but this is not a limitation. The image forming apparatus 1000 may successively update the thresholds recorded in the threshold table 205.

FIG. 9 shows the components for updating the thresholds: a recording unit 211 and a storage 212 are added.

When a drive unit starts operating, the recording unit 211 records, through the microphone of the smart speaker 1006, the ambient sound including the operating sound of that drive unit. It then stores recording files containing the recorded audio data (warm-up recording file 212W, copy recording file 212C, post-processing copy recording file 212F, and PC print recording file 212P) in the storage 212.

The sound pressure level calculation unit 204 calculates the sound pressure level of each recording file and uses the calculated levels to create the threshold table 205. Because the operating sounds of the mechanisms shown in FIG. 2 in the warm-up state and the operating sound during copying are recorded together with the ambient noise at the installation site, the sound pressure levels recorded in the threshold table 205 approach those that actually occur where the image forming apparatus 1000 is used.

Furthermore, recording by the recording unit 211 and calculation of the sound pressure level by the sound pressure level calculation unit 204 may be repeated each time a certain period elapses, and the threshold table 205 updated accordingly. An increase in operating sound caused by deterioration of the image forming apparatus 1000 or the post-processing device 1005 is then reflected in the threshold table, enabling more accurate determination.
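A minimal Python sketch of turning stored recordings into table entries follows. It assumes the recorded samples are already calibrated in pascals (a smart speaker microphone would need a calibration step the text does not describe) and uses the RMS pressure with P₀ = 20 μPa; `spl_from_samples` and `rebuild_threshold_table` are hypothetical names, and the file labels follow 212W, 212C, and so on.

```python
import math
from typing import Dict, Sequence

P0_PA = 20e-6  # reference sound pressure P0 = 20 μPa


def spl_from_samples(samples_pa: Sequence[float]) -> float:
    """Equivalent sound pressure level in dB of samples calibrated in pascals."""
    if len(samples_pa) == 0:
        raise ValueError("empty recording")
    rms = math.sqrt(sum(p * p for p in samples_pa) / len(samples_pa))
    if rms == 0.0:
        return float("-inf")  # a perfectly silent recording has no finite level
    return 20.0 * math.log10(rms / P0_PA)


def rebuild_threshold_table(recordings: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Map each stored recording (e.g. 212W, 212C, 212F, 212P) to its level in dB."""
    return {name: spl_from_samples(samples) for name, samples in recordings.items()}


# A +/-0.2 Pa square wave has an RMS of 0.2 Pa, i.e. 80 dB; +/-0.02 Pa gives 60 dB.
table = rebuild_threshold_table({
    "212W": [0.2, -0.2] * 100,
    "212C": [0.02, -0.02] * 100,
})
print({name: round(level, 1) for name, level in table.items()})  # {'212W': 80.0, '212C': 60.0}
```

Since each recording already contains the site's ambient noise, no separate combination step is needed; running the function again after a fixed interval would implement the periodic update.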

(2) In the above embodiment, the transition to the warm-up state and the start of a job requested by another user are deferred, but speech recognition may still fail even with this deferral. In that case, the natural language processing unit 105 may issue guidance to switch from voice input to operation of the touch panel display 1001. FIG. 10 shows the conversation sequence for prompting this switch: if the job content cannot be recognized even though the voice instruction V2 has been given, a response R21 prompting the user to switch to the touch panel display 1001 is returned. The user then performs operations P1 and P2 on the touch panel display 1001, so voice input does not have to be repeated.
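The fallback of modification (2) can be sketched as a bounded retry in Python. The function name `recognize_or_fallback` and the `max_attempts` parameter are hypothetical; the text does not say how many failed recognitions trigger response R21.

```python
from typing import Callable, Optional


def recognize_or_fallback(try_recognize: Callable[[], Optional[str]],
                          max_attempts: int = 2) -> Optional[str]:
    """Return the recognized job, or None to signal a switch to touch panel display 1001.

    max_attempts is hypothetical: the text does not fix the number of attempts
    before the guidance (response R21 in FIG. 10) is given.
    """
    for _ in range(max_attempts):
        job = try_recognize()
        if job is not None:
            return job
    return None  # caller announces R21 and hands over to touch panel operations P1, P2


attempts = iter([None, None])
if recognize_or_fallback(lambda: next(attempts)) is None:
    print("Voice input could not be recognized. Please use the touch panel.")
```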

(3) In the above embodiment, the combined sound pressure levels are written in the threshold table 205 in advance, but this is not a limitation. The sound pressure level of each mechanism may instead be recorded in the threshold table 205. Each time the next state identification unit 202 determines the next operation, the combined sound pressure level is calculated with Equation 3, and the threshold comparison unit 206 compares it with the sound pressure level calculated by the sound pressure level calculation unit 204.
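A Python sketch of modification (3), computing the combined level only when the next operation is known, is given below. The four per-mechanism levels are the example values from section (4-1-1); the operation-to-mechanism mapping (standing in for FIG. 4), the dictionary names, and the functions `threshold_for` and `speech_masked` are hypothetical, and Equation 3 is again assumed to be the energetic sum.

```python
import math

# Per-mechanism levels (dB) stored in threshold table 205 instead of precomputed sums.
MECHANISM_DB = {
    "fixing_device_114": 50.0,
    "scan_unit_1004S": 60.0,
    "post_processing_1005": 70.0,
    "transport_113": 60.0,
}

# Hypothetical mapping from operation to driven mechanisms (cf. FIG. 4).
OPERATION_MECHANISMS = {
    "copy_with_post_processing": ["fixing_device_114", "scan_unit_1004S",
                                  "post_processing_1005", "transport_113"],
    "pc_print": ["fixing_device_114", "transport_113"],
}


def threshold_for(operation: str) -> float:
    """Combine the driven mechanisms' levels on demand (Equation 3, energetic sum)."""
    levels = [MECHANISM_DB[m] for m in OPERATION_MECHANISMS[operation]]
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels))


def speech_masked(operation: str, wake_word_db: float) -> bool:
    """Comparison made by threshold comparison unit 206 for the next operation."""
    return wake_word_db <= threshold_for(operation)


print(speech_masked("pc_print", 65.0), speech_masked("copy_with_post_processing", 65.0))  # False True
```

Storing per-mechanism levels means a new operation only needs a new entry in the mapping, at the cost of a small calculation each time the next state is identified.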

(4) Speech recognition technology in the smart speaker 1006 is advancing steadily, and the noise robustness of speech recognition keeps improving through the adoption of hidden Markov models, propagation neural networks, the Viterbi algorithm, deep learning, and the like. Even if speech is input in a poor acoustic environment and some or all of the phonemes and words are unclear, a plausible sentence can be inferred from the surrounding context. A coefficient representing the noise robustness of speech recognition by the smart speaker 1006 may therefore be multiplied by the combined sound pressure level of each operation, and the product recorded in the threshold table 205 as the threshold and compared with the sound pressure level of the wake word.
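The threshold of modification (4) is a single multiplication, sketched below in Python. The function names and the coefficient value 0.9 are hypothetical; the text does not give a value or say how the coefficient is obtained. A coefficient below 1 lowers the threshold, so a more noise-robust recognizer causes fewer deferrals.

```python
def robust_threshold(combined_db: float, robustness: float) -> float:
    """Modification (4): threshold = robustness coefficient x combined level (dB)."""
    if robustness <= 0:
        raise ValueError("robustness coefficient must be positive")
    return robustness * combined_db


def defer_transition(wake_word_db: float, combined_db: float, robustness: float) -> bool:
    """True if the wake word is too quiet relative to the adjusted threshold."""
    return wake_word_db <= robust_threshold(combined_db, robustness)


# With a hypothetical coefficient of 0.9, a 70.83 dB operation yields a ~63.7 dB threshold,
# so a 65 dB wake word no longer forces the apparatus to hold its state.
print(defer_transition(65.0, 70.83, 1.0), defer_transition(65.0, 70.83, 0.9))  # True False
```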

Through speech recognition by the smart speaker 1006, the image forming apparatus 1000 according to the present disclosure can be made to execute various jobs such as copying, PC printing, scanning, and faxing. It may be used not only in the office automation and information equipment industries but also in a wide range of other industries, including retail, rental, real estate, advertising, transportation, and publishing.

２０１状態制御部
２０２次状態特定部
２０３ウェークワードキャプチャ部
２０４音圧レベル算出部
２０４Ｍ音声波形メモリ
２０５閾値テーブル
２０６閾値比較部
２０７維持制御部
１０００画像形成装置
１００１タッチパネルディスプレイ
１００３排出トレイ
１００６スマートスピーカー
１０１１端末

201 State control unit
202 Next state identification unit
203 Wake word capture unit
204 Sound pressure level calculation unit
204M Audio waveform memory
205 Threshold table
206 Threshold comparison unit
207 Maintenance control unit
1000 Image forming apparatus
1001 Touch panel display
1003 Ejection tray
1006 Smart speaker
1011 Terminal

Claims

1. An image forming apparatus that activates a voice recognition mode in response to an utterance by a user and receives instructions regarding image formation by voice, the image forming apparatus comprising:
identifying means for identifying, after the voice recognition mode is activated, the state to which the apparatus should transition next, based on the current state;
determination means for determining, before the transition to the next state, whether an operation sound generated in that state would interfere with voice recognition in the voice recognition mode, based on the sound pressure level of the voice uttered by the user to activate the voice recognition mode; and
control means for, when it is determined that the operation sound would interfere with voice recognition, maintaining the current state until the content of the instruction regarding image formation is determined by voice recognition, and transitioning to the next state once the content of the instruction is determined.

2. The image forming apparatus according to claim 1, wherein
the identifying means identifies a warm-up state as the next state when the current state is a sleep state, and
the determination means determines whether the operation sound would interfere with voice recognition by comparing the sound pressure level of the operation sound generated in the warm-up state with the sound pressure level of the user's utterance that activated the voice recognition mode.

3. The image forming apparatus according to claim 1, wherein
the identifying means identifies, when the current state is a state in which an image forming job is waiting to be executed, an execution state in which the waiting job is executed as the next state, and
the determination means determines whether the operation sound would interfere with voice recognition by comparing the sound pressure level of the operation sound produced when the waiting job is executed with the sound pressure level of the user's utterance that activated the voice recognition mode.

4. The image forming apparatus according to claim 1, further comprising a threshold table listing a plurality of sound pressure level thresholds, wherein
each threshold in the threshold table indicates the sound pressure level of the operation sound produced when one or more of a plurality of mechanical units built into the apparatus and/or a plurality of mechanical units of a post-processing device connected to the apparatus are driven, either individually or simultaneously, and
the determination means determines whether the operation sound would interfere with voice recognition by comparing the threshold, among those listed in the threshold table, that corresponds to the next state identified by the identifying means with the sound pressure level of the voice uttered by the user.

5. The image forming apparatus according to claim 4, further comprising recording means for recording the operation sound generated when any of the plurality of mechanical units built into the apparatus and the plurality of mechanical units connected to the apparatus are driven, either individually or simultaneously,
wherein the thresholds in the threshold table are determined based on the sound pressure levels of the operation sounds recorded by the recording means.

6. The image forming apparatus according to claim 4, wherein the mechanical units built into the apparatus include at least two of the following: an exposure device that exposes a photoconductor to form an electrostatic latent image; a developing device that develops the electrostatic latent image formed on the photoconductor; a conveyance unit that conveys a sheet; a transfer unit that transfers the developed image onto the sheet; a fixing device that fixes the image transferred onto the sheet; and a document reading unit that reads a document.

7. The image forming apparatus according to any one of claims 1 to 6, further comprising notification means for notifying the user that using the operation panel is appropriate when voice input from the user has been accepted and voice recognition has been attempted without starting the transition to the next state identified by the identifying means, but the content of the utterance could not be recognized.
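Read together, claims 1 to 4 and 7 describe a small control loop. The following sketch models it under explicit assumptions:

- The state names, threshold values, notification text, and class and method names are hypothetical.
- The wake-word level is computed as a standard RMS-based sound pressure level re 20 µPa from samples in pascals.
- The operation sound is treated as disturbing whenever its threshold is at or above that level.

The claims do not specify what the apparatus does after a failed recognition beyond notifying the user, so the sketch simply keeps holding the current state.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional

P_REF_PA = 20e-6  # reference sound pressure in air (20 micropascals)

# Hypothetical next-state rules modelled on claims 2 and 3.
NEXT_STATE = {"sleep": "warm_up", "job_waiting": "job_executing"}


def spl_from_samples(samples_pa: List[float]) -> float:
    """Sound pressure level in dB re 20 uPa: 20*log10(p_rms / p_ref)."""
    if not samples_pa:
        raise ValueError("empty waveform")
    p_rms = math.sqrt(sum(p * p for p in samples_pa) / len(samples_pa))
    return 20.0 * math.log10(p_rms / P_REF_PA)


@dataclass
class VoiceAwareController:
    # Next state -> operation-sound SPL threshold in dB (cf. claim 4).
    threshold_table: Dict[str, float]
    state: str = "sleep"
    pending: Optional[str] = None
    messages: List[str] = field(default_factory=list)

    def on_wake_word(self, wake_word_samples_pa: List[float]) -> None:
        next_state = NEXT_STATE.get(self.state)
        if next_state is None:
            return  # no transition rule for this state in the sketch
        wake_spl = spl_from_samples(wake_word_samples_pa)
        if self.threshold_table[next_state] >= wake_spl:
            # Operation sound judged likely to disturb recognition:
            # hold the current state and defer the transition (cf. claim 1).
            self.pending = next_state
        else:
            self.state = next_state

    def on_recognition_result(self, instruction: Optional[str]) -> Optional[str]:
        """instruction is None when the utterance could not be recognised."""
        if instruction is None:
            # Cf. claim 7; the sketch keeps holding the current state.
            self.messages.append("Please use the operation panel.")
            return None
        if self.pending is not None:
            # Instruction determined: perform the deferred transition.
            self.state, self.pending = self.pending, None
        return instruction
```

Deferring the transition, rather than cancelling it, keeps the noisy operation available as soon as the instruction is known, which matches the "maintain, then transition" wording of claim 1.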