JP7439459B2 - Machine learning device, conveyance device, image forming device, machine learning method, and program

Info

Publication number: JP7439459B2
Application number: JP2019197580A
Authority: JP
Inventors: 一彦小輪▲瀬▼; 浩一斎藤; 駿菅井
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2019-10-30
Filing date: 2019-10-30
Publication date: 2024-02-28
Anticipated expiration: 2039-10-30
Also published as: JP2021071875A

Description

本発明は、機械学習装置、搬送装置、画像形成装置、機械学習方法、およびプログラムに関する。 The present invention relates to a machine learning device, a transport device, an image forming device, a machine learning method, and a program.

従来、用紙、フィルム、布等の対象物を搬送する搬送装置を備えた装置が知られている。このような装置として、たとえば、プリンター、複写機、ファクシミリ、およびこれらの複合機（ＭＦＰ：ＭｕｌｔｉｆｕｎｃｔｉｏｎＰｅｒｉｐｈｅｒａｌ）などの画像形成装置が知られている。 Conventionally, devices equipped with a conveyance device for conveying objects such as paper, film, and cloth are known. Known examples of such devices include image forming apparatuses such as printers, copiers, facsimile machines, and multifunction peripherals (MFPs) that combine these functions.

このような画像形成装置を用いる商業印刷機器分野においては、ユーザーのニーズに合った出力物（印刷物）の提供が求められている。そのため、画像形成（印刷）に使用される媒体、画像形成装置に対する要望は多岐にわたる。これらの要望に対応するためには、個別の状況に応じて画像形成装置を制御する必要がある。しかしながら、現在は、人手による設計に頼っているため、あらゆる要望には応えられていない。人手による設計では、最悪条件、代表的な条件を満たす設計にならざるを得ない。 In the field of commercial printing equipment that uses such image forming apparatuses, there is a need to provide output (printed matter) that meets users' needs. The demands placed on the media used for image formation (printing) and on the image forming apparatuses are therefore wide-ranging. Meeting these demands requires controlling the image forming apparatus according to each individual situation. At present, however, the design relies on manual work, so not every demand can be met. Manual design inevitably results in a design that satisfies only the worst-case and representative conditions.

たとえば、搬送装置内において用紙を搬送するローラーは、用紙の種類、ローラーの劣化に応じて回転速度が変わる。用紙をダメージ無く搬送し、さらに搬送路内の用紙の撓みに関する制約を満たすためには、適切にローラーの回転速度を設定する必要がある。 For example, the rotational speed of a roller that conveys paper within a conveying device changes depending on the type of paper and the deterioration of the roller. In order to convey the paper without damage and to satisfy the constraints regarding the deflection of the paper in the conveyance path, it is necessary to appropriately set the rotational speed of the rollers.

特許文献１（特開２０１４－２０１４０９号公報）には、第１の搬送ローラーと、これに隣り合う下流側の第２の搬送ローラーと、第１の搬送ローラーおよび第２の搬送ローラーをそれぞれ回転駆動する第１の駆動モーターおよび第２の駆動モーターと、用紙の撓み量を検出する撓み検出部と、第２の駆動モーターのトルク量を検出するトルク検出部と、制御モードに応じて第１の駆動モーターおよび第２の駆動モーターをそれぞれ制御する制御部とを備える画像形成装置が開示されている。制御部は、先行の用紙について検出された撓み量またはトルク量に基づいて、後続の用紙の搬送時における第１の搬送ローラーと第２の搬送ローラーとの用紙搬送速度の相対速度差を調整する。 Patent Document 1 (Japanese Unexamined Patent Application Publication No. 2014-201409) discloses an image forming apparatus that includes a first conveyance roller; a second conveyance roller adjacent to it on the downstream side; a first drive motor and a second drive motor that rotationally drive the first conveyance roller and the second conveyance roller, respectively; a deflection detection unit that detects the amount of deflection of the paper; a torque detection unit that detects the amount of torque of the second drive motor; and a control unit that controls the first drive motor and the second drive motor according to a control mode. Based on the amount of deflection or the amount of torque detected for the preceding sheet, the control unit adjusts the relative difference in paper conveyance speed between the first conveyance roller and the second conveyance roller when the subsequent sheet is conveyed.

また、近年のコンピュータの能力向上にともない、機械学習が注目を浴びている。たとえば、特許文献２（特開２０１７－０３４８４４号公報）には、電動機制御における電流ゲインのパラメーを最適化することにより、モーターの応答性の向上、送りムラの改善、および精度を向上させることを目的とした機械学習装置が開示されている。 In addition, machine learning has been attracting attention as computer performance has improved in recent years. For example, Patent Document 2 (Japanese Unexamined Patent Application Publication No. 2017-034844) discloses a machine learning device intended to improve motor responsiveness, reduce feed unevenness, and improve accuracy by optimizing a current gain parameter in electric motor control.

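特許文献１：特開２０１４－２０１４０９号公報 Patent Document 1: Japanese Unexamined Patent Application Publication No. 2014-201409
特許文献２：特開２０１７－０３４８４４号公報 Patent Document 2: Japanese Unexamined Patent Application Publication No. 2017-034844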

用紙等の対象物を搬送する搬送装置内には、複数のローラー（搬送手段）が設置されている。このため、各ローラーの回転速度の組み合わせは膨大にある。したがって、ローラーの回転速度の組み合わせ（すなわち、モーター等の駆動手段のパラメーター）を最適化することは、非常に困難である。 A plurality of rollers (transport means) are installed in a transport device that transports objects such as sheets of paper. For this reason, there are a huge number of combinations of rotational speeds for each roller. Therefore, it is very difficult to optimize the combination of rotational speeds of the rollers (ie, the parameters of the drive means such as motors).

本開示は、上記の問題点に鑑みなされたものであって、その目的は、搬送手段を駆動する駆動手段のパラメーターの値を最適化することが可能な機械学習装置、搬送装置、画像形成装置、機械学習方法、およびプログラムを提供することにある。 The present disclosure has been made in view of the above problems, and its object is to provide a machine learning device, a conveyance device, an image forming apparatus, a machine learning method, and a program capable of optimizing the values of the parameters of the driving means that drive the conveyance means.

本開示のある局面に従うと、機械学習装置は、搬送対象物の撓み量または引っ張り量を表す状態量を、搬送装置の搬送路の複数の区間において取得する状態量取得手段を備える。搬送装置は、複数の搬送手段によって搬送対象物を順に挟持して、搬送対象物を搬送路の上流から下流へと搬送する。機械学習装置は、状態量に基づいて報酬を付与する報酬付与手段と、各搬送手段を駆動する各駆動手段のパラメーターのセットの価値をセット毎に表す行動価値関数を、報酬に基づき更新する機械学習を行う学習手段と、更新後の行動価値関数に基づいて複数のセットから１つのセットを決定し、かつ、決定されたセットのパラメーターで搬送手段を駆動するように駆動手段に対して指示する決定手段とをさらに備える。 According to an aspect of the present disclosure, a machine learning device includes state quantity acquisition means that acquires a state quantity representing a deflection amount or a tension amount of an object to be conveyed in a plurality of sections of a conveyance path of a conveyance device. The conveyance device sequentially nips the object to be conveyed with a plurality of conveyance means and conveys it from upstream to downstream along the conveyance path. The machine learning device further includes: reward giving means that gives a reward based on the state quantity; learning means that performs machine learning in which an action value function, which represents for each set the value of a set of parameters of the driving means that drive the respective conveyance means, is updated based on the reward; and determining means that determines one set from among a plurality of sets based on the updated action value function and instructs the driving means to drive the conveyance means with the parameters of the determined set.

好ましくは、状態量取得手段は、選択されたセットのパラメーターに基づいて搬送手段を駆動したときの状態量をさらに取得する。報酬付与手段は、さらに取得された状態量に基づいて報酬をさらに付与する。学習手段は、さらに付与された報酬に基づき、行動価値関数をさらに更新する。 Preferably, the state quantity acquisition means further acquires the state quantity when the conveying means is driven based on the selected set of parameters. The reward granting means further grants a reward based on the acquired state amount. The learning means further updates the action value function based on the given reward.
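The reward-driven update of the action value function described above can be sketched as follows. This is a minimal illustration that assumes the standard textbook tabular Q-learning rule; the learning rate `alpha`, the discount factor `gamma`, the state names, and the table layout are hypothetical and are not taken from this disclosure, whose own update equation may differ.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one textbook Q-learning update and return the new Q value.

    q_table maps a state name to a list of values, one per parameter set.
    alpha (learning rate) and gamma (discount factor) are assumed values.
    """
    best_next = max(q_table[next_state])          # value of the best next action
    td_target = reward + gamma * best_next        # reward plus discounted future value
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Hypothetical two-state table with two parameter sets per state.
q_table = {"slack": [0.0, 0.0], "flat": [0.0, 1.0]}
new_value = q_update(q_table, "slack", 0, reward=1.0, next_state="flat")
print(new_value)
```

Repeating this update each time a state quantity and its reward are acquired corresponds to the "further update" of the action value function described in the preceding paragraph.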

好ましくは、セットは、速度、駆動のタイミング、停止のタイミング、変速のタイミング、および駆動電流の値の少なくとも１つを含む。 Preferably, the set includes at least one of speed, drive timing, stop timing, shift timing, and drive current value.

好ましくは、状態量取得手段は、搬送手段による搬送対象物の搬送速度、または搬送路中の搬送対象物の位置に基づき、状態量を取得する。 Preferably, the state quantity acquisition means acquires the state quantity based on the conveyance speed of the conveyance target object by the conveyance means or the position of the conveyance target object in the conveyance path.

好ましくは、搬送装置をシミュレートするシミュレーターと通信する。状態量取得手段は、シミュレーターからの出力に基づき、状態量を取得する。 Preferably, it communicates with a simulator that simulates the transport device. The state quantity acquisition means acquires the state quantity based on the output from the simulator.

好ましくは、行動価値関数は、Ｑテーブルである。決定手段は、取得された状態量とＱテーブルとに基づいて複数のセットから１つのセットを決定する。 Preferably, the action value function is a Q-table. The determining means determines one set from the plurality of sets based on the acquired state quantity and Q table.
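As a rough illustration of determining one parameter set from the acquired state quantity and a Q table, the sketch below stores one row of values per state and picks the set with the highest value. The state names, the parameter sets, and the optional `epsilon` exploration step are assumptions made for this example and do not appear in the disclosure.

```python
import random

# Hypothetical discretised states and parameter sets (illustration only).
STATES = ["slack", "flat", "taut"]
PARAMETER_SETS = [
    {"upstream_speed": 300.0, "downstream_speed": 300.0},
    {"upstream_speed": 300.0, "downstream_speed": 310.0},
    {"upstream_speed": 310.0, "downstream_speed": 300.0},
]


def make_q_table(states, n_sets):
    """Q table: one row per state, one value per parameter set."""
    return {s: [0.0] * n_sets for s in states}


def select_set(q_table, state, epsilon=0.0, rng=random):
    """Return the index of a parameter set for the given state.

    With probability epsilon a random set is tried (assumed exploration step);
    otherwise the set with the highest Q value is chosen (first one on ties).
    """
    values = q_table[state]
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=values.__getitem__)


q_table = make_q_table(STATES, len(PARAMETER_SETS))
q_table["slack"][1] = 0.8  # pretend learning favoured set 1 when the sheet is slack
chosen = select_set(q_table, "slack")
print(PARAMETER_SETS[chosen])
```

With `epsilon=0.0` the choice is purely greedy, which matches the case of determining a set from an already updated table.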

好ましくは、複数の搬送手段は、第１の搬送手段と、第１の搬送手段の下流側の次の搬送手段である第２の搬送手段とを含む。複数の区間のうち第１の搬送手段と第２の搬送手段との間の区間において、所定の撓み量を許容する設定がなされている場合、報酬付与手段は、第１の搬送手段と第２の搬送手段との間の区間における状態量が所定の撓み量以下の撓み量を表しているときに、正の報酬を付与する。 Preferably, the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means. When a setting that allows a predetermined amount of deflection has been made for the section between the first conveyance means and the second conveyance means among the plurality of sections, the reward giving means gives a positive reward when the state quantity in that section represents a deflection amount equal to or less than the predetermined deflection amount.

好ましくは、複数の搬送手段は、第１の搬送手段と、第１の搬送手段の下流側の次の搬送手段である第２の搬送手段とを含む。複数の区間のうち第１の搬送手段と第２の搬送手段との間の区間において、搬送対象物の撓みを許容しない設定がなされている場合、報酬付与手段は、第１の搬送手段と第２の搬送手段との間の区間における状態量が引っ張り量を表しているときに、正の報酬を付与する。 Preferably, the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means. When a setting that does not allow deflection of the object to be conveyed has been made for the section between the first conveyance means and the second conveyance means among the plurality of sections, the reward giving means gives a positive reward when the state quantity in that section represents a tension amount.
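The two section-level reward rules (deflection allowed up to a limit, or deflection not allowed) can be sketched as a single function. The signed encoding of the state quantity (positive for deflection, negative for tension, zero for neither), the treatment of zero as acceptable when deflection is allowed, and the reward values 1.0 and 0.0 are assumptions made for this example only.

```python
def section_reward(state_quantity, allow_deflection, max_deflection=0.0):
    """Return a positive reward when a section satisfies its setting.

    Assumed encoding: state_quantity > 0 is a deflection amount,
    state_quantity < 0 is a tension amount, 0 means neither.
    """
    if allow_deflection:
        # Deflection permitted: reward deflection up to the predetermined amount.
        satisfied = 0.0 <= state_quantity <= max_deflection
    else:
        # Deflection not permitted: reward a pulled (tension) state.
        satisfied = state_quantity < 0.0
    # The disclosure states only when a positive reward is given;
    # returning 0.0 otherwise is an assumption.
    return 1.0 if satisfied else 0.0


print(section_reward(1.5, allow_deflection=True, max_deflection=2.0))
print(section_reward(0.4, allow_deflection=False))
```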

好ましくは、報酬付与手段は、状態量と、搬送手段の状態とに基づいて報酬を付与する。 Preferably, the reward giving means gives the reward based on the state quantity and the state of the transport means.

好ましくは、複数の搬送手段のうちの所定の搬送手段は、複数の搬送対象物を格納した格納手段から搬送対象物を１つずつ搬送路に搬送する。報酬付与手段は、複数の区間のうち、搬送対象物の後端が所定の搬送手段に到達する前の位置における状態量が引っ張り量を表しており、かつ搬送対象物の後端が所定の搬送手段を通過する際に所定の搬送手段が停止している場合、正の報酬を付与する。 Preferably, a predetermined conveyance means among the plurality of conveyance means conveys the objects to be conveyed one by one to the conveyance path from a storage means that stores a plurality of objects to be conveyed. The reward giving means gives a positive reward when, among the plurality of sections, the state quantity at a position before the rear end of the object reaches the predetermined conveyance means represents a tension amount, and the predetermined conveyance means is stopped when the rear end of the object passes through it.

好ましくは、複数の搬送手段は、第１の搬送手段と、第１の搬送手段の下流側の次の搬送手段である第２の搬送手段とを含む。複数の区間のうち第１の搬送手段と第２の搬送手段との間の区間において、搬送対象物の撓みと、第２の搬送手段において搬送方向への搬送対象物への力の発生とが許容されていない場合、報酬付与手段は、第１の搬送手段と第２の搬送手段との間の区間における状態量が引っ張り量および撓み量のいずれも表していないときに、正の報酬を付与する。 Preferably, the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means. When, for the section between the first conveyance means and the second conveyance means among the plurality of sections, neither deflection of the object to be conveyed nor generation of a force on the object in the conveyance direction at the second conveyance means is permitted, the reward giving means gives a positive reward when the state quantity in that section represents neither a tension amount nor a deflection amount.

好ましくは、複数の搬送手段は、第１の搬送手段と、第１の搬送手段の下流側の次の搬送手段である第２の搬送手段とを含む。第２の搬送手段が搬送対象物を引っ張った状態で搬送することにより第１の搬送手段を搬送対象物が通過する時間を早くすることが可能な場合に、報酬付与手段は、第２の搬送手段が搬送対象物を引っ張った状態で搬送しているときに、正の報酬を付与する。 Preferably, the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means. When the time the object takes to pass through the first conveyance means can be shortened by having the second conveyance means convey the object in a pulled state, the reward giving means gives a positive reward while the second conveyance means is conveying the object in a pulled state.

好ましくは、複数の搬送手段は、第１の搬送手段と、第１の搬送手段の下流側の次の搬送手段である第２の搬送手段とを含む。第１の搬送手段と第２の搬送手段とで同時に搬送対象物を搬送している場合、複数の区間のうち第１の搬送手段と第２の搬送手段との間の区間において、所定の撓み量を許容する設定がなされているとき、報酬付与手段は、第１の搬送手段の搬送速度が第２の搬送手段の搬送速度以上であることを条件に、正の報酬を付与する。 Preferably, the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means. When the object to be transported is simultaneously transported by the first transport means and the second transport means, a predetermined deflection occurs in the section between the first transport means and the second transport means among the plurality of sections. When the setting is made to allow the amount, the reward giving means gives a positive reward on the condition that the conveyance speed of the first conveyance means is equal to or higher than the conveyance speed of the second conveyance means.

好ましくは、複数の搬送手段は、第１の搬送手段と、第１の搬送手段の下流側の次の搬送手段である第２の搬送手段とを含む。第１の搬送手段と第２の搬送手段とで同時に搬送対象物を搬送している場合、複数の区間のうち第１の搬送手段と第２の搬送手段との間の区間において、搬送対象物の撓みを許容しない設定がなされている場合、報酬付与手段は、第１の搬送手段の搬送速度が第２の搬送手段の搬送速度以下であることを条件に、正の報酬を付与する。 Preferably, the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means. When the first conveying means and the second conveying means are simultaneously conveying the conveyed object, in the section between the first conveying means and the second conveying means among the plurality of sections, the conveyed object If the setting is such that the deflection is not allowed, the reward giving means gives a positive reward on the condition that the conveyance speed of the first conveyance means is equal to or lower than the conveyance speed of the second conveyance means.
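The two speed conditions for a section in which both conveyance means are conveying the object at the same time can be written compactly. The function name, the argument names, and the reward values below are hypothetical; only the two inequalities come from the text.

```python
def speed_reward(v_first, v_second, allow_deflection):
    """Reward based on the speed relation between adjacent conveyance means.

    v_first: conveyance speed of the upstream (first) conveyance means.
    v_second: conveyance speed of the downstream (second) conveyance means.
    """
    if allow_deflection:
        # Deflection allowed: upstream may run at or above the downstream speed.
        satisfied = v_first >= v_second
    else:
        # Deflection not allowed: upstream must not outrun the downstream means.
        satisfied = v_first <= v_second
    return 1.0 if satisfied else 0.0  # 0.0 for the unmet case is an assumption


print(speed_reward(310.0, 300.0, allow_deflection=True))
print(speed_reward(310.0, 300.0, allow_deflection=False))
```

Equal speeds satisfy both conditions, since both inequalities include equality.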

好ましくは、搬送対象物は用紙である。 Preferably, the object to be conveyed is paper.

好ましくは、搬送対象物は布である。 Preferably, the object to be conveyed is cloth.

好ましくは、学習手段は、報酬と搬送対象物の物性とに基づき、各駆動手段のパラメーターの値を更新する機械学習を行う。 Preferably, the learning means performs machine learning to update the parameter values of each drive means based on the reward and the physical properties of the conveyed object.

好ましくは、物性は剛度である。 Preferably, the physical property is stiffness.

好ましくは、複数の搬送手段は、第１の搬送手段と、第１の搬送手段の下流側の次の搬送手段である第２の搬送手段とを含む。第１の搬送手段と第２の搬送手段とで同時に搬送対象物を搬送している場合、報酬付与手段は、搬送対象物の剛度が所定値以上であり、かつ、第１の搬送手段の搬送速度と第２の搬送手段の搬送速度とが同じであることを条件に、正の報酬を付与する。 Preferably, the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means. When the first conveyance means and the second conveyance means are simultaneously conveying the object, the reward giving means gives a positive reward on the condition that the stiffness of the object is equal to or greater than a predetermined value and the conveyance speed of the first conveyance means is the same as the conveyance speed of the second conveyance means.

好ましくは、複数の搬送手段は、第１の搬送手段と、第１の搬送手段の下流側の次の搬送手段である第２の搬送手段とを含む。第１の搬送手段と第２の搬送手段とで同時に搬送対象物を搬送しており、かつ複数の区間のうち第１の搬送手段と第２の搬送手段との間の区間において、所定の撓み量を許容する設定がなされている場合、報酬付与手段は、搬送対象物の剛度が所定値未満であり、第１の搬送手段と第２の搬送手段との間の区間における状態量が所定の撓み量以下の撓み量を表しているときに、正の報酬を付与する。 Preferably, the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means. When the first conveyance means and the second conveyance means are simultaneously conveying the object and a setting that allows a predetermined amount of deflection has been made for the section between the first conveyance means and the second conveyance means among the plurality of sections, the reward giving means gives a positive reward when the stiffness of the object is less than a predetermined value and the state quantity in that section represents a deflection amount equal to or less than the predetermined deflection amount.

好ましくは、物性は坪量である。 Preferably, the physical property is basis weight.

好ましくは、複数の搬送手段は、第１の搬送手段と、第１の搬送手段の下流側の次の搬送手段である第２の搬送手段とを含む。第１の搬送手段と第２の搬送手段とで同時に搬送対象物を搬送している場合、報酬付与手段は、搬送対象物の坪量が所定値以上であり、かつ、第１の搬送手段の搬送速度と第２の搬送手段の搬送速度とが同じであることを条件に、正の報酬を付与する。 Preferably, the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means. When the first conveyance means and the second conveyance means are simultaneously conveying the object, the reward giving means gives a positive reward on the condition that the basis weight of the object is equal to or greater than a predetermined value and the conveyance speed of the first conveyance means is the same as the conveyance speed of the second conveyance means.

好ましくは、複数の搬送手段は、第１の搬送手段と、第１の搬送手段の下流側の次の搬送手段である第２の搬送手段とを含む。第１の搬送手段と第２の搬送手段とで同時に搬送対象物を搬送しており、かつ複数の区間のうち第１の搬送手段と第２の搬送手段との間の区間において、所定の撓み量を許容する設定がなされている場合、報酬付与手段は、搬送対象物の坪量が所定値未満であり、第１の搬送手段と第２の搬送手段との間の区間における状態量が所定の撓み量以下の撓み量を表しているときに、正の報酬を付与する。 Preferably, the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means. When the first conveyance means and the second conveyance means are simultaneously conveying the object and a setting that allows a predetermined amount of deflection has been made for the section between the first conveyance means and the second conveyance means among the plurality of sections, the reward giving means gives a positive reward when the basis weight of the object is less than a predetermined value and the state quantity in that section represents a deflection amount equal to or less than the predetermined deflection amount.

好ましくは、搬送手段は、ローラー対である。 Preferably, the conveyance means is a pair of rollers.

好ましくは、所定の撓み量は、搬送対象物の厚み方向の搬送路の幅未満の値である。 Preferably, the predetermined amount of deflection is less than the width of the conveyance path in the thickness direction of the object to be conveyed.

好ましくは、所定の撓み量は、撓み量が０よりも大きい所定の値である。 Preferably, the predetermined amount of deflection is a predetermined value greater than zero.

好ましくは、状態量取得手段は、搬送装置に設けられた機械式のセンサーのシミュレーションモデルを用いて撓み量を取得する。 Preferably, the state quantity acquisition means acquires the amount of deflection using a simulation model of a mechanical sensor provided in the conveyance device.

好ましくは、状態量取得手段は、搬送装置に設けられた機械式のセンサーからの出力に基づいて、撓み量を取得する。 Preferably, the state quantity acquisition means acquires the amount of deflection based on an output from a mechanical sensor provided in the transport device.

好ましくは、状態量取得手段は、搬送装置に設けられた光学式のセンサーのシミュレーションモデルを用いて撓み量を取得する。 Preferably, the state quantity acquisition means acquires the amount of deflection using a simulation model of an optical sensor provided in the transport device.

好ましくは、状態量取得手段は、搬送装置に設けられた光学式のセンサーからの出力に基づいて、撓み量を取得する。 Preferably, the state quantity acquisition means acquires the amount of deflection based on an output from an optical sensor provided in the transport device.

好ましくは、状態量取得手段は、搬送対象物の位置に基づき搬送対象物の長さを取得する。状態量取得手段は、取得された長さよりも搬送対象物の基準長さが長い場合には、取得された長さと基準長さとの差分を撓み量とする。 Preferably, the state quantity acquisition means acquires the length of the object to be transported based on the position of the object to be transported. If the reference length of the object to be transported is longer than the obtained length, the state quantity obtaining means determines the difference between the obtained length and the reference length as the amount of deflection.
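The length-based deflection rule can be illustrated with a small helper. The function and argument names are hypothetical, and returning 0.0 when the reference length is not longer than the acquired length is an assumption, since the text only covers the case where the reference length is longer.

```python
def deflection_from_length(acquired_length, reference_length):
    """Deflection amount derived from the acquired and reference lengths.

    acquired_length: length of the object obtained from its position in the path.
    reference_length: nominal length of the object (for example, the sheet length).
    """
    if reference_length > acquired_length:
        # The object spans less of the path than its nominal length: it is deflected.
        return reference_length - acquired_length
    return 0.0  # assumed: no deflection otherwise


print(deflection_from_length(295.0, 297.0))
```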

好ましくは、状態量取得手段は、搬送装置に設けられた負荷検出手段のシミュレーションモデルを用いて搬送手段の負荷を取得する。状態量取得手段は、負荷に基づいて、引っ張り量を取得する。 Preferably, the state quantity acquisition means acquires the load of the conveyance means using a simulation model of a load detection means provided in the conveyance apparatus. The state quantity acquisition means acquires the amount of tension based on the load.

好ましくは、状態量取得手段は、搬送装置に設けられた負荷検出手段によって検出された負荷に基づいて、引っ張り量を取得する。 Preferably, the state quantity acquisition means acquires the amount of tension based on the load detected by the load detection means provided in the conveying device.

好ましくは、状態量取得手段は、搬送装置に設けられた光学式のセンサーのシミュレーションモデルを用いて引っ張り量を取得する。 Preferably, the state quantity acquisition means acquires the amount of tension using a simulation model of an optical sensor provided in the transport device.

好ましくは、状態量取得手段は、搬送装置に設けられた光学式のセンサーからの出力に基づいて、引っ張り量を取得する。 Preferably, the state quantity acquisition means acquires the amount of tension based on an output from an optical sensor provided in the conveyance device.

好ましくは、複数の搬送手段は、第１の搬送手段と、第１の搬送手段の下流側の次の搬送手段である第２の搬送手段とを含む。状態量取得手段は、搬送対象物の位置に基づき搬送対象物の長さを取得する。状態量取得手段は、取得された長さと搬送対象物の基準長さとの差分がなく、かつ第２の搬送手段の搬送速度が第１の搬送手段の搬送速度以上である場合、搬送対象物が引っ張られた状態と判断する。 Preferably, the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means. The state quantity acquisition means acquires the length of the object to be conveyed based on the position of the object. The state quantity acquisition means determines that the object is in a pulled state when there is no difference between the acquired length and the reference length of the object and the conveyance speed of the second conveyance means is equal to or higher than the conveyance speed of the first conveyance means.
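The pulled-state determination combines a length comparison with a speed comparison, which might be sketched as below. The tolerance `tol` used to decide that there is "no difference" between the two lengths is an assumption added for floating-point measurements, and all names are hypothetical.

```python
def is_pulled(acquired_length, reference_length, v_first, v_second, tol=1e-6):
    """Return True when the object is judged to be in a pulled state.

    The object is treated as pulled when its acquired length matches the
    reference length (within tol) and the downstream (second) conveyance
    means runs at least as fast as the upstream (first) one.
    """
    no_difference = abs(acquired_length - reference_length) <= tol
    return no_difference and v_second >= v_first


print(is_pulled(297.0, 297.0, v_first=300.0, v_second=305.0))
```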

好ましくは、機械学習装置は、搬送手段の搬送対象物の搬送速度が、前回の機械学習時によって設定された搬送速度と異なった場合に、機械学習を再度実行する。 Preferably, the machine learning device performs machine learning again when the transport speed of the object to be transported by the transport means is different from the transport speed set during the previous machine learning.

好ましくは、機械学習装置は、機械学習の結果としてのパラメーターを含む更新用の制御プログラムを、搬送装置の動作を制御するコントローラーに送信する。 Preferably, the machine learning device transmits a control program for updating that includes parameters as a result of machine learning to a controller that controls the operation of the transport device.

本開示の他の局面に従うと、搬送装置は、上記機械学習装置を備える。 According to another aspect of the present disclosure, a conveyance device includes the machine learning device described above.

本開示のさらに他の局面に従うと、画像形成装置は、上記機械学習装置を備える。 According to yet another aspect of the present disclosure, an image forming apparatus includes the machine learning device described above.

本開示のさらに他の局面に従うと、機械学習方法は、搬送対象物の撓み量または引っ張り量を表す状態量を、搬送装置の搬送路の複数の位置において取得するステップを備える。搬送装置は、複数の搬送手段によって搬送対象物を順に挟持して、搬送対象物を搬送路の上流から下流へと搬送する。機械学習方法は、状態量に基づいて報酬を付与するステップと、各搬送手段を駆動する各駆動手段のパラメーターのセットの価値をセット毎に表す行動価値関数を、報酬に基づき更新するステップと、更新後の行動価値関数に基づいて複数のセットから１つのセットを決定し、かつ、決定されたセットのパラメーターで搬送手段を駆動するように駆動手段に対して指示するステップとをさらに備える。 According to still another aspect of the present disclosure, the machine learning method includes the step of acquiring state quantities representing the amount of deflection or the amount of tension of the conveyance target at a plurality of positions on the conveyance path of the conveyance device. The conveyance device sequentially holds an object to be conveyed by a plurality of conveyance means and conveys the object from upstream to downstream on a conveyance path. The machine learning method includes a step of providing a reward based on a state quantity, a step of updating an action value function representing the value of a set of parameters of each driving means that drives each transport means based on the reward, The method further includes the step of determining one set from the plurality of sets based on the updated action value function, and instructing the driving means to drive the conveying means with the parameters of the determined set.

本開示のさらに他の局面に従うと、コンピュータを動作させるプログラムは、搬送対象物の撓み量または引っ張り量を表す状態量を、搬送装置の搬送路の複数の位置において取得するステップをコンピュータのプロセッサに実行させる。搬送装置は、複数の搬送手段によって搬送対象物を順に挟持して、搬送対象物を搬送路の上流から下流へと搬送する。プログラムは、状態量に基づいて報酬を付与するステップと、各搬送手段を駆動する各駆動手段のパラメーターのセットの価値をセット毎に表す行動価値関数を、報酬に基づき更新するステップと、更新後の行動価値関数に基づいて複数のセットから１つのセットを決定し、かつ、決定されたセットのパラメーターで搬送手段を駆動するように駆動手段に対して指示するステップとを、プロセッサにさらに実行させる、プログラム。 According to still another aspect of the present disclosure, a program for operating a computer causes a processor of the computer to execute a step of acquiring a state quantity representing a deflection amount or a tension amount of an object to be conveyed at a plurality of positions on a conveyance path of a conveyance device. The conveyance device sequentially nips the object to be conveyed with a plurality of conveyance means and conveys it from upstream to downstream along the conveyance path. The program further causes the processor to execute: a step of giving a reward based on the state quantity; a step of updating, based on the reward, an action value function that represents for each set the value of a set of parameters of the driving means that drive the respective conveyance means; and a step of determining one set from among a plurality of sets based on the updated action value function and instructing the driving means to drive the conveyance means with the parameters of the determined set.

本開示によれば、搬送手段を駆動する駆動手段のパラメーターの値を最適化を得ることができる。 According to the present disclosure, it is possible to optimize the values of the parameters of the driving means for driving the conveying means.

強化学習の概要を説明するための模式図である。 A schematic diagram for explaining an overview of reinforcement learning.
本実施の形態に係る学習システムを表した図である。 A diagram showing the learning system according to the present embodiment.
画像形成装置の内部構造を示す概略図である。 A schematic diagram showing the internal structure of the image forming apparatus.
画像形成装置のハードウェア構成の一例を説明するためのブロック図である。 A block diagram for explaining an example of the hardware configuration of the image forming apparatus.
搬送装置の一部の構成を表した模式図である。 A schematic diagram showing the configuration of a part of the conveyance device.
Ｑ学習で用いられるデーターを説明するための模式図である。 A schematic diagram for explaining data used in Q-learning.
搬送装置の変形例を説明するための模式図である。 A schematic diagram for explaining a modification of the conveyance device.
学習装置のハードウェア構成の典型例を表した図である。 A diagram showing a typical example of the hardware configuration of the learning device.
Ｑテーブルの概要を説明するための模式図である。 A schematic diagram for explaining an outline of the Q table.
画像形成装置１の機能的構成を説明するための機能ブロック図である。 A functional block diagram for explaining the functional configuration of the image forming apparatus 1.
学習装置の構成と、シミュレーターの構成とを説明するための機能ブロック図である。 A functional block diagram for explaining the configuration of the learning device and the configuration of the simulator.
搬送路の幅が通常の箇所を用紙Ｐが通過している状態を表した模式図である。 A schematic diagram showing a state in which paper P passes through a portion where the conveyance path has a normal width.
用紙が撓んでいる状態を表した模式図である。 A schematic diagram showing a state in which the paper is deflected.
用紙が引っ張られて状態を表した模式図である。 A schematic diagram showing a state in which the paper is pulled.
幅が狭い箇所を用紙が通過している状態を表した模式図である。 A schematic diagram showing a state in which the paper passes through a narrow portion.
画像形成装置の給紙カセット１４から用紙Ｐが搬送路１５に供給されている状態を表した模式図である。 A schematic diagram showing a state in which paper P is supplied to the conveyance path 15 from the paper feed cassette 14 of the image forming apparatus.
画像形成装置において印刷済みの用紙Ｐが排出トレイ２７１に排出されている状態を表した模式図である。 A schematic diagram showing a state in which printed paper P is discharged to the discharge tray 271 in the image forming apparatus.
Ｑ学習の処理の手順を表したフロー図である。 A flow diagram showing the procedure of Q-learning processing.
撓み量に関する報酬の付与例を説明するためのフロー図である。 A flow diagram for explaining an example of giving a reward regarding the amount of deflection.
許容できる撓み量を区間毎に判断するための処理を説明するためのフロー図である。 A flow diagram for explaining processing for determining the allowable amount of deflection for each section.
引っ張り量に関する報酬の付与例を説明するためのフロー図である。 A flow diagram for explaining an example of giving a reward regarding the amount of tension.
画像形成装置内で強化学習を実施する構成を説明するための模式図である。 A schematic diagram for explaining a configuration in which reinforcement learning is performed within the image forming apparatus.

実施の形態におけるシステムについて、以下、図を参照しながら説明する。以下に説明する実施の形態において、個数、量などに言及する場合、特に記載がある場合を除き、本開示の範囲は必ずしもその個数、量などに限定されない。同一の部品、相当部品に対しては、同一の参照番号を付し、重複する説明は繰り返さない場合がある。 A system in an embodiment will be described below with reference to the drawings. In the embodiments described below, when referring to the number, amount, etc., the scope of the present disclosure is not necessarily limited to the number, amount, etc. unless otherwise specified. Identical or equivalent parts will be given the same reference numbers, and duplicate descriptions may not be repeated.

図面においては、実際の寸法の比率に従って図示しておらず、構造の理解を容易にするために、構造が明確となるように比率を変更して図示している箇所がある。なお、以下で説明される各変形例は、適宜選択的に組み合わされてもよい。 In the drawings, some parts are not shown according to the actual size ratio, but are shown with the ratio changed to make the structure clearer, in order to make the structure easier to understand. Note that each modification described below may be selectively combined as appropriate.

また、以下では、対象物を搬送装置を備えた機器として、画像形成装置を例に挙げて限定するが、これに限定されるものではない。対象物としては、用紙、フィルム、布等が挙げられる。 In the following, an image forming apparatus is described as an example of equipment having a conveyance device that conveys an object, but the equipment is not limited to this. Examples of the object include paper, film, and cloth.

なお、画像形成装置としてはは、たとえば、カラープリンタ、モノクロプリンタ、ＦＡＸ、複合機（ＭＦＰ：Multi-Functional Peripheral）が挙げられる。 Note that examples of the image forming apparatus include a color printer, a monochrome printer, a FAX, and a multi-functional peripheral (MFP).

＜Ａ．機械学習＞ <A. Machine Learning>

本実施の形態にかかる学習システムの具体例を説明する前に、本例で用いる機械学習の概要について、以下に簡単に説明する。 Before describing a specific example of the learning system according to this embodiment, an overview of machine learning used in this example will be briefly described below.

機械学習の一例として、深層学習と、強化学習とが知られている。強化学習としては、たとえばＱ学習、ＴＤ学習が知られている。 Deep learning and reinforcement learning are known as examples of machine learning. For example, Q learning and TD learning are known as reinforcement learning.

また、深層学習と強化学習とを用いた機械学習は、「深層強化学習（ＤＱＮ：Deep Q-Network）」と称される。深層強化学習は、強化学習の行動価値関数の表現にディープラーニングを用いる手法である。深層強化学習は、強化学習の一種である。 Furthermore, machine learning using deep learning and reinforcement learning is called "deep reinforcement learning (DQN: Deep Q-Network)." Deep reinforcement learning is a method that uses deep learning to express the action value function of reinforcement learning. Deep reinforcement learning is a type of reinforcement learning.

本実施の形態の例では、強化学習について説明する。特に、Ｑ学習（より詳しくは、行動価値関数の一種であるＱテーブルを用いた学習）を例に挙げて説明する。なお、強化学習は、Ｑテーブルを用いる構成に限定されず、深層強化学習であってもよい。また、状態に基づき行動を選択し、選択された行動に基づき報酬を付与する学習であれば、学習方法は、特に限定されるものではない。 In this embodiment, reinforcement learning will be explained. In particular, Q learning (more specifically, learning using a Q table, which is a type of action value function) will be explained as an example. Note that reinforcement learning is not limited to a configuration using a Q table, and may be deep reinforcement learning. Furthermore, the learning method is not particularly limited as long as it is learning in which an action is selected based on the state and a reward is given based on the selected action.

図１は、強化学習の概要を説明するための模式図である。 FIG. 1 is a schematic diagram for explaining an overview of reinforcement learning.

図１を参照して、エージェントは、あるタイミングにおいて、環境の状態ｓを観測する。次に、エージェントは、状態ｓと方策πとに基づいて、行動ａを決定する。方策πは、行動を決定するためのルールである。方策πを最適化することは、行動選択を最適化することになる。次に、エージェントは、行動ａに基づき、環境から報酬ｒを得る。 Referring to FIG. 1, an agent observes the state s of the environment at a certain timing. Next, the agent determines action a based on state s and policy π. Policy π is a rule for determining behavior. Optimizing policy π amounts to optimizing action selection. Next, the agent obtains a reward r from the environment based on action a.

さらに、エージェントは、最終状態に至るまで、状態ｓの取得と、方策πに基づいた行動ａの決定（選択）と、決定された行動ａに基づく報酬ｒの取得とを繰り返す。すなわち、エージェントは、行動選択（詳しくは、パラメータ）が最適化されるまで、上述した処理を繰り返す。 Furthermore, the agent repeats the acquisition of the state s, the determination (selection) of the action a based on the policy π, and the acquisition of the reward r based on the determined action a until the agent reaches the final state. That is, the agent repeats the above-described process until the behavior selection (specifically, the parameters) is optimized.

強化学習では、典型的には、以下の式（１）で表される状態価値関数Ｖ（ｓ）が利用される。 Reinforcement learning typically uses a state value function V(s) expressed by the following equation (1).

$$V(s) = \sum_{a} \pi(a \mid s)\, Q(s, a) \tag{1}$$
詳しくは、式（１）は、ａについて全ての項の和をとることを示す式である。式（１）において、π（ａ｜ｓ）は、確率的方策を表している。π（ａ｜ｓ）は、状態ｓにおいて行動ａを選択する確率を表している。一方、Ｑ（ｓ，ａ）は、行動価値関数を表している。 Specifically, equation (1) is an equation that indicates the sum of all terms for a. In equation (1), π(a|s) represents a stochastic policy. π(a|s) represents the probability of selecting action a in state s. On the other hand, Q(s, a) represents an action value function.
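As a minimal sketch of equation (1), the state value of a single state can be computed as the policy-weighted sum of its action values. The function name `state_value` and the numbers below are hypothetical and serve only as illustration; they do not appear in the source.

```python
def state_value(policy_probs, q_values):
    """Return V(s) = sum over a of pi(a|s) * Q(s, a) for one state s."""
    if len(policy_probs) != len(q_values):
        raise ValueError("one probability and one Q value are needed per action")
    return sum(p * q for p, q in zip(policy_probs, q_values))


# Hypothetical example: one state with three selectable actions.
pi_s = [0.5, 0.25, 0.25]  # pi(a|s); the probabilities sum to 1
q_s = [2.0, 4.0, 8.0]     # Q(s, a) for the same three actions
v_s = state_value(pi_s, q_s)
```

With these illustrative numbers, the three terms are 1.0, 1.0, and 2.0, so `v_s` is 4.0.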

強化学習のうち、行動価値関数Ｑ（ｓ，ａ）に着目する方式（価値ベース（価値反復））がＱ学習と称されている。また、強化学習のうち、確率的方策π（ａ｜ｓ）に着目する方式（方策ベース（方策反復））は、方策勾配法と称されている。 Within reinforcement learning, the value-based approach (value iteration), which focuses on the action value function Q(s, a), is called Q-learning. The policy-based approach (policy iteration), which focuses on the stochastic policy π(a|s), is called the policy gradient method.

Ｑ学習では、エージェントが行動ａを選択する度に、以下の式（２）にしたがって、行動価値関数Ｑ（ｓ，ａ）が更新される。 In Q-learning, each time the agent selects action a, the action value function Q(s, a) is updated according to the following equation (2).

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left\{ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right\} \tag{2}$$
Ｑ（ｓ_ｔ，ａ_ｔ）は状態ｓ_ｔにおいて行動ａ_ｔを行うことにより得られる報酬の期待値を表している。ｒ_ｔ＋１は、時刻ｔ＋１で行動ａ_ｔに対し与えられる即時の報酬である。αは、学習のスピードを決める学習率（０＜α＜１）である。γは、割引率（０＜γ＜１）である。なお、割引率は、Ｑテーブルが発散しないようにするためのものである。 Q(s_t, a_t) represents the expected value of the reward obtained by performing action a_t in state s_t. r_{t+1} is the immediate reward given at time t+1 for action a_t. α is the learning rate (0<α<1), which determines the speed of learning. γ is the discount rate (0<γ<1). Note that the discount rate keeps the values in the Q table from diverging.
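The update of equation (2) can be sketched as follows, assuming the Q table is held as a nested list indexed as `q[state][action]`. The function name `q_update`, the default values of `alpha` and `gamma`, and the tiny 2 x 2 table are assumptions made for illustration only.

```python
def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Apply equation (2) once, in place, to a Q table stored as q[state][action]."""
    best_next = max(q[s_next])  # max over the actions available in the next state
    td_error = reward + gamma * best_next - q[s][a]
    q[s][a] += alpha * td_error
    return q[s][a]


# Hypothetical table with 2 states and 2 actions.
q_table = [[0.0, 0.0], [1.0, 2.0]]
new_value = q_update(q_table, s=0, a=1, reward=1.0, s_next=1, alpha=0.5, gamma=0.5)
```

In this example the bracketed term is 1.0 + 0.5 * 2.0 - 0.0 = 2.0, so the entry for state 0 and action 1 moves from 0.0 to 1.0 while all other entries stay unchanged.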

＜Ｂ．システム構成＞ <B. System configuration>
図２は、本実施の形態に係る学習システム１０００を表した図である。 FIG. 2 is a diagram showing a learning system 1000 according to this embodiment.

図２を参照して、学習システム１０００は、ユーザーが利用する（ユーザー側、エッジ側）の画像形成装置１と、クラウド側の情報処理装置とで構成される。学習システム１０００では、クラウド上でＱ学習が実行される。詳しくは、画像形成装置１は、クラウドに対して、画像形成装置１にて検出されたデーターをアップロードする。これにより、画像形成装置１の状態をクラウド上で再現し、クラウド上でＱ学習を実行する。より詳しくは、以下のとおりである。 Referring to FIG. 2, learning system 1000 includes an image forming apparatus 1 used by a user (on the user side, edge side) and an information processing apparatus on the cloud side. In the learning system 1000, Q learning is executed on the cloud. Specifically, the image forming apparatus 1 uploads data detected by the image forming apparatus 1 to the cloud. Thereby, the state of the image forming apparatus 1 is reproduced on the cloud, and Q learning is executed on the cloud. More details are as follows.

クラウド上では、エミュレーターとして機能と、シミュレーターとして機能と、強化学習を実行する機能（ＡＩ機能）とが、１つ以上の情報処理装置によって実行される。たとえば、エミュレーターとして機能と、シミュレーターとして機能と、強化学習を実行する機能とが別々の情報処理装置によって実行される。これに限定されず、エミュレーターとして機能と、シミュレーターとして機能とが１つの情報処理装置で実行され、強化学習が別の情報処理装置で実行されてもよい。また、シミュレーターとしての機能と、強化学習を実行する機能とが、同じ情報処理装置で実行されてもよい。 On the cloud, one or more information processing devices perform an emulator function, a simulator function, and a reinforcement learning function (AI function). For example, a function as an emulator, a function as a simulator, and a function to perform reinforcement learning are performed by separate information processing devices. The present invention is not limited to this, and the functions as an emulator and a simulator may be executed by one information processing device, and reinforcement learning may be executed by another information processing device. Further, the function as a simulator and the function to perform reinforcement learning may be performed by the same information processing device.

シミュレーターは、メカニカルシミュレータである。シミュレーターは、外から見た画像形成装置１の動作を再現する。 The simulator is a mechanical simulator. The simulator reproduces the operation of the image forming apparatus 1 as seen from the outside.

シミュレーターは、画像形成装置１に相当するシミュレーションモデルによって、画像形成装置１の動作をシミュレートする。製造メーカーは、ユーザーからの画像形成装置１に関する個別要求（カスタマイズ要求）を受け付けると、当該個別要求に基づいたシミュレーションモデルを生成する。 The simulator simulates the operation of the image forming apparatus 1 using a simulation model corresponding to the image forming apparatus 1. Upon receiving an individual request (customization request) regarding the image forming apparatus 1 from a user, the manufacturer generates a simulation model based on the individual request.

より詳しくは、シミュレーションモデルは、エミュレーターによって提供される部品モデルを利用して生成される。 More specifically, the simulation model is generated using a component model provided by an emulator.

エミュレーターは、定着装置のエミュレーター部品、プロセスのエミュレーター部品、搬送装置のエミュレーター部品等のモデルを含む。エミュレーター、画像形成装置１の中身の動作まで再現する。各部品のモデルは、画像形成装置１の各機器状態に応じて更新される。 The emulator includes models such as fixing device emulator parts, process emulator parts, and conveyance device emulator parts. The emulator reproduces even the internal operations of the image forming apparatus 1. The model of each component is updated according to the status of each device of the image forming apparatus 1.

学習システム１０００では、シミュレーターと、強化学習を実行する情報処理装置（以下、「学習装置」とも称する）とが協働して、画像形成装置１の搬送装置を駆動する駆動手段（典型的には、モーター）の各種のパラメーターを決定する。詳しくは、本例では、学習装置が、行動価値関数Ｑ（ｓ，ａ）の一例であるＱテーブル内のパラメーターを学習により決定する。 In the learning system 1000, the simulator and an information processing device that executes reinforcement learning (hereinafter also referred to as a "learning device") cooperate to determine various parameters of the driving means (typically, motors) that drive the conveyance device of the image forming apparatus 1. Specifically, in this example, the learning device determines, through learning, the parameters in the Q table, which is an example of the action value function Q(s, a).

より詳しくは、Ｑ学習に先立ち、実機である画像形成装置１は、画像形成装置１内で検出された各種のデータ（「センシングデータ」とも称する）をクラウドにアップロードする。クラウド側では、当該センシングデーターを利用してエミュレーター内の各部品を示すモデルが更新される。詳しくは、モデルのパラメーターが更新される。また、センシングデーター利用して、シミュレーターの各種の設定値が更新される。これにより、シミュレーターが、実機である画像形成装置１の状態をより反映したものとなる。なお、シミュレーターは、エミュレーターで定義された各種の部品（パラメーターがセンシングデーターによって更新された部品）を含んでいる。 More specifically, prior to Q-learning, the image forming apparatus 1, which is a real device, uploads various data detected within the image forming apparatus 1 (also referred to as "sensing data") to the cloud. On the cloud side, the sensing data is used to update the model representing each part in the emulator. More specifically, the model parameters are updated. Additionally, various settings of the simulator are updated using the sensing data. This allows the simulator to better reflect the state of the actual image forming apparatus 1. Note that the simulator includes various parts defined in the emulator (parts whose parameters have been updated using sensing data).

Ｑ学習が終了すると、決定されたパラメーターに基づいて、画像形成装置１用のファームウェアを更新するためのプログラムが、クラウド上の情報処理装置にて生成される。生成された更新用のプログラムは、画像形成装置１に送られる。 When the Q learning is completed, a program for updating the firmware for the image forming apparatus 1 is generated by the information processing apparatus on the cloud based on the determined parameters. The generated update program is sent to the image forming apparatus 1.

画像形成装置１では、更新用のプログラムにより、画像形成装置１の制御装置内のファームウェアが更新される。 In the image forming apparatus 1, the firmware in the control device of the image forming apparatus 1 is updated by the update program.

＜Ｃ．画像形成装置のハードウェア構成＞ <C. Hardware configuration of image forming apparatus>
（ｃ１．内部構造） (c1. Internal structure)
図３は、画像形成装置１の内部構造を示す概略図である。図３を参照して、画像形成装置１は、上述したように、本体部１０と、後処理装置２０とを備えている。 FIG. 3 is a schematic diagram showing the internal structure of the image forming apparatus 1. Referring to FIG. 3, the image forming apparatus 1 includes a main body 10 and a post-processing device 20, as described above.

本体部１０は、画像形成ユニット１１と、スキャナーユニット１２と、自動原稿搬送ユニット１３と、２つの給紙カセット１４と、搬送路１５と、メディアセンサー１６と、反転搬送路１７と、操作パネル３４と、給紙ローラー１１３とを備えている。なお、自動原稿搬送ユニットは、ＡＤＦ（auto document feeder）とも称される。 The main body 10 includes an image forming unit 11, a scanner unit 12, an automatic document transport unit 13, two paper feed cassettes 14, a conveyance path 15, a media sensor 16, a reversing conveyance path 17, an operation panel 34, and a paper feed roller 113. Note that the automatic document transport unit is also called an ADF (auto document feeder).

本体部１０は、画像形成装置１の動作を制御するコントローラー３１をさらに備えている。なお、本例では、本体部１０は、いわゆるタンデム方式のカラープリンタである。本体部１０は、印刷設定に基づいて画像形成を実行する。 The main body 10 further includes a controller 31 that controls the operation of the image forming apparatus 1. In this example, the main body 10 is a so-called tandem color printer. The main body 10 executes image formation based on the print settings.

自動原稿搬送ユニット１３は、原稿台上に載置された原稿を、原稿読取部の読取位置に自動的に搬送する。スキャナーユニット１２は、自動原稿搬送ユニット１３により搬送された原稿の画像を読み取り、画像データーを生成する。また、スキャナーユニット１２は、自動原稿搬送ユニット１３を用いずにユーザーがプラテン上に置いた原稿の画像も読み取り、画像データーを生成する。スキャナーユニット１２によって取得された原稿の画像データーは、メモリ（典型的には、図４に示す固定記憶装置３２）に記憶される。 The automatic document transport unit 13 automatically transports the document placed on the document table to the reading position of the document reading section. The scanner unit 12 reads the image of the document conveyed by the automatic document conveyance unit 13 and generates image data. The scanner unit 12 also reads an image of a document placed on a platen by a user without using the automatic document conveyance unit 13, and generates image data. The image data of the document acquired by the scanner unit 12 is stored in a memory (typically, the fixed storage device 32 shown in FIG. 4).

給紙カセット１４には、用紙Ｐ等のシートが収容される。給紙ローラー１１３は、図３の例の場合には用紙Ｐを搬送路１５に沿って上方へ送る。給紙カセット１４は、底上げ板１４２と、センサー１４３とを備える。センサー１４３は、給紙カセット内の規制板（図示せず）位置を検知し、かつ用紙のサイズを検知する。なお、用紙以外のシートとしては、たとえば、封筒、ＯＨＰ（Overhead projector）フィルム、布が挙げられる。 The paper feed cassette 14 stores sheets such as paper P. In the example shown in FIG. 3, the paper feed roller 113 feeds the paper P upward along the conveyance path 15. The paper feed cassette 14 includes a bottom raising plate 142 and a sensor 143. The sensor 143 detects the position of a regulating plate (not shown) in the paper feed cassette, and also detects the size of the paper. Note that sheets other than paper include, for example, envelopes, OHP (overhead projector) films, and cloth.

搬送路１５は、片面印刷および両面印刷のときに使用される。反転搬送路１７は、両面印刷のときに使用される。 The conveyance path 15 is used during single-sided printing and double-sided printing. The reversing conveyance path 17 is used during double-sided printing.

画像形成ユニット１１は、スキャナーユニット１２が生成した画像データー、または、外部の装置から取得した印刷データーに基づいて、給紙カセット１４により供給される用紙Ｐに対し画像形成を行なう。 The image forming unit 11 forms an image on the paper P fed by the paper feed cassette 14 based on image data generated by the scanner unit 12 or print data acquired from an external device.

画像形成ユニット１１は、中間転写ベルト１０１と、テンションローラー１０２と、駆動ローラー１０３と、イエローの画像形成部１０４Ｙと、マゼンタの画像形成部１０４Ｍと、シアンの画像形成部１０４Ｃ，ブラックの画像形成部１０４Ｋと、画像濃度センサー１０５と、１次転写装置１１１と、２次転写装置１１５と、レジストローラー対１１６と、加熱ローラー１２１と加圧ローラー１２２とからなる定着装置１２０とを有している。テンションローラー１０２と駆動ローラー１０３とで、中間転写ベルト１０１を保持し、かつ図のＡ方向に中間転写ベルト１０１を回転駆動させる。レジストローラー対１１６は、給紙ローラー１１３によって搬送された用紙Ｐをさらに下流に搬送する。 The image forming unit 11 includes an intermediate transfer belt 101, a tension roller 102, a drive roller 103, a yellow image forming section 104Y, a magenta image forming section 104M, a cyan image forming section 104C, a black image forming section 104K, an image density sensor 105, a primary transfer device 111, a secondary transfer device 115, a registration roller pair 116, and a fixing device 120 consisting of a heating roller 121 and a pressure roller 122. The tension roller 102 and the drive roller 103 hold the intermediate transfer belt 101 and rotate it in the direction A in the figure. The registration roller pair 116 conveys the paper P conveyed by the paper feed roller 113 further downstream.

メディアセンサー１６は、搬送路１５に設置される。メディアセンサー１６によって、紙種自動検出機能（用紙の種類を自動検出する機能）が実現される。メディアセンサー１６は、給紙ローラー１１３と、レジストローラー対１１６との間に設置されている。 The media sensor 16 is installed on the conveyance path 15. The media sensor 16 realizes a paper type automatic detection function (a function that automatically detects the paper type). Media sensor 16 is installed between paper feed roller 113 and registration roller pair 116.

メディアセンサー１６は、たとえば、用紙に光を照射する発光部と、用紙で反射した反射光を受光する受光部とを有する光学式のセンサーである。メディアセンサー１６としての光学式のセンサーは、受光した光の電圧値から、紙の坪量を判定する。メディアセンサー１６として、用紙の厚さを検出する変位センサー、用紙の含水量を検出する静電容量センサー、用紙の表面性を撮像するカメラ、超音波センサー等の用紙の特性を検出するものが該当する。典型的には、給紙カセット１４に用紙がセットされた後、最初の１枚目の用紙が給紙カセット１４から給紙されたときに、メディアセンサー１６によって用紙の種類が判別される。 The media sensor 16 is, for example, an optical sensor that includes a light emitting section that irradiates the paper with light and a light receiving section that receives the light reflected by the paper. The optical sensor serving as the media sensor 16 determines the basis weight of the paper from the voltage value corresponding to the received light. Sensors that detect paper characteristics, such as a displacement sensor that detects the thickness of the paper, a capacitance sensor that detects the moisture content of the paper, a camera that images the surface properties of the paper, and an ultrasonic sensor, can also serve as the media sensor 16. Typically, after paper is set in the paper feed cassette 14, the media sensor 16 determines the type of paper when the first sheet is fed from the paper feed cassette 14.

メディアセンサー１６から坪量の情報がコントローラー３１に送られる。これにより、コントローラー３１は、用紙の種類を判定する。 Basis weight information is sent from the media sensor 16 to the controller 31. Thereby, the controller 31 determines the type of paper.

なお、後処理装置２０は、パンチ処理装置２２０と、平綴じ処理部２５０と、中綴じ処理部２６０と、排出トレイ２７１と、排出トレイ２７２と、下部の排出トレイ２７３とをさらに備える。 The post-processing device 20 further includes a punching device 220, a side stitching section 250, a saddle stitching section 260, an ejection tray 271, an ejection tray 272, and a lower ejection tray 273.

（ｃ２．ハードウェア構成） (c2. Hardware configuration)
図４は、画像形成装置１のハードウェア構成の一例を説明するためのブロック図である。 FIG. 4 is a block diagram for explaining an example of the hardware configuration of the image forming apparatus 1.

図４を参照して、本体部１０は、コントローラー３１と、固定記憶装置３２と、短距離無線ＩＦ（Inter Face）３３と、スキャナーユニット１２と、操作パネル３４と、給紙カセット１４と、メディアセンサー１６と、画像形成ユニット１１と、プリンタコントローラー３５と、ネットワークＩＦ３６と、ワイヤレスＩＦ３７とを有する。コントローラー３１には、各部１１，１２，１４，１６，３２～３７がバス３０を介して接続されている。 Referring to FIG. 4, the main body 10 includes a controller 31, a fixed storage device 32, a short-range wireless IF (interface) 33, a scanner unit 12, an operation panel 34, a paper feed cassette 14, a media sensor 16, an image forming unit 11, a printer controller 35, a network IF 36, and a wireless IF 37. The units 11, 12, 14, 16, and 32 to 37 are connected to the controller 31 via a bus 30.

コントローラー３１は、画像形成装置１の動作を制御する。コントローラー３１は、ＣＰＵ（Central Processing Unit）３１１と、制御プログラムの格納されたＲＯＭ（Read Only Memory）３１２と、作業用のＳ－ＲＡＭ（Static Random Access Memory）３１３と、画像形成に関わる各種の設定を記憶するバッテリバックアップされたＮＶ－ＲＡＭ（Non-Volatile RAM：不揮発性メモリ）３１４と、時計ＩＣ（Integrated Circuit）３１５とを有する。各部３１１～３１５は、バス３０を介して接続されている。また、コントローラー３１は、典型的には、制御基盤として本体部１０に内蔵される。 The controller 31 controls the operation of the image forming apparatus 1. The controller 31 includes a CPU (Central Processing Unit) 311, a ROM (Read Only Memory) 312 in which a control program is stored, an S-RAM (Static Random Access Memory) 313 used as working memory, a battery-backed NV-RAM (Non-Volatile RAM) 314 that stores various settings related to image formation, and a clock IC (Integrated Circuit) 315. The units 311 to 315 are connected via the bus 30. The controller 31 is typically built into the main body 10 as a control board.

操作パネル３４は、各種の入力を行うキー、および表示部を有する。操作パネル３４は、典型的には、タッチスクリーンと、ハードウェアキーとで構成される。なお、タッチスクリーンは、ディスプレイの上にタッチパネルが重畳されたデバイスである。 The operation panel 34 has keys for performing various inputs and a display section. The operation panel 34 typically includes a touch screen and hardware keys. Note that a touch screen is a device in which a touch panel is superimposed on a display.

ネットワークＩＦ３６は、ネットワークを介して接続されたＰＣ３、サーバー（図示せず）および他の画像形成装置（図示せず）をはじめとする外部装置との間で各種の情報を送受信する。 The network IF 36 transmits and receives various information to and from external devices connected via the network, including the PC 3, a server (not shown), and another image forming apparatus (not shown).

プリンタコントローラー３５は、ネットワークＩＦ３６により受信したプリントデータから複写画像を生成する。画像形成ユニット１１は、複写画像を用紙上に形成する。 The printer controller 35 generates a copy image from the print data received by the network IF 36. The image forming unit 11 forms a copy image on paper.

なお、固定記憶装置３２は、典型的には、ハードディスク装置である。固定記憶装置３２には、各種のデーターが記憶されている。なお、固定記憶装置３２は、フラッシュメモリであってもよい。 Note that the fixed storage device 32 is typically a hard disk device. The fixed storage device 32 stores various data. Note that the fixed storage device 32 may be a flash memory.

（ｃ３．搬送装置） (c3. Conveyance device)
図５は、搬送装置の一部の構成を表した模式図である。なお、搬送装置とは、画像形成装置１内にて、搬送対象物である用紙Ｐを搬送するための機構である。搬送装置は、複数の搬送ユニットを含む。各搬送ユニットは、搬送手段としてのローラー対と、ローラー対を駆動する駆動手段としてのモーターとを含む。なお、ローラー対は、典型的には、駆動ローラーと、駆動ローラーの回転に従動して回転する従動ローラーとを含む。 FIG. 5 is a schematic diagram showing the configuration of a part of the conveyance device. Note that the conveyance device is a mechanism for conveying the paper P, which is the object to be conveyed, within the image forming apparatus 1. The conveyance device includes a plurality of conveyance units. Each conveyance unit includes a roller pair as a conveyance means and a motor as a drive means for driving the roller pair. Note that the roller pair typically includes a drive roller and a driven roller that rotates following the rotation of the drive roller.

図５を参照して、搬送装置３９は、複数のローラー対を含む。たとえば、搬送装置３９は、給紙ローラー対と、タイミングローラー対と、定着ローラー対と、排紙ローラー対とを含む。 Referring to FIG. 5, conveyance device 39 includes a plurality of roller pairs. For example, the conveyance device 39 includes a pair of paper feed rollers, a pair of timing rollers, a pair of fixing rollers, and a pair of paper discharge rollers.

給紙ローラー対の駆動ローラーは、給紙クラッチによって移動可能に構成されている。給紙クラッチがオンの状態では、給紙ローラー対の駆動ローラーは従動ローラーに当接する。給紙クラッチがオフの状態では、給紙ローラー対の駆動ローラーは従動ローラーから離間する。 The drive rollers of the paper feed roller pair are configured to be movable by a paper feed clutch. When the paper feed clutch is on, the drive roller of the paper feed roller pair comes into contact with the driven roller. When the paper feed clutch is off, the drive roller of the paper feed roller pair is separated from the driven roller.

タイミングローラー対については、タイミングクラッチがオンの状態で駆動ローラーが回転し、タイミングクラッチがオフの状態で駆動ローラーが停止する。 Regarding the timing roller pair, the drive roller rotates when the timing clutch is on, and the drive roller stops when the timing clutch is off.

定着ローラー対は、定着クラッチ（図示せず）によって移動可能に構成されている。定着クラッチがオンの状態では、定着ローラー対の駆動ローラーは従動ローラーに当接する。定着クラッチがオフの状態では、定着ローラー対の駆動ローラーは従動ローラーから離間する。 The fixing roller pair is configured to be movable by a fixing clutch (not shown). When the fixing clutch is on, the driving roller of the fixing roller pair comes into contact with the driven roller. When the fixing clutch is off, the driving roller of the fixing roller pair is separated from the driven roller.

排紙ローラー対は、排紙クラッチ（図示せず）によって移動可能に構成されている。排紙クラッチがオンの状態では、排紙ローラー対の駆動ローラーは従動ローラーに当接する。排紙クラッチがオフの状態では、排紙ローラー対の駆動ローラーは従動ローラーから離間する。 The paper ejection roller pair is configured to be movable by a paper ejection clutch (not shown). When the paper ejection clutch is on, the drive roller of the paper ejection roller pair comes into contact with the driven roller. When the paper ejection clutch is off, the driving roller of the paper ejection roller pair is separated from the driven roller.

給紙ローラーおよびタイミングローラーは、メインモーターによって回転駆動する。定着ローラーは、定着モーターによって回転駆動する。排紙ローラーは、排紙モーターによって回転駆動する。各モーターの回転速度等は、コントローラー３１から指示される。 The paper feed roller and timing roller are rotationally driven by a main motor. The fixing roller is rotationally driven by a fixing motor. The paper ejection roller is rotationally driven by a paper ejection motor. The rotational speed and the like of each motor are instructed by the controller 31.

搬送路１５には、用紙Ｐの位置を検出するための複数のセンサー＃１，＃２，…＃２０が設置されている。なお、センサーの数を上記のような数としたのは、図面を簡略化するためであり、センサーの数は、これに限定されるものではない。ただし、隣り合うセンサー同士の距離が、連続して搬送される用紙Ｐ同士の間の距離（ピッチ）以下にすることが好ましい。用紙Ｐの通過を１枚毎に検出するためである。 A plurality of sensors #1, #2, . . . #20 for detecting the position of the paper P are installed on the conveyance path 15. Note that the number of sensors is set to the above number to simplify the drawing, and the number of sensors is not limited to this number. However, it is preferable that the distance between adjacent sensors be equal to or less than the distance (pitch) between sheets P that are continuously conveyed. This is to detect the passage of paper P one by one.

なお、センサーによる検出信号は、コントローラー３１に送られる。コントローラー３１は、各センサーからの出力に基づき、用紙Ｐの位置を判断する。 Note that a detection signal from the sensor is sent to the controller 31. The controller 31 determines the position of the paper P based on the output from each sensor.

このような搬送装置３９の動作およびセンシング結果の出力は、Ｑ学習のために、クラウド上のシミュレーターにても再現される。 The operation of the transport device 39 and the output of the sensing results are also reproduced in a simulator on the cloud for Q learning.

図６は、Ｑ学習で用いられるデーターを説明するための模式図である。 FIG. 6 is a schematic diagram for explaining the data used in Q-learning.
図６を参照して、Ｑ学習では、センサー＃１，＃２，…＃２０の出力（オンまたはオフ）を、状態ｓとする。また、Ｑ学習では、メインモーター、定着モーター、排紙モーターの速度、駆動のタイミング、停止のタイミング、変速のタイミング、および駆動電流の値を、学習対象のパラメーターとする。また、Ｑ学習では、給紙クラッチおよびタイミングクラッチのオンまたはオフも学習対象のパラメーターとなる。 Referring to FIG. 6, in Q-learning, the outputs (on or off) of sensors #1, #2, ... #20 are taken as the state s. In addition, in Q-learning, the speeds, drive timings, stop timings, speed change timings, and drive current values of the main motor, the fixing motor, and the paper ejection motor are the parameters to be learned. The on or off state of the paper feed clutch and of the timing clutch is also a parameter to be learned.

行動ａは、「Ｑテーブルを参照して行動を決定（選択）し、決定された行動に対応付けられたパラメーター（詳しくは、パラメーターのセット）で各モーターを駆動すること」に該当する。また、行動ａにより、報酬ｒが得られる。 Action a corresponds to "determining (selecting) an action with reference to the Q table, and driving each motor with a parameter (specifically, a set of parameters) associated with the determined action." In addition, a reward r can be obtained by action a.

本例では、対象物である用紙Ｐの撓み量または引っ張り量を表す状態量（以下、「用紙状態量」と称する）に基づき、報酬ｒが付与される。用紙Ｐの区間（搬送路中の区間）に応じて、引っ張った状態であることが好ましかったり、撓みが許容されたり、撓みが許容されなかったりする。このため、搬送経路の複数の区間にて用紙状態量を取得し、用紙状態量に基づき報酬ｒを付与する。 In this example, a reward r is given based on a state quantity representing the amount of deflection or the amount of tension of the paper P, which is the object (hereinafter referred to as the "paper state quantity"). Depending on the section of the conveyance path in which the paper P is located, a taut state may be preferable, deflection may be allowed, or deflection may not be allowed. For this reason, the paper state quantity is acquired in a plurality of sections of the conveyance path, and the reward r is given based on the paper state quantity.

上述したように、本例では画像形成装置１のシミュレーターを用いるため、搬送装置３９の構成もシミュレーター上で再現される。機械学習時は、センサーを用いた用紙Ｐの位置の検出も、シミュレーターにて行われる。 As described above, since the simulator of the image forming apparatus 1 is used in this example, the configuration of the transport device 39 is also reproduced on the simulator. During machine learning, detection of the position of paper P using a sensor is also performed in the simulator.

より詳しくは、最初は用紙状態量を実機である画像形成装置１から取得し、その後は、シミュレーターにて用紙状態量を算出する。シミュレーターにて算出された用紙状態量との各々に対して、報酬ｒの算出（付与）がなされる。 More specifically, first, the paper state quantity is obtained from the image forming apparatus 1, which is an actual machine, and then the paper state quantity is calculated by a simulator. A reward r is calculated (awarded) for each paper state quantity calculated by the simulator.

また、学習装置は、搬送路１５中の用紙Ｐの位置の代わりに、搬送装置３９による用紙Ｐの搬送速度、または搬送路１５中の用紙Ｐの位置に基づき、用紙状態量（撓み量または引っ張り量）を算出してもよい。なお、搬送速度とは、ローラーの回転速度（回転数）と、ローラーの半径との積として表される。 Furthermore, rather than using only the position of the paper P in the conveyance path 15, the learning device may calculate the paper state quantity (the amount of deflection or the amount of tension) based on the conveyance speed of the paper P by the conveyance device 39 or on the position of the paper P in the conveyance path 15. Note that the conveyance speed is expressed as the product of the rotational speed (number of rotations) of the roller and the radius of the roller.
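The relationship between roller rotation and conveyance speed can be illustrated with a minimal Python sketch. It assumes that the rotational speed is given as an angular velocity in rad/s, so that its product with the radius is a surface speed in mm/s. The helpers `rpm_to_rad_s` and `slack_change_mm` are hypothetical and do not appear in the source; accumulating the speed difference between two adjacent roller pairs is only one possible way of estimating a deflection or tension amount.

```python
import math


def rpm_to_rad_s(rpm):
    """Convert revolutions per minute to angular velocity in rad/s."""
    return rpm * 2.0 * math.pi / 60.0


def conveyance_speed(angular_speed_rad_s, radius_mm):
    """Surface speed of a roller in mm/s: rotational speed times radius."""
    return angular_speed_rad_s * radius_mm


def slack_change_mm(upstream_speed, downstream_speed, dt_s):
    """Assumed estimate: positive means the sheet gains slack (deflection),
    negative means it is being pulled taut between the two roller pairs."""
    return (upstream_speed - downstream_speed) * dt_s


# Hypothetical values for two adjacent roller pairs with 15 mm radius.
v_up = conveyance_speed(10.0, 15.0)           # 150.0 mm/s
v_down = 140.0                                # mm/s, assumed
delta = slack_change_mm(v_up, v_down, 0.5)    # 5.0 mm of added slack
```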

図７は、搬送装置３９の変形例を説明するための模式図である。 FIG. 7 is a schematic diagram for explaining a modification of the conveyance device 39.
図７を参照して、搬送装置３９Ａは、図５に示した搬送装置３９の複数のセンサーの一部を仮想センサーに置き換えた図である。 Referring to FIG. 7, the conveyance device 39A is the conveyance device 39 shown in FIG. 5 with some of its sensors replaced by virtual sensors.

実機である画像形成装置１においては、多数のセンサーを搬送路１５に配置することは、コストがかかりすぎる。そこで、センサーを仮想化する。仮想化したセンサーに用紙Ｐが到達したか否かは、用紙Ｐの位置と搬送速度とに基づき判断する。用紙Ｐの位置を逐次更新していくことで、センサー位置への用紙Ｐの到達を判断する。 In the image forming apparatus 1, which is an actual machine, it would be too costly to arrange a large number of sensors on the conveyance path 15. Therefore, sensors are virtualized. Whether or not the paper P has reached the virtualized sensor is determined based on the position of the paper P and the transport speed. By sequentially updating the position of the paper P, it is determined whether the paper P has reached the sensor position.
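A virtual sensor of this kind can be sketched as follows. The sketch assumes a fixed time step, a constant conveyance speed, and a leading-edge position measured in millimetres from the start of the conveyance path; the function name `first_arrival_step` and all numerical values are hypothetical.

```python
def first_arrival_step(sensor_mm, speed_mm_per_s, dt_s, max_steps=10_000):
    """Update the leading-edge position step by step and return the first step
    at which it reaches the virtual sensor, or None if it never does."""
    position_mm = 0.0
    for step in range(1, max_steps + 1):
        position_mm += speed_mm_per_s * dt_s  # sequential position update
        if position_mm >= sensor_mm:
            return step
    return None


# Hypothetical values: 128 mm/s with a 1/64 s step moves the sheet 2 mm per step.
step_reached = first_arrival_step(sensor_mm=50.0, speed_mm_per_s=128.0, dt_s=1 / 64)
```

With these illustrative values the sheet advances 2 mm per step, so a virtual sensor located at 50 mm turns on at step 25.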

＜Ｄ．ハードウェア構成＞ <D. Hardware configuration>
図８は、学習装置５００のハードウェア構成の典型例を表した図である。図８を参照して、学習装置５００は、主たる構成要素として、プログラムを実行するＣＰＵ５８１と、ＣＰＵ５８１によるプログラムの実行により生成されたデーター、又は入力装置を介して入力されたデーターを揮発的に格納するＲＡＭ５８２と、データーを不揮発的に格納するＲＯＭ５８３と、データーを不揮発的に格納するＨＤＤ５８４と、ディスプレイ５８５と、操作キー５８６と、通信ＩＦ（Interface）５８７と、電源回路５８８とを含む。各構成要素は、相互にデーターバスによって接続されている。 FIG. 8 is a diagram showing a typical example of the hardware configuration of the learning device 500. Referring to FIG. 8, the learning device 500 includes, as main components, a CPU 581 that executes a program, a RAM 582 that stores, in a volatile manner, data generated by the CPU 581 executing the program or data input via an input device, a ROM 583 that stores data in a non-volatile manner, an HDD 584 that stores data in a non-volatile manner, a display 585, operation keys 586, a communication IF (Interface) 587, and a power supply circuit 588. The components are connected to one another by a data bus.

電源回路５８８は、コンセントを介して受信した商用電源の電圧を降圧し、学習装置５００の各部に電源供給を行なう回路である。 The power supply circuit 588 is a circuit that steps down the voltage of the commercial power supply received via the outlet and supplies power to each part of the learning device 500.

通信ＩＦ５８７は、他の情報処理装置（クラウド上の機器、またはエッジ側の機器）の機器との間の通信を行なためのインターフェイスである。 The communication IF 587 is an interface for communicating with other information processing devices (devices on the cloud or devices on the edge side).

操作キー５８６は、学習装置５００のユーザーが学習装置５００へデーターを入力するための用いるキー（キーボード）である。 The operation keys 586 are keys (keyboard) used by the user of the learning device 500 to input data to the learning device 500.

学習装置５００における処理は、各ハードウェアおよびＣＰＵ５８１により実行されるソフトウェアによって実現される。 Processing in the learning device 500 is realized by each piece of hardware and software executed by the CPU 581.

同図に示される学習装置５００を構成する各構成要素は、一般的なものである。したがって、本発明の本質的な部分は、ＲＡＭ５８２、ＨＤＤ５８４、記憶媒体に格納されたソフトウェア、あるいはネットワークを介してダウンロード可能なソフトウェアであるともいえる。なお、学習装置５００の各ハードウェアの動作は周知であるので、詳細な説明は繰り返さない。 Each component constituting the learning device 500 shown in the figure is common. Therefore, it can be said that the essential part of the present invention is software stored in the RAM 582, HDD 584, a storage medium, or software that can be downloaded via a network. Note that since the operation of each hardware of the learning device 500 is well known, detailed explanation will not be repeated.

また、シミュレーター８００も学習装置５００と同様のハードウェア構成を有する。したがって、ここでは、シミュレーター８００のハードウェア構成については、繰り返し説明しない。 Further, the simulator 800 also has the same hardware configuration as the learning device 500. Therefore, the hardware configuration of the simulator 800 will not be repeatedly described here.

＜Ｅ．Ｑテーブル＞ <E. Q table>
図９は、Ｑテーブルの概要を説明するための模式図である。 FIG. 9 is a schematic diagram for explaining the outline of the Q table.

図９を参照して、Ｑテーブル５３５には、各センサー＃１～＃２０のオンとオフとの全ての組み合わせ（２^２０個の組み合わせ）に対して、状態番号＃１～＃１０４８５７６が対応付けられている。 Referring to FIG. 9, in the Q table 535, state numbers #1 to #1048576 are associated with all combinations of ON and OFF of the sensors #1 to #20 (2^20 combinations).
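One way to associate the 2^20 on/off combinations with state numbers #1 to #1048576 is to treat the sensor outputs as the bits of an integer. The following is a minimal sketch; the bit ordering (sensor #1 as the least significant bit) and the function name `state_number` are assumptions, since the source does not specify how the numbers are assigned.

```python
NUM_SENSORS = 20


def state_number(sensor_outputs):
    """Map the on/off outputs of sensors #1..#20 to a state number in 1..2**20."""
    if len(sensor_outputs) != NUM_SENSORS:
        raise ValueError("exactly 20 sensor outputs are expected")
    index = 0
    for i, is_on in enumerate(sensor_outputs):
        if is_on:
            index |= 1 << i  # sensor #(i + 1) contributes bit i (assumed ordering)
    return index + 1  # state numbers start at #1


all_off = [False] * NUM_SENSORS
all_on = [True] * NUM_SENSORS
only_first = [True] + [False] * (NUM_SENSORS - 1)
```

Under this assumed ordering, `state_number(all_off)` is 1, `state_number(only_first)` is 2, and `state_number(all_on)` is 1048576.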

行動ａは、８つのグループの行動ａ１～ａ８に大別される。行動ａ１～ａ８は、給紙クラッチのオンおよびオフと、タイミングクラッチのオンおよびオフと、排紙クラッチのオンおよびオフとの８（＝２^３）つの組み合わせに基づき規定されている。たとえば、行動ａ１は、給紙クラッチと、タイミングクラッチと、排紙クラッチとの各々がオフである場合を表している。 Action a is broadly divided into eight groups, actions a1 to a8. Actions a1 to a8 are defined by the eight (= 2^3) combinations of the on and off states of the paper feed clutch, the timing clutch, and the paper ejection clutch. For example, action a1 represents the case in which the paper feed clutch, the timing clutch, and the paper ejection clutch are all off.
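The eight action groups can be enumerated with a short sketch. Only the assignment of a1 to the all-off case comes from the text; the ordering of a2 to a8, the clutch identifiers, and the function name `action_groups` are assumptions made for illustration.

```python
from itertools import product

CLUTCHES = ("paper_feed", "timing", "paper_ejection")


def action_groups():
    """Enumerate the 2**3 on/off combinations of the three clutches as a1..a8."""
    groups = {}
    combos = product((False, True), repeat=len(CLUTCHES))
    for number, states in enumerate(combos, start=1):
        groups[f"a{number}"] = dict(zip(CLUTCHES, states))
    return groups


groups = action_groups()
```

Here `groups["a1"]` has every clutch set to `False` (off), matching the example given for action a1, and `groups["a8"]` has every clutch set to `True` (on).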

ただし、搬送装置３９には、上記のクラッチ以外のクラッチも存在する。それゆえ、実際には、行動ａは、さらに多くの数の行動に大別され得る。 However, the conveyance device 39 also includes clutches other than the above-mentioned clutches. Therefore, in reality, action a can be roughly divided into a larger number of actions.

行動ａ１は、典型的には、複数の行動ａ１＿１～ａ１＿ｎで構成されている。なお、ｎは、２以上の自然数である。同様に、行動ａ２も、複数の行動ａ２＿１～ａ２＿ｎで構成されている。他の各行動ａ３～ａ８についても、同様に、ｎ個の行動を含む。 Action a1 typically consists of a plurality of actions a1_1 to a1_n. Note that n is a natural number of 2 or more. Similarly, action a2 consists of a plurality of actions a2_1 to a2_n. Each of the other actions a3 to a8 likewise includes n actions.

たとえば行動ａ２における行動ａ２＿１～ａ２＿ｎは、少なくとも、クラッチがオン状態の搬送用のローラー対およびクラッチが存在しない搬送用のローラー対の各々の回転速度、駆動のタイミング、停止のタイミング、変速のタイミング、および駆動電流の値の組み合わせに対応している。 For example, actions a2_1 to a2_n in action a2 correspond to combinations of at least the rotational speed, drive timing, stop timing, speed change timing, and drive current value of each conveying roller pair whose clutch is in the ON state and each conveying roller pair that has no clutch.

行動ａ２について例を挙げて説明すると、以下のとおりである。仮に、行動ａ２において、クラッチがオン状態の搬送用のローラー対が２つあり、クラッチが存在しない搬送用のローラー対が１つあったとする。なお、クラッチがオン状態の搬送用のローラー対の１つは、給紙クラッチである。この場合、３つの駆動手段（モーター）が駆動される。このため、３つのモーターの駆動パラメーターに基づき、行動ａ２が分類される。 Action a2 is explained below using an example. Assume that in action a2 there are two conveying roller pairs whose clutches are in the ON state and one conveying roller pair that has no clutch. Note that one of the conveying roller pairs with its clutch in the ON state is the pair associated with the paper feed clutch. In this case, three drive means (motors) are driven. Therefore, action a2 is subdivided based on the drive parameters of the three motors.

１つのモーターについて、少なくとも、回転速度、駆動のタイミング、停止のタイミング、変速のタイミング、および駆動電流の値の５個のパラメーターがある。仮に各パラメーターの設定区分が１０個あるとすると、１つのモーターについて、１０００００（＝１０^５）とおりの組み合わせが存在することになる。 There are at least five parameters for one motor: rotational speed, drive timing, stop timing, speed change timing, and drive current value. Assuming that there are 10 setting levels for each parameter, there are 100,000 (= 10^5) combinations for one motor.

よって、他の２つのモーターのパラメーターの設定区分も上記と同様とすると、３つのモーターでは、１０００００×１０００００×１０００００（＝１０^１５）とおりの組み合わせが存在することになる。この場合、行動ａ２＿ｎの値は、１０^１５となる。 Therefore, if the setting levels for the parameters of the other two motors are the same as above, there are 100,000 × 100,000 × 100,000 (= 10^15) combinations for the three motors. In this case, the value of n in action a2_n is 10^15.
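The counting argument above can be checked with a few lines of Python. The constant and function names are hypothetical; the figures of 5 parameters per motor and 10 setting levels per parameter are the ones assumed in the text.

```python
PARAMS_PER_MOTOR = 5   # speed, drive timing, stop timing, speed change timing, drive current
LEVELS_PER_PARAM = 10  # assumed number of setting levels per parameter


def combinations_for(num_motors):
    """Number of parameter sets when every motor is set independently."""
    per_motor = LEVELS_PER_PARAM ** PARAMS_PER_MOTOR
    return per_motor ** num_motors


one_motor = combinations_for(1)     # 10**5
three_motors = combinations_for(3)  # (10**5)**3 = 10**15
```

Each motor contributes a factor of 10^5, so three independently driven motors give 10^15 parameter sets.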

各パラメーターの設定区分は、基準値に対して、たとえば、±５，±１０，…といったように設定可能である。なお、プラスマイナスで示した値は、例示であって、これに限定されるものではない。 The setting classification of each parameter can be set, for example, ±5, ±10, . . . with respect to the reference value. Note that the values indicated by plus or minus are merely examples, and the values are not limited thereto.

以上のように、Ｑテーブル５３５では、各行動の価値が、各行動に対応付けて格納されている。詳しくは、Ｑテーブル５３５では、モーターを駆動するパラメーター（設定パラメーター）のセットの価値が、セット毎に示されている。Ｑテーブル５３５では、１つのセットの価値が１つの数値として表されている。 As described above, in the Q table 535, the value of each action is stored in association with each action. Specifically, in the Q table 535, the value of a set of parameters (setting parameters) for driving the motor is shown for each set. In the Q table 535, the value of one set is represented as one numerical value.

なお、Ｑテーブルの行動ａの各数値の初期値は、学習開始前に予め設定されている。 Note that the initial value of each numerical value of action a in the Q table is set in advance before learning starts.
次に、図９に示したＱテーブルを利用した行動ａの選択について説明する。つまり、Ｑテーブルを利用したパラメーターの設定について説明する。さらに、Ｑテーブル内の数値（すなわち、パラメーター）の更新についても説明する。 Next, the selection of action a using the Q table shown in FIG. 9 will be explained. In other words, parameter setting using the Q table will be explained. Furthermore, updating of the numerical values (i.e., parameters) in the Q table will also be explained.

たとえば、用紙Ｐが搬送され、状態ｓが状態＃２となったとする。この場合、学習装置５００は、状態＃２に対応する行動ａ１＿１～ａ１＿ｎ，ａ２＿１～ａ２＿ｎ，…，ａ８＿１～ａ８＿ｎのうちから、価値の最も高い行動ａを選択する。具体的には、学習装置５００は、状態＃２に対応する行動ａ１＿１～ａ１＿ｎ，ａ２＿１～ａ２＿ｎ，…，ａ８＿１～ａ８＿ｎのうちから、最も高い数値を選択する。 For example, assume that paper P is transported and state s becomes state #2. In this case, the learning device 500 selects the action a with the highest value from among the actions a1_1 to a1_n, a2_1 to a2_n, . . . , a8_1 to a8_n corresponding to state #2. Specifically, the learning device 500 selects the highest numerical value from among the actions a1_1 to a1_n, a2_1 to a2_n, . . . , a8_1 to a8_n corresponding to state #2.

上記の例のように行動ａ２＿ｎの値が１０^１５である場合には、１０^１５個の中から数値の最も高い行動ａを選択する。ただし、学習装置５００は、最も数値が高い行動ａを常に選択するのではなく、ε－ｇｒｅｅｄｙ法を用いて他の数値の行動ａを選択する。 If n in action a2_n is 10¹⁵ as in the above example, the action a with the highest numerical value is selected from among the 10¹⁵ actions. However, the learning device 500 does not always select the action a with the highest numerical value; using the ε-greedy method, it also selects actions a with other numerical values.
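As a rough illustration of the ε-greedy selection mentioned above, the sketch below picks an action index from one row of a Q table. The function name select_action, the list representation of a row, and the default exploration probability of 0.1 are assumptions made for this example; the specification does not give a value for ε.

```python
import random


def select_action(q_row, epsilon=0.1, rng=random):
    """Pick an action index from one Q-table row using the epsilon-greedy rule.

    q_row   : list of values, one per action available in the current state
    epsilon : assumed exploration probability (not given in the specification)
    """
    if rng.random() < epsilon:
        # Explore: choose any action uniformly at random.
        return rng.randrange(len(q_row))
    # Exploit: choose the action with the highest stored value.
    return max(range(len(q_row)), key=lambda i: q_row[i])
```

A plain list is used only for readability; a row with 10¹⁵ entries would not fit in memory in this form.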

学習装置５００は、行動ａが選択されると、選択された行動ａ（たとえば、行動ａ３＿５１）として規定された各種のパラメーターの数値（回転速度、駆動のタイミング、停止のタイミング、変速のタイミング、および駆動電流の値）を用いて、シミュレーター内で搬送装置を駆動させる。 When an action a is selected, the learning device 500 drives the transport device within the simulator using the numerical values of the various parameters (rotational speed, drive timing, stop timing, shift timing, and drive current value) defined for the selected action a (for example, action a3_51).

学習装置５００は、状態ｓ_ｔにおいて選択された行動ａ（以下、行動ａ_ｔ）に基づき、報酬ｒ_ｔ＋１を付与する。具体的には、行動ａ_ｔの結果（次の状態ｓ_ｔ＋１）に基づき検出されるセンサーの出力に基づき、用紙Ｐの撓み量または引っ張り量を表す用紙状態量を算出する。 The learning device 500 gives a reward r_t+1 based on the action a selected in the state s_t (hereinafter referred to as action a_t). Specifically, a paper state quantity representing the amount of deflection or tension of the paper P is calculated from the sensor output detected as a result of the action a_t (that is, in the next state s_t+1).

学習装置５００は、算出された用紙状態量に基づき、正または負の報酬を付与する。具体的には、たとえば行動ａ３＿５１が選択されていた場合、学習装置５００は、行動ａ３＿５１の数値（表の数値）を、用紙状態量に基づき決定された報酬を加算（報酬がマイナスのときは減算）することにより更新する。 The learning device 500 gives a positive or negative reward based on the calculated paper state quantity. Specifically, for example, if action a3_51 was selected, the learning device 500 updates the numerical value of action a3_51 (the numerical value in the table) by adding the reward determined from the paper state quantity (or subtracting it when the reward is negative).
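The additive update described above can be sketched as follows. The nested-dictionary layout of the table and the name update_value are assumptions for illustration. The passage states only that the reward is added to the stored value, so the learning rate and discount factor of textbook Q-learning are not modeled here.

```python
def update_value(q_table, state, action, reward):
    """Add the reward to the stored value of (state, action).

    A negative reward lowers the value, which matches the subtraction case.
    q_table is assumed to be a dict of dicts: q_table[state][action] -> value.
    """
    q_table[state][action] += reward
    return q_table[state][action]
```

For example, a stored value of 1.0 becomes 1.5 after a reward of 0.5, and then -0.5 after a reward of -2.0.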

ところで、搬送路１５の区間に応じて、撓みが許される区間もあれば、撓みが許されない区間もある。また、搬送路１５には、用紙Ｐが引っ張り状態にあることが好ましい区間もある。そこで、学習装置５００は、搬送路の区間を考慮し、報酬を付与する。報酬の付与の例については後述する。 Incidentally, depending on the section of the conveyance path 15, deflection is allowed in some sections and not allowed in others. There are also sections of the conveyance path 15 in which it is preferable for the paper P to be under tension. Therefore, the learning device 500 gives rewards in consideration of the section of the conveyance path. Examples of giving rewards will be described later.

学習装置５００は、たとえば、用紙Ｐの位置に基づき算出された長さよりも用紙Ｐの基準長さが長い場合には、当該算出された長さと当該基準長さとの差分を撓み量として取得する。また、学習装置５００は、用紙Ｐの基準長さよりも用紙Ｐの位置に基づき算出された長さが長い場合には、当該算出された長さと当該基準長さとの差分を引っ張り量として取得する。 For example, if the reference length of the paper P is longer than the length calculated based on the position of the paper P, the learning device 500 obtains the difference between the calculated length and the reference length as the amount of deflection. Further, if the length calculated based on the position of the paper P is longer than the reference length of the paper P, the learning device 500 obtains the difference between the calculated length and the reference length as the amount of tension.

また、学習装置５００は、用紙Ｐの位置に基づき算出された長さと、用紙Ｐの基準長さとの差分がなく、かつローラーの搬送速度が当該ローラーよりも１つ上流側のローラーの搬送速度以上である場合に、用紙Ｐが引っ張られた状態にあると判断する。 Further, the learning device 500 determines that the paper P is in a pulled state when there is no difference between the length calculated based on the position of the paper P and the reference length of the paper P, and the conveyance speed of a roller is equal to or higher than the conveyance speed of the roller one position upstream of that roller.
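The classification rules in the two preceding paragraphs can be summarized in a small function. This is a sketch under assumptions: the function name paper_state, the string labels, and the "neutral" fallback for the case the text does not cover are all hypothetical.

```python
def paper_state(calculated_len, reference_len, speed=None, upstream_speed=None):
    """Return (kind, amount), where kind is "deflection", "tension" or "neutral".

    calculated_len : length derived from the detected positions of the paper
    reference_len  : reference length of the paper
    speed          : conveyance speed of a roller (optional)
    upstream_speed : conveyance speed of the next roller upstream (optional)
    """
    if reference_len > calculated_len:
        # The paper is longer than the span it occupies, so it is deflected.
        return ("deflection", reference_len - calculated_len)
    if calculated_len > reference_len:
        # The span is longer than the paper, so the paper is being pulled.
        return ("tension", calculated_len - reference_len)
    # No length difference: fall back on the speed comparison.
    if speed is not None and upstream_speed is not None and speed >= upstream_speed:
        return ("tension", 0.0)
    return ("neutral", 0.0)  # assumed label for the remaining case
```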

なお、学習装置５００は、用紙Ｐの撓み量を、たとえば、エミュレーター内のカメラ部品モデル（光学式のセンサ部品モデル）による撮像結果に基づき算出してもよい。あるいは、学習装置５００は、用紙Ｐの撓み量を、たとえば、エミュレーター内の撓み機械式のセンサー部品モデル（アクチュエーター部品モデル等）によって計測してもよい。 Note that the learning device 500 may calculate the amount of deflection of the paper P based on, for example, an imaging result by a camera component model (optical sensor component model) in an emulator. Alternatively, the learning device 500 may measure the amount of deflection of the paper P using, for example, a deflection mechanical sensor component model (actuator component model, etc.) within the emulator.

また、学習装置５００は、搬送装置のエミュレーター部品にて搬送装置にかかる負荷を検出することによって、用紙Ｐの引っ張り量を算出してもよい。あるいは、学習装置５００は、エミュレーター内のカメラ部品モデルによる撮像結果に基づき、用紙Ｐの引っ張り量を算出してもよい。 Further, the learning device 500 may calculate the amount of tension on the paper P by detecting the load applied to the conveyance device using an emulator component of the conveyance device. Alternatively, the learning device 500 may calculate the amount of tension on the paper P based on the imaging results obtained by the camera component model within the emulator.

なお、上述したように、学習装置５００は、実機である画像形成装置１からセンシングデーターを取得する。この際、用紙Ｐの位置を検出するセンサー（図５参照）からの出力に基づき、学習装置５００は、学習開始前に、画像形成装置１から、上述した手法により撓み量または引っ張り量を取得することができる。このような用紙状態量は、シミュレーターの設定時に反映される。 Note that, as described above, the learning device 500 acquires sensing data from the image forming device 1, which is an actual device. At this time, based on the output from the sensor (see FIG. 5) that detects the position of the paper P, the learning device 500 can acquire the amount of deflection or tension from the image forming device 1 by the method described above before learning starts. Such paper state quantities are reflected when the simulator is set up.

＜Ｆ．機能的構成＞ <F. Functional configuration>
図１０は、画像形成装置１の機能的構成を説明するための機能ブロック図である。 FIG. 10 is a functional block diagram for explaining the functional configuration of the image forming apparatus 1.

図１０を参照して、画像形成装置１は、コントローラー（制御部）３１と、ネットワークＩＦ（Interface）３６と、搬送装置３９とを備える。 Referring to FIG. 10, image forming apparatus 1 includes a controller (control unit) 31, a network IF (Interface) 36, and a transport device 39.

コントローラー３１は、ファームウェアを記憶している。コントローラー３１は、ファームウェア等を用いて、画像形成装置１の全体的な動作を制御する。たとえば、コントローラー３１は、モーターの回転速度を制御する。 The controller 31 stores firmware. The controller 31 controls the overall operation of the image forming apparatus 1 using firmware or the like. For example, controller 31 controls the rotational speed of the motor.

搬送装置３９は、複数の搬送ユニット３９１＿１，３９１＿２，…を備える。各搬送ユニット３９１＿１，３９１＿２，…は、搬送部（搬送手段、ローラー対）３９８と、駆動部（モーター）３９９とを備える。搬送装置３９は、ファームウェアの設定に基づき、動作する。たとえば、駆動部３９９は、ファームウェアにおいて設定された各種パラメーター（回転速度、駆動のタイミング、停止のタイミング、変速のタイミング、および駆動電流の値等）にしたがって、駆動する。 The transport device 39 includes a plurality of transport units 391_1, 391_2, . . . . Each transport unit 391_1, 391_2, . . . includes a transport section (transport means, roller pair) 398 and a drive section (motor) 399. The transport device 39 operates based on firmware settings. For example, the drive unit 399 is driven according to various parameters (rotational speed, drive timing, stop timing, shift timing, drive current value, etc.) set in firmware.

ネットワークＩＦ３６は、外部のネットワークと通信するための通信インターフェイスである。画像形成装置１は、ネットワークＩＦ３６により、クラウド９００上の各機器（学習装置５００、エミュレーター７００、シミュレーター８００）と通信を行うことができる。詳しくは、ネットワークＩＦ３６は、送信部３６１と、受信部３６２とを備える。 Network IF 36 is a communication interface for communicating with an external network. The image forming apparatus 1 can communicate with each device (learning device 500, emulator 700, simulator 800) on the cloud 900 through the network IF 36. Specifically, the network IF 36 includes a transmitter 361 and a receiver 362.

送信部３６１は、クラウド９００上のエミュレーター７００およびシミュレーター８００にデーターを送信する。具体的には、送信部３６１は、学習装置５００に対して、上述したセンシングデーターを送信する。送信部３６１は、センシングデーターとして、たとえば画像形成装置１を数分間稼働させたときのデーターをエミュレーター７００およびシミュレーター８００に送信する。当該データーには、位置検出用のセンサー（図５参照）の出力の他、回転速度、駆動のタイミング、停止のタイミング、変速のタイミング、および駆動電流の値等も含まれる。また、当該データーは、用紙Ｐの撓み量または引っ張り量を示した用紙状態量も含み得る。 The transmitter 361 transmits data to the emulator 700 and the simulator 800 on the cloud 900. Specifically, the transmitter 361 transmits the above-described sensing data to the learning device 500. The transmitter 361 transmits, as sensing data, data obtained when the image forming apparatus 1 is operated for several minutes, for example, to the emulator 700 and the simulator 800. The data includes, in addition to the output of the position detection sensor (see FIG. 5), rotational speed, drive timing, stop timing, shift timing, drive current value, and the like. Further, the data may also include a paper state quantity indicating the amount of deflection or tension of the paper P.

学習装置５００は、画像形成装置１のファームウェアを更新するための更新用プログラムを画像形成装置１に対して送信する。更新用プログラムは、Ｑ学習によって得られた最適な行動ａの各種のパラメーター（回転速度、駆動のタイミング、停止のタイミング、変速のタイミング、および駆動電流の値等）の数値を含んでいる。 Learning device 500 transmits an update program for updating the firmware of image forming device 1 to image forming device 1 . The update program includes numerical values of various parameters (rotational speed, drive timing, stop timing, shift timing, drive current value, etc.) of the optimal behavior a obtained by Q-learning.

画像形成装置１は、更新プログラムを学習装置５００から受信すると、ファームウェアを当該更新用プログラムにて更新する。 Upon receiving the update program from the learning device 500, the image forming apparatus 1 updates the firmware with the update program.

図１１は、学習装置５００の構成と、シミュレーターの構成とを説明するための機能ブロック図である。 FIG. 11 is a functional block diagram for explaining the configuration of the learning device 500 and the configuration of the simulator.

図１１を参照して、学習装置５００は、状態観測部５１０と、報酬付与部５２０と、学習部５３０と、意思決定部５４０と、更新用プログラム作成部５５０と、更新用プログラム送信部５６０とを備える。状態観測部５１０は、状態量取得部５１５を含む。学習部５３０は、Ｑテーブル５３５を有する。Ｑテーブル５３５には、状態ｓと行動ａ_ｔとが関連付けられている。また、行動ａ_ｔには、パラメーターのセット（複数のパラメーターからなる組）が関連付けられている。 Referring to FIG. 11, the learning device 500 includes a state observation unit 510, a reward giving unit 520, a learning unit 530, a decision making unit 540, an update program creation unit 550, and an update program transmission unit 560. The state observation unit 510 includes a state quantity acquisition unit 515. The learning unit 530 has a Q table 535. In the Q table 535, a state s and an action a_t are associated with each other. Further, a set of parameters (a group consisting of a plurality of parameters) is associated with the action a_t.

シミュレーター８００は、画像形成装置１をシミュレーション用にモデル化した画像形成装置モデル８０５を有する。画像形成装置モデル８０５は、コントローラー３１をシミュレーション用にモデル化したコントローラーモデル８１０と、搬送装置３９をシミュレーション用にモデル化した搬送装置モデル８２０とを含む。 The simulator 800 has an image forming apparatus model 805 that models the image forming apparatus 1 for simulation. The image forming apparatus model 805 includes a controller model 810 that models the controller 31 for simulation, and a transport device model 820 that models the transport device 39 for simulation.

搬送装置モデル８２０は、複数の搬送ユニット３９１＿１，３９１＿２，…をシミュレーション用にモデル化した搬送ユニットモデル８２１＿１，８２１＿２，…を含む。各搬送ユニットモデルは、搬送部３９８（ローラー対）をモデル化した搬送部モデル８２１２と、駆動部３９９をモデル化した駆動部モデル８２１４とを有する。 The transport device model 820 includes transport unit models 821_1, 821_2, ... that model the plurality of transport units 391_1, 391_2, ... for simulation. Each transport unit model includes a transport unit model 8212 that models the transport unit 398 (roller pair), and a drive unit model 8214 that models the drive unit 399.

搬送装置モデル８２０は、複数のセンサー（図５参照）をシミュレーション用にモデル化したセンサーモデル８２５＿１，８２５＿２，…をさらに有する。 The transport device model 820 further includes sensor models 825_1, 825_2, . . . which model a plurality of sensors (see FIG. 5) for simulation.
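The containment hierarchy of the transport device model can be pictured with a few data classes. This is only a structural sketch: the class and field names are assumptions, the reference numerals appear in comments for orientation, and the controller model 810 and the transport unit model 8212 are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DriveUnitModel:
    """Stands in for drive unit model 8214 (fields are hypothetical)."""
    speed: float = 0.0
    drive_current: float = 0.0


@dataclass
class TransportUnitModel:
    """Stands in for one transport unit model 821_x."""
    drive: DriveUnitModel = field(default_factory=DriveUnitModel)


@dataclass
class SensorModel:
    """Stands in for one sensor model 825_x with an on/off output."""
    on: bool = False


@dataclass
class TransportDeviceModel:
    """Stands in for transport device model 820."""
    units: List[TransportUnitModel] = field(default_factory=list)
    sensors: List[SensorModel] = field(default_factory=list)
```

Using default_factory gives every transport unit its own drive model, so changing the speed of one unit does not affect another.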

状態観測部５１０は、シミュレーター８００から状態ｓ（シミュレーター８００からの出力データ）を取得する。状態ｓは、センサーからのオンまたはオフの出力の他、上述した各種のデーターを取得する。 The state observation unit 510 acquires the state s (output data from the simulator 800) from the simulator 800. The state s includes the on or off output from each sensor as well as the various data described above.

状態観測部５１０の状態量取得部５１５は、用紙Ｐの撓み量または引っ張り量を表す用紙状態量を、搬送装置（搬送装置モデル）の搬送路の複数の区間において取得する。典型的には、状態量取得部５１５は、搬送部モデル８２１２（ローラー対，搬送手段）による用紙Ｐの搬送速度、または搬送路１５中の用紙Ｐの位置に基づき、上記用紙状態量を取得する。 The state quantity acquisition unit 515 of the state observation unit 510 acquires paper state quantities representing the amount of deflection or tension of the paper P in a plurality of sections of the conveyance path of the conveyance device (conveyance device model). Typically, the state quantity acquisition unit 515 acquires the paper state quantity based on the transport speed of the paper P by the transport unit model 8212 (roller pair, transport means) or the position of the paper P in the transport path 15.

報酬付与部５２０は、状態観測部からの状態変数（用紙状態量を含む）に基づいて報酬ｒを付与する。典型的には、報酬付与部５２０は、用紙状態量と、搬送部モデル８２１２の状態とに基づいて報酬ｒを付与する。報酬ｒの付与例については、後述する。 The reward giving unit 520 gives a reward r based on the state variables (including the paper state amount) from the state observation unit. Typically, the reward giving unit 520 gives the reward r based on the paper state quantity and the state of the transport unit model 8212. An example of granting the reward r will be described later.

学習部５３０は、付与された報酬ｒに基づき、Ｑテーブルの対応する行動ａの数値（すなわち、価値）を更新する。詳しくは、学習部５３０は、各搬送部モデル８２１２を駆動する各駆動部モデル８２１４（モーター，駆動手段）のパラメーターのセットの価値をセット毎に表したＱテーブル５３５を、得られた報酬ｒに基づき更新する機械学習を行う。パラメーターのセットは、各駆動部モデル８２１４の速度、駆動のタイミング、停止のタイミング、変速のタイミング、および駆動電流の値の少なくとも１つを含む。 The learning unit 530 updates the numerical value (that is, the value) of the corresponding action a in the Q table based on the reward r provided. Specifically, the learning unit 530 uses the obtained reward r to create a Q table 535 that represents the value of a set of parameters of each drive unit model 8214 (motor, drive means) that drives each transport unit model 8212 for each set. Perform machine learning to update based on the information. The set of parameters includes at least one of the speed, drive timing, stop timing, shift timing, and drive current value of each drive unit model 8214.

意思決定部５４０は、更新後のＱテーブル５３５に基づいて複数のセットから１つのセットを決定し、かつ、選択されたセットのパラメーターで搬送部モデル８２１２を駆動するように駆動部モデル８２１４に対して指示する。 The decision making unit 540 determines one set from the plurality of sets based on the updated Q table 535, and instructs the drive unit model 8214 to drive the transport unit model 8212 with the parameters of the selected set.

具体的には、意思決定部５４０は、Ｑテーブル５３５を参照し、状態ｓに応じた複数の行動ａのうち、数値（価値）が最も高い行動ａを選択する。なお、意思決定部５４０は、ε－ｇｒｅｅｄｙ法を用いることにより、最も数値が高い行動ａ以外の行動ａも選択するようにする。意思決定部５４０は、選択した行動ａに対応付けられたセット（パラメータのセット）で搬送部モデル８２１２を駆動するように、駆動部モデル８２１４に対して指示する。 Specifically, the decision making unit 540 refers to the Q table 535 and selects the action a with the highest numerical value (value) from among the plurality of actions a corresponding to the state s. Note that, by using the ε-greedy method, the decision making unit 540 also selects actions a other than the action a with the highest numerical value. The decision making unit 540 instructs the drive unit model 8214 to drive the transport unit model 8212 with the set (set of parameters) associated with the selected action a.

学習装置５００は、上記のような一連の処理を最終状態となるまで繰り返す。すなわち、状態量取得部５１５は、選択されたセット（選択された行動ａに対応付けられたパラメーターのセット）のパラメーターに基づいて搬送部モデル８２１２を駆動したときの状態ｓをさらに取得する。報酬付与部５２０は、当該取得された用紙状態量に基づいて報酬ｒをさらに付与する。学習部５３０は、さらに付与された報酬に基づき、Ｑテーブルをさらに更新する。なお、「最終状態」としては、たとえば、シミュレーター８００に指示するパラメーターの値が一定となった場合が挙げられる。 The learning device 500 repeats the above-described series of processes until the final state is reached. That is, the state quantity acquisition unit 515 further acquires the state s when the transport unit model 8212 is driven based on the parameters of the selected set (the set of parameters associated with the selected action a). The reward giving unit 520 further gives a reward r based on the obtained paper state amount. The learning unit 530 further updates the Q table based on the awarded reward. Note that the "final state" includes, for example, a case where the values of parameters instructed to the simulator 800 become constant.

以上のように、学習装置５００は、搬送装置をシミュレートするシミュレーター８００と通信する。状態量取得部５１５は、シミュレーター８００からの出力に基づき、状態ｓを取得する。報酬付与部５２０は、状態ｓに基づいて報酬ｒを付与する。学習部５３０は、各搬送部モデル８２１２を駆動する各駆動部モデル８２１４のパラメータのセットの価値をセット毎に表すＱテーブル５３５を、報酬ｒに基づき更新する機械学習を行う。 As described above, the learning device 500 communicates with the simulator 800 that simulates the transport device. The state quantity acquisition unit 515 acquires the state s based on the output from the simulator 800. The reward giving unit 520 gives a reward r based on the state s. The learning unit 530 performs machine learning to update the Q table 535, which represents the value of a set of parameters of each drive unit model 8214 that drives each transport unit model 8212, for each set, based on the reward r.

意思決定部５４０は、更新後のＱテーブルに基づいて複数のセット（パラメーターのセット）から１つのセットを決定し、かつ、決定されたセットのパラメーターで搬送部モデル８２１２を駆動するように駆動部モデル８２１４に対して指示する。詳しくは、意思決定部５４０は、取得された状態ｓとＱテーブル５３５とに基づいて複数のセット（パラメーターセット）から１つのセットを決定し、かつ、決定されたセットのパラメーターで搬送部モデル８２１２を駆動するように駆動部モデル８２１４に対して指示する。 The decision making unit 540 determines one set from a plurality of sets (sets of parameters) based on the updated Q table, and instructs the drive unit model 8214 to drive the transport unit model 8212 with the parameters of the determined set. Specifically, the decision making unit 540 determines one set from the plurality of sets (parameter sets) based on the acquired state s and the Q table 535, and instructs the drive unit model 8214 to drive the transport unit model 8212 with the parameters of the determined set.

再び、図９を参照して意思決定部５４０からシミュレーター８００に指示されるパラメーターについて具体例を挙げて説明する。 Again, with reference to FIG. 9, the parameters instructed by the decision making unit 540 to the simulator 800 will be explained using a specific example.

ある局面において、状態観測部５１０がシミュレーター８００から取得した、画像形成装置モデル８０５（詳しくは、搬送装置モデル８２０）の状態ｓ（詳しくは、状態ｓ_ｔ）が状態＃２であったとする。この場合、報酬付与部５２０は、状態＃２に基づき、報酬ｒを付与する。 In a certain situation, assume that the state s (specifically, state s_t) of the image forming apparatus model 805 (specifically, the transport device model 820) that the state observation unit 510 acquired from the simulator 800 is state #2. In this case, the reward giving unit 520 gives a reward r based on state #2.

学習部５３０は、当該報酬ｒに基づき、シミュレーター８００に対して、直近に指示した行動ａの価値を更新する。すなわち、学習部５３０は、Ｑテーブル５３５内における、状態＃２と当該行動ａとに対応する欄の数値を更新する。 The learning unit 530 updates the value of the most recently instructed action a to the simulator 800 based on the reward r. That is, the learning unit 530 updates the numerical values in the columns corresponding to the state #2 and the action a in the Q table 535.

たとえば、直近の行動が行動ａ２＿２であった場合、学習部５３０は、状態＃２の数値群（行）における行動ａ２＿２の数値を更新する。すなわち、学習部５３０は、状態＃２の行と行動ａ２＿２の列とが交差する１つの数値を更新する。 For example, if the most recent action is action a2_2, the learning unit 530 updates the numerical value of action a2_2 in the numerical value group (row) of state #2. That is, the learning unit 530 updates one numerical value where the row of state #2 and the column of action a2_2 intersect.

意思決定部５４０は、現在の状態ｓ_ｔと現在のＱテーブル５３５（更新されている場合には更新後のＱテーブル５３５）とに基づき、次の行動ａ_ｔを決定（選択）する。さらに、意思決定部５４０は、決定された行動ａ_ｔに対応付けられたパラメーターをシミュレーター８００に通知する。 The decision making unit 540 determines (selects) the next action a_t based on the current state s_t and the current Q table 535 (the updated Q table 535 if it has been updated). Further, the decision making unit 540 notifies the simulator 800 of the parameters associated with the determined action a_t.

たとえば、意思決定部５４０は、典型的には、状態＃２において最も数値の高い行動（すなわち、価値の高い）を選択する。なお、上述したように、ε－ｇｒｅｅｄｙ法を用いて、行動ａの選択にランダム性を持たせる。すなわち、意思決定部５４０は、最も数値の高い行動ａを敢えて選択しない処理も実行する。 For example, the decision making unit 540 typically selects the action with the highest numerical value (ie, the highest value) in state #2. Note that, as described above, the ε-greedy method is used to impart randomness to the selection of action a. That is, the decision making unit 540 also executes a process in which the action a with the highest numerical value is intentionally not selected.

意思決定部５４０からの指示がなされると、シミュレーター８００の駆動部モデル８２１４は、指示されたパラメーターで搬送部モデル８２１２を駆動する。これにより、状態観測部５１０では、次の状態ｓ_ｔ＋１が観測される。さらに、行動ａ_ｔに基づく報酬ｒ_ｔ＋１が得られる。詳しくは、報酬付与部５２０によって、行動ａ_ｔに対する報酬ｒ_ｔ＋１の付与が行われる。 When an instruction is given from the decision making unit 540, the drive unit model 8214 of the simulator 800 drives the transport unit model 8212 using the instructed parameters. As a result, the state observation unit 510 observes the next state s_t+1. Furthermore, a reward r_t+1 based on the action a_t is obtained. Specifically, the reward giving unit 520 gives the reward r_t+1 for the action a_t.

以後、上述した報酬の付与と、Ｑテーブル５３５の更新と、シミュレーター８００に対するパラメーターの通知とが繰り返される。 Thereafter, the above-described awarding of rewards, updating of Q table 535, and notification of parameters to simulator 800 are repeated.

上記の構成によれば、実機である画像形成装置１のローラーを駆動するパラメータの値を最適化することができる。また、上述したように、学習装置５００は、画像形成装置１のファームウェアを更新するための更新用プログラムを画像形成装置１に対して送信する。したがって、好適な設定にて画像形成装置１の搬送系を動作させることができる。 According to the above configuration, the values of the parameters for driving the rollers of the image forming apparatus 1, which is an actual machine, can be optimized. Further, as described above, the learning device 500 transmits an update program for updating the firmware of the image forming apparatus 1 to the image forming apparatus 1. Therefore, the transport system of the image forming apparatus 1 can be operated with suitable settings.

＜Ｇ．用紙状態に基づく報酬付与の例＞ <G. Examples of giving rewards based on paper state>
（ｇ１．第１の例） (g1. First example)
図１２は、搬送路１５の幅Ｗが通常の箇所を用紙Ｐが通過している状態を表した模式図である。なお、「幅」とは、用紙Ｐの厚み方向の隙間を意味する。 FIG. 12 is a schematic diagram showing a state in which the paper P passes through a portion of the conveyance path 15 where the width W is normal. Note that "width" means the gap in the thickness direction of the paper P.

図１２を参照して、搬送装置３９は、ローラー対４０１と、ローラー対４０１の下流側の次の搬送手段であるローラー対４０２とを含む。ローラー対４０１は、駆動ローラー４０１１と、従動ローラー４０１２とを有する。ローラー対４０２は、駆動ローラー４０２１と、従動ローラー４０２２とを有する。用紙Ｐは、矢印の方向（下流の方向）に搬送される。 Referring to FIG. 12, the conveying device 39 includes a roller pair 401 and a roller pair 402 which is the next conveying means downstream of the roller pair 401. The roller pair 401 includes a driving roller 4011 and a driven roller 4012. The roller pair 402 includes a driving roller 4021 and a driven roller 4022. The paper P is conveyed in the direction of the arrow (downstream direction).

図１３は、用紙Ｐが撓んでいる状態を表した模式図である。 FIG. 13 is a schematic diagram showing a state in which the paper P is deflected.
図１３を参照して、ローラー対４０１とローラー対４０２との間で用紙Ｐが撓んでいる。たとえば、上流側のローラー対４０１の回転速度が下流側のローラー対４０２の回転速度よりも早い場合には、このように用紙Ｐが撓んだ状態となる。 Referring to FIG. 13, the paper P is deflected between the roller pair 401 and the roller pair 402. For example, when the rotational speed of the upstream roller pair 401 is faster than the rotational speed of the downstream roller pair 402, the paper P is deflected in this manner.

図１４は、用紙Ｐが引っ張られている状態を表した模式図である。 FIG. 14 is a schematic diagram showing a state in which the paper P is pulled.
図１４を参照して、ローラー対４０１とローラー対４０２との間で用紙Ｐが引っ張られている。たとえば、下流側のローラー対４０２の回転速度が上流側のローラー対４０１の回転速度よりも早い場合には、このように用紙Ｐが引っ張られた状態となる。 Referring to FIG. 14, the paper P is being pulled between the roller pair 401 and the roller pair 402. For example, when the rotational speed of the downstream roller pair 402 is faster than the rotational speed of the upstream roller pair 401, the paper P is pulled in this manner.

本例では、ローラー対４０１とローラー対４０２との間の区間において、所定の撓み量を許容する設定がなされている。この場合、報酬付与部５２０は、ローラー対４０１とローラー対４０２との間の区間における用紙状態量（撓み量または引っ張り量）が当該所定の撓み量以下の撓み量を表しているときに、正の報酬を付与する。 In this example, the section between the roller pair 401 and the roller pair 402 is set to allow a predetermined amount of deflection. In this case, the reward giving unit 520 gives a positive reward when the paper state quantity (amount of deflection or tension) in the section between the roller pair 401 and the roller pair 402 represents an amount of deflection equal to or less than the predetermined amount of deflection.

所定の撓み量は、本例では、用紙Ｐの厚み方向の搬送路１５の幅未満の値である。また、所定の撓み量は、撓み量が０よりも大きい所定の値である。 In this example, the predetermined amount of deflection is a value less than the width of the conveyance path 15 in the thickness direction of the paper P. Further, the predetermined amount of deflection is a predetermined value in which the amount of deflection is greater than zero.

なお、当該箇所における所定の撓み量を許容する設定は、報酬付与部５２０において予め登録されている。以下においても、用紙状態量（撓み量または引っ張り量）についての設定は、報酬付与部５２０において予め登録されているものとする。 Note that the setting for allowing a predetermined amount of deflection at the location is registered in advance in the reward giving unit 520. In the following, it is assumed that the settings for the paper state quantity (the amount of deflection or the amount of tension) are registered in advance in the reward giving unit 520.

（ｇ２．第２の例） (g2. Second example)
図１５は、幅が狭い箇所を用紙Ｐが通過している状態を表した模式図である。 FIG. 15 is a schematic diagram showing a state in which the paper P passes through a narrow portion.

図１５を参照して、搬送装置３９は、ローラー対４０３と、ローラー対４０３の下流側の次の搬送手段であるローラー対４０４とを含む。ローラー対４０３は、駆動ローラー４０３１と、従動ローラー４０３２とを有する。ローラー対４０４は、駆動ローラー４０４１と、従動ローラー４０４２とを有する。用紙Ｐは、矢印の方向（下流の方向）に搬送される。 Referring to FIG. 15, the conveying device 39 includes a pair of rollers 403 and a pair of rollers 404 that is the next conveying means downstream of the pair of rollers 403. The roller pair 403 includes a driving roller 4031 and a driven roller 4032. The roller pair 404 includes a driving roller 4041 and a driven roller 4042. The paper P is conveyed in the direction of the arrow (downstream direction).

搬送路１５の幅Ｗが狭いため、ローラー対４０３とローラー対４０４との間の区間において、用紙Ｐの撓みを許容しない設定がなされている。この場合、報酬付与部５２０は、ローラー対４０３とローラー対４０４との間の区間における用紙状態量が引っ張り量を表しているときに、正の報酬を付与する。 Since the width W of the conveyance path 15 is narrow, the section between the roller pair 403 and the roller pair 404 is set so that deflection of the paper P is not allowed. In this case, the reward giving unit 520 gives a positive reward when the paper state quantity in the section between the roller pair 403 and the roller pair 404 represents an amount of tension.
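The first and second examples can be expressed as one per-section rule. The sketch below is an assumption-laden illustration: the name section_reward, the string labels, and the reward magnitudes of +1.0 and -1.0 are hypothetical, and the text specifies only when a positive reward is given, so the negative branch is an assumption.

```python
def section_reward(kind, amount, allowed_deflection):
    """Reward for one section of the conveyance path.

    kind               : "deflection" or "tension" (hypothetical labels)
    amount             : magnitude of the paper state quantity
    allowed_deflection : permitted deflection for the section;
                         0 means deflection is not allowed
    """
    if allowed_deflection > 0:
        # First example: deflection up to the permitted amount is rewarded.
        ok = kind == "deflection" and amount <= allowed_deflection
    else:
        # Second example: only a pulled sheet is rewarded.
        ok = kind == "tension"
    return 1.0 if ok else -1.0  # assumed magnitudes
```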

（ｇ３．第３の例） (g3. Third example)
図１６は、画像形成装置１の給紙カセット１４から用紙Ｐが搬送路１５に供給されている状態を表した模式図である。 FIG. 16 is a schematic diagram showing a state in which paper P is being supplied to the conveyance path 15 from the paper feed cassette 14 of the image forming apparatus 1.

図１６を参照して、給紙ローラー１１３およびローラー対４０６は、複数の用紙Ｐを格納した給紙カセット１４から用紙Ｐを１つずつ搬送路１５に搬送する。 Referring to FIG. 16, paper feed roller 113 and roller pair 406 transport sheets P one by one from paper feed cassette 14 storing a plurality of sheets P to transport path 15.

報酬付与部５２０は、用紙Ｐの後端が給紙ローラー１１３に到達する前の位置における用紙状態量が引っ張り量を表しており、かつ用紙Ｐの後端が給紙ローラー１１３を通過する際に給紙ローラー１１３が停止している場合、正の報酬を付与する。 The reward giving unit 520 determines that the paper state amount at a position before the trailing edge of the paper P reaches the paper feed roller 113 represents the amount of tension, and that when the trailing edge of the paper P passes the paper feed roller 113 If the paper feed roller 113 is stopped, a positive reward is given.

（ｇ４．第４の例） (g4. Fourth example)
図１７は、画像形成装置１において印刷済みの用紙Ｐが排出トレイ２７１に排出されている状態を表した模式図である。 FIG. 17 is a schematic diagram showing a state in which printed paper P is discharged to the discharge tray 271 in the image forming apparatus 1.

図１７を参照して、搬送装置３９は、ローラー対４０７と、ローラー対４０７の下流側の次の搬送手段であるローラー対４０８とを含む。ローラー対４０７は、駆動ローラー４０７１と、従動ローラー４０７２とを有する。ローラー対４０８は、駆動ローラー４０８１と、従動ローラー４０８２とを有する。用紙Ｐは、矢印の方向（下流の方向）に搬送（排出）される。 Referring to FIG. 17, the conveying device 39 includes a pair of rollers 407 and a pair of rollers 408 that is the next conveying means downstream of the pair of rollers 407. The roller pair 407 includes a driving roller 4071 and a driven roller 4072. The roller pair 408 includes a driving roller 4081 and a driven roller 4082. The paper P is conveyed (discharged) in the direction of the arrow (downstream direction).

用紙Ｐを排出トレイ２７１に排出する場合には、ローラー対４０８が用紙Ｐを引っ張った状態で搬送することによりローラー対４０７を用紙Ｐが通過する時間を早くすることが可能である。したがって、この場合には、報酬付与部５２０は、ローラー対４０８が用紙Ｐを引っ張った状態で搬送しているときに、正の報酬を付与する。 When discharging the paper P to the discharge tray 271, the time taken for the paper P to pass through the roller pair 407 can be shortened by having the roller pair 408 convey the paper P while pulling it. Therefore, in this case, the reward giving unit 520 gives a positive reward when the roller pair 408 is conveying the paper P while pulling it.

（ｇ５．第５の例） (g5. Fifth example)
用紙Ｐは、レジストローラー対１１６から、駆動ローラー１０３および２次転写装置（２次転写ローラー）１１５のニップ領域に送られ、画像が転写される（図１３参照）。さらに、加熱ローラー１２１と加圧ローラー１２２とからなる定着装置１２０によって、画像を用紙Ｐに定着させる。 The paper P is sent from the registration roller pair 116 to the nip area of the drive roller 103 and the secondary transfer device (secondary transfer roller) 115, and the image is transferred thereto (see FIG. 13). Further, the image is fixed onto the paper P by a fixing device 120 including a heating roller 121 and a pressure roller 122.

レジストローラー対１１６と、駆動ローラー１０３および２次転写装置１１５との間では、用紙Ｐに撓みも引っ張りもないことが精度の高い画像形成を行うために必要である。 Between the registration roller pair 116, the drive roller 103, and the secondary transfer device 115, it is necessary that the paper P be neither bent nor stretched in order to form a highly accurate image.

そこで、報酬付与部５２０は、レジストローラー対１１６と２次転写装置１１５との間の区間における用紙状態量が引っ張り量および撓み量のいずれも表していないときに、正の報酬を付与する。 Therefore, the reward giving unit 520 gives a positive reward when the paper state amount in the section between the registration roller pair 116 and the secondary transfer device 115 does not represent either the amount of tension or the amount of deflection.

このように、報酬付与部５２０は、２つのローラー対（またはローラー）の間の区間において、用紙Ｐの撓みと、搬送方向への用紙Ｐへの力の発生とが許容されていない場合、上流側のローラーと下流側のローラー対（上流側のローラー対の次のローラー対）との間の区間における用紙状態量が引っ張り量および撓み量のいずれも表していないときに、正の報酬を付与する。 In this way, when neither deflection of the paper P nor the generation of force on the paper P in the conveyance direction is allowed in the section between two roller pairs (or rollers), the reward giving unit 520 gives a positive reward when the paper state quantity in the section between the upstream roller and the downstream roller pair (the roller pair next to the upstream roller pair) represents neither an amount of tension nor an amount of deflection.

（ｇ６．第６の例） (g6. Sixth example)
上流側のローラー対と下流側のローラー対（上流側のローラー対の次のローラー対）とで同時に用紙Ｐを搬送している場合、当該上流側のローラー対と下流側のローラー対との間の区間において、所定の撓み量を許容する設定がなされているとき、報酬付与部５２０は、上流側のローラー対の搬送速度が下流側のローラー対の搬送速度以上であることを条件に、正の報酬を付与する。 When the paper P is being conveyed simultaneously by an upstream roller pair and a downstream roller pair (the roller pair next to the upstream roller pair), and the section between the upstream roller pair and the downstream roller pair is set to allow a predetermined amount of deflection, the reward giving unit 520 gives a positive reward on the condition that the conveyance speed of the upstream roller pair is equal to or higher than the conveyance speed of the downstream roller pair.

（ｇ７．第７の例） (g7. Seventh example)
上流側のローラー対と下流側のローラー対（上流側のローラー対の次のローラー対）とで同時に用紙Ｐを搬送している場合、当該上流側のローラー対と下流側のローラー対との間の区間において、用紙Ｐの撓みを許容しない設定がなされている場合、報酬付与部５２０は、当該上流側のローラー対の搬送速度が当該下流側のローラー対の搬送速度以下であることを条件に、正の報酬を付与する。 When the paper P is being conveyed simultaneously by an upstream roller pair and a downstream roller pair (the roller pair next to the upstream roller pair), and the section between the upstream roller pair and the downstream roller pair is set so that deflection of the paper P is not allowed, the reward giving unit 520 gives a positive reward on the condition that the conveyance speed of the upstream roller pair is equal to or lower than the conveyance speed of the downstream roller pair.
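The speed conditions of the sixth and seventh examples mirror each other, which a short sketch makes visible. The function name speed_reward and the reward magnitudes are assumptions; as before, the text states only the condition for a positive reward, so the negative value is hypothetical.

```python
def speed_reward(upstream_speed, downstream_speed, deflection_allowed):
    """Reward based on the speeds of two adjacent roller pairs holding the same sheet.

    deflection_allowed : True if the section permits a predetermined deflection
    """
    if deflection_allowed:
        # Sixth example: upstream speed at or above the downstream speed.
        ok = upstream_speed >= downstream_speed
    else:
        # Seventh example: upstream speed at or below the downstream speed.
        ok = upstream_speed <= downstream_speed
    return 1.0 if ok else -1.0  # assumed magnitudes
```

Equal speeds satisfy both conditions, so they earn a positive reward under either setting.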

＜Ｇ．制御構造＞ <G. Control structure>
図１８は、Ｑ学習の処理の手順を表したフロー図である。 FIG. 18 is a flowchart showing the procedure of the Q-learning process.

図１８を参照して、ステップＳ１において、学習装置５００は、取得された状態ｓ_ｔと、現在のＱテーブル５３５とを参照して、次の行動ａ_ｔを決定する。具体的には、意思決定部５４０（図１１参照）が、Ｑテーブル内の数値に基づき行動ａ_ｔを選択し、選択された行動ａ_ｔに関連付けられたパラメーターをシミュレーター８００に通知する。 Referring to FIG. 18, in step S1, the learning device 500 refers to the acquired state s_t and the current Q table 535 to determine the next action a_t. Specifically, the decision making unit 540 (see FIG. 11) selects the action a_t based on the numerical values in the Q table, and notifies the simulator 800 of the parameters associated with the selected action a_t.

ステップＳ２において、シミュレーター８００は、ステップＳ１にて決定された行動ａ_ｔに基づき行動する。具体的には、シミュレーター８００は、意思決定部５４０から通知されたパラメーターにて駆動部モデル８２１４を駆動する。 In step S2, the simulator 800 acts based on the action a_t determined in step S1. Specifically, the simulator 800 drives the drive unit model 8214 using the parameters notified from the decision making unit 540.

ステップＳ３において、状態観測部５１０は、ステップＳ２で通知したパラメーターにて駆動部モデル８２１４を駆動させたときの状態ｓ_ｔ+1を、シミュレーター８００から取得する。 In step S3, the state observation unit 510 acquires, from the simulator 800, the state s_{t+1} that results when the drive unit model 8214 is driven using the parameters notified in step S2.

ステップＳ４において、報酬付与部５２０は、撓み量および引っ張り量を表した用紙状態量に基づき、選択された行動ａに対して報酬ｒ_ｔ+1を付与する。ステップＳ５において、学習部５３０は、付与された報酬ｒ_ｔ+1に基づき、Ｑテーブル５３５を更新する。詳しくは、学習部５３０は、選択された行動ａ_ｔの価値を、報酬ｒ_ｔ+1を付与することにより更新する。 In step S4, the reward giving unit 520 gives a reward r_{t+1} to the selected action a based on the paper state quantity representing the amount of deflection and the amount of tension. In step S5, the learning unit 530 updates the Q table 535 based on the given reward r_{t+1}. Specifically, the learning unit 530 updates the value of the selected action a_t by applying the reward r_{t+1}.
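Steps S1 to S5 can be sketched as a tabular Q-learning loop. This is only a sketch under stated assumptions: `ToySimulator` is a hypothetical stand-in for the simulator 800, the epsilon-greedy selection rule and the values of `ALPHA`, `GAMMA`, and `EPSILON` are assumptions (the text says only that the action is selected based on the values in the Q table), and the reward function is a placeholder.

```python
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate (assumed value)
GAMMA = 0.9    # discount factor (assumed value)
EPSILON = 0.1  # exploration rate (assumed value)


class ToySimulator:
    """Hypothetical stand-in for simulator 800: one section, one speed offset."""

    def __init__(self):
        self.state = 0  # discretised paper state quantity (0 = neutral)

    def step(self, action):
        # The action is a speed offset in {-1, 0, +1}; the state drifts with it.
        self.state = max(-2, min(2, self.state + action))
        return self.state


def choose_action(q_table, state, actions, rng):
    """S1: epsilon-greedy choice from the current Q table (assumed policy)."""
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])


def reward_for(state):
    """S4: placeholder reward, +1 for slight deflection (state 1), else -1."""
    return 1 if state == 1 else -1


def q_learning(episodes=200, steps=20, seed=0):
    rng = random.Random(seed)
    actions = [-1, 0, 1]
    q_table = defaultdict(float)
    for _ in range(episodes):
        sim = ToySimulator()
        s_t = sim.state
        for _ in range(steps):
            a_t = choose_action(q_table, s_t, actions, rng)  # S1
            s_next = sim.step(a_t)                           # S2 and S3
            r_next = reward_for(s_next)                      # S4
            best_next = max(q_table[(s_next, a)] for a in actions)
            # S5: standard tabular Q-learning update of the chosen entry.
            td_error = r_next + GAMMA * best_next - q_table[(s_t, a_t)]
            q_table[(s_t, a_t)] += ALPHA * td_error
            s_t = s_next
    return q_table
```

In this toy setting the learned table should prefer the action that moves the neutral state toward the rewarded slight-deflection state; a real Q table 535 would be indexed by the actual state quantities and motor parameter sets.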

図１９は、撓み量に関する報酬の付与例を説明するためのフロー図である。 FIG. 19 is a flowchart for explaining an example of giving a reward related to the amount of deflection.
ステップＳ１１において、学習装置５００において、区間毎における撓み量の目標値を事前に設定しておく。ステップＳ１２において、学習装置５００は、シミュレーター８００から取得した状態ｓに基づき、撓み量を計測する。なお、撓み量自体がシミュレーター８００から送信される構成であってもよい。 In step S11, a target value of the amount of deflection for each section is set in advance in the learning device 500. In step S12, the learning device 500 measures the amount of deflection based on the state s acquired from the simulator 800. Note that the amount of deflection itself may instead be transmitted from the simulator 800.

ステップＳ１３において、学習装置５００は、目標値と計測値とを比較する。計測値がゼロ以下の場合、ステップＳ１４において、選択された行動に対して、報酬付与部５２０は、一例として“－１”の報酬を付与する。計測値がゼロよりも大きく、かつ目標値以下の場合、ステップＳ１５において、選択された行動に対して、報酬付与部５２０は、一例として“＋１”の報酬を付与する。計測値が目標値よりも大きい場合、ステップＳ１６において、選択された行動に対して、報酬付与部５２０は、一例として“－１”の報酬を付与する。 In step S13, the learning device 500 compares the target value and the measured value. If the measured value is less than or equal to zero, in step S14, the reward giving unit 520 gives, for example, a reward of "-1" to the selected action. If the measured value is greater than zero and less than or equal to the target value, in step S15, the reward giving unit 520 gives, for example, a reward of "+1" to the selected action. If the measured value is larger than the target value, in step S16, the reward giving unit 520 gives, for example, a reward of "-1" to the selected action.
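The three branches of steps S14 to S16 can be written as a short function. A minimal sketch: the function and argument names are hypothetical, and the reward values are the example values "+1" and "-1" given in the text.

```python
def deflection_reward(measured, target):
    """Reward for a measured deflection amount against a per-section target."""
    if measured <= 0:
        return -1  # S14: no deflection at all
    if measured <= target:
        return 1   # S15: deflection within (0, target]
    return -1      # S16: deflection exceeds the target
```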

図２０は、許容できる撓み量を区間毎に判断するための処理を説明するためのフロー図である。なお、説明を簡略化するため、区間が３つである場合を例に挙げて説明する。 FIG. 20 is a flow diagram for explaining processing for determining the allowable amount of deflection for each section. Note that to simplify the explanation, an example will be described in which there are three sections.

図２０を参照して、ステップＳ２１において、事前に区間を設定しておく。ステップＳ２２において、学習装置５００は、用紙Ｐに撓みが発生している場合、撓みが発生している区間を状態ｓに基づき判定する。 Referring to FIG. 20, in step S21, the sections are set in advance. In step S22, if deflection has occurred in the paper P, the learning device 500 determines, based on the state s, the section in which the deflection has occurred.

撓みが発生している区間が区間Ａである場合、ステップＳ２３において、撓みの許容量を３ｍｍに設定する。学習装置５００は、区間Ａでは、撓み量と許容量（３ｍｍ）とを比較することにより、報酬の付与を行う。同様に、撓みが発生している区間が区間Ｂである場合、ステップＳ２４において、撓みの許容量を５ｍｍに設定する。学習装置５００は、区間Ｂでは、撓み量と許容量（５ｍｍ）とを比較することにより、報酬の付与を行う。また、撓みが発生している区間が区間Ｃである場合、ステップＳ２５において、撓みの許容量を２ｍｍに設定する。学習装置５００は、区間Ｃでは、撓み量と許容量（２ｍｍ）とを比較することにより、報酬の付与を行う。 When the section where the deflection occurs is section A, the allowable amount of deflection is set to 3 mm in step S23. In section A, the learning device 500 provides a reward by comparing the amount of deflection and the allowable amount (3 mm). Similarly, if the section where the deflection occurs is section B, the allowable amount of deflection is set to 5 mm in step S24. In section B, the learning device 500 provides a reward by comparing the amount of deflection and the allowable amount (5 mm). Further, when the section where the deflection occurs is section C, the allowable amount of deflection is set to 2 mm in step S25. In section C, the learning device 500 provides a reward by comparing the amount of deflection and the allowable amount (2 mm).
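The per-section allowances of steps S23 to S25 amount to a lookup followed by a comparison. The sketch below uses the 3 mm, 5 mm, and 2 mm values from the text; the names are hypothetical, and reusing the "+1"/"-1" scheme of FIG. 19 for the comparison result is an assumption, since the text says only that the reward is given by comparing the deflection with the allowance.

```python
# Allowable deflection per section in millimetres (values from the text).
ALLOWANCE_MM = {"A": 3.0, "B": 5.0, "C": 2.0}


def section_reward(section, deflection_mm):
    """Compare the deflection in a section with that section's allowance."""
    allowance = ALLOWANCE_MM[section]
    # Assumed scoring: +1 inside (0, allowance], -1 otherwise.
    return 1 if 0 < deflection_mm <= allowance else -1
```

The same 4 mm deflection is therefore rewarded in section B but penalized in sections A and C.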

図２１は、引っ張り量に関する報酬の付与例を説明するためのフロー図である。 FIG. 21 is a flowchart for explaining an example of giving a reward related to the amount of tension.
ステップＳ３１において、学習装置５００において、区間毎における引っ張り量の目標値を事前に設定しておく。ステップＳ３２において、学習装置５００は、シミュレーター８００から取得した状態ｓに基づき、引っ張り量を計測する。なお、引っ張り量自体がシミュレーター８００から送信される構成であってもよい。 In step S31, a target value of the amount of tension for each section is set in advance in the learning device 500. In step S32, the learning device 500 measures the amount of tension based on the state s acquired from the simulator 800. Note that the amount of tension itself may instead be transmitted from the simulator 800.

ステップＳ３３において、学習装置５００は、目標値と計測値とを比較する。計測値が目標値よりも小さい場合、ステップＳ３４において、選択された行動に対して、報酬付与部５２０は、一例として“－１”の報酬を付与する。計測値がゼロよりも小さく、かつ目標値以上である場合、ステップＳ３５において、選択された行動に対して、報酬付与部５２０は、一例として“＋１”の報酬を付与する。計測値がゼロ以上の場合、ステップＳ１６において、選択された行動に対して、報酬付与部５２０は、一例として“－１”の報酬を付与する。 In step S33, the learning device 500 compares the target value and the measured value. If the measured value is smaller than the target value, in step S34, the reward giving unit 520 gives, for example, a reward of "-1" to the selected action. If the measured value is smaller than zero and greater than or equal to the target value, in step S35, the reward giving unit 520 gives, for example, a reward of "+1" to the selected action. If the measured value is zero or more, in step S16, the reward giving unit 520 gives, for example, a reward of "-1" to the selected action.
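The tension branches mirror the deflection case with the signs reversed. The sketch below assumes, based on the thresholds in the text, that tension is expressed as a negative state quantity and that the target is a negative number; the names are hypothetical and the reward values are the example values from the text.

```python
def tension_reward(measured, target):
    """Reward for a measured tension amount (assumed negative) against a target."""
    if measured < target:
        return -1  # pulled harder than the target allows
    if measured < 0:
        return 1   # target <= measured < 0: tension within the target
    return -1      # measured >= 0: no tension
```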

＜Ｉ．変形例＞ <I. Modifications>
（ｉ１．物性を考慮した学習） (i1. Learning that takes physical properties into account)
学習部５３０が、報酬と用紙Ｐの物性とに基づき、各モーターのパラメーターの値を更新してもよい。すなわち、用紙Ｐの物性をさらに考慮して機械学習を行うように、学習装置５００を構成してもよい。この場合、Ｑテーブル５３５を物性を考慮したテーブルとして構成すればよい。物性としては、たとえば、剛度、坪量が挙げられる。物性を考慮することにより、より最適なパラメーターの設定が可能となる。 The learning unit 530 may update the parameter values of each motor based on the reward and the physical properties of the paper P. That is, the learning device 500 may be configured to perform machine learning that further takes the physical properties of the paper P into account. In this case, the Q table 535 may be configured as a table that takes physical properties into account. Examples of physical properties include stiffness and basis weight. Taking physical properties into account makes it possible to set parameters better suited to the paper.

物性の一例として剛度を考慮する場合について説明すると、以下のとおりである。 The case in which stiffness is considered as an example of a physical property is as follows.
上流側のローラー対と下流側のローラー対（上流側のローラー対の次のローラー対）とで同時に用紙Ｐを搬送している場合、報酬付与部５２０は、用紙Ｐの剛度が所定値以上であり、かつ、上流側のローラー対の搬送速度と下流側のローラー対の搬送速度とが同じであることを条件に、正の報酬を付与する。 When the paper P is conveyed simultaneously by an upstream roller pair and a downstream roller pair (the roller pair next to the upstream roller pair), the reward giving unit 520 gives a positive reward on the condition that the stiffness of the paper P is equal to or greater than a predetermined value and the conveyance speed of the upstream roller pair is the same as the conveyance speed of the downstream roller pair.

また、上流側のローラー対と下流側のローラー対（上流側のローラー対の次のローラー対）とで同時に前記搬送対象物を搬送しており、かつ上流側のローラー対と下流側のローラー対との間の区間において、所定の撓み量を許容する設定がなされている場合、報酬付与部５２０は、用紙Ｐの剛度が所定値未満であり、上流側のローラー対と下流側のローラー対との間の区間における用紙状態量が所定の撓み量以下の撓み量を表しているときに、正の報酬を付与する。 Further, when the object to be conveyed is conveyed simultaneously by an upstream roller pair and a downstream roller pair (the roller pair next to the upstream roller pair), and a setting that allows a predetermined amount of deflection is made for the section between them, the reward giving unit 520 gives a positive reward when the stiffness of the paper P is less than the predetermined value and the paper state quantity in the section between the upstream roller pair and the downstream roller pair represents an amount of deflection equal to or less than the predetermined amount of deflection.

物性の一例として坪量を考慮する場合について説明すると、以下のとおりである。 The case in which basis weight is considered as an example of a physical property is as follows.
上流側のローラー対と下流側のローラー対（上流側のローラー対の次のローラー対）とで同時に用紙Ｐを搬送している場合、報酬付与部５２０は、用紙Ｐの坪量が所定値以上であり、かつ、上流側のローラー対の搬送速度と下流側のローラー対の搬送速度とが同じであることを条件に、正の報酬を付与する。 When the paper P is conveyed simultaneously by an upstream roller pair and a downstream roller pair (the roller pair next to the upstream roller pair), the reward giving unit 520 gives a positive reward on the condition that the basis weight of the paper P is equal to or greater than a predetermined value and the conveyance speed of the upstream roller pair is the same as the conveyance speed of the downstream roller pair.

また、上流側のローラー対と下流側のローラー対（上流側のローラー対の次のローラー対）とで同時に前記搬送対象物を搬送しており、かつ上流側のローラー対と下流側のローラー対との間の区間において、所定の撓み量を許容する設定がなされている場合、報酬付与部５２０は、用紙Ｐの坪量が所定値未満であり、上流側のローラー対と下流側のローラー対との間の区間における用紙状態量が上記所定の撓み量以下の撓み量を表しているときに、正の報酬を付与する。 Further, when the object to be conveyed is conveyed simultaneously by an upstream roller pair and a downstream roller pair (the roller pair next to the upstream roller pair), and a setting that allows a predetermined amount of deflection is made for the section between them, the reward giving unit 520 gives a positive reward when the basis weight of the paper P is less than the predetermined value and the paper state quantity in the section between the upstream roller pair and the downstream roller pair represents an amount of deflection equal to or less than the predetermined amount of deflection.
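The stiffness and basis-weight rules share one shape, so they can be sketched with a single function that takes the property value and its threshold. This is an illustration only: all names, the speed tolerance, the requirement that a deflection amount be greater than zero, and the value 0 for "no positive reward" are assumptions.

```python
import math


def property_aware_reward(property_value, threshold,
                          upstream_speed, downstream_speed,
                          deflection_mm, allowed_deflection_mm=None,
                          speed_tol=1e-9):
    """Reward that depends on a paper property (stiffness or basis weight).

    allowed_deflection_mm is None when no deflection allowance is set.
    """
    if property_value >= threshold:
        # Stiff or heavy paper: both roller pairs must run at the same speed.
        same = math.isclose(upstream_speed, downstream_speed, abs_tol=speed_tol)
        return 1 if same else 0
    # Property below threshold: reward deflection within the set allowance.
    if allowed_deflection_mm is not None and 0 < deflection_mm <= allowed_deflection_mm:
        return 1
    return 0
```

Passing the stiffness with its threshold, or the basis weight with its threshold, selects which of the two variants is evaluated.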

（ｉ２．再学習） (i2. Relearning)
学習後において、ユーザーが実機である画像形成装置１を利用しているときに、いずれかのローラー対の用紙Ｐの搬送速度が、前回の学習時によって設定された搬送速度と異なった場合に、機械学習を再度実行するように、学習装置５００を構成することが好ましい。 It is preferable to configure the learning device 500 so that, after learning, machine learning is executed again if the conveyance speed of the paper P at any roller pair differs from the conveyance speed set by the previous learning while the user is using the actual image forming apparatus 1.

たとえば、ローラー対の用紙Ｐの搬送速度が、前回の学習時によって設定された搬送速度から基準値以上、上回ったり、あるいは下回った場合に、機械学習を再度実行するように、学習装置５００を構成することが好ましい。 For example, it is preferable to configure the learning device 500 so that machine learning is executed again when the conveyance speed of the paper P at a roller pair is above or below the conveyance speed set by the previous learning by a reference value or more.

あるいは、ローラー対の用紙Ｐの搬送速度が、前回の学習時によって設定された搬送速度から基準割合以上、早くなったり、あるいは遅くなったりした場合に、機械学習を再度実行するように、学習装置５００を構成することが好ましい。 Alternatively, it is preferable to configure the learning device 500 so that machine learning is executed again when the conveyance speed of the paper P at a roller pair has become faster or slower than the conveyance speed set by the previous learning by a reference ratio or more.
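The three relearning triggers described above (any difference, a difference of at least a reference value, or a difference of at least a reference ratio) can be sketched as one check. The function name, argument names, and the way the variants are combined into one function are assumptions made for illustration.

```python
def needs_relearning(current_speed, learned_speed,
                     abs_threshold=None, ratio_threshold=None):
    """Decide whether the conveyance speed has drifted enough to relearn."""
    diff = abs(current_speed - learned_speed)
    if abs_threshold is None and ratio_threshold is None:
        # Basic variant: any difference from the learned speed triggers relearning.
        return diff > 0
    if abs_threshold is not None and diff >= abs_threshold:
        # Reference-value variant: drift of at least abs_threshold, either direction.
        return True
    if ratio_threshold is not None and learned_speed != 0:
        # Reference-ratio variant: relative drift of at least ratio_threshold.
        return diff / abs(learned_speed) >= ratio_threshold
    return False
```

For example, with a learned speed of 300 and a reference ratio of 0.05, a measured speed of 280 would trigger relearning while 297 would not.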

（ｉ３．画像形成装置１内での機械学習） (i3. Machine learning within the image forming apparatus 1)
図２２は、画像形成装置１内で強化学習を実施する構成を説明するための模式図である。 FIG. 22 is a schematic diagram for explaining a configuration in which reinforcement learning is carried out within the image forming apparatus 1.

図２２を参照して、画像形成装置１は、エミュレーター７００と、シミュレーター８００と、学習装置５００とを備える。このような構成によれば、クラウド９００上の情報処理装置（具体的には、サーバー）によって、強化学習を行う必要がなくなる。 Referring to FIG. 22, image forming apparatus 1 includes an emulator 700, a simulator 800, and a learning device 500. According to such a configuration, there is no need for the information processing device (specifically, the server) on the cloud 900 to perform reinforcement learning.

なお、画像形成装置ではなく、搬送装置で上記の強化学習を行ってもよい。すなわち、画像形成の機能を有するか否かに関わらず、搬送対象物を搬送する装置内で強化学習を行ってもよい。 Note that the above-mentioned reinforcement learning may be performed in the conveyance device instead of the image forming device. That is, reinforcement learning may be performed within a device that transports an object, regardless of whether it has an image forming function or not.

また、上述した学習方法をプログラムによって提供することもできる。情報処理装置が当該プログラムを実行することにより、学習装置５００として機能する。 Moreover, the learning method described above can also be provided by a program. The information processing device functions as the learning device 500 by executing the program.

今回開示された実施の形態はすべての点で例示であって制限的なものではないと考えられるべきである。本発明の範囲は、上記した説明ではなく、特許請求の範囲によって示され、特許請求の範囲と均等の意味および範囲内でのすべての変更が含まれることが意図される。 The embodiments disclosed this time should be considered to be illustrative in all respects and not restrictive. The scope of the present invention is indicated by the claims rather than the above description, and it is intended that all changes within the meaning and range equivalent to the claims are included.

１画像形成装置、１０本体部、１１画像形成ユニット、１２スキャナーユニット、１３自動原稿搬送ユニット、１４給紙カセット、１５搬送路、１６メディアセンサー、１７反転搬送路、２０後処理装置、３０バス、３１コントローラー、３２固定記憶装置、３４操作パネル、３５プリンタコントローラー、３９，３９Ａ搬送装置、１０１中間転写ベルト、１０２テンションローラー、１０３，４０１１，４０２１，４０３１，４０４１，４０７１，４０８１駆動ローラー、１０４Ｃ，１０４Ｋ，１０４Ｍ，１０４Ｙ画像形成部、１０５画像濃度センサー、１１１１次転写装置、１１５２次転写装置、１１３給紙ローラー、１１６レジストローラー対、１２０定着装置、１２１加熱ローラー、１２２加圧ローラー、１４２底上げ板、１４３センサー、２２０パンチ処理装置、２５０平綴じ処理部、２６０中綴じ処理部、２７１，２７２，２７３排出トレイ、３６１送信部、３６２受信部、３９１搬送ユニット、３９８搬送部、３９９駆動部、４０１，４０２，４０３，４０４，４０６，４０７，４０８，４３０，４４０ローラー対、５００学習装置、５１０状態観測部、５１５状態量取得部、５２０報酬付与部、５３０学習部、５３５Ｑテーブル、５４０意思決定部、５５０更新用プログラム作成部、５６０更新用プログラム送信部、５８２ＲＡＭ、５８３ＲＯＭ、５８５ディスプレイ、５８６操作キー、５８８電源回路、７００エミュレーター、８００シミュレーター、８０５画像形成装置モデル、８１０コントローラーモデル、８２０搬送装置モデル、８２１搬送ユニットモデル、８２５センサーモデル、９００クラウド、１０００学習システム、４０１２，４０２２，４０３２，４０４２，４０７２，４０８２従動ローラー、８２１２搬送部モデル、８２１４駆動部モデル、Ｐ用紙、Ｗ幅。 1 image forming apparatus, 10 main unit, 11 image forming unit, 12 scanner unit, 13 automatic document transport unit, 14 paper feed cassette, 15 transport path, 16 media sensor, 17 reversing transport path, 20 post-processing device, 30 bus, 31 controller, 32 fixed storage device, 34 operation panel, 35 printer controller, 39, 39A conveyance device, 101 intermediate transfer belt, 102 tension roller, 103, 4011, 4021, 4031, 4041, 4071, 4081 drive roller, 104C, 104K, 104M, 104Y image forming section, 105 image density sensor, 111 primary transfer device, 115 secondary transfer device, 113 paper feed roller, 116 registration roller pair, 120 fixing device, 121 heating roller, 122 pressure roller, 142 bottom raiser board, 143 sensor, 220 punch processing device, 250 side stitching processing section, 260 saddle stitching processing section, 271, 272, 273 discharge tray, 361 transmitting section, 362 receiving section, 391 transport unit, 398 transport section, 399 drive section, 401, 402, 403, 404, 406, 407, 408, 430, 440 roller pair, 500 learning device, 510 state observation unit, 515 state quantity acquisition unit, 520 reward giving unit, 530 learning unit, 535 Q table, 540 decision making unit, 550 update program creation unit, 560 update program transmission unit, 582 RAM, 583 ROM, 585 display, 586 operation keys, 588 power supply circuit, 700 emulator, 800 simulator, 805 image forming device model, 810 controller model, 820 transport device model, 821 transport unit model, 825 sensor model, 900 cloud, 1000 learning system, 4012, 4022, 4032, 4042, 4072, 4082 driven roller, 8212 transport unit model, 8214 drive unit model, P paper, W width.

Claims

A state quantity acquisition means for acquiring a state quantity representing the amount of deflection or the amount of tension of a conveyance object in a plurality of sections of a conveyance path of a conveyance device, wherein the conveyance device conveys the conveyance object from upstream to downstream of the conveyance path by sequentially nipping it with a plurality of conveyance means;
Reward granting means for granting a reward based on the state quantity;
Learning means that performs machine learning to update an action value function that represents the value of a set of parameters of each driving means that drives each of the transport means, based on the reward;
further comprising means for determining one of the plurality of sets based on the updated action value function and for instructing the driving means to drive the conveyance means with the parameters of the determined set;
The state quantity acquisition means acquires the state quantity based on the conveyance speed of the conveyance target object by the conveyance means or the position of the conveyance target object in the conveyance path,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning device in which, when a setting that allows a predetermined amount of deflection is made for the section between the first conveyance means and the second conveyance means among the plurality of sections, the reward granting means grants a positive reward when the state quantity in the section between the first conveyance means and the second conveyance means represents an amount of deflection equal to or less than the predetermined amount of deflection.

A state quantity acquisition means for acquiring a state quantity representing the amount of deflection or the amount of tension of a conveyance object in a plurality of sections of a conveyance path of a conveyance device, wherein the conveyance device conveys the conveyance object from upstream to downstream of the conveyance path by sequentially nipping it with a plurality of conveyance means;
Reward granting means for granting a reward based on the state quantity;
Learning means that performs machine learning to update an action value function representing the value of a set of parameters of each driving means that drives each of the conveying means for each set based on the reward;
further comprising means for determining one of the plurality of sets based on the updated action value function and for instructing the driving means to drive the conveyance means with the parameters of the determined set;
The state quantity acquisition means acquires the state quantity based on the conveyance speed of the conveyance target object by the conveyance means or the position of the conveyance target object in the conveyance path,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning device in which, when a setting that does not allow deflection of the conveyance object is made for the section between the first conveyance means and the second conveyance means among the plurality of sections, the reward granting means grants a positive reward when the state quantity in the section between the first conveyance means and the second conveyance means represents the amount of tension.

A state quantity acquisition means for acquiring a state quantity representing the amount of deflection or the amount of tension of a conveyance object in a plurality of sections of a conveyance path of a conveyance device, wherein the conveyance device conveys the conveyance object from upstream to downstream of the conveyance path by sequentially nipping it with a plurality of conveyance means;
Reward granting means for granting a reward based on the state quantity;
Learning means that performs machine learning to update an action value function that represents the value of a set of parameters of each driving means that drives each of the transport means, based on the reward;
further comprising means for determining one of the plurality of sets based on the updated action value function and for instructing the driving means to drive the conveyance means with the parameters of the determined set;
The state quantity acquisition means acquires the state quantity based on the conveyance speed of the conveyance target object by the conveyance means or the position of the conveyance target object in the conveyance path,
The reward granting means grants the reward based on the state quantity and the state of the transport means,
A predetermined transport means among the plurality of transport means transports the transport objects one by one from a storage means storing a plurality of transport objects to the transport path,
A machine learning device in which the reward granting means grants a positive reward when the state quantity, among the plurality of sections, at a position before the trailing end of the conveyance object reaches the predetermined conveyance means represents the amount of tension, and the predetermined conveyance means is stopped when the trailing end of the conveyance object passes through the predetermined conveyance means.

A state quantity acquisition means for acquiring a state quantity representing the amount of deflection or the amount of tension of a conveyance object in a plurality of sections of a conveyance path of a conveyance device, wherein the conveyance device conveys the conveyance object from upstream to downstream of the conveyance path by sequentially nipping it with a plurality of conveyance means;
Reward granting means for granting a reward based on the state quantity;
Learning means that performs machine learning to update an action value function that represents the value of a set of parameters of each driving means that drives each of the transport means, based on the reward;
further comprising means for determining one of the plurality of sets based on the updated action value function and for instructing the driving means to drive the conveyance means with the parameters of the determined set;
The state quantity acquisition means acquires the state quantity based on the conveyance speed of the conveyance target object by the conveyance means or the position of the conveyance target object in the conveyance path,
The reward granting means grants the reward based on the state quantity and the state of the transport means,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning device in which, when neither deflection of the conveyance object nor the generation of force on the conveyance object in the conveyance direction at the second conveyance means is permitted in the section between the first conveyance means and the second conveyance means among the plurality of sections, the reward granting means grants a positive reward when the state quantity in the section between the first conveyance means and the second conveyance means represents neither the amount of tension nor the amount of deflection.

A state quantity acquisition means for acquiring a state quantity representing the amount of deflection or the amount of tension of a conveyance object in a plurality of sections of a conveyance path of a conveyance device, wherein the conveyance device conveys the conveyance object from upstream to downstream of the conveyance path by sequentially nipping it with a plurality of conveyance means;
Reward granting means for granting a reward based on the state quantity;
Learning means that performs machine learning to update an action value function representing the value of a set of parameters of each driving means that drives each of the conveying means for each set based on the reward;
further comprising means for determining one of the plurality of sets based on the updated action value function and for instructing the driving means to drive the conveyance means with the parameters of the determined set;
The state quantity acquisition means acquires the state quantity based on the conveyance speed of the conveyance target object by the conveyance means or the position of the conveyance target object in the conveyance path,
The reward granting means grants the reward based on the state quantity and the state of the transport means,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning device in which, when the time taken for the conveyance object to pass through the first conveyance means can be shortened by having the second conveyance means convey the conveyance object in a pulled state, the reward granting means grants a positive reward when the second conveyance means conveys the conveyance object in a pulled state.

A state quantity acquisition means for acquiring a state quantity representing the amount of deflection or the amount of tension of a conveyance object in a plurality of sections of a conveyance path of a conveyance device, wherein the conveyance device conveys the conveyance object from upstream to downstream of the conveyance path by sequentially nipping it with a plurality of conveyance means;
Reward granting means for granting a reward based on the state quantity;
Learning means that performs machine learning to update an action value function that represents the value of a set of parameters of each driving means that drives each of the transport means, based on the reward;
further comprising means for determining one of the plurality of sets based on the updated action value function and for instructing the driving means to drive the conveyance means with the parameters of the determined set;
The state quantity acquisition means acquires the state quantity based on the conveyance speed of the conveyance target object by the conveyance means or the position of the conveyance target object in the conveyance path,
The reward giving means gives the reward based on the state quantity and the state of the transport means,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning device in which, when the first conveyance means and the second conveyance means are conveying the conveyance object at the same time and a setting that allows a predetermined amount of deflection is made for the section between the first conveyance means and the second conveyance means among the plurality of sections, the reward granting means grants a positive reward on the condition that the conveyance speed of the first conveyance means is equal to or higher than the conveyance speed of the second conveyance means.

A state quantity acquisition means for acquiring a state quantity representing the amount of deflection or the amount of tension of a conveyance object in a plurality of sections of a conveyance path of a conveyance device, wherein the conveyance device conveys the conveyance object from upstream to downstream of the conveyance path by sequentially nipping it with a plurality of conveyance means;
Reward granting means for granting a reward based on the state quantity;
Learning means that performs machine learning to update an action value function that represents the value of a set of parameters of each driving means that drives each of the transport means, based on the reward;
further comprising means for determining one of the plurality of sets based on the updated action value function and for instructing the driving means to drive the conveyance means with the parameters of the determined set;
The state quantity acquisition means acquires the state quantity based on the conveyance speed of the conveyance target object by the conveyance means or the position of the conveyance target object in the conveyance path,
The reward giving means gives the reward based on the state quantity and the state of the transport means,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning device in which, when the first conveyance means and the second conveyance means are conveying the conveyance object at the same time and a setting that does not allow deflection of the conveyance object is made for the section between the first conveyance means and the second conveyance means among the plurality of sections, the reward granting means grants a positive reward on the condition that the conveyance speed of the first conveyance means is equal to or lower than the conveyance speed of the second conveyance means.

A machine learning device,
comprising a state quantity acquisition means for acquiring a state quantity representing the amount of deflection or the amount of tension of a conveyance object in a plurality of sections of a conveyance path of a conveyance device, wherein the conveyance device conveys the conveyance object from upstream to downstream of the conveyance path by sequentially nipping it with a plurality of conveyance means;
Reward granting means for granting a reward based on the state quantity;
Learning means that performs machine learning to update an action value function that represents the value of a set of parameters of each driving means that drives each of the transport means, based on the reward;
further comprising means for determining one of the plurality of sets based on the updated action value function and for instructing the driving means to drive the conveyance means with the parameters of the determined set;
The machine learning device communicates with a simulator that simulates the transport device,
The state quantity acquisition means acquires the state quantity based on the output from the simulator,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
wherein, when a setting that allows a predetermined amount of deflection is made for the section between the first conveyance means and the second conveyance means among the plurality of sections, the reward granting means grants a positive reward when the state quantity in the section between the first conveyance means and the second conveyance means represents an amount of deflection equal to or less than the predetermined amount of deflection.

A machine learning device,
comprising a state quantity acquisition means for acquiring a state quantity representing the amount of deflection or the amount of tension of a conveyance object in a plurality of sections of a conveyance path of a conveyance device, wherein the conveyance device conveys the conveyance object from upstream to downstream of the conveyance path by sequentially nipping it with a plurality of conveyance means;
Reward granting means for granting a reward based on the state quantity;
Learning means that performs machine learning to update an action value function that represents the value of a set of parameters of each driving means that drives each of the transport means, based on the reward;
further comprising means for determining one of the plurality of sets based on the updated action value function and for instructing the driving means to drive the conveyance means with the parameters of the determined set;
The machine learning device communicates with a simulator that simulates the transport device,
The state quantity acquisition means acquires the state quantity based on the output from the simulator,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
wherein, when a setting is made in which deflection of the conveyed object is not allowed in the section between the first conveyance means and the second conveyance means among the plurality of sections, the reward granting means grants a positive reward when the state quantity in the section between the first conveyance means and the second conveyance means represents the amount of tension.
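
The two section settings recited in this claim and the preceding one can be sketched as a small reward function. This is only an illustration under stated assumptions: the signed encoding of the state quantity (positive for deflection, negative for tension), the `SectionSetting` type, and the reward values are hypothetical and are not taken from the claims, which specify only when the reward is positive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SectionSetting:
    """Hypothetical per-section setting.

    max_deflection is None when deflection is not allowed, otherwise it is
    the predetermined (allowed) amount of deflection.
    """
    max_deflection: Optional[float]


def section_reward(state: float, setting: SectionSetting) -> float:
    """Assumed encoding: state > 0 is a deflection amount, state < 0 is tension."""
    if setting.max_deflection is None:
        # Deflection not allowed: positive reward only when the state is tension.
        return 1.0 if state < 0 else 0.0
    # Deflection allowed: positive reward when it stays within the allowed amount.
    return 1.0 if 0 < state <= setting.max_deflection else 0.0
```

For example, `section_reward(2.0, SectionSetting(3.0))` is positive, while `section_reward(2.0, SectionSetting(None))` is not.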

A machine learning device comprising:
state quantity acquisition means for acquiring a state quantity representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveyance path of a conveyance device, the conveyance device sequentially nipping the conveyed object with a plurality of conveyance means and conveying the conveyed object from upstream to downstream of the conveyance path;
reward granting means for granting a reward based on the state quantity;
learning means for performing machine learning that updates, based on the reward, an action value function representing, for each set, the value of a set of parameters of the driving means that drive the respective conveyance means; and
determining means for determining one set from among a plurality of the sets based on the updated action value function and instructing the driving means to drive the conveyance means with the parameters of the determined set,
The machine learning device communicates with a simulator that simulates the transport device,
The state quantity acquisition means acquires the state quantity based on the output from the simulator,
The reward granting means grants the reward based on the state quantity and the state of the transport means,
A predetermined transport means among the plurality of transport means transports the transport objects one by one from a storage means storing a plurality of transport objects to the transport path,
wherein the reward granting means grants a positive reward when the state quantity at a position, among the plurality of sections, before the rear end of the conveyed object reaches the predetermined conveyance means represents the amount of tension, and the predetermined conveyance means is stopped when the rear end of the conveyed object passes through the predetermined conveyance means.
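
The feed condition above combines a state quantity with the state of the conveyance means. A minimal sketch, assuming the same hypothetical signed encoding (negative values mean tension) and a hypothetical boolean flag reported by the simulator for the feeding roller:

```python
def feed_reward(state_before_pass: float, roller_stopped_at_pass: bool) -> float:
    """Reward for the conveyance means that feeds sheets from the storage means.

    state_before_pass: state quantity observed before the rear end of the
        sheet reaches the feeding conveyance means (assumed: < 0 is tension).
    roller_stopped_at_pass: True if the feeding conveyance means is stopped
        when the rear end passes through it (hypothetical simulator output).
    """
    in_tension = state_before_pass < 0
    # Both conditions must hold for the positive reward.
    return 1.0 if (in_tension and roller_stopped_at_pass) else 0.0
```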

A machine learning device comprising:
state quantity acquisition means for acquiring a state quantity representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveyance path of a conveyance device, the conveyance device sequentially nipping the conveyed object with a plurality of conveyance means and conveying the conveyed object from upstream to downstream of the conveyance path;
reward granting means for granting a reward based on the state quantity;
learning means for performing machine learning that updates, based on the reward, an action value function representing, for each set, the value of a set of parameters of the driving means that drive the respective conveyance means; and
determining means for determining one set from among a plurality of the sets based on the updated action value function and instructing the driving means to drive the conveyance means with the parameters of the determined set,
The machine learning device communicates with a simulator that simulates the transport device,
The state quantity acquisition means acquires the state quantity based on the output from the simulator,
The reward giving means gives the reward based on the state quantity and the state of the transport means,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
wherein, when neither deflection of the conveyed object in the section between the first conveyance means and the second conveyance means among the plurality of sections nor generation of a force on the conveyed object in the conveyance direction at the second conveyance means is allowed, the reward granting means grants a positive reward when the state quantity in the section between the first conveyance means and the second conveyance means represents neither the amount of tension nor the amount of deflection.

A machine learning device comprising:
state quantity acquisition means for acquiring a state quantity representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveyance path of a conveyance device, the conveyance device sequentially nipping the conveyed object with a plurality of conveyance means and conveying the conveyed object from upstream to downstream of the conveyance path;
reward granting means for granting a reward based on the state quantity;
learning means for performing machine learning that updates, based on the reward, an action value function representing, for each set, the value of a set of parameters of the driving means that drive the respective conveyance means; and
determining means for determining one set from among a plurality of the sets based on the updated action value function and instructing the driving means to drive the conveyance means with the parameters of the determined set,
The machine learning device communicates with a simulator that simulates the transport device,
The state quantity acquisition means acquires the state quantity based on the output from the simulator,
The reward giving means gives the reward based on the state quantity and the state of the transport means,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
wherein, when the time taken for the conveyed object to pass through the first conveyance means can be shortened by the second conveyance means conveying the conveyed object in a pulled state, the reward granting means grants a positive reward when the second conveyance means conveys the conveyed object in a pulled state.

A machine learning device comprising:
state quantity acquisition means for acquiring a state quantity representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveyance path of a conveyance device, the conveyance device sequentially nipping the conveyed object with a plurality of conveyance means and conveying the conveyed object from upstream to downstream of the conveyance path;
reward granting means for granting a reward based on the state quantity;
learning means for performing machine learning that updates, based on the reward, an action value function representing, for each set, the value of a set of parameters of the driving means that drive the respective conveyance means; and
determining means for determining one set from among a plurality of the sets based on the updated action value function and instructing the driving means to drive the conveyance means with the parameters of the determined set,
The machine learning device communicates with a simulator that simulates the transport device,
The state quantity acquisition means acquires the state quantity based on the output from the simulator,
The reward giving means gives the reward based on the state quantity and the state of the transport means,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
wherein, when the first conveyance means and the second conveyance means are conveying the conveyed object at the same time and a setting is made to allow a predetermined amount of deflection in the section between the first conveyance means and the second conveyance means among the plurality of sections, the reward granting means grants a positive reward on the condition that the conveyance speed of the first conveyance means is equal to or higher than the conveyance speed of the second conveyance means.

A machine learning device comprising:
state quantity acquisition means for acquiring a state quantity representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveyance path of a conveyance device, the conveyance device sequentially nipping the conveyed object with a plurality of conveyance means and conveying the conveyed object from upstream to downstream of the conveyance path;
reward granting means for granting a reward based on the state quantity;
learning means for performing machine learning that updates, based on the reward, an action value function representing, for each set, the value of a set of parameters of the driving means that drive the respective conveyance means; and
determining means for determining one set from among a plurality of the sets based on the updated action value function and instructing the driving means to drive the conveyance means with the parameters of the determined set,
The machine learning device communicates with a simulator that simulates the transport device,
The state quantity acquisition means acquires the state quantity based on the output from the simulator,
The reward giving means gives the reward based on the state quantity and the state of the transport means,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
wherein, when the first conveyance means and the second conveyance means are conveying the conveyed object at the same time and a setting is made in which deflection of the conveyed object is not allowed in the section between the first conveyance means and the second conveyance means among the plurality of sections, the reward granting means grants a positive reward on the condition that the conveyance speed of the first conveyance means is equal to or less than the conveyance speed of the second conveyance means.
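
The speed conditions in this claim and the preceding one follow from simple kinematics: while both conveyance means nip the same sheet, a faster upstream roller accumulates slack and a faster downstream roller pulls the sheet taut. A minimal sketch, in which the function name, argument names, and reward values are hypothetical:

```python
def speed_reward(v_first: float, v_second: float,
                 deflection_allowed: bool, both_nipping: bool) -> float:
    """Reward based on the speeds of the upstream (first) and downstream (second) rollers."""
    if not both_nipping:
        # The conditions apply only while both means convey the sheet at once.
        return 0.0
    if deflection_allowed:
        # Slack is permitted, so upstream may run as fast as or faster than downstream.
        return 1.0 if v_first >= v_second else 0.0
    # Slack is not permitted, so upstream must not outrun downstream.
    return 1.0 if v_first <= v_second else 0.0
```

Equal speeds satisfy both conditions, so `speed_reward(300.0, 300.0, allowed, True)` is positive for either setting.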

A machine learning device comprising:
state quantity acquisition means for acquiring a state quantity representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveyance path of a conveyance device, the conveyance device sequentially nipping the conveyed object with a plurality of conveyance means and conveying the conveyed object from upstream to downstream of the conveyance path;
reward granting means for granting a reward based on the state quantity;
learning means for performing machine learning that updates, based on the reward, an action value function representing, for each set, the value of a set of parameters of the driving means that drive the respective conveyance means; and
determining means for determining one set from among a plurality of the sets based on the updated action value function and instructing the driving means to drive the conveyance means with the parameters of the determined set,
wherein the learning means performs machine learning to update the value of the parameters of each of the driving means based on the reward and a physical property of the conveyed object,
the physical property is stiffness,
the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means, and
when the first conveyance means and the second conveyance means are conveying the conveyed object at the same time, the reward granting means grants a positive reward on the condition that the stiffness of the conveyed object is equal to or higher than a predetermined value and the conveyance speed of the first conveyance means and the conveyance speed of the second conveyance means are the same.

A machine learning device comprising:
state quantity acquisition means for acquiring a state quantity representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveyance path of a conveyance device, the conveyance device sequentially nipping the conveyed object with a plurality of conveyance means and conveying the conveyed object from upstream to downstream of the conveyance path;
reward granting means for granting a reward based on the state quantity;
learning means for performing machine learning that updates, based on the reward, an action value function representing, for each set, the value of a set of parameters of the driving means that drive the respective conveyance means; and
determining means for determining one set from among a plurality of the sets based on the updated action value function and instructing the driving means to drive the conveyance means with the parameters of the determined set,
wherein the learning means performs machine learning to update the value of the parameters of each of the driving means based on the reward and a physical property of the conveyed object,
the physical property is stiffness,
the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means, and
when the first conveyance means and the second conveyance means are conveying the conveyed object at the same time and a setting is made to allow a predetermined amount of deflection in the section between the first conveyance means and the second conveyance means among the plurality of sections, the reward granting means grants a positive reward when the stiffness of the conveyed object is less than a predetermined value and the state quantity in the section between the first conveyance means and the second conveyance means represents an amount of deflection that is equal to or less than the predetermined amount of deflection.
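
The two stiffness-dependent conditions (this claim and the preceding one) can be read together as a branch on a threshold. The sketch below assumes that the setting allowing a predetermined amount of deflection is in effect and that both conveyance means are nipping the sheet; the function name, argument names, reward values, and the use of `math.isclose` to compare speeds are all hypothetical.

```python
import math


def stiffness_reward(stiffness: float, stiffness_threshold: float,
                     v_first: float, v_second: float,
                     deflection: float, max_deflection: float) -> float:
    """Reward that depends on the stiffness of the conveyed object."""
    if stiffness >= stiffness_threshold:
        # Stiff media: reward equal upstream and downstream speeds.
        return 1.0 if math.isclose(v_first, v_second) else 0.0
    # Flexible media: reward deflection within the predetermined amount.
    return 1.0 if 0.0 < deflection <= max_deflection else 0.0
```

The basis-weight variants of these conditions have the same shape, with basis weight in place of stiffness.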

A machine learning device comprising:
state quantity acquisition means for acquiring a state quantity representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveyance path of a conveyance device, the conveyance device sequentially nipping the conveyed object with a plurality of conveyance means and conveying the conveyed object from upstream to downstream of the conveyance path;
reward granting means for granting a reward based on the state quantity;
learning means for performing machine learning that updates, based on the reward, an action value function representing, for each set, the value of a set of parameters of the driving means that drive the respective conveyance means; and
determining means for determining one set from among a plurality of the sets based on the updated action value function and instructing the driving means to drive the conveyance means with the parameters of the determined set,
wherein the learning means performs machine learning to update the value of the parameters of each of the driving means based on the reward and a physical property of the conveyed object,
the physical property is basis weight,
the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means, and
when the first conveyance means and the second conveyance means are conveying the conveyed object at the same time, the reward granting means grants a positive reward on the condition that the basis weight of the conveyed object is equal to or greater than a predetermined value and the conveyance speed of the first conveyance means and the conveyance speed of the second conveyance means are the same.

A machine learning device comprising:
state quantity acquisition means for acquiring a state quantity representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveyance path of a conveyance device, the conveyance device sequentially nipping the conveyed object with a plurality of conveyance means and conveying the conveyed object from upstream to downstream of the conveyance path;
reward granting means for granting a reward based on the state quantity;
learning means for performing machine learning that updates, based on the reward, an action value function representing, for each set, the value of a set of parameters of the driving means that drive the respective conveyance means; and
determining means for determining one set from among a plurality of the sets based on the updated action value function and instructing the driving means to drive the conveyance means with the parameters of the determined set,
wherein the learning means performs machine learning to update the value of the parameters of each of the driving means based on the reward and a physical property of the conveyed object,
the physical property is basis weight,
the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means, and
when the first conveyance means and the second conveyance means are conveying the conveyed object at the same time and a setting is made to allow a predetermined amount of deflection in the section between the first conveyance means and the second conveyance means among the plurality of sections, the reward granting means grants a positive reward when the basis weight of the conveyed object is less than a predetermined value and the state quantity in the section between the first conveyance means and the second conveyance means represents an amount of deflection that is equal to or less than the predetermined amount of deflection.

The machine learning device according to any one of claims 8 to 14, wherein the action value function is a Q table, and
the determining means determines one set from among the plurality of sets based on the acquired state quantity and the Q table.
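
Determining one set from a Q table can be sketched as a greedy lookup over the row for the current state. The speed values, the two-roller action space built with `itertools.product`, and the use of a discretized state index as the table key are assumptions made for illustration only.

```python
import itertools

# Hypothetical candidate speeds (mm/s) for two conveyance means.
SPEEDS = (290.0, 300.0, 310.0)
# Each "set" is one combination of parameters, here (first speed, second speed).
ACTIONS = list(itertools.product(SPEEDS, repeat=2))


def select_set(q_table: dict, state: int) -> int:
    """Return the index of the parameter set with the highest value for this state.

    q_table maps a discretized state quantity (assumed to be an int key)
    to a list holding one value per entry of ACTIONS.
    """
    row = q_table[state]
    return max(range(len(row)), key=row.__getitem__)
```

With `q_table = {0: [0.0] * len(ACTIONS)}` and `q_table[0][4] = 1.0`, `ACTIONS[select_set(q_table, 0)]` is the set `(300.0, 300.0)`.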

The machine learning device according to any one of claims 1 to 19, wherein the state quantity acquisition means further acquires the state quantity when the conveyance means is driven based on the parameters of the determined set,
the reward granting means further grants the reward based on the acquired state quantity, and
the learning means further updates the action value function based on the granted reward.
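
The claims do not state which update rule is applied to the action value function. As one possibility, the standard one-step Q-learning update can be sketched as follows; the learning rate `alpha`, the discount factor `gamma`, and the dictionary layout of the table are assumptions.

```python
def q_update(q_table: dict, state: int, action: int, reward: float,
             next_state: int, alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) in place."""
    best_next = max(q_table[next_state])        # value of the best next set
    td_target = reward + gamma * best_next      # bootstrapped target
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]
```

Here `state` is the state quantity observed before driving, `action` is the index of the set that was used, and `next_state` is the state quantity acquired after the conveyance means was driven with that set.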

The machine learning device according to any one of claims 1 to 20, wherein the set includes at least one of speed, drive timing, stop timing, shift timing, and drive current value.

The machine learning device according to any one of claims 1 to 21, wherein the conveyed object is a sheet of paper.

The machine learning device according to any one of claims 1 to 21, wherein the conveyed object is cloth.

The machine learning device according to any one of claims 1 to 23, wherein the learning means performs machine learning to update the value of the parameter of each of the driving means based on the reward and the physical properties of the conveyed object.

The machine learning device according to claim 24, wherein the physical property is stiffness.

The machine learning device according to claim 24, wherein the physical property is basis weight.

The machine learning device according to any one of claims 1 to 26, wherein the conveyance means is a pair of rollers.

The machine learning device according to claim 1, wherein the predetermined amount of deflection is a value less than the width of the conveyance path in the thickness direction of the conveyed object.

The machine learning device according to claim 1 or 8, wherein the predetermined amount of deflection is a predetermined value larger than zero.

The machine learning device according to any one of claims 1 to 29, wherein the state quantity acquisition means acquires the deflection amount using a simulation model of a mechanical sensor provided in the transport device.

The machine learning device according to any one of claims 1 to 29, wherein the state quantity acquisition means acquires the deflection amount using a simulation model of an optical sensor provided in the transport device.

The machine learning device according to any one of claims 1 to 29, wherein the state quantity acquisition means
acquires the length of the conveyed object based on the position of the conveyed object, and
when the reference length of the conveyed object is longer than the acquired length, sets the difference between the acquired length and the reference length as the amount of deflection.
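
The length-based deflection estimate reduces to a single subtraction. In the sketch below, the function name is hypothetical, and returning zero when the sheet is not shorter than its reference length is an assumption, since the claim covers only the case where the reference length is longer.

```python
def deflection_amount(acquired_length: float, reference_length: float) -> float:
    """Deflection as the shortfall of the measured span against the nominal length.

    acquired_length: length derived from the sheet's position in the path,
        for example the distance between its leading and trailing edges.
    reference_length: nominal length of the sheet.
    """
    if reference_length > acquired_length:
        return reference_length - acquired_length
    return 0.0  # assumption: no deflection is reported otherwise
```

For a sheet with a reference length of 297 mm that spans only 294 mm of the path, the estimated deflection is 3 mm.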

The machine learning device according to any one of claims 1 to 29, wherein the state quantity acquisition means
acquires the load of the conveyance means using a simulation model of a load detection means provided in the conveyance device, and
acquires the amount of tension based on the load.

The machine learning device according to any one of claims 1 to 29, wherein the state quantity acquisition means acquires the amount of tension using a simulation model of an optical sensor provided in the transport device.

The machine learning device according to any one of claims 1 to 29, wherein the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means, and
the state quantity acquisition means
acquires the length of the conveyed object based on the position of the conveyed object, and
when there is no difference between the acquired length and the reference length of the conveyed object and the conveyance speed of the second conveyance means is equal to or higher than the conveyance speed of the first conveyance means, determines that the conveyed object is in a pulled state.
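
The pulled-state determination combines a length check with a speed comparison. A minimal sketch, assuming a hypothetical tolerance `tol` for deciding that two lengths show "no difference" (the claim does not define one):

```python
import math


def is_pulled(acquired_length: float, reference_length: float,
              v_first: float, v_second: float, tol: float = 1e-6) -> bool:
    """Return True when the sheet is judged to be in a pulled state."""
    # No slack: the measured length matches the reference length.
    no_slack = math.isclose(acquired_length, reference_length, abs_tol=tol)
    # The downstream roller runs at least as fast as the upstream roller.
    return no_slack and v_second >= v_first
```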

The machine learning device according to any one of claims 1 to 35, wherein the machine learning is executed again when the conveyance speed at which the conveyance means conveys the conveyed object differs from the conveyance speed set during the previous machine learning.

The machine learning device according to any one of claims 1 to 36, wherein an update control program including the parameters resulting from the machine learning is transmitted to a controller that controls operation of the transport device.

A conveyance device comprising the machine learning device according to any one of claims 1 to 37.

An image forming apparatus comprising the machine learning device according to claim 1.

A machine learning method comprising:
a step of acquiring a state quantity representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveyance path of a conveyance device, the conveyance device sequentially nipping the conveyed object with a plurality of conveyance means and conveying the conveyed object from upstream to downstream of the conveyance path;
a step of granting a reward based on the state quantity;
a step of updating, based on the reward, an action value function representing, for each set, the value of a set of parameters of the driving means that drive the respective conveyance means; and
a step of determining one set from among a plurality of the sets based on the updated action value function and instructing the driving means to drive the conveyance means with the parameters of the determined set,
the method further comprising a step of acquiring the state quantity based on the conveyance speed of the conveyed object by the conveyance means or the position of the conveyed object in the conveyance path,
wherein the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means, and
when a setting is made to allow a predetermined amount of deflection in the section between the first conveyance means and the second conveyance means among the plurality of sections, a positive reward is granted in the step of granting a reward based on the state quantity when the state quantity in the section between the first conveyance means and the second conveyance means represents an amount of deflection that is equal to or less than the predetermined amount of deflection.
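
The steps of the method (acquire the state quantity, grant a reward, update the action value function, determine the next set) can be tied together in a small training loop. Everything concrete below is an assumption for illustration: the toy simulator, the speed values, the allowed deflection, the single-state (bandit-style) simplification of the action value function, and the epsilon-greedy exploration.

```python
import itertools
import random

SPEEDS = (295.0, 300.0, 305.0)                       # hypothetical speeds, mm/s
ACTIONS = list(itertools.product(SPEEDS, repeat=2))  # (first, second) speed sets
MAX_DEFLECTION = 3.0                                 # hypothetical limit, mm


def toy_simulator(v_first: float, v_second: float, dt: float = 0.5) -> float:
    """Stand-in simulator: slack grows when the upstream roller runs faster.

    Returns a signed state quantity (assumed: > 0 deflection, < 0 tension).
    """
    return (v_first - v_second) * dt


def reward_for(state: float) -> float:
    """Positive reward when deflection is within the allowed amount."""
    return 1.0 if 0.0 < state <= MAX_DEFLECTION else 0.0


def train(episodes: int = 500, alpha: float = 0.2,
          epsilon: float = 0.2, seed: int = 0) -> tuple:
    rng = random.Random(seed)
    q = [0.0] * len(ACTIONS)                         # one value per parameter set
    for _ in range(episodes):
        if rng.random() < epsilon:                   # explore a random set
            a = rng.randrange(len(ACTIONS))
        else:                                        # exploit the best set so far
            a = max(range(len(q)), key=q.__getitem__)
        state = toy_simulator(*ACTIONS[a])           # acquire the state quantity
        r = reward_for(state)                        # grant the reward
        q[a] += alpha * (r - q[a])                   # update the action value
    return ACTIONS[max(range(len(q)), key=q.__getitem__)]
```

In this toy setting only the sets whose upstream speed exceeds the downstream speed by 5 mm/s produce a deflection within the limit, so `train()` is expected to settle on one of them.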

A machine learning method comprising:
a step of acquiring a state quantity representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveyance path of a conveyance device, the conveyance device sequentially nipping the conveyed object with a plurality of conveyance means and conveying the conveyed object from upstream to downstream of the conveyance path;
a step of granting a reward based on the state quantity;
a step of updating, based on the reward, an action value function representing, for each set, the value of a set of parameters of the driving means that drive the respective conveyance means; and
a step of determining one set from among a plurality of the sets based on the updated action value function and instructing the driving means to drive the conveyance means with the parameters of the determined set,
the method further comprising a step of acquiring the state quantity based on the conveyance speed of the conveyed object by the conveyance means or the position of the conveyed object in the conveyance path,
wherein the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means, and
when a setting is made in which deflection of the conveyed object is not allowed in the section between the first conveyance means and the second conveyance means among the plurality of sections, a positive reward is granted in the step of granting a reward based on the state quantity when the state quantity in the section between the first conveyance means and the second conveyance means represents the amount of tension.

A machine learning method comprising:
a step of acquiring a state quantity representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveyance path of a conveyance device, the conveyance device sequentially nipping the conveyed object with a plurality of conveyance means and conveying the conveyed object from upstream to downstream of the conveyance path;
a step of granting a reward based on the state quantity;
a step of updating, based on the reward, an action value function representing, for each set, the value of a set of parameters of the driving means that drive the respective conveyance means; and
a step of determining one set from among a plurality of the sets based on the updated action value function and instructing the driving means to drive the conveyance means with the parameters of the determined set,
the method further comprising a step of acquiring the state quantity based on the conveyance speed of the conveyed object by the conveyance means or the position of the conveyed object in the conveyance path,
wherein, in the step of granting a reward based on the state quantity, the reward is granted based on the state quantity and the state of the conveyance means,
a predetermined conveyance means among the plurality of conveyance means conveys conveyed objects one by one from a storage means storing a plurality of conveyed objects to the conveyance path, and
a positive reward is granted in the step of granting a reward based on the state quantity when the state quantity at a position, among the plurality of sections, before the rear end of the conveyed object reaches the predetermined conveyance means represents the amount of tension, and the predetermined conveyance means is stopped when the rear end of the conveyed object passes through the predetermined conveyance means.

A machine learning method comprising:
a step of acquiring a state quantity representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveyance path of a conveyance device, the conveyance device sequentially nipping the conveyed object with a plurality of conveyance means and conveying the conveyed object from upstream to downstream of the conveyance path;
a step of granting a reward based on the state quantity;
a step of updating, based on the reward, an action value function representing, for each set, the value of a set of parameters of the driving means that drive the respective conveyance means; and
a step of determining one set from among a plurality of the sets based on the updated action value function and instructing the driving means to drive the conveyance means with the parameters of the determined set,
the method further comprising a step of acquiring the state quantity based on the conveyance speed of the conveyed object by the conveyance means or the position of the conveyed object in the conveyance path,
wherein, in the step of granting a reward based on the state quantity, the reward is granted based on the state quantity and the state of the conveyance means,
the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means, and
when neither deflection of the conveyed object in the section between the first conveyance means and the second conveyance means among the plurality of sections nor generation of a force on the conveyed object in the conveyance direction at the second conveyance means is allowed, a positive reward is granted in the step of granting a reward based on the state quantity when the state quantity in the section between the first conveyance means and the second conveyance means represents neither the amount of tension nor the amount of deflection.

A machine learning method comprising:
a step of acquiring a state quantity representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveyance path of a conveyance device, the conveyance device sequentially nipping the conveyed object with a plurality of conveyance means and conveying the conveyed object from upstream to downstream of the conveyance path;
a step of granting a reward based on the state quantity;
a step of updating, based on the reward, an action value function representing, for each set, the value of a set of parameters of the driving means that drive the respective conveyance means; and
a step of determining one set from among a plurality of the sets based on the updated action value function and instructing the driving means to drive the conveyance means with the parameters of the determined set,
the method further comprising a step of acquiring the state quantity based on the conveyance speed of the conveyed object by the conveyance means or the position of the conveyed object in the conveyance path,
wherein, in the step of granting a reward based on the state quantity, the reward is granted based on the state quantity and the state of the conveyance means,
the plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means, and
when the time taken for the conveyed object to pass through the first conveyance means can be shortened by the second conveyance means conveying the conveyed object in a pulled state, a positive reward is granted in the step of granting a reward based on the state quantity when the second conveyance means is conveying the conveyed object in a pulled state.

a step of acquiring state quantities representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveying path of a conveying device, the conveying device conveying the conveyed object from upstream to downstream of the conveying path while sequentially holding the conveyed object with a plurality of conveying means;
a step of granting a reward based on the state quantity;
updating, based on the reward, an action value function that represents the value of a set of parameters of each drive means that drives each of the transport means for each set;
determining one of the plurality of sets based on the updated action value function, and instructing the driving means to drive the conveying means with the parameters of the determined set; and,
further comprising the step of acquiring the state quantity based on the conveyance speed of the conveyance target by the conveyance means or the position of the conveyance target in the conveyance path,
In the step of providing a reward based on the state quantity, the reward is provided based on the state quantity and the state of the transport means,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning method, wherein, when the first conveyance means and the second conveyance means are conveying the conveyed object at the same time and a setting is made to allow a predetermined amount of deflection in the section between the first conveyance means and the second conveyance means among the plurality of sections, a positive reward is given in the step of giving a reward based on the state quantity on the condition that the conveyance speed of the first conveyance means is equal to or higher than the conveyance speed of the second conveyance means.

a step of acquiring state quantities representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveying path of a conveying device, the conveying device conveying the conveyed object from upstream to downstream of the conveying path while sequentially holding the conveyed object with a plurality of conveying means;
a step of providing a reward based on the state quantity;
updating, based on the reward, an action value function that represents the value of a set of parameters of each drive means that drives each of the transport means for each set;
determining one of the plurality of sets based on the updated action value function, and instructing the driving means to drive the conveying means with the parameters of the determined set; and,
further comprising the step of acquiring the state quantity based on the conveyance speed of the conveyance target by the conveyance means or the position of the conveyance target in the conveyance path,
In the step of providing a reward based on the state quantity, the reward is provided based on the state quantity and the state of the transport means,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning method, wherein, when the first conveyance means and the second conveyance means are conveying the conveyed object at the same time and a setting is made that does not allow deflection of the conveyed object in the section between the first conveyance means and the second conveyance means among the plurality of sections, a positive reward is given in the step of giving a reward based on the state quantity on the condition that the conveyance speed of the first conveyance means is equal to or lower than the conveyance speed of the second conveyance means.
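The two speed conditions in the preceding claims (deflection allowed, deflection not allowed) can be illustrated with a small sketch. This is only an illustration under assumptions: the function name `speed_reward`, the reward magnitudes of +1.0 and -1.0, and the negative reward when the condition fails are hypothetical. The claims state only that a positive reward is given when the condition holds.

```python
def speed_reward(v_first: float, v_second: float, deflection_allowed: bool) -> float:
    """Reward for a section whose conveyed object is held by both means at once.

    v_first / v_second: conveyance speeds of the first (upstream) and
    second (downstream) conveyance means, in any common unit.
    Assumption: +1.0 / -1.0 are illustrative values, not from the claims.
    """
    if deflection_allowed:
        # Upstream at least as fast as downstream: slack may form, no pulling.
        condition_met = v_first >= v_second
    else:
        # Upstream no faster than downstream: no slack builds up.
        condition_met = v_first <= v_second
    return 1.0 if condition_met else -1.0
```

Equal speeds satisfy both conditions, so `speed_reward(300.0, 300.0, True)` and `speed_reward(300.0, 300.0, False)` both return the positive value.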

a step of acquiring state quantities representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveying path of a conveying device, the conveying device conveying the conveyed object from upstream to downstream of the conveying path while sequentially holding the conveyed object with a plurality of conveying means;
a step of granting a reward based on the state quantity;
updating, based on the reward, an action value function that represents the value of a set of parameters of each drive means that drives each of the transport means for each set;
determining one of the plurality of sets based on the updated action value function, and instructing the driving means to drive the conveying means with the parameters of the determined set; and,
communicating with a simulator simulating the transport device;
further comprising the step of acquiring the state quantity based on the output from the simulator,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning method, wherein, when a setting is made to allow a predetermined amount of deflection in the section between the first conveyance means and the second conveyance means among the plurality of sections, a positive reward is given in the step of granting a reward based on the state quantity when the state quantity in the section between the first conveyance means and the second conveyance means represents a deflection amount that is less than or equal to the predetermined deflection amount.

a step of acquiring state quantities representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveying path of a conveying device, the conveying device conveying the conveyed object from upstream to downstream of the conveying path while sequentially holding the conveyed object with a plurality of conveying means;
a step of granting a reward based on the state quantity;
updating, based on the reward, an action value function that represents the value of a set of parameters of each drive means that drives each of the transport means for each set;
determining one of the plurality of sets based on the updated action value function, and instructing the driving means to drive the conveying means with the parameters of the determined set; and,
communicating with a simulator simulating the transport device;
further comprising the step of acquiring the state quantity based on the output from the simulator,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning method, wherein, when a setting is made that does not allow deflection of the conveyed object in the section between the first conveyance means and the second conveyance means among the plurality of sections, a positive reward is given in the step of granting a reward based on the state quantity when the state quantity in the section between the first conveyance means and the second conveyance means represents the amount of tension.
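The two state-quantity conditions in the preceding claims can be sketched as a single reward function. The sign convention is an assumption made for illustration: a positive state quantity stands for a deflection amount, a negative one for tension, and zero for neither. The name `state_reward`, the parameter `max_deflection`, and the reward values are likewise hypothetical.

```python
def state_reward(state: float, deflection_allowed: bool,
                 max_deflection: float = 0.0) -> float:
    """Reward from the state quantity of one section.

    Assumed convention (not from the claims): state > 0 is a deflection
    amount, state < 0 is tension, state == 0 is neither.
    """
    if deflection_allowed:
        # Deflection present, but within the predetermined amount.
        condition_met = 0.0 < state <= max_deflection
    else:
        # Deflection not allowed: the section should be under tension.
        condition_met = state < 0.0
    return 1.0 if condition_met else -1.0
```

In the simulator-based claims the `state` argument would come from the simulator output rather than from sensors on the real conveying device.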

a step of acquiring state quantities representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveying path of a conveying device, the conveying device conveying the conveyed object from upstream to downstream of the conveying path while sequentially holding the conveyed object with a plurality of conveying means;
a step of providing a reward based on the state quantity;
updating, based on the reward, an action value function that represents the value of a set of parameters of each drive means that drives each of the transport means for each set;
determining one of the plurality of sets based on the updated action value function, and instructing the driving means to drive the conveying means with the parameters of the determined set; and,
communicating with a simulator simulating the transport device;
further comprising the step of obtaining the state quantity based on the output from the simulator,
In the step of providing a reward based on the state quantity, the reward is provided based on the state quantity and the state of the transport means,
A predetermined transport means among the plurality of transport means transports the transport objects one by one from a storage means storing a plurality of transport objects to the transport path,
A machine learning method, wherein a positive reward is given in the step of providing a reward based on the state quantity when, among the plurality of sections, the state quantity at a position before the rear end of the conveyed object reaches the predetermined transport means represents the amount of tension, and the predetermined transport means is stopped when the rear end of the conveyed object passes through the predetermined transport means.

a step of acquiring state quantities representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveying path of a conveying device, the conveying device conveying the conveyed object from upstream to downstream of the conveying path while sequentially holding the conveyed object with a plurality of conveying means;
a step of providing a reward based on the state quantity;
updating, based on the reward, an action value function that represents the value of a set of parameters of each drive means that drives each of the transport means for each set;
determining one of the plurality of sets based on the updated action value function, and instructing the driving means to drive the conveying means with the parameters of the determined set; and,
communicating with a simulator simulating the transport device;
further comprising the step of obtaining the state quantity based on the output from the simulator,
In the step of providing the reward based on the state quantity, the reward is provided based on the state quantity and the state of the conveying means,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning method, wherein, when neither deflection of the conveyed object nor generation of a force on the conveyed object in the conveying direction at the second conveyance means is allowed in the section between the first conveyance means and the second conveyance means among the plurality of sections, a positive reward is given in the step of providing the reward based on the state quantity when the state quantity in the section between the first conveyance means and the second conveyance means represents neither the amount of tension nor the amount of deflection.
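The steps recited in the preceding claim (acquire the state quantity from a simulator, give a reward, update the action value function, choose a parameter set) can be sketched as a loop. This is a deliberately simplified, single-state action-value update with epsilon-greedy selection, chosen for illustration; the claims do not specify the update rule. The parameter sets, the toy `simulate` function, the sign convention of the state quantity, and all numeric values are assumptions.

```python
import random

# Hypothetical parameter sets: (first means speed, second means speed).
PARAM_SETS = [(300, 290), (300, 300), (300, 310)]

def simulate(params):
    """Toy stand-in for the simulator of the conveying device.
    Assumed convention: > 0 deflection, < 0 tension, 0 neither."""
    v_first, v_second = params
    return v_first - v_second

def reward_for(state):
    # Positive reward only when the state is neither tension nor deflection.
    return 1.0 if state == 0 else -1.0

def train(episodes=200, alpha=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {p: 0.0 for p in PARAM_SETS}      # action value per parameter set
    for _ in range(episodes):
        if rng.random() < epsilon:        # explore a random set
            params = rng.choice(PARAM_SETS)
        else:                             # exploit the best-valued set
            params = max(q, key=q.get)
        state = simulate(params)          # acquire state quantity
        r = reward_for(state)             # give reward
        q[params] += alpha * (r - q[params])  # update action value
    return q

q = train()
best = max(q, key=q.get)                  # set used to drive the means
```

With this toy simulator only the equal-speed set ever receives a positive reward, so `best` ends up as `(300, 300)`.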

a step of acquiring state quantities representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveying path of a conveying device, the conveying device conveying the conveyed object from upstream to downstream of the conveying path while sequentially holding the conveyed object with a plurality of conveying means;
a step of providing a reward based on the state quantity;
updating, based on the reward, an action value function that represents the value of a set of parameters of each drive means that drives each of the transport means for each set;
determining one of the plurality of sets based on the updated action value function, and instructing the driving means to drive the conveying means with the parameters of the determined set; and,
communicating with a simulator simulating the transport device;
further comprising the step of obtaining the state quantity based on the output from the simulator,
In the step of providing the reward based on the state quantity, the reward is provided based on the state quantity and the state of the conveying means,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning method, wherein, when the time taken for the conveyed object to pass through the first conveyance means can be shortened by the second conveyance means conveying the conveyed object in a pulled state, a positive reward is provided in the step of providing the reward based on the state quantity when the second conveyance means is conveying the conveyed object in a pulled state.

a step of acquiring state quantities representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveying path of a conveying device, the conveying device conveying the conveyed object from upstream to downstream of the conveying path while sequentially holding the conveyed object with a plurality of conveying means;
a step of providing a reward based on the state quantity;
updating, based on the reward, an action value function that represents the value of a set of parameters of each drive means that drives each of the transport means for each set;
determining one of the plurality of sets based on the updated action value function, and instructing the driving means to drive the conveying means with the parameters of the determined set; and,
communicating with a simulator simulating the transport device;
further comprising the step of obtaining the state quantity based on the output from the simulator,
In the step of providing the reward based on the state quantity, the reward is provided based on the state quantity and the state of the conveying means,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning method, wherein, when the first conveyance means and the second conveyance means are conveying the conveyed object at the same time and a setting is made to allow a predetermined amount of deflection in the section between the first conveyance means and the second conveyance means among the plurality of sections, a positive reward is given in the step of providing the reward based on the state quantity on the condition that the conveyance speed of the first conveyance means is equal to or higher than the conveyance speed of the second conveyance means.

a step of acquiring state quantities representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveying path of a conveying device, the conveying device conveying the conveyed object from upstream to downstream of the conveying path while sequentially holding the conveyed object with a plurality of conveying means;
a step of providing a reward based on the state quantity;
updating, based on the reward, an action value function that represents the value of a set of parameters of each drive means that drives each of the transport means for each set;
determining one of the plurality of sets based on the updated action value function, and instructing the driving means to drive the conveying means with the parameters of the determined set; and,
communicating with a simulator simulating the transport device;
further comprising the step of obtaining the state quantity based on the output from the simulator,
In the step of providing a reward based on the state quantity, the reward is provided based on the state quantity and the state of the transport means,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning method, wherein, when the first conveyance means and the second conveyance means are conveying the conveyed object at the same time and a setting is made that does not allow deflection of the conveyed object in the section between the first conveyance means and the second conveyance means among the plurality of sections, a positive reward is given in the step of providing a reward based on the state quantity on the condition that the conveyance speed of the first conveyance means is equal to or lower than the conveyance speed of the second conveyance means.

a step of acquiring state quantities representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveying path of a conveying device, the conveying device conveying the conveyed object from upstream to downstream of the conveying path while sequentially holding the conveyed object with a plurality of conveying means;
a step of granting a reward based on the state quantity;
updating, based on the reward, an action value function that represents the value of a set of parameters of each drive means that drives each of the transport means for each set;
determining one of the plurality of sets based on the updated action value function, and instructing the driving means to drive the conveying means with the parameters of the determined set; and,
further comprising the step of performing machine learning for updating parameter values of each of the driving means based on the reward and the physical properties of the conveyed object,
The physical property is stiffness,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning method, wherein, when the first conveyance means and the second conveyance means are simultaneously conveying the conveyed object, a positive reward is provided in the step of granting a reward based on the state quantity on the condition that the stiffness of the conveyed object is equal to or higher than a predetermined value and the conveyance speed of the first conveyance means and the conveyance speed of the second conveyance means are the same.

a step of acquiring state quantities representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveying path of a conveying device, the conveying device conveying the conveyed object from upstream to downstream of the conveying path while sequentially holding the conveyed object with a plurality of conveying means;
a step of providing a reward based on the state quantity;
updating, based on the reward, an action value function that represents the value of a set of parameters of each drive means that drives each of the transport means for each set;
determining one of the plurality of sets based on the updated action value function, and instructing the driving means to drive the conveying means with the parameters of the determined set; and,
further comprising the step of performing machine learning for updating parameter values of each of the driving means based on the reward and the physical properties of the conveyed object,
The physical property is stiffness,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning method, wherein, when the first conveyance means and the second conveyance means are conveying the conveyed object at the same time, a setting is made to allow a predetermined amount of deflection in the section between the first conveyance means and the second conveyance means among the plurality of sections, and the stiffness of the conveyed object is less than a predetermined value, a positive reward is provided in the step of providing a reward based on the state quantity when the state quantity in the section between the first conveyance means and the second conveyance means represents a deflection amount that is less than or equal to the predetermined deflection amount.

a step of acquiring state quantities representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveying path of a conveying device, the conveying device conveying the conveyed object from upstream to downstream of the conveying path while sequentially holding the conveyed object with a plurality of conveying means;
a step of granting a reward based on the state quantity;
updating, based on the reward, an action value function that represents the value of a set of parameters of each drive means that drives each of the transport means for each set;
determining one of the plurality of sets based on the updated action value function, and instructing the driving means to drive the conveying means with the parameters of the determined set; and,
further comprising the step of performing machine learning for updating parameter values of each of the driving means based on the reward and the physical properties of the conveyed object,
The physical property is basis weight,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning method, wherein, when the first conveyance means and the second conveyance means are simultaneously conveying the conveyed object, a positive reward is provided in the step of granting a reward based on the state quantity on the condition that the basis weight of the conveyed object is equal to or higher than a predetermined value and the conveyance speed of the first conveyance means and the conveyance speed of the second conveyance means are the same.

a step of acquiring state quantities representing the amount of deflection or tension of a conveyed object in a plurality of sections of a conveying path of a conveying device, the conveying device conveying the conveyed object from upstream to downstream of the conveying path while sequentially holding the conveyed object with a plurality of conveying means;
a step of providing a reward based on the state quantity;
updating, based on the reward, an action value function that represents the value of a set of parameters of each drive means that drives each of the transport means for each set;
determining one of the plurality of sets based on the updated action value function, and instructing the driving means to drive the conveying means with the parameters of the determined set; and,
further comprising the step of performing machine learning for updating parameter values of each of the driving means based on the reward and the physical properties of the conveyed object,
The physical property is basis weight,
The plurality of conveyance means includes a first conveyance means and a second conveyance means that is the next conveyance means downstream of the first conveyance means,
A machine learning method, wherein, when the first conveyance means and the second conveyance means are conveying the conveyed object at the same time, a setting is made to allow a predetermined amount of deflection in the section between the first conveyance means and the second conveyance means among the plurality of sections, and the basis weight of the conveyed object is less than a predetermined value, a positive reward is provided in the step of providing a reward based on the state quantity when the state quantity in the section between the first conveyance means and the second conveyance means represents a deflection amount that is less than or equal to the predetermined deflection amount.
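The four physical-property claims above (two for stiffness, two for basis weight) share one structure, which can be sketched as follows. The function name `property_reward`, its parameters, the reward values, the use of `math.isclose` for "the same" speed, and the sign convention of the state quantity (positive means deflection) are assumptions for illustration only.

```python
import math

def property_reward(prop: float, threshold: float,
                    v_first: float, v_second: float,
                    state: float, max_deflection: float,
                    deflection_allowed: bool) -> float:
    """Reward while both conveyance means hold the conveyed object.

    prop: stiffness or basis weight of the conveyed object.
    threshold: the predetermined value for that property.
    Assumption: state > 0 is a deflection amount.
    """
    if prop >= threshold:
        # Stiff or heavy object: both means should run at the same speed.
        condition_met = math.isclose(v_first, v_second)
    else:
        # Flexible or light object: bounded deflection, if deflection is allowed.
        condition_met = deflection_allowed and 0.0 < state <= max_deflection
    return 1.0 if condition_met else -1.0
```

Passing the stiffness or the basis weight as `prop` covers either pair of claims without changing the logic.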

A program that causes a processor of a computer to execute each step according to any one of claims 40 to 57.