CN113515092A

CN113515092A - Job assisting apparatus, system, method, and computer-readable medium

Info

Publication number: CN113515092A
Application number: CN202010274178.0A
Authority: CN
Inventors: 小泽英之; 张史龙; 付磊; 本村昭浩
Original assignee: Mitsubishi Electric Automation China Co ltd
Current assignee: Mitsubishi Electric Automation China Co ltd
Priority date: 2020-04-09
Filing date: 2020-04-09
Publication date: 2021-10-19

Abstract

The invention provides a work assisting device, system, method, and computer-readable medium for assisting a production work performed by a worker using a work table, comprising: acquiring individual characteristic information including at least one of the operation proficiency, work ability, physical characteristics, and training situation of the worker; acquiring state information including at least work content and a work method; determining and executing a next work table instruction by a reinforcement learning policy algorithm based on a value function, according to the state information and the individual characteristic information; acquiring, from the work table after the next work table instruction is executed, judgment information including at least work quality, work cost, work time, and the actions of the worker; and comparing each item of judgment information with its threshold value, updating the value function according to the reward calculated from the comparison result, the individual characteristic information, and the state information after the next work table instruction is executed, and providing the updated value function.

Description

Job assisting apparatus, system, method, and computer-readable medium

Technical Field

The present invention relates to a work support device, a work support system, a work support method, and a computer readable medium, and more particularly, to a work support device that supports a production work performed by different workers on a work table, a work support system including the work support device, a work support method for the work support device, and a computer readable medium storing a program that causes the work support method to be performed.

Background

In the manufacturing industry, the following technology has been proposed to improve work efficiency and reduce training time: dedicated software is used to set up a work instruction sheet for the worker on the work table, the worker carries out the production work according to that instruction sheet, and the worker is guided by various means such as pictures, animations, sounds, and LED lamps, so that work efficiency is improved and the training time for new workers is shortened. In addition, the following technology has been proposed to prevent material-picking errors and improve product quality: a dedicated LED error-proofing module indicates which parts and tools to use, and automatic visual inspection and torque detection prevent incorrectly assembled parts or out-of-specification screw tightening torque, so that product quality is improved.

However, when a plurality of workers perform a production operation using a non-fully-automatic work table including a plurality of processing machines according to a predetermined project plan, it is necessary to give appropriate work table instructions to the plurality of workers.

In order to solve the above problems, the following solutions are proposed: next work candidates to be executed by an operator are extracted based on current states of a plurality of processing machines and a project plan defining an execution order of processes of the plurality of processing machines and a work by the operator, and a next work for a certain operator is selected from the next work candidates based on a work capacity for each work by each operator (see patent document 1).

Documents of the prior art

Patent document

Patent document 1: Japanese Patent Laid-Open Publication No. 2017-199150

Disclosure of Invention

Technical problem to be solved by the invention

However, each worker has unique individual characteristics; in other words, the variety of individual characteristics among workers is practically infinite. Therefore, a great deal of time and effort is generally needed to work out the work table instruction suited to the individual characteristics of each worker, and even if the scheme of patent document 1 is applied, it is difficult to provide the most suitable work table instruction for each worker.

In addition, optimizing the work content, the work method, the work equipment, and the like according to the individual characteristics of the worker and each kind of product takes a lot of time even for an experienced technician. Moreover, because technicians differ in technical capability, the work table instructions that different technicians judge to be appropriate for the same individual characteristics of a worker in the same work step may differ greatly.

In addition, when determining the work table instruction, it is important, from the viewpoint of the production cost of mass production on the production line, to derive work content and a work method that can suppress cost for given individual characteristics of the worker. However, there is the following problem: even for a skilled technician, it is difficult to derive a work table instruction that suppresses an increase in cost while maintaining high product quality for given individual characteristics of the worker.

The present invention has been made in view of the above problems, and it is an object of the present invention to provide a work support device, a work support system including the work support device, a work support method for the work support device, and a computer-readable medium storing a program for causing the work support method to be executed, which can optimize work content, work methods, work equipment, and the like, shorten work time, reduce work cost, and improve work quality by appropriately adjusting the work table instruction for each worker in a production line that manufactures a product through a plurality of works.

Technical scheme for solving technical problem

In order to solve the above problem, a work assisting apparatus according to a first aspect of the present invention assists a production work performed by a worker on a work table, the work assisting apparatus including: an individual characteristic information acquisition unit that acquires, from the work table, individual characteristic information including at least one of the operation proficiency, the work ability, the physical characteristics, and the training situation of the worker; a status information acquisition unit that acquires status information including at least work content and a work method from the work table; an instruction adjustment unit that determines a next work table instruction by a reinforcement learning policy algorithm based on a value function, using the status information from the status information acquisition unit and the individual characteristic information from the individual characteristic information acquisition unit, and causes the worker or the work table to execute the next work table instruction; a determination information acquisition unit that acquires determination information including at least work quality, work cost, work time, and the actions of the worker from the work table after the next work table instruction is executed; and a learning unit that compares each item of the determination information from the determination information acquisition unit with its threshold value, calculates a reward based on the result of the comparison, updates the value function based on the calculated reward, the individual characteristic information, and the status information acquired by the status information acquisition unit after the next work table instruction has been executed, and provides the updated value function to the instruction adjustment unit.

In order to solve the above problem, a work support system according to a second aspect of the present invention includes: the work assisting device according to the first aspect of the present invention; an information acquisition unit including at least an electric screwdriver and a camera provided on the work table, the electric screwdriver being configured to acquire information related to the work time, and the camera being configured to acquire information related to the work quality and skeleton information related to the movement of the worker; and a communication unit that transmits the information acquired by the information acquisition unit to the work assisting device by a network communication technology including at least a 4G technology or a 5G technology.

In order to solve the above problem, a work support method according to a third aspect of the present invention is a work support method for supporting a production work performed by a worker on a work table, the work support method including: an individual characteristic information acquisition step of acquiring, from the work table, individual characteristic information including at least one of the operation proficiency, the work ability, the physical characteristics, and the training situation of the worker; a status information acquisition step of acquiring status information including at least work content and a work method from the work table; a work table instruction adjustment step of determining a next work table instruction by a reinforcement learning policy algorithm based on a value function, using the status information acquired in the status information acquisition step and the individual characteristic information acquired in the individual characteristic information acquisition step, and causing the worker or the work table to execute the next work table instruction; a determination information acquisition step of acquiring determination information including at least work quality, work cost, work time, and the actions of the worker from the work table after the next work table instruction is executed; and a learning step of comparing each item of the determination information acquired in the determination information acquisition step with its threshold value, calculating a reward based on the result of the comparison, updating the value function based on the calculated reward, the individual characteristic information, and the status information acquired in the status information acquisition step after the next work table instruction has been executed, and using the updated value function in the work table instruction adjustment step.

In order to solve the above problem, a computer-readable medium according to a fourth aspect of the present invention stores a program for executing the work support method according to the third aspect of the present invention.

Effects of the invention

According to the work support device, the work support system including the work support device, the work support method used for the work support device, and the computer-readable medium storing the program for causing the computer to execute the work support method of the present invention, in a production line for manufacturing a product by a plurality of works, a work table instruction can be appropriately adjusted for each worker, so that work content, a work method, a work equipment, and the like can be optimized, work time can be shortened, work cost can be reduced, and work quality can be improved.

Drawings

Fig. 1 is a block diagram showing a configuration of a work support system according to embodiment 1 of the present invention.

Fig. 2 is a block diagram for explaining a configuration example of the communication unit.

Fig. 3 is a schematic diagram for explaining a mode in which the individual characteristic information acquisition unit acquires the individual characteristic information.

Fig. 4 is a schematic diagram for explaining a method of setting the weight by the weight setting unit.

Fig. 5 is a block diagram showing a specific configuration of the initial instruction information providing unit.

Fig. 6 is a schematic diagram for explaining an operation mode of the initial instruction information providing unit.

Fig. 7 is a schematic diagram for explaining the instruction information at the initial stage and after adjustment.

Fig. 8 is a diagram for explaining the operation rules/rules stored in the instruction information rule/rule providing unit.

Fig. 9 is an example of randomly selected job contents and job methods in the policy algorithm.

Fig. 10 is another example of randomly selected job contents and job methods in the policy algorithm.

Fig. 11 is a signal flow diagram for explaining the operation of the work assisting apparatus.

Fig. 12 is an overall flowchart for explaining the learning process of the work assisting apparatus.

Fig. 13 is a flowchart of the reward calculation process in fig. 12.

Fig. 14 is a schematic diagram for explaining the work support system according to embodiment 2 of the present invention.

Detailed Description

Embodiments of the present invention will be described below with reference to the drawings.

Embodiment 1.

Fig. 1 is a block diagram showing a configuration of a work support system according to embodiment 1. As shown in fig. 1, the work assisting system includes a work assisting apparatus 100, a work table 200, and a communication unit 300.

The work table 200 includes an information acquisition unit 10 and a work table instruction unit 11.

Examples of the information acquisition unit 10 include information collection devices such as a camera, a thermal imager, an electric screwdriver, various sensors of comparable resolution, an ID card reader, a keyboard, and a mouse. The information acquisition unit 10 can acquire, in real time, various information on the worker who performs work on the work table 200, on the work process, and on the product. For example, information on the work time may be acquired from the operating time of the electric screwdriver. Further, information on the work quality may be acquired from product images captured by the camera, and skeleton information related to the movement of the worker may also be acquired with the camera.

The work table instruction unit 11 may be, for example, a display for displaying a work instruction sheet or an instruction animation, a speaker for emitting a warning sound, an output device such as an LED for emitting an instruction light, or the like. The operator can perform work under the guidance of the work table instruction unit 11, thereby improving work efficiency and preventing errors. The work table instructing unit 11 may be an automatic actuator attached to the work table 200, and may cause the work table 200 to automatically perform work in accordance with the instruction.

The work table 200 is connected to a work support apparatus 100 described later via a communication unit 300. The communication unit 300 may be connected by a wired method such as a communication cable, or may be connected by a wireless method such as a network communication technology including a 4G technology or a 5G technology. A configuration example in which the communication unit 300 is realized by a network communication technique will be described below with reference to fig. 2.

Fig. 2 is a block diagram for explaining a configuration example of the communication unit 300. As shown in fig. 2, information collected on site by cameras of SD/HD/2K/4K/8K or similar resolution, or operation commands from an on-site PLC (Programmable Logic Controller), may be transmitted to a cloud or an external system via an edge computer and a router using known 4G or 5G communication technology, and the work assisting apparatus (not shown in fig. 2) may read the information from the cloud or the external system and use it. Conversely, the work assisting apparatus may transmit a work table instruction to the cloud or the external system; the communication unit 300 may then read the work table instruction from the cloud or the external system and, using 4G or 5G communication technology, control the shooting time or shooting target of the cameras or transmit the instruction to the PLC via the router and the edge computer.

Returning to fig. 1, the work assisting apparatus 100 includes an individual characteristic information acquiring unit 1, a state information acquiring unit 2, an instruction adjusting unit 3, a judgment information acquiring unit 4, and a learning unit 5, and assists the production work performed by the worker using the work table 200. Next, a specific configuration of the work assisting apparatus 100 will be described.

The individual characteristic information acquisition unit 1 acquires individual characteristic information from the information acquisition unit 10 of the work table 200 via the communication unit 300. The individual characteristic information here refers to information related to the individual characteristic of the operator, and includes, for example, the operation skill level, the work ability, the physical characteristics, the training situation, and the like of the operator. As to a specific manner in which the individual characteristic information acquisition section 1 acquires the individual characteristic information, description will be made below.

The status information acquiring unit 2 acquires status information from the information acquiring unit 10 of the work table 200 via the communication unit 300. Here, the status information is information related to the production work performed by the worker using the work table 200, and includes, for example, work content, a work method, and the like. Specific examples of the job content and the job method will be described below.

The instruction adjusting unit 3 determines the next work table instruction, and includes an instruction output unit 31, an instruction determining unit 32, and a value function storage unit 33. The value function storage unit 33 is connected to a value function update unit 52 of the learning unit 5 described later, and acquires the updated value function from the value function update unit 52. The instruction determining unit 32 is connected to the individual characteristic information acquiring unit 1 and the status information acquiring unit 2, and acquires individual characteristic information from the former and status information from the latter. The instruction determining unit 32 also acquires the value function stored in the value function storage unit 33, and determines the next work table instruction using a reinforcement learning policy algorithm. The instruction output unit 31 outputs the next work table instruction acquired from the instruction determining unit 32 to the work table instruction unit 11 of the work table 200 via the communication unit 300; either the worker carries out the next work table instruction under the guidance of the work table instruction unit 11, or the work table 200 performs the work directly according to the instruction from the work table instruction unit 11.
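As an illustration only, the selection performed by the instruction determining unit 32 could be sketched with an epsilon-greedy policy over a tabular value function. The text above does not name a specific policy algorithm, so epsilon-greedy, the function name `choose_instruction`, the state encoding, and the candidate instruction strings are all assumptions made for this sketch.

```python
import random
from collections import defaultdict

def choose_instruction(q, state, candidates, epsilon=0.1, rng=random):
    """Pick the next work table instruction with an epsilon-greedy policy.

    q          : mapping (state, instruction) -> stored value (hypothetical table)
    state      : hashable tuple of individual characteristics and status information
    candidates : list of candidate work table instructions
    """
    if rng.random() < epsilon:
        # Explore: try a randomly selected candidate instruction.
        return rng.choice(candidates)
    # Exploit: take the candidate with the highest stored value.
    return max(candidates, key=lambda a: q[(state, a)])

# Hypothetical state: (handedness, proficiency, work content).
q = defaultdict(float)
state = ("left-handed", "beginner", "screw tightening")
q[(state, "show animation")] = 0.8
q[(state, "show text sheet")] = 0.2

# With epsilon=0.0 the choice is purely greedy.
best = choose_instruction(q, state, ["show text sheet", "show animation"], epsilon=0.0)
```

A nonzero `epsilon` lets the device occasionally try instructions whose value has not yet been learned, at the price of sometimes issuing a less suitable instruction.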

The determination information acquiring unit 4 is connected to the information acquiring unit 10 via the communication unit 300, and acquires the determination information from the information acquiring unit 10 of the work table 200 after the work table 200 or the worker has executed the next work table instruction. Here, the determination information is information for judging the work result, and includes, for example, work quality, work cost, work time, and the actions of the worker.

The learning unit 5 learns the value function, and includes a reward calculation unit 51 and a value function update unit 52. The reward calculation unit 51 acquires the determination information from the determination information acquisition unit 4, compares each item of the acquired determination information with its threshold value, and calculates a reward based on the comparison result. The specific method of reward calculation will be explained below. The value function update unit 52 acquires the calculated reward from the reward calculation unit 51, acquires the individual characteristic information and the status information after the next work table instruction has been executed from the individual characteristic information acquisition unit 1 and the status information acquisition unit 2, respectively, and updates the value function based on the acquired information. The value function update unit 52 is also connected to the value function storage unit 33 of the instruction adjustment unit 3, and supplies the updated value function to the value function storage unit 33 for storage.

Any learning algorithm may be used by the value function update unit 52 to update the value function. As an example, the case where reinforcement learning is applied is described. In reinforcement learning, an agent in an environment observes the current state and determines the action to take. The agent obtains a reward from the environment by selecting an action, and learns the policy that obtains the most reward over a series of actions. Q-learning and TD-learning (temporal difference learning) are known as representative methods of reinforcement learning. For example, in the case of Q-learning, a general update formula (action value table) for the action value function Q(s, a) is represented by equation 1.

[Mathematical formula 1]

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right)$$

In mathematical formula 1, s_t denotes the state at time t, and a_t denotes the action at time t. The action a_t changes the state to s_{t+1}. r_{t+1} denotes the reward obtained by that change of state, γ denotes the discount rate, and α denotes the learning coefficient. Here, when Q-learning is applied, the next work table instruction determined by the instruction determination unit 32 corresponds to the action a_t.

In the update shown in equation 1, if the action value of the best action a at time t+1 is greater than the action value Q of the action a executed at time t, the action value Q at time t is increased; conversely, if it is smaller, the action value Q at time t is decreased. In other words, the action value function Q(s, a) is updated so that the action value Q of the action a_t at time t approaches the best action value at time t+1. In this way, the best action value in a given environment is propagated successively to the action values in the environments preceding it.
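The update described for equation 1 can be sketched as a single tabular Q-learning step. This is a minimal sketch, assuming the standard Q-learning form with learning coefficient α (`alpha`) and discount rate γ (`gamma`); the function name `q_update`, the dictionary-based table, and the state and instruction labels are hypothetical.

```python
from collections import defaultdict

def q_update(q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to q[(s, a)] and return the new value."""
    # Best action value available in the state reached at time t+1.
    best_next = max(q[(s_next, a2)] for a2 in actions)
    # Difference between the bootstrapped target and the current estimate.
    td_error = reward + gamma * best_next - q[(s, a)]
    # Move the current estimate toward the target by the learning coefficient.
    q[(s, a)] += alpha * td_error
    return q[(s, a)]

q = defaultdict(float)
actions = ["instr_A", "instr_B"]
q[("s1", "instr_B")] = 1.0  # a value already learned for the next state

new_value = q_update(q, "s0", "instr_A", reward=1.0, s_next="s1", actions=actions)
```

Because the best value in state `s1` is larger than the current estimate for (`s0`, `instr_A`), the estimate is raised, which is the propagation of value to the preceding state that the paragraph above describes.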

The manner in which the individual characteristic information acquiring unit 1 acquires the individual characteristic information will be described in detail below with reference to fig. 3.

Fig. 3 is a schematic diagram for explaining how the individual characteristic information acquiring unit 1 acquires individual characteristic information. As shown in fig. 3, the information acquisition unit 10 may identify the worker operating the work table 200 with an information input device such as a camera, for example by performing face recognition, and may also use the camera to update feature values such as the worker's height and arm length. Alternatively, the worker may be identified by using a card reader to read information such as a two-dimensional code on the worker's identification card. As shown in fig. 3, the individual characteristic information of the worker can be acquired, for example, in either of the following two ways: (1) individual characteristics such as operation proficiency, work ability, physical characteristics, and training situation are stored directly on the worker's identification card, and the information acquisition unit 10 on the work table 200 side reads them directly from the card; or (2) the information acquisition unit 10 identifies the worker to obtain a worker ID (the worker's identification number), and then uses the worker ID as a key to retrieve the individual characteristic information from, for example, the individual characteristic information storage unit 12 provided in the work table 200.
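The two acquisition paths can be sketched as follows. This is an illustration under assumptions: the `WorkerProfile` fields, the card layout (a dictionary with an optional `profile` entry and a `worker_id` entry), and the in-memory `PROFILE_STORE` standing in for the individual characteristic information storage unit 12 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class WorkerProfile:
    proficiency: str
    work_ability: str
    physical: str
    training: str

# Hypothetical stand-in for the individual characteristic information storage unit 12.
PROFILE_STORE = {
    "W001": WorkerProfile("beginner", "low", "left-handed, short", "basic course"),
}

def get_profile(card: dict) -> Optional[WorkerProfile]:
    """Return the profile stored on the card, else look it up by worker ID."""
    if "profile" in card:
        # Path 1: the characteristics are stored directly on the identification card.
        return card["profile"]
    # Path 2: the card (or face recognition) yields only a worker ID used as a key.
    return PROFILE_STORE.get(card.get("worker_id"))

on_card = WorkerProfile("intermediate", "high", "right-handed, medium", "advanced course")
from_card = get_profile({"worker_id": "W002", "profile": on_card})
from_store = get_profile({"worker_id": "W001"})
unknown = get_profile({"worker_id": "W999"})
```

Storing the profile on the card avoids a lookup, while the ID-keyed store keeps the characteristics in one place where they can be updated.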

With the above configuration, the individual characteristic information of the operator who performs the production work on the work table 200 can be acquired in real time, and the subsequent instruction adjustment unit 3 can determine the work table instruction suitable for the operator based on the individual characteristic of the operator, thereby improving the production efficiency.

As shown in fig. 1, the work assisting apparatus 100 may further include a weight setting unit 6. The weight setting unit 6 is connected to the reward calculation unit 51 of the learning unit 5, and the reward calculation unit 51 changes the amount of increase or decrease of the reward corresponding to each determination information based on the weight of each determination information set by the weight setting unit 6. The way in which the weight setting unit 6 sets the weight will be described in detail below with reference to fig. 4.

Fig. 4 is a schematic diagram for explaining a method of setting the weight in the weight setting unit 6. As shown in fig. 4, the weight setting unit 6 may acquire the determination information from the determination information acquiring unit 4 as weight determination data. The weight setting unit 6 may receive a user command as weight determination data via an input device, not shown. Upon receiving the weight determination data, the weight setting unit 6 sets a weight for each determination information based on the weight determination data.

As example 1 in which a weight is set for each piece of judgment information based on the judgment information, the following method can be adopted: the degree of improvement or deterioration of at least one of the work quality, the work cost, the work time, and the worker's actions is determined from the judgment information, and the corresponding weight is determined from that degree. Specifically, with respect to work quality, when the failure rate, the number of stoppages due to temporary failures, the number of equipment failures, and the like decrease, or the first-pass yield increases, the weight relating to work quality is increased according to the degree of the decrease or increase; the greater the decrease in the failure rate, the number of stoppages due to temporary failures, or the number of equipment failures, or the greater the increase in the first-pass yield, the larger the weight, that is, the amount by which the reward is increased or decreased. With respect to work cost, if the manufacturing cost of the product decreases, the weight relating to work cost is increased according to the degree of the decrease; the greater the decrease in manufacturing cost, the larger the weight. With respect to work time, if the work time decreases, the weight relating to work time is increased according to the degree of the decrease; the greater the decrease in work time, the larger the weight. With respect to the worker's actions, if superfluous actions by the worker decrease, the weight relating to the worker's actions is increased according to the degree of the decrease; the greater the decrease in superfluous actions, the larger the weight. The above describes the case where the weight increases with the degree of improvement in any one piece of judgment information. Conversely, when any one piece of judgment information deteriorates, the weight may likewise be increased as the degree of deterioration increases.
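A minimal sketch of example 1 follows, assuming that the "degree" of improvement or deterioration is measured as the relative change of a metric between two observations and that the weight grows linearly with it. The function name `weight_from_change` and the `base` and `gain` parameters are hypothetical; the text does not specify the functional form.

```python
def weight_from_change(previous, current, base=1.0, gain=1.0):
    """Return a weight that grows with the relative change of one metric.

    The same rule covers improvement and deterioration, since in both cases
    the weight is described as increasing with the degree of change.
    """
    if previous == 0:
        # No reference value to compare against: fall back to the base weight.
        return base
    degree = abs(current - previous) / abs(previous)
    return base + gain * degree

# Work time dropped from 100 s to 80 s: a 20 % improvement.
w_time = weight_from_change(100.0, 80.0)
# Manufacturing cost rose from 50 to 60: a 20 % deterioration, same weight.
w_cost = weight_from_change(50.0, 60.0)
```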

In addition, as example 2 in which a weight is set to each piece of determination information based on the determination information, for example, the following method can be adopted: the upper and lower thresholds are preset for each piece of judgment information, and when the judgment information is within the upper and lower thresholds, a certain fixed weight is set, and when one of the judgment information is out of the upper and lower thresholds, the corresponding weight is determined according to the deviation amount.
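Example 2 can be sketched as a band rule. This is an illustration only, assuming a linear dependence on the deviation outside the band; `band_weight`, `fixed`, and `gain` are hypothetical names.

```python
def band_weight(value, lower, upper, fixed=1.0, gain=1.0):
    """Fixed weight inside [lower, upper]; grows with the deviation outside it."""
    if lower <= value <= upper:
        return fixed
    # Distance by which the value falls below the lower or above the upper threshold.
    deviation = lower - value if value < lower else value - upper
    return fixed + gain * deviation

inside = band_weight(50, 40, 60)
above = band_weight(65, 40, 60)
below = band_weight(38, 40, 60)
```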

As example 3 in which a weight is set for each piece of judgment information based on the judgment information, the following method can be adopted: a threshold value is set in advance for each piece of judgment information, and the weight setting unit 6 determines the corresponding weight based on the amount of deviation between each piece of judgment information and its threshold value. Specifically, for example, a threshold value is set in advance for each piece of judgment information based on the state information, and the weight setting unit 6 sets a larger weight as the amount of deviation between the judgment information and the threshold value becomes larger. In addition, when the rate of change of the amount of deviation between the judgment information and the threshold value increases, the weight may be set based on that rate of change, with the weight setting unit 6 setting a larger weight as the rate of change of the amount of deviation becomes larger. Thus, when the amount of deviation grows at an accelerating rate, a larger weight can be set.
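Example 3 can be sketched in the same style. The sketch assumes that the rate of change is approximated by the difference between the current and the previous deviation and that both contributions are linear; `deviation_weight`, `k_dev`, and `k_rate` are hypothetical.

```python
def deviation_weight(value, threshold, prev_deviation=None, k_dev=1.0, k_rate=1.0):
    """Return (weight, deviation) for one piece of judgment information.

    The weight grows with the deviation from the threshold and, when the
    deviation is increasing, additionally with how fast it is increasing.
    """
    deviation = abs(value - threshold)
    weight = k_dev * deviation
    if prev_deviation is not None:
        rate = deviation - prev_deviation
        if rate > 0:
            # The deviation is growing: add a term proportional to its growth.
            weight += k_rate * rate
    return weight, deviation

w1, d1 = deviation_weight(12, 10)                     # first observation
w2, d2 = deviation_weight(15, 10, prev_deviation=d1)  # deviation grew from 2 to 5
```

Returning the deviation alongside the weight lets the caller pass it back in as `prev_deviation` on the next observation.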

In addition, the weight setting methods of examples 2 and 3 above may be combined, with the threshold value set near the lower limit within the range between the upper and lower limits. Specifically, because the work time varies greatly from process to process, a work time that sits near the upper limit of the range may exceed the upper limit. To avoid this risk, it is preferable to set the threshold value at a position above the lower limit but close to it. This stabilizes the work time near the lower limit of the range and reduces the probability that the work time exceeds the upper limit. In this way, two or more of examples 1 to 3 above can be used in combination.

As an example of setting the weight for each piece of judgment information based on a user instruction, as shown in fig. 4, the user may determine the weight to be given to each piece of judgment information according to, for example, the supply and demand situation of the product to be produced, and input the weight to the weight setting unit 6 through an input device (not shown). For example, when the user determines from the supply and demand situation that only one or a few items of the judgment information need to be compared with their threshold values, the weight of the judgment information that does not need to be compared may be set to 0. Accordingly, the user can focus optimization on one or more aspects of the work process as needed, which further reduces the workload of the work assisting device and improves the efficiency of work assistance.

Then, the weight setting unit 6 sets a weight for each piece of judgment information, and stores each weight in the weight storage unit 61 to be used by the learning unit 5 in calculating the reward.

Further, as shown in fig. 1, the work assisting apparatus 100 may further include an initial instruction information providing unit 7. The initial instruction information providing unit 7 is connected to the instruction determining unit 32 of the instruction adjusting unit 3, and provides the instruction adjusting unit 3 with an initial work table instruction, that is, the work table instruction that the operator is instructed to carry out in the initial stage of learning. Next, a specific configuration and operation of the initial instruction information providing unit 7 will be described in detail with reference to fig. 5 to 7.

Fig. 5 is a block diagram showing a specific configuration of the initial instruction information providing unit 7. As shown in fig. 5, the initial instruction information providing unit 7 includes an initial work information providing unit 71, an initial equipment/tool information providing unit 72, and an initial education information providing unit 73. The initial work information providing unit 71 provides information relating to work instructions. The initial equipment/tool information providing unit 72 provides information on the working equipment/tools. The initial education information providing unit 73 provides information relating to education of the worker.

Fig. 6 is a schematic diagram for explaining an operation mode of the initial instruction information providing unit 7, and fig. 7 is a schematic diagram for explaining the instruction information at the initial stage and after adjustment. As shown in fig. 6, three workers, namely worker A, worker B, and worker C, perform work on the work table 200. Worker A has the physical characteristics "right-handed, tall", the proficiency "skilled", and the work ability "high". Worker B has the physical characteristics "left-handed, short", the proficiency "beginner", and the work ability "low". Worker C has the physical characteristics "right-handed, medium height", the proficiency "intermediate", and the work ability "high".

First, as shown by ① in fig. 6, after learning starts, the initial work information providing unit 71, the initial equipment/tool information providing unit 72, and the initial education information providing unit 73 of the initial instruction information providing unit 7 provide the initial work instruction content, the initial equipment/tool creation/setting instruction content, and the initial education instruction content, respectively, to the instruction adjusting unit 3. The initial instruction information may be determined, for example, by a production technician acting as the user, who inputs it to the initial instruction information providing unit 7 through an input device not shown, or by referring to the work history to date stored in advance in the initial instruction information providing unit 7.

Next, as shown by ② in fig. 6, workers A to C perform the corresponding operations based on the initial instruction information, which is transmitted from the instruction adjustment unit 3 to the work table 200 via a communication unit not shown and presented by the work table instruction unit 11. Fig. 7 shows an example of the specific content of the instruction information. The initial work information, which is an example of the initial work instruction content, is provided by the initial work information providing unit 71 and includes ten work steps, work 1 to work 10. A step marked with "o" is, for example, a step in which the operator is instructed to press the "completion confirmation button" to confirm the work; a step not marked with "o" is a step in which the operator does not need to press the "completion confirmation button"; and a step marked "standard" is a step of work method A, the standard work method in the examples described later with reference to fig. 9 and 10. The initial equipment/tool information, which is an example of the initial equipment/tool creation/setting instruction content, is provided by the initial equipment/tool information providing unit 72 and includes settings for the height, depth, and width of the different equipment/tools; an item marked "standard" takes the standard setting value for that item. The initial education information, which is an example of the initial education instruction content, is provided by the initial education information providing unit 73 and includes education settings for the ten work steps, work 1 to work 10; a step marked "unfinished" is a step for which the worker has not yet been educated.

Then, as shown by ③ in fig. 6, after the instruction information has been executed, the work table 200 transmits the actual work results and status, the actual education results and status, and the actual equipment/tool creation/setting results and status to the learning unit 5, as judgment information, individual characteristic information, and state information, via the information acquisition unit 10 and a communication unit not shown. The learning unit 5 compares each piece of judgment information with its threshold value, calculates a reward based on the comparison result, updates the value function based on the calculated reward, the individual characteristic information, and the state information, and provides the updated value function and the reward to the instruction adjustment unit 3.

Thereafter, as shown by ④ in fig. 6, the instruction adjustment unit 3 determines the adjusted instruction information (in the figure, the adjusted work instruction content, the adjusted education instruction content, and the adjusted equipment/tool creation/setting instruction content) by means of a reinforcement learning policy algorithm, based on the updated value function, which is the learning result, and on the state information and the individual characteristic information. In doing so, it refers to the work, education, and equipment/tool principles/rules from the instruction information principle/rule providing unit 8 described later. It then replaces the initial instruction information with the adjusted instruction information.

Thereafter, as shown by ⑤ in fig. 6, the workers continue the production activity based on the adjusted instruction information. As shown in fig. 7, because the individual characteristics of the workers differ, different adjusted work information is provided to each of workers A to C; the steps marked "option B" and "option C" are steps of work method B and work method C, respectively, that is, the work methods of option B and option C in the examples shown in fig. 9 and 10 described later. Workers A to C are also provided with different adjusted equipment/tool information, in which items marked "high" and "low" indicate that the height or depth of the equipment/tool is set higher or lower, respectively, than the standard setting value. Workers A to C are likewise provided with different adjusted education information: a step marked "unfinished/not required" is a step for which the worker is skilled and needs no education, a step marked "unfinished/required" is a step for which education is required because the worker is a beginner but has not yet been given, and a step marked "finished" is a step for which an intermediate worker has completed education.

As shown in fig. 6, the steps described above are performed repeatedly, forming a learning cycle. After a number of cycles, workers with different individual characteristics can be given different work table instructions.

Further, as shown in fig. 1, the work assisting apparatus 100 may further include an instruction information principle/rule providing unit 8. The instruction information principle/rule providing unit 8 is connected to the instruction determining unit 32 of the instruction adjusting unit 3, and stores the work principles/rules that apply when the operator performs production work using the work table 200. Next, an example of the work principles/rules stored in the instruction information principle/rule providing unit 8 and of the policy algorithm of the instruction adjusting unit 3 will be described in detail with reference to fig. 8 to 10.

Fig. 8 is a diagram for explaining the work principles/rules stored in the instruction information principle/rule providing unit 8, fig. 9 shows an example of the work contents and work methods randomly selected in the policy algorithm, and fig. 10 shows another such example. As shown in fig. 8, the instruction information principle/rule providing unit 8 includes a work information principle/rule providing unit 81, an equipment/tool information principle/rule providing unit 82, and an education information principle/rule providing unit 83, which store a work principle/rule table, an equipment/tool principle/rule table, and an education principle/rule table, respectively. In these tables, different principles/rules such as "a basic number of operations to be performed simultaneously", "a distance to be performed to be shortened", "an operation to be performed easily", and "optimization according to physical characteristics" are defined for items such as the "principles/rules regarding work" and the "principles/rules regarding arrangement of equipment/tools". In addition, different principles/rules such as "differentiation according to work procedure", "differentiation according to work proficiency", and "physical characteristics" are defined for the "principles/rules regarding education". As the policy algorithm, the instruction determining unit 32 may acquire the principle/rule tables shown in fig. 8 from the work information principle/rule providing unit 81, the equipment/tool information principle/rule providing unit 82, and the education information principle/rule providing unit 83, associate randomly selected instruction contents with specific principles/rules in those tables based on the state information, and determine the next work table instruction by optimizing the work contents, work sites, work equipment and tools, and worker education based on the individual characteristic information.

As an example of the policy algorithm, as shown in fig. 9, consider a boxing work. Based on the work content and the work method in the state information, randomly selected specific work methods are associated with the principles/rules in the work principle/rule table: for example, standard work method A, "all columns are packed from left to right"; work method B (option B), "the packing direction is changed every column"; and work method C (option C), "all columns are packed from right to left". The work method is then changed based on the individual characteristic information of the worker, such as whether the worker is left-handed or right-handed, so that the working time, the work quality, and the like are optimized, and the next work table instruction is determined. This enables the order of the boxing work to be specified automatically according to the individual characteristics of the operator.

As another example of the policy algorithm, as shown in fig. 10, consider a screwing work. Based on the work content and the work method in the state information, randomly selected specific work methods, such as the screw tightening order of standard work method A, that of option B, and that of option C (each as shown in fig. 10), are associated with the principles/rules in the work principle/rule table. The work method is then changed based on the individual characteristic information of the operator, such as whether the operator is left-handed or right-handed, so as to optimize the working time, the work quality, and the like, and the most suitable next work table instruction is selected from the total of 6! = 720 possible orders. Thus, once the screw fixing positions are determined, the order of the screwing work can be specified automatically according to the individual characteristics of the operator.
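The figure of 720 candidate orders follows from counting the permutations of six screw positions. A minimal sketch that enumerates them is given below; the numbering of the positions from 1 to 6 and the names `SCREWS` and `candidate_orders` are illustrative assumptions, not part of the description.

```python
from itertools import permutations
from math import factorial

# Six screw positions, numbered 1 to 6 purely for illustration (assumption).
SCREWS = (1, 2, 3, 4, 5, 6)


def candidate_orders(screws=SCREWS):
    """Return every possible tightening order as a tuple of positions."""
    return list(permutations(screws))


orders = candidate_orders()
# The number of candidate orders equals 6! = 720.
print(len(orders), factorial(len(SCREWS)))
```

Each tuple in `orders` is one candidate work method from which the policy algorithm could pick at random before evaluating it.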

The above description has been made on the change of the steps, the change of the operation method, and the like in the same step, with respect to the type of the work table instruction, but the present invention is not limited to this. For example, the work bench instruction may be changed by modifying the content of the work instruction manual, re-educating the worker as shown in fig. 7, creating a dedicated device or tool, or the like. Alternatively, the various table instructions described above may be combined.

As shown in fig. 1, the work assisting apparatus 100 may further include a cycle control unit 9. The cycle control unit 9 controls the cyclic operation of the individual characteristic information acquisition unit 1, the state information acquisition unit 2, the instruction adjustment unit 3, the judgment information acquisition unit 4, and the learning unit 5, and includes a step management unit 91 and a round management unit 92. The specific modes of step management and round management are described together with the learning process below.

Next, the operation principle of the work assisting apparatus 100 will be described with reference to fig. 11 to 13. Fig. 11 is a signal flow diagram for explaining the operation of the work assisting apparatus 100, fig. 12 is an overall flow diagram for explaining the learning process of the work assisting apparatus 100, and fig. 13 is a flow diagram of the reward calculation process in fig. 12.

As shown in fig. 11, the work table 200 corresponds to the "environment" in reinforcement learning; the individual characteristic information acquisition unit 1, the state information acquisition unit 2, the instruction adjustment unit 3, the judgment information acquisition unit 4, and the learning unit 5 constitute the "agent" in reinforcement learning. The parameters used for learning are set by the weight setting unit 6, the initial instruction information providing unit 7, and the instruction information principle/rule providing unit 8.

As shown in fig. 11 and 12, after learning of the work assistance is started, the instruction adjustment unit 3 first acquires an initial work table instruction from the initial instruction information providing unit 7. The method for determining the initial work table instruction is as described above. Next, in the cycle control unit 9 (not shown in these figures), the round management unit 92 issues a round start instruction and the step management unit 91 issues a step start instruction.

Next, the instruction adjustment unit 3 issues the initial work table instruction to the operator at the work table 200 or to the work table 200 itself. After the initial work table instruction has been executed at the work table 200, the individual characteristic information acquisition unit 1 acquires the individual characteristic information of the worker from the work table 200, the state information acquisition unit 2 acquires the state information of the next state from the work table 200, and the reward calculation unit 51 of the learning unit 5 calculates the reward based on the judgment information that the judgment information acquisition unit 4 acquires from the work table 200 after the initial work table instruction is executed. The specific flow of the reward calculation is described below. Next, the value function update unit 52 stores the reward calculated by the reward calculation unit 51 as experience in a memory not shown, updates the value function based on the calculated reward, the individual characteristic information from the individual characteristic information acquisition unit 1, and the state information of the next state from the state information acquisition unit 2, and transmits the updated value function to the value function storage unit 33 in the instruction adjustment unit 3.
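The description does not state the concrete rule by which the value function is updated. As one possible illustration only, the sketch below assumes a tabular Q-learning update in which the table is keyed by the pair (individual characteristic information, state information); the learning rate `ALPHA`, the discount factor `GAMMA`, and all names are assumptions rather than part of the source.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)

# Q[(features, state)][instruction] -> estimated value of that instruction.
Q = defaultdict(lambda: defaultdict(float))


def update_value(features, state, instruction, reward, next_state):
    """Apply one assumed Q-learning update and return the new estimate."""
    key, next_key = (features, state), (features, next_state)
    # Best value currently known for the next state (0.0 if nothing is known).
    best_next = max(Q[next_key].values(), default=0.0)
    td_target = reward + GAMMA * best_next
    Q[key][instruction] += ALPHA * (td_target - Q[key][instruction])
    return Q[key][instruction]
```

Keying the table on the individual characteristics as well as the state is what lets the learned values differ between, say, a left-handed beginner and a right-handed skilled worker in the same work state.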

Then, the step management unit 91 determines whether the round end condition is satisfied. The round end condition is satisfied either when the comparison by the reward calculation unit 51 of the learning unit 5 shows that any piece of judgment information deviates from its preset condition value, or when the number of steps of the cyclic operation during learning exceeds a predetermined upper limit. Judgment information deviates from its preset condition value when, for example, the work quality is worse than the preset work quality condition value, the work cost is higher than the preset work cost condition value, the working time is longer than the preset working time condition value, or the superfluous movements of the operator exceed the preset operator movement condition value.

When the step management unit 91 determines that the round end condition is not satisfied ("round end condition satisfied?": no), the flow returns to the step start stage. The instruction adjustment unit 3 then determines the next work table instruction by the reinforcement learning policy algorithm, based on the latest action value function stored in the value function storage unit 33, the individual characteristic information from the individual characteristic information acquisition unit 1, and the current state information from the state information acquisition unit 2, and issues that instruction to the operator or the work table 200, whereby the learning step described above is repeated.

On the other hand, when the step management unit 91 determines that the round end condition is satisfied ("round end condition satisfied?": yes), the round management unit 92 determines whether the learning end condition is satisfied. Here, the learning end condition is satisfied when the number of rounds of the cyclic operation during learning exceeds a predetermined upper limit.

If the round management unit 92 determines that the learning end condition is not satisfied ("learning end condition satisfied?": no), the flow returns to the round start stage and learning of the next round begins. In this case, the learning steps described above are repeated.

On the other hand, when the round management unit 92 determines that the learning end condition is satisfied ("learning end condition satisfied?": yes), the learning ends.
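The nesting of steps inside rounds described above can be sketched as two loops. This is a simplified illustration: `execute_step` is a hypothetical callback standing in for one full learning step (issue instruction, observe, calculate reward, update the value function), and the limits `MAX_STEPS` and `MAX_ROUNDS` are assumed values. A round here stops once the step count reaches the limit, which is one reading of "exceeds the upper limit".

```python
MAX_STEPS = 5   # upper limit on steps per round (assumed value)
MAX_ROUNDS = 3  # upper limit on rounds (assumed value)


def run_learning(execute_step, max_rounds=MAX_ROUNDS, max_steps=MAX_STEPS):
    """Run rounds of learning steps and return the step count of each round.

    execute_step() is a hypothetical callback that performs one learning
    step and returns True if any judgment information deviated from its
    preset condition value.
    """
    history = []
    for _ in range(max_rounds):            # round management (unit 92)
        steps = 0
        while True:                        # step management (unit 91)
            steps += 1
            deviated = execute_step()
            # Round end condition: deviation, or step limit reached.
            if deviated or steps >= max_steps:
                break
        history.append(steps)
    # Learning end condition: the round limit has been reached.
    return history
```

For example, `run_learning(lambda: False)` runs every round to the step limit, while `run_learning(lambda: True)` ends each round after a single step.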

Next, a specific method by which the instruction adjustment unit 3 determines the next work table instruction using the reinforcement learning policy algorithm will be described, taking the ε-greedy policy as an example. In the initial stage of learning, because experience is lacking, it may not be possible to judge immediately which next work table instruction has the greatest value. In this case, using the ε-greedy policy with an exploration rate of, for example, ε = 0.1, an action is selected at random with probability ε, and otherwise the action with the greatest value is selected greedily. ε is then decreased gradually so that the algorithm eventually converges. For example, in the initial stage of learning, after the various action conditions have been given initial values by the initial instruction information providing unit 7, the cycle may be started by changing at least one of the work content and the work method within a range suited to the individual characteristics of the operator, and the combination of work content, work method, and action that corresponds to a given individual characteristic and makes the judgment information optimal within a predetermined range may be learned. Besides the ε-greedy policy, other action policy algorithms for reinforcement learning, such as a genetic algorithm, may be used to advance learning by selecting an action at random with a certain probability.
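A minimal sketch of ε-greedy selection with a gradually decreasing ε is shown below. The multiplicative decay schedule and the names `epsilon_greedy` and `decay` are assumptions for illustration; the description only says that ε starts at, for example, 0.1 and is decreased gradually.

```python
import random


def epsilon_greedy(values, epsilon, rng=random):
    """Pick an instruction from `values` (instruction -> estimated value).

    With probability epsilon a random instruction is explored; otherwise
    the instruction with the greatest estimated value is exploited.
    """
    if rng.random() < epsilon:
        return rng.choice(list(values))
    return max(values, key=values.get)


def decay(epsilon, rate=0.99, floor=0.0):
    """Shrink epsilon multiplicatively (assumed schedule), never below floor."""
    return max(floor, epsilon * rate)
```

A typical use would call `epsilon_greedy(values, eps)` once per step and `eps = decay(eps)` once per step or per round, so that exploration dominates early and the greedy choice dominates later.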

The method of reward calculation in the learning process of fig. 12 is described below. As shown in fig. 13, when the reward calculation starts, first, in step S101, the reward calculation unit 51 of the learning unit 5 acquires the judgment information from the judgment information acquisition unit 4, and the process proceeds to step S102. In step S102, the reward calculation unit 51 compares the judgment information on the work quality with a preset work quality threshold. In the present embodiment, the pass rate is used as an example of the judgment information on the work quality. When the pass rate is lower than the work quality threshold (step S102: low), the process proceeds to step S103, where the reward calculation unit 51 decreases the reward by the weight set by the weight setting unit 6 in the weight setting manner described above, for example by adding -10, and then proceeds to step S106. When the pass rate is equal to the work quality threshold (step S102: same), the process proceeds to step S104, where the reward calculation unit 51 leaves the reward unchanged, and then proceeds to step S106. When the pass rate is higher than the work quality threshold (step S102: high), the process proceeds to step S105, where the reward calculation unit 51 increases the reward by the weight set by the weight setting unit 6 in the weight setting manner described above, for example by adding +10, and then proceeds to step S106.

In step S106, the reward calculation unit 51 compares the judgment information on the work cost with a preset work cost threshold. When the work cost is higher than the work cost threshold (step S106: high), the process proceeds to step S107, where the reward calculation unit 51 decreases the reward by the weight set by the weight setting unit 6 in the weight setting manner described above, for example by adding -3, and then proceeds to step S110. When the work cost is equal to the work cost threshold (step S106: same), the process proceeds to step S108, where the reward calculation unit 51 leaves the reward unchanged, and then proceeds to step S110. When the work cost is lower than the work cost threshold (step S106: low), the process proceeds to step S109, where the reward calculation unit 51 increases the reward by the weight set by the weight setting unit 6 in the weight setting manner described above, for example by adding +3, and then proceeds to step S110.

In step S110, the reward calculation unit 51 compares the judgment information on the working time with a preset working time threshold. When the working time is longer than the working time threshold (step S110: long), the process proceeds to step S111, where the reward calculation unit 51 decreases the reward by the weight set by the weight setting unit 6 in the weight setting manner described above, for example by adding -5, and then proceeds to step S114. When the working time is equal to the working time threshold (step S110: same), the process proceeds to step S112, where the reward calculation unit 51 leaves the reward unchanged, and then proceeds to step S114. When the working time is shorter than the working time threshold (step S110: short), the process proceeds to step S113, where the reward calculation unit 51 increases the reward by the weight set by the weight setting unit 6 in the weight setting manner described above, for example by adding +5, and then proceeds to step S114.

In step S114, the reward calculation unit 51 compares the judgment information on the operator's movements with a preset operator movement threshold. When the operator's movements include more superfluous movements than the operator movement threshold (step S114: superfluous movements increased), the process proceeds to step S115, where the reward calculation unit 51 decreases the reward by the weight set by the weight setting unit 6 in the weight setting manner described above, for example by adding -1, then ends the reward calculation process and sends the calculation result to the value function update unit 52. When the operator's movements are unchanged relative to the operator movement threshold (step S114: movements unchanged), the process proceeds to step S116, where the reward calculation unit 51 leaves the reward unchanged, then ends the reward calculation process and sends the calculation result to the value function update unit 52. When the operator's movements include fewer superfluous movements than the operator movement threshold (step S114: superfluous movements reduced), the process proceeds to step S117, where the reward calculation unit 51 increases the reward by the weight set by the weight setting unit 6 in the weight setting manner described above, for example by adding +1, then ends the reward calculation process and sends the calculation result to the value function update unit 52.

Although not shown in fig. 12 and 13, the weight setting step may be performed before the reward calculation is started. In this weight setting step, the weight setting unit 6 sets a weight for each piece of determination information in the various manners described above, and transmits the set weight to the reward calculation unit 51. Thus, in the reward calculation process, the reward calculation unit 51 can change the amount of increase or decrease of the reward corresponding to each determination information, based on the weight of each determination information set.
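The flow of steps S101 to S117 with per-item weights can be condensed into a short sketch. The example weights 10, 3, 5, and 1 are the ones quoted in the description; the dictionary keys, the function names, and the starting reward of 0 are assumptions made for illustration.

```python
# Example weights from the description: quality 10, cost 3, time 5, motion 1.
DEFAULT_WEIGHTS = {"quality": 10, "cost": 3, "time": 5, "motion": 1}


def compare(value, threshold):
    """Return +1 if value is above the threshold, -1 if below, 0 if equal."""
    return (value > threshold) - (value < threshold)


def calculate_reward(judgment, thresholds, weights=DEFAULT_WEIGHTS):
    """Sum the weighted comparisons of steps S102 to S117.

    The reward starts from 0 here, which is an assumption.
    """
    reward = 0
    # S102-S105: a higher pass rate is better.
    reward += weights["quality"] * compare(judgment["quality"], thresholds["quality"])
    # S106-S109: a lower work cost is better.
    reward -= weights["cost"] * compare(judgment["cost"], thresholds["cost"])
    # S110-S113: a shorter working time is better.
    reward -= weights["time"] * compare(judgment["time"], thresholds["time"])
    # S114-S117: fewer superfluous movements are better.
    reward -= weights["motion"] * compare(judgment["motion"], thresholds["motion"])
    return reward
```

Setting a weight to 0 removes that item from the reward, which corresponds to the user-specified case in which some judgment information need not be compared with its threshold.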

Embodiment 1 of the present invention has been described above. According to the work support device, the work support system provided with the work support device, and the work support method used for the work support device of the present embodiment, individual characteristic information including at least one of the skill level, the work ability, the physical characteristics, and the training condition of an operator is acquired, and the next work table instruction is determined by a reinforcement learning policy algorithm based on the state information and the individual characteristic information. After the next work table instruction is executed, judgment information including at least the work quality, the work cost, the working time, and the operator's movements is acquired, each piece of judgment information is compared with its threshold value, a reward is calculated based on the comparison result, and the value function is updated based on the calculated reward, the individual characteristic information, and the state information after the next work table instruction is executed. As a result, the content of the work table instructions can be optimized according to the individual characteristics of the operator, and work table instructions suited to a given individual characteristic can be determined without relying on the experience of technicians, who differ in technical level, experience, and the like.

Further, a weight is set for each piece of judgment information, and the amount by which the reward is increased or decreased for each piece of judgment information is changed based on that weight. The reward can therefore be adjusted appropriately to reflect the degree to which the work quality, the work cost, the working time, and the operator's movements each influence the actual work process, or the different requirements the user places on them. Consequently, the work content, the work method, the work equipment, and the like can be further optimized, the working time can be shortened, the work cost can be reduced, and the work quality can be improved.

Embodiment 2.

Fig. 14 is a schematic diagram for explaining the work support system according to embodiment 2. The work support system of embodiment 2 differs from that of embodiment 1 in that it includes a plurality of work tables, and the instruction adjustment unit 3' of the work assisting apparatus 100' transmits different work table instructions to the respective work tables. Although not shown in the drawings, the other configurations of the work assisting apparatus 100', the configuration of the communication unit 300, and the configurations of the work tables are the same as those of embodiment 1, and a description thereof is omitted here.

As shown in fig. 14, the work support system according to embodiment 2 includes, for example, 3 work tables a to C, and the instruction adjustment unit 3' transmits, for example, a work procedure #1 of a certain process to the work table a, transmits another work procedure #2 of the same process to the work table B, and transmits another work procedure #3 of the same process to the work table C, so that the work tables a to C learn different work methods of the same process. This can further shorten the time required for learning and improve the learning efficiency.

In embodiment 2, an example of 3 work tables is given, but the present invention is not limited to this, and a plurality of work tables of 2 or 4 or more may be used for learning.

In addition, although the case where the work assisting apparatus according to each embodiment of the present invention is realized by hardware has been described above, the present invention is not limited to this. The work support method of the present invention may also be implemented by software, or by a combination of software and hardware. Further, the program for executing the work support method of the present invention may be stored in various computer-readable media, loaded into a CPU or the like, and executed when necessary. The computer-readable medium is not particularly limited, and examples include HDDs; optical disks such as CD-ROM, CD-R, MO, MD, and DVD; IC cards; flexible disks; and semiconductor memories such as mask ROM, EPROM, EEPROM, and flash ROM.

It should be noted that all aspects of the embodiments disclosed herein are merely exemplary and not restrictive. The scope of the present invention is indicated by the appended claims, rather than the foregoing embodiments, and all changes and modifications that come within the meaning and range of equivalency of the claims are intended to be embraced therein.

Industrial Applicability

The work support device, system, method, and computer-readable medium according to the present invention are useful for supporting manual work on a non-fully automatic work table in various industries.

Description of the reference symbols

1 individual characteristic information acquisition unit

2 status information acquisition unit

3 instruction adjustment unit

3' indication adjusting part

4 judging information acquiring part

5 learning part

6 weight setting unit

7 initial instruction information providing part

8 instruction information rule/rule providing section

9 cycle control part

10 information acquisition unit

11 work table indication part

12 individual characteristic information storage unit

31 instruction output unit

32 instruction determination unit

33 value function storage unit

51 reward calculating part

52 value function update unit

61 weight storage unit

91 step management part

92 round management part

100 work assisting device

100' work assisting device

200 work table

300 communication unit

Claims

1. A work assisting device that assists a production work performed by a worker using a work table, comprising:

an individual characteristic information acquisition unit that acquires, from the work table, individual characteristic information including at least one of an operating skill level, a work ability, physical characteristics, and a training status of the worker;

a status information acquisition unit that acquires status information including at least work content and a work method from the work table;

an instruction adjustment unit that determines a next work table instruction by a value-function-based reinforcement learning policy algorithm, according to the status information from the status information acquisition unit and the individual characteristic information from the individual characteristic information acquisition unit, and executes the next work table instruction on the worker or the work table;

a determination information acquisition unit that acquires determination information including at least work quality, work cost, work time, and the motion of the worker from the work table after the next work table instruction is executed; and

a learning unit that compares each item of the determination information from the determination information acquisition unit with its respective threshold value, calculates a reward based on a result of the comparison, updates the value function based on the calculated reward, the individual characteristic information, and the status information acquired by the status information acquisition unit after the next work table instruction is executed, and provides the updated value function to the instruction adjustment unit.

2. The work assisting device according to claim 1,

the individual characteristic information is stored in an identification card of the worker using the work table, or is retrieved from an individual characteristic information storage unit using an identification number of that worker as a search key.

3. The work assisting device according to claim 1,

further comprising a weight setting unit for setting a weight for each of the determination information,

the learning unit changes an amount of increase or decrease of the reward corresponding to each of the determination information based on the weight of each of the determination information set by the weight setting unit.

4. The work assisting device according to claim 3,

the weight setting unit sets the weight to the determination information based on the determination information from the determination information acquiring unit.

5. The work assisting device according to claim 3,

the weight setting unit sets the weight to the determination information based on a user instruction.

6. The work assisting device according to any one of claims 1 to 5,

further comprising an initial instruction information providing unit that determines an initial work table instruction based on an input by a user or a work history stored in the initial instruction information providing unit, and provides the initial work table instruction to the instruction adjustment unit,

the instruction adjustment unit executes the initial work table instruction on the worker or the work table,

the determination information acquisition unit acquires the determination information from the work table after the initial work table instruction is executed,

the learning unit compares each item of the determination information from the determination information acquisition unit with its respective threshold value, calculates a reward based on a result of the comparison, updates the value function based on the calculated reward, the individual characteristic information, and the status information acquired by the status information acquisition unit after the initial work table instruction is executed, and provides the updated value function to the instruction adjustment unit.

7. The work assisting device according to any one of claims 1 to 5,

further comprising an instruction information rule/rule providing unit that stores a work rule/rule applicable when the worker performs the production work using the work table,

the instruction adjustment unit acquires the work rule/rule from the instruction information rule/rule providing unit and, as the policy algorithm, associates the work rule/rule with work content and a work method selected at random according to the status information, and optimizes the work content, the work method, the work place, the work equipment and tools, and the worker education according to the individual characteristic information, thereby determining the next work table instruction.

8. The work assisting device according to any one of claims 1 to 5,

further comprising a cycle control unit that controls the cyclic operation of the status information acquisition unit, the individual characteristic information acquisition unit, the instruction adjustment unit, the determination information acquisition unit, and the learning unit, and that includes a step management unit and a round management unit,

the step management unit ends the cycle of one round and advances to the next round when the comparison result of the learning unit indicates that any item of the determination information deviates from a predetermined condition value, or when the number of steps of the cyclic operation exceeds a predetermined upper limit of the number of steps,

the round management unit ends learning when the number of rounds of the cyclic operation exceeds a predetermined upper limit.

9. The work assisting device according to any one of claims 1 to 5,

the type of the work table instruction includes any one of, or a combination of two or more of, changing steps within the same process, revising a work instruction manual, re-educating the worker, and creating and using a dedicated device or tool.

10. A work assistance system, comprising:

the work assisting device according to any one of claims 1 to 9;

an information acquisition unit including at least an electric screwdriver and a camera provided on the work table, the electric screwdriver being configured to acquire information related to the work time, and the camera being configured to acquire information related to the work quality and skeleton information related to the motion of the worker; and

a communication unit that transmits the information acquired by the information acquisition unit to the work assistance apparatus by a network communication technology including at least a 4G technology or a 5G technology.

11. The work assistance system according to claim 10,

comprising a plurality of the work tables,

the plurality of work tables learn different work methods for the same process.

12. A work assisting method for assisting a production work performed by a worker using a work table, comprising:

an individual characteristic information acquisition step of acquiring, from the work table, individual characteristic information including at least one of an operating skill level, a work ability, physical characteristics, and a training status of the worker;

a status information acquisition step of acquiring status information including at least work content and a work method from the work table;

a work table instruction adjustment step of determining a next work table instruction by a value-function-based reinforcement learning policy algorithm, according to the status information acquired in the status information acquisition step and the individual characteristic information acquired in the individual characteristic information acquisition step, and executing the next work table instruction on the worker or the work table;

a determination information acquisition step of acquiring determination information including at least work quality, work cost, work time, and the motion of the worker from the work table after the next work table instruction is executed; and

a learning step of comparing each item of the determination information acquired in the determination information acquisition step with its respective threshold value, calculating a reward based on a result of the comparison, updating the value function based on the calculated reward, the individual characteristic information, and the status information acquired in the status information acquisition step after the next work table instruction is executed, and using the updated value function in the work table instruction adjustment step.

13. The work assisting method according to claim 12,

14. The work assisting method according to claim 12,

further comprising a weight setting step of setting a weight for each item of the determination information,

in the learning step, an amount of increase or decrease of the reward corresponding to each item of the determination information is changed in accordance with the weight set for that item in the weight setting step.

15. The work assisting method according to claim 14,

in the weight setting step, the weight is set to the determination information based on the determination information acquired in the determination information acquisition step.

16. The work assisting method according to claim 14,

in the weight setting step, the weight is set to the determination information based on a user instruction.

17. The work assisting method according to any one of claims 12 to 16,

further comprising an initial instruction information providing step of determining an initial work table instruction based on a user input or a stored work history,

in the work table instruction adjustment step, the initial work table instruction is executed on the worker or the work table,

in the determination information acquisition step, the determination information is acquired from the work table after the initial work table instruction is executed,

in the learning step, each item of the determination information acquired in the determination information acquisition step is compared with its respective threshold value, a reward is calculated based on a result of the comparison, the value function is updated based on the calculated reward, the individual characteristic information, and the status information acquired in the status information acquisition step after the initial work table instruction is executed, and the updated value function is used in the work table instruction adjustment step.

18. The work assisting method according to any one of claims 12 to 16,

in the work table instruction adjustment step, the work rule/rule is acquired from an instruction information rule/rule providing unit that stores a work rule/rule applicable when the worker performs the production work using the work table; as the policy algorithm, the work rule/rule is associated with work content and a work method selected at random according to the status information; and the work content, the work method, the work place, the work equipment and tools, and the worker education are optimized according to the individual characteristic information, thereby determining the next work table instruction.

19. The work assisting method according to any one of claims 12 to 16,

further comprising a cycle control step of controlling the cyclic operation of the status information acquisition step, the individual characteristic information acquisition step, the work table instruction adjustment step, the determination information acquisition step, and the learning step, the cycle control step including a step management step and a round management step,

in the step management step, when the comparison result in the learning step indicates that any item of the determination information deviates from a predetermined condition value, or when the number of steps of the cyclic operation exceeds a predetermined upper limit of the number of steps, the cycle of one round is ended and the next round is entered,

in the round management step, when the number of rounds of the cyclic operation exceeds a predetermined upper limit of the number of rounds, the learning is ended.

20. The work assisting method according to any one of claims 12 to 16,

21. A computer-readable medium storing a program for executing the work assisting method according to any one of claims 12 to 20.
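The step and round management recited in claims 8 and 19 can be illustrated by the following minimal Python sketch. The function name `run_learning`, the callback `step_fn`, and the returned list of per-round step counts are hypothetical and introduced only for illustration; the callback is assumed to perform one full cycle (instruction, determination, learning) and to report whether any item of the determination information deviated from its predetermined condition value.

```python
def run_learning(step_fn, max_steps, max_rounds):
    """Run the cyclic operation with step and round upper limits.

    step_fn(round_idx, step_idx) is a hypothetical callback that executes
    one cycle and returns True when any determination item deviates from
    its predetermined condition value.
    Returns the number of steps executed in each round.
    """
    steps_per_round = []
    for round_idx in range(max_rounds):        # round management: stop at the round limit
        steps_done = 0
        for step_idx in range(max_steps):      # step management: stop at the step limit
            steps_done += 1
            if step_fn(round_idx, step_idx):   # deviation ends the current round early
                break
        steps_per_round.append(steps_done)
    return steps_per_round
```

In this sketch a round ends either on a deviation or when `max_steps` cycles have run, and learning ends once `max_rounds` rounds are complete.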