WO2021192280A1

WO2021192280A1 - Learning device and inference device for air-conditioning control

Info

Publication number: WO2021192280A1
Application number: PCT/JP2020/014248
Authority: WO
Inventors: 貴則京屋
Original assignee: 三菱電機株式会社
Priority date: 2020-03-27
Filing date: 2020-03-27
Publication date: 2021-09-30
Also published as: CN115280077B; JPWO2021192280A1; CN115280077A; JP7414964B2

Abstract

This learning device (100) learns to control the air conditioning system of a factory that includes at least one piece of equipment. The learning device (100) comprises a first data acquisition unit (110) and a model generation unit (120). The first data acquisition unit (110) acquires learning data including a first parameter (Prm1) representing the state of the at least one piece of equipment and the air conditioning system and a second parameter (Prm2) relating to the intensity of air conditioning of the air conditioning system. The model generation unit (120) generates a trained model to infer the second parameter (Prm2) from the first parameter (Prm1) using the learning data. The first parameter (Prm1) includes the identification information of each operator performing work with the at least one piece of equipment, the items of products produced by the at least one piece of equipment, the identification information of the at least one piece of equipment, product takt time, information relating to product quality, and information relating to the time at which the first parameter was obtained.

Description

Air conditioning control learning device and inference device

This disclosure relates to an air conditioning control learning device and an inference device.

Conventionally, air conditioning control that improves the comfort of the space to be air-conditioned is known. For example, Japanese Patent Application Laid-Open No. 5-256493 (Patent Document 1) discloses a configuration in which an air-conditioned zone is kept stably comfortable by controlling the air conditioning with a wind speed sensor or the like, using the thermal environment index PMV (Predicted Mean Vote) value.

Patent Document 1: Japanese Unexamined Patent Publication No. 5-256493

In general, air conditioning in a factory is often held constant or adjusted manually according to the workers' own sense of comfort. Furthermore, the PMV value of Patent Document 1 expresses comfort as a general-purpose measure, and its correlation with the productivity of workers is unknown. Patent Document 1 does not consider improving the productivity of factory workers.

This disclosure was made to solve the above-mentioned problems, and the purpose is to improve the productivity of factory workers.

The learning device according to one aspect of the present disclosure learns the control of the air conditioning system of a factory including at least one piece of equipment. The learning device includes a first data acquisition unit and a model generation unit. The first data acquisition unit acquires learning data including a first parameter representing the state of the at least one piece of equipment and the air conditioning system and a second parameter relating to the intensity of air conditioning of the air conditioning system. The model generation unit generates a trained model that infers the second parameter from the first parameter using the learning data. The first parameter includes the identification information of the worker performing work at each of the at least one piece of equipment, the item of the product produced by the at least one piece of equipment, the identification information of the at least one piece of equipment, the takt time of the product, information on the quality of the product, and information on the time at which the first parameter was acquired.

The inference device according to another aspect of the present disclosure outputs the control of the air conditioning system of a factory including at least one piece of equipment. The inference device includes a data acquisition unit and an inference unit. The data acquisition unit acquires a first parameter representing the state of the at least one piece of equipment and the air conditioning system. The inference unit outputs a second parameter from the first parameter acquired by the data acquisition unit, using a learned model that infers the second parameter, which relates to the intensity of air conditioning of the air conditioning system, from the first parameter. The first parameter includes the identification information of the worker performing work at each of the at least one piece of equipment, the item of the product produced by the at least one piece of equipment, the identification information of the at least one piece of equipment, the takt time of the product, information on the quality of the product, and information on the time at which the first parameter was acquired.

According to the learning device and the inference device of the present disclosure, the productivity of factory workers can be improved because the first parameter includes the identification information of the worker who works at each of the at least one piece of equipment, the item of the product produced by the at least one piece of equipment, the identification information of the at least one piece of equipment, the takt time of the product, information on the quality of the product, and information on the time at which the first parameter was acquired.

FIG. 1 is a block diagram showing an example of the configuration of a management server including the learning device and the inference device according to the embodiment, and of the air conditioning system and the factory controlled by the management server.
FIG. 2 is a diagram showing an example of working hours, equipment identification information, items, worker identification information, expected optimum temperature, and air conditioning intensity.
FIG. 3 is a block diagram showing the configuration of the learning device.
FIG. 4 is a flowchart showing the learning process of the learning device.
FIG. 5 is a block diagram showing the configuration of the inference device.
FIG. 6 is a flowchart showing the inference process of the inference device.
FIG. 7 is a block diagram showing the hardware configuration of the information processing system.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. In principle, the same or corresponding parts in the drawings are designated by the same reference numerals and the description is not repeated.

FIG. 1 is a block diagram showing an example of the configuration of a management server 10 including a learning device 100 and an inference device 200 according to an embodiment, and of an air conditioning system 20 and a factory 30 controlled by the management server 10. FIG. 2 is a diagram showing an example of working hours, equipment identification information, items, worker identification information, expected optimum temperature, and air conditioning intensity. With reference to FIGS. 1 and 2, the factory 30 includes equipment Eq1, Eq2, and Eq3. The work (workpiece) Wrk passes through the work processes at the equipment Eq1 to Eq3 in that order and is shipped as a product Prd. The workers Op1, Op2, and Op3 perform the work at the equipment Eq1 to Eq3.

The management server 10 includes an information processing system 11 and a data collection / processing system 12. The information processing system 11 includes the learning device 100 and the inference device 200. The management server 10 acquires the temperature and humidity at the equipment Eq1, Eq2, and Eq3 from the temperature / humidity sensors Sn1, Sn2, and Sn3, respectively, by wireless communication. The management server 10 acquires the temperature and humidity at the outdoor unit 21 from the temperature / humidity sensor Sn10 via the air conditioning controller 23 by wired communication, and the temperature and humidity at the indoor unit 22 from the temperature / humidity sensor Sn11 via the data collection / processing system 12 by wired communication. The management server 10 also acquires the air conditioning control estimation parameter Prm1 (first parameter) at the production site. The air conditioning control estimation parameter Prm1 includes the identification information of the workers Op1 to Op3 who perform the work at each of the equipment Eq1 to Eq3, the item of the product Prd produced by the equipment Eq1 to Eq3, the identification information of the equipment Eq1 to Eq3, the takt time of the product Prd, information on the quality of the product Prd, and information on the time at which the air conditioning control estimation parameter Prm1 was acquired. The information on the quality of the product Prd includes, for example, the result of the quality inspection performed in the inspection process or information on the yield. The air conditioning control estimation parameter Prm1 may also include images of the workers Op1 to Op3 taken during their work.
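As a rough illustration, the items that make up Prm1 could be grouped into one record per observation. The sketch below is not taken from the disclosure: the class name `Prm1Record`, its field names, the units, and the `state_key` helper are all hypothetical, and the quality information is assumed to be expressed as a yield ratio.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class Prm1Record:
    """One observation of the air conditioning control estimation parameter Prm1.

    Every field name and unit here is an illustrative assumption.
    """
    worker_id: str         # identification information of the worker, e.g. "Op1"
    equipment_id: str      # identification information of the equipment, e.g. "Eq1"
    item: str              # item of the product Prd being produced
    takt_time_s: float     # takt time of the product (seconds assumed)
    yield_ratio: float     # quality information, assumed to be a yield in [0, 1]
    acquired_at: datetime  # time at which Prm1 was acquired

    def state_key(self) -> Tuple[str, str, str, int]:
        """Reduce the record to a hashable, discrete key (hour granularity assumed)."""
        return (self.worker_id, self.equipment_id, self.item, self.acquired_at.hour)


record = Prm1Record("Op1", "Eq1", "item-A", 42.0, 0.98, datetime(2020, 3, 27, 9, 30))
print(record.state_key())
```

A discrete key of this kind is one possible way to use Prm1 as a lookup state in a table-based learner; a real system might instead feed the full record, including images, to a function approximator.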

The air conditioning system 20 includes an outdoor unit 21, an indoor unit 22, and an air conditioning controller 23. The outdoor unit 21 is arranged outside the factory 30. The indoor unit 22 and the air conditioning controller 23 are arranged in the factory 30. The outdoor unit 21 includes a fan, a compressor, and a heat exchanger. The indoor unit 22 includes a fan, a heat exchanger, and an expansion valve. The air conditioning controller 23 includes a thermostat. The air conditioning controller 23 receives the air conditioning intensity control parameter Prm2 (second parameter) from the management server 10 and controls the outdoor unit 21 and the indoor unit 22. The air conditioning intensity control parameter Prm2 includes ON / OFF of the thermostat, the rotation frequency of the compressor, the airflow strength of the fan, the evaporation temperature of the refrigerant, and the condensation temperature of the refrigerant.
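The five items that make up Prm2 can be pictured as a single command record sent to the air conditioning controller 23. The following is only a sketch under assumptions: the class name `Prm2Command`, the field names, the units (Hz, degrees Celsius, an integer fan level), and the validation rules are hypothetical and do not appear in the disclosure.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Prm2Command:
    """One setting of the air conditioning intensity control parameter Prm2.

    Field names and units are illustrative assumptions.
    """
    thermostat_on: bool         # ON / OFF of the thermostat
    compressor_hz: float        # rotation frequency of the compressor
    fan_level: int              # airflow strength of the fan (discrete level assumed)
    evaporation_temp_c: float   # evaporation temperature of the refrigerant
    condensation_temp_c: float  # condensation temperature of the refrigerant

    def __post_init__(self) -> None:
        # Reject physically meaningless values before they reach the controller.
        if self.compressor_hz < 0:
            raise ValueError("compressor_hz must be non-negative")
        if self.fan_level < 0:
            raise ValueError("fan_level must be non-negative")


command = Prm2Command(True, 60.0, 2, 5.0, 45.0)
payload = asdict(command)  # plain dict, e.g. for serialization to the controller
print(payload)
```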

FIG. 3 is a block diagram showing the configuration of the learning device 100. As shown in FIG. 3, the learning device 100 includes a data acquisition unit 110 (first data acquisition unit) and a model generation unit 120. The data acquisition unit 110 acquires the air conditioning control estimation parameter Prm1 and the air conditioning intensity control parameter Prm2 as learning data.

The model generation unit 120 learns the air conditioning intensity control by using the learning data including the air conditioning control estimation parameter Prm1 and the air conditioning intensity control parameter Prm2. That is, the model generation unit 120 generates a learned model that infers the air conditioning intensity control parameter Prm2 from the air conditioning control estimation parameter Prm1. As the learning algorithm used by the model generation unit 120, known algorithms such as supervised learning, unsupervised learning, and reinforcement learning can be used. In the following, as an example, the case where reinforcement learning is applied will be described. In reinforcement learning, an agent (the acting entity) in a certain environment observes the current state (environmental parameters) and decides the action to be taken. The environment changes dynamically depending on the action of the agent, and the agent is rewarded according to the change in the environment. The agent repeats this process and learns the action policy that yields the most reward through a series of actions. Q-learning and TD-learning are known as typical methods of reinforcement learning. For example, in the case of Q-learning, the general update equation for the action value function $Q(s_t, a_t)$ is expressed as the following equation (1).

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right) \tag{1}$$

In equation (1), $s_t$ represents the state of the environment at time t, and $a_t$ represents the action at time t. The action $a_t$ changes the state from $s_t$ to $s_{t+1}$. $r_{t+1}$ represents the reward obtained through that change of state, γ represents the discount rate, and α represents the learning coefficient. Note that γ is in the range 0 < γ ≤ 1, and α is in the range 0 < α ≤ 1. The air conditioning intensity control parameter Prm2 serves as the action $a_t$, and the air conditioning control estimation parameter Prm1 of the production site serves as the state $s_t$. By repeatedly updating the action value function $Q(s, a)$ shown in equation (1), the agent learns the best action $a_t$ in the state $s_t$ at time t.

The update formula represented by equation (1) increases the action value Q when the action value Q (evaluation value) of the best action a at time t + 1 is larger than the action value Q of the action a executed at time t. In the opposite case, the update formula reduces the action value Q. In other words, the action value function Q(s, a) is updated so that the action value Q of the action a at time t approaches the best action value at time t + 1. As a result, the best action value in a certain environment is sequentially propagated to the action values in the preceding environments.
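The update described here corresponds to the standard tabular Q-learning rule, and a minimal sketch of one update step is given below. The function name `q_update`, the string labels used for states and actions, and the default values of `alpha` and `gamma` are assumptions chosen for illustration; the disclosure does not specify them.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, actions: Sequence[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one Q-learning update and return the new value of Q(state, action)."""
    # Best action value available at time t + 1.
    best_next = max(q[(next_state, a)] for a in actions)
    # Difference between the target (reward plus discounted best next value)
    # and the current estimate; positive values increase Q, negative reduce it.
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


q: QTable = defaultdict(float)              # unseen (state, action) pairs start at 0
actions = ["weak", "medium", "strong"]      # hypothetical discrete Prm2 settings
new_value = q_update(q, "s0", "medium", reward=1.0, next_state="s1", actions=actions)
print(new_value)
```

Here the state would be derived from Prm1 and the action from Prm2. A table keyed by discrete states is only one option; with continuous or image inputs, a function approximator would normally replace the table.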

When the trained model is generated by reinforcement learning as described above, the model generation unit 120 includes a reward calculation unit 121 and a function update unit 122. The reward calculation unit 121 calculates the reward using the air conditioning control estimation parameter Prm1 and the air conditioning intensity control parameter Prm2. The reward calculation unit 121 calculates the reward r according to the increase or decrease in productivity, which represents the number of products Prd actually produced in the factory 30 per unit time (for example, pieces / hour). Specifically, the reward calculation unit 121 calculates the reward r according to the degree of deviation between the productivity of the factory 30 and the total of the individual standard productivities of the workers Op1 to Op3, or according to the degree of deviation between the productivity of the factory 30 and the standard productivity corresponding to the standard takt time. For example, if the productivity of the factory 30 is higher than the previous time, the reward r is increased (for example, a reward of "1" is given), while if the productivity of the factory 30 is lower than the previous time, the reward r is reduced (for example, a reward of "-1" is given).
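The two ways of deriving a reward described in this paragraph can be sketched as follows. The function names are hypothetical, the reward of 0 for unchanged productivity is an assumption (the text only covers the higher and lower cases), and expressing the deviation as a relative value is likewise an illustrative choice.

```python
from typing import Sequence


def reward_vs_previous(current: float, previous: float) -> int:
    """Return +1 if productivity rose, -1 if it fell, 0 if unchanged (0 is assumed)."""
    if current > previous:
        return 1
    if current < previous:
        return -1
    return 0


def deviation_from_standard(factory_productivity: float,
                            worker_standards: Sequence[float]) -> float:
    """Relative deviation of factory productivity from the summed worker standards."""
    total_standard = sum(worker_standards)
    if total_standard <= 0:
        raise ValueError("total standard productivity must be positive")
    return (factory_productivity - total_standard) / total_standard


# Productivity in pieces / hour; the numbers are made up for illustration.
print(reward_vs_previous(120.0, 100.0))
print(deviation_from_standard(110.0, [40.0, 30.0, 30.0]))
```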

The function update unit 122 updates the function for determining the air conditioning intensity control parameter Prm2 according to the reward calculated by the reward calculation unit 121, and outputs the function to the learned model storage unit 140. For example, in the case of Q-learning, the action value function $Q(s_t, a_t)$ represented by equation (1) is used as the function for calculating the air conditioning intensity control parameter Prm2.

The learning device 100 repeatedly executes the above learning. The learned model storage unit 140 stores, as the learned model, the action value function $Q(s_t, a_t)$ updated by the function update unit 122.

FIG. 4 is a flowchart showing the learning process of the learning device 100. In the following, a step is simply referred to as S. As shown in FIG. 4, in S101, the data acquisition unit 110 acquires the air conditioning control estimation parameter Prm1 and the air conditioning intensity control parameter Prm2 as learning data. Specifically, the data acquisition unit 110 attaches, to the identification information of each of the workers Op1 to Op3, the identification information of the equipment at which the worker is working, the reference takt time for that worker, and the working time, and attaches, to each temperature and humidity measurement, the position and the time at which it was measured.

In S102, the model generation unit 120 calculates the reward using the air conditioning control estimation parameter Prm1 and the air conditioning intensity control parameter Prm2. Specifically, the reward calculation unit 121 acquires the air conditioning control estimation parameter Prm1 and the air conditioning intensity control parameter Prm2, and determines, based on the degree of deviation between the standard productivity, which is a predetermined reward standard, and the actual productivity of the factory 30, whether to increase the reward corresponding to the air conditioning intensity control parameter Prm2 (S103) or to decrease it (S104). When the actual productivity of the factory 30 is larger than the standard productivity, the reward calculation unit 121 increases the reward in S103. On the other hand, when the actual productivity of the factory 30 is smaller than the standard productivity, the reward calculation unit 121 reduces the reward in S104.

As a reward standard, a standard may be used in which the reward is increased when the yield of the product Prd is larger than the standard yield, and the reward is decreased when the yield is smaller than the standard yield. As a result, the quality of the product Prd can be improved.

In S105, the function update unit 122 uses the reward calculated by the reward calculation unit 121 and equation (1) to update the action value function $Q(s_t, a_t)$ stored in the learned model storage unit 140.

The learning device 100 repeatedly executes the above steps S101 to S105 and stores the generated action value function $Q(s_t, a_t)$ as the learned model. In the embodiment, the learned model is stored in the learned model storage unit 140 provided outside the learning device 100, but the learned model storage unit 140 may instead be provided inside the learning device 100.
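The repeated execution of S101 to S105 can be sketched as a small training loop. Everything concrete below is an assumption made so that the loop can run on its own: the factory is replaced by a toy function `simulated_productivity`, Prm1 is collapsed to a single state label, Prm2 to three intensity labels, and epsilon-greedy exploration is used to pick actions, which the disclosure does not specify.

```python
import random
from collections import defaultdict

ACTIONS = ("weak", "medium", "strong")  # hypothetical discrete Prm2 settings


def simulated_productivity(action: str, rng: random.Random) -> float:
    """Toy stand-in for the factory 30: 'medium' yields the highest output."""
    base = {"weak": 90.0, "medium": 110.0, "strong": 95.0}[action]
    return base + rng.uniform(-2.0, 2.0)


def train(episodes: int = 500, alpha: float = 0.1, gamma: float = 0.9,
          epsilon: float = 0.2, standard: float = 100.0, seed: int = 0):
    rng = random.Random(seed)
    q = defaultdict(float)
    state = "Op1/Eq1"  # hypothetical discretized Prm1
    for _ in range(episodes):
        # S101: acquire learning data (state from Prm1, action from Prm2).
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        productivity = simulated_productivity(action, rng)
        # S102 to S104: raise or lower the reward against the standard productivity.
        reward = 1.0 if productivity > standard else -1.0
        # S105: update the action value function with the Q-learning rule.
        next_state = state  # the toy environment has a single state
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


q = train()
best = max(ACTIONS, key=lambda a: q[("Op1/Eq1", a)])
print(best)
```

The dictionary returned by `train` plays the role of the learned model that would be written to the learned model storage unit 140.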

FIG. 5 is a block diagram showing the configuration of the inference device 200. The inference device 200 includes a data acquisition unit 210 and an inference unit 220. The data acquisition unit 210 acquires the air conditioning control estimation parameter Prm1. The inference unit 220 infers the air conditioning intensity control parameter Prm2 by using the learned model stored in the learned model storage unit 140. That is, by inputting the air conditioning control estimation parameter Prm1 of the production site acquired by the data acquisition unit 210 into the learned model, the air conditioning intensity control parameter Prm2 suited to that air conditioning control estimation parameter Prm1 can be inferred. In the embodiment, the configuration that infers the air conditioning intensity control parameter Prm2 using the learned model generated by the model generation unit 120 in FIG. 3 has been described, but the air conditioning intensity control parameter Prm2 may instead be output using a learned model trained in another environment.

FIG. 6 is a flowchart showing the inference process of the inference device 200. As shown in FIG. 6, in S201, the data acquisition unit 210 acquires the air conditioning control estimation parameter Prm1 at the production site. In S202, the inference unit 220 inputs the air conditioning control estimation parameter Prm1 at the production site into the learned model stored in the learned model storage unit 140 and obtains the air conditioning intensity control parameter Prm2. In S203, the inference unit 220 outputs the air conditioning intensity control parameter Prm2 to the air conditioning system 20. In S204, the air conditioning system 20 uses the air conditioning intensity control parameter Prm2 output from the inference device 200 to perform air conditioning control at an intensity that increases the amount of productivity change predicted for the near future. Conventional air conditioning control with a uniform temperature setting could not avoid the productivity fluctuations that arise depending on people, equipment, cost items, and time. By performing air conditioning control that improves the estimated productivity in the near future, this problem is addressed and stable, high productivity can be maintained.
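When the learned model is an action value function, one common way to realize S202 is to pick the candidate Prm2 with the highest stored value for the observed Prm1. The sketch below assumes this greedy selection, which the disclosure does not spell out; the function name `infer_prm2`, the dictionary layout of the learned model, and the state and action labels are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple


def infer_prm2(q_table: Dict[Tuple[Hashable, Hashable], float],
               prm1_state: Hashable,
               candidate_actions: Sequence[Hashable]) -> Hashable:
    """Return the candidate Prm2 with the highest learned action value for prm1_state."""
    if not candidate_actions:
        raise ValueError("at least one candidate action is required")
    # Pairs never seen during learning are treated as having value 0.0 (assumed).
    return max(candidate_actions, key=lambda a: q_table.get((prm1_state, a), 0.0))


# Made-up learned values standing in for the stored learned model.
learned = {
    ("Op1/Eq1", "weak"): -0.4,
    ("Op1/Eq1", "medium"): 3.2,
    ("Op1/Eq1", "strong"): 0.7,
}
chosen = infer_prm2(learned, "Op1/Eq1", ["weak", "medium", "strong"])  # S201, S202
print(chosen)  # S203 would send this setting to the air conditioning system 20
```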

In the present embodiment, the case where reinforcement learning is applied to the learning algorithm used by the inference unit has been described, but the learning algorithm is not limited to reinforcement learning. As for the learning algorithm, in addition to reinforcement learning, supervised learning, unsupervised learning, semi-supervised learning, and the like can also be applied.

Further, as the learning algorithm used in the model generation unit 120, deep learning, which learns the extraction of the feature quantities themselves, can also be used, and machine learning may be performed according to other known methods such as neural networks, genetic programming, functional logic programming, or support vector machines.

The learning device 100 and the inference device 200 may be devices separate from the air conditioning system 20 that are connected to the air conditioning system 20 via a network, for example. Further, the learning device 100 and the inference device 200 may be built in the air conditioning system 20. Further, the learning device 100 and the inference device 200 may exist on the cloud server.

In addition, instead of directly acquiring data for each worker, worker personas may be defined from multiple viewpoints such as age, skill level, and gender (for example, a newly hired male in his 20s), and a worker model may be set for each persona to simplify the worker data. Similarly, the data configuration of the air conditioning control estimation parameter Prm1 may be simplified by preparing a plurality of models of factories, equipment, and lines in advance.
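The persona idea can be sketched as a mapping from individual worker attributes to a coarse label, so that the learner sees a handful of personas rather than every worker ID. The class `Worker`, the function `persona_of`, the one-year threshold for "new", and the label format are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    worker_id: str
    age: int
    years_experience: float
    gender: str


def persona_of(worker: Worker) -> str:
    """Map a worker to a persona label built from skill level, gender, and age decade."""
    decade = (worker.age // 10) * 10
    # The one-year cut-off between "new" and "experienced" is an assumed threshold.
    skill = "new" if worker.years_experience < 1.0 else "experienced"
    return f"{skill}-{worker.gender}-{decade}s"


print(persona_of(Worker("Op1", 24, 0.5, "male")))
```

Using the persona label in place of the worker identification information shrinks the state space, at the cost of ignoring individual differences within a persona.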

Further, the model generation unit 120 may learn the air conditioning intensity control by using learning data acquired from a plurality of air conditioning systems 20. The model generation unit 120 may acquire the learning data from a plurality of air conditioning systems 20 used in the same area, or may learn the air conditioning intensity control by using learning data collected from a plurality of air conditioning systems 20 operating independently in different areas. An air conditioning system 20 from which learning data is collected can also be added to or removed from the learning targets partway through. Further, the learning device 100 that has learned the air conditioning intensity control for a certain air conditioning system 20 may be applied to another air conditioning system 20, and the air conditioning intensity control may be relearned and updated for that other air conditioning system.

FIG. 7 is a block diagram showing the hardware configuration of the information processing system 11. As shown in FIG. 7, the information processing system 11 includes a processing circuit 51, a memory 52 (storage unit), and an input / output unit 53. The processing circuit 51 includes a CPU (Central Processing Unit) that executes a program stored in the memory 52. The processing circuit 51 may include a GPU (Graphics Processing Unit). The functions of the information processing system 11 are realized by software, firmware, or a combination of software and firmware. The software or firmware is described as a program and stored in the memory 52. The processing circuit 51 reads and executes the program stored in the memory 52. The CPU is also called a central processing unit, a processing unit, an arithmetic unit, a microprocessor, a microcomputer, a processor, or a DSP (Digital Signal Processor).

The memory 52 includes a non-volatile or volatile semiconductor memory (for example, RAM (Random Access Memory), ROM (Read Only Memory), flash memory, EPROM (Erasable Programmable Read Only Memory), or EEPROM (Electrically Erasable Programmable Read Only Memory)), a magnetic disk, a flexible disk, an optical disc, a compact disc, a mini disc, or a DVD (Digital Versatile Disc). The memory 52 stores, for example, a trained model, an air conditioning program, and a machine learning program.

The input / output unit 53 receives an operation from the user and outputs the processing result to the user. The input / output unit 53 includes, for example, a mouse, a keyboard, a touch panel, a display, and a speaker.

As described above, according to the learning device and the inference device according to the embodiment, the productivity of factory workers can be improved.

The embodiments disclosed this time should be considered to be exemplary in all respects and not restrictive. The scope of the present disclosure is indicated by the scope of claims rather than the above description, and is intended to include all modifications within the meaning and scope of the claims.

10 management server, 11 information processing system, 12 data collection / processing system, 20 air conditioning system, 21 outdoor unit, 22 indoor unit, 23 air conditioning controller, 30 factory, 51 processing circuit, 52 memory, 53 input / output unit, 100 learning device, 110, 210 data acquisition unit, 120 model generation unit, 121 reward calculation unit, 122 function update unit, 140 learned model storage unit, 200 inference device, 220 inference unit, Eq1 to Eq3 equipment, Op1 to Op3 workers, Prd product, Sn1 to Sn3, Sn10, Sn11 temperature / humidity sensors, Wrk work.

Claims

1. A learning device that learns control of an air conditioning system of a factory including at least one piece of equipment, the learning device comprising:
a first data acquisition unit that acquires learning data including a first parameter representing the state of the at least one piece of equipment and the air conditioning system, and a second parameter relating to the intensity of air conditioning of the air conditioning system; and
a model generation unit that generates, using the learning data, a trained model that infers the second parameter from the first parameter,
wherein the first parameter includes identification information of a worker who works at each of the at least one piece of equipment, an item of a product produced by the at least one piece of equipment, identification information of the at least one piece of equipment, a takt time of the product, information on the quality of the product, and information on the time at which the first parameter was acquired.
2. The learning device according to claim 1, wherein the first parameter includes an image of the worker during work.
3. The learning device according to claim 1 or 2, wherein the trained model includes a function in which the first parameter and an evaluation value of the second parameter are associated with each other.
4. The learning device according to claim 3, wherein the model generation unit updates the evaluation value of the second parameter according to the degree of deviation between a reference productivity and the productivity of the factory under air conditioning by the air conditioning system controlled according to the second parameter.
5. The learning device according to claim 3, wherein the model generation unit updates the evaluation value of the second parameter according to a change in the yield of the product produced under air conditioning by the air conditioning system controlled according to the second parameter.
6. An inference device comprising:
a second data acquisition unit that acquires the first parameter; and
an inference unit that outputs the second parameter from the first parameter acquired by the second data acquisition unit, using the trained model generated by the learning device according to any one of claims 1 to 5.
7. An inference device that outputs control of an air conditioning system of a factory including at least one piece of equipment, the inference device comprising:
a data acquisition unit that acquires a first parameter representing the state of the at least one piece of equipment and the air conditioning system; and
an inference unit that outputs a second parameter from the first parameter acquired by the data acquisition unit, using a learned model that infers the second parameter, which relates to the intensity of air conditioning of the air conditioning system, from the first parameter,
wherein the first parameter includes identification information of a worker who works at each of the at least one piece of equipment, an item of a product produced by the at least one piece of equipment, identification information of the at least one piece of equipment, a takt time of the product, information on the quality of the product, and information on the time at which the first parameter was acquired.
8. The inference device according to claim 7, wherein the first parameter includes an image of the worker during work.
9. The inference device according to claim 7 or 8, wherein the trained model includes a function in which the first parameter and an evaluation value of the second parameter are associated with each other.