WO2019077686A1 - Elevator maintenance work assistance device - Google Patents

Info

Publication number
WO2019077686A1
WO2019077686A1 PCT/JP2017/037592 JP2017037592W WO2019077686A1 WO 2019077686 A1 WO2019077686 A1 WO 2019077686A1 JP 2017037592 W JP2017037592 W JP 2017037592W WO 2019077686 A1 WO2019077686 A1 WO 2019077686A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
reward
action
elevator
measurement
Prior art date
Application number
PCT/JP2017/037592
Other languages
French (fr)
Japanese (ja)
Inventor
真人 高井
Original Assignee
三菱電機株式会社
Priority date
Filing date
Publication date
Application filed by 三菱電機株式会社
Priority to PCT/JP2017/037592
Publication of WO2019077686A1

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B66: HOISTING; LIFTING; HAULING
    • B66B: ELEVATORS; ESCALATORS OR MOVING WALKWAYS
    • B66B3/00: Applications of devices for indicating or signalling operating conditions of elevators
    • B66B5/00: Applications of checking, fault-correcting, or safety devices in elevators
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00: Subject matter not provided for in other groups of this subclass

Definitions

  • The present invention relates to an elevator maintenance work support device.
  • Patent Document 1 discloses an example of an elevator maintenance work support device.
  • That device acquires elevator state data from measurement data of a measurement device provided in the elevator.
  • A range of state data that indicates a sign of failure is set in advance.
  • The device determines whether there is a sign of elevator failure based on whether the state data falls within that range.
  • An object of the present invention is to provide an elevator maintenance work support device that can specifically present to maintenance personnel the maintenance actions that reduce the probability of elevator failure, based on measurement data of a measurement device provided in the elevator.
  • The elevator maintenance work support device according to the present invention includes: a communication unit that is provided so as to be able to communicate with a sensor of an elevator and with a terminal device used by maintenance personnel, receives measurement data of the elevator measured by the sensor during a predetermined measurement period, and receives maintenance data of the maintenance personnel transmitted from the terminal device; and an action selection unit that sets state data representing the state of each of one or more elevators based on measurement data measured during the measurement period by a measurement device provided in each of the one or more elevators, performs reinforcement learning using action data and the state data, selects action data for after the measurement period in which the measurement data was measured based on the result of the reinforcement learning and the measurement data, and causes the communication unit to transmit the selected action data.
  • According to the present invention, the communication unit is provided so as to be able to communicate with the sensor of the elevator and with the terminal device used by maintenance personnel, receives the measurement data of the elevator measured by the sensor during a predetermined measurement period, and receives the maintenance data of the maintenance personnel transmitted from the terminal device.
  • The action selection unit sets state data representing the state of each of the one or more elevators based on measurement data measured during the measurement period by a measurement device provided in each of the one or more elevators. It performs reinforcement learning using action data and the state data, selects action data for after the measurement period in which the measurement data was measured based on the result of the reinforcement learning and the measurement data, and causes the communication unit to transmit the selected action data.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram showing the configuration of a system including the elevator maintenance work support device according to Embodiment 1 of the present invention.
  • FIG. 2 is a block diagram showing the configuration of the elevator maintenance work support device according to Embodiment 1 of the present invention.
  • FIG. 3 is a diagram showing an example of the data stored in the reward storage unit.
  • FIG. 1 is a block diagram showing the configuration of a system including an elevator maintenance work support device according to Embodiment 1 of the present invention.
  • The system shown in FIG. 1 includes one or more elevators 1, an elevator maintenance work support device 10, and a terminal device 20.
  • The one or more elevators 1 are provided in a building 2.
  • The hoistway 3 of each elevator 1 passes through each floor of the building 2.
  • A landing 4 of the elevator 1 is provided on each floor of the building 2.
  • The landing 4 on each floor has a landing entrance.
  • Each landing entrance corresponds to a hoistway 3.
  • Each landing entrance faces its corresponding hoistway 3.
  • The elevator 1 includes a car 5, a control device 6, a sensor 7, and a communication device 8.
  • The car 5 is provided inside the hoistway 3.
  • The car 5 moves up and down inside the hoistway 3, driven by a hoisting machine (not shown).
  • The cars 5 of the plurality of elevators 1 are arranged so as not to overlap each other on the horizontal projection plane.
  • The car 5 has a door (not shown).
  • The door of the car 5 opens and closes so that users can get in and out of the car 5 while the car 5 is stopped.
  • The door of the car 5 is opened and closed by a drive device (not shown) provided in the car 5.
  • The landing 4 has a door (not shown).
  • The door of the landing 4 is provided at the landing entrance.
  • The door of the landing 4 opens and closes so that users can get in and out of the car 5 while the car 5 is stopped at the floor where the landing 4 is provided.
  • The door of the landing 4 opens and closes in conjunction with the door of the car 5.
  • The control device 6 controls the operation of the corresponding car 5.
  • The operation of the car 5 includes, for example, raising and lowering the car and opening and closing the door.
  • The control device 6 is connected to a group management device 9.
  • The group management device 9 controls the assignment of calls to the cars 5 controlled by the connected control devices 6.
  • The sensor 7 is a device that measures the state of the elevator 1.
  • The sensor 7 is an example of a measurement device.
  • The sensors 7 are, for example, internal sensors provided in each car 5 or hoisting machine of the elevator 1.
  • The sensors 7 may also be, for example, external sensors provided in the hoistway 3 or the landing 4 corresponding to each car 5 of the elevator 1.
  • An internal sensor is, for example, a device that measures the position, velocity, acceleration, or vibration of the car 5.
  • An internal sensor is, for example, a device that measures the opening/closing speed or vibration of the door of the car 5.
  • An internal sensor is, for example, a device that measures the torque or vibration of the hoisting machine.
  • An external sensor is, for example, a device that measures the temperature or humidity of the hoistway 3.
  • An external sensor is, for example, a device that measures the opening/closing speed or vibration of the door of the landing 4.
  • The sensor 7 is connected directly or indirectly to the communication device 8 of the elevator 1 so that it can transmit its measurement results as measurement data.
  • The communication device 8 of the elevator 1 transmits the measurement data measured by the sensor 7.
  • The communication device 8 of the elevator 1 is connected directly or indirectly to the elevator maintenance work support device 10.
  • The elevator maintenance work support device 10 presents maintenance actions to the maintenance personnel of the elevator 1 based on the measurement data of the sensor 7 received from the communication device 8 of the elevator 1.
  • The elevator maintenance work support device 10 is, for example, a server computer.
  • A maintenance action is a specific action that a maintenance worker can perform during maintenance work on the elevator 1.
  • Maintenance actions include, for example, replacing parts of the hoisting machine, inspecting parts of the car 5, and adjusting the doors of the landing 4.
  • The elevator maintenance work support device 10 is connected to the terminal device 20 so as to be able to communicate with it by wire or wirelessly.
  • The terminal device 20 displays the content of the maintenance action that the elevator maintenance work support device 10 presents as action data, and transmits the action data selected by the maintenance worker to the elevator maintenance work support device 10.
  • Action data is data representing a maintenance action.
  • The action data is, for example, an integer-valued maintenance action code associated with each of M types of maintenance actions.
  • The terminal device 20 includes a communication unit 21, a display unit 22, and an input unit 23.
  • The terminal device 20 is, for example, a personal computer.
  • The communication unit 21 of the terminal device 20 is connected directly or indirectly to the elevator maintenance work support device 10.
  • The communication unit 21 of the terminal device 20 exchanges action data with the elevator maintenance work support device 10.
  • The function of the communication unit 21 is realized by, for example, a network card.
  • The display unit 22 displays the content of the maintenance action represented by the action data received from the elevator maintenance work support device 10 via the communication unit 21 of the terminal device 20.
  • The function of the display unit 22 is realized by, for example, a display.
  • The input unit 23 receives the action data selected by the maintenance worker.
  • The function of the input unit 23 is realized by, for example, a keyboard and a mouse.
  • The action data received by the input unit 23 is transmitted to the elevator maintenance work support device 10 via the communication unit 21 of the terminal device 20.
  • The elevator maintenance work support device 10 selects action data based on the measurement data received from the sensor 7.
  • The elevator maintenance work support device 10 presents the maintenance action to the maintenance worker on the display unit 22 by transmitting the selected action data to the terminal device 20.
  • The elevator maintenance work support device 10 performs reinforcement learning using the action data entered by the maintenance worker via the input unit 23 of the terminal device 20.
  • The elevator maintenance work support device 10 reflects the result of the reinforcement learning in the selection of the next action data.
  • FIG. 2 is a block diagram showing the functions of the elevator maintenance work support device according to the present embodiment.
  • The elevator maintenance work support device 10 includes a communication unit 11, an action selection unit 12, a reward determination unit 13, and a reward storage unit 14.
  • The communication unit 11 of the elevator maintenance work support device 10 is connected directly or indirectly to each of the sensors 7.
  • The communication unit 11 of the elevator maintenance work support device 10 is connected to the terminal device 20 by wire or wirelessly.
  • The communication unit 11 of the elevator maintenance work support device 10 communicates with the sensors 7 and the terminal device 20.
  • The action selection unit 12 selects action data based on the measurement data measured by the sensor 7.
  • The reward determination unit 13 determines a reward for the selection of action data by the action selection unit 12.
  • The reward is numerical data representing an evaluation of the result obtained from the selection of action data by the action selection unit 12.
  • The reward storage unit 14 stores the reward determined by the reward determination unit 13.
  • The communication unit 11 receives measurement data from each of the sensors 7.
  • The measurement data is measured by the sensor 7 during the measurement period.
  • The measurement period is, for example, the period between successive maintenance actions performed on each elevator 1 by the maintenance personnel. That is, the measurement period is the period between one maintenance work and the next.
  • The action selection unit 12 sets the state data of each elevator 1 based on the measurement data received by the communication unit 11 of the elevator maintenance work support device 10.
  • State data set based on measurement data measured during a measurement period is the state data of that measurement period.
  • The action selection unit 12 sets the state data of the measurement period in which the measurement data was measured, for example, as follows.
  • In this example, the sensor 7 provided in each elevator 1 is assumed to be a single sensor of the same type. That is, the measurement data of the sensor 7 provided in each elevator 1 is treated as numerical data.
  • The action selection unit 12 calculates the average value of the measurement data over the measurement period.
  • The action selection unit 12 stores in advance N sections obtained by dividing the possible range of the average value of the measurement data into N parts. The N sections correspond one-to-one to N state codes.
  • The action selection unit 12 sets the state code corresponding to the section that contains the calculated average value as the state data of the measurement period in which the measurement data was measured.
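  • The discretization described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `state_code`, the use of equal-width sections, and the clamping of out-of-range averages are assumptions made for the example.

```python
from statistics import mean


def state_code(measurements, low, high, n_sections):
    """Map the average of one measurement period to a state code 0..N-1.

    Assumption: the possible range [low, high] is split into N equal-width
    sections; the text only says the range is divided into N sections.
    """
    average = mean(measurements)
    width = (high - low) / n_sections
    index = int((average - low) // width)
    # Assumption: averages on or beyond the bounds fall in the edge sections.
    return min(max(index, 0), n_sections - 1)
```

  • For example, with a possible range of 0 to 10 divided into N = 5 sections, a measurement period whose average is 4.0 is assigned state code 2.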
  • The action selection unit 12 learns how the state of the elevator 1 changes as a result of maintenance work by performing reinforcement learning. Specifically, the action selection unit 12 models the change in the state of the elevator 1 due to maintenance work as, for example, a Markov decision process and performs reinforcement learning using the Q-learning algorithm.
  • The action selection unit 12 performs reinforcement learning by labeling each selection of action data with a time t.
  • The time t is 0 at the start of reinforcement learning and is an integer that is incremented by one each time the action selection unit 12 selects action data.
  • The subscript t indicates a value or data corresponding to the selection of action data at time t.
  • The action selection unit 12 stores a Q value for each elevator 1.
  • The Q value is numerical data Q(s, a) labeled with state data s and action data a.
  • As reinforcement learning progresses, Q(s, a) converges to the expected value of the gain obtained when action data a is selected for state data s and action data is subsequently selected so as to obtain the highest reward.
  • The gain V_t at time t is expressed by the following equation (1), where r_t is the reward at time t and γ is the discount rate.
  $$V_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k} \tag{1}$$
  • The discount rate γ is a predetermined numerical value between 0 and 1.
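  • As a small worked illustration of the gain, the sketch below assumes the usual discounted sum V_t = r_t + γ r_{t+1} + γ² r_{t+2} + … and truncates it to a finite list of rewards. The function name `discounted_gain` and the truncation are assumptions made for the example.

```python
def discounted_gain(rewards, gamma):
    """Return r_0 + gamma * r_1 + gamma**2 * r_2 + ... for a finite reward list."""
    gain = 0.0
    # Accumulate from the last reward backwards: V_k = r_k + gamma * V_{k+1}.
    for reward in reversed(rewards):
        gain = reward + gamma * gain
    return gain
```

  • With rewards of +1 for three consecutive selections and γ = 0.5, the gain is 1 + 0.5 + 0.25 = 1.75.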
  • The action selection unit 12 sets the state data s_t of each elevator 1 based on the measurement data of that elevator 1 measured before the selection of action data at time t.
  • For each elevator 1, the action selection unit 12 selects the action data a_t for the elevator 1 based on the set state data s_t and the Q value.
  • The action selection unit 12 selects action data by, for example, the ε-greedy selection method. That is, with a predetermined probability ε, the action selection unit 12 selects the action data a_t at random from the M types of action data. With probability 1 − ε, the action selection unit 12 selects the action data a_t that maximizes the Q value labeled with the state data s_t, according to the following equation (2).
  $$a_t = \operatorname*{arg\,max}_{a} Q(s_t, a) \tag{2}$$
  • The action selection unit 12 transmits the selected action data a_t to the terminal device 20 via the communication unit 11 of the elevator maintenance work support device 10.
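  • The ε-greedy selection described above can be sketched as follows. The dictionary layout of the Q values (keyed by a state code and an action code, with unseen pairs treated as 0), the function name `select_action`, and the tie-breaking rule are assumptions made for the example.

```python
import random


def select_action(q_values, state, n_actions, epsilon, rng=random):
    """Pick an action code in 0..M-1 by epsilon-greedy selection."""
    # With probability epsilon, explore: choose uniformly among the M actions.
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    # Otherwise exploit: choose the action with the largest Q(state, action).
    # Assumption: ties are broken in favour of the lowest action code.
    row = [q_values.get((state, a), 0.0) for a in range(n_actions)]
    return max(range(n_actions), key=row.__getitem__)
```

  • With ε = 0 the selection is purely greedy, and with ε = 1 it is purely random; a small positive ε keeps occasionally presenting maintenance actions whose effect has not yet been learned.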
  • The communication unit 11 of the elevator maintenance work support device 10 receives from the terminal device 20 the action data a_t^m, which represents the maintenance action that was performed in the maintenance work and entered by the maintenance worker.
  • The superscript m indicates data representing the maintenance action performed by the maintenance worker.
  • The action selection unit 12 sets the next state data s_{t+1} based on the measurement data of the measurement period following the selection of the action data at time t.
  • The reward determination unit 13 determines whether the action data a_t selected by the action selection unit 12 is the same as the action data a_t^m received from the terminal device 20.
  • When the action data a_t and the action data a_t^m are the same, the reward determination unit 13 determines the reward r based on the measurement data of the measurement period following the selection of the action data at time t.
  • The reward determination unit 13 determines the reward r based on the measurement data, for example, as follows.
  • The reward determination unit 13 determines the reward r to be +1 when the maximum over the measurement period of the absolute difference between the measured value and the value representing the normal state is smaller than a predetermined value.
  • The reward determination unit 13 determines the reward r to be -1 when the minimum over the measurement period of the absolute difference between the measured value and the value representing the abnormal state is smaller than a predetermined value.
  • As a specific example, the reward determination unit 13 takes a set position as the value representing the normal state and determines the reward r to be +1 if the absolute difference between the stop position and the set position does not exceed the predetermined value of 1 mm during the measurement period.
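  • The threshold rule described above can be sketched as follows. The function and parameter names are hypothetical, and returning 0 when neither condition holds is an assumption, since the text does not state the reward for that case.

```python
def threshold_reward(measurements, normal_value, abnormal_value,
                     normal_tolerance, abnormal_tolerance):
    """Return +1, -1, or 0 from the measurement data of one measurement period."""
    # +1: every measurement stayed close to the value representing the normal state.
    if max(abs(x - normal_value) for x in measurements) < normal_tolerance:
        return 1
    # -1: at least one measurement came close to the value representing the abnormal state.
    if min(abs(x - abnormal_value) for x in measurements) < abnormal_tolerance:
        return -1
    # Assumption: neither condition applies, so the reward is neutral.
    return 0
```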
  • The reward storage unit 14 stores, for the elevator 1 for which the action selection unit 12 selected the action data a_t, the state data s_t and the action data a_t at time t in association with the reward r determined by the reward determination unit 13.
  • When the action data a_t and the action data a_t^m are different, the reward determination unit 13 determines whether the reward storage unit 14 stores, for another elevator, a reward r1 in association with the action data a_t and the state data s_{t+1}.
  • When the reward storage unit 14 stores such a reward r1 for another elevator, the reward determination unit 13 determines the reward r for the selection of the action data a_t by the action selection unit 12 to be r1.
  • The reward storage unit 14 stores, for the elevator 1 for which the action selection unit 12 selected the action data a_t, the state data s_t and the action data a_t at time t in association with the reward r determined by the reward determination unit 13.
  • When the reward storage unit 14 does not store a reward r1 for another elevator in association with the action data a_t and the state data s_{t+1}, the reward determination unit 13 determines the reward r for the selection of the action data a_t by the action selection unit 12 to be a predetermined value. As a specific example, the reward r for the action data a_t selected by the action selection unit 12 is set to 0.
  • The action selection unit 12 updates the Q value based on the reward r determined by the reward determination unit 13. Specifically, the action selection unit 12 updates Q(s_t, a_t) according to the following equation (3), where α is the learning rate.
  $$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \tag{3}$$
  • The action selection unit 12 selects the next action data a_{t+1} for each elevator 1 based on the updated Q value for that elevator 1 and the state data s_{t+1} set at time t + 1.
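  • The Q value update can be sketched as follows, assuming the standard one-step Q-learning rule Q(s_t, a_t) ← Q(s_t, a_t) + α [r + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)]. The dictionary layout of the Q values and the function name `update_q` are assumptions made for the example.

```python
def update_q(q_values, state, action, reward, next_state, n_actions, alpha, gamma):
    """Apply one Q-learning update in place and return the new Q(state, action)."""
    # Best value reachable from the next state; unseen pairs count as 0.
    best_next = max(q_values.get((next_state, a), 0.0) for a in range(n_actions))
    old = q_values.get((state, action), 0.0)
    # Move the old estimate towards the target r + gamma * max_a Q(s', a).
    q_values[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q_values[(state, action)]
```

  • For example, with Q(s_{t+1}, ·) = (2, 0), r = 1, α = 0.5, γ = 0.5, and an initial Q(s_t, a_t) of 0, the updated value is 0.5 × (1 + 0.5 × 2) = 1.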
  • FIG. 3 is a diagram showing an example of the data stored in the reward storage unit 14 according to the present embodiment.
  • For each elevator 1, the reward storage unit 14 stores each combination of the N kinds of state data and the M types of action data in association with a reward, which is numerical data.
  • For example, the reward storage unit 14 stores action data A1, the state data S2 after the action data A1 was selected, and a reward R12 in association with one another.
  • The reward storage unit 14 may also store the state data from before the action data was selected.
  • FIG. 4 is a flowchart showing an example of the operation of the elevator maintenance work support device according to the present embodiment.
  • The elevator maintenance work support device 10 performs the operation shown in FIG. 4 for each elevator 1.
  • The action selection unit 12 sets the current time t to 0. The action selection unit 12 then sets the state data s_0 based on the measurement data (S101).
  • The action selection unit 12 selects the action data a_t for the current time t based on the Q value and the state data s_t (S102).
  • The terminal device 20 displays on the display unit 22 the content of the maintenance action represented by the received action data a_t.
  • The maintenance worker decides which maintenance action to actually perform in the maintenance work with reference to the content displayed on the display unit 22.
  • After the maintenance work is completed, the maintenance worker enters, from the input unit 23, the action data a_t^m representing the maintenance action that was actually performed.
  • The communication unit 21 of the terminal device 20 transmits the entered action data a_t^m to the elevator maintenance work support device 10.
  • The communication unit 11 of the elevator maintenance work support device 10 receives the action data a_t^m from the terminal device 20 (S104).
  • The action selection unit 12 sets the state data s_{t+1} based on the measurement data of the measurement period after the maintenance work (S105).
  • The reward determination unit 13 determines the reward r for the selection of the action data a_t by the action selection unit 12 (S106).
  • The action selection unit 12 updates the Q value using the reward r determined by the reward determination unit 13 and the state data s_{t+1} (S107).
  • The action selection unit 12 advances the current time t to the next time t + 1 (S108). The operation of the elevator maintenance work support device 10 then returns to S102.
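  • The loop of FIG. 4 can be sketched as follows for a single elevator. The five callables are hypothetical placeholders for the measurement, selection, terminal, reward, and update steps, and stopping after a fixed number of steps is an assumption, since the flowchart itself loops back to S102 without an end condition.

```python
def run_episode(steps, observe_state, select, perform_maintenance,
                determine_reward, update):
    """Run the S101..S108 loop for `steps` iterations and return a history."""
    t = 0
    state = observe_state()                                # S101: set s_0
    history = []
    for _ in range(steps):
        proposed = select(state)                           # S102: select a_t
        performed = perform_maintenance(proposed)          # display, worker input, S104
        next_state = observe_state()                       # S105: set s_{t+1}
        reward = determine_reward(proposed, performed, next_state)  # S106
        update(state, proposed, reward, next_state)        # S107: update the Q value
        history.append((t, state, proposed, performed, reward))
        state, t = next_state, t + 1                       # S108: advance the time
    return history
```

  • Passing the steps in as callables keeps the loop independent of how the state data, reward, and Q value are actually computed.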
  • FIG. 5 is a flowchart showing an example of the reward determination operation of the elevator maintenance work support device according to the present embodiment.
  • The reward determination unit 13 determines whether the action data a_t selected by the action selection unit 12 and the action data a_t^m received from the terminal device 20 are the same (S201). If the determination result is Yes, the operation of the elevator maintenance work support device 10 proceeds to S202. If the determination result is No, the operation of the elevator maintenance work support device 10 proceeds to S204.
  • The reward determination unit 13 determines the reward r based on the measurement data (S202).
  • The reward storage unit 14 stores, for the elevator 1 for which the action selection unit 12 selected the action data a_t, the action data a_t and the state data s_{t+1} in association with the reward r (S203).
  • The reward determination unit 13 determines whether the reward storage unit 14 stores, for another elevator, a reward r1 in association with the action data a_t and the state data s_{t+1} (S204). If the determination result is Yes, the operation of the elevator maintenance work support device 10 proceeds to S205. If the determination result is No, the operation of the elevator maintenance work support device 10 proceeds to S206.
  • The reward determination unit 13 acquires the reward r1 from the reward storage unit 14 and determines the reward r for the selection of the action data a_t by the action selection unit 12 to be r1 (S205).
  • The reward determination unit 13 determines the reward r for the selection of the action data a_t by the action selection unit 12 to be 0 (S206).
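  • The branching of FIG. 5 can be sketched as follows. The nested-dictionary layout of the reward store (elevator identifier, then an action and next-state pair), the function name `determine_reward`, and the choice of the first matching record when several other elevators have one are assumptions made for the example.

```python
def determine_reward(store, elevator_id, proposed, performed, next_state,
                     measured_reward, default=0):
    """Return the reward for the proposed action, following S201..S206."""
    key = (proposed, next_state)
    if proposed == performed:                              # S201: Yes
        reward = measured_reward()                         # S202: from measurement data
        store.setdefault(elevator_id, {})[key] = reward    # S203: record it
        return reward
    for other_id, records in store.items():                # S204: look at other elevators
        if other_id != elevator_id and key in records:
            return records[key]                            # S205: reuse r1
    return default                                         # S206: no record found
```

  • Here `measured_reward` is a hypothetical callable that computes the reward from the measurement data, so that the measurement-based rule is only evaluated when the proposed and performed actions match.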
  • As described above, the action selection unit 12 performs reinforcement learning using the action data and the state data set based on the measurement data measured by the sensor 7 during the measurement period.
  • The action selection unit 12 selects action data based on the state data and the Q value obtained as a result of the reinforcement learning.
  • The communication unit 11 of the elevator maintenance work support device 10 transmits the action data selected by the action selection unit 12 to the terminal device 20.
  • The elevator maintenance work support device 10 can therefore specifically present to maintenance personnel the maintenance actions that reduce the probability of elevator failure, based on the measurement data of the sensor 7 provided in the elevator 1. This can improve the operating efficiency of the elevator.
  • The action selection unit 12 selects action data based on the result of reinforcement learning. That is, the action selection unit 12 learns which action data should be selected to avoid a failure state based on the measurement data after the selection. Therefore, once reinforcement learning has progressed, the action selection unit 12 can select action data that avoids a failure even when the mechanism of the failure is not clear.
  • When the action data selected by the action selection unit 12 does not match the action data representing the maintenance action performed by the maintenance worker, the reward determination unit 13 refers to the data stored in the reward storage unit 14.
  • The action selection unit 12 performs reinforcement learning using the reward determined by the reward determination unit 13. The action selection unit 12 can thereby perform reinforcement learning even when the maintenance action represented by the selected action data does not match the maintenance action performed by the maintenance worker.
  • The action selection unit 12 performs reinforcement learning for each elevator 1.
  • The reward determination unit 13 determines, for each elevator 1, a reward for the selection made by the action selection unit 12.
  • The reward storage unit 14 stores data for each elevator 1.
  • As a result, the action selection unit 12 can select, for each elevator 1, action data that reflects differences in operating conditions, installation conditions, and the like.
  • The reward determination unit 13 refers to the data for other elevators 1 stored in the reward storage unit 14. As a result, the amount of data that the reward determination unit 13 can refer to is greater than when only data for the same elevator is referred to. The reward determination unit 13 can therefore determine the reward in more cases than when it does not refer to the data stored in the reward storage unit 14, and the action selection unit 12 obtains more opportunities for reinforcement learning than in that case.
  • FIG. 6 is a flowchart showing another example of the reward determination operation of the elevator maintenance work support device according to the present embodiment.
  • In this example, the elevator maintenance work support device 10 determines the reward in S214 and S215 instead of S204 and S205 of the operation shown in FIG. 5.
  • Otherwise, the elevator maintenance work support device 10 operates, for example, in the same way as in the example shown in FIG. 5.
  • When the action data a_t selected for the elevator 1 and the action data a_t^m are different, the reward determination unit 13 determines whether the reward storage unit 14 stores, for that elevator 1, a reward r2 in association with the action data a_t and the state data s_{t+1} (S214).
  • If the determination result is Yes, the operation of the elevator maintenance work support device 10 proceeds to S215. If the determination result is No, the operation of the elevator maintenance work support device 10 proceeds to S206.
  • The reward determination unit 13 acquires from the reward storage unit 14 the reward r2, which is past data of the elevator 1, and determines the reward r for the action data a_t selected by the action selection unit 12 to be r2 (S215).
  • When the action data selected by the action selection unit 12 for one elevator 1 does not match the action data representing the maintenance action performed by the maintenance worker, the reward determination unit 13 refers to the data for that elevator 1 stored in the reward storage unit 14. Thus, even when the maintenance action represented by the selected action data does not match the maintenance action performed by the maintenance worker, the action selection unit 12 can perform reinforcement learning based on the data for that elevator 1. The action selection unit 12 can therefore improve the suitability of the data used for reinforcement learning.
  • Instead of the average value of the measurement data over the measurement period, the action selection unit 12 may set the state data based on a value calculated by regarding the distribution of the values measured by the sensor 7 during the measurement period as a probability distribution.
  • For example, the action selection unit 12 may set the state data based on the mode, maximum, minimum, median, interquartile range, moments, or cumulants of the probability distribution.
  • The action selection unit 12 may set the state data based on the result of a frequency analysis that treats the measurement data as time-series data.
  • The sensors 7 provided in each elevator 1 may be sensors of a plurality of types.
  • When there are d types of sensors 7, the measurement data of the sensors 7 may be vector data having d numerical values as components.
  • In this case, the action selection unit 12 sets the state data of the measurement period in which the measurement data was measured, for example, as follows.
  • The action selection unit 12 calculates the average value over the measurement period for each of the d components of the measurement data.
  • The action selection unit 12 stores in advance, for each of the d components of the measurement data, L sections obtained by dividing the possible range of the average value into L parts.
  • The action selection unit 12 sets the state code corresponding to the sections that contain the calculated average values as the state data of the measurement period in which the measurement data was measured.
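  • One way to turn the d per-component sections into a single state code is sketched below. Combining the section indices as digits of a base-L number is an assumption made for the example, as are the equal-width sections and the function name `vector_state_code`; the text does not specify how the combination of sections is encoded.

```python
from statistics import mean


def vector_state_code(samples, lows, highs, n_sections):
    """Map d-component measurement vectors of one period to a code in 0..L**d - 1."""
    code = 0
    # zip(*samples) regroups the time series by component.
    for component, low, high in zip(zip(*samples), lows, highs):
        width = (high - low) / n_sections
        index = int((mean(component) - low) // width)
        index = min(max(index, 0), n_sections - 1)  # clamp to the edge sections
        # Assumption: section indices are combined as base-L digits.
        code = code * n_sections + index
    return code
```

  • With d = 2 and L = 5, component averages falling in sections 1 and 2 give the state code 1 × 5 + 2 = 7.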
  • The action selection unit 12 may select action data by the Boltzmann selection method. That is, when the state data is s_t, the action selection unit 12 selects action data a with the conditional probability P(a | s_t) given by the following equation.
  $$P(a \mid s_t) = \frac{\exp\left(Q(s_t, a)/T\right)}{\sum_{a'} \exp\left(Q(s_t, a')/T\right)}$$
  • The parameter T is a predetermined positive value.
  • The action selection unit 12 may vary the parameter T as a predetermined monotonically decreasing function of the time t.
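  • Boltzmann selection can be sketched as follows, assuming the usual softmax form in which the probability of action a is proportional to exp(Q(s_t, a) / T). The function names and the representation of the Q values for the current state as a plain list are assumptions made for the example.

```python
import math
import random


def boltzmann_probabilities(q_row, temperature):
    """Return the selection probability of each action for one state."""
    # Subtracting the maximum leaves the probabilities unchanged but avoids overflow.
    top = max(q_row)
    weights = [math.exp((q - top) / temperature) for q in q_row]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_select(q_row, temperature, rng=random):
    """Draw one action code according to the Boltzmann probabilities."""
    probabilities = boltzmann_probabilities(q_row, temperature)
    return rng.choices(range(len(q_row)), weights=probabilities)[0]
```

  • A large T makes the selection nearly uniform, and a small T makes it nearly greedy, which is why decreasing T over time shifts the selection from exploration towards exploitation.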
  • the reward determination unit 13 may determine the reward r as an average value or a minimum value over the measurement period of the distance between the vector of the measurement data and the vector representing the abnormal state.
  • the reward determination unit 13 may determine the reward r to be the average value or the minimum value over the measurement period of the value obtained by multiplying the distance between the vector of the measurement data and the vector representing the normal state by a negative constant.
  • the vector representing the normal state is, for example, a vector whose components are the design values of the corresponding quantities.
  • the vector representing the abnormal state is a vector whose component is the value indicated by each of the sensors 7 in the abnormal state.
  • the distance between two vectors is, for example, given by a norm such as the Euclidean norm or the maximum norm.
  • the reward determination unit 13 may consider the distribution of values measured by the sensor 7 during the measurement period as a probability distribution, and determine the reward r as the Kullback-Leibler distance with the probability distribution representing the normal state.
  • the probability distribution representing the normal state is, for example, a normal distribution whose mean is the design value and whose standard deviation is the measurement error.
  • the action selection unit 12 may perform reinforcement learning using a reward that is not directly based on measurement data. For example, the action selection unit 12 may perform reinforcement learning with the cost according to the maintenance action and the loss caused by the failure of the elevator 1 as a negative reward. The action selection unit 12 can perform learning so as not to unnecessarily select high-cost action data while avoiding a failure of the elevator.
  • the reward determination unit 13 may add 1 to the reward for each day that the normal operating state continues, where the normal operating state is the state in which the value of the measurement data is within the range of values representing the normal state. By using the number of days for which the normal operating state continues as the reward, the action selection unit 12 can learn to select action data that keeps the normal operating state going for longer.
  • the reward determination unit 13 may subtract 1 from the reward for each day that the abnormal state continues, where the abnormal state is the state in which the value of the measurement data is within the range of values representing an abnormality.
  • the action selection unit 12 can learn to select action data for shortening the period in which the abnormality occurrence state continues. Thus, the action selection unit 12 can select action data for further improving the operation efficiency.
  • the action selection unit 12 may set a predetermined time interval as a measurement period. That is, the action selection unit 12 may set state data based on measurement data measured at predetermined time intervals. The action selection unit 12 can take many learning opportunities regardless of the maintenance work interval.
  • the action data may include action data representing a maintenance action of doing nothing.
  • the elevator maintenance work support device 10 can then indicate to the maintenance worker that there is no particular maintenance action to be performed. In particular, when the action selection unit 12 sets a predetermined time interval as the measurement period, the elevator maintenance work support device 10 can present the maintenance worker with the timing of the maintenance action in addition to its contents.
  • the reward determination unit 13 may determine the reward on the assumption that the maintenance action to do nothing is selected.
  • When the action data a_t selected by the action selection unit 12 differs from the action data a_t^m received from the terminal device 20, the action selection unit 12 may update the Q value labeled with the action data a_t^m, based on the reward r that the reward determination unit 13 determined from the measurement data. That is, the Q value may be updated by the following equation (5).
  • the action selection unit 12 may use Sarsa (State-Action-Reward-State-Action) as an algorithm of reinforcement learning. That is, the Q value may be updated by the following equation (6).
  • In this case, for example, the action selection unit 12 does not select action data in S102; instead, immediately after S106, it selects the action data a_{t+1} for time t+1 based on the Q value and s_{t+1}.
  • the action selection unit 12 may perform reinforcement learning using an algorithm for the case where the change in the state of the elevator 1 due to maintenance work is modeled as a partially observable Markov decision process, which takes uncertainty into account.
  • the action selection unit 12 may perform reinforcement learning by using state data as vector data having continuous numerical values as components.
  • When each of the elevators 1 is provided with a plurality of types of sensors 7, the action selection unit 12 may perform reinforcement learning with the state data as vector data whose components are the averages, over the measurement period, of the values measured by the respective sensors.
  • the action selection unit 12 may regard the distribution of values measured by the sensor during the measurement period as a probability distribution, and may perform reinforcement learning with the state data as vector data whose components are the expansion coefficients obtained by a basis function expansion of that probability distribution.
  • the action selection unit 12 may perform function approximation of the Q value using a parameter.
  • the action selection unit 12 uses, for example, a weight vector for each of the action data as a parameter, and calculates the inner product of the weight vector and the vector representing the state data as the Q value labeled with that action data and state data.
  • the action selection unit 12 may calculate the Q value by a neural network using each of the components of the state data as an input layer.
  • the action selection unit 12 may use DQN (Deep Q-Network) which calculates Q value by deep learning with each component of the state data as an input.
  • the reward storage unit 14 may store data on the plurality of elevators 1 collectively.
  • the action selection unit 12 may perform reinforcement learning by sharing Q values for a plurality of elevators 1. As a result, more data can be used for reinforcement learning of a set of Q values than when Q values are used for each of the elevators 1.
  • the sensor 7 may transmit measurement data to the elevator maintenance work support apparatus 10 at predetermined time intervals during the measurement period.
  • the sensor 7 may transmit the measurement data to the elevator maintenance operation support device 10 when the measurement data exceeds a predetermined threshold value during the measurement period.
  • the sensor 7 may transmit the measurement data to the elevator maintenance operation support device 10 during the measurement period if the measurement data is within a range of values representing a predetermined elevator failure.
  • the action selection unit 12 may share the Q value among the plurality of elevators 1 that belong to the same bank.
  • the terminal device 20 may be a portable tablet computer owned by a maintenance worker.
  • the action selection unit 12 may use the result of learning by simulation as the initial value of the Q value.
  • the elevator maintenance work support apparatus 10 may perform presentation and learning of maintenance activities for the elevator 1 provided in one building 2.
  • the elevator maintenance work support apparatus 10 may be disposed in a building 2 in which the elevator 1 is provided.
  • the elevator maintenance work support apparatus 10 may perform presentation and learning of maintenance activities for the elevators 1 provided in the multiple buildings 2.
  • the elevator maintenance work support apparatus 10 may be connected to each of the elevators 1 from the outside of the building 2 provided with the elevators 1 via a network.
  • the communication unit 11 of the elevator maintenance work support device 10 may transmit the action data to the terminal device 20 when a maintenance worker makes a request from the terminal device 20.
  • the elevator maintenance work support device 10 need not store the correspondence between the elevator 1 and the terminal device 20 referred to by the maintenance worker who performs the maintenance work on that elevator 1.
  • FIG. 7 is a diagram showing a hardware configuration of main parts of the elevator maintenance work support device according to the present embodiment.
  • Each function of elevator maintenance work support device 10 can be realized by a processing circuit.
  • the processing circuit comprises at least one processor 10b and at least one memory 10c.
  • the processing circuit may comprise at least one dedicated hardware 10a together with or as an alternative to the processor 10b and the memory 10c.
  • each function of the elevator maintenance work support device 10 is realized by software, firmware, or a combination of software and firmware. At least one of software and firmware is described as a program.
  • the program is stored in the memory 10c.
  • the processor 10b implements each function of the elevator maintenance work support device 10 by reading and executing the program stored in the memory 10c.
  • the processor 10b is also referred to as a CPU (central processing unit), a processing unit, an arithmetic unit, a microprocessor, a microcomputer, or a DSP.
  • the memory 10c is, for example, a nonvolatile or volatile semiconductor memory such as a RAM, ROM, flash memory, EPROM, or EEPROM, or a magnetic disk, flexible disk, optical disk, compact disk, mini disk, DVD, or the like.
  • the processing circuit may be realized by, for example, a single circuit, a complex circuit, a programmed processor, a parallel programmed processor, an ASIC, an FPGA, or a combination thereof.
  • Each function of the elevator maintenance work support device 10 can be realized by its own processing circuit. Alternatively, the functions of the elevator maintenance work support device 10 can be realized collectively by a single processing circuit. Some of the functions of the elevator maintenance work support device 10 may be realized by the dedicated hardware 10a, and the others by software or firmware. For example, the functions of the action selection unit 12 and the reward determination unit 13 may be realized by software or firmware described as a program, and the other units may be realized by the dedicated hardware 10a. Thus, the processing circuit implements each function of the elevator maintenance work support device 10 with the hardware 10a, software, firmware, or a combination thereof.
  • the elevator maintenance work support device can be applied to maintenance work of an elevator system.
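As an illustration of two of the variations listed above, the following Python sketch combines Boltzmann selection with a linear approximation of the Q value (one weight vector per action data, inner product with the state vector). It assumes the standard softmax form P(a | s_t) = exp(Q(s_t, a)/T) / Σ_b exp(Q(s_t, b)/T), which is the usual definition of Boltzmann selection rather than a formula quoted from this text. The schedule `temperature_at`, its constants, and all function names are hypothetical.

```python
import math
import random


def linear_q(weights, state):
    """Q(s, a) for every action a: inner product of that action's
    weight vector with the state vector."""
    return [sum(w_i * s_i for w_i, s_i in zip(w, state)) for w in weights]


def boltzmann_probabilities(q_values, temperature):
    """Softmax of Q / T. Subtracting max(Q) keeps exp() from overflowing
    and does not change the resulting probabilities."""
    if temperature <= 0:
        raise ValueError("temperature T must be positive")
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_at(t, t0=1.0, decay=0.01):
    """Assumed monotonically decreasing schedule T(t) = t0 / (1 + decay * t)."""
    return t0 / (1.0 + decay * t)


def select_action(weights, state, t, rng=random):
    """Draw an action index according to the Boltzmann probabilities at time t."""
    probs = boltzmann_probabilities(linear_q(weights, state), temperature_at(t))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With a large T the choice is close to uniform, and as T decreases the selection concentrates on the action data with the highest Q value, which is why a decreasing schedule shifts the behavior from exploration toward exploitation.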

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Indicating And Signalling Devices For Elevators (AREA)
  • Maintenance And Inspection Apparatuses For Elevators (AREA)

Abstract

The purpose of this invention is to provide an elevator maintenance work assistance device that can provide a maintenance worker with concrete information on a maintenance action for reducing the probability of the elevator failing on the basis of measurement data from a measurement device provided in the elevator. The elevator maintenance work assistance device comprises a communication unit and an action selection unit. The communication unit transmits action data that indicates a maintenance action for a maintenance worker to a terminal device that displays the details of the maintenance action indicated in the received action data. The action selection unit sets status data that indicates the status of each of one or more elevators on the basis of measurement data measured during a measurement period with a measurement device provided in each of the one or more elevators, carries out reinforcement learning using the action data and the status data, selects action data on the basis of the reinforcement learning results and the measurement data for after the measurement period in which the measurement data is measured, and causes the communication unit to transmit the selected action data.

Description

Elevator maintenance work support device
The present invention relates to an elevator maintenance work support device.
Patent Document 1 discloses an example of an elevator maintenance work support device. The elevator maintenance work support device acquires state data of an elevator from measurement data of a measurement device provided in the elevator. A range of state data that indicates a sign of failure is set in the device in advance. The device determines whether there is a sign of elevator failure based on whether the state data is within that range.
Patent Document 1: Japanese Patent Application Laid-Open No. H8-104473
However, the elevator maintenance work support device disclosed in Patent Document 1 goes no further than notifying of the sign of elevator failure that it has determined. Therefore, when the device determines that there is a sign of elevator failure, it cannot specifically present to the maintenance worker a maintenance action for reducing the probability that the elevator transitions to a failure state.
The present invention has been made to solve this problem. An object of the present invention is to provide an elevator maintenance work support device that can specifically present to a maintenance worker a maintenance action for reducing the probability of elevator failure, based on measurement data from a measurement device provided in the elevator.
The elevator maintenance work support device according to the present invention includes a communication unit and an action selection unit. The communication unit is provided so as to be able to communicate with a sensor of an elevator and with a terminal device used by a maintenance worker, receives measurement data of the elevator measured by the sensor during a predetermined measurement period, and receives action data of the maintenance worker from the terminal device. The action selection unit sets state data representing the state of each of one or more elevators based on measurement data measured during the measurement period by a measurement device provided in each of the one or more elevators, performs reinforcement learning using the action data and the state data, selects action data for after the measurement period in which the measurement data was measured based on the result of the reinforcement learning and the measurement data, and causes the communication unit to transmit the selected action data.
According to the present invention, the communication unit is provided so as to be able to communicate with the sensor of the elevator and with the terminal device used by the maintenance worker, receives measurement data of the elevator measured by the sensor during a predetermined measurement period, and receives action data of the maintenance worker from the terminal device. The action selection unit sets state data representing the state of each of one or more elevators based on measurement data measured during the measurement period by a measurement device provided in each of the one or more elevators, performs reinforcement learning using the action data and the state data, selects action data for after the measurement period in which the measurement data was measured based on the result of the reinforcement learning and the measurement data, and causes the communication unit to transmit the selected action data. As a result, a maintenance action for reducing the probability of elevator failure can be specifically presented to the maintenance worker based on the measurement data of the measurement device provided in the elevator.
FIG. 1 is a block diagram showing the configuration of a system including the elevator maintenance work support device according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing the configuration of the elevator maintenance work support device according to Embodiment 1 of the present invention.
FIG. 3 is a diagram showing an example of the data stored by the reward storage unit according to Embodiment 1 of the present invention.
FIG. 4 is a flowchart showing an example of the operation of the elevator maintenance work support device according to Embodiment 1 of the present invention.
FIG. 5 is a flowchart showing an example of the operation of the elevator maintenance work support device according to Embodiment 1 of the present invention when determining the reward.
FIG. 6 is a flowchart showing another example of the operation of the reward determination unit according to a modification of Embodiment 1 of the present invention.
FIG. 7 is a diagram showing the hardware configuration of the main parts of the elevator maintenance work support device according to Embodiment 1 of the present invention.
Embodiments for carrying out the present invention will be described with reference to the accompanying drawings. In the drawings, identical or corresponding parts are denoted by the same reference numerals, and duplicate descriptions are simplified or omitted as appropriate.
Embodiment 1
FIG. 1 is a block diagram showing the configuration of a system including the elevator maintenance work support device according to Embodiment 1 of the present invention.
The system shown in FIG. 1 includes one or more elevators 1, an elevator maintenance work support device 10, and a terminal device 20.
The one or more elevators 1 are provided in a building 2.
The hoistway 3 of the elevator 1 passes through each floor of the building 2. A landing 4 of the elevator 1 is provided on each floor of the building 2. The landing 4 has a landing entrance on each floor. The landing entrance corresponds to the hoistway 3 and faces that hoistway 3.
The elevator 1 includes a car 5, a control device 6, a sensor 7, and a communication device 8.
The car 5 is provided inside the hoistway 3. The car 5 is a device that moves up and down inside the hoistway 3 under the power of a hoisting machine (not shown). When the building 2 has a plurality of elevators 1, the cars 5 of the elevators 1 are arranged so as not to overlap one another in horizontal projection.
The car 5 has a door (not shown). The door of the car 5 opens and closes so that users can enter and exit the car 5 while the car 5 is stopped. The door of the car 5 is opened and closed by a drive device (not shown) provided in the car 5.
The landing 4 has a door (not shown). The door of the landing 4 is provided at the landing entrance. The door of the landing 4 opens and closes so that users can enter and exit the car 5 while the car 5 is stopped at the floor on which that landing 4 is provided. The door of the landing 4 opens and closes in conjunction with the door of the car 5.
The control device 6 controls the operation of the corresponding car 5. The operation of the car 5 includes, for example, raising and lowering the car and opening and closing its door. When the building 2 has a plurality of elevators 1, the control device 6 is connected to a group management device 9.
The group management device 9 controls the assignment of calls to the cars 5 controlled by the control devices 6 connected to it.
The sensor 7 is a device that measures the state of the elevator 1. The sensor 7 is an example of a measurement device. The sensor 7 is, for example, an internal sensor provided in the car 5 or the hoisting machine of each elevator 1. Alternatively, the sensor 7 is, for example, an external sensor provided in the hoistway 3 or the landing 4 corresponding to the car 5 of each elevator 1.
The internal sensor is, for example, a device that measures the position, speed, acceleration, or vibration of the car 5, the opening and closing speed or vibration of the door of the car 5, or the torque or vibration of the hoisting machine. The external sensor is, for example, a device that measures the temperature or humidity of the hoistway 3, or the opening and closing speed or vibration of the door of the landing 4.
The sensor 7 is connected directly or indirectly to the communication device 8 of the elevator 1 so that it can transmit its measurement results as measurement data.
The communication device 8 of the elevator 1 transmits the measurement data measured by the sensor 7. The communication device 8 is connected directly or indirectly to the elevator maintenance work support device 10.
The elevator maintenance work support device 10 presents maintenance actions to the maintenance worker of the elevator 1 based on the measurement data of the sensor 7 received from the communication device 8 of the elevator 1. The elevator maintenance work support device 10 is, for example, a server computer.
A maintenance action is a specific action that a maintenance worker can perform in the maintenance work of the elevator 1. Maintenance actions include, for example, replacing parts of the hoisting machine, inspecting parts of the car 5, and adjusting the door of the landing 4.
The elevator maintenance work support device 10 is connected to the terminal device 20 so that the two can communicate by wire or wirelessly.
The terminal device 20 displays the contents of the maintenance action that the elevator maintenance work support device 10 presents as action data, and transmits the action data selected by the maintenance worker to the elevator maintenance work support device 10.
The action data is data representing each maintenance action. The action data is, for example, an integer maintenance action code associated with each of M types of maintenance actions.
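As a minimal sketch of such integer maintenance action codes, the three example maintenance actions named above could be encoded as follows. The enum name, the member names, and the particular code values are hypothetical and chosen only for illustration.

```python
from enum import IntEnum


class MaintenanceAction(IntEnum):
    """Hypothetical integer codes for the example maintenance actions."""
    REPLACE_HOISTING_MACHINE_PARTS = 0
    INSPECT_CAR_PARTS = 1
    ADJUST_LANDING_DOOR = 2


# M is the number of maintenance action types.
M = len(MaintenanceAction)


def describe(code):
    """Turn a received action code into text that a terminal could display."""
    return MaintenanceAction(code).name.replace("_", " ").lower()
```

A terminal device receiving the code 2 would then display `describe(2)`, and an unknown code raises `ValueError` instead of being silently shown.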
The terminal device 20 includes a communication unit 21, a display unit 22, and an input unit 23. The terminal device 20 is, for example, a personal computer.
The communication unit 21 of the terminal device 20 is connected directly or indirectly to the elevator maintenance work support device 10. The communication unit 21 exchanges action data with the elevator maintenance work support device 10. The function of the communication unit 21 is realized by, for example, a network card.
The display unit 22 displays the contents of the maintenance action represented by the action data received from the elevator maintenance work support device 10 via the communication unit 21 of the terminal device 20. The function of the display unit 22 is realized by, for example, a display.
The input unit 23 accepts input of the action data selected by the maintenance worker. The function of the input unit 23 is realized by, for example, a keyboard and a mouse. The action data accepted by the input unit 23 is transmitted to the elevator maintenance work support device 10 via the communication unit 21 of the terminal device 20.
The elevator maintenance work support device 10 selects action data based on the measurement data received from the sensor 7. By transmitting the selected action data to the terminal device 20, the elevator maintenance work support device 10 presents the maintenance action to the maintenance worker via the display unit 22. The elevator maintenance work support device 10 performs reinforcement learning using the action data input by the maintenance worker via the input unit 23 of the terminal device 20, and reflects the result of the reinforcement learning in its next selection of action data.
Next, the functions of the elevator maintenance work support device according to the present embodiment will be described. FIG. 2 is a block diagram showing the functions of the elevator maintenance work support device according to the present embodiment.
The elevator maintenance work support device 10 includes a communication unit 11, an action selection unit 12, a reward determination unit 13, and a reward storage unit 14.
The communication unit 11 of the elevator maintenance work support device 10 is connected directly or indirectly to each of the sensors 7. The communication unit 11 is connected to the terminal device 20 by wire or wirelessly. The communication unit 11 handles communication with the sensors 7 and the terminal device 20.
The action selection unit 12 selects action data based on the measurement data measured by the sensor 7.
The reward determination unit 13 determines the reward for the action data selected by the action selection unit 12.
The reward is numerical data representing an evaluation of the result obtained from the action data selected by the action selection unit 12.
The reward storage unit 14 stores the reward determined by the reward determination unit 13.
 通信部11は、センサ7の各々から測定データを受信する。 The communication unit 11 receives measurement data from each of the sensors 7.
 測定データは、センサ7によって、測定期間の間に測定される。 Measurement data is measured by the sensor 7 during the measurement period.
 測定期間は、例えば保守員によるエレベータ1の各々に対する保守行動の間の期間である。この場合、保守行動は保守作業における具体的な行動なので、測定期間は、保守作業の間の期間である。 The measurement period is, for example, a period between maintenance actions on each of the elevators 1 by the maintenance staff. In this case, since the maintenance action is a concrete action in the maintenance work, the measurement period is a period between the maintenance work.
Based on the measurement data received by the communication unit 11 of the elevator maintenance work support device 10, the action selection unit 12 sets state data for each of the elevators 1.
State data set from measurement data measured during a measurement period is the state data of that measurement period.
The action selection unit 12 sets the state data of the measurement period in which the measurement data was measured, for example, as follows. In the following, the sensor 7 provided in each elevator 1 is assumed to be a single sensor of the same type. That is, the measurement data of the sensor 7 provided in each elevator 1 is treated as a numerical value.
The action selection unit 12 calculates the average of the measurement data over the measurement period. The action selection unit 12 stores in advance N intervals that divide the range of possible average values into N parts. Each of the N intervals corresponds to one of N state codes. The action selection unit 12 sets, as the state data of the measurement period in which the measurement data was measured, the state code corresponding to the interval that contains the calculated average.
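The mapping from a period average to a state code described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `state_code`, the representation of the N intervals as a sorted list of N-1 interior boundaries, and the boundary values are all assumptions made for the example.

```python
from bisect import bisect_right
from statistics import mean


def state_code(samples, edges):
    """Map the average of `samples` over one measurement period to a state code.

    `edges` holds the N-1 interior boundaries of the N intervals in ascending
    order (a hypothetical way of storing the intervals). The returned code is
    an integer in 0..N-1.
    """
    avg = mean(samples)
    # bisect_right counts how many boundaries lie at or below the average,
    # which is the index of the interval containing it.
    return bisect_right(edges, avg)


# Made-up example: N = 4 intervals for a stop-position error in mm.
edges = [0.5, 1.0, 2.0]
code = state_code([0.2, 0.4, 0.3], edges)
```

A value that falls exactly on a boundary is assigned to the upper interval here; the text does not say which side a boundary belongs to.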
The action selection unit 12 learns, through reinforcement learning, how the state of the elevator 1 changes as a result of maintenance work. Specifically, the action selection unit 12 performs reinforcement learning using, for example, the Q-learning algorithm, with the change in the state of the elevator 1 due to maintenance work modeled as a Markov decision process.
The action selection unit 12 performs reinforcement learning with each selection of action data labeled by a time t. The time t is an integer that is 0 at the start of reinforcement learning and increases by 1 each time the action selection unit 12 selects action data. In the following, a subscript t denotes a value or data corresponding to the selection of action data at time t.
The action selection unit 12 stores Q values for each of the elevators 1. A Q value is numerical data Q(s, a) labeled by state data s and action data a.
As reinforcement learning progresses, Q(s, a) converges to the expected value of the gain obtained when action data a is selected for state data s and action data is thereafter always selected so as to obtain the highest reward.
The gain V_t at time t is expressed by the following equation (1), where r_t is the reward at time t and γ is the discount rate. The discount rate γ is a predetermined value satisfying 0 ≤ γ < 1.
Figure JPOXMLDOC01-appb-M000001
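The image for equation (1) is not reproduced in this text. A form consistent with the definitions of the gain, the reward at time t, and the discount rate γ would be the usual discounted sum of rewards shown below. The indexing, with the reward at time t as the first term, is an assumption.

```latex
V_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k}, \qquad 0 \le \gamma < 1
```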
The action selection unit 12 sets the state data s_t of each elevator 1 based on the measurement data of that elevator 1 measured before the selection of action data at time t. Based on the Q values for each elevator 1 and the state data s_t thus set, the action selection unit 12 selects action data a_t for that elevator 1.
The action selection unit 12 selects action data by, for example, the ε-greedy method. That is, with a predetermined probability ε, the action selection unit 12 selects action data a_t at random from among the M items of action data. With probability 1-ε, the action selection unit 12 selects, according to the following equation (2), the action data a_t that maximizes the Q value labeled by the state data s_t.
Figure JPOXMLDOC01-appb-M000002
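The ε-greedy selection can be sketched as follows. The greedy branch presumably corresponds to a_t = argmax_a Q(s_t, a), although the image for equation (2) is not reproduced here. The dictionary-based Q table, the default Q value of 0.0 for unseen pairs, the tie-breaking rule, and the function name are assumptions for illustration.

```python
import random


def epsilon_greedy(q, state, n_actions, epsilon, rng=random):
    """Return the index of the action data selected for `state`.

    `q` maps (state, action) pairs to Q values; pairs that are missing are
    treated as 0.0 (an assumption, the text does not give initial values).
    """
    if rng.random() < epsilon:
        # Explore: choose uniformly among the M items of action data.
        return rng.randrange(n_actions)
    # Exploit: choose the action with the largest Q(state, a).
    # Ties go to the lowest index, which is also an assumption.
    return max(range(n_actions), key=lambda a: q.get((state, a), 0.0))


q_table = {(0, 0): 0.1, (0, 1): 0.9, (0, 2): 0.3}
greedy_choice = epsilon_greedy(q_table, state=0, n_actions=3, epsilon=0.0)
```

With ε set to 0 the function always exploits, and with ε set to 1 it always explores.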
The action selection unit 12 transmits the selected action data a_t to the terminal device 20 via the communication unit 11 of the elevator maintenance work support device 10.
The communication unit 11 of the elevator maintenance work support device 10 receives, from the terminal device 20, action data a_t^m representing the maintenance action that the maintenance worker entered as having been performed in the maintenance work. In the following, a superscript m denotes data representing a maintenance action performed by the maintenance worker.
The action selection unit 12 sets the next state data s_{t+1} based on the measurement data of the measurement period that follows the selection of action data at time t.
The reward determination unit 13 determines whether the action data a_t selected by the action selection unit 12 is the same as the action data a_t^m received from the terminal device 20.
When the action data a_t and the action data a_t^m are the same, the reward determination unit 13 determines the reward r based on the measurement data of the measurement period that follows the selection of action data at time t.
The reward determination unit 13 determines the reward r from the measurement data, for example, as follows. When the maximum over the measurement period of the absolute difference between the measured value and a value representing the normal state is smaller than a predetermined value, the reward determination unit 13 sets the reward r to +1. When the minimum over the measurement period of the absolute difference between the measured value and a value representing an abnormal state is smaller than a predetermined value, the reward determination unit 13 sets the reward r to -1.
As a more specific example, when the measurement data is the stop position of the car 5, the reward determination unit 13 takes the set position as the value of the normal state and sets the reward r to +1 if the absolute difference between the stop position and the set position does not exceed the predetermined value of 1 mm during the measurement period.
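The threshold rule above can be sketched as follows. The function and parameter names are hypothetical. The text does not say what reward applies when neither condition holds, so the `default` value of 0 used for that case is an assumption.

```python
def reward_from_measurements(samples, normal, abnormal,
                             tol_normal, tol_abnormal, default=0):
    """Return the reward for one measurement period.

    +1 if every sample stays within `tol_normal` of the normal-state value,
    -1 if at least one sample comes within `tol_abnormal` of the
    abnormal-state value, and `default` otherwise (assumed, not in the text).
    """
    if max(abs(x - normal) for x in samples) < tol_normal:
        return 1
    if min(abs(x - abnormal) for x in samples) < tol_abnormal:
        return -1
    return default


# Made-up stop-position errors in mm, with the set position as 0.0.
r_good = reward_from_measurements([0.2, -0.5, 0.8], normal=0.0, abnormal=10.0,
                                  tol_normal=1.0, tol_abnormal=1.0)
```

The normal-state check is evaluated first, so a period that satisfies both conditions would receive +1. The text does not address that overlap.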
The reward storage unit 14 stores, for the elevator 1 for which the action selection unit 12 selected the action data a_t, the state data s_t and the action data a_t at time t in association with the reward r determined by the reward determination unit 13.
When the action data a_t and the action data a_t^m differ, the reward determination unit 13 determines whether the reward storage unit 14 stores, for another elevator, a reward r1 in association with the action data a_t and the state data s_{t+1}.
When the reward storage unit 14 stores a reward r1 for another elevator in association with the action data a_t and the state data s_{t+1}, the reward determination unit 13 sets the reward r for the selection of the action data a_t by the action selection unit 12 to r1.
The reward storage unit 14 stores, for the elevator 1 for which the action selection unit 12 selected the action data a_t, the state data s_t and the action data a_t at time t in association with the reward r determined by the reward determination unit 13.
When the reward storage unit 14 does not store a reward r1 for another elevator in association with the action data a_t and the state data s_{t+1}, the reward determination unit 13 sets the reward r for the selection of the action data a_t by the action selection unit 12 to a predetermined value. As a specific example, the reward storage unit 14 sets the reward r for the action data a_t of the action selection unit 12 to 0.
The action selection unit 12 updates the Q value based on the reward r determined by the reward determination unit 13. Specifically, the action selection unit 12 updates Q(s_t, a_t) according to the following equation (3), where α is the learning rate.
Figure JPOXMLDOC01-appb-M000003
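The image for equation (3) is not reproduced in this text. Assuming the standard Q-learning update, which matches the learning rate α, the discount rate γ, and the use of the next state data s_{t+1} described in this section, the update would read as follows. Here r_t stands for the reward r determined for the selection at time t.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```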
At time t+1, the action selection unit 12 selects the next action data a_{t+1} for each elevator 1 based on the updated Q values for that elevator 1 and the state data s_{t+1} that has been set.
Next, the data stored in the reward storage unit 14 will be described. FIG. 3 is a diagram showing an example of the data stored in the reward storage unit 14 according to the present embodiment.
For each of the elevators 1, the reward storage unit 14 stores combinations of the N kinds of state data and the M kinds of action data in association with rewards, which are numerical data. For example, the reward storage unit 14 stores action data A1 and state data S2, which is the state data after the action data A1 was selected, in association with a reward R12. The reward storage unit 14 may additionally store the state data from before the action data was selected.
Next, the operation of the elevator maintenance work support device 10 will be described. FIG. 4 is a flowchart showing an example of the operation of the elevator maintenance work support device according to the present embodiment.
The elevator maintenance work support device 10 performs the operation shown in FIG. 4 for each of the elevators 1.
The action selection unit 12 sets the current time t to 0. The action selection unit 12 then sets the state data s_0 based on the measurement data (S101).
The action selection unit 12 selects the action data a_t for the current time t based on the Q values and s_t (S102).
The communication unit 11 of the elevator maintenance work support device 10 transmits the action data a_t selected by the action selection unit 12 to the terminal device 20 (S103).
The terminal device 20 displays, on the display unit 22, the content of the maintenance action represented by the received action data a_t. The maintenance worker refers to the content displayed on the display unit 22 to decide which maintenance action to actually perform in the maintenance work. After the maintenance work is finished, the maintenance worker enters, through the input unit 23, action data a_t^m representing the maintenance action actually performed. The communication unit 21 of the terminal device 20 transmits the entered action data a_t^m to the elevator maintenance work support device 10.
The communication unit 11 of the elevator maintenance work support device 10 receives the action data a_t^m from the terminal device 20 (S104).
The action selection unit 12 sets the state data s_{t+1} based on the measurement data of the measurement period after the maintenance work (S105).
The reward determination unit 13 determines the reward r for the selection of the action data a_t by the action selection unit 12 (S106).
The action selection unit 12 updates the Q value using the reward r determined by the reward determination unit 13 and the state data s_{t+1} (S107).
The action selection unit 12 advances the current time t to the next time t+1 (S108). The operation of the elevator maintenance work support device 10 then returns to S102.
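The loop S101 to S108 can be sketched as follows for a single elevator. This is an illustration under several assumptions: the callables `observe_state`, `select`, `exchange`, and `decide_reward` are hypothetical stand-ins for the sensors, the action selection, the round trip to the terminal device 20, and the reward determination. The Q update assumes the standard Q-learning form, and the finite `steps` bound exists only so that the sketch terminates.

```python
def q_update(q, s, a, r, s_next, n_actions, alpha, gamma):
    """Apply one Q-learning update to the table `q` in place (assumed form)."""
    best_next = max(q.get((s_next, b), 0.0) for b in range(n_actions))
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)


def run(q, observe_state, select, exchange, decide_reward,
        steps, n_actions, alpha=0.1, gamma=0.9):
    """Run `steps` iterations of the S102-S108 loop and return the Q table."""
    s = observe_state()                      # S101: set s_0 from measurements
    for _ in range(steps):
        a = select(q, s)                     # S102: choose a_t from Q and s_t
        a_m = exchange(a)                    # S103/S104: send a_t, receive a_t^m
        s_next = observe_state()             # S105: set s_{t+1}
        r = decide_reward(a, a_m, s_next)    # S106: determine reward r
        q_update(q, s, a, r, s_next, n_actions, alpha, gamma)  # S107
        s = s_next                           # S108: advance to time t+1
    return q


# Toy run with stub callables and made-up states.
states = iter([0, 1, 0, 1])
table = run({}, lambda: next(states), lambda q, s: 0, lambda a: a,
            lambda a, a_m, s_next: 1.0, steps=3, n_actions=2,
            alpha=0.5, gamma=0.0)
```

In the device described here the loop has no stated end and runs separately for each elevator 1.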
Next, the operation of the elevator maintenance work support device 10 in determining the reward in S106 will be described. FIG. 5 is a flowchart showing an example of the operation of the elevator maintenance work support device in determining the reward according to the present embodiment.
The reward determination unit 13 determines whether the action data a_t selected by the action selection unit 12 is the same as the action data a_t^m received from the terminal device 20 (S201). If the result is Yes, the operation of the elevator maintenance work support device 10 proceeds to S202. If the result is No, the operation of the elevator maintenance work support device 10 proceeds to S204.
The reward determination unit 13 determines the reward r based on the measurement data (S202).
The reward storage unit 14 stores, for the elevator for which the action selection unit 12 selected the action data a_t, the action data a_t and the state data s_{t+1} in association with the reward r (S203).
The reward determination unit 13 determines whether the reward storage unit 14 stores, for another elevator, the action data a_t and the state data s_{t+1} in association with a reward r1 (S204). If the result is Yes, the operation of the elevator maintenance work support device 10 proceeds to S205. If the result is No, the operation of the elevator maintenance work support device 10 proceeds to S206.
The reward determination unit 13 obtains the reward r1 from the reward storage unit 14 and sets the reward r for the selection of the action data a_t by the action selection unit 12 to r1 (S205).
The reward determination unit 13 sets the reward r for the selection of the action data a_t by the action selection unit 12 to 0 (S206).
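The branch structure of S201 to S206 can be sketched as follows. The nested-dictionary layout of the reward store, the function name, and the `measured_reward` callable are assumptions. When several other elevators hold a matching entry, the text does not say which one is used, so this sketch simply returns the first one found.

```python
def decide_reward(elevator, a, a_m, s_next, store, measured_reward):
    """Return the reward r for selecting action `a` on `elevator`.

    `store` maps an elevator id to a dict from (action, next_state) to a
    reward (a hypothetical layout for the reward storage unit 14).
    `measured_reward` is a callable returning the reward derived from the
    measurement data.
    """
    if a == a_m:                                         # S201: Yes
        r = measured_reward()                            # S202
        store.setdefault(elevator, {})[(a, s_next)] = r  # S203
        return r
    for other, table in store.items():                   # S204
        if other != elevator and (a, s_next) in table:
            return table[(a, s_next)]                    # S205: reuse r1
    return 0                                             # S206


store = {}
r_first = decide_reward("E1", 2, 2, 5, store, lambda: 1)
r_borrowed = decide_reward("E2", 2, 0, 5, store, lambda: -1)
```

In the second call the selected and performed actions differ, so the reward recorded for elevator "E1" under the same action and next state is reused for elevator "E2".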
As described above, in the elevator maintenance work support device according to the present embodiment, the action selection unit 12 performs reinforcement learning using action data and state data that is set based on measurement data measured by the sensor 7 during a measurement period. The action selection unit 12 selects action data based on the state data and the Q values obtained as a result of the reinforcement learning. The communication unit 11 of the elevator maintenance work support device 10 transmits the action data selected by the action selection unit 12 to the terminal device 20.
The maintenance worker can thereby learn, through the terminal device 20, the content of the maintenance action represented by the action data transmitted by the elevator maintenance work support device 10. In addition, the action selection unit 12 selects action data based on reinforcement learning. The elevator maintenance work support device 10 can therefore present to the maintenance worker, based on the measurement data of the sensor 7 provided in the elevator 1, a concrete maintenance action for reducing the probability that the elevator fails. This can improve the operating efficiency of the elevator.
The action selection unit 12 selects action data based on the result of reinforcement learning. That is, the action selection unit 12 learns, from the measurement data obtained after each selection, which action data should be selected to avoid falling into a failure state. Consequently, once reinforcement learning has progressed, the action selection unit 12 can select action data for avoiding a failure even when the mechanism of that failure is not clearly understood.
When determining the reward, the reward determination unit 13 refers to the data stored in the reward storage unit 14 if the action data selected by the action selection unit 12 does not match the action data representing the maintenance action performed by the maintenance worker. The action selection unit 12 performs reinforcement learning using the reward determined by the reward determination unit 13. The action selection unit 12 can thus perform reinforcement learning even when the maintenance action represented by the selected action data does not match the maintenance action performed by the maintenance worker.
The action selection unit 12 performs reinforcement learning for each of the elevators 1. The reward determination unit 13 determines, for each of the elevators 1, the reward for the selection made by the action selection unit 12. The reward storage unit 14 stores data for each of the elevators 1. The action selection unit 12 can thus select action data that reflects the influence of operating conditions, installation conditions, and the like, which differ from one elevator 1 to another.
When the action data selected by the action selection unit 12 for one elevator 1 does not match the action data representing the maintenance action performed by the maintenance worker, the reward determination unit 13 refers to data on other elevators 1 stored in the reward storage unit 14. The amount of data that the reward determination unit 13 can refer to is thus larger than when only data on the same elevator is referred to. The reward determination unit 13 can therefore determine a reward in more cases than when it does not refer to the data stored in the reward storage unit 14. Accordingly, the action selection unit 12 gains more opportunities for reinforcement learning than when the data stored in the reward storage unit 14 is not referred to.
Next, another example of the operation of the elevator maintenance work support device 10 in determining the reward in S106 will be described. FIG. 6 is a flowchart showing another example of the operation of the elevator maintenance work support device in determining the reward according to the present embodiment.
The following describes in detail the features that differ from the example operation shown in FIG. 5. Specifically, the elevator maintenance work support device 10 determines the reward by S214 and S215 in place of S204 and S205 of the example operation shown in FIG. 5. For features not described below, the elevator maintenance work support device 10 adopts, for example, the same features as in the example operation shown in FIG. 5.
When the action data a_t selected for an elevator 1 differs from the action data a_t^m, the reward determination unit 13 determines whether the reward storage unit 14 stores, for that same elevator, a reward r2 in association with the action data a_t and the state data s_{t+1} (S214). If the result is Yes, the operation of the elevator maintenance work support device proceeds to S215. If the result is No, the operation of the elevator maintenance work support device proceeds to S206.
The reward determination unit 13 obtains from the reward storage unit 14 the reward r2, which is past data for that elevator 1, and sets the reward r for the selection of the action data a_t by the action selection unit 12 to r2 (S215).
As described above, when the action data selected by the action selection unit 12 for one elevator 1 does not match the action data representing the maintenance action performed by the maintenance worker, the reward determination unit 13 refers to the data on that elevator 1 stored in the reward storage unit 14. The action selection unit 12 can thus perform reinforcement learning based on data on that elevator 1 even when the maintenance action represented by the selected action data does not match the maintenance action performed by the maintenance worker. The action selection unit 12 can therefore improve the suitability of the data used for reinforcement learning.
Instead of the average of the measurement data over the measurement period, the action selection unit 12 may set the state data based on a value calculated by regarding the distribution of values measured by the sensor 7 during the measurement period as a probability distribution. The action selection unit 12 may set the state data based on the mode, maximum, minimum, median, interquartile range, a moment, or a cumulant of that probability distribution.
The action selection unit 12 may set the state data based on the result of frequency analysis performed on the measurement data as time-series data.
The sensor 7 provided in each elevator 1 may be a plurality of sensors of a plurality of types. When there are d types of sensor 7, the measurement data of the sensor 7 may be vector data having d numerical values as components.
When the measurement data is vector data, the action selection unit 12 sets the state data of the measurement period in which the measurement data was measured, for example, as follows. The action selection unit 12 calculates, for each of the d components of the measurement data, the average over the measurement period. For each of the d components of the measurement data, the action selection unit 12 stores in advance L intervals that divide the range of possible average values into L parts. Each combination of the intervals containing the d components of the measurement data is identified by one of L^d labels. With L^d = N, each of the N labels corresponds to one of the N state codes. The action selection unit 12 sets, as the state data of the measurement period in which the measurement data was measured, the state code corresponding to the label that contains the calculated averages.
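The combination of d per-component interval indices into one of L^d labels can be sketched as a mixed-radix encoding. The function name, the boundary lists, and the choice of numbering the labels from 0 are assumptions. The per-component averages are taken as already computed.

```python
from bisect import bisect_right


def vector_state_code(means, edges_per_component):
    """Combine per-component interval indices into a single state code.

    `means` holds the d per-component averages over the measurement period.
    `edges_per_component[i]` holds the L-1 interior boundaries for component
    i in ascending order. The result is an integer in 0..L**d - 1.
    """
    code = 0
    for value, edges in zip(means, edges_per_component):
        n_intervals = len(edges) + 1               # L for this component
        index = bisect_right(edges, value)         # interval index, 0..L-1
        code = code * n_intervals + index          # append one base-L digit
    return code


# Made-up example with d = 2 components and L = 3 intervals each (N = 9).
edges = [[1.0, 2.0], [10.0, 20.0]]
label = vector_state_code([1.5, 25.0], edges)
```

Each distinct combination of intervals receives a distinct code, so the codes can index the same Q table as in the single-sensor case.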
The action selection unit 12 may select action data by the Boltzmann selection method. That is, the action selection unit 12 selects the action data a_t according to the conditional probability P(a|s_t) given that the state data is s_t. The conditional probability P(a|s_t) is given by the following equation (4), where the parameter T is a predetermined positive value. The action selection unit 12 may vary the parameter T as a predetermined monotonically decreasing function of the time t.
Figure JPOXMLDOC01-appb-M000004
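The image for equation (4) is not reproduced in this text. The sketch below assumes the standard Boltzmann (softmax) form, in which P(a|s_t) is proportional to exp(Q(s_t, a)/T). The function names are hypothetical, and subtracting the maximum Q value is a numerical-stability step that leaves the probabilities unchanged.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Return selection probabilities proportional to exp(Q / T)."""
    top = max(q_values)
    # Shifting by the maximum avoids overflow and cancels in the ratio.
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_select(q_values, temperature, rng=random):
    """Draw one action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs)[0]


# Q values for one state s_t over M = 3 actions (made-up numbers).
probs = boltzmann_probs([0.0, 1.0, 0.5], temperature=1.0)
choice = boltzmann_select([0.0, 1.0, 0.5], temperature=1.0)
```

A large T makes the selection close to uniform, and a small T concentrates it on the action with the largest Q value, which is why a decreasing T shifts the behavior from exploration toward exploitation.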
When the measurement data is vector data, the reward determination unit 13 may set the reward r to, for example, the average or the minimum over the measurement period of the distance between the measurement data vector and a vector representing an abnormal state. The reward determination unit 13 may set the reward r to, for example, the average or the minimum over the measurement period of the value obtained by multiplying the distance between the measurement data vector and a vector representing the normal state by a negative constant.
A vector representing the normal state is, for example, a vector whose components are the design values of the quantities to which the respective components correspond. A vector representing an abnormal state is, for example, when the abnormal state is known, a vector whose components are the values indicated by the respective sensors 7 in that abnormal state.
The distance between two vectors is, for example, a norm of their difference, such as the Euclidean norm or the maximum norm.
The reward determination unit 13 may regard the distribution of values measured by the sensor 7 during the measurement period as a probability distribution and set the reward r to its Kullback-Leibler distance from a probability distribution representing the normal state. The probability distribution representing the normal state is, for example, a normal distribution whose mean is the design value and whose standard deviation is the measurement error.
 行動選択部12は、測定データに直接基づかない報酬を用いて強化学習を行ってもよい。例えば、行動選択部12は、保守行動に応じたコストおよびエレベータ1の故障によって生じた損失を負の報酬として強化学習を行ってもよい。行動選択部12は、エレベータの故障を避けつつコストの高い行動データを無用に選択しないような学習を行うことができる。 The action selection unit 12 may perform reinforcement learning using a reward that is not directly based on measurement data. For example, the action selection unit 12 may perform reinforcement learning with the cost according to the maintenance action and the loss caused by the failure of the elevator 1 as a negative reward. The action selection unit 12 can perform learning so as not to unnecessarily select high-cost action data while avoiding a failure of the elevator.
 報酬決定部13は、測定データの値が正常状態を表す値の範囲にある場合を正常運転状態であるとして、正常運転状態が1日継続するごとに報酬に1を加算してもよい。正常運転状態が継続している日数を報酬とすることで、行動選択部12は、正常運転状態をより長い時間継続させるための行動データを選択するように学習できる。報酬決定部13は、測定データの値が異常状態を表す値の範囲にある場合を異常発生状態であるとして、異常発生状態が1日継続するごとに報酬から1を減算してもよい。異常発生状態が継続している日数を負の報酬とすることで、行動選択部12は、異常発生状態が継続している期間をより短くするための行動データを選択するように学習できる。これにより、行動選択部12は、より運用効率を高めるための行動データを選択できる。 The reward determination unit 13 may add 1 to the reward every time the normal driving state continues, assuming that the measured data value is in the range of the value indicating the normal state as the normal driving state. By using the number of days in which the normal driving state continues as the reward, the action selecting unit 12 can learn to select the action data for continuing the normal driving state for a longer time. The reward determination unit 13 may subtract 1 from the reward every time the abnormality occurrence state continues for one day, assuming that the value of the measurement data is in the range of the value representing the abnormality, as the abnormality occurrence state. By setting the number of days in which the abnormality occurrence state continues as the negative reward, the action selection unit 12 can learn to select action data for shortening the period in which the abnormality occurrence state continues. Thus, the action selection unit 12 can select action data for further improving the operation efficiency.
 行動選択部12は、予め定められた時間間隔を測定期間としてもよい。すなわち、行動選択部12は、予め定められた時間間隔毎に測定される測定データに基づいて状態データを設定してもよい。保守作業の間隔によらず、行動選択部12は、学習の機会を多くとることができる。 The action selection unit 12 may set a predetermined time interval as a measurement period. That is, the action selection unit 12 may set state data based on measurement data measured at predetermined time intervals. The action selection unit 12 can take many learning opportunities regardless of the maintenance work interval.
 行動データは、何もしない保守行動を表す行動データであってもよい。エレベータ保守作業支援装置10は、具体的に行うべき保守行動が特にないということを保守員に提示できる。特に行動選択部12が予め定められた時間間隔を測定期間とする場合に、エレベータ保守作業支援装置10は、保守行動の内容に加えて保守行動を行う時期もあわせて保守員に提示できる。 The behavior data may be behavior data representing a conservative behavior that does nothing. The elevator maintenance operation support device 10 can present maintenance personnel that there is no particular maintenance action to be performed. In particular, when the action selection unit 12 sets a predetermined time interval as the measurement period, the elevator maintenance operation support device 10 can present the maintenance worker with the timing of the maintenance action in addition to the contents of the maintenance action.
 When no action data is transmitted from the terminal device 20, the reward determination unit 13 may determine the reward on the assumption that the do-nothing maintenance action was selected.
 When the action data a_t selected by the action selection unit 12 differs from the action data a_t^m received from the terminal device 20, the action selection unit 12 may update the Q value labeled with the action data a_t^m, based on the reward r that the reward determination unit 13 determined from the measurement data. That is, the Q value may be updated by the following equation (5).
Figure JPOXMLDOC01-appb-M000005
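 Equation (5) is available here only as an image reference, so the following is a hedged sketch rather than a reproduction of it. It assumes the standard tabular Q-learning update and applies it to the action the maintenance worker actually performed (a_t^m) instead of the recommended action. The learning rate `alpha`, the discount factor `gamma`, the dictionary-based Q table, and all names are assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update_performed_action(q: QTable,
                              s_t: Hashable,
                              a_performed: Hashable,
                              r: float,
                              s_next: Hashable,
                              actions: Sequence[Hashable],
                              alpha: float = 0.1,
                              gamma: float = 0.9) -> float:
    """Standard Q-learning step (assumed form) on the entry for the performed action.

    Only Q(s_t, a_performed) changes; the entry for the recommended
    action is left untouched when the worker chose something else.
    """
    best_next = max(q[(s_next, a)] for a in actions)      # max over next actions
    td_error = r + gamma * best_next - q[(s_t, a_performed)]
    q[(s_t, a_performed)] += alpha * td_error
    return q[(s_t, a_performed)]
```

 In this sketch the reward observed after the worker's own choice is credited to that choice, which matches the idea in the text that the Q value labeled with a_t^m is the one updated.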
 The action selection unit 12 may use Sarsa (State-Action-Reward-State-Action) as the reinforcement learning algorithm. That is, the Q value may be updated by the following equation (6). When Sarsa is used, the action selection unit 12, for example, does not select action data in S102; instead, immediately after S106, it selects the action data a_{t+1} for time t+1 based on the Q values and s_{t+1}.
Figure JPOXMLDOC01-appb-M000006
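 Equation (6) is likewise available only as an image reference. As a hedged sketch, the standard Sarsa update is shown below together with an epsilon-greedy choice of a_{t+1}; the epsilon-greedy policy, `alpha`, `gamma`, and all function names are assumptions and may differ from the specification.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def epsilon_greedy(q: QTable, state: Hashable, actions: Sequence[Hashable],
                   epsilon: float, rng: random.Random) -> Hashable:
    """Pick a random action with probability epsilon, else the highest-Q action."""
    if rng.random() < epsilon:
        return rng.choice(list(actions))
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q: QTable, s_t: Hashable, a_t: Hashable, r: float,
                 s_next: Hashable, a_next: Hashable,
                 alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Standard Sarsa step: the target uses the action actually chosen for t+1."""
    td_error = r + gamma * q[(s_next, a_next)] - q[(s_t, a_t)]
    q[(s_t, a_t)] += alpha * td_error
    return q[(s_t, a_t)]
```

 A typical order of calls in this sketch is to choose `a_next` with `epsilon_greedy` once s_{t+1} is known and then pass it to `sarsa_update`, mirroring the text's point that the next action is selected right after the state for t+1 is obtained.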
 The action selection unit 12 may perform reinforcement learning with an algorithm that models the change in the state of the elevator 1 caused by maintenance work as a partially observable Markov decision process, which takes uncertainty into account.
 The action selection unit 12 may perform reinforcement learning with the state data represented as vector data whose components are continuous numerical values.
 When each elevator 1 is provided with multiple types of sensors 7, the action selection unit 12 may perform reinforcement learning with the state data represented as vector data whose components are the averages, over the measurement period, of the values measured by the respective sensors. The action selection unit 12 may also regard the distribution of the values measured by a sensor during the measurement period as a probability distribution, and perform reinforcement learning with the state data represented as vector data whose components are the expansion coefficients obtained by expanding that probability distribution in basis functions.
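 Both state encodings described above can be sketched as follows. The specification does not name a basis, so the orthonormal cosine basis on [0, 1], the rescaling bounds `lo` and `hi`, and the function names are assumptions chosen for illustration. For an orthonormal basis, each expansion coefficient of a density equals the expected value of the basis function, which the sketch estimates by a sample mean.

```python
import math
from statistics import fmean
from typing import Dict, List, Sequence


def mean_state_vector(readings: Dict[str, Sequence[float]],
                      sensor_order: Sequence[str]) -> List[float]:
    """One component per sensor: the mean of its readings over the measurement period."""
    return [fmean(readings[name]) for name in sensor_order]


def cosine_expansion_coefficients(samples: Sequence[float], lo: float, hi: float,
                                  n_terms: int) -> List[float]:
    """Estimate coefficients of the sample distribution in a cosine basis.

    Assumed basis on [0, 1]: phi_0(u) = 1, phi_k(u) = sqrt(2) * cos(k * pi * u).
    Each coefficient is estimated as the sample mean of phi_k(u_i).
    """
    scaled = [(x - lo) / (hi - lo) for x in samples]   # map readings into [0, 1]
    coeffs = [1.0]                                     # phi_0 integrates the density to 1
    for k in range(1, n_terms):
        values = [math.sqrt(2.0) * math.cos(k * math.pi * u) for u in scaled]
        coeffs.append(fmean(values))
    return coeffs[:n_terms]
```

 The mean-based vector is compact but discards the spread of the readings; the coefficient vector keeps some information about the shape of the distribution at the cost of more components.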
 The action selection unit 12 may approximate the Q value by a parameterized function. For example, the action selection unit 12 takes a weight vector for each item of action data as the parameters and computes the inner product of that weight vector and the vector representing the state data as the Q value labeled with that action data and that state data. Alternatively, the action selection unit 12 may compute the Q value with a neural network whose input layer receives the components of the state data. The action selection unit 12 may use a DQN (Deep Q-Network), which computes the Q value by deep learning with the components of the state data as inputs.
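 The linear approximation described above, in which the Q value is the inner product of a per-action weight vector and the state vector, can be sketched as follows. The dictionary layout, the action names in the usage note, and the greedy selection helper are illustrative assumptions.

```python
from typing import Dict, Sequence


def linear_q(weights: Dict[str, Sequence[float]],
             state: Sequence[float], action: str) -> float:
    """Q(state, action) as the inner product of the action's weight vector and the state."""
    return sum(w * x for w, x in zip(weights[action], state))


def greedy_action(weights: Dict[str, Sequence[float]],
                  state: Sequence[float]) -> str:
    """Return the action whose approximated Q value is largest for this state."""
    return max(weights, key=lambda a: linear_q(weights, state, a))
```

 For example, with weights `{"do_nothing": [0.0, 1.0], "lubricate": [2.0, 0.0]}` and state `[1.0, 0.5]`, the approximated Q values are 0.5 and 2.0, so the sketch selects `"lubricate"`.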
 The reward storage unit 14 may store data for multiple elevators 1 together. The action selection unit 12 may perform reinforcement learning with Q values shared among multiple elevators 1. Compared with maintaining separate Q values for each elevator 1, this makes more data available for the reinforcement learning of a single set of Q values.
 During the measurement period, the sensor 7 may transmit measurement data to the elevator maintenance work support device 10 at predetermined time intervals. The sensor 7 may transmit measurement data to the elevator maintenance work support device 10 when the measurement data exceeds a predetermined threshold during the measurement period. The sensor 7 may also transmit measurement data to the elevator maintenance work support device 10 when, during the measurement period, the measurement data falls within a predetermined range of values representing an elevator failure.
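 The three transmission conditions described above can be sketched as a single decision function. The specification presents them as separate options; combining them as optional, independently enabled rules is an assumption made for illustration, as are the function and parameter names.

```python
from typing import Optional, Tuple


def should_transmit(value: float,
                    seconds_since_last_send: float,
                    interval_s: Optional[float] = None,
                    threshold: Optional[float] = None,
                    fault_range: Optional[Tuple[float, float]] = None) -> bool:
    """Return True if any enabled rule says the sensor should send its reading.

    interval_s:  send when at least this many seconds have passed since the last send.
    threshold:   send when the reading exceeds this value.
    fault_range: send when the reading lies inside this (low, high) failure range.
    A rule left as None is disabled.
    """
    if interval_s is not None and seconds_since_last_send >= interval_s:
        return True
    if threshold is not None and value > threshold:
        return True
    if fault_range is not None and fault_range[0] <= value <= fault_range[1]:
        return True
    return False
```

 Event-driven rules such as the threshold and failure-range checks reduce traffic compared with periodic sending, while the periodic rule guarantees a regular stream of data for learning.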
 When the building 2 has a bank comprising multiple elevators 1, the action selection unit 12 may share Q values among the elevators 1 in that bank.
 The terminal device 20 may be a portable tablet computer carried by the maintenance worker.
 The action selection unit 12 may use the results of learning by simulation as the initial values of the Q values.
 The elevator maintenance work support device 10 may present maintenance actions and perform learning for the elevators 1 of a single building 2. The elevator maintenance work support device 10 may be installed inside the building 2 in which the elevators 1 are provided.
 The elevator maintenance work support device 10 may present maintenance actions and perform learning for the elevators 1 of multiple buildings 2. The elevator maintenance work support device 10 may connect to each of the elevators 1 over a network from outside the buildings 2 in which the elevators 1 are provided.
 The communication unit 11 of the elevator maintenance work support device 10 may exchange action data with the terminal device 20 when the maintenance worker makes a request from that terminal device 20. In that case, the elevator maintenance work support device 10 need not store the correspondence between an elevator 1 and the terminal device 20 consulted by the maintenance worker who services that elevator 1.
 Next, an example of the elevator maintenance work support device according to the present embodiment is described. FIG. 7 shows the hardware configuration of the main part of the elevator maintenance work support device according to the present embodiment.
 Each function of the elevator maintenance work support device 10 can be implemented by a processing circuit. The processing circuit includes at least one processor 10b and at least one memory 10c. The processing circuit may include at least one piece of dedicated hardware 10a together with, or in place of, the processor 10b and the memory 10c.
 When the processing circuit includes the processor 10b and the memory 10c, each function of the elevator maintenance work support device 10 is implemented by software, firmware, or a combination of software and firmware. At least one of the software and the firmware is written as a program, and the program is stored in the memory 10c. The processor 10b implements each function of the elevator maintenance work support device 10 by reading and executing the program stored in the memory 10c.
 The processor 10b is also referred to as a CPU (Central Processing Unit), central processing device, processing device, arithmetic device, microprocessor, microcomputer, or DSP. The memory 10c consists of, for example, a non-volatile or volatile semiconductor memory such as RAM, ROM, flash memory, EPROM, or EEPROM, or a magnetic disk, flexible disk, optical disc, compact disc, MiniDisc, DVD, or the like.
 When the processing circuit includes the dedicated hardware 10a, the processing circuit is implemented by, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC, an FPGA, or a combination of these.
 The functions of the elevator maintenance work support device 10 may each be implemented individually by a processing circuit, or they may be implemented collectively by a processing circuit. Some of the functions of the elevator maintenance work support device 10 may be implemented by the dedicated hardware 10a and the rest by software or firmware. For example, the functions of the action selection unit 12 and the reward determination unit 13 may be implemented by software or firmware written as a program, and the other functions by the dedicated hardware 10a. In this way, the processing circuit implements each function of the elevator maintenance work support device 10 with the hardware 10a, software, firmware, or a combination of these.
 The elevator maintenance work support device according to the present invention is applicable to maintenance work on elevator systems.
 DESCRIPTION OF SYMBOLS: 1 elevator, 2 building, 3 hoistway, 4 landing, 5 car, 6 control device, 7 sensor, 8 communication device, 9 group management device, 10 elevator maintenance work support device, 11 communication unit, 12 action selection unit, 13 reward determination unit, 14 reward storage unit, 20 terminal device, 21 communication unit, 22 display unit, 23 input unit, 10a hardware, 10b processor, 10c memory

Claims (8)

  1.  An elevator maintenance work support device comprising:
     a communication unit that is provided so as to be able to communicate with a sensor of an elevator and with a terminal device used by a maintenance worker, receives measurement data of the elevator measured by the sensor during a predetermined measurement period, and receives action data of the maintenance worker from the terminal device; and
     an action selection unit that sets state data representing the state of each of one or more elevators based on measurement data measured during a measurement period by a measurement device provided in each of the one or more elevators, performs reinforcement learning using action data and state data, selects, based on the result of the reinforcement learning and the measurement data, action data for after the measurement period in which that measurement data was measured, and causes the communication unit to communicate the selected action data.
  2.  The elevator maintenance work support device according to claim 1, wherein the action selection unit sets the state data using the interval between maintenance actions by the maintenance worker as the measurement period.
  3.  The elevator maintenance work support device according to claim 1, wherein the action selection unit sets the state data using a predetermined time interval as the measurement period.
  4.  The elevator maintenance work support device according to any one of claims 1 to 3, further comprising:
     a reward determination unit that determines, based on measurement data measured during a measurement period, a reward for the selection of action data made by the action selection unit before that measurement period,
     wherein the action selection unit performs reinforcement learning using the reward determined by the reward determination unit.
  5.  The elevator maintenance work support device according to claim 4, further comprising:
     a reward storage unit that stores action data and state data in association with the reward determined by the reward determination unit for the selection of that action data made based on that state data,
     wherein, when the action data input by the maintenance worker and received from the terminal device differs from the action data selected by the action selection unit, the reward determination unit sets the reward for that selection to the reward that the reward storage unit stores in association with the selected action data and the state data of the measurement periods before and after that selection.
  6.  The elevator maintenance work support device according to claim 5, wherein
     the reward storage unit stores, for each of the one or more elevators, action data and state data in association with a reward,
     the reward determination unit determines a reward for each of the one or more elevators, and
     the action selection unit performs reinforcement learning for each of the one or more elevators.
  7.  The elevator maintenance work support device according to claim 6, wherein, when determining the reward for one elevator, if the action data received from the terminal device differs from the action data selected by the action selection unit, and the reward storage unit stores, for another elevator, a reward in association with the selected action data and the state data of the measurement periods before and after the selection of that action data, the reward determination unit sets the reward for that selection to the reward stored in the reward storage unit.
  8.  The elevator maintenance work support device according to claim 6, wherein, when determining the reward for one elevator, if the action data received from the terminal device differs from the action data selected by the action selection unit, and the reward storage unit stores, for that same elevator, a reward in association with the selected action data and the state data of the measurement periods before and after the selection of that action data, the reward determination unit sets the reward for that selection to the reward stored in the reward storage unit.
PCT/JP2017/037592 2017-10-17 2017-10-17 Elevator maintenance work assistance device WO2019077686A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/037592 WO2019077686A1 (en) 2017-10-17 2017-10-17 Elevator maintenance work assistance device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/037592 WO2019077686A1 (en) 2017-10-17 2017-10-17 Elevator maintenance work assistance device

Publications (1)

Publication Number Publication Date
WO2019077686A1 true WO2019077686A1 (en) 2019-04-25

Family

ID=66174347

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/037592 WO2019077686A1 (en) 2017-10-17 2017-10-17 Elevator maintenance work assistance device

Country Status (1)

Country Link
WO (1) WO2019077686A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115676539A (en) * 2023-01-03 2023-02-03 常熟理工学院 High-rise elevator cooperative dispatching method based on Internet of things

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000251126A (en) * 1999-02-26 2000-09-14 Toshiba Corp Analyzing device for proper maintenance operation
JP2006151529A (en) * 2004-11-25 2006-06-15 Mitsubishi Electric Building Techno Service Co Ltd Elevator maintenance work check system and elevator maintenance operator terminal device
JP2010287131A (en) * 2009-06-12 2010-12-24 Honda Motor Co Ltd System and method for controlling learning
WO2011036699A1 (en) * 2009-09-24 2011-03-31 株式会社 東芝 Maintenance policy determination device and maintenance policy determination program
JP2017130094A (en) * 2016-01-21 2017-07-27 ファナック株式会社 Cell control device, and production system for managing operation situation of multiple manufacturing machines in manufacturing cell

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17929076

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17929076

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP