WO2024009667A1 - Information processing device, inference model generation method, training data generation method, inference model generation program, and training data generation program

Info

Publication number: WO2024009667A1
Application number: PCT/JP2023/020963
Authority: WO
Inventors: 翔太林; 裕司白石
Original assignee: 日立造船株式会社
Priority date: 2022-07-06
Filing date: 2023-06-06
Publication date: 2024-01-11
Also published as: TW202403485A; JP2024007873A

Abstract

The present invention enables generation of a highly versatile inference model. An information processing device (2) comprises: a data combination unit (204) that combines a plurality of pieces of time series data, which are based on pieces of data collected from a plurality of facilities respectively, to generate pseudo time series data; a training data generation unit (205) that applies standardization processing to the pseudo time series data to obtain training data; and a training unit (206) that generates an inference model by machine learning using the training data.

Description

Information processing device, inference model generation method, teacher data generation method, inference model generation program, and teacher data generation program

The present invention relates to an information processing device, etc. that generates an inference model.

Techniques for generating inference models by machine learning are well established. For example, Patent Document 1 listed below discloses a neural network model that infers the type of event occurring in a plant from a plurality of plant data acquired in the plant. Note that this "plurality of plant data" is time-series data collected at a single plant.

Patent Document 1: Japanese Patent No. 3002524

The conventional technique described above leaves room for improvement in the versatility of the inference model. In the technique of Patent Document 1, machine learning is performed using plant data collected in a target plant. An inference model generated through such learning can infer, with high accuracy, the type of event that occurred in that plant from the plant data collected there. However, this inference model cannot infer the type of event that occurred at another plant from the plant data collected at that other plant, or, even if such inference is possible, its accuracy will be low.

An object of one aspect of the present invention is to provide an information processing device and the like that can generate a highly versatile inference model.

In order to solve the above problem, an information processing device according to one aspect of the present invention includes: a data linking unit that concatenates a plurality of time series data, each based on data collected for one of a plurality of targets, to generate one pseudo time series data; a teacher data generation unit that performs standardization processing or normalization processing on the pseudo time series data to produce teacher data; and a learning unit that generates an inference model by machine learning using the teacher data.

In addition, in order to solve the above problem, another information processing device according to one aspect of the present invention includes: a data linking unit that concatenates a plurality of time series data, each based on data collected for one of a plurality of targets, to generate one pseudo time series data; and a teacher data generation unit that performs standardization processing or normalization processing on the pseudo time series data to produce teacher data.

Further, in order to solve the above problem, an inference model generation method according to one aspect of the present invention is an inference model generation method executed by one or more information processing devices, and includes: a data concatenation step of concatenating a plurality of time series data, each based on data collected for one of a plurality of targets, to generate one pseudo time series data; a teacher data generation step of performing standardization processing or normalization processing on the pseudo time series data to produce teacher data; and a learning step of generating an inference model by machine learning using the teacher data.

Further, in order to solve the above problem, a teacher data generation method according to one aspect of the present invention is a teacher data generation method executed by one or more information processing devices, and includes: a data concatenation step of concatenating a plurality of time series data, each based on data collected for one of a plurality of targets, to generate one pseudo time series data; and a teacher data generation step of performing standardization processing or normalization processing on the pseudo time series data to produce teacher data.

According to one aspect of the present invention, it is possible to generate a highly versatile inference model.

FIG. 1 is a block diagram showing an example of the configuration of main parts of an information processing device according to an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of an information processing system including the information processing device.
FIG. 3 is a diagram showing an example of determining connection targets.
FIG. 4 is a diagram showing an example of generation of time series data.
FIG. 5 is a diagram showing an example of generation of pseudo time series data and teacher data.
FIG. 6 is a flowchart illustrating an example of a process when the information processing device generates an inference model.

(System configuration)
FIG. 2 is a diagram showing the configuration of the information processing system 5 according to this embodiment. As shown in the figure, the information processing system 5 includes information processing devices 1A to 1D and an information processing device 2. Although details will be described below, the information processing system 5 can generate an inference model using a plurality of time series data from different sources. The inference model generated by the information processing system 5 is more versatile than the inference model generated using time series data from a single source as training data.

The information processing devices 1A to 1D are devices that collect the time-series data from which the teacher data for generating the above-mentioned inference model is derived, and are located in facilities A to D, respectively. In the following, when there is no need to distinguish between the information processing devices 1A to 1D, they will simply be referred to as "information processing device 1."

Facilities A to D may be equipped with at least one piece of equipment. For example, facilities A to D may be plants equipped with multiple pieces of equipment. Here, a "plant" is an industrially used facility, and is equipped with a plurality of devices. The "plant" uses these devices to perform predetermined processing such as production of products or processing of objects.

In the following, an example will be described in which facilities A to D are waste treatment facilities that incinerate waste (for example, combustible garbage) and generate electricity using the waste heat. In this case, the information processing device 1A collects time-series data regarding waste incineration and power generation at the facility A. Similarly, the information processing devices 1B to 1D collect time-series data regarding waste incineration and power generation at the facilities B to D.

The time-series data need only be able to serve as the source of the teacher data for generating the inference model, and may be chosen according to what is to be inferred. For example, when generating an inference model that predicts the combustion state of an incinerator, it is sufficient to collect various time-series data related to the combustion state of the incinerator. To give a specific example, time-series sensing data related to automatic combustion control (ACC), such as time-series measurements of furnace temperature and time-series measurements of the amount of steam generated in a boiler, may be collected as the time-series data.

The information processing device 2 connects the time series data collected by the information processing devices 1A to 1D to generate one time series data. Then, the information processing device 2 performs standardization processing or normalization processing on the connected time series data and uses it as training data. The information processing device 2 generates an inference model by machine learning using the teacher data.

The inference model generated in this way is more versatile than one generated using time series data from a single source as teacher data. For example, an inference model generated using teacher data derived from the time-series data of facilities A to D can be used for inference at any of facilities A to D, and can also be used for inference at facilities other than facilities A to D, provided they are of the same type. In this way, the generation of the inference model by the information processing device 2 does not necessarily require data collected at the facility that is the target of inference. Therefore, for example, when a new facility is constructed, the inference model can be used immediately after the facility starts operating. Further, not all of the information processing devices 1A to 1D need to collect time-series data that is the source of teacher data. For example, the time-series data that is the source of the teacher data may be collected from facilities A to C only, and an inference model may be generated without collecting such data from facility D. This inference model can also be used for inference at facility D.

Additionally, generating an inference model with sufficient inference accuracy generally requires a sufficient amount of teacher data, so the data collection period at each facility tends to be long. In this regard, since the information processing system 5 generates teacher data from time-series data collected at a plurality of facilities, it can collect the required amount of teacher data in a short period of time and quickly generate an inference model.

Note that the time-series data to be linked is not limited to data collected at multiple different facilities. For example, the information processing device 2 can also connect time-series data collected for a plurality of different devices or pieces of equipment in one facility. For example, if the above-mentioned facility A is equipped with two incinerators, the information processing device 2 can connect the time-series data collected at each incinerator and generate an inference model that can be used for inference at either of the two incinerators. Therefore, "facility" in the following description can be replaced with any "object".

Additionally, the method of collecting time-series data is also arbitrary. For example, measured values measured by a sensor installed on an arbitrary object itself may be collected as time-series data. Furthermore, measured values measured by sensors installed around an arbitrary object, or data measured over a wider range of objects, such as temperature and humidity, may be collected as time-series data. In addition to this, for example, setting values for setting the operation of an arbitrary object, command values for causing an arbitrary object to execute a predetermined operation, etc. may be collected as time-series data.

(Device configuration)
The configurations of the information processing devices 1 and 2 will be described with reference to FIG. 1. FIG. 1 is a block diagram showing an example of the configuration of main parts of the information processing devices 1 and 2. As illustrated, the information processing device 1 includes a control unit 10 that centrally controls each unit of the information processing device 1, and a storage unit 11 that stores various data used by the information processing device 1. The information processing device 1 also includes a communication unit 12 for the information processing device 1 to communicate with other devices, an input unit 13 that receives input of various data to the information processing device 1, and an output unit 14 for the information processing device 1 to output various data. The control unit 10 also includes a data acquisition unit 101, a preprocessing unit 102, an inference unit 103, and a control amount determination unit 104.

The data acquisition unit 101 acquires time-series data that is the source of training data for generating an inference model. The time series data may be acquired from a sensor or the like placed in the facility via the communication unit 12 or the input unit 13, or may be input by the user of the information processing device 1 via the input unit 13. The data acquisition unit 101 then transmits the acquired data to the information processing device 2 via the communication unit 12.

Additionally, the data acquisition unit 101 acquires an inference model generated using the above data transmitted to the information processing device 2. The inference model may be acquired from the information processing device 2 through communication via the communication unit 12, or may be input by the user of the information processing device 1 via the input unit 13.

Further, the data acquisition unit 101 acquires data for inference when performing inference using the acquired inference model. In the following, this data will be referred to as inference data. The inference data may be input via the communication unit 12 or the input unit 13 from a sensor or the like placed in the facility to be inferred, or may be input via the input unit 13 by a user of the information processing device 1.

The preprocessing unit 102 performs preprocessing on the inference data acquired by the data acquisition unit 101, and generates input data to be input to the above inference model. Although the details will be described later, the above pre-processing is a process of standardizing or normalizing the inference data by applying the same conditions as when the information processing device 2 generates the training data of the inference model.
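The preprocessing described above can be illustrated with a minimal Python sketch. It assumes that "the same conditions" means reusing the mean and standard deviation computed when the teacher data was generated; the text does not spell this out, and the function name, the `training_stats` values, and the sample inference data are all hypothetical.

```python
def apply_standardization(values, mean, std):
    """Standardize inference data with statistics fixed at training time.

    The statistics are NOT recomputed from the inference data, so the
    model input is on the same scale as the teacher data (assumption).
    """
    if std == 0:
        raise ValueError("std must be non-zero")
    return [(v - mean) / std for v in values]


# Hypothetical statistics handed over together with the inference model.
training_stats = {"mean": 1.0, "std": 0.05}

# Hypothetical PV/SV values observed at the facility to be inferred.
inference_data = [1.00, 1.05, 0.90]

model_input = apply_standardization(inference_data, **training_stats)
```

Under this reading, the statistics would have to be delivered to the information processing device 1 along with the inference model.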

The inference unit 103 performs inference using the inference model generated by the information processing device 2. More specifically, the inference unit 103 either uses the output value obtained by inputting the input data generated by the preprocessing unit 102 into the inference model directly as the inference result, or generates the inference result based on that output value.

The inference target of the inference model is not particularly limited; for example, the model may predict the combustion state in the incinerators provided in the facilities A to D shown in FIG. 2. To generate electricity efficiently at facilities A to D, the amount of steam used to turn the power generation turbines must be stabilized, but the combustion state changes with fluctuations in the quality and quantity of the waste to be incinerated, and the amount of steam changes accordingly. The information processing devices 1A to 1D predict the combustion state in the incinerators provided in the facilities A to D by using an inference model that predicts the combustion state. The information processing devices 1A to 1D can then appropriately control the equipment in the facilities A to D according to the prediction results, so that waste incineration and power generation are performed stably.

The control amount determination unit 104 determines the control amount for the equipment installed in the target facility based on the inference result of the inference unit 103. The method for determining the control amount differs depending on what kind of inference model was used to perform the inference. For example, suppose that an inference result using an inference model that predicts the amount of steam generated from the boiler of an incinerator indicates that the amount of steam generated will decrease. In this case, the control amount determination unit 104 may determine the control amount of equipment that affects the amount of steam generated from the boiler (for example, the amount of air supplied to the incinerator and the operating speed of the grate). Of course, in this case, the control amount is determined so that the amount of generated steam increases. For example, the control amount is determined to increase the amount of air supplied into the incinerator or to increase the operating speed of the grate.

On the other hand, the information processing device 2 includes a control unit 20 that centrally controls each unit of the information processing device 2, and a storage unit 21 that stores various data used by the information processing device 2. The information processing device 2 also includes a communication unit 22 for the information processing device 2 to communicate with other devices, an input unit 23 that receives input of various data to the information processing device 2, and an output unit 24 for the information processing device 2 to output various data. The control unit 20 also includes a data acquisition unit 201, a connection target determination unit 202, a time series data generation unit 203, a data linking unit 204, a teacher data generation unit 205, and a learning unit 206.

The data acquisition unit 201 acquires data that is the source of teacher data. The data acquired by the data acquisition unit 201 need only include data that becomes an explanatory variable of the inference model to be generated, or data from which the explanatory variable is derived. For example, when generating an inference model using time-series sensing data collected at each of facilities A to D as an explanatory variable, the data acquisition unit 201 may communicate with the information processing devices 1A to 1D shown in FIG. 2 via the communication unit 22 to obtain the sensing data.

The connection target determination unit 202 determines the data to be concatenated by the data linking unit 204 from among the data acquired by the data acquisition unit 201. The connection target determination unit 202 is not an essential component. However, including the connection target determination unit 202 has the advantage that appropriate teacher data can be generated even if the data acquired by the data acquisition unit 201 includes data that is not suitable for concatenation, or combinations that cannot be concatenated.

The time series data generation unit 203 generates the time series data to be concatenated by the data linking unit 204. More specifically, the time series data generation unit 203 generates, from the measured value measured at the facility and the set value corresponding to the measured value, time series data whose elements are the difference or ratio between the measured value and the set value. The difference or ratio between the measured value and the set value is an explanatory variable in the generated inference model.

The time series data generation unit 203 is also not an essential configuration. However, the provision of the time-series data generation unit 203 has the advantage that variations in data caused by different facilities can be reduced. Furthermore, linking data collected at each of multiple facilities has the advantage of reducing bias in numerical fluctuations due to time series.

The data linking unit 204 connects a plurality of time series data based on data collected at each of a plurality of facilities to generate one pseudo time series data. The method for generating pseudo time series data will be explained in the section "Example of generation of pseudo time series data and training data" below.

The teacher data generating unit 205 performs standardization processing or normalization processing on the pseudo time series data generated by the data linking unit 204, and uses it as teacher data. The method for generating teacher data will also be explained in the section "Example of generation of pseudo time series data and training data" below.

The learning unit 206 generates an inference model by machine learning using the teacher data generated by the teacher data generating unit 205. The machine learning algorithm is not particularly limited, and for example, the learning unit 206 may generate the inference model using a support vector machine, linear regression, random forest, or neural network.

As described above, the information processing device 2 includes the data linking section 204 and the teacher data generating section 205. The data linking unit 204 connects a plurality of time series data based on data collected for each of a plurality of objects to generate one pseudo time series data. The teacher data generating unit 205 performs standardization processing or normalization processing on the pseudo time series data generated by the data linking unit 204, and uses the data as teacher data.

According to the above configuration, it is possible to generate training data that can generate a highly versatile inference model. Therefore, according to the above configuration, it is possible to provide a highly versatile inference model.

Furthermore, as described above, the information processing device 2 includes the data linking section 204, the teacher data generating section 205, and the learning section 206. The data linking unit 204 connects a plurality of time series data based on data collected for each of a plurality of objects to generate one piece of pseudo time series data. The teacher data generating unit 205 performs standardization processing or normalization processing on the pseudo time series data generated by the data linking unit 204, and uses the data as teacher data. The learning unit 206 generates an inference model by machine learning using the teacher data generated by the teacher data generating unit 205.

According to the above configuration, an inference model can be generated using a plurality of time series data based on data collected for each of a plurality of objects. Further, according to the above configuration, it is possible to generate a highly versatile inference model that can be used not only for the facility where data is collected but also for inference at other facilities.

(Example of determining connection targets)
FIG. 3 is a diagram showing an example of determining connection targets. In the example of FIG. 3, in facility A, steam amount sensor data 111A, temperature sensor data 112A, etc. are collected. Collection of these data is performed by, for example, the information processing device 1A shown in FIG. 2 (more specifically, the data acquisition unit 101 of the information processing device 1A). Similarly, in facility B, steam amount sensor data 111B, temperature sensor data 112B, etc. are collected, and in facility C, steam amount sensor data 111C, temperature sensor data 112C, etc. are collected. Collection of these data is performed by, for example, the information processing devices 1B and 1C shown in FIG. 2. Then, the data acquisition unit 201 of the information processing device 2 acquires each of the above data collected by the information processing devices 1A to 1C.

The concatenation target determining unit 202 determines the data to be concatenated by the data concatenation unit 204 from among the data thus obtained. For example, the connection target determining unit 202 may determine data to be connected according to preset rules, and may exclude unrelated data from being connected.

For example, a rule may be set under which data measured by the same type of sensor are to be linked. In this case, as shown in FIG. 3, the connection target determination unit 202 determines the steam amount sensor data 111A to 111C, which include the measured values measured by the steam amount sensors, as connection targets. Furthermore, the connection target determination unit 202 determines the temperature sensor data 112A to 112C, which include the measured values measured by the temperature sensors, as connection targets. Note that the connection target determination unit 202 may assign a common code or identification information to data determined to be connection targets.

For example, in addition to or instead of the above rule, the connection target determination unit 202 may determine the data to be connected according to a rule that data measured by sensors installed at similar positions in a facility are to be connected. Suppose, for example, that the temperature sensor data 112A and 112C in FIG. 3 are both measured by a temperature sensor installed near the superheater of the incinerator, while the temperature sensor data 112B is measured by a temperature sensor installed at another location. In this case, the connection target determination unit 202 determines the temperature sensor data 112A and 112C, measured by temperature sensors installed at similar positions, to be connection targets, and does not treat the temperature sensor data 112B as a connection target.
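The two rules above (same sensor type, and similar installation position) can be sketched in Python as a simple grouping step. This is only an illustration: the metadata fields `sensor_type` and `position`, the position labels, and the function name are assumptions and do not appear in the source.

```python
from collections import defaultdict

# Hypothetical metadata for the data sets of FIG. 3; field names and
# position labels are assumptions made for illustration.
datasets = [
    {"id": "111A", "sensor_type": "steam_amount", "position": "boiler_outlet"},
    {"id": "111B", "sensor_type": "steam_amount", "position": "boiler_outlet"},
    {"id": "111C", "sensor_type": "steam_amount", "position": "boiler_outlet"},
    {"id": "112A", "sensor_type": "temperature", "position": "near_superheater"},
    {"id": "112B", "sensor_type": "temperature", "position": "other"},
    {"id": "112C", "sensor_type": "temperature", "position": "near_superheater"},
]


def determine_targets(records, keys=("sensor_type",)):
    """Group records that agree on every rule key.

    Groups with a single member have nothing to be connected to,
    so they are excluded from the result.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Rule 1: same sensor type only.
by_type = determine_targets(datasets)
# Rule 1 plus rule 2: same sensor type and similar position.
by_type_and_position = determine_targets(datasets, keys=("sensor_type", "position"))
```

With both rules applied, 112A and 112C form a group while 112B is left out, matching the example in the text. The group key could also serve as the common code assigned to connection targets.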

(Example of time series data generation)
FIG. 4 is a diagram showing an example of generation of time series data. More specifically, FIG. 4 shows an example in which time-series data used to generate teacher data is generated from each of the steam amount sensor data 111A to 111C shown in FIG. 3.

As shown in FIG. 4, the steam amount sensor data 111A to 111C include measured values (PV: Process Variable) at each time. Further, the steam amount sensor data 111A to 111C shown in FIG. 4 also include setting values corresponding to each measured value. The set value (SV: Set Variable) indicates a target value (in this example, the amount of steam) at the time associated with the set value. That is, in the facilities A to C from which the steam amount sensor data 111A to 111C were obtained, control is performed so that the steam amount approaches the set value.

In this manner, the data acquisition unit 201 may acquire the steam amount sensor data 111A to 111C including measured values and set values. In this case, the time series data generation unit 203 may generate, from the measured values measured at the facilities A to C and the set values corresponding to those measured values, both of which are included in the steam amount sensor data 111A to 111C, time series data whose elements are the difference or ratio between the measured value and the set value.

For example, in the example of FIG. 4, the time series data generation unit 203 generates, from each of the steam amount sensor data 111A to 111C, time series data 113A to 113C whose elements are the PV/SV value at each time, that is, the ratio between the measured value and the set value. If the value of PV/SV is close to 1, it can be said that the state of the facility is normal. Note that the time series data generation unit 203 may generate time series data using the difference between PV and SV, that is, the difference between the measured value and the set value, as an element instead of PV/SV.
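As a minimal Python sketch of this conversion, the function below turns rows of (time, PV, SV) into rows of (time, PV/SV), or (time, PV - SV) when the difference is chosen. The function name, the row layout, and the sample values are assumptions for illustration and are not taken from FIG. 4.

```python
def to_relative_series(rows, use_difference=False):
    """Convert (time, pv, sv) rows into (time, element) rows.

    The element is PV/SV by default, or PV - SV when use_difference
    is True.
    """
    out = []
    for time, pv, sv in rows:
        if use_difference:
            out.append((time, pv - sv))
        else:
            if sv == 0:
                raise ValueError("set value must be non-zero to form PV/SV")
            out.append((time, pv / sv))
    return out


# Illustrative steam-amount rows: (time, measured value PV, set value SV).
steam_111a = [("10:00", 49.0, 50.0), ("10:01", 50.0, 50.0), ("10:02", 55.0, 50.0)]

ratios = to_relative_series(steam_111a)
diffs = to_relative_series(steam_111a, use_difference=True)
```

Because the ratio is dimensionless, facilities with different absolute steam amounts produce values on a comparable scale around 1.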

Additionally, the time series data 113A to 113C include an element called an abnormality flag. The value of the abnormality flag at each time indicates whether or not the state of facilities A to C is normal. Specifically, in the example of FIG. 4, the value of the abnormality flag is set to 0 when the state is normal, and to 1 when the state is not normal. The criteria for determining whether or not the state is normal may be set as appropriate. For example, the time series data generation unit 203 may determine that the state is not normal when the PV/SV value is less than a predetermined lower limit value or exceeds a predetermined upper limit value, and that the state is normal otherwise.
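The threshold criterion can be sketched in a few lines of Python. The bounds 0.9 and 1.1 are assumed values chosen only for illustration; the source says only that the lower and upper limits are predetermined.

```python
def abnormality_flag(pv_sv, lower=0.9, upper=1.1):
    """Return 1 (not normal) if PV/SV is below lower or above upper, else 0.

    The default bounds are hypothetical.
    """
    return 1 if (pv_sv < lower or pv_sv > upper) else 0


# Illustrative PV/SV values at successive times.
ratios = [0.98, 1.00, 1.25, 0.80, 1.05]
flags = [abnormality_flag(r) for r in ratios]
```

Each flag would be stored alongside the PV/SV value at the same time so that it can later act as the objective variable.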

The value of the abnormality flag is correct data in machine learning, in other words, the objective variable of the inference model to be generated. In other words, the inference model generated using the time series data 113A to 113C is a model for inferring whether the state of the facility is normal or not. Note that the teacher data generation unit 205 may perform the association of objective variables.

Of course, the objective variable of the inference model is arbitrary and is not limited to whether the state of the facility is normal or not. In other words, the time-series data generation unit 203 only needs to generate time-series data that includes any target variable that is desired to be inferred. For example, when generating an inference model for estimating an appropriate control amount for a predetermined device in a facility, the time-series data generation unit 203 may generate time-series data including the appropriate control amount for the device. Note that the objective variable may be automatically determined by the time series data generation unit 203, or may be input by the user.

It is also assumed that the steam amount sensor data 111A to 111C include a sensor name, a sensor ID, etc., or that the data types of the data included in the steam amount sensor data 111A to 111C are different. In such a case, the time series data generation unit 203 may format the steam amount sensor data 111A to 111C and convert them into data that can be connected.

As described above, the time series data generation unit 203 may generate, for each of a plurality of facilities (objects), time series data whose elements are the difference or ratio between the measured value measured at the facility and the set value corresponding to that measured value.

Differences and ratios between measured values and set values are indicators that are more versatile than measured values measured at facilities. Therefore, according to the above-described configuration for generating time-series data having such versatile indicators as elements, it is possible to make the pseudo-time-series data natural. That is, according to the above configuration, it is possible to reduce variations in data caused by different facilities. Furthermore, by linking the data collected for each of multiple facilities, it is possible to reduce the bias given to fluctuations in numerical values due to time series. Therefore, according to the above configuration, it is possible to generate an inference model with high inference accuracy.

(Example of generation of pseudo time series data and training data)
FIG. 5 is a diagram showing an example of generation of pseudo time series data and teacher data. More specifically, FIG. 5 shows pseudo time series data 114 generated from the time series data 113A to 113C shown in FIG. 4, and teacher data 115 generated from the pseudo time series data 114.

First, generation of the pseudo time series data 114 by the data linking unit 204 will be explained. The data concatenation unit 204 generates pseudo time series data 114 by concatenating the time series data 113A to 113C in this order. However, if the time series data 113A to 113C are simply concatenated as they are, duplication or inconsistency will occur in the "time" values.

Therefore, when performing the connection, the data connection unit 204 replaces the time values in the time series data 113A to 113C with continuous values of 1 to 15. These values indicate the order of each data element included in the time series data 113A to 113C, and can be called order information. By providing new order information in place of the original "time" value, it is possible to maintain data continuity and prevent inconsistencies in the "time" values.

In this manner, the data linking unit 204 may link multiple pieces of time-series data by providing order information indicating a series of orders to the plurality of pieces of time-series data. According to this configuration, a plurality of time series data can be made into one pseudo time series data by a simple process of adding a series of order information to a plurality of time series data.
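The following is a minimal sketch of such linking, assuming each time series is held as a list of dictionaries with a "time" key. The function names, the "pv_sv" key, and the sample values are hypothetical stand-ins and are not the values of FIG. 5.

```python
def concatenate_with_order(series_list):
    """Concatenate series in the given order and renumber "time" from 1."""
    pseudo = []
    for series in series_list:
        for row in series:
            new_row = dict(row)                # leave the input rows untouched
            new_row["time"] = len(pseudo) + 1  # order information
            pseudo.append(new_row)
    return pseudo


def make_series(values):
    """Build a per-facility series whose own "time" runs from 1."""
    return [{"time": t, "pv_sv": v} for t, v in enumerate(values, start=1)]


# Hypothetical stand-ins for three per-facility series of five samples each.
series_a = make_series([0.98, 1.01, 1.00, 0.99, 0.99])
series_b = make_series([1.03, 1.02, 1.00, 0.97, 0.96])
series_c = make_series([1.00, 1.01, 1.02, 1.00, 0.99])

pseudo = concatenate_with_order([series_a, series_b, series_c])
```

Each input series restarts its own "time" at 1, whereas the concatenated result carries a single running order from 1 to 15, so no "time" value is duplicated.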

Note that in general, when creating an inference model using multiple variables with different data distributions, a highly accurate inference model can be created by standardizing or normalizing the variables. Here, in order to perform standardization processing or normalization processing, statistics of the target data set are required, but the statistics are unique to the data set. Therefore, with conventional technology, time series data that have each undergone standardization processing or normalization processing cannot be combined into a single piece of data, or, even if they can be combined, learning using such data significantly degrades the performance of the inference model.

On the other hand, according to the above configuration, multiple time series data are concatenated in advance to form a single pseudo time series data, so standardization processing or normalization processing can be made compatible with the concatenation of multiple time series data.

Furthermore, the data linking unit 204 may perform the concatenation after performing processing to reduce discontinuity in the numerical values of the concatenated portions of the plurality of time series data. This gives continuity to the concatenated pseudo time series data and makes it possible to generate pseudo time series data in which the numerical changes at the concatenated portions are natural.

In addition, because the above configuration reduces discontinuity in the numerical values of the concatenated portions, preprocessing commonly applied to time series data, such as calculating a moving average, can be applied to the pseudo time series data. This can be expected to further improve the inference accuracy of the inference model.
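As one illustration of such common preprocessing, a trailing moving average over a single element of the pseudo time series data could be computed as sketched below. The window handling at the start of the series (averaging over the samples available so far) is an assumption made for this sketch, and the function name and input values are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average; early positions average the samples seen so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averaged = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


# Hypothetical values of one element of a pseudo time series.
smoothed = moving_average([1.0, 2.0, 3.0, 4.0], window=2)
```

Because a window that straddles a junction mixes samples from two facilities, a large jump at the junction would distort the averaged values there, which is why reducing the discontinuity beforehand is useful.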

Note that the process for reducing discontinuity is not particularly limited as long as it reduces the discontinuity in numerical values of connected parts in a plurality of time series data. For example, as a process for reducing discontinuity, a process for smoothing numerical values of connected parts in a plurality of time series data may be applied. Further, for example, as a process for reducing discontinuity, a process may be applied in which a part of the numerical value of each joint part of the time series data to be concatenated is deleted so as to reduce the discontinuity. In this case, for example, numerical values may be deleted so that the difference between the combined parts becomes less than or equal to a threshold value.

For example, the data linking unit 204 may replace the value of PV/SV at time = 6 in the pseudo time series data 114 with the average of that value and the value of PV/SV at time = 5. In this case, the value of PV/SV at time = 6 becomes (0.989+1.035)/2=1.012, which smooths the change in the value of PV/SV at the portion where the time series data 113A and 113B are connected.

Furthermore, for example, the data linking unit 204 may delete the value of PV/SV at time = 10 in the pseudo time series data 114. This smooths the change in the PV/SV values at the portion where the time series data 113B and 113C are connected.
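The two kinds of processing described above can be sketched as follows. The function names are hypothetical, and the deletion policy shown (removing values only from the head of the later series until the gap is at or below a threshold) is just one possible policy assumed for this sketch. Only the values 0.989 and 1.035 are taken from the example above; the other values are illustrative.

```python
def smooth_junction(values, index):
    """Replace values[index] with the mean of itself and the preceding value."""
    if not 0 < index < len(values):
        raise IndexError("index must have a preceding element")
    smoothed = list(values)
    smoothed[index] = (values[index - 1] + values[index]) / 2
    return smoothed


def trim_junction(left, right, threshold):
    """Drop leading values of `right` until the gap to left[-1] is <= threshold."""
    if not left:
        return list(right)
    trimmed = list(right)
    while trimmed and abs(left[-1] - trimmed[0]) > threshold:
        trimmed.pop(0)
    return list(left) + trimmed


# Index 4 corresponds to time = 5 and index 5 to time = 6 (first value of the next series).
pv_sv = [0.980, 1.010, 1.000, 0.990, 0.989, 1.035, 1.020]
smoothed = smooth_junction(pv_sv, 5)

# Hypothetical junction in which the later series starts with an outlying value.
joined = trim_junction([1.00, 0.99], [1.20, 1.01, 1.00], threshold=0.05)
```

Smoothing keeps the number of samples unchanged, while deletion shortens the series, so the order information would be assigned after whichever processing is applied.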

Next, generation of the teacher data 115 by the teacher data generation unit 205 will be explained. In the example of FIG. 5, the teacher data generation unit 205 generates the teacher data 115 by performing standardization processing on the pseudo time series data 114. Specifically, the teacher data generation unit 205 generates the teacher data 115 by standardizing each of the three data elements included in the pseudo time series data 114, namely, the measured value, the set value, and PV/SV.

More specifically, the teacher data generation unit 205 generates the teacher data 115 by performing, for each data element, a process of dividing the difference between the value of the data element included in the pseudo time series data 114 and the average value by the standard deviation. For example, in the pseudo time series data 114, the average value of the measured values is 9.387, and the standard deviation is 0.519. Therefore, the teacher data generation unit 205 standardizes 9.8, which is the measured value at time = 1 in the pseudo time series data 114, to (9.8-9.387)/0.519=0.795.
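A minimal sketch of this standardization for one data element is shown below. The text does not state whether the population or the sample standard deviation is used; the population standard deviation (`statistics.pstdev`) is assumed here. The function name and the input values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return standardized values and the statistics used to compute them."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("standard deviation is zero; cannot standardize")
    standardized = [(v - mu) / sigma for v in values]
    return standardized, {"mean": mu, "std": sigma}


# Hypothetical measured values of a pseudo time series.
measured = [9.8, 9.1, 8.9, 10.2, 9.4]
z_scores, stats = standardize(measured)
```

The statistics are returned together with the standardized values because the same mean and standard deviation are needed again when inference data are preprocessed.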

Furthermore, the teacher data generation unit 205 may generate the teacher data by performing normalization processing instead of standardization processing. In this case, the teacher data generation unit 205 calculates the maximum value and minimum value of each data element included in the pseudo time series data 114. Then, the teacher data generation unit 205 generates the teacher data by performing, for each data element, a process of dividing the difference between the value of the data element and the calculated minimum value by the data range, that is, the difference between the maximum value and the minimum value.
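The normalization described above can be sketched as follows for one data element. The function name and the input values are hypothetical and chosen only so that the result is easy to check by hand.

```python
def min_max_normalize(values):
    """Scale values to the range [0, 1] using their minimum and maximum."""
    lowest, highest = min(values), max(values)
    data_range = highest - lowest
    if data_range == 0:
        raise ValueError("all values are equal; the data range is zero")
    scaled = [(v - lowest) / data_range for v in values]
    return scaled, {"min": lowest, "max": highest}


# Hypothetical values of one data element.
scaled, stats = min_max_normalize([2.0, 4.0, 6.0])
```

With these inputs the minimum maps to 0, the maximum maps to 1, and the midpoint maps to 0.5.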

As described above, the teacher data generation unit 205 generates the teacher data 115 by performing standardization processing or normalization processing on the pseudo time series data 114. This makes it possible to generate an inference model that absorbs the characteristics of time-series data for each facility, in other words, an inference model that reflects the characteristics of time-series data for each facility.

Note that the conditions for standardization processing or normalization processing (statistics such as average, standard deviation, maximum value, and minimum value) are also used during inference using the generated inference model. For this reason, the teacher data generation unit 205 may notify the conditions for standardization processing or normalization processing to the information processing apparatuses 1A to 1D shown in FIG. 2 through communication via the communication unit 22.
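As one way of passing these conditions to another device, the statistics could be packed into a text message, as sketched below. The JSON layout, the key names, and the function names are assumptions made for this illustration; the embodiment does not specify a message format. The mean and standard deviation of the measured value are the ones given in the example above.

```python
import json


def pack_statistics(method, stats_by_element):
    """Serialize the preprocessing method and its statistics as JSON text."""
    return json.dumps(
        {"method": method, "statistics": stats_by_element}, sort_keys=True
    )


def unpack_statistics(payload):
    """Restore the method name and statistics from the JSON text."""
    message = json.loads(payload)
    return message["method"], message["statistics"]


# Statistics for the measured value; other data elements would be added likewise.
stats = {"measured_value": {"mean": 9.387, "std": 0.519}}
payload = pack_statistics("standardization", stats)
method, restored = unpack_statistics(payload)
```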

The learning unit 206 performs machine learning using the teacher data 115 generated as described above to generate an inference model. The objective variable of the inference model may be the value of the abnormality flag. Further, the explanatory variable of the inference model may be at least one of a measured value, a set value, and a PV/SV value. Further, the learning unit 206 may include values other than these in the explanatory variables. For example, the learning unit 206 may include, as explanatory variables, at least one of a time-series temperature measurement value, a setting value, and a PV/SV value included in the temperature sensor data 112A. Furthermore, at least one of the values obtained by standardizing or normalizing these values may be included in the explanatory variables.
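The embodiment does not specify a learning algorithm. Purely as an illustration, the sketch below fits a logistic regression by batch gradient descent, with standardized values as explanatory variables and the abnormality flag as the objective variable. The choice of logistic regression, the hyperparameters, the function names, and the data are all assumptions made for this sketch.

```python
import math


def sigmoid(z):
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, flags, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, flags):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(model, x):
    """Return the inferred abnormality flag (0 or 1) for one row."""
    weights, bias = model
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical teacher data: standardized (measured value, PV/SV) rows and flags.
rows = [[-1.2, -1.0], [-0.8, -0.9], [-0.3, -0.2], [0.4, 0.3], [0.9, 1.1], [1.0, 0.7]]
flags = [0, 0, 0, 1, 1, 1]
model = train_logistic(rows, flags)
predicted = [predict(model, row) for row in rows]
```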

Although the data collected at facility D is not reflected in the teacher data 115 shown in FIG. 5, the inference model generated using such teacher data 115 can also be applied to inference using data collected at facility D.

(Flow of processing executed by information processing device 2)
The flow of processing executed by the information processing device 2 will be explained based on FIG. 6. FIG. 6 is a flowchart illustrating an example of processing when the information processing device 2 generates an inference model. Although details will be described later, the series of processes shown in the flowchart of FIG. 6 includes a method of generating teacher data and a method of generating an inference model.

In S11, the data acquisition unit 201 acquires data obtained at each facility. For example, the data acquisition unit 201 may acquire time-series sensing data collected at the facilities A to C shown in FIG. 2, respectively, via the information processing devices 1A to 1C. Note that the data collected at each facility may be stored in advance in the storage unit 21 or the like, and in this case, the data acquisition unit 201 may acquire the data from the storage unit 21 or the like.

In S12, the connection target determination unit 202 determines the data to be connected by the data connection unit 204 from among the data acquired in S11. As described based on FIG. 3, the connection target determination unit 202 may determine data to be connected according to preset rules, and may exclude unrelated data from being connected.

In S13, the time series data generation unit 203 generates the time series data to be used for the concatenation by the data linking unit 204 from the data determined to be the object of concatenation in S12. Specifically, the time series data generation unit 203 generates time series data whose elements are the difference or ratio between a measured value included in the data determined to be concatenated in S12 and the set value corresponding to that measured value. This process can also be said to be a process of generating explanatory variables for an inference model using measured values. Further, in S13, the time series data generation unit 203 may also perform a process of associating the generated time series data with a value that is an objective variable of the inference model.

In S14 (data concatenation step), the data concatenation unit 204 concatenates a plurality of time series data based on data collected at each of a plurality of facilities to generate one pseudo time series data.

In S15 (teacher data generation step), the teacher data generation unit 205 performs standardization processing or normalization processing on the pseudo time series data generated in S14, and uses it as teacher data. If the time-series data is not associated with the value to be the target variable in S13, the teacher data generation unit 205 associates the value with the pseudo-time-series data in S15.

In S16 (learning step), the learning unit 206 generates an inference model by machine learning using the teacher data generated in S15, and thus the process in FIG. 6 ends. Note that the learning unit 206 may transmit the generated inference model to the information processing device 1. Furthermore, at this time, the statistics used in the standardization process or normalization process in S15 may also be transmitted.
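The flow from S13 to S15 can be sketched end to end as follows. S11 and S12 are represented only by the already selected input lists, and S16 is omitted. The function names, the tuple layout (measured value, set value, abnormality flag), the use of the population standard deviation, and all values are assumptions made for this illustration.

```python
from statistics import mean, pstdev


def s13_make_series(raw):
    """S13: derive PV/SV and keep the objective variable for one facility."""
    return [{"pv_sv": pv / sv, "flag": flag} for pv, sv, flag in raw]


def s14_concatenate(series_list):
    """S14: concatenate the series and assign order information 1..N."""
    rows = [dict(row) for series in series_list for row in series]
    for order, row in enumerate(rows, start=1):
        row["order"] = order
    return rows


def s15_standardize(pseudo, key="pv_sv"):
    """S15: standardize one element over the whole pseudo time series."""
    values = [row[key] for row in pseudo]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("standard deviation is zero; cannot standardize")
    teacher = [
        {"order": row["order"], key: (row[key] - mu) / sigma, "flag": row["flag"]}
        for row in pseudo
    ]
    return teacher, {"mean": mu, "std": sigma}


# Hypothetical (measured value, set value, abnormality flag) samples per facility.
raw_a = [(9.8, 10.0, 0), (10.1, 10.0, 0), (11.5, 10.0, 1)]
raw_b = [(49.0, 50.0, 0), (50.5, 50.0, 0), (57.0, 50.0, 1)]

pseudo = s14_concatenate([s13_make_series(raw_a), s13_make_series(raw_b)])
teacher, stats = s15_standardize(pseudo)
```

Note that the statistics are computed once over the concatenated series rather than once per facility, so a single pair of mean and standard deviation accompanies the teacher data.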

The process of FIG. 6 described above includes a method of generating teacher data. That is, the teacher data generation method executed by the information processing device 2 includes a data linking step (S14) and a teacher data generation step (S15). The data linking step (S14) links a plurality of time series data based on data collected for each of a plurality of facilities (objects) to generate one pseudo time series data. In the teacher data generation step (S15), the pseudo time series data generated in S14 is subjected to standardization processing or normalization processing to obtain teacher data. Thereby, it is possible to generate teacher data from which a highly versatile inference model can be generated.

Additionally, the process in FIG. 6 also includes a method for generating an inference model. That is, the inference model generation method executed by the information processing device 2 includes a data linking step (S14), a teacher data generation step (S15), and a learning step (S16). The data linking step (S14) connects a plurality of time series data based on data collected for each of a plurality of facilities (objects) to generate one pseudo time series data. In the teacher data generation step (S15), the pseudo time series data generated in S14 is subjected to standardization processing or normalization processing to become teacher data. In the learning step (S16), an inference model is generated by machine learning using the teacher data generated in S15. Therefore, it is possible to generate a highly versatile inference model.

(Flow of processing executed by information processing device 1)
The data acquisition unit 101 of the information processing device 1 acquires the inference model generated in S16 and the statistics used in the standardization process or normalization process in S15. Furthermore, the data acquisition unit 101 acquires inference data collected about a facility that is a target of inference. For example, in the case of the information processing apparatus 1D in FIG. 2, inference data collected at the facility D is acquired.

Next, the preprocessing unit 102 performs standardization processing or normalization processing on the inference data using the above-mentioned statistics to generate input data for the inference model. Subsequently, the inference unit 103 inputs the input data generated by the preprocessing unit 102 to the inference model, and obtains an inference result based on the output value output by the inference model. Then, the control amount determination unit 104 determines the control amount for the equipment installed in the target facility based on the above inference result.
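The preprocessing on the inference side can be sketched as follows, assuming standardization was used when the teacher data were generated. The function names, the dictionary layout, and the PV/SV statistics and sample are hypothetical; the mean and standard deviation of the measured value are the ones given in the example of FIG. 5.

```python
def standardize_with(value, stats):
    """Standardize one value using statistics obtained at training time."""
    return (value - stats["mean"]) / stats["std"]


def preprocess(sample, stats_by_element):
    """Apply the received statistics to every element they cover."""
    return {
        name: standardize_with(sample[name], stats)
        for name, stats in stats_by_element.items()
    }


# Statistics received together with the inference model.
received_stats = {
    "measured_value": {"mean": 9.387, "std": 0.519},
    "pv_sv": {"mean": 1.0, "std": 0.02},  # hypothetical
}

# Hypothetical inference data collected at the target facility.
sample = {"measured_value": 9.8, "pv_sv": 1.01}
model_input = preprocess(sample, received_stats)
```

No statistics are recomputed from the inference data; the values obtained at training time are reused as they are.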

Conventionally, when there are multiple facilities as shown in FIG. 2, the time series data collected for each facility is standardized or normalized for each facility to generate teacher data for each facility, and an inference model specific to each facility is generated. Therefore, when making inferences using inference data at each facility, it was necessary to perform standardization processing or normalization processing using the statistics applied at that facility.

On the other hand, in the information processing system 5 of FIG. 2, the information processing device 2 links the time series data collected for each facility to form one pseudo time series data, and then performs standardization processing or normalization processing. Therefore, standardization processing or normalization processing can be performed on inference data collected at any facility using the same statistics.

[Modified example]
The execution entity of each process described in each of the above-mentioned embodiments is arbitrary and is not limited to the above-mentioned examples. In other words, the devices constituting the information processing system 5 can be changed as appropriate, as long as they can execute the processes described in each of the above-described embodiments.

For example, in the information processing system 5 in FIG. 2, an information processing device 2 different from the information processing devices 1A to 1D generates the teacher data and the inference model, but any of the information processing devices 1A to 1D may instead generate the teacher data and the inference model. Furthermore, in the information processing system 5 in FIG. 2, the information processing devices 1A to 1D collect data in the facilities A to D and perform inference using an inference model, but these processes may be executed by separate information processing devices.

Furthermore, in the information processing system 5 of FIG. 2, one information processing device 2 generates the teacher data and the inference model, but the generation of the teacher data and the generation of the inference model may be executed by different information processing devices.

Moreover, the execution entity of each process described in FIG. 6 does not necessarily have to be one device, and the processing can be shared and executed by a plurality of arbitrary information processing devices (computers). For example, the information processing device 2 may execute the processing of S11 to S15 in FIG. 6, and the processing of S16 may be executed by another information processing device.

[Example of implementation using software]
The functions of the information processing devices 1 and 2 (hereinafter referred to as "devices") can be realized by a program for causing a computer to function as the devices, that is, a program for causing the computer to function as each control block of the devices (particularly each unit included in the control units 10 and 20). For example, the inference model generation function in the information processing device 2 can be realized by an inference model generation program, and the teacher data generation function in the information processing device 2 can be realized by a teacher data generation program.

In this case, the device includes a computer having at least one control device (for example, a processor) and at least one storage device (for example, a memory) as hardware for executing the program. By executing the above program using this control device and storage device, each function described in each of the above embodiments is realized.

The above program may be recorded on one or more non-transitory computer-readable recording media. This recording medium may or may not be included in the above device. In the latter case, the program may be supplied to the device via any transmission medium, wired or wireless.

Furthermore, part or all of the functions of each of the control blocks described above can also be realized by a logic circuit. For example, an integrated circuit in which a logic circuit functioning as each of the control blocks described above is formed is also included in the scope of the present invention. In addition to this, it is also possible to realize the functions of each of the control blocks described above using, for example, a quantum computer.

[Summary]
The information processing device according to aspect 1 of the present invention includes: a data linking unit that links a plurality of time series data based on data collected for each of a plurality of objects to generate one pseudo time series data; a teacher data generation unit that performs standardization processing or normalization processing on the pseudo time series data to obtain teacher data; and a learning unit that generates an inference model by machine learning using the teacher data.

In the information processing device according to aspect 2 of the present invention, in aspect 1, the data linking unit may be configured to link the plurality of time series data by providing order information indicating a series of orders to the plurality of time series data.

In the information processing device according to aspect 3 of the present invention, in aspect 1 or 2, the data linking unit may be configured to perform the linking after performing processing to reduce discontinuity in the numerical values of the linked portions in the plurality of time series data.

The information processing device according to aspect 4 of the present invention, in any one of aspects 1 to 3, may further include a time series data generation unit that generates, for each of the plurality of objects, the time series data whose elements are the difference or ratio between a measured value measured for the object and the set value corresponding to the measured value.

An information processing device according to aspect 5 of the present invention includes: a data linking unit that links a plurality of time series data based on data collected for each of a plurality of objects to generate one pseudo time series data; and a teacher data generation unit that performs standardization processing or normalization processing on the pseudo time series data to obtain teacher data.

A generation method according to aspect 6 of the present invention is an inference model generation method executed by one or more information processing devices, and includes: a data linking step of linking a plurality of time series data based on data collected for each of a plurality of objects to generate one pseudo time series data; a teacher data generation step of performing standardization processing or normalization processing on the pseudo time series data to obtain teacher data; and a learning step of generating an inference model by machine learning using the teacher data.

A generation method according to aspect 7 of the present invention is a teacher data generation method executed by one or more information processing devices, and includes: a data linking step of linking a plurality of time series data based on data collected for each of a plurality of objects to generate one pseudo time series data; and a teacher data generation step of performing standardization processing or normalization processing on the pseudo time series data to obtain teacher data.

The inference model generation program according to aspect 8 of the present invention is an inference model generation program for causing a computer to function as the information processing device according to aspect 1, and causes the computer to function as the data linking unit, the teacher data generation unit, and the learning unit.

A teacher data generation program according to aspect 9 of the present invention is a teacher data generation program for causing a computer to function as the information processing device according to aspect 5, and causes the computer to function as the data linking unit and the teacher data generation unit.

The present invention is not limited to the embodiments described above, and various modifications can be made within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included within the technical scope of the present invention.

2 Information processing device
203 Time series data generation section
204 Data connection section
205 Teacher data generation section
206 Learning section
111A, 111B, 111C Steam amount sensor data (measured value, set value)
112A, 112B, 112C Temperature sensor data (measured value, set value)
113A, 113B Time series data
114 Pseudo time series data
115 Training data

Claims

a data concatenation unit that generates one pseudo time series data by concatenating a plurality of time series data based on data collected for each of the plurality of targets;
a teacher data generation unit that performs standardization processing or normalization processing on the pseudo time series data to obtain teacher data;
An information processing device comprising: a learning unit that generates an inference model by machine learning using the teacher data.
The information processing device according to claim 1, wherein the data linking unit connects the plurality of time-series data by providing order information indicating a series of orders to the plurality of time-series data.
The information processing device according to claim 1 or 2, wherein the data linking unit performs the linking after performing processing to reduce discontinuity in numerical values of linked portions in the plurality of time series data.
The information processing device according to claim 1 or 2, further comprising a time series data generation unit that generates, for each of the plurality of objects, the time series data whose element is the difference or ratio between a measured value measured for the object and a set value corresponding to the measured value.
a data concatenation unit that generates one pseudo time series data by concatenating a plurality of time series data based on data collected for each of the plurality of targets;
An information processing device comprising: a teacher data generation unit that performs standardization processing or normalization processing on the pseudo time series data to generate teacher data.
A method for generating an inference model executed by one or more information processing devices, the method comprising:
a data linking step of linking multiple time series data based on data collected for each of the multiple targets to generate one pseudo time series data;
a teacher data generation step of performing standardization processing or normalization processing on the pseudo time series data to obtain teacher data;
A method for generating an inference model, comprising: a learning step of generating an inference model by machine learning using the teacher data.
A teacher data generation method executed by one or more information processing devices, the method comprising:
a data linking step of linking multiple time series data based on data collected for each of the multiple targets to generate one pseudo time series data;
A method for generating teacher data, comprising: a teacher data generation step of subjecting the pseudo time series data to standardization processing or normalization processing to obtain teacher data.
An inference model generation program for causing a computer to function as the information processing device according to claim 1, the inference model generation program causing the computer to function as the data linking unit, the teacher data generation unit, and the learning unit.
A teacher data generation program for causing a computer to function as the information processing device according to claim 5, the teacher data generation program for causing the computer to function as the data linking unit and the teacher data generation unit.