US20240028872A1 - Estimation apparatus, learning apparatus, methods and programs for the same - Google Patents

Estimation apparatus, learning apparatus, methods and programs for the same

Info

Publication number
US20240028872A1
Authority
US
United States
Prior art keywords
parameter
state
observed amount
estimation
value
Prior art date
Legal status
Pending
Application number
US17/425,684
Inventor
Shin Murata
Yuma KOIZUMI
Noboru Harada
Current Assignee
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Assigned to NIPPON TELEGRAPH AND TELEPHONE CORPORATION (assignment of assignors' interest). Assignors: HARADA, Noboru; MURATA, Shin; KOIZUMI, Yuma
Publication of US20240028872A1

Classifications

    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0475 Generative networks
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    (All within G06N, Computing arrangements based on specific computational models; G06N 3/02, Neural networks; G06N 3/04, Architecture.)

Abstract

An estimation apparatus includes a state estimation unit that estimates a state from an observed amount using an encoder, an observed amount estimation unit that estimates an observed amount from a state using a decoder, and a future observed amount estimation unit that estimates a future observed amount, which is a value to which the observed amount changes with time, using a parameter K representing time evolution, where a parameter of the encoder, a parameter of the decoder, and the parameter K are optimized simultaneously.

Description

    TECHNICAL FIELD
  • The present invention relates to an estimation apparatus that estimates a state from an observed amount using a state-space model, to a learning apparatus, and to methods and programs therefor.
  • BACKGROUND ART
  • A framework called a state-space model is widely used to analyze the properties of objects from series data. A state-space model includes a hidden “state model” that is unobservable and an “observation model” that is a result of observation, and can be considered as a model in which an amount called a “state” evolves with time and series data of observed amounts (for example, “amounts” regarding current, sound pressure, an image, or the like) is generated from these states through an observation process.
  • A state evolves non-linearly with time, and an observed amount is obtained from the state through an observation process that is itself non-linear. Because of this non-linearity of both the time evolution and the observation process, it is difficult to learn the entirety of a state-space model from observed amounts alone without prior assumptions. On the other hand, a method called Koopman mode decomposition, which has been studied in recent years, can avoid this non-linearity by considering the state-space model in another domain (a function space) (see Non Patent Literatures 1 and 2).
  • CITATION LIST
  • Non Patent Literature
    • Non Patent Literature 1: Matthew O. Williams, Clarence W. Rowley, and Ioannis G. Kevrekidis, "A Kernel-Based Approach to Data-Driven Koopman Spectral Analysis," Journal of Computational Dynamics, 2:247-265, 2015. arXiv:1411.2260.
    • Non Patent Literature 2: Matthew O. Williams, Ioannis G. Kevrekidis, and Clarence W. Rowley, "A Data-Driven Approximation of the Koopman Operator: Extending Dynamic Mode Decomposition," Journal of Nonlinear Science, 25(6):1307-1346, 2015.
    SUMMARY OF THE INVENTION
    Technical Problem
  • However, when the state is unknown, Koopman mode decomposition cannot be applied, and thus the observation process, the time evolution, and the state still cannot be learned from observed amounts alone.
  • It is an object of the present invention to provide a learning apparatus that learns an observation process, a time evolution, and a state from an observed amount alone, an estimation apparatus that estimates a state from an observed amount using the observation process, the time evolution, and the state learned from the observed amount alone, methods for the learning and estimation apparatuses, and programs for the learning and estimation apparatuses.
  • Means for Solving the Problem
  • To achieve the above object, one aspect of the present invention is to provide an estimation apparatus including a state estimation unit configured to estimate a state from an observed amount using an encoder, an observed amount estimation unit configured to estimate an observed amount from a state using a decoder, and a future observed amount estimation unit configured to estimate a future observed amount using a parameter K representing time evolution, the future observed amount being a value to which the observed amount changes with time, wherein a parameter of the encoder, a parameter of the decoder, and the parameter K are optimized simultaneously.
  • Effects of the Invention
  • According to the present invention, the observation process, the time evolution, and the state can be learned from the observed amount alone. Another advantage is that a state can be estimated from an observed amount by using the observation process, the time evolution, and the state learned from the observed amount alone.
  • BRIEF DESCRIPTION OF DRAWINGS
  • FIG. 1 is a diagram illustrating an outline of a first embodiment.
  • FIG. 2A is a diagram for explaining a state-space model of the related art and FIG. 2B is a diagram for explaining a framework of an autoencoder of the first embodiment.
  • FIG. 3 is a diagram illustrating an exemplary configuration of an estimation system according to the first embodiment.
  • FIG. 4 is a functional block diagram of a learning apparatus according to the first embodiment.
  • FIG. 5 is a flowchart of an example of processing of the learning apparatus according to the first embodiment.
  • FIG. 6 is a diagram for explaining an algorithm at a learning stage.
  • FIG. 7 is a functional block diagram of an estimation apparatus according to the first embodiment.
  • FIG. 8 is a flowchart of an example of processing of the estimation apparatus according to the first embodiment.
  • FIG. 9 is a diagram for explaining an algorithm at an estimation stage.
  • FIG. 10 is a diagram illustrating an example in which series data was generated by actually learning parameters with series data of data based on image data taken as an input.
  • FIG. 11 is a diagram for explaining an algorithm for predicting an observed amount corresponding to a state.
  • FIG. 12 is a functional block diagram of an abnormality detection apparatus according to a second embodiment.
  • FIG. 13 is a flowchart of an example of preprocessing of the abnormality detection apparatus according to the second embodiment.
  • FIG. 14 is a flowchart of an example of an abnormality detection process of the abnormality detection apparatus according to the second embodiment.
  • FIG. 15 is a diagram illustrating an outline of a third embodiment.
  • FIG. 16 is a functional block diagram of a learning apparatus according to the third embodiment.
  • FIG. 17 is a functional block diagram of an estimation unit of an estimation apparatus according to the third embodiment.
  • FIG. 18 is a flowchart of an example of processing of the estimation unit of the estimation apparatus according to the third embodiment.
  • DESCRIPTION OF EMBODIMENTS
  • Hereinafter, embodiments of the present invention will be described. In the drawings used in the following description, the same reference signs are given to components having the same function or the steps of performing the same processing, and duplicate description is omitted. In the following description, a symbol “{circumflex over ( )}” or the like used in the text is originally written directly above the character immediately after it, but is written immediately before the character due to a limitation of text notation. In the expressions, such symbols are written in their original positions. It is assumed that processing performed for each element of a vector or a matrix is applied to all elements of the vector or the matrix unless otherwise specified.
  • Points of First Embodiment
  • In the present embodiment, not only a time evolution and an observation process but also the “inverse function” of the observation process are learned simultaneously, thereby making it possible to learn all of the observation process, a time evolution, and a state from an observed amount. Specifically, an autoencoder network is used (see Reference 1).
    • (Reference 1) G. E. Hinton. “Reducing the Dimensionality of Data with Neural Networks,” Science, 313(5786): 504-507, July 2006.
  • An autoencoder converts an input into a smaller number of dimensions using a network called an encoder and restores it using a network called a decoder. Here, the basic technique of the present embodiment involves regarding the encoder as the “inverse function” and the decoder as the observation process, thereby enabling simultaneous learning of the inverse function and the observation process. A state-space model includes an observation model that estimates an observed amount from a state that changes with elapse of time and is unobservable, and a state model that estimates a state that changes with elapse of time, as described above. That is, a state-space model is based on the premise that an observed amount is estimated from a state. However, such a model cannot be constructed if the state is unknown as described above. Thus, the framework of the autoencoder is used to learn the encoder as a model that estimates a state from an observed amount and the decoder as a model that estimates an observed amount from a state, thereby enabling construction of a model that estimates an observed amount from a state. Namely, this construction is to learn a model with the input and output reversed.
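  • As a rough illustration of this reversal of inputs and outputs, the following is a minimal NumPy sketch in which a single-layer encoder plays the role of the "inverse function" (observed amount to state) and a single-layer decoder plays the role of the observation process (state to observed amount). The one-layer tanh network and the weight matrices W_enc and W_dec are illustrative assumptions, not the architecture of the embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, state_dim = 8, 2

# Hypothetical single-layer "networks"; the embodiment does not fix an architecture.
W_enc = 0.1 * rng.standard_normal((state_dim, obs_dim))   # encoder weights: observed amount -> state
W_dec = 0.1 * rng.standard_normal((obs_dim, state_dim))   # decoder weights: state -> observed amount

def encoder(y):
    """Plays the role of the 'inverse function': estimate a state x from an observed amount y."""
    return np.tanh(W_enc @ y)

def decoder(x):
    """Plays the role of the observation process: reconstruct an observed amount from a state x."""
    return W_dec @ x

y_t = rng.standard_normal(obs_dim)   # one observed amount y_t
x_t = encoder(y_t)                   # estimated (unobservable) state
y_hat_t = decoder(x_t)               # reconstructed observed amount
print(x_t.shape, y_hat_t.shape)      # (2,) (8,)
```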
  • The basic technique of the present embodiment will be described below.
  • A state xt will be considered. Here, it does not matter whether the state is abstract or concrete. The state evolves with time as follows.

  • x_{t+1} = f(x_t)   (1)
  • From this state xt, an observed amount yt is obtained through an observation process given by the following expression.

  • y_t = g(x_t)   (2)
  • Here, the observed amount is that quantified using some method (such as, for example, a current/voltage, a temperature, a sound pressure, or an image) and may be multidimensional data such as that obtained by a microphone array.
  • The goal of the framework of the state-space model is to determine the following when series data {y1, . . . , yT} of observed amounts has been obtained.
      • Series data of states {x1, . . . , xT}
      • Time evolution function f(xt)
      • Observation process function g(xt)
  • However, it is generally difficult to determine the entirety of the state-space model from observed amounts alone. This is partly because the time evolution f(xt) and the observation process g(xt) are generally non-linear and it is difficult to determine them without prior knowledge.
  • Koopman mode decomposition is a technique that can avoid the above non-linearity.
  • Koopman Mode Decomposition
  • Basis Function

  • Ψ(x) ≡ [ψ_1(x), . . . , ψ_r(x), . . . ]   [Math. 1]
  • The observation process function is expanded as follows using a basis function given above.
  • [Math. 2]   g(x_t) = Σ_{r=1}^{∞} b_r ψ_r(x_t) = B Ψ(x_t)   (3)
  • Here, using the idea of the Koopman operator, the time evolution of the basis function can be rewritten as follows.

  • Ψ(x_{t+1}) = Ψ(f(x_t)) = KΨ(x_t)   (4)
  • In summary, by describing in an (infinite dimensional) function space with an appropriate transformation zt=Ψ(xt), an original state-space model can be rewritten as a linear state-space model.
  • The above original state-space model is given by the following expression.
  • [Math. 3]   x_{t+1} = f(x_t),   y_t = g(x_t)   (5)
  • The above linear state-space model is given by the following expression.
  • [Math. 4]   z_{t+1} = K z_t,   y_t = B z_t   (6)
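  • For intuition, the following sketch simulates the linear state-space model of Expression (6) with small, randomly chosen (hypothetical) matrices K and B: once the lifted representation z_t is available, both the time evolution and the observation become plain matrix multiplications.

```python
import numpy as np

rng = np.random.default_rng(1)
lifted_dim, obs_dim = 4, 3

# Illustrative operators; in the embodiment K and B are learned from data.
K = 0.95 * np.eye(lifted_dim) + 0.05 * rng.standard_normal((lifted_dim, lifted_dim))
B = rng.standard_normal((obs_dim, lifted_dim))

z = rng.standard_normal(lifted_dim)   # z_1 = Psi(x_1): some initial lifted state
ys = []
for _ in range(10):
    ys.append(B @ z)   # y_t = B z_t  (linear observation in the function space)
    z = K @ z          # z_{t+1} = K z_t  (linear time evolution)

print(np.array(ys).shape)   # (10, 3): ten observed amounts generated linearly
```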
  • As mentioned above, it is difficult to learn a generative model without prior information because of the non-linearity of the time evolution and the observation process in the state-space model. On the other hand, Koopman mode decomposition can avoid the non-linearity by describing the state-space model in the function space. However, Koopman mode decomposition is applicable when the state xt is known. Thus, even if Koopman mode decomposition is used, it is difficult to learn the time evolution, the observation process, and the state from the series data of observed amounts alone.
  • Therefore, the present embodiment devises a new state estimation method based on the framework of Koopman mode decomposition. Use of the present embodiment enables learning of all of the time evolution, the observation process, and the state simultaneously from the series data of observed amounts alone.
  • Basic Idea
  • FIG. 1 illustrates an outline of the present embodiment. A state xt, an observation process ({circumflex over ( )}Ψ, B), and a time evolution K are learned taking series data {yt} of observed amounts as an input. After learning, the state xt and series data {circumflex over ( )}y1 (t), {circumflex over ( )}y2 (t), . . . of observed amounts predicted from a certain observed amount yt are output.
  • An observed amount {circumflex over ( )}yt is generated from the state xt as follows.
  • [Math. 5]   ẑ_t = Ψ(x_t),   ŷ_t = B ẑ_t   (7)
  • Therefore, to estimate the state xt from the current observed amount yt, it is necessary to solve the inverse problem of the above. The inverse problem is given by the following expression.
  • [Math. 6]   z_t = B^{-1}(y_t),   x_t = Ψ^{-1}(z_t)   (8)
  • B−1 (•) can be obtained analytically by a pseudo inverse matrix of B or ridge regression. On the other hand, it is generally difficult to obtain the inverse function Ψ−1(•) of the basis function Ψ(•). Here, an implementation example of determining the inverse function, the basis function, and the state using the framework of an autoencoder network will be described. An autoencoder network is a neural network used to reduce the number of dimensions of data, where an input is transferred to a middle layer through an encoder and restored through a decoder. This is expressed as follows.
  • [Math. 7]   x = Φ^{-1}(z; w_enc),   z = Φ(x; w_dec)   (9)
  • Here, the encoder Φ−1 is regarded as the inverse function Ψ−1(•) of the basis function and the decoder Φ is regarded as the basis function Ψ(•).
  • In other words, in the related art, a basis which enables linear transformation with elapse of time is obtained, yielding a state-space model in which a state xt is converted into {circumflex over ( )}zt using the basis and a predicted observed amount {circumflex over ( )}yt is derived from {circumflex over ( )}zt and a coefficient B (see FIGS. 1 and 2A). On the other hand, in the present embodiment, learning is performed using the framework of an autoencoder including an encoder which takes an observed amount yt as an input and outputs a state xt and a decoder which takes the state xt as an input and outputs an observed amount {circumflex over ( )}yt (see FIGS. 1 and 2B).
  • First Embodiment
  • The present embodiment is divided into two stages. One is a stage of learning a time evolution, an observation process, and an inverse function thereof from time series data of observed amounts (hereinafter also referred to as a learning stage). The other is a stage of acquiring a state from an observed amount (hereinafter also referred to as a state acquisition stage).
  • A state acquisition system according to the present embodiment includes a learning apparatus 100 and a state acquisition apparatus 200 (see FIG. 3 ). The state acquisition apparatus 200 is also referred to as an estimation apparatus 200 because it estimates a state from an observed amount or an observed amount from a state. Similarly, the state acquisition stage is also referred to as an estimation stage and the state acquisition system is also referred to as an estimation system.
  • The learning apparatus 100 executes the learning stage and the estimation apparatus 200 executes the estimation stage. First, the learning stage will be described.
  • Learning Stage
  • FIG. 4 illustrates a functional block diagram of the learning apparatus 100 and FIG. 5 illustrates a flowchart of an example of its processing. The learning apparatus 100 includes an initialization unit 110, an estimation unit 120, an objective function calculation unit 130, and a parameter update unit 140.
  • The learning apparatus 100 takes series data {yt (L)} of observed amounts for learning as an input and outputs parameters (wenc, wdec, K, B) which are learning results. wenc represents a parameter used in the encoder (a parameter of the inverse function Ψ−1), wdec represents a parameter used in the decoder (a parameter of the basis function Ψ), K represents a parameter representing the time evolution, and B represents the expansion coefficient.
  • The learning stage is performed as in algorithm 1 of FIG. 6. This illustrates an example in which an observed amount for learning is data based on image data. Here, series data {yt} (e.g., moving image data) including pieces of data based on image data will be considered as input data. The data yt based on the image data may be, for example, data including pixel values of pixels and having a number of dimensions corresponding to the number of the pixels, or may be a feature obtained from the image data, a feature obtained from the moving image data, or the like. When the data based on image data is taken as input data, the state represents an abstract state corresponding to the observed amount yt, and the abstract state may have a value with a physical meaning (for example, a value with a smaller number of dimensions than the observed amount). For example, the state represents an amount similar to the size or position of a periodic pattern or object appearing in image data, or an amount similar to the phase of a moving body in moving image data including pieces of image data (when the movement is periodic).
  • Initialization Unit 110
  • Prior to learning, the initialization unit 110 initializes the parameters wenc (k), wdec (k), K(k), and B(k) used for inference (S110) and outputs the initialized parameters wenc (0), wdec (0), K(0), B(0) to the estimation unit 120. Further, an index k indicating the number of updates is set such that k=0.
  • Estimation Unit 120
  • The estimation unit 120 takes the series data {yt (L)} of observed amounts for learning and the initialized or updated parameters wenc (k), wdec (k), K(k), and B(k) as inputs to perform (1) estimation of a basis function value (to obtain an estimated value zt), (2) estimation of a state (to obtain an estimated value xt), (3) estimation of a reconstructed basis function value (to obtain an estimated value {circumflex over ( )}zt), (4) prediction of a basis function (to obtain a predicted value {circumflex over ( )}zτ (t)), and (5) prediction of an observed amount (to obtain a predicted value {circumflex over ( )}yτ (t))(S120) and outputs the estimated values zt, xt, and {circumflex over ( )}zt and the predicted values {circumflex over ( )}zτ (t) and {circumflex over ( )}yτ (t)). Hereinafter, estimation and prediction are collectively referred to as estimation. Details of the estimation will be described below.
  • (1) First, the estimation unit 120 estimates the basis function value zt using the current expansion coefficient B(k). That is, zt=B(k)+(yt (L)). Here, B(k)+(•) represents solving the regression problem. In the present embodiment, B(k)+(•)=(B(k)TB(k)+σI)−1B(k)T is used for the problem of ridge regression, but general linear regression using a pseudo inverse matrix or a sparse estimation algorithm such as LASSO may also be used. Here, σ is a predetermined weight parameter in the ridge regression and I is the identity matrix.
  • (2) Next, the estimation unit 120 estimates the state such that xt=Φ−1(zt; wenc (k)) using the neural network of Expression (9).
  • (3) Further, the estimation unit 120 estimates a reconstructed basis function value such that {circumflex over ( )}zt=Φ(xt; wdec (k)) using the neural network of Expression (9).
  • (4) Next, the estimation unit 120 performs linear prediction of a basis function value for τ=0, . . . , T to obtain a predicted value of the basis function value such that {circumflex over ( )}zτ (t)=K(k)τ{circumflex over ( )}zt.
  • (5) Based on the predicted value {circumflex over ( )}zτ (t), the estimation unit 120 obtains a predicted value of the observed amount such that {circumflex over ( )}yτ (t)=B(k){circumflex over ( )}zτ (t).
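  • The following NumPy sketch strings steps (1) to (5) together. The ridge-regression solve for B(k)+ follows the expression above; the one-layer tanh encoder and linear decoder are illustrative stand-ins for Φ−1(•; wenc) and Φ(•; wdec) and are assumptions made only for this illustration.

```python
import numpy as np

def estimate_and_predict(y_t, B, K, w_enc, w_dec, sigma=1e-3, T=10):
    """Sketch of steps (1) to (5). The tanh encoder and linear decoder are
    illustrative stand-ins for Phi^{-1}(.; w_enc) and Phi(.; w_dec)."""
    lifted_dim = B.shape[1]
    # (1) basis function value by ridge regression: z_t = (B^T B + sigma I)^{-1} B^T y_t
    z_t = np.linalg.solve(B.T @ B + sigma * np.eye(lifted_dim), B.T @ y_t)
    # (2) state from the encoder
    x_t = np.tanh(w_enc @ z_t)
    # (3) reconstructed basis function value from the decoder
    z_hat_t = w_dec @ x_t
    # (4) and (5): linear prediction in the lifted space, then projection to observed amounts
    y_preds, z_tau = [], z_hat_t.copy()
    for _ in range(T + 1):
        y_preds.append(B @ z_tau)   # y_hat_tau^(t) = B K^tau z_hat_t
        z_tau = K @ z_tau
    return x_t, np.array(y_preds)

# Toy shapes, for illustration only.
rng = np.random.default_rng(2)
obs_dim, lifted_dim, state_dim = 6, 4, 2
B = rng.standard_normal((obs_dim, lifted_dim))
K = 0.9 * np.eye(lifted_dim)
w_enc = rng.standard_normal((state_dim, lifted_dim))
w_dec = rng.standard_normal((lifted_dim, state_dim))
x_t, y_preds = estimate_and_predict(rng.standard_normal(obs_dim), B, K, w_enc, w_dec)
print(x_t.shape, y_preds.shape)   # (2,) (11, 6)
```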
  • Objective Function Calculation Unit 130
  • The objective function calculation unit 130 takes the series data {yt+τ (L)} of observed amounts for learning, the series data of estimated values {zt+τ}, {xt+τ}, and {{circumflex over ( )}zt+τ}, the series data of predicted values {{circumflex over ( )}zτ (t)} and {{circumflex over ( )}yτ (t)}, and the parameters wenc (k) and wdec (k) as inputs to calculate a value of the objective function J(Θ) (S130) and outputs the calculated value J(Θ). Here, Θ is the set of parameters wenc (k), wdec (k), K(k), and B(k).
  • (i) First, the objective function calculation unit 130 obtains a prediction error of the observed amount using the following expression.

  • [Math. 8]   J_1 = E[ Σ_{τ=0}^{T} δ^τ ‖ŷ_τ^(t) − y_{t+τ}‖_2^2 ]
  • Here, δ is a weight parameter satisfying 0<δ<1, so that the factor δ^τ weights the error more heavily the closer the corresponding time t+τ is to the current time t, and E[•] represents expected value calculation.
  • (ii) The objective function calculation unit 130 also obtains a prediction error of the basis function using the following expression.

  • [Math. 9]   J_2 = E[ Σ_{τ=0}^{T} δ^τ ‖ẑ_τ^(t) − z_{t+τ}‖_2^2 ]
  • (iii) A regularization term Ω1 for the weights of the neural network is further introduced. Here, Ω1=∥wenc (k)∥22+∥wdec (k)∥22.
  • (iv) Structures of the state are also introduced. In the present embodiment, smoothness Ω2=E[∥xt+1−xt∥22] and non-Gaussianity Ω3=E[log cosh(xt)] are introduced. Here, cosh(•) represents the hyperbolic cosine function.
  • (v) The objective function calculation unit 130 obtains the value of the objective function J(Θ)=aJ1+bJ2+p1Ω1+p2Ω2+p3Ω3 with the above terms weighted. Here, a, b, p1, p2, and p3 are parameters for determining how much importance is placed on J1, J2, Ω1, Ω2, and Ω3, respectively, and are appropriately set using experimental results and simulation results.
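  • A sketch of this objective calculation is given below. The expectation E[•] is approximated by a sum over a single window, and the default values of a, b, p1, p2, p3, and δ are illustrative assumptions only.

```python
import numpy as np

def objective(y_true, y_pred, z_true, z_pred, x, w_enc, w_dec,
              delta=0.9, a=1.0, b=1.0, p1=1e-4, p2=1e-2, p3=1e-2):
    """Sketch of J(Theta) = a*J1 + b*J2 + p1*Omega1 + p2*Omega2 + p3*Omega3.
    y_true, y_pred, z_true, z_pred: arrays of shape (T+1, dim); x: states, shape (T+1, state_dim).
    The expectation E[.] is approximated by this single window; all weights are illustrative."""
    T = y_pred.shape[0] - 1
    w = delta ** np.arange(T + 1)                                    # delta^tau, 0 < delta < 1
    J1 = np.sum(w * np.sum((y_pred - y_true) ** 2, axis=1))          # prediction error of observed amounts
    J2 = np.sum(w * np.sum((z_pred - z_true) ** 2, axis=1))          # prediction error of basis function values
    Omega1 = np.sum(w_enc ** 2) + np.sum(w_dec ** 2)                 # weight regularization
    Omega2 = np.mean(np.sum(np.diff(x, axis=0) ** 2, axis=1))        # smoothness of the state
    Omega3 = np.mean(np.log(np.cosh(x)))                             # non-Gaussianity
    return a * J1 + b * J2 + p1 * Omega1 + p2 * Omega2 + p3 * Omega3
```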
  • Parameter Update Unit 140
  • The parameter update unit 140 receives the objective function J(Θ) and updates each parameter wenc (k), wdec (k), K(k), and B(k) (S140). For example, using back propagation, the parameter update unit 140 calculates the gradient ∇ΘJ with respect to each parameter and updates each parameter wenc (k), wdec (k), K(k), and B(k) by gradient descent (Θ(k+1)=Θ(k)−λ∇ΘJ, where λ is a step size).
  • If a predetermined condition (for example, a predetermined number of loops have ended or the objective function does not change) is not satisfied (no in S141), the parameter update unit 140 outputs the updated parameters wenc (k+1), wdec (k+1), K(k+1), and B(k+1) to the estimation unit 120, outputs the updated parameters wenc (k+1) and wdec (k+1) for calculating the regularization term Ω1 to the objective function calculation unit 130, sets k such that k←k+1, and repeats S120 to S140.
  • If the predetermined condition is satisfied (yes in S141), the parameter update unit 140 stops the parameter update and completes learning the model. The parameter update unit 140 outputs the latest parameters wenc (k), wdec (k), K(k), and B(k) to the estimation apparatus 200 as parameters (wenc, wdec, K, B) that are learning results.
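  • The update loop of S120 to S140 can be sketched as plain gradient descent as follows. The parameter dictionary, learning rate, and stopping tolerance are illustrative assumptions, and in practice the gradients are obtained by back propagation through the networks.

```python
import numpy as np

def train(params, grad_fn, lr=1e-3, max_iters=1000, tol=1e-8):
    """Sketch of the update loop: Theta^(k+1) = Theta^(k) - lr * grad_Theta J.
    `params` is a dict of NumPy arrays and `grad_fn(params)` returns (J, grads)
    with matching keys; in practice the gradients come from back propagation."""
    prev_J = np.inf
    for _ in range(max_iters):
        J, grads = grad_fn(params)
        for name in params:                       # S140: update w_enc, w_dec, K, B
            params[name] = params[name] - lr * grads[name]
        if abs(prev_J - J) < tol:                 # S141: stop when J no longer changes
            break
        prev_J = J
    return params
```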
  • Estimation Stage
  • FIG. 7 illustrates a functional block diagram of the estimation apparatus 200 and FIG. 8 is a flowchart of an example of its processing. The estimation apparatus 200 includes an estimation unit 220.
  • The estimation apparatus 200 sets the parameters (wenc, wdec, K, B) received from the learning apparatus 100 in the estimation unit 220 prior to the estimation and prediction.
  • Algorithm 2
  • When algorithm 2 of FIG. 9 is executed, the estimation apparatus 200 takes an observed amount yt as an input, estimates a state corresponding to the observed amount yt, predicts series data of observed amounts subsequent to yt, and outputs the estimated value xt and the predicted series data {{circumflex over ( )}yτ (t)}. With the algorithm 2 of FIG. 9 , data based on appropriate image data is given to the estimation unit 220 as an observed amount yt and image data of up to T steps ahead can be predicted. For example, the estimation apparatus 200 may take series data yt, yt+1, . . . , and yt+N of observed amounts as an input and output series data xt, xt+1, . . . , xt+N of estimated values and N pieces of predicted series data {{circumflex over ( )}yτ (t)}, {{circumflex over ( )}yτ (t+1)}, . . . , {{circumflex over ( )}yτ (t+N)}.
  • Estimation Unit 220
  • The estimation unit 220 takes an observed amount yt as an input and performs predetermined processes on the observed amount yt to estimate a state (S220). In the present embodiment, the predetermined processes are (1) estimation of a basis function value (to obtain an estimated value zt) and (2) estimation of a state (to obtain an estimated value xt). Further, the estimation unit 220 performs (3) estimation of a reconstructed basis function value (to obtain an estimated value {circumflex over ( )}zt), (4) prediction of a basis function (to obtain a predicted value {circumflex over ( )}zτ (t)), and (5) prediction of observed amounts (to obtain predicted values {circumflex over ( )}yτ (t)) (S220), and outputs the estimated value xt and series data of predicted values {{circumflex over ( )}yτ (t)}.
  • (1) First, the estimation unit 220 estimates a basis function value zt using the expansion coefficient B. That is, zt=B+(yt).
  • (2) Next, the estimation unit 220 estimates a state such that xt=Φ−1(zt; wenc) using the neural network of Expression (9).
  • (3) Further, the estimation unit 220 estimates a reconstructed basis function value such that {circumflex over ( )}zt=Φ(xt; wdec) using the neural network of Expression (9).
  • (4) Next, the estimation unit 220 performs linear prediction of a basis function value for τ=0, . . . , T to obtain a predicted value of the basis function value such that {circumflex over ( )}zτ (t)=Kτ{circumflex over ( )}zt.
  • (5) Based on the predicted value {circumflex over ( )}zτ (t), the estimation unit 220 obtains a predicted value of the observed amount such that {circumflex over ( )}yτ(t)=B{circumflex over ( )}zτ (t).
  • The above processes (3) to (5) are processes for obtaining the observed amount from the state and can be said to be the reverse of the above processes (1) and (2).
  • FIG. 10 illustrates an example in which series data was generated by actually learning parameters with series data of data based on image data taken as an input.
  • The upper row of FIG. 10 is series data of data based on actual image data (series data of observed amounts for learning). On the other hand, the lower row is series data (predicted series data) generated for τ=1, . . . , 10 with image data given as “Input” at the left end of the upper row input as yt of the algorithm 2.
  • Algorithm 3
  • When algorithm 3 of FIG. 11 is executed, the estimation apparatus 200 takes a state xt as an input to predict an observed amount corresponding to the state xt and outputs predicted series data {{circumflex over ( )}yτ (t)}. With the algorithm 3 of FIG. 11 , an appropriate state xt is given to the estimation unit 220 and images of up to T steps ahead can be predicted.
  • The estimation unit 220 takes a certain state xt as an input and performs (1) estimation of a reconstructed basis function value (to obtain an estimated value {circumflex over ( )}zt), (2) prediction of a basis function (to obtain a predicted value {circumflex over ( )}zτ (t), and (3) prediction of observed amounts (to obtain predicted values {circumflex over ( )}yτ (t)) (S220), and outputs series data of predicted values {{circumflex over ( )}yτ (t)}.
  • (1) The estimation unit 220 estimates a reconstructed basis function value such that {circumflex over ( )}zt=Φ(xt; wdec) using the neural network of Expression (9).
  • (2) Next, the estimation unit 220 performs linear prediction of a basis function value for τ=0, . . . , T to obtain a predicted value of the basis function value such that {circumflex over ( )}zτ (t)=Kτ{circumflex over ( )}zt.
  • (3) Based on the predicted value {circumflex over ( )}zτ (t), the estimation unit 220 obtains a predicted value of the observed amount such that {circumflex over ( )}yτ (t)=B{circumflex over ( )}zτ (t).
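  • A compact sketch of algorithm 3 is shown below: given a state xt, the decoder produces the reconstructed basis function value, and repeated multiplication by K followed by projection with B yields the predicted observed amounts. The linear decoder is an illustrative stand-in for Φ(•; wdec).

```python
import numpy as np

def predict_from_state(x_t, B, K, w_dec, T=10):
    """Sketch of algorithm 3: predicted observed amounts from a given state x_t.
    The linear decoder stands in for Phi(.; w_dec)."""
    z_hat_t = w_dec @ x_t                 # (1) reconstructed basis function value
    y_preds, z_tau = [], z_hat_t
    for _ in range(T + 1):
        y_preds.append(B @ z_tau)         # (3) y_hat_tau^(t) = B z_hat_tau^(t)
        z_tau = K @ z_tau                 # (2) z_hat^(t) advanced one step by K
    return np.array(y_preds)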
  • Advantages
  • At the learning stage, the observation process, the time evolution, and the state can be learned from the observed amount alone.
  • At the estimation stage, the state can be estimated from the observed amount by using the observation process, the time evolution, and the state (the learned model) that have been learned from the observed amount alone. In addition, an observed amount can be predicted from an estimated state or a given state. That is, series data can be predicted by estimating a state from a current observed amount, simulating the time evolution, and observing the state. Further, by giving an observed amount or a state, data (a state or an observed amount) can be artificially generated.
  • A state can be estimated from series data observed by a sensor or the like and can be used for analysis of an observed amount. In addition, a state (for example, with a small number of dimensions) can be estimated from series data that is difficult to identify visually (for example, with a large number of dimensions), and by presenting the estimated state, the observed amount can be converted (visualized) into a form that is easy to identify visually.
  • Further, a distance between a predicted value and an actually observed amount can be appropriately defined and can be applied to abnormality detection of series data.
  • Modifications
  • Although the present embodiment has been described with respect to the case where the observed amount is data based on image data, other data may also be used. For example, data based on acoustic data, data based on vibration data, or a combination of data based on acoustic data and data based on vibration data can be considered. This will be described in more detail below.
  • Acoustic Data
  • When the observed amount is data based on acoustic data, sound pressure waveform data acquired from a microphone or its feature (such as STFT or log-Mel power) is taken as an input yt. When sound is collected with a microphone array, a vector obtained by combining a number of pieces of sound pressure waveform data or its feature corresponding to the number of elements of the microphone array is taken as the input yt.
  • In this case, the state represents an abstract state corresponding to the observed amount yt and the abstract state may have a value with a physical meaning (for example, a value with a smaller number of dimensions than the observed amount). For example, the state represents an amount of the waveform of a sound source, or the like, an amount of the position of the sound source, or the like (when the sound source moves), or an amount of the phase (when the sound is periodic), or the like.
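  • As one illustrative way to build such an input yt, the sketch below computes a log-power STFT feature from a sound-pressure waveform using scipy.signal.stft; the sampling rate and frame length are assumptions, and a log-Mel power feature could be built in the same way.

```python
import numpy as np
from scipy.signal import stft

def acoustic_features(waveform, fs=16000, nperseg=512):
    """Sketch: turn a sound-pressure waveform into per-frame feature vectors y_t.
    A log-power STFT is used here; a log-Mel power feature would serve equally well.
    The sampling rate and frame length are illustrative assumptions."""
    _, _, Zxx = stft(waveform, fs=fs, nperseg=nperseg)
    log_power = np.log(np.abs(Zxx) ** 2 + 1e-10)   # shape: (freq_bins, frames)
    return log_power.T                             # one feature vector y_t per frame

y_series = acoustic_features(np.random.randn(16000))   # 1 second of dummy audio
print(y_series.shape)
```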
  • Vibration Data
  • When the observed amount is data based on vibration data, acceleration waveform data acquired from a vibration pickup or its feature (such as STFT or log-Mel power) is taken as an input yt. When vibration data is collected with a plurality of vibration pickups, a vector obtained by combining a number of pieces of waveform data or its feature corresponding to the number of the vibration pickups is taken as the input.
  • In this case, the state represents an abstract state corresponding to the observed amount yt and the abstract state may have a value with a physical meaning (for example, a value with a smaller number of dimensions than the observed amount). For example, the state represents an amount of the mode of vibration, or the like, or an amount of the phase, or the like (when the vibration is that of a quasi-periodically moving object).
  • Acoustic/Vibration Data
  • When the observed amount is a combination of the data based on acoustic data and the data based on vibration data described above, a vector obtained by combining a number of pieces of acceleration waveform data acquired from vibration pickups or its features corresponding to the number of the vibration pickups and a number of pieces of sound pressure waveform data acquired from microphones or its features corresponding to the number of the microphones is taken as the input yt.
  • In this case, the state xt represents an abstract state corresponding to the observed amount yt and the abstract state may have a value with a physical meaning (for example, a value with a smaller number of dimensions than the observed amount). For example, the state represents an amount of the waveform of a sound source (a vibration source), or the like, or an amount of the mode of vibration, or the like.
  • Second Embodiment
  • Parts different from the first embodiment will be mainly described.
  • In the present embodiment, the present invention is applied to abnormality detection.
  • FIG. 12 illustrates a functional block diagram of an abnormality detection apparatus according to the present embodiment.
  • First, according to the method described in the first embodiment, the learning apparatus 100 takes series data {yt (L)} of observed amounts for learning as an input, learns parameters (wenc, wdec, K, B), and outputs the learned parameters.
  • According to the method described in the first embodiment, the estimation apparatus 200 sets the parameters (wenc, wdec, K, B) received from the learning apparatus 100 in the estimation unit 220 prior to the estimation and prediction.
  • Abnormality Detection apparatus 300
  • The abnormality detection apparatus 300 includes an error vector calculation unit 310, a mean-variance-covariance matrix calculation unit 320, and a detection unit 330 (see FIG. 12 ).
  • The abnormality detection apparatus 300 performs an abnormality detection process and preprocessing for obtaining parameters in advance before the abnormality detection process. First, the preprocessing will be described.
  • Preprocessing
  • First, a dataset Dnormal={y1, y2, . . . , yT_1} of normal observed amounts is prepared. Here, the subscript A_B means AB. From this dataset Dnormal, sub-series of length L, Dt={yt+1, . . . , yt+L} for t=1, 2, . . . , T1-L, are extracted.
  • Next, according to the method described in the first embodiment, the estimation apparatus 200 takes each observed amount yt as an input, predicts series data of observed amounts subsequent to yt, and outputs the predicted series data of length L, Pt={{circumflex over ( )}y(t) 1, . . . , {circumflex over ( )}y(t) L}.
  • FIG. 13 is a flow chart of an example of the preprocessing.
  • Prior to the abnormality detection process, the abnormality detection apparatus 300 takes the T1-L sub-series Dt obtained from the dataset Dnormal of the normal observed amounts and the T1-L pieces of predicted series data Pt as inputs and calculates a mean μ and a variance-covariance matrix S which will be described later.
  • The error vector calculation unit 310 takes the T1-L sub-series Dt={yt+1, . . . , yt+L} and the T1-L pieces of series data Pt={{circumflex over ( )}y(t) 1, . . . , {circumflex over ( )}y(t) L} as inputs, calculates errors between elements of the sub-series Dt and elements of the predicted series data Pt (S310-A), and outputs an error vector et=[e(t) 1 . . . e(t) L]T. Here, e(t) i=yt+i−{circumflex over ( )}y(t) i where i=1, . . . , L. That is, the error vector obtained when the observed amount yt has been input (when the series data Pt predicted from the observed amount yt and the corresponding sub-series Dt have been taken as inputs) is et=[e(t) 1 . . . e(t) L]T. This error vector et has a length of (D×L) when the number of dimensions of each observed amount yt is D. The error vector calculation unit 310 performs this error vector calculation for all the T1-L sub-series Dt and T1-L pieces of series data Pt.
  • The mean-variance-covariance matrix calculation unit 320 takes the T1-L error vectors et=[e(t) 1 . . . e(t) L]T as inputs, calculates a mean μ and a variance-covariance matrix S using the following expressions (S320), and outputs the calculated mean μ and variance-covariance matrix S to the detection unit 330.
  • [Math. 10]   μ = (1/(T_1 − L)) Σ_{t=1}^{T_1−L} e_t
  • [Math. 11]   S = (1/(T_1 − L)) Σ_{t=1}^{T_1−L} (e_t − μ)(e_t − μ)^T
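  • A compact sketch of this computation follows: prediction errors on the normal data are stacked into vectors et of length D×L, and their mean μ and variance-covariance matrix S are computed. The array shapes are illustrative assumptions.

```python
import numpy as np

def error_statistics(subseries, predictions):
    """Sketch of the preprocessing: stack prediction errors on normal data into
    vectors e_t of length D*L, then compute their mean mu and variance-covariance matrix S.
    subseries, predictions: arrays of shape (num_windows, L, D), num_windows = T_1 - L."""
    errors = (subseries - predictions).reshape(len(subseries), -1)   # one e_t per window
    mu = errors.mean(axis=0)
    centered = errors - mu
    S = centered.T @ centered / len(errors)
    return mu, S
```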
  • Abnormality Detection Process
  • When a dataset Dnew={y′1, y′2, . . . , y′T_2} of observed amounts to be detected for abnormality has been obtained, sub-series of length L, D′t′={y′t′+1, . . . , y′t′+L} for t′=1, 2, . . . , T2-L, are extracted similar to the preprocessing.
  • Next, according to the method described in the first embodiment, the estimation apparatus 200 takes each observed amount y′t′ as an input, predicts series data of observed amounts subsequent to y′t′, and outputs the predicted series data of length L, P′t′={{circumflex over ( )}y′(t′) 1, . . . , {circumflex over ( )}y′(t′) L}.
  • FIG. 14 is a flowchart of an example of the abnormality detection process.
  • During the abnormality detection process, the abnormality detection apparatus 300 takes the T2-L sub-series D′t obtained from the dataset Dnew of observed amounts to be detected for abnormality and the T2-L pieces of predicted series data P′t={{circumflex over ( )}y′(t′) 1, . . . , {circumflex over ( )}y′(t′) L} as inputs and outputs a detection result for the sub-series D′t′ and the series data P′t′. Because the series data P′t′ has been predicted from the observed amount y′t′ and the sub-series D′t′ corresponds to the series data P′t′, the detection result for the sub-series D′t′ and the series data P′t′ can be said to be a detection result for the observed amount y′t′.
  • The error vector calculation unit 310 takes the T2-L sub-series D′t′={y′t′+1, . . . , y′t′+L} and the T2-L pieces of series data P′t′={{circumflex over ( )}y′(t′) 1, . . . , {circumflex over ( )}y′(t′) L} as inputs, calculates errors between elements of the sub-series D′t′ and elements of the predicted series data P′t′ (S310-B), and outputs an error vector e′t′=[e′(t′) 1 . . . e′(t′) L]T. Here, e′(t′) i=y′t′+i−{circumflex over ( )}y′(t′) i where i=1, . . . , L.
  • The detection unit 330 receives the mean μ and the variance-covariance matrix S prior to the abnormality detection.
  • The detection unit 330 takes the T2-L error vectors e′t′=[e′(t′) 1 . . . e′(t′) L]T as inputs and calculates the following degree of abnormality Lt′ for each time t′=1, . . . , T2-L using the mean μ, the variance-covariance matrix S, and the error vectors e′t′ (S330-1).

  • L_{t′} = log det(S) + (e′_{t′} − μ)^T S^{-1} (e′_{t′} − μ)
  • This degree of abnormality Lt′ is a quantity proportional to the negative log-likelihood when the error vectors have been fitted into a normal distribution.
  • Next, the detection unit 330 detects an abnormality by comparing the degree of abnormality Lt′ with a threshold value p (S330-2) and outputs the detection result. For example, the detection unit 330 determines that there is an abnormality when the degree of abnormality satisfies Lt′>p and determines that there is no abnormality when the degree of abnormality satisfies Lt′≤p. The threshold value p is appropriately determined in advance by experiments, simulations, or the like.
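  • The degree of abnormality and the threshold test can be sketched as follows; numerically, solving S x = (e′ − μ) is preferable to forming S^{-1} explicitly, and the threshold value is assumed to be tuned in advance as described above.

```python
import numpy as np

def abnormality_score(e, mu, S):
    """Sketch of L_{t'} = log det(S) + (e - mu)^T S^{-1} (e - mu)."""
    _, logdet = np.linalg.slogdet(S)
    diff = e - mu
    return logdet + diff @ np.linalg.solve(S, diff)   # solve instead of an explicit inverse

def is_abnormal(e, mu, S, threshold):
    """Declare an abnormality when the degree of abnormality exceeds the threshold p."""
    return abnormality_score(e, mu, S) > threshold
```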
  • The above configuration allows the present invention to be applied to abnormality detection.
  • Third Embodiment
  • Parts different from the first embodiment will be mainly described.
  • In the first embodiment, the case has been discussed where the inverse function exists for a function that outputs a prediction of an observed amount (a predicted value {circumflex over ( )}yτ (t)) from a state xt. However, for example, when the number of dimensions of the state xt is smaller than that of the prediction of the observed amount (the predicted value {circumflex over ( )}yτ (t)), it is not clear whether there is an inverse function that estimates the state xt from the observed amount yt.
  • In addition, when the number of dimensions of the observed amount yt is smaller than that of the basis function value zt, the problem of determining the basis function value zt from the observed amount yt is an underdetermined problem and it is necessary to introduce a regularization term appropriately.
  • Therefore, in the present embodiment, the state is estimated from the observed amount by considering a generative model without considering the inverse function. FIG. 15 illustrates the proposed model. The right side (a dashed line part) of FIG. 15 illustrates an extended dynamic mode decomposition (EDMD) part which is a numerical calculation method for Koopman mode decomposition.
  • First, mean and variance parameters are estimated from observed amounts yt and yt+1 using a neural network ((σt, μt)←˜Ψ(yt, yt+1)). This process corresponds to encoding of a variational autoencoder and a part that performs this process is called an encoder.
  • Next, a latent variable (the state xt) is sampled from a multivariate normal distribution according to the obtained values of the estimated mean and variance parameters μt and σt. Here, εt in FIG. 15 is a random number obtained from a normal distribution with mean 0 and variance 1.
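  • This sampling step corresponds to the usual reparameterization of a variational autoencoder and can be sketched as follows, assuming a diagonal covariance (an assumption made for this illustration; the embodiment only specifies a multivariate normal distribution).

```python
import numpy as np

def sample_state(mu_t, sigma_t, rng=None):
    """Sketch of the sampling step: x_t = mu_t + sigma_t * eps with eps ~ N(0, I),
    i.e. a reparameterized draw from a diagonal-covariance multivariate normal.
    sigma_t holds per-dimension standard deviations (an illustrative assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(np.shape(mu_t))   # the random number epsilon_t in FIG. 15
    return mu_t + sigma_t * eps
```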
  • Subsequent processing is similar to that of the first embodiment. An outline will be described below.
  • (3) Estimation of a reconstructed basis function value (to obtain an estimated value {circumflex over ( )}zt←Ψ(xt)), (4) prediction of a basis function (to obtain a predicted value {circumflex over ( )}zτ (t)=Kτ{circumflex over ( )}zt), and (5) prediction of an observed amount (to obtain a predicted value {circumflex over ( )}yτ (t)=B{circumflex over ( )}zτ (t))) are performed to output the estimated value xt and series data of predicted values {yτ (t)}. The process of obtaining a predicted value {circumflex over ( )}yτ (t) of the observed amount from the state xt corresponds to decoding of the variational autoencoder and a part that performs this process is called a decoder.
  • A general variational autoencoder learns the weight parameter θ of the neural network to minimize the following objective function.
  • [Math. 12]   L(θ) = Σ_{τ=0}^{T} ( ‖y_{t+τ} − Ψ_θ(x_{t+τ})‖^2 + KL[ N(μ_θ(y_{t+τ}), Σ_θ(y_{t+τ})) | N(0, I) ] )
  • Here, KL[A|B] represents the Kullback-Leibler divergence of distributions A and B, N(μ, Σ) represents a normal distribution with mean μ and (co)variance Σ, and μθ(yt+τ) and Σθ(yt+τ) are mean and variance parameters that are estimated by giving an observed amount yt+τ to the neural network with weight parameter θ, respectively.
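  • When the covariance Σθ is diagonal, the Kullback-Leibler term has the well-known closed form commonly used for variational autoencoders, sketched below; the diagonal-covariance form is an assumption made for this illustration.

```python
import numpy as np

def kl_to_standard_normal(mu, sigma2):
    """Closed-form KL[ N(mu, diag(sigma2)) || N(0, I) ] for a diagonal covariance:
    0.5 * sum( sigma2 + mu^2 - 1 - log(sigma2) ).
    The diagonal form of Sigma_theta is an assumption made for this illustration."""
    mu, sigma2 = np.asarray(mu), np.asarray(sigma2)
    return 0.5 * np.sum(sigma2 + mu ** 2 - 1.0 - np.log(sigma2))
```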
  • On the other hand, in the variational autoencoder of the present embodiment, the expansion coefficient B and the parameter K are also learned and optimized simultaneously with the weight parameter θ. That is, the following is minimized.
  • [Math. 13]   L(B, K, θ) = E[ Σ_{τ=0}^{T} ( ‖y_{t+τ} − ŷ_τ^(t)‖^2 + KL[ N(μ_θ(y_{t+τ}, y_{t+τ+1}), Σ_θ(y_{t+τ}, y_{t+τ+1})) | N(0, I) ] ) ]
  • Here, {circumflex over ( )}yτ (t)=BKτ{circumflex over ( )}zt, and μθ(yt+τ, yt+τ+1) and Σθ(yt+τ, yt+τ+1) represent mean and variance parameters that are estimated by giving observed amounts yt+τ and yt+τ+1 to the neural network with the weight parameter θ, respectively. The difference between the first embodiment and the present embodiment includes the following two points.
  • The first embodiment assumes an inverse function, whereas the present embodiment assumes a probabilistic generative model.
  • The first embodiment minimizes the reconstruction error as an objective function, whereas the present embodiment adds a Kullback-Leibler divergence term for measuring the closeness of distributions to the reconstruction error.
  • Hereinafter, an estimation system that achieves the present embodiment will be described.
  • The estimation system according to the present embodiment includes a learning apparatus 100 and an estimation apparatus 200 (see FIG. 3 ).
  • The learning apparatus 100 executes the learning stage and the estimation apparatus 200 executes the estimation stage. First, the learning stage will be described.
  • Learning Stage
  • FIG. 16 illustrates a functional block diagram of the learning apparatus 100 and FIG. 5 is a flowchart of an example of its processing. The learning apparatus 100 includes an initialization unit 110, an estimation unit 120, an objective function calculation unit 130, and a parameter update unit 140.
  • The learning apparatus 100 takes series data {yt} of observed amounts for learning as an input and outputs parameters (wenc, wdec, K, B) which are learning results. wenc represents a parameter used in the encoder, wdec represents a parameter used in the decoder (a parameter of the basis function Ψ), K represents a parameter representing the time evolution, and B represents the expansion coefficient.
  • The learning stage is performed as follows.
  • Processing performed by the initialization unit 110 and the parameter update unit 140 is similar to that of the first embodiment and thus the description thereof will be omitted. However, the parameter update unit 140 receives the objective function L(B(k), K(k), θ(k)) instead of the objective function J(Θ(k)) to perform processing.
  • Estimation Unit 120
  • The estimation unit 120 takes the series data {yt} of observed amounts for learning and the initialized or updated parameters wenc (k), wdec (k), K(k), and B(k) as inputs to perform (1) estimation of mean and variance parameters of a state (to obtain estimated values σt and μt), (2) estimation of the state (to obtain an estimated value xt), (3) estimation of a reconstructed basis function value (to obtain an estimated value {circumflex over ( )}zt), (4) prediction of a basis function (to obtain a predicted value {circumflex over ( )}zτ (t)), and (5) prediction of an observed amount (to obtain a predicted value {circumflex over ( )}yτ (t)) (S120) and outputs the predicted value {circumflex over ( )}yτ (t). (1) and (2) will be described because (3) to (5) are similar to those of the first embodiment.
  • (1) First, the estimation unit 120 estimates mean and variance parameters of the state from the series data {yt} of observed amounts for learning and the current parameter wenc (k) by using the neural network. That is, (σt, μt)=˜Ψ(yt; wenc (k)). In the present embodiment, the input may be two or more observed amounts yt for learning because the neural network is used. For example, the mean and variance parameters of the state may be estimated from the two observed amounts yt and yt+1 for learning such that (σt, μt)=˜Ψ(yt, yt+1; wenc (k)). It is conceivable that estimating the mean and variance parameters of the state using two or more observed amounts for learning in this way makes it easier to identify the features of the state.
  • (2) Next, the state xt is sampled from a multivariate normal distribution according to (σt, μt).
  • Objective Function Calculation Unit 130
  • The objective function calculation unit 130 takes the series data {yt+τ} of observed amounts for learning, the series data of predicted values {{circumflex over ( )}yτ (t)}, and the parameter wenc (k) as inputs to calculate a value of the objective function L(B(k), K(k), θ(k)) (S130) and outputs the calculated value L(B(k), K(k), θ(k)). Here, θ(k) is a set of parameters wenc (k) and wdec (k).
  • [Math. 14]   L(B, K, θ) = E[ Σ_{τ=0}^{T} ( c‖y_{t+τ} − ŷ_τ^(t)‖^2 + d·KL[ N(μ_θ(y_{t+τ}, y_{t+τ+1}), Σ_θ(y_{t+τ}, y_{t+τ+1})) | N(0, I) ] ) ]
  • Here, {circumflex over ( )}yτ (t)=B(k)K(k)τ{circumflex over ( )}zt. At least two time-series observed amounts yt and yt+1 are required in order to update K(k) at the same time, and T is any integer of 1 or more. c and d are parameters for determining how much importance is placed on the respective terms and are appropriately set using experimental results and simulation results.
  • Estimation Stage
  • FIG. 7 illustrates a functional block diagram of the estimation apparatus 200 and FIG. 8 is a flowchart of an example of its processing. The estimation apparatus 200 includes an estimation unit 220.
  • The estimation apparatus 200 sets the parameters (wenc, wdec, K, B) received from the learning apparatus 100 in the estimation unit 220 prior to the estimation and prediction.
  • Estimate State from Observed Amount and Predict Observed Amount
  • When estimating a state from an observed amount and predicting observed amounts (in the case of the algorithm 2 of the first embodiment), the estimation apparatus 200 takes an observed amount yt as an input, estimates a state corresponding to the observed amount yt, predicts series data of observed amounts subsequent to yt, and outputs the estimated value xt and the predicted series data {{circumflex over ( )}yτ (t)}. For example, the estimation apparatus 200 may take series data yt, yt+1, . . . , and yt+N of observed amounts as an input and output series data xt, xt+1, . . . , xt+N of estimated values and N pieces of predicted series data {{circumflex over ( )}yτ (t)}, {{circumflex over ( )}yτ (t+1)}, . . . , {{circumflex over ( )}yτ (t+N)}.
  • Estimation Unit 220
  • The estimation unit 220 takes an observed amount yt as an input and performs predetermined processes on the observed amount yt to estimate a state (S220). In the present embodiment, the predetermined processes are (1) estimation of mean and variance parameters of a state (to obtain estimated values σt and μt) and (2) estimation of the state (to obtain an estimated value xt). Further, the estimation unit 220 performs (3) estimation of a reconstructed basis function value (to obtain an estimated value {circumflex over ( )}zt), (4) prediction of a basis function (to obtain a predicted value {circumflex over ( )}zτ (t)), and (5) prediction of observed amounts (to obtain predicted values {circumflex over ( )}yτ (t)) (S220), and outputs the estimated value xt and series data of predicted values {{circumflex over ( )}yτ (t)}. (1) and (2) will be described because (3) to (5) are similar to those of the first embodiment.
  • (1) First, the estimation unit 220 estimates mean and variance parameters of the state from the observed amount yt and the parameter wenc by using the neural network. That is, (σt, μt)=˜Ψ(yt; wenc). In the present embodiment, the input may be two or more observed amounts yt and a number of observed amounts yt corresponding to the neural network learned by the learning apparatus 100 are taken as inputs.
  • (2) Next, the state xt is sampled from a multivariate normal distribution according to (σt, μt).
  • Predict Observed Amount from Observed Amount
  • The prediction of an observed amount from an observed amount (in the case of the algorithm 3 of the first embodiment) is similar to that of the first embodiment.
  • Advantages
  • The same advantages as those of the first embodiment can be achieved. By considering the generative model, the state is estimated without considering the inverse function. Here, the present embodiment may be combined with a modification of the first embodiment or the second embodiment.
  • Others
  • The estimation unit 220 of each of the first and third embodiments can also be represented by a functional block diagram of FIG. 17. FIG. 18 is a flowchart of an example of the processing of the estimation unit 220.
  • The estimation unit 220 includes a state estimation unit 221, an observed amount estimation unit 222, and a future observed amount estimation unit 223. Further, the observed amount estimation unit 222 includes an intermediate value estimation unit 222A and an intermediate observed value estimation unit 222B. Processing performed by the state estimation unit 221 differs between the first and third embodiments. Processing performed by the observed amount estimation unit 222 and the future observed amount estimation unit 223 is the same between the first and third embodiments.
  • State Estimation Unit 221
  • The state estimation unit 221 estimates a state from an observed amount using the encoder of the autoencoder (S221) and outputs the estimated state.
  • In the first embodiment, the state estimation unit 221 receives the parameter wenc of the neural network and the expansion coefficient B prior to the estimation process. The state estimation unit 221 takes an observed amount yt as an input and estimates a basis function value zt using the expansion coefficient B. That is, zt=B+(yt). Furthermore, the state estimation unit 221 estimates the state such that xt=Φ−1(zt; wenc) using the neural network of Expression (9).
  • In the third embodiment, the state estimation unit 221 receives the parameter wenc of the neural network prior to the estimation process. The state estimation unit 221 takes one or more observed amounts yt as inputs and estimates mean and variance parameters of a state from the one or more observed amounts yt and the parameter wenc using the neural network. For example, the state estimation unit 221 estimates mean and variance parameters of a state from two observed amounts yt and yt+1 such that (σt, μt)=˜Ψ(yt, yt+1; wenc). Next, the state estimation unit 221 estimates the state xt by sampling from a multivariate normal distribution according to (σt, μt).
  • Observed Amount Estimation Unit 222
  • The observed amount estimation unit 222 estimates an observed amount from the state using the decoder of the autoencoder (S222) and outputs the estimated observed amount.
  • In the case of the first embodiment, processing performed by the state estimation unit 221 is defined by a first function, processing performed by the observed amount estimation unit 222 is defined by a second function, and the first function is the inverse function of the second function.
  • Intermediate Value Estimation Unit 222A
  • The intermediate value estimation unit 222A receives the parameter wdec of the neural network prior to the estimation process.
  • The intermediate value estimation unit 222A takes the state xt as an input, estimates a reconstructed basis function value using the neural network of Expression (9) such that {circumflex over ( )}zt=Φ(xt; wdec) (S222A), and outputs the estimated value {circumflex over ( )}zt. Here, the estimated value of the reconstructed basis function value will also be referred to as an intermediate value.
  • Intermediate Observed Value Estimation Unit 222B
  • The intermediate observed value estimation unit 222B receives the expansion coefficient B prior to the estimation process.
  • The intermediate observed value estimation unit 222B takes the estimated value {circumflex over ( )}zt as an input, estimates an observed value from the estimated value {circumflex over ( )}zt (S222B), and outputs the estimated value {circumflex over ( )}yt. This corresponds to τ=0 in the following equations.

  • {circumflex over ( )}z τ (t) =K τ {circumflex over ( )}z t

  • {circumflex over ( )}y τ (t) =B{circumflex over ( )}z τ (t)
  • That is, {circumflex over ( )}yt=B{circumflex over ( )}zt.
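  • As a hedged illustration of units 222A and 222B, the sketch below uses a toy decoder network in place of Φ(xt; wdec) of Expression (9) and applies the expansion coefficient B to obtain {circumflex over ( )}yt=B{circumflex over ( )}zt (the τ=0 case); the dimensions and helper names are assumptions.

```python
# Hedged sketch of the observed amount estimation unit 222: unit 222A maps the
# state x_t to a reconstructed basis function value ^z_t with a decoder network,
# and unit 222B applies the expansion coefficient B to obtain ^y_t = B ^z_t.
import numpy as np

def decoder_network(x, w_dec):
    """Toy stand-in for Phi(x_t; w_dec) producing the reconstructed basis value ^z_t."""
    h = np.tanh(w_dec["W1"] @ x + w_dec["b1"])
    return w_dec["W2"] @ h + w_dec["b2"]

def estimate_observed_value(z_hat, B):
    """Unit 222B: ^y_t = B ^z_t."""
    return B @ z_hat

rng = np.random.default_rng(2)
dim_x, dim_h, dim_z, dim_y = 3, 8, 3, 6
w_dec = {
    "W1": rng.standard_normal((dim_h, dim_x)) * 0.1, "b1": np.zeros(dim_h),
    "W2": rng.standard_normal((dim_z, dim_h)) * 0.1, "b2": np.zeros(dim_z),
}
B = rng.standard_normal((dim_y, dim_z))
x_t = rng.standard_normal(dim_x)
z_hat_t = decoder_network(x_t, w_dec)           # unit 222A
y_hat_t = estimate_observed_value(z_hat_t, B)   # unit 222B
```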
  • Future Observed Amount Estimation Unit 223
  • The future observed amount estimation unit 223 receives K and B prior to the estimation process.
  • The future observed amount estimation unit 223 estimates a future observed amount, which is a value to which the observed amount changes with time, using the parameter K representing the time evolution (S223) and outputs the estimated future observed amount.
  • First, the future observed amount estimation unit 223 obtains a future intermediate value {circumflex over ( )}zτ (t), which is a value to which the estimated value {circumflex over ( )}zt changes with time, using the parameter K.
  • That is, {circumflex over ( )}zτ (t)=Kτ{circumflex over ( )}zt. Here, τ=1, . . . , T.
  • Further, the future observed amount estimation unit 223 estimates a future observed amount {circumflex over ( )}yτ (t) from the future intermediate value {circumflex over ( )}zτ (t) using the expansion coefficient B. That is, {circumflex over ( )}yτ (t)=B{circumflex over ( )}zτ (t).
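  • A minimal sketch of the processing of the future observed amount estimation unit 223, under the assumption that K and B are plain matrices and that the rollout horizon T is fixed, is as follows; the helper name predict_future_observed is illustrative.

```python
# Hedged sketch of the future observed amount estimation unit 223: the
# reconstructed basis value ^z_t is advanced in time by repeated application of
# the time-evolution parameter K (^z_tau(t) = K^tau ^z_t), and each future
# intermediate value is mapped to a future observed amount via B.
import numpy as np

def predict_future_observed(z_hat_t, K, B, T):
    """Return [^y_1(t), ..., ^y_T(t)] with ^y_tau(t) = B K^tau ^z_t."""
    y_futures = []
    z = z_hat_t
    for _ in range(T):
        z = K @ z                 # ^z_tau(t) = K ^z_{tau-1}(t)
        y_futures.append(B @ z)   # ^y_tau(t) = B ^z_tau(t)
    return y_futures

rng = np.random.default_rng(3)
dim_z, dim_y, T = 3, 6, 5
K = 0.9 * np.eye(dim_z)           # toy, stable time-evolution operator
B = rng.standard_normal((dim_y, dim_z))
z_hat_t = rng.standard_normal(dim_z)
future_y = predict_future_observed(z_hat_t, K, B, T)
```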
  • The estimation unit 120 of the learning apparatus of each of the first and third embodiments can be expressed in the same manner as described above. However, processing is performed using observed amounts for learning and parameters to be learned.
  • Hardware Configuration
  • Each of the learning apparatus and the estimation apparatus is, for example, a special apparatus formed by loading a special program into a known or dedicated computer having a central processing unit (CPU), a main storage device (a random access memory (RAM)), and the like. Each of the learning apparatus and the estimation apparatus executes, for example, each process under the control of the CPU. Data input to each of the learning apparatus and the estimation apparatus and data obtained through each process are stored, for example, in the main storage device, and the data stored in the main storage device is read out to the central processing unit as needed and used for other processing. Each processing unit of the learning apparatus and the estimation apparatus may be at least partially configured by hardware such as an integrated circuit. Each storage unit included in the learning apparatus and the estimation apparatus can be configured, for example, by a main storage device such as a random access memory (RAM) or by middleware such as a relational database or a key-value store. However, each storage unit does not necessarily have to be provided inside the learning apparatus and the estimation apparatus and may be configured by a hard disk, an optical disc, or an auxiliary storage device formed from a semiconductor memory device such as a flash memory and may be provided outside the learning apparatus and the estimation apparatus.
  • Other Modifications
  • The present invention is not limited to the above embodiments and modifications. For example, the various processes described above may be executed not only in chronological order as described but also in parallel or on an individual basis as necessary or depending on the processing capabilities of the apparatuses that execute the processing. In addition, appropriate changes can be made without departing from the spirit of the present invention.
  • Program and Recording Medium
  • The various processing functions of each device (or apparatus) described in the above embodiments and modifications may be implemented by a computer. In this case, the processing details of the functions that each device may have are described in a program. When the program is executed by a computer, the various processing functions of the device are implemented on the computer.
  • The program in which the processing details are described can be recorded on a computer-readable recording medium. The computer-readable recording medium can be any type of medium such as a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.
  • The program is distributed, for example, by selling, giving, or lending a portable recording medium such as a DVD or a CD-ROM with the program recorded on it. The program may also be distributed by storing the program in a storage device of a server computer and transmitting the program from the server computer to another computer through a network.
  • For example, a computer configured to execute such a program first stores, in its storage unit, the program recorded on the portable recording medium or the program transmitted from the server computer. Then, the computer reads the program stored in its storage unit and executes processing in accordance with the read program. In a different embodiment of the program, the computer may read the program directly from the portable recording medium and execute processing in accordance with the read program. The computer may also sequentially execute processing in accordance with the program transmitted from the server computer each time the program is received from the server computer. In another configuration, the processing may be executed through a so-called application service provider (ASP) service in which functions of the processing are implemented just by issuing an instruction to execute the program and obtaining results without transmission of the program from the server computer to the computer. The program includes information that is provided for use in processing by a computer and is equivalent to the program (such as data having properties defining the processing executed by the computer rather than direct commands to the computer).
  • In this mode, the device is described as being configured by executing the predetermined program on the computer, but at least a part of the processing may be implemented in hardware.

Claims (11)

1. An estimation apparatus comprising:
a state estimation unit configured to estimate a state from an observed amount using an encoder,
an observed amount estimation unit configured to estimate an observed amount from a state using a decoder, and
a future observed amount estimation unit configured to estimate a future observed amount using a parameter K representing time evolution, the future observed amount being a value to which the observed amount changes with time,
wherein a parameter of the encoder, a parameter of the decoder, and the parameter K are optimized simultaneously.
2. The estimation apparatus according to claim 1,
wherein processing performed by the state estimation unit is defined by a first function, processing performed by the observed amount estimation unit is defined by a second function, and the first function is an inverse function of the second function.
3. The estimation apparatus according to claim 1,
wherein the parameter of the encoder, the parameter of the decoder, and the parameter K are optimized by a variational autoencoder that uses the state estimation unit as an encoder and the observed amount estimation unit as a decoder, and
the observed amount is a time series observed amount.
4. The estimation apparatus according to claim 1,
wherein the observed amount estimation unit includes:
an intermediate value estimation unit configured to estimate an intermediate value from the state; and
an intermediate observed value estimation unit configured to estimate the observed value from the estimated intermediate value, and
the future observed amount estimation unit is configured to estimate the future observed amount from a future intermediate value, the future intermediate value being a value to which the intermediate value changes with time and being obtained using the parameter K.
5. A learning apparatus configured to learn parameters used in the estimation apparatus according to claim 2,
wherein the second function uses a parameter of a basis function, the parameter of the basis function being the parameter of the decoder of the autoencoder,
the first function uses a parameter of an inverse function of the basis function, the parameter of the inverse function of the basis function being the parameter of the encoder of the autoencoder,
the learning apparatus comprises:
an estimation unit configured to perform, using series data of an observed amount for learning, the parameter of the basis function, the parameter of the inverse function, the parameter K, and an expansion coefficient, (1) estimation of a value of the basis function, (2) estimation of a state, (3) estimation of a value of a reconstructed basis function, (4) prediction of the basis function, and (5) prediction of the observed amount;
an objective function calculation unit configured to obtain, using the series data of the observed amount for learning, series data of an estimated value of the basis function, series data of an estimated value of the state, series data of an estimated value of the reconstructed basis function, series data of a predicted value of the basis function, and series data of a predicted value of the observed amount, (i) a prediction error of the observed amount, (ii) a prediction error of the basis function, (iii) a regularization term for weights of a neural network based on the parameter of the basis function and the parameter of the inverse function, and (iv) a structure of the state, and obtain a value of an objective function from the obtained values; and
an update unit configured to update the parameter of the basis function, the parameter of the inverse function, the parameter K, and the expansion coefficient based on the objective function,
the state changes non-linearly with time, and
the observed amount is obtainable by performing an observation process of changing non-linearly with time as the state changes with time.
6. An estimation method comprising:
estimating a state from an observed amount using an encoder;
estimating an observed amount from a state using a decoder; and
estimating a future observed amount using a parameter K representing time evolution, the future observed amount being a value to which the observed amount changes with time,
wherein a parameter of the encoder, a parameter of the decoder, and the parameter K are optimized simultaneously.
7. A program for causing a computer to operate as the estimation apparatus according to claim 1.
8. A program for causing a computer to operate as the estimation apparatus according to claim 2.
9. A program for causing a computer to operate as the estimation apparatus according to claim 3.
10. A program for causing a computer to operate as the estimation apparatus according to claim 4.
11. A program for causing a computer to operate as the learning apparatus according to claim 5.
US17/425,684 2019-01-28 2019-08-30 Estimation apparatus, learning apparatus, methods and programs for the same Pending US20240028872A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2019-011981 2019-01-28
JP2019011981 2019-01-28
PCT/JP2019/034216 WO2020158032A1 (en) 2019-01-28 2019-08-30 Estimation device, learning device, method for these, and program

Publications (1)

Publication Number Publication Date
US20240028872A1 (en) 2024-01-25



Also Published As

Publication number Publication date
JPWO2020158032A1 (en) 2021-09-30
WO2020158032A1 (en) 2020-08-06
JP7163977B2 (en) 2022-11-01

