WO2020250451A1 - Transfer learning apparatus, transfer learning system, method of transfer learning, and storage medium - Google Patents
- Publication number: WO2020250451A1
- Authority: WIPO (PCT)
Classifications
- G06N3/08 — Neural networks; Learning methods
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Combinations of networks
- G06N7/01 — Probabilistic graphical models, e.g. probabilistic networks
- G06V20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
Definitions
- the inference model structure is a lightweight network such as
- In order for the network output inferences to be sufficiently accurate, the network should be re-trained, either online or offline, using time series data collected in the context of the specific deployment of the model at the time of the initial installation. This adapts the parameter values from those found e.g. in the public repositories to values suitable to the deployment (background) during initial installation. For example, such a context could correspond to a specific surveillance camera for an object detection task in a surveillance application. Furthermore, in order for the network output inferences to remain sufficiently accurate even after background changes, the network should be re-trained either online or offline during normal operation, i.e. to adapt to background changes after initial installation.
- the time series model may be any state-based probabilistic model.
- the time series model may have a structure such as a hidden Markov model, a linear dynamic system state-space model, or a random finite set state-space model.
- recurrent neural networks can be used if their prediction output is interpreted as a probability distribution.
- The time series model may be pre-trained using a publicly available dataset so that the trained model can predict the locations of detection targets such as humans within the image in the next time frame, when the model is provided with the locations of the detection targets in the current time frame.
- the time series model can be defined by a function g,
- time series model is a hidden Markov model
- For a linear dynamic system state-space model or a random finite set state-space model, g(y, z′ | z, θ) can be written as the product of state transition probabilities P(Z t
- a state z represents the locations and velocities of the tracked objects
- an inference observation y represents detected object
- The function g represents the modeled chance of objects moving.
- g may model motion noise, appearance
- the filtered state probability distribution p’(z’) is calculated by Bayesian inference as
- p’(z’) represents the posterior probability distribution of the object locations and velocities in the image, given the prior probability distributions p(z) and the observation y at time frame t.
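The Bayesian filtering update described above can be sketched for a discrete-state time series model as below. This is an illustrative sketch only: the function name, the discrete state space, and the example transition/observation values are assumptions for demonstration, not part of the disclosure (which also covers continuous-state and random-finite-set models).

```python
def bayes_filter_step(p_z, transition, obs_lik):
    """One Bayesian filtering step for a discrete-state time series model.

    p_z        -- prior state distribution p(z) over n states
    transition -- transition[z][zn] = P(z' = zn | z, theta)
    obs_lik    -- obs_lik[zn] = likelihood of the current observation y given z' = zn
    Returns the normalized posterior p'(z').
    """
    n = len(p_z)
    # predict: propagate the prior through the state transition model
    predicted = [sum(p_z[z] * transition[z][zn] for z in range(n)) for zn in range(n)]
    # update: weight by the observation likelihood and normalize
    unnorm = [predicted[zn] * obs_lik[zn] for zn in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

For example, a two-state model whose observation is only compatible with state 0 collapses the posterior onto state 0, regardless of the prior.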
- a loss function L is defined below.
- this loss function L represents the difference between (a) the object locations inferred by the image inference model from the video image at the current time frame and (b) the object locations inferred by the time series model based on the estimated probability distribution of the object locations and velocities in the previous time frame.
- the loss represents the unlikelihood of detecting locations y given the estimated distribution p of locations and velocities in the previous time frame.
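Under the same discrete-state simplification, the "unlikelihood" described above can be read as the negative log-likelihood of the current observation under the time series model's one-step prediction. The sketch below is a hypothetical illustration with an assumed function name; the disclosure defines the loss over continuous object locations and velocities.

```python
import math

def predictive_loss(p_prev, transition, obs_lik):
    """Negative log-likelihood of the current observation under the
    time series model's prediction from the previous time frame.

    p_prev     -- estimated state distribution p(z) at the previous frame
    transition -- transition[z][zn] = P(z' = zn | z, theta)
    obs_lik    -- obs_lik[zn] = likelihood of the observed y given z' = zn
    """
    n = len(p_prev)
    # one-step prediction of the state distribution
    predicted = [sum(p_prev[z] * transition[z][zn] for z in range(n)) for zn in range(n)]
    # marginal likelihood of the observation under that prediction
    likelihood = sum(predicted[zn] * obs_lik[zn] for zn in range(n))
    return -math.log(likelihood)  # high when the observation is unexpected
```

An observation that the model predicts with probability 1 yields loss 0; halving the predicted probability raises the loss by log 2, matching the intuition that unexpected detections produce a larger learning signal.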
- the structure of the first embodiment of this invention is displayed in the block diagram in Fig. 1, and constitutes the basic structure of this invention. In the following, the responsibilities of each unit contained in this embodiment are described.
- time slice data d t triggers a new operation of the transfer learning apparatus.
- time slice data can be image data.
- The camera is assumed to be stationary, so that the video background changes only gradually.
- An inference unit 101 calculates an inference result vector
- the inference result may represent the locations of the detected objects and object classes
- y_i and c_i respectively denote the detected location and class (such as a person, a vehicle, etc.) of the i-th detected object.
- the inference model parameter memory 102 stores parameter data
- The parameter data may also be updated between individual operations of the transfer learning apparatus according to the rules governing whether or not to update the model (such rules to be discussed later).
- the number of parameters is of the orders
- The transfer learning apparatus has an inference result vector 103 representing the inference result as, for example, numbers and locations of detected objects.
- A time series model update unit 104 retrieves the state probability distribution p(z) and parameters θ stored in the time series model memory 105 (sometimes referred to as “time series model parameter/state memory”), and updates the parameters θ stored in the time series model memory 105 as
- A time series model memory 105 stores parameter data θ for the time series model and a state distribution p(z) associated with the time series model. The parameter data persists between arrivals of time series data slices 100.
- A gradient calculation unit 106 retrieves the state probability distribution p(z) and parameters θ stored in the time series model memory 105, and calculates a gradient vector w, where y is the inference result vector 103. This gradient vector tends to yield larger elements.
- the gradient vector 107 consists of the partial derivative of the loss L with respect to all components of the inference vector y.
- The device of the present embodiment determines whether or not to update the model parameters depending on the significance or magnitude of the update that is about to be made using the current time series data slice 100. This determination is made, for example, based on the gradient vector 107. In the present embodiment, this determination may be performed, as explained below, by calculating a magnitude metric from the gradient vector 107, comparing the gradient magnitude with a threshold, and performing an update of the model when the magnitude is larger than the threshold.
- A magnitude metric calculation unit 108 calculates the magnitude metric value m = h(w), where w is the gradient vector 107 and h(w) is a magnitude metric function to calculate the magnitude of the gradient vector 107. The magnitude metric function h(w) may be chosen from any vector magnitude metric function, for example, but not necessarily, an L1, L2, or Max function. If the metric function h(w) is L2, then m = h(w) = √(Σᵢ wᵢ²).
- a magnitude metric value 109 represents the magnitude of the gradient, which represents the unexpectedness, i.e., significance, of the update being made based on the current time frame data.
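A minimal sketch of the magnitude metric functions mentioned above (L1, L2, and Max) and of the threshold comparison that gates the learning operation; the names `h_l1`, `h_l2`, `h_max`, and `should_update` are illustrative, not from the disclosure.

```python
import math

def h_l1(w):
    # L1 magnitude metric: sum of absolute gradient components
    return sum(abs(c) for c in w)

def h_l2(w):
    # L2 magnitude metric: Euclidean norm of the gradient vector
    return math.sqrt(sum(c * c for c in w))

def h_max(w):
    # Max magnitude metric: largest absolute gradient component
    return max(abs(c) for c in w)

def should_update(w, threshold, h=h_l2):
    """Gate the costly learning operation on the magnitude metric m = h(w)."""
    return h(w) > threshold
```

For the gradient vector w = (3, -4), the three metrics give 7, 5, and 4 respectively, so the choice of metric and threshold together determine how often the update fires.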
- When the pre-trained model produces a high-magnitude gradient of the loss, it is likely that this is caused by some misdetection or detection noise due to a background change, such as a recent lighting change.
- In this case, the gradient from the current time frame should be used for the model update, in order to reduce similar misdetections or noise in the future.
- A magnitude metric threshold value 110 may be determined empirically.
- The inference model parameter update unit 111 updates the parameters φ stored in the inference model parameter memory 102, where T_k are fixed parameters, d is the time slice data 100, and w is the gradient vector 107.
- In step S200 the time series data slice 100, d = d_t, for some time slice t is received.
- In step S202a the time series model update unit 104 retrieves the state probability distribution p(z) and parameters θ stored in the time series model memory
- In step S202b the time series model update unit 104 retrieves the state probability distribution p(z) and parameters θ stored in the time series model memory 105, and then updates the state probability distribution by Bayesian inference as
- In step S203 the gradient calculation unit 106 retrieves the state probability distribution p(z) and parameters θ stored in the time series model memory 105, and calculates a gradient vector w 107 as
- In step S205, if the magnitude metric value 109 is above the magnitude metric threshold value 110, execution proceeds to step S206; otherwise it proceeds to step S207.
- In step S206 the inference model parameter update unit 111 updates the parameters φ stored in the inference model parameter memory 102, where d is the time slice data 100 and w is the gradient vector 107.
- the magnitude metric calculation unit 108 calculates the magnitude metric value
- Fig. 6 shows an example system diagram in which the transfer learning apparatus of the present disclosure may be applied for time series data analysis at, for example, a plurality of locations (e.g., supermarkets, convenience stores, stadiums, warehouses, etc.) with a plurality of sensors 305, such as cameras, audio recording devices, etc.
- the transfer learning apparatus is part of a cloud computing environment 310 and is able to perform processing of time slice data 100 for each of the locations which are equipped with an edge device 300 and one or more sensors 305 as shown, for example, in Fig. 7.
- a tracking data generation unit 112 is provided, as shown in Fig. 8, in order to output object tracking data, for example, back to the respective edge devices of the respective locations.
- The exemplary embodiments may include a central processing unit (CPU) as the processor, and as the memory, a random access memory (RAM) may be used. As the storage, a hard disk drive (HDD), a solid state drive (SSD), etc. may be used.
- The edge device 300 may include, for example, a communication I/F 301, a controller 302, storage 303, and a sensor I/F 304.
- the storage 303 may be a storage medium such as an HDD and an SSD.
- the communication I/F 301 has general functions for communicating with cloud computing environment 310 via the communication network.
- The sensor I/F 304 has general functions for instructing operations to the sensor 305 and retrieving detected
- the edge device 300 has at least a computing function, a communication gateway function, and a storage function.
- These functions of the edge device are relatively less performance-intensive compared with those of a high-end personal computer and those of the cloud computing environment, due to, for example, commercial reasons (i.e. cost) with regard to the edge device 300.
- edge device 300 may be merely part of a POS (point of sale) system.
- the embodiments are intended to be used with training being performed online.
- batch training is also possible depending on design specifications.
- One example of an object to be tracked could be human beings, and the objective may be to track the number of individuals in a store at any given time.
- the disclosed invention can be applied to the computer vision task of tracking objects from video data.
Abstract
The present invention provides a transfer learning apparatus, system, method, and storage medium able to reduce the computational overhead incurred by having to frequently train new specialized inference models, by learning a joint model including an inference model and a time series model continuously from time series data.
Description
[DESCRIPTION]
[Title of the Invention]
TRANSFER LEARNING APPARATUS, TRANSFER LEARNING SYSTEM,
METHOD OF TRANSFER LEARNING, AND STORAGE MEDIUM
[Technical Field]
[0001]
The present invention relates to a transfer learning apparatus, a transfer learning system, a computer readable storage medium, and a method for efficiently (machine) learning a joint model including an inference model and a time series model continuously from time series data.
[Background Art]
[0002]
Applications frequently combine an inference model with a time series model to analyze time series data. For example, when time series data consists of frames from a video stream, an inference model can be an object detection model used for detecting objects within an individual frame, and a time series model can be used for tracking object identities between frames. However, high-accuracy object detection models such as the one described in NPL1 are complex and incur substantial computational costs and latencies. Less complex detection models can achieve similarly high accuracy under limited conditions (e.g. fixed background, fixed time-of-day, etc.) at lower computational cost and latency when they are trained specifically for these limited conditions such as described for example in PTL1. However, when analyzing time series data such as frames from a video stream, and conditions such as background are expected to be transient, then the usage of such specialized models creates the additional problems of having to frequently train new specialized models according to the changed conditions,
and/or having to maintain and switch dynamically between a multitude of specialized models by detecting the current conditions and determining the specialized model best suited to the current conditions.
[Citation List]
[Patent Literature]
[0003]
[PTL 1]
US Patent Application Publication No. US20180005069A1
[Non Patent Literature]
[0004]
[NPL 1]
“Focal Loss for Dense Object Detection”, Tsung-Yi Lin et al., 2017 IEEE International
Conference on Computer Vision (ICCV).
[Summary of Invention]
[Technical Problem]
[0005]
The present disclosure aims to solve the problem of the incurred computational overhead of having to frequently train new specialized inference models according to changed conditions, and/or having to maintain and switch dynamically between a multitude of specialized inference models. One of the objectives of this invention is to provide a method for efficiently learning an inference model continuously from time series data, to the effect that the inference model dynamically adapts to changes of external conditions, such as background objects, lighting, etc.
[Solution to Problem]
[0006]
A time series model is used to provide means for estimating the magnitude by which the parameters of the inference model would change according to input time slice data, i.e. the magnitude of the potential learning effect. Furthermore, the computationally intensive parameter update, i.e. the learning operation, is performed selectively according to the estimated magnitude of change and a threshold magnitude value, i.e. only when the anticipated learning effect is considered high enough.
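The selective learning operation described in this paragraph can be sketched as follows. Everything here is a toy stand-in: a two-parameter linear model replaces the CNN inference model, the `expected` values stand in for the time series model's prediction, and the names and gradient-descent update rule are assumptions for illustration, not the disclosed apparatus.

```python
import math

def infer(d, phi):
    # hypothetical inference unit: y = phi[0] * x + phi[1],
    # standing in for the (far more complex) inference model
    return [phi[0] * x + phi[1] for x in d]

def loss_gradient(y, expected):
    # gradient of a squared-error surrogate loss w.r.t. y; in the disclosure
    # this comes from the time series model's predicted distribution
    return [yi - ei for yi, ei in zip(y, expected)]

def process_time_slice(d, phi, expected, tau, lr=0.1):
    """One operation of the apparatus: infer, measure the gradient
    magnitude, and run the costly update only when it exceeds tau."""
    y = infer(d, phi)
    w = loss_gradient(y, expected)
    m = math.sqrt(sum(c * c for c in w))  # L2 magnitude metric
    if m > tau:  # anticipated learning effect is high enough
        # selective, computationally intensive learning operation
        g0 = sum(wi * x for wi, x in zip(w, d))
        g1 = sum(w)
        phi = [phi[0] - lr * g0, phi[1] - lr * g1]
    return phi
```

With a high threshold the parameters pass through unchanged; with a low threshold the same time slice triggers the update, which is the cost-saving behavior claimed below.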
[0007]
A first example aspect of the present disclosure provides a transfer learning apparatus, including: an inference model parameter memory storing model parameter data associated with an inference model; a time series model memory storing model parameter data associated with a time series model and a state probability distribution; an inference unit configured to receive time slice data and configured to calculate an inference result vector from the time slice data and the parameter data stored in the inference model parameter memory; a time series model update unit configured to receive the inference result vector from the inference unit and configured to update the parameter data and the state probability distribution stored in the time series model memory; a gradient calculation unit configured to receive the inference result vector from the inference unit and parameter data from the time series model memory and calculate a gradient vector based on the inference result vector and the parameter data; a magnitude metric calculation unit configured to receive the gradient vector and calculate a magnitude metric value; and an inference model parameter update unit configured to update the inference model parameter data stored in the inference model parameter memory based on the gradient vector and the time slice data, if the magnitude metric value is higher than a magnitude metric threshold.
[0008]
A second example aspect of the present disclosure provides a transfer learning system, including a communication network; an inference model parameter memory storing model parameter data associated with an inference model; a time series model memory storing model parameter data associated with a time series model and a state probability distribution; an inference unit configured to receive time slice data and configured to calculate an inference result vector from the time slice data and the parameter data stored in the inference model parameter memory; a time series model update unit configured to receive the inference result vector from the inference unit and configured to update the parameter data and the state probability distribution stored in the time series model memory; a gradient calculation unit configured to receive the inference result vector from the inference unit and parameter data from the time series model memory and calculate a gradient vector based on the inference result vector and the parameter data; a magnitude metric calculation unit configured to receive the gradient vector and calculate a magnitude metric value; an inference model parameter update unit configured to update the inference model parameter data stored in the inference model parameter memory based on the gradient vector and the time slice data, if the magnitude metric value is higher than a magnitude metric threshold; and an edge device configured to provide the time slice data through the communication network, the edge device decoding information from a sensor as the time slice data.
[0009]
A third example aspect of the present disclosure provides a method of transfer learning including: calculating an inference result vector from time slice data and inference model parameter data; updating time series model parameter data from the inference result vector; updating a state probability distribution from the inference result vector; calculating a gradient vector from the time series model parameter data and the
inference result vector; calculating a magnitude metric from the gradient vector; and updating the inference model parameter data from the gradient vector and the time slice data when the magnitude metric value is higher than a magnitude metric threshold.
[0010]
A fourth example aspect of the present disclosure provides a computer readable storage medium storing instructions to cause a computer to execute: calculating an inference result vector from time slice data and inference model parameter data; updating time series model parameter data from the inference result vector; updating a state probability distribution from the inference result vector; calculating a gradient vector from the time series model parameter data and the inference result vector; calculating a magnitude metric from the gradient vector; and updating the inference model parameter data from the gradient vector and the time slice data when the magnitude metric value is higher than a magnitude metric threshold.
[Advantageous Effects of the Invention]
[0011]
When compared to using a single static but complex general inference model of high accuracy, a single less complex inference model dynamically adapting to limited conditions by use of the present invention can achieve similar accuracy at computational costs which are substantially lower due to the learning operation only being performed selectively when the anticipated learning effect is considered high enough, i.e., greater than a predetermined threshold.
[Brief Description of Drawings]
[0012]
[Fig. 1]
Figure 1 is a block diagram showing the structure of the first and second
embodiments of the present disclosure.
[Fig. 2]
Figure 2 is a block diagram showing the structure of the third and fourth embodiments of the present disclosure.
[Fig. 3]
Figure 3 is a flow diagram showing the operations of the first embodiment of the present disclosure.
[Fig. 4]
Figure 4 is a flow diagram showing the operation of the second embodiment of the present disclosure.
[Fig. 5]
Figure 5 is a block diagram showing the structure of the fifth embodiment of the present disclosure.
[Fig. 6]
Figure 6 is an overview of the structure in which transfer learning is provided at multiple locations over a communication network.
[Fig. 7]
Figure 7 is a block diagram showing the structure of an edge device.
[Fig. 8]
Figure 8 is a block diagram showing the transfer learning apparatus according to the third embodiment of the present disclosure.
[EXAMPLE EMBODIMENTS]
[0013]
Example embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the drawings, the same elements are
denoted by the same reference numerals, and thus redundant descriptions are omitted as needed.
[0014]
Reference throughout this specification to “one embodiment”, “an embodiment”, “one example” or “an example” means that a particular feature, structure or characteristic described in connection with the embodiment or example is included in at least one embodiment of the present embodiments. Thus, appearances of the phrases “in one embodiment”, “in an embodiment”, “one example” or “an example” in various places throughout this specification are not necessarily all referring to the same embodiment or example. Furthermore, the particular features, structures or characteristics may be combined in any suitable combinations and/or sub-combinations in one or more embodiments or examples.
(First Example Embodiment)
Before explaining the structure and operation of the first example embodiment, some terms will be defined and some assumptions will be provided.
[0015]
In the following descriptions time is broken down into slices indexed by t (time slices).
[0016]
Time slice data dt is data corresponding to a time slice t. The time slice data dt may be image frames from a surveillance video camera, installed, for example, in a retailer shop at a fixed angle recording customers. The time slice data dt may go through background changes, such as changes in the lighting, or positions of fixed objects like shelf products and boxes.
[0017]
The present embodiment uses an inference model y = f(d, ø), where d is the time slice data, ø is the inference model parameter data, and y is the
inference result vector.
[0018]
The inference model may have a structure of any linear or non-linear classification or regression model, including a convolution neural network (CNN) such as MobileNets and its variants. For the surveillance camera embodiment, the inference model may be, for example, MobileNet 224 (https://arxiv.org/abs/1704.04861) having predetermined parameters.
[0019]
The initial model parameters ø
of the inference model may be pre-trained using a conventional method such as supervised or unsupervised training using a training dataset designed for the inference task such as object detection, image captioning, natural language processing, or scene recognition, for example. The model structure and the initial model parameters may also be adopted from available public repositories of
trained networks. When the inference model structure is a lightweight network such as
MobileNets, in order for the network output inferences to be sufficiently accurate, the network should be re-trained, either online or offline, using time series data collected in the context of the specific deployment of the model at the time of the initial installation. This is to adapt the parameter values from those found e.g. in the public repositories, to values suitable to the deployment (background) during initial installation. For example, such context could correspond to a specific surveillance camera for an object detection task in a surveillance application. Furthermore, in order for the network output inferences to be sufficiently accurate even after background changes, the network should be re-trained either online or offline during normal operation. This is to adapt to
background changes after initial installation, i.e. during normal operation.
[0020]
A probabilistic time series model P(Y1:t, Z1:t | q), modeling inference observations Y1:t and states Z1:t, is given model parameters q. The time series model may be any state-based probabilistic model. The time series model may have a structure such as a hidden Markov model, a linear dynamic system state-space model, or a random finite set state-space model. Alternatively, recurrent neural networks can be used if their prediction output is interpreted as a probability distribution.
[0021]
For this example embodiment, description will be given as applied to an object tracking surveillance camera although the present invention is not limited thereto. The time series model may be pre-trained using a publicly available dataset so that the trained model can predict the locations of detection targets such as humans within the image in the next time frame, when the model is provided with the locations of the detection targets in the current time frame.
[0022]
A function g(y, z'|z, q) represents the joint probability of observing inference y and a state transition to state z' at time t, given that the time series state at time t-1 is z, under the time series model parameters q.
[0023]
For example, when the time series model is a hidden Markov model, a linear dynamic system state-space model or a random finite set state-space model, g(y, z'|z, q) can be written as the product of state transition probabilities P(Zt|Zt-1, q) and observation probabilities P(Yt|Zt, q), i.e., g(y, z'|z, q) = P(z'|z, q) P(y|z', q).
For the surveillance-camera embodiment using a random finite set state-space model as the time series model, a state z represents the locations and velocities of the tracked objects, and an inference observation y represents the detected object locations. The function g then models the probabilities of the tracked objects transitioning to different locations and different velocities, and of detecting those locations. In particular, g may model motion noise, appearance or disappearance of objects, as well as detection noise and probabilities of false positive/false negative detections.
[0024]
Given a state probability distribution p(z) and observation data y, the filtered state probability distribution p'(z') is calculated by Bayesian inference as p'(z') ∝ Σz g(y, z'|z, q) p(z), normalized so that p'(z') sums to one over z'.
In the present embodiment, p’(z’) represents the posterior probability distribution of the object locations and velocities in the image, given the prior probability distributions p(z) and the observation y at time frame t.
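As a concrete illustration, the filtering step above can be sketched for a small discrete state space, assuming (as in paragraph [0023]) that g factors into transition and observation probabilities; all numeric values below are hypothetical:

```python
import numpy as np

# Hypothetical 3-state example; g(y, z'|z, q) = P(z'|z, q) * P(y|z', q),
# as in the hidden-Markov-model case of paragraph [0023].
trans = np.array([[0.8, 0.1, 0.1],   # P(z'|z): row z, column z'
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7]])
obs = np.array([0.9, 0.05, 0.05])    # P(y|z') for the observation y actually seen

p = np.array([1/3, 1/3, 1/3])        # prior state distribution p(z)

# Filtered distribution: p'(z') proportional to sum_z g(y, z'|z, q) p(z)
unnorm = obs * (trans.T @ p)
p_filtered = unnorm / unnorm.sum()

# The loss of paragraph [0025] is then the negative log-likelihood of y,
# i.e. the negative log of the normalizing constant:
loss = -np.log(unnorm.sum())
```

Because the observation strongly favors state 0, the filtered distribution concentrates there; the normalizer also directly yields the loss value used later in the gradient computation.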
[0025]
A loss function L(y|p, q) = -log Σz' Σz g(y, z'|z, q) p(z), the negative log-likelihood of observing y given the prior state distribution p(z) under the time series model parameters q, is used. In the present embodiment, this loss function L represents the difference between (a) the object locations inferred by the image inference model from the video image at the current time frame and (b) the object locations inferred by the time series model based on the estimated probability distribution of the object locations and velocities in the previous time frame. In other words, the loss represents the unlikelihood of detecting locations y given the estimated distribution p of locations and velocities in the previous time frame.
[0026]
The structure of the first embodiment of this invention, a transfer learning apparatus, is displayed in the block diagram in Fig. 1, and constitutes the basic structure of this invention. In the following, the responsibilities of each unit contained in this embodiment are described.
[0027]
The data d = dt that corresponds to time slice t is received from the outside as input or from a memory, and is read or received in succession. Each
time slice data dt triggers a new operation of the transfer learning apparatus. For the present embodiment, for example in the surveillance application, time slice data can be image data. In order for the online/offline training to reasonably converge, it is preferable that the camera be stationary, so that the video background change occurs only gradually.
[0028]
An inference unit 101 calculates the inference result vector y = f(d, ø) from the time slice data 100 and the parameter data ø stored in the inference model parameter memory 102. In the surveillance camera embodiment, the inference result may represent the locations of the detected objects and object classes, where yi and ci respectively denote the detected location and class (such as a person, a vehicle, etc.) of the i-th detected object.
[0029]
The inference model parameter memory 102 stores parameter data ø for the inference model. The parameter data persists between arrivals of time slice data and may be updated between individual operations of the transfer learning apparatus according to the rules governing whether or not to update the model (such rules to be discussed later). For typical models for object detection in image data, the number of parameters is of the order of millions.
[0030]
The transfer learning apparatus has an inference result vector 103 representing the inference result as, for example, numbers and locations of detected objects.
[0031]
A time series model update unit 104 (sometimes referred to as “time series model parameter/state update unit”) retrieves the state probability distribution p(z) and parameters q stored in the time series model memory 105, updates the parameters q stored in the time series model memory 105 as q_k ← q_k - η_k ∂L(y|p, q)/∂q_k, and updates the state probability distribution as p(z) ← p'(z), where η_k are some fixed parameters controlling the learning speed and y is the inference result vector 103. Given y, the detected object locations inferred from the new video image, the parameters q and the estimated distribution of locations and velocities p(z) are updated using these equations.
[0032]
A time series model memory 105 stores parameter data q for the time series model and a state distribution p(z) associated with the time series model. The parameter data persists between arrivals of time series data slices 100.
[0033]
A gradient calculation unit 106 retrieves the state probability distribution p(z) and parameters q stored in the time series model memory 105, and calculates a gradient vector w with components wi = ∂L(y|p, q)/∂yi, where y is the inference result vector 103. This gradient vector corresponds to the gradient (i.e., the partial derivatives) of the loss L with respect to each component of the inference vector y. In the surveillance camera embodiment, when the observed changes of the inferred object locations in the current video frame are totally unexpected based on the prediction of the time series model, this gradient vector tends to have larger elements.
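The link between unexpectedness and gradient magnitude can be illustrated with a toy one-dimensional observation model (a two-state prediction with Gaussian observation noise; all values are hypothetical), differentiating the loss numerically:

```python
import numpy as np

p_pred = np.array([0.6, 0.4])   # predicted state distribution
mus = np.array([0.0, 5.0])      # expected observation under each state

def loss(y):
    # L(y|p, q): negative log-likelihood of observing y under the
    # time series model's prediction (Gaussian noise, sigma = 1)
    lik = np.sum(p_pred * np.exp(-0.5 * (y - mus) ** 2))
    return -np.log(lik)

def gradient(y, eps=1e-6):
    # w = dL/dy by central finite differences (a real gradient
    # calculation unit 106 would compute this analytically)
    return (loss(y + eps) - loss(y - eps)) / (2 * eps)

w_expected = gradient(0.1)    # observation near a predicted location
w_surprise = gradient(10.0)   # observation far from all predictions
```

As stated above, the surprising observation yields a gradient of much larger magnitude than the expected one.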
[0034]
The gradient vector 107 consists of the partial derivatives of the loss L with respect to all components of the inference vector y.
[0035]
The device of the present embodiment determines whether or not to update the model parameters depending on the significance or magnitude of the update that is about to be made using the current time series data slice 100. This determination is made, for example, based on the gradient vector 107. In the present embodiment, this determination may be performed, as explained below, by calculating a magnitude metric from the gradient vector 107, comparing the gradient magnitude with a threshold, and performing an update of the model when the magnitude is larger than the threshold.
[0036]
A magnitude metric calculation unit 108 calculates the magnitude metric value 109, m = h(w), where w is the gradient vector 107 and h(w) is a magnitude metric function to calculate the magnitude of the gradient vector 107. The magnitude metric function h(w) may be chosen from any vector magnitude metric function, for example, but not necessarily, an L1, L2, or Max function. If the metric function h(w) is L2, then m = sqrt(Σi wi^2).
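For instance, the three metric functions mentioned can be computed as follows (the example gradient values are arbitrary):

```python
import numpy as np

w = np.array([0.3, -0.4, 1.2])   # example gradient vector 107

m_l1 = np.sum(np.abs(w))         # L1 norm:  0.3 + 0.4 + 1.2 = 1.9
m_l2 = np.sqrt(np.sum(w ** 2))   # L2 norm:  sqrt(0.09 + 0.16 + 1.44) = 1.3
m_max = np.max(np.abs(w))        # Max norm: 1.2
```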
[0037]
A magnitude metric value 109 represents the magnitude of the gradient, which represents the unexpectedness, i.e., significance, of the update being made based on the current time frame data. In the surveillance camera scenario, if the pre-trained model produces a high-magnitude gradient of the loss, it is likely that this is caused by some misdetection or detection noises due to background change, such as a recent lighting change. In this case, the gradient from the current time frame should be efficiently used for the model updating, in order to reduce similar misdetections or noises in the future.
On the other hand, if the magnitude is low, the video frame probably did not experience any background change. In this case, running a model parameter update (which consumes significant computational resources) would not significantly improve the accuracy of the model and should therefore be avoided.
[0038]
A magnitude metric threshold value 110 may be determined empirically.
[0039]
For an inference model parameter update unit 111, if the magnitude metric value 109 is above the magnitude metric threshold value 110, the inference model parameter update unit 111 updates the parameters ø stored in the inference model parameter memory 102 as ø_k ← ø_k - τ_k Σi wi ∂fi(d, ø)/∂ø_k, where τ_k are fixed parameters controlling the learning speed, d is the time slice data 100 and w is the gradient vector 107.
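This gated update can be sketched as follows, assuming purely for illustration a linear inference model y = ø d, for which the chain rule reduces ∂L/∂ø to the outer product of the gradient vector w and the time slice data d:

```python
import numpy as np

def gated_update(phi, d, w, m, threshold, tau=0.01):
    # Update the inference model parameters only when the magnitude
    # metric m exceeds the threshold; otherwise skip the costly update.
    # A linear inference model y = phi @ d is assumed here only for
    # illustration, so by the chain rule dL/dphi = outer(w, d).
    if m <= threshold:
        return phi
    return phi - tau * np.outer(w, d)

phi = np.zeros((2, 3))                 # toy parameter matrix
d = np.array([1.0, 2.0, 3.0])          # time slice data 100
w = np.array([0.5, -0.5])              # gradient vector 107

phi_skipped = gated_update(phi, d, w, m=0.4, threshold=0.7)  # no update
phi_updated = gated_update(phi, d, w, m=0.9, threshold=0.7)  # update runs
```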
[0040]
In the following, the operation of the apparatus depicted in Fig. 1 is explained according to the flow diagram in Fig. 3 as a series of steps.
[0041]
In step S200, the time series data slice 100 d = dt for some time slice t is received.
[0042]
In step S201, the inference unit 101 calculates the inference result vector, y = f(d, ø), from the time slice data 100 and the model parameter data ø stored in the inference model parameter memory 102.
[0043]
In step S202a, the time series model update unit 104 retrieves the state probability distribution p(z) and parameters q stored in the time series model memory 105, and then updates the parameters q stored in the time series model memory 105 as q_k ← q_k - η_k ∂L(y|p, q)/∂q_k, where y is the inference result vector 103.
[0044]
In step S202b, the time series model update unit 104 retrieves the state probability distribution p(z) and parameters q stored in the time series model memory 105, and then updates the state probability distribution by Bayesian inference as p(z) ← p'(z), with p'(z') ∝ Σz g(y, z'|z, q) p(z), where y is the inference result vector 103.
[0045]
In step S203, the gradient calculation unit 106 retrieves the state probability distribution p(z) and parameters q stored in the time series model memory 105, and calculates a gradient vector w 107 with components wi = ∂L(y|p, q)/∂yi, where y is the inference result vector 103.
[0046]
In step S204, the magnitude metric calculation unit 108 calculates the magnitude metric value m 109, m = h(w), where w is the gradient vector 107.
[0047]
In step S205, if the magnitude metric value 109 is above the magnitude metric threshold value 110, execution proceeds to step S206; else it proceeds to step S207.
[0048]
In step S206, the inference model parameter update unit 111 updates the parameters ø stored in the inference model parameter memory 102 as ø_k ← ø_k - τ_k Σi wi ∂fi(d, ø)/∂ø_k, where d is the time slice data 100 and w is the gradient vector 107.
In step S207, processing for time slice t is finished, and execution stops until another time series data slice 100 d = dt+1 for time slice t+1 is received.
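The steps S200 through S207 can be tied together in a minimal end-to-end sketch. The scalar linear inference model, the two-state time series model, and all numeric values below are hypothetical stand-ins for the real networks, and the time series parameter update of step S202a is omitted for brevity:

```python
import numpy as np

MUS = np.array([0.0, 5.0])   # observation expected under each of two states

def loss(y, p_pred):
    # L(y|p, q): negative log-likelihood of y under the state prediction
    lik = np.sum(p_pred * np.exp(-0.5 * (y - MUS) ** 2)) + 1e-300
    return -np.log(lik)

def run_step(d, phi, p, trans, tau=0.01, threshold=0.5, eps=1e-6):
    y = phi * d                                   # S201: inference y = f(d, phi)
    p_pred = trans.T @ p                          # predicted state distribution
    post = p_pred * np.exp(-0.5 * (y - MUS) ** 2)
    p_new = post / post.sum()                     # S202b: Bayesian filtering
    w = (loss(y + eps, p_pred) - loss(y - eps, p_pred)) / (2 * eps)  # S203
    m = abs(w)                                    # S204: magnitude metric
    if m > threshold:                             # S205/S206: gated update;
        phi = phi - tau * w * d                   # dL/dphi = w * dy/dphi = w * d
    return phi, p_new, m                          # S207: wait for next slice

p0 = np.array([0.5, 0.5])
trans = np.eye(2)             # trivial state dynamics for the sketch
phi1, p1, m1 = run_step(d=0.1, phi=1.0, p=p0, trans=trans)   # unsurprising
phi2, p2, m2 = run_step(d=10.0, phi=1.0, p=p0, trans=trans)  # surprising
```

In the unsurprising case the metric stays below the threshold and the inference model parameters are left untouched; in the surprising case the large gradient triggers the update.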
(Second Example Embodiment)
[0049]
The apparatus from Fig. 2 corresponds to the apparatus from the first example embodiment, amended as follows:
[0050]
The time series model update unit 104 additionally calculates the loss value 111 as l = L(y|p, q), where y is the inference result vector 103, and p(z) and q are the state probability distribution and the parameters retrieved from the time series model memory 105, respectively.
[0051]
The magnitude metric calculation unit 108 calculates the magnitude metric value 109, m = h'(w, l), where h'(w, l) is a function of the gradient vector 107 and the loss value 111.
[0052]
The loss value 111 is the value l = L(y|p, q).
[0053]
The flow of operation follows the sequence from Fig. 3, which is altered with respect to the following:
[0054]
In step S202a, the time series model update unit 104 additionally calculates the loss value 111 as l = L(y|p, q), where y is the inference result vector 103, and p(z) and q are the state probability distribution and the parameters retrieved from the time series model memory 105, respectively.
[0055]
In step S204, the magnitude metric calculation unit 108 calculates the magnitude metric value 109, m = h'(w, l), where h'(w, l) is a function of the gradient vector 107 and the loss value 111.
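The specification leaves the form of h' open; one plausible illustrative choice, shown only as an assumption, combines a norm of the gradient with the loss value:

```python
import numpy as np

def h_prime(w, l, alpha=1.0, beta=0.5):
    # Illustrative combined metric: the embodiment only requires h' to be
    # some function of the gradient vector w and the loss value l; the
    # weighted-sum form and the weights alpha, beta are assumptions.
    return alpha * np.linalg.norm(w) + beta * l

m = h_prime(np.array([0.3, -0.4]), l=2.0)   # 1.0 * 0.5 + 0.5 * 2.0 = 1.5
```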
(Third Example Embodiment)
[0056]
In this third example embodiment, a description will be provided in accordance with either of the first and second example embodiments with the following additions and modifications with reference to Figs. 6-8. Redundant descriptions of components previously described in the first and second example embodiments will be omitted.
[0057]
Fig. 6 shows an example system diagram in which the transfer learning apparatus of the present disclosure may be applied for time series data analysis at, for example, a plurality of locations (e.g., supermarkets, convenience stores, stadiums, warehouses, etc.) with a plurality of sensors 305, such as cameras, audio recording devices, etc. In this example, the transfer learning apparatus is part of a cloud computing environment 310 and is able to perform processing of time slice data 100 for each of the locations which are equipped with an edge device 300 and one or more sensors 305 as shown, for example, in Fig. 7.
[0058]
In addition to the features of either the first or second example embodiments, a tracking data generation unit 112 is provided, as shown in Fig. 8, in order to output object tracking data, for example, back to the respective edge devices of the respective locations.
As shown in Fig. 7, the exemplary embodiments may include a central processing unit (CPU), and as the memory, a random access memory (RAM) may be used. As the storage device, a hard disk drive (HDD), a solid state drive (SSD), etc. may be used.
[0059]
With reference to Fig. 7, an exemplary structure of the edge device 300 will now be described. The edge device may include, for example, a communication I/F 301
(interface), a controller 302, storage 303, and a sensor I/F 304. The controller includes a CPU and memory. The storage 303 may be a storage medium such as an HDD or an SSD. The communication I/F 301 has general functions for communicating with the cloud computing environment 310 via the communication network. The sensor I/F has general functions for instructing operations to the sensor 305 and retrieving detected (sensed) information from the sensor 305. In other words, the edge device 300 has at least a computing function, a communication gateway function, and a storage function. However, it may be assumed that these functions of the edge device are relatively less performance intensive as compared with those of a high-end personal computer and also those of the cloud computing environment due to, for example, commercial reasons (i.e. cost) with regard to the edge device 300.
[0060]
It should be noted that the edge device 300 may be merely part of a POS (point of sale) system.
(Other Modifications)
[0061]
While the preferred example embodiments of the present invention have been described above, it is to be understood that the present invention is not limited to the example embodiments above and that further modifications, replacements, and adjustments may be added without departing from the basic technical concept of the present invention.
[0062]
In the first and second example embodiments, descriptions are given in accordance with the flow chart shown in Fig. 3. However, the present invention is not limited to this sequence of operations and may instead operate, for example, in accordance with the flow chart shown in Fig. 4.
[0063]
In the present disclosure, the embodiments are intended to be used with training being performed online. However, batch training is also possible depending on design specifications.
[0064]
One example of an object to be tracked could be human beings, and the objective may be to track the number of individuals in a store at any given time.
[Industrial Applicability]
The disclosed invention can be applied to the computer vision task of tracking objects from video data.
[Reference Signs List]
[0065]
100 Image Data
101 Inference Unit
102 Inference Model Parameter Memory
103 Inference Result Vector
104 Time Series Model Update Unit
105 Time series model memory
106 Gradient Calculation Unit
107 Gradient Vector
108 Magnitude Metric Calculation Unit
109 Magnitude Metric Value
110 Magnitude Metric Threshold Value
111 Inference Model Parameter Update Unit
112 Tracking Data Generation Unit
150 Object Detection Unit
151 Object Tracking Unit
300 Edge Device
301 Communication I/F
302 Controller
303 Storage
304 Sensor I/F
305 Sensor
310 Cloud Computing Environment
Claims
[Claim 1]
A transfer learning apparatus, comprising:
an inference model parameter memory storing model parameter data associated with an inference model;
a time series model memory storing model parameter data associated with a time series model and a state probability distribution;
an inference unit configured to receive time slice data and configured to calculate an inference result vector from the time slice data and the parameter data stored in the inference model parameter memory;
a time series model update unit configured to receive the inference result vector from the inference unit and configured to update the parameter data and the state probability distribution stored in the time series model memory;
a gradient calculation unit configured to receive the inference result vector from the inference unit and parameter data from the time series model memory and calculate a gradient vector based on the inference result vector and the parameter data;
a magnitude metric calculation unit configured to receive the gradient vector and calculate a magnitude metric value; and
an inference model parameter update unit configured to update the inference model parameter data stored in the inference model parameter memory based on the gradient vector and the time slice data, if the magnitude metric value is higher than a magnitude metric threshold.
[Claim 2]
The transfer learning apparatus of claim 1, wherein
the time series model update unit is further configured to calculate a loss value from the time series model parameter data and the inference result vector, and
the magnitude metric calculation unit calculates the magnitude metric value based on both of the loss value and the gradient vector.
[Claim 3]
The transfer learning apparatus of claim 1 or claim 2, wherein
the time series model update unit updates the state probability distribution stored in the time series model memory from the inference result vector at a time before the inference model parameter update unit determines whether or not the magnitude metric value is higher than the magnitude metric threshold.
[Claim 4]
The transfer learning apparatus of claim 1 or claim 2, wherein
if the inference model parameter update unit determines that the magnitude metric value is higher than the magnitude metric threshold and updates the inference model parameter data stored in the inference model parameter memory based on the gradient vector and the time slice data, the inference unit recalculates the inference result vector and the time series model update unit updates the state probability distribution, and
if the inference model parameter update unit determines that the magnitude metric value is less than or equal to the magnitude metric threshold, the time series model update unit updates the state probability distribution.
[Claim 5]
A transfer learning system, comprising
a communication network;
an inference model parameter memory storing model parameter data associated with an inference model;
a time series model memory storing model parameter data associated with a time series model and a state probability distribution;
an inference unit configured to receive time slice data and configured to calculate an inference result vector from the time slice data and the parameter data stored in the inference model parameter memory;
a time series model update unit configured to receive the inference result vector from the inference unit and configured to update the parameter data and the state probability distribution stored in the time series model memory;
a gradient calculation unit configured to receive the inference result vector from the inference unit and parameter data from the time series model memory and calculate a gradient vector based on the inference result vector and the parameter data;
a magnitude metric calculation unit configured to receive the gradient vector and calculate a magnitude metric value; and
an inference model parameter update unit configured to update the inference model parameter data stored in the inference model parameter memory based on the gradient vector and the time slice data, if the magnitude metric value is higher than a magnitude metric threshold; and
an edge device configured to provide time slice data through the communication network, the edge device decoding information from a sensor as the time slice data.
[Claim 6]
A method of transfer learning comprising, in order:
calculating an inference result vector from time slice data and inference model parameter data;
updating time series model parameter data from the inference result vector;
updating a state probability distribution from the inference result vector;
calculating a gradient vector from the time series model parameter data and the inference result vector;
calculating a magnitude metric from the gradient vector; and
updating the inference model parameter data from the gradient vector and the time slice data when the magnitude metric value is higher than a magnitude metric threshold.
[Claim 7]
A computer readable storage medium storing instructions for causing a computer to execute:
calculating an inference result vector from time slice data and inference model parameter data;
updating time series model parameter data from the inference result vector;
updating a state probability distribution from the inference result vector;
calculating a gradient vector from the time series model parameter data and the inference result vector;
calculating a magnitude metric from the gradient vector; and
updating the inference model parameter data from the gradient vector and the time slice data when the magnitude metric value is higher than a magnitude metric
threshold.