LU100902B1 - System including a device and a learning algorithm to perform a confidence measure-based classification or regression problem - Google Patents

System including a device and a learning algorithm to perform a confidence measure-based classification or regression problem Download PDF

Info

Publication number
LU100902B1
Authority
LU
Luxembourg
Prior art keywords
module
input
output
classification
regression
Prior art date
Application number
LU100902A
Other languages
German (de)
Inventor
Da Cruz Steve Dias
Hans Peter Beise
Udo Schröder
Original Assignee
Iee Sa
Priority date
Filing date
Publication date
Application filed by Iee Sa filed Critical Iee Sa
Priority to LU100902A priority Critical patent/LU100902B1/en
Priority to PCT/EP2019/071277 priority patent/WO2020030722A1/en
Application granted granted Critical
Publication of LU100902B1 publication Critical patent/LU100902B1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Abstract

A system including a device and at least one learning algorithm to perform a classification or regression problem for an input (x) to the device comprises a confidence measure module arrangement used in combination with the learning algorithm to decide when the algorithm should be allowed to perform a decision or regression on the input (x) and when not. The confidence measure module arrangement comprises an M_θ-module as implementation of a machine learning based method for the classification or regression task with trainable parameters θ, a D_φ-module as implementation of a machine learning based method (for instance an artificial neural network) which is responsible to learn a representation of the training dataset with trainable parameters φ, and an E-module as implementation of a measure to determine how far the input x is from the training dataset using the information of D_φ. The D_φ-module provides for a D_φ(x)-output of a model which learned the representation of the training dataset, said output being used by the E-module to determine how different the input x is from what has been seen during training. The E-module provides for an E(D_φ(x), x)-output of a model, said output being combined with M_θ to decide whether the model is allowed to perform an action (classification or regression). The M_θ-module provides for an M_θ(x)-output of a classification or regression model, which output will be the classification or regression based on the input x.

Description

System including a device and a learning algorithm to perform a confidence measure-based classification or regression problem Technical field
[0001] The invention concerns use of one or multiple machine learning algorithms (e.g. neural networks) for a device to perform, for example, a classification or regression problem, particularly for a device for which the connection to a remote server is not possible or not fast enough. More particularly the invention is directed to a system including a device and at least one learning algorithm to perform a classification or regression problem for an input to the device.
Background of the Invention
[0002] The decision process of machine learning based algorithms, and especially neural networks, can usually not be influenced once the algorithm has been implemented and the device is in the field. Consequently, particularly for safety critical sensor applications, there is no measure to decide whether an algorithm should be allowed to take an action for a new input of the device. That is, there is no inherent measure that checks whether the model is trained in a way that it generalizes as expected to some new input of the device. Usually, the algorithm is forced to take an action for every input, although the algorithm and the device have no possibility to detect whether the input is exotic, or simply whether the input is in some sense far from what has been seen during training. Since input data can be very different from the initial training dataset on which the algorithm was trained, the algorithm does not know how to handle such input correctly and will, for example, in most cases classify the input wrongly.
[0003] It would be desirable, in connection with a system including a device and at least one learning algorithm to perform a classification or regression problem for an input to the device, to have a possibility to check during the classification or regression task whether the algorithm is allowed to perform an action of the device, or whether the algorithm should ask for human interaction or warn the system or a user of the device that the system is not able to make a meaningful decision.
Object of the invention
[0004] It is therefore an object underlying the present invention to provide a system of the kind in question without at least some of the above-described shortcomings.
[0005] This object is achieved by a system comprising the features of claim 1.
General Description of the Invention
[0006] The invention provides a system including a device and at least one learning algorithm to perform a classification or regression problem for an input (x) to the device, the system comprising a confidence measure module arrangement used in combination with the learning algorithm to decide when the algorithm should be allowed to perform a decision or regression on the input (x) and when not, the confidence measure module arrangement comprising:
- an M_θ-module as implementation of a machine learning based method for the classification or regression task with trainable parameters θ,
- a D_φ-module as implementation of a machine learning based method (for instance an artificial neural network) which is responsible to learn a representation of the training dataset with trainable parameters φ, and
- an E-module as implementation of a measure to determine how far the input x is from the training dataset using the information of D_φ,
wherein the D_φ-module provides for a D_φ(x)-output of a model which learned the representation of the training dataset, said output being used by the E-module to determine how different the input x is from what has been seen during training, the E-module provides for an E(D_φ(x), x)-output of a model which determines how far the input x is from the training dataset using D_φ(x), said output being combined with M_θ to decide whether the model is allowed to perform an action (classification or regression), and the M_θ-module provides for an M_θ(x)-output of a classification or regression model using the input x, said output in combination with the output of the D_φ-module and the output of the E-module being adapted to decide to perform an action, upon which the output will be the classification or regression based on the input x.
[0007] In other words, this invention proposes to use a confidence measure module arrangement on which a low-dimensional and/or smaller (in size) representation of the training dataset is stored/encoded (e.g. by means of a dedicated neural network). This module arrangement is operated in combination with the classification or regression algorithm (e.g. neural networks) in order to decide when the algorithm should be allowed to perform a decision or regression and when not. The classification or regression algorithm is adapted to learn during training how to efficiently store the training dataset and how to use it.
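By way of illustration only, and not forming part of the claimed subject-matter, the interplay of the three modules can be sketched as follows; the function names (m_theta, d_phi, e_measure), the threshold-based gate and the use of Python are assumptions introduced for this example.

```python
# Illustrative sketch of the confidence measure module arrangement; names and the
# simple threshold gate are assumptions made for this example only.
from typing import Any, Callable, Optional

import numpy as np


def gated_prediction(
    x: np.ndarray,
    m_theta: Callable[[np.ndarray], Any],                   # M_theta: classification/regression model
    d_phi: Callable[[np.ndarray], np.ndarray],               # D_phi: learned representation of the training set
    e_measure: Callable[[np.ndarray, np.ndarray], float],    # E: distance of x from the training data
    threshold: float,
) -> Optional[Any]:
    """Return M_theta(x) only if x is judged close enough to the training data."""
    d_out = d_phi(x)                 # D_phi(x): e.g. a reconstruction or latent code of x
    distance = e_measure(d_out, x)   # E(D_phi(x), x): how far is x from the training data?
    if distance <= threshold:
        return m_theta(x)            # confident: perform the classification or regression
    return None                      # not confident: defer to the user or the system
```

In such a sketch, a return value of None would correspond to the case in which the system asks for human interaction or warns the user that no meaningful decision can be made.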
[0008] The system of the invention allows for the self-verification of the capacity to handle the input to a device correctly during its lifetime, wherein input received during the lifetime is compared with the data seen during training. To this end, the invention provides for training the model of the confidence measure module arrangement, which model is configured to learn to represent the training data in a compact and useful way. This model can be used during the classification and regression task to determine if the model has the capacity to use and process the input correctly. The model can also be a part of the model that is trained for the actual classification/regression task.
[0009] There exists prior art taking into account a confidence measure to determine how likely the algorithm can take a decision. Examples of such prior art are disclosed in US20110087627, US20170185893, US5052043 and US5912986.
[0010] US patent application US20110087627 discloses a system and method for generating a prediction using neural networks, training a plurality of neural networks with training data, calculating an output value for each of the plurality of neural networks based at least in part on input evaluation points, applying a weight to each output value based at least in part on a confidence value for each of the plurality of neural networks; and generating an output result.
[0011] US patent application US20170185893 discloses a computer-implemented method of incrementally training a confidence assessment module that calculates a confidence value indicative of the extent to which a code associated with a patient's encounter with a healthcare organization is proper. The method comprises assessing, with the confidence assessment module, a training corpus comprised of a plurality of coded encounters to produce resultant confidence values associated with each encounter; comparing the resultant confidence values to a target confidence value; and adjusting variables within the confidence assessment module to produce resultant confidence values closer to the target confidence value.
[0012] US patent US5052043 discloses a method for a neural network which, through controlling back propagation and adjusting neural weight and bias values through an output confidence measure, rapidly and accurately adapts its response to actual changing input data. The results of an appropriate actual unknown input are used to adaptively re-train the network during pattern recognition. By limiting the maximum value of the output confidence measure at which this re-training will occur, the network re-trains itself only when the input has changed by a sufficient margin from the initial training data such that this re-training is likely to produce a subsequent noticeable increase in the recognition accuracy provided by the network.
[0013] US patent US5912986 discloses a method for use in a neural network-based optical character recognition system for accurately classifying each individual character extracted from a string of characters. Specifically, a confidence measure associated with each output of, e.g., a neural classifier is generated through use of all the neural activation output values. Each individual neural activation output provides information for a corresponding atomic hypothesis of an evidence function. This hypothesis is that a pattern belongs to a particular class. Each neural output is transformed through a pre-defined monotonic function into a degree of support in its associated evidence function. These degrees of support are then combined through an orthogonal sum to yield a single confidence measure associated with the specific classification then being produced by the neural classifier.
[0014] Therefore, this prior art does not interfere with the approach of the present invention defined by claim 1.
[0015] Advantageous developments of the invention are defined in the dependent claims.
Brief Description of the Drawings
[0016] Further details and advantages of the present invention will be apparent from the following detailed description of non-limiting embodiments with reference to the attached drawings, wherein:
Fig. 1 depicts a schematic of an embodiment of the confidence measure module arrangement of the system of the invention, and
Fig. 2 depicts a schematic of a further embodiment of the confidence measure module arrangement of the system of the invention.
Description of Preferred Embodiments
[0017] The invention provides a system including a device and at least one learning algorithm to perform a classification or regression problem for an input (x) to the device, the system comprising a confidence measure module arrangement used in combination with the learning algorithm to decide when the algorithm should be allowed to perform a decision or regression on the input (x) and when not. The confidence measure module arrangement comprises the following modules:
- an M_θ-module as implementation of a machine learning based method for the classification or regression task with trainable parameters θ,
- a D_φ-module as implementation of a machine learning based method (for instance an artificial neural network) which is responsible to learn a representation of the training dataset with trainable parameters φ, and
- an E-module as implementation of a measure to determine how far the input x is from the training dataset using the information of D_φ.
As shown in Fig. 1, the D_φ-module provides for a D_φ(x)-output of a model which learned the representation of the training dataset, said output being used by the E-module to determine how different the input x is from what has been seen during training. Further, the E-module provides for an E(D_φ(x), x)-output of a model which determines how far the input x is from the training dataset using D_φ(x), said output being combined with M_θ to decide whether the model is allowed to perform an action (classification or regression). Still further, the M_θ-module provides for an M_θ(x)-output of a classification or regression model using the input x, said output in combination with the output of the D_φ-module and the output of the E-module being adapted to decide to perform an action, upon which the output will be the classification or regression based on the input x.
[0018] D_φ and M_θ should preferably be trained together or in parallel and can possibly interact with each other to improve the efficiency of the whole system. However, both models can also be trained independently and only be combined in the system after training.
[0019] Alternatively, a structure can be implemented in which D_φ and M_θ are not trained together. For example, D_φ could be a "region of interest" (ROI) algorithm, proposing only the interesting regions in an image. This algorithm would be optimized to be background independent. Then, for each region, the model M_θ, optimized for classification, could perform a classification.
[0020] Fig. 2 shows an example of the confidence measure module arrangement of the invention.
[0021] For the data representation module D_φ, a variational autoencoder (VAE) may be used in the example, which can learn to represent the training data in lower dimensions. The skilled person will appreciate that the variational autoencoder (VAE) is only one possible example of a neural network structure which can be used and that other autoencoder and/or neural network models could be used as well.
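A minimal, non-limiting sketch of such a variational autoencoder for the D_φ-module is given below, assuming a flattened input of dimension 784 and a latent dimension of 8; the layer sizes, the PyTorch framework and the unweighted loss terms are illustrative assumptions, not requirements of the invention.

```python
# Minimal VAE sketch for D_phi: encodes the input X to a low-dimensional Z and
# decodes Z back to a reconstruction Y. All sizes are illustrative assumptions.
import torch
import torch.nn as nn


class VAE(nn.Module):
    def __init__(self, in_dim: int = 784, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, latent_dim)        # mean of q(z|x)
        self.to_logvar = nn.Linear(128, latent_dim)    # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, in_dim)
        )

    def forward(self, x: torch.Tensor):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        y = self.decoder(z)                                      # reconstruction Y of the input X
        return y, z, mu, logvar


def vae_loss(x: torch.Tensor, y: torch.Tensor, mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # Reconstruction error plus KL divergence to the standard normal prior.
    recon = torch.sum((y - x) ** 2)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```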
[0022] The low-dimensional representation of the input X is denoted by Z in the following. In order to detect whether the input is in some sense close to the training data (module E(D_φ(x), x)), one could calculate the l₂-error l₂ = Σᵢ |D_φ(x)ᵢ − xᵢ|² between the output of the autoencoder, denoted by Y, and the input X. If the input is close to the training data, then the low-dimensional representation Z of X is meaningful and consequently the reconstruction Y should be close to X (since this is the initial goal of the autoencoder). The term "close" in the latter can be interpreted in a wide sense, for instance close in norm distance or in statistical measures. The actual distance is determined by the design of the autoencoder. If the error between X and Y is lower than a pre-defined threshold, then the model M_θ is allowed to make a classification or regression based on X. Otherwise, the model is not allowed to do so and might need to inform the user or the system that it cannot perform an action.
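The following non-limiting sketch illustrates the E(D_φ(x), x) check of this paragraph: the l₂-error between the input X and its reconstruction Y is computed and compared against a pre-defined threshold. The concrete threshold value and the NumPy-based formulation are assumptions made for this example only.

```python
# Sketch of the E(D_phi(x), x) check: compute the l2-error between the input X and
# its reconstruction Y = D_phi(X) and gate M_theta on a pre-defined threshold.
import numpy as np


def l2_error(x: np.ndarray, y: np.ndarray) -> float:
    """l2 = sum_i |D_phi(x)_i - x_i|^2, with y = D_phi(x)."""
    return float(np.sum((y - x) ** 2))


def allowed_to_decide(x: np.ndarray, y: np.ndarray, threshold: float = 1.0) -> bool:
    """True if the reconstruction is close enough to X for M_theta to act."""
    return l2_error(x, y) < threshold
```

Only if allowed_to_decide returns True would the M_θ-module be permitted to output a classification or regression for X; otherwise the system or the user is informed that no action can be performed.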
[0023] If D_φ and M_θ are trained together, M_θ can use the input X as well as the low-dimensional representation Z of X in order to perform a classification or regression task. Further, the low-dimensional representation could be optimized since M_θ learns how to handle Z and X.
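Purely as an illustration of this joint-training case, a classifier that consumes both the raw input X and its low-dimensional representation Z might look as follows; the layer sizes and the number of classes are assumptions made for this sketch.

```python
# Illustrative M_theta for joint training with D_phi: the classifier receives the
# raw input X together with its low-dimensional representation Z.
import torch
import torch.nn as nn


class MThetaJoint(nn.Module):
    def __init__(self, in_dim: int = 784, latent_dim: int = 8, n_classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim + latent_dim, 128), nn.ReLU(), nn.Linear(128, n_classes)
        )

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, z], dim=-1))  # class logits computed from X and Z
```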
[0024] One can also imagine a structure in which D_φ and M_θ are not trained together. For example, D_φ could be a "region of interest" (ROI) algorithm, proposing only the interesting regions in an image. This algorithm would be optimized to be background independent. Then, for each region, the model M_θ, optimized for classification, could perform a classification.
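A non-limiting sketch of this ROI-based variant is given below; the callables propose_rois (standing for D_φ) and classify (standing for M_θ), as well as the region encoding, are hypothetical placeholders introduced for this illustration.

```python
# Sketch of the ROI variant: D_phi proposes regions of interest and M_theta
# classifies each proposed region independently.
from typing import Callable, List, Tuple

import numpy as np

Region = Tuple[int, int, int, int]  # (top, left, height, width)


def classify_regions(
    image: np.ndarray,
    propose_rois: Callable[[np.ndarray], List[Region]],   # D_phi: background-independent ROI proposer
    classify: Callable[[np.ndarray], int],                 # M_theta: per-region classifier
) -> List[Tuple[Region, int]]:
    results = []
    for (top, left, h, w) in propose_rois(image):
        crop = image[top:top + h, left:left + w]           # cut out the proposed region
        results.append(((top, left, h, w), classify(crop)))
    return results
```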

Claims (14)

Claims
1. A system including a device and at least one learning algorithm to perform a classification or regression problem for an input (x) to the device, the system comprising a confidence measure module arrangement used in combination with the learning algorithm to decide when the algorithm should be allowed to perform a decision or regression on the input (x) and when not, the confidence measure module arrangement comprising: - an M_θ-module as implementation of a machine learning based method for the classification or regression task with trainable parameters θ, - a D_φ-module as implementation of a machine learning based method configured to learn a representation of the training dataset with trainable parameters φ, and - an E-module as implementation of a measure to determine how far the input x is from the training dataset using the information of D_φ, wherein the D_φ-module provides for a D_φ(x)-output of a model which learned the representation of the training dataset, said output being used by the E-module to determine how different the input x is from what has been seen during training, the E-module provides for an E(D_φ(x), x)-output of a model which determines how far the input x is from the training dataset using D_φ(x), said output being combined with M_θ to decide whether the model is allowed to perform an action (classification or regression), and the M_θ-module provides for an M_θ(x)-output of a classification or regression model using the input x, wherein said E(D_φ(x), x)-output and said M_θ(x)-output in combination are adapted to decide to perform an action, upon which the output will be the classification or regression M_θ(x) based on the input x.
2. The system of claim 1, wherein a neural network structure, preferably a variational autoencoder (VAE), which can learn to represent the training data in lower dimensions, is used for the D_φ-module, wherein the low-dimensional representation of the input X is denoted by Z.
3. The system of claim 1 or 2, wherein an l₂-error, l₂ = Σᵢ |D_φ(x)ᵢ − xᵢ|², between the output of the autoencoder, denoted by Y, and the input X is calculated to detect whether the input is in some sense close to the training data (module E(D_φ(x), x)), wherein "close" can be interpreted in a wide sense, for instance close in norm distance or in statistical measures.
4. The system of claim 3, wherein, in case the input is close to the training data, the low-dimensional representation Z of X is meaningful and consequently the reconstruction Y is close to X, in accordance with the initial goal of the autoencoder.
5. The system of claim 4, wherein, in case the error between X and Y is lower than a pre-defined threshold, the model of the M_θ-module is allowed to make a classification or regression based on X, and otherwise the model is not allowed to do so and might need to inform the system that it cannot perform an action.
6. The system of one of claims 1 to 5, wherein the models of the D_φ- and M_θ-modules are trained together or in parallel and interact with each other to improve the efficiency of the whole system.
7. The system of claim 5, wherein, in case the models of the D_φ- and M_θ-modules are trained together, the M_θ-module uses the input X as well as the low-dimensional representation Z of X in order to perform a classification or regression task.
8. The system of claim 7, wherein the low-dimensional representation Z of X is optimized since M_θ learns how to handle Z and X.
9. The system of one of claims 1 to 8, wherein the models of the D_φ- and M_θ-modules are trained independently from one another and are only combined in the system after training.
10. The system of claim 9, wherein the model of the D_φ-module is a region of interest (ROI) algorithm, proposing only the interesting regions in an image.
11. The system of claim 10, wherein the region of interest (ROI) algorithm is optimized to be background independent.
12. The system of claim 11, wherein, for each interesting region, the model of the M_θ-module, optimized for classification, is adapted to perform a classification.
13. The system of one of claims 1 to 12, wherein the device comprises a single sensor.
14. The system of one of claims 1 to 12, wherein the device comprises a multi-sensor.

Priority Applications (2)

Application Number Priority Date Filing Date Title
LU100902A LU100902B1 (en) 2018-08-09 2018-08-09 System including a device and a learning algorithm to perform a confidence measure-based classification or regression problem
PCT/EP2019/071277 WO2020030722A1 (en) 2018-08-09 2019-08-08 Sensor system including artificial neural network configured to perform a confidence measure-based classification or regression task

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
LU100902A LU100902B1 (en) 2018-08-09 2018-08-09 System including a device and a learning algorithm to perform a confidence measure-based classification or regression problem

Publications (1)

Publication Number Publication Date
LU100902B1 true LU100902B1 (en) 2020-02-17

Family

ID=63490651

Family Applications (1)

Application Number Title Priority Date Filing Date
LU100902A LU100902B1 (en) 2018-08-09 2018-08-09 System including a device and a learning algorithm to perform a confidence measure-based classification or regression problem

Country Status (2)

Country Link
LU (1) LU100902B1 (en)
WO (1) WO2020030722A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114548311B (en) * 2022-02-28 2022-12-02 江苏亚力亚气动液压成套设备有限公司 Hydraulic equipment intelligent control system based on artificial intelligence

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5052043A (en) 1990-05-07 1991-09-24 Eastman Kodak Company Neural network with back propagation controlled through an output confidence measure
US5912986A (en) 1994-06-21 1999-06-15 Eastman Kodak Company Evidential confidence measure and rejection technique for use in a neural network based optical character recognition system
US20110087627A1 (en) 2009-10-08 2011-04-14 General Electric Company Using neural network confidence to improve prediction accuracy
US11157808B2 (en) 2014-05-22 2021-10-26 3M Innovative Properties Company Neural network-based confidence assessment module for healthcare coding applications

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HELI KOSKIMAKI ET AL: "Two-level clustering approach to training data instance selection: A case study for the steel industry", NEURAL NETWORKS, 2008. IJCNN 2008. (IEEE WORLD CONGRESS ON COMPUTATIONAL INTELLIGENCE). IEEE INTERNATIONAL JOINT CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, June 2008 (2008-06-01), pages 3044 - 3049, XP031327964, ISBN: 978-1-4244-1820-6 *
ILMARI JUUTILAINEN ET AL: "A Method for Measuring Distance From a Training Data Set", [COMMUNICATIONS IN STATISTICS / THEORY AND METHODS] COMMUNICATIONS IN STATISTICS, vol. 36, no. 14, 22 October 2007 (2007-10-22), London, GB, pages 2625 - 2639, XP055583853, ISSN: 0361-0926, DOI: 10.1080/03610920701271129 *
MARCEL ZIEMS ET AL: "SVM-based road verification with partly non-representative training data", URBAN REMOTE SENSING EVENT (JURSE), 2011 JOINT, IEEE, 11 April 2011 (2011-04-11), pages 37 - 40, XP031864406, ISBN: 978-1-4244-8658-8, DOI: 10.1109/JURSE.2011.5764713 *
XUEZHI WEN ET AL: "A rapid learning algorithm for vehicle classification", INFORMATION SCIENCES, vol. 295, 23 October 2014 (2014-10-23), NL, pages 395 - 406, XP055584848, ISSN: 0020-0255, DOI: 10.1016/j.ins.2014.10.040 *

Also Published As

Publication number Publication date
WO2020030722A1 (en) 2020-02-13

Similar Documents

Publication Publication Date Title
US11188813B2 (en) Hybrid architecture system and method for high-dimensional sequence processing
KR102641116B1 (en) Method and device to recognize image and method and device to train recognition model based on data augmentation
US11704409B2 (en) Post-training detection and identification of backdoor-poisoning attacks
KR20190098106A (en) Batch normalization layer training method
KR102283416B1 (en) A method and apparatus for generating image using GAN based deep learning model
KR102031982B1 (en) A posture classifying apparatus for pressure distribution information using determination of re-learning of unlabeled data
KR102165160B1 (en) Apparatus for predicting sequence of intention using recurrent neural network model based on sequential information and method thereof
Boursinos et al. Assurance monitoring of cyber-physical systems with machine learning components
Pratama et al. Evolving fuzzy rule-based classifier based on GENEFIS
CN114898219B (en) SVM-based manipulator touch data representation and identification method
Wadekar et al. Hybrid CAE-VAE for unsupervised anomaly detection in log file systems
JP2022102095A (en) Information processing device, information processing method, and information processing program
LU100902B1 (en) System including a device and a learning algorithm to perform a confidence measure-based classification or regression problem
KR101676101B1 (en) A Hybrid Method based on Dynamic Compensatory Fuzzy Neural Network Algorithm for Face Recognition
Takhirov et al. Energy-efficient adaptive classifier design for mobile systems
AlKhuraym et al. Arabic sign language recognition using lightweight cnn-based architecture
Banerjee et al. Relation extraction using multi-encoder lstm network on a distant supervised dataset
EP3764284A1 (en) Adapting a base classifier to novel classes
US20240037335A1 (en) Methods, systems, and media for bi-modal generation of natural languages and neural architectures
KR20230069010A (en) Apparatus and method for performing statistical-based regularization of deep neural network training
KR20200075712A (en) Anomaly detection apparatus using artificial neural network
Garay-Maestre et al. Data augmentation via variational auto-encoders
CN107229944B (en) Semi-supervised active identification method based on cognitive information particles
Shinde et al. Mining classification rules from fuzzy min-max neural network
KR102119891B1 (en) Anomaly detection apparatus using artificial neural network

Legal Events

Date Code Title Description
FG Patent granted

Effective date: 20200217