GB2597352A - Method and system for efficient neural network training - Google Patents

Method and system for efficient neural network training Download PDF

Info

Publication number
GB2597352A
GB2597352A GB2106984.4A GB202106984A GB2597352A GB 2597352 A GB2597352 A GB 2597352A GB 202106984 A GB202106984 A GB 202106984A GB 2597352 A GB2597352 A GB 2597352A
Authority
GB
United Kingdom
Prior art keywords
data item
abduced
labels
label
input data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2106984.4A
Other versions
GB202106984D0 (en
Inventor
Hospedales Timothy
Tsamoura Efthymia
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Publication of GB202106984D0 publication Critical patent/GB202106984D0/en
Publication of GB2597352A publication Critical patent/GB2597352A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • G06N5/041Abduction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

Training method for a machine learning, ML, model based on a neural-symbolic framework that is a hybrid of a symbolic reasoning algorithm and artificial neural networks, providing a method of training multiple artificial neural networks in a ML which uses abductive reasoning to improve training data for a neural network by abducting possible correct intermediate neuron outputs (i.e. intermediate labels). The method comprising: receiving training data, i.e. input data and a correct output label, and a corresponding set of logical rules; inputting the label and the set of logical rules to a logic module of the ML model; using abduction at the logic module to compute a set of possible abducted intermediate labels for the input data item of the pair; and, training, iteratively, neural network modules of the ML model by: inputting the training data to neural models; outputting intermediate labels; comparing the output intermediate labels to the abducted intermediate labels and determining how well they match with other; using backpropagation to maximise the match between the abducted intermediate labels and the intermediate labels. The training data may be an image and a final label.

Description

Method and System for Efficient Neural Network Training
Field
[001] The present application generally relates to a method and system for training neural networks, and in particular to a computer-implemented method for training a machine learning, ML, model using a neural-symbolic framework.
Background
[2] Deep learning systems or deep neural networks have been very useful for certain tasks, such as image recognition and machine translation. However, deep neural networks have some limitations, and for many complex applications, these deep learning approaches may not be suitable. For example, deep learning systems require lots of data (i.e. large training sets) to be trained to a required level of accuracy. Deep learning systems may lack strong generalisation and so may transfer poorly to data generated from different distributions. Deep learning systems may not be able to discover new regularities and may not be able to extrapolate beyond the training sets, and can only interpolate and approximate based on what is already know. Deep learning systems may not be able to systemize, may be difficult to interpret, and may provide little to no guarantees on the quality of the returned outputs.
[3] Many of the disadvantages of deep neural networks are advantages of neuro-symbolic systems. Neuro-symbolic artificial intelligence (Al) is a combination of two existing approaches to building Al: symbolic Al and the neural networks or deep learning neural networks mentioned above. Symbolic Al is based on humans' ability to understand the world around them by forming internal symbolic representations -rules are created for dealing with these concepts, and the rules can be formalised in a way that captures everyday knowledge. In contrast, neural networks are data-driven rather than rule-based. Neuro-symbolic Al brings together these two approaches to combine both learning and logic. Neural networks may make symbolic Al systems smarter, while symbolic Al may incorporate common sense reasoning and domain knowledge into deep learning.
[4] Good neural-symbolic frameworks may be compositional (i.e. it should be possible to plug any logic and any network of interest into the framework), may be explainable (i.e. it should be possible to know why a particular output is returned), may have clean semantics (i.e. it should be possible to know the semantics of the outputs), and may have convergence guarantees (i.e. the framework should converge assuming that individual components satisfy certain convergence criteria).
[5] However, current neural-symbolic systems suffer from expensive inference and training algorithms (e.g. reasoning over all possible interpretations), unclear semantics, low expressivity in terms of the underlying logic (e.g. no neural-symbolic system supports a temporal or an action logic), and lack of convergence guarantees.
[6] The present applicant has recognised the need for a more efficient technique for training machine learning, ML, models that are based on a neural-symbolic framework.
Summary
[7] In a first approach of the present techniques, there is provided a computer-implemented method for training a machine learning, ML, model, the method comprising: receiving a set of training data, the training data comprising: a plurality of data item pairs, each data item pair comprising an input data item and a final label for the input data item, and a set of logical rules corresponding to the training data; inputting, into a logic module of the ML model, the final label for the input data item of a data item pair and the set of logical rules; computing, using abduction performed by the logic module, a set of possible abduced intermediate labels for the input data item of the data item pair; and training, using an iterative process, a plurality of trainable neural modules of the ML model by: inputting, into the plurality of trainable neural modules of the ML model, the input data item of the data item pair; outputting, using the neural modules, an intermediate label for the input data item; comparing the intermediate label output by the neural modules with the set of possible abduced intermediate labels computed using the logic module and determining how well the intermediate label matches any of the possible abduced intermediate labels; and updating the plurality of trainable neural modules, using backpropagafion, to maximise a likelihood of a match between the intermediate label output by the neural modules and at least one intermediate label in the set of possible abduced intermediate labels computed by the logic module.
[8] In a second approach of the present techniques, there is provided a system for implementing a machine learning, ML, model, comprising: a server for training a machine learning, ML, model, the server comprising at least one processor coupled to memory, for: receiving a set of training data, the training data comprising: a plurality of data item pairs, each data item pair comprising an input data item and a final label for the input data item, and a set of logical rules corresponding to the training data; inputting, into a logic module of the ML model, the final label for the input data item of a data item pair and the set of logical rules; computing, using abduction performed by the logic module, a set of possible abduced intermediate labels for the input data item of the data item pair; and training, using an iterative process, a plurality of trainable neural modules of the ML model by: inputting, into the plurality of trainable neural modules of the ML model, the input data item of the data item pair; outputting, using the neural modules, an intermediate label for the input data item; comparing the intermediate label output by the neural modules with the set of possible abduced intermediate labels computed using the logic module and determining how well the intermediate label matches any of the possible intermediate labels; and updating the plurality of trainable neural modules, using backpropagation, to maximise a likelihood of a match between the intermediate label output by the neural modules and at least one intermediate label in the set of possible abduced intermediate labels computed by the logic module.
[009] The system may further comprise at least one user electronic device for implementing a ML model, wherein after the training is complete and a trained ML model is obtained, the server provides the trained ML model to the at least one user electronic device for use.
[010] Preferred features are set out below and apply equally to the first and second approaches [11] In some cases, the step of computing a set of possible abduced intermediate labels (using abduction performed by the logic module), may comprise computing a set of all possible abduced intermediate labels for the input data item of the data item pair. In these cases, the computation of the set of all possible abduced intermediate labels may happen only once, and the computed set is used in every iteration of the training process.
[12] In other cases, the step of computing a set of possible abduced intermediate labels (using abduction performed by the logic module), may comprise computing a set of possible abduced intermediate labels for the input data item of the data item pair. Then, during each iteration of the training process, the updating of the neural modules may comprise recomputing the possible abduced intermediate labels based on the current intermediate label output by the neural module. That is, the updating step of each iteration of the training process may comprise: providing, to the logic module, the intermediate label output by the neural modules for the input data item during a current iteration; and receiving, from the logic module, a sub-set of possible abduced intermediate labels which are closest to the intermediate label output by the neural modules, wherein the sub-set of possible abduced intermediate labels is used by the neural modules in the subsequent iteration of training. In these cases, the computation of possible abduced intermediate labels is also revised during each iteration, and the revised computation is used in the subsequent iteration (to define the loss function for learning). The sub-set of possible abduced intermediate labels may be those which are determined to be within a predefined measure of closeness to the current intermediate label output by the neural modules. For example, the sub-set of possible abduced intermediate labels may comprise the top-K closest possible valid intermediate labels to the current neural modules prediction.
[13] The method of training the ML model may further comprise outputting, using the neural modules, a final intermediate label for the input data item of the data item pair, when training is complete. In this way, each input data item of the data item pair is provided with a final intermediate label. Thus, the training method enables unlabelled input data items to be labelled. As mentioned above, this is achieved by using a logic module (that uses abduction) to compute possible abduced intermediate labels that would lead to the known final label for the data item, using a neural module to guess one possible intermediate label for the data item, and training the neural module to make its guess match the computed possible abduced intermediate labels. This allows a symbolic module to be integrated with a neural module.
[14] The set of logical rules may comprise a set of integrity constraints. Logic-based abduction may require integrity constraints, which are logic formulas that need to be respected by the abductive reasoning (i.e. the abductive reasoning which the logic module of the ML model uses to compute the set (or sub-set) of possible abduced intermediate labels for an input data item, given the known final label associated with the data item). For example, where the input data item is an image that shows an arithmetical formula, the integrity constraints may reflect algebraic or mathematical axioms (such as x+y = y+x, for example). In another example, where the input data item is an image that shows a chess board or part of a chess board, the integrity constraints may comprise the fact that two pieces may not occupy the same square on the chess board at the same time, the fact that there is at most one black king on the chessboard, and so on. The integrity constraints may help to reduce the number of possible abduced intermediate labels in the set of possible abduced intermediate labels computed by the logic module (by filtering the initial guesses of the abduced intermediate labels to remove those that do not satisfy the integrity constraints). Thus, computing a set of possible abduced intermediate labels for the input data item may comprise: identifying a plurality of possible abduced intermediate labels defining how to obtain the final label from the input data item; and filtering the plurality of possible abduced intermediate labels by retaining the possible abduced intermediate labels that satisfy the set of integrity constraints.
[15] In the example above, essentially a list of intermediate labels is first generated and then some elements of the list are removed based on the integrity constraints. However, for faster and more efficient implementation, it may be preferably to simply compute a set of abduced intermediate labels that meet both criteria in the first place. This is because the list of intermediate labels may be very long, and possible even longer than can be stored in the computer memory. Therefore, additionally or alternatively, the integrity constraints may help to reduce the number of possible abduced intermediate labels computed by the logic module by requiring that it only returns the set of labels that both respect the integrity constraints, and also lead to the desired final label/output of reasoning. Thus, computing a set of possible abduced intermediate labels for the input data item may comprise: computing a plurality of possible abduced intermediate labels defining how to obtain the final label from the input data item which satisfy the set of integrity constraints.
[16] The method may also comprise receiving a set of input-specific constraints corresponding to the input data items of the training data set. The input-specific constraints may be specific constraints relating to what the input data items represent. For example, if the input data items are known to be images representing a chessboard or parts of a chessboard, then the input-specific constraints may include input-specific knowledge or constraints about what the images show. In an example, the input-specific knowledge may be "on this specific board, squares ((3,4),(7,6)... etc) are empty". The image of the chessboard may be obtained by an image capture device, and the input-specific knowledge may be gathered by using a depth sensor, for example, which is able to detect which squares of the chess board are occupied or empty, but which cannot differentiate between the chess pieces themselves. In this case, the neural modules may recognise the pieces but uses information from the depth sensor as 'side-feedback'. This 'side-feedback' or input-specific knowledge may be used to restrict the abductive proof for that input, which thereby better focuses the training of the neural module. In other words, the input-specific constraints may help to reduce the number of possible abduced intermediate labels corresponding to a final label for an input image (by removing those that do not satisfy the input-specific constraints). Thus, if such input-specific constraints exist, computing a set of possible abduced intermediate labels for the input data item may comprise: identifying a plurality of possible abduced intermediate labels defining how to obtain the final label from the input data item; and filtering the plurality of possible abduced intermediate labels by retaining the possible abduced intermediate labels that satisfy the set of input-specific constraints. As for the integrity constraints, the computation may be more efficient and faster if the computing of the set of possible abduced intermediate labels comprises computing a plurality of possible abduced intermediate labels defining how to obtain the final label from the input data item that satisfy the set of input-specific constraints.
[17] When training neural networks using abduction, it is necessary to determine a loss function that shows how close the intermediate label output by the neural modules is to the set of possible abduced intermediate labels computed by the logic module. The final loss may be a "point-to-set" distance, i.e. the loss measures the similarity of the intermediate label output to the set of abduced intermediate labels. The present techniques use weighted model counting to measure this closeness. Thus, determining how well the intermediate label output by the neural modules matches the possible abduced intermediate labels comprises: measuring a closeness between the intermediate label output by the neural modules and each possible abduced intermediate label computed by the logic module; assigning a weight to each possible abduced intermediate label based on the measured closeness; and determining a loss function for training the neural modules using weighted model counting.
[18] The weighted model counting may be used to train the ML model. Specifically, during each iteration, updating the plurality of trainable neural modules may comprise: adjusting weights assigned to each node of the neural modules to minimise the loss function (defined by a negative log weighted model count) Minimising the loss function is equivalent to maximising the weighted model counting.
[019] With respect to the training data, the input data item of each data item pair may be of any data type, and thus the present techniques may be used to analyse data of any type. In one particular example, each data item pair may comprise an input image and a final label for the input image. In this case, the trained ML model may be used to perform image analysis, such as image recognition.
[020] Each data item pair may comprise sensor data from, for example, a user's wearable device. The sensor data may be accelerometer data, heart rate data, sleep data, and so on. The sensor data may be processed by a trained ML model to for example understand the user's activities. For instance, the sensor data may comprise a sequence of acceleration readings, and the ML model may predict that the user is walking or jogging.
[21] Another example use is to perform network troubleshooting. In this case, the training would comprise learning to detect different (intermediate level) network events such as network congestion or a misconfigured router without annotation, but using some higher level rules that explain how these events produce observed higher level events such as user packet delays.
[22] In a third approach of the present techniques, there is provided an apparatus for performing image recognition using a trained machine learning, ML, model, the ML model comprising a logic module and a plurality of neural modules, the apparatus comprising: at least one interface for receiving an image and a final label for the image; storage storing a trained ML model trained using any of the methods described herein; and at least one processor coupled to memory and arranged to identify an intermediate label for each object in the received image, by inputting the received image and final label into the trained ML model.
[023] The at least one interface may receive at least one new logical rule, input by a user of the apparatus, for example. In this case, the at least one processor may: store the received at least one new logical rule; and input the at least one new logical rule into the logic module of the ML model. Thus, the next time the ML model is used for image recognition, the new logical rule is used by the logic module for the abduction step. Therefore, the behaviour of the trained ML model can be changed or extended by a user after training, by changing or adding to the set of logical rules used by the logic module. For example, a user may input a new object category to be recognised during image recognition, by defining a new rule. This is advantageous because part of the inference/recognition process is performed by the logical module, which is configured by a set of input rules. Furthermore, since adding a new rule is a lightweight change, the process can be done on-device rather than by a server, and no new training data or backpropagation is required, and no retraining is required.
[24] In a related approach of the present techniques, there is provided a non-transitory data carrier carrying processor control code to implement the methods described herein.
[25] As will be appreciated by one skilled in the art, the present techniques may be embodied as a system, method or computer program product. Accordingly, present techniques may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects.
[26] Furthermore, the present techniques may take the form of a computer program product embodied in a computer readable medium having computer readable program code embodied thereon. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
[27] Computer program code for carrying out operations of the present techniques may be written in any combination of one or more programming languages, including object oriented programming languages and conventional procedural programming languages. Code components may be embodied as procedures, methods or the like, and may comprise subcomponents which may take the form of instructions or sequences of instructions at any of the levels of abstraction, from the direct machine instructions of a native instruction set to high-level compiled or interpreted language constructs.
[28] Embodiments of the present techniques also provide a non-transitory data carrier carrying code which, when implemented on a processor, causes the processor to carry out any of the methods described herein.
[029] The techniques further provide processor control code to implement the above- described methods, for example on a general purpose computer system or on a digital signal processor (DSP). The techniques also provide a carrier carrying processor control code to, when running, implement any of the above methods, in particular on a non-transitory data carrier. The code may be provided on a carrier such as a disk, a microprocessor, CD-or DVD-ROM, programmed memory such as non-volatile memory (e.g. Flash) or read-only memory (firmware), or on a data carrier such as an optical or electrical signal carrier. Code (and/or data) to implement embodiments of the techniques described herein may comprise source, object or executable code in a conventional programming language (interpreted or compiled) such as Python, C, or assembly code, code for setting up or controlling an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array), or code for a hardware description language such as Verilog (RTM) or VHDL (Very high speed integrated circuit Hardware Description Language). As the skilled person will appreciate, such code and/or data may be distributed between a plurality of coupled components in communication with one another. The techniques may comprise a controller which includes a microprocessor, working memory and program memory coupled to one or more of the components of the system.
[030] It will also be clear to one of skill in the art that all or part of a logical method according to embodiments of the present techniques may suitably be embodied in a logic apparatus comprising logic elements to perform the steps of the above-described methods, and that such logic elements may comprise components such as logic gates in, for example a programmable logic array or application-specific integrated circuit. Such a logic arrangement may further be embodied in enabling elements for temporarily or permanently establishing logic structures in such an array or circuit using, for example, a virtual hardware descriptor language, which may be stored and transmitted using fixed or transmittable carrier media.
[031] In an embodiment, the present techniques may be realised in the form of a data carrier having functional data thereon, said functional data comprising functional computer data structures to, when loaded into a computer system or network and operated upon thereby, enable said computer system to perform all the steps of the above-described method.
[032] The method described above may be wholly or partly performed on an apparatus, i.e. an electronic device, using a machine learning or artificial intelligence model. The model may be processed by an artificial intelligence-dedicated processor designed in a hardware structure specified for artificial intelligence model processing. The artificial intelligence model may be obtained by training. Here, "obtained by training" means that a predefined operation rule or artificial intelligence model configured to perform a desired feature (or purpose) is obtained by training a basic artificial intelligence model with multiple pieces of training data by a training algorithm. The artificial intelligence model may include a plurality of neural network layers. Each of the plurality of neural network layers includes a plurality of weight values and performs neural network computation by computation between a result of computation by a previous layer and the plurality of weight values.
[033] As mentioned above, the present techniques may be implemented using an Al model. A function associated with Al may be performed through the non-volatile memory, the volatile memory, and the processor. The processor may include one or a plurality of processors. At this time, one or a plurality of processors may be a general purpose processor, such as a central processing unit (CPU), an application processor (AP), or the like, a graphics-only processing unit such as a graphics processing unit (GPU), a visual processing unit (VPU), and/or an Al-dedicated processor such as a neural processing unit (NPU). The one or a plurality of processors control the processing of the input data in accordance with a predefined operating rule or artificial intelligence (Al) model stored in the non-volatile memory and the volatile memory. The predefined operating rule or artificial intelligence model is provided through training or learning. Here, being provided through learning means that, by applying a learning algorithm to a plurality of learning data, a predefined operating rule or Al model of a desired characteristic is made. The learning may be performed in a device itself in which Al according to an embodiment is performed, and/o may be implemented through a separate server/system.
[034] The Al model may consist of a plurality of neural network layers. Each layer has a plurality of weight values, and performs a layer operation through calculation of a previous layer and an operation of a plurality of weights. Examples of neural networks include, but are not limited to, convolutional neural network (CNN), deep neural network (DNN), recurrent neural network (RNN), restricted Boltzmann Machine (RBM), deep belief network (DBN), bidirectional recurrent deep neural network (BRDNN), generative adversarial networks (GAN), and deep 0-networks.
[035] The learning algorithm is a method for training a predetermined target device (for example, a robot) using a plurality of learning data to cause, allow, or control the target device to make a determination or prediction. Examples of learning algorithms include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning.
Brief description of drawings
[36] Implementations of the present techniques will now be described, by way of example only, with reference to the accompanying drawings, in which: [37] Figure 1 is a schematic diagram of an example framework for inference in a neurosymbolic machine learning, ML, model; [38] Figure 2 shows tables illustrating weighted model counting; [39] Figure 3 shows a schematic diagram of a technique to train a neurosymbolic ML model; [40] Figure 4A shows a schematic diagram of a first process to train a neurosymbolic ML model comprising three main stages; [41] Figure 4B shows a schematic diagram of a second process to train a neurosymbolic ML model comprising three main stages; [42] Figure 5 shows an example application of the training method of the present techniques; [43] Figure 6 is a schematic diagram showing the difference between a neurosymbolic ML model and a conventional neural network; [044] Figure 7 is a schematic diagram showing the difference between training data used to train a neurosymbolic ML model and a conventional neural network; [045] Figure 8 shows a flowchart of example steps to train a neurosymbolic ML model; and [046] Figure 9 shows a schematic diagram of a system for implementing a ML model.
Detailed description of drawings
[47] Broadly speaking, the present techniques relate to methods and systems for training neural networks, and in particular to a more efficient computer-implemented method for training a machine learning, ML, model that is based on a neural-symbolic framework.
[48] The goal of the present techniques is to develop a compositional neural-symbolic framework. To do so, the present techniques treat different components as black boxes. A compositional neural-symbolic framework may have several advantages: it may plug in any logic theory of interest; use native inference and training techniques for each component, thus controlling the learning and inference cost; offer clean semantics and hard convergence guarantees; and easily plug in new techniques, such as implicit learning and logical coaching.
[049] A pipeline may consist of a set of trainable neural components and a fixed logical component consuming the neural outputs. The neural components consist of known On terms of semantics) outputs. The training tuples are of the form: input to the neural components, desired output of the pipeline. The objective of the present techniques is to train the neural component using the input tuples so as the pipeline outputs the correct results when provided with future inputs to the neural components.
[50] A simple example of the framework is shown in Figure 1. Given a sequence of images showing a mathematical operation, the goal is to output the result of this operation. In other words, the objective is to train the neural networks so that the output of the pipeline is 8 given the input images 3 + 5, i.e. calc([3], [+], [5], ?Z=8).
[51] The training works by computing a formula representing what the networks should output in order to get the desired output after reasoning, and then using the computed formula to train the neural networks. In the example shown in Figure 1, the logical component outputs 8 when the neural network outputs satisfy the following formula: digit([3], 3) A oper([+], +) A digit([5], 5) v v digit([3], 8) A oper([+], X) A cligit([5], 1) This formula is then used to train the neural network.
[52] Computing a formula is an operation known in logic as abduction. Abduction is defined as follows. Given * a set of rules P * a set of abducible predicates A -data that is given as part of the input * a set of integrity constraints IC, and * a user query Q find a formula a over of facts over abducible predicates A, such that: * PU,AQ * P U41, IC [53] When training neural networks using formulae, the loss function must show how close -semantically -the outputs of the neural networks are to the formula found via abduction. To do so, weighted model counting may be used to measure how close the neural network outputs are to the set of abductive formulas.
[54] Consider a propositional formula 0, where each variable X in 0 is associated with a weight w(X) in [0,1]. A satisfying assignment a of 0 is a mapping of the variables in 0 to T or 1, that makes 0 true. The weight of a satisfying assignment a is defined as: "(x)x 1_,,"(x) xeolx=T xeolx=, The weighted model count of q is the sum of the weights of all satisfying assignments of 0.
[55] Figure 2 shows tables illustrating weighted model counting.
[56] Figure 3 shows a schematic diagram of a technique to train a neural network using formulae. Consider the formula: digit([3], 3) A digit([5], 5) A oper([+], +) Assume that each input is fed to a different network. Associate each neural network output with a unique Boolean variable. The above formula becomes: Xi A Yi A Zi Set the weight of each neural network output as the weight of the corresponding Boolean variable. The loss is defined as the negative logarithm of the weighted model count of X1 A Yi A [57] The present techniques are compared with two existing techniques that use logic to improve neural network performance. DeepProbLog (Neural Probabilistic Logic Programming in DeepProbLog, NeurIPS 2018), reduces the problem of training the neural components to the problem of learning the parameters of a probabilistic logic program. This technique uses abduction only in the context of probabilistic logic programs, and does not support logical theories with time, integrity or domain constraints. ABL2 (Bridging Machine Learning and Logical Reasoning by Abductive Learning, NeurIPS 2019) uses abduction, but only subsets of the computed proofs to train the neural components.
[058] Table 1 shows the training time and accuracy for 3000 samples and 3 epochs for different benchmarks: Existing techniques (time) Existing techniques (accuracy) Present techniques (time) Present techniques (accuracy) ADD2x2 17m19s 89.70% 8m 93% APPLY2x2 6m48s 99% 3m58s 99% Operator2x2 178m29s 88.52% 12m45s 93% MATH(3) 23m26s 94% 7m30s 94.52% MATH(5) Timeout Timeout 16m47s 90.40% MEMBER(3) 34m23s 94.60% 3m29s 94.92% MEMBER(5) Timeout Timeout 7m5s 96% PATH (6x6) Timeout Timeout 47m30s 97.51% It can be seen that the present techniques produce higher accuracy training in shorter training times than existing techniques.
[059] The present techniques may be used for machine learning applications, to enable learning using fewer and/or unlabelled data.
[60] The present techniques may be used for 5G networks as background knowledge and neural components could be used to improve anomaly detection, improve the system for resolving tickets, and/or change a network configuration on the fly to reduce latency.
[61] The present techniques fulfil the objectives above because the proposed pipeline is compositional (can plug in any net and any logic, e.g. non-monotonic, probabilistic, action, etc., of interest), has clean semantics (as it employs the semantics of the underlying logic), and is explainable (due to the logical component).
Figure 4A shows a schematic diagram of a first process to train a ML model comprising three main stages. Figure 4B shows a schematic diagram of a second process to train a neurosymbolic ML model comprising three main stages.
[62] In both the first and second processes, the training is performed using a set of training data. The training data comprises a plurality of data item pairs, each data item pair comprising an input image and a final label for the input image. For each data item pair, there may be a single image (e.g. a chessboard) corresponding to the final label, or there may be multiple images (e.g. images of a "3", a "+" and a "5", as per the example in Figure 1) corresponding to the final label. The training data also comprises a set of logical rules corresponding to the training data.
[063] As shown in Figures 4A and 4B, the first stage in the training process is that of logical abduction. (The first stage may be preceded by a data preparation step, not shown). In both the first and second processes, the first stage may be implemented by at least one logic module of a ML model. The logical abduction stage takes in, as input, the required final prediction, i.e. the known final labels for the input images. The logical abduction stage also takes in, as input, the set of logical rules. The logical abduction stage outputs a set of possible abduced intermediate labels for each input image. The intermediate labels are abduced for each input image using the final label corresponding to the input image, and the set of logical rules.
[064] The logical abduction stage varies slightly between the first process shown in Figure 4A and the second process shown in Figure 4B. The difference is explained below after the other stages have been explained.
[65] The set of logical rules may comprise a set of integrity constraints. Logic-based abduction may require integrity constraints, which are logic formulas that need to be respected by the abductive reasoning (i.e. the abducfive reasoning which the logic module of the ML model uses to compute the set (or sub-set) of possible abduced intermediate labels for an input image, given the known final label associated with the image). For example, where the input image may show an arithmetical formula, the integrity constraints may reflect algebraic or mathematical axioms (such as x+y = y+x, for example). In another example, where the input image shows a chess board or part of a chess board, the integrity constraints may comprise the fact that two pieces may not occupy the same square on the chess board at the same time, the fact that there is at most one black king on the chessboard, and so on. More generally, the integrity constraints may be constraints that the neural module is trained to learn to respect. This is achieved by ensuring that all abduced intermediate labels used to train the network respect these constraints.
[66] The integrity constraints may help to reduce the number of possible abduced intermediate labels in the set of possible abduced intermediate labels computed by the logic module (by filtering the initial guesses of the abduced intermediate labels to remove those that do not satisfy the integrity constraints). Thus, computing a set of possible abduced intermediate labels for the input image may comprise: identifying a plurality of possible abduced intermediate labels defining how to obtain the final label from the input image; and filtering the plurality of possible abduced intermediate labels by retaining the possible abduced intermediate labels that satisfy the set of integrity constraints.
[67] In the example above, essentially a list of intermediate labels is first generated and then some elements of the list are removed based on the integrity constraints. However, for faster and more efficient implementation, it may be preferably to simply compute a set of abduced intermediate labels that meet both criteria in the first place. This is because the list of intermediate labels may be very long, and possible even longer than can be stored in the computer memory. Therefore, additionally or alternatively, the integrity constraints may help to reduce the number of possible abduced intermediate labels computed by the logic module by requiring that it only returns the set of labels that both respect the integrity constraints, and also lead to the desired final label/output of reasoning. Thus, computing a set of possible abduced intermediate labels for the input image may comprise: computing a plurality of possible abduced intermediate labels defining how to obtain the final label from the input image which satisfy the set of integrity constraints.
[68] The first stage may also receive a set of input-specific constraints corresponding to the input images of the training data set. The input-specific constraints may be specific constraints relating to what the input images represent. For example, if the input images are known to represent a chessboard or parts of a chessboard, then the input-specific constraints may include input-specific knowledge or constraints about what the images show. In an example, the input-specific knowledge may be "on this specific board, squares ((3,4),(7,6)... etc) are empty". The image of the chessboard may be obtained by an image capture device, and the input-specific knowledge may be gathered by using a depth sensor, for example, which is able to detect which squares of the chess board are occupied or empty, but which cannot differentiate between the chess pieces themselves. In this case, the neural modules may recognise the pieces but uses information from the depth sensor as 'side-feedback'. This 'side-feedback' or input-specific knowledge may be used to restrict the abductive proof for that input, which thereby better focuses the training of the neural module. In other words, the input-specific constraints may help to reduce the number of possible abduced intermediate labels corresponding to a final label for an input image (by removing those that do not satisfy the input-specific constraints). Thus, if such input-specific constraints exist, computing a set of possible abduced intermediate labels for the input image may comprise: identifying a plurality of possible abduced intermediate labels defining how to obtain the final label from the input image; and filtering the plurality of possible abduced intermediate labels by retaining the possible abduced intermediate labels that satisfy the set of input-specific constraints. As for the integrity constraints, the computation may be more efficient and faster if the computing of the set of possible abduced intermediate labels comprises computing a plurality of possible abduced intermediate labels defining how to obtain the final label from the input image that satisfy the set of input-specific constraints.
[69] As shown in Figures 4A and 4B, the second stage in each process is that of neural inference. This may be implemented by a plurality of neural modules of the ML model. The second stage may comprise: inputting, into the plurality of trainable neural modules of the ML model, the input image of the data item pair; and outputting, using the neural modules, an intermediate label for the input image.
[70] As shown in Figures 4A and 4B, the third stage in each process is that of neural induction. This may be implemented by the plurality of neural modules of the ML model. The third stage may comprise: comparing the intermediate label output by the neural modules with the set of possible abduced intermediate labels computed using the logic module and evaluating the quality of the match using weighted model counting (i.e. determining how well the intermediate label output matches the set of abduced intermediate labels). The third stage may also comprise: updating the plurality of trainable neural modules, using backpropagation, to maximise a likelihood of a match between the intermediate label output by the neural modules and at least one intermediate label in the set of possible abduced intermediate labels computed by the logic module.
[071] The difference between the first process (Figure 4A) and the second process (Figure 4B) is whether the process to train the neural network comprises using a fixed set of possible abduced intermediate labels during each iteration of the training process, or a varying set of possible abduced intermediate labels.
[072] In Figure 4A, a fixed set of possible abduced intermediate labels is used. Here, computing, using abduction performed by the logic module, may comprise computing the set of all possible abduced intermediate labels for the input image of the data item pair. Thus, the computation of the set of all possible abduced intermediate labels may happen only once, and the computed set is used in every iteration of the training process.
[73] In Figure 4B, a varying set of possible abduced intermediate labels is used. Here, computing, using abduction performed by the logic module, may comprise computing a set of possible abduced intermediate labels for the input image of the data item pair. Furthermore, during each iteration of the training process, the updating of the neural modules may comprise recomputing the possible abduced intermediate labels based on the current intermediate label output by the neural module. This is shown in Figure 4B by the arrow from step C to step A. That is, the updating step of each iteration of the training process may comprise: providing, to the logic module, the intermediate label output for the input image during a current iteration; and computing, using the logic module, a sub-set of possible abduced intermediate labels which are closest to the intermediate label output by the neural modules, wherein the sub-set of possible abduced intermediate labels is used by the neural modules in the subsequent iteration. Thus, the computation of possible abduced intermediate labels is also revised during each iteration, and the revised computation is used in the subsequent iteration. The sub-set of possible abduced intermediate labels may be those which are determined to be within a predefined measure of closeness to the current intermediate label output by the neural modules. For example, the sub-set of possible abduced intermediate labels may comprise the top-K possible intermediate labels closest the current neural module's prediction.
[74] Figure 5 shows an example application of the training method of the present techniques. In this example, the training data set comprises input-output data pairs relating to chess. The application of the reasoning process is shown on the left-hand side of the Figure, the training data is shown in the middle of the Figure, and the neural-module learning process is shown on the right-hand side of the Figure. The input of each input-output data pair is an unlabelled image of a section (e.g. 3x3 section) of a chessboard. The output of each input-output data pair is a label corresponding to a game result or state shown by the section of the chessboard shown in the input. For example, the training data includes one input-output data pair in which a 3x3 section of a chessboard corresponds to the output label "mate", and another input-output data pair in which a 3x3 section of a chessboard corresponds to the output label "draw". There may be other output labels such as "win", for example.
[75] Training the model to understand the state of a chessboard may require receiving a set of integrity constraints and/or a set of input-specific constraints corresponding to the input data items of the training data set. For example, if the input data items are known to represent a chessboard or parts of a chessboard, then the logical rules may include input-specific knowledge or constraints about chess (e.g. what the different pieces are called and what moves they can make, what it means to 'win', 'lose' and 'draw', etc.) The integrity constraints may specify constraints about valid chess boards (e.g., there is at most one piece on a given square, there is at most one king of each colour on the board). The input-specific constraints may be specific constraints relating to what the input data items represent. The input specific constraints may contain knowledge about which squares are empty in the current chessboard input. These integrity constraints and 'side-feedback' or input-specific knowledge may be used to restrict the abductive proof for that input, which thereby better focuses the training of the neural module. In other words, the integrity constraints and input-specific constraints may help to reduce the number of possible abduced intermediate labels for each input image. Thus, if such integrity and/or input-specific constraints exist, computing a set of possible abduced intermediate labels may comprise: identifying a plurality of possible abduced intermediate labels defining how to obtain the final label from the input image; and filtering the plurality of possible abduced intermediate labels by retaining the possible abduced intermediate labels that satisfy the set of input-specific constraints and/or integrity constraints. However, as explained above, for computational efficiency and speed, it may be preferable to simply compute a plurality of possible abduced intermediate labels defining how to obtain the final label from the input image which satisfy the set of integrity constraints.
[76] In the chess example of Figure 5, the logic module may determine, using abduction, a formula f(x) defining how to obtain the label "mate" for an output data item, from the input data item (x, i.e. a 3x3 section of a chessboard).
[77] As shown in Figure 5, a loss function may be computed during the training process. When training neural networks, it is necessary to determine a loss function that shows how desirable the neural network predictions are. In this case the loss function measures how close the neural network's predicted intermediate labels are to the set of valid intermediate labels (i.e. those that lead to the desired output label, and also respect the integrity constraints and side-constraints). The present techniques use weighted model counting to measure how close to satisfactory are the neural network's predicted intermediate labels. Thus, determining how well the intermediate label output by the neural modules matches the set of possible abduced intermediate labels comprises: measuring a closeness between the intermediate label output by the neural modules and each possible abduced intermediate label computed by the logic module; assigning a weight to each possible abduced intermediate label based on the measured closeness; and determining a loss function for training the neural modules using weighted model counting.
[78] The weighted model count may be used to train the ML model. Specifically, during each iteration, updating the plurality of trainable neural modules may comprise: adjusting weights assigned to each node of the neural modules to minimise the loss function (defined by a negative log weighted model counting). Minimising the loss function is equivalent to maximising the weighted model count.
[79] Figure 6 is a schematic diagram showing the difference between a neurosymbolic ML model and a conventional neural network. Conventional neural networks are composed of multiple layers or modules of neurons. Here, two neural modules are shown for the sake of simplicity. The amount of data to train the neural modules of a conventional neural network depends roughly on the number of neurons or layers. The more layers there are, the more data is required to train the conventional neural network.
[80] In contrast, the present neurosymbolic ML model (also referred to herein as "NeuroLog"), replace some neural layers with logical modules that can be more easily manually specified and thus, do not need to be trained. Since there are fewer layers to train, less training data is required in total to train the neurosymbolic ML model of the present techniques. In Figure 1, the neurosymbolic model is shown as having one neural module, compared to two in the conventional neural network. The specific achievable difference in the training data requirement depends on the details of the specific problem the neurosymbolic model is trying to solve. A 10-fold reduction in training data is possible, for example.
[81] Figure 7 shows how the present techniques enable a ML model to be trained more easily, without requiring detailed annotation or labels for every category of object the ML model needs to recognise.
[82] As shown in Figure 7, conventional neural networks require training annotation about every category they need to recognise. For example, with respect to images of a street scene (which may be used by autonomous vehicles, traffic monitoring systems, and so on), conventional neural networks need to be trained to recognise people, bikes, cars, roads and other objects. Many images of each category of object the neural network needs to recognise are required. Furthermore, typically, each object is cropped out and labelled before being used as training data.
[83] In contrast, the neurosymbolic model of the present techniques enables higher-level labels to be used to train the model, in conjunction with manually specified background knowledge. This is advantageous because higher-level labels may be easier to use and may require fewer labels to be used. For example, images that are simply labelled "street scene" may be used to train the model to recognise individual objects. By using these images and the higher-level labels, and the prior logical knowledge about how objects behave or appear in street scenes, the neural module is trained to recognise objects. Thus, the neurosymbolic model of the present techniques is able to perform image/object recognition with respect to many different kinds of objects, without needing detailed annotation of all the different object types during the training process. This may make training the neurosymbolic model faster and more data efficient to train.
[84] Furthermore, conventional neural networks generalise poorly, especially in terms of extrapolation to different data compared to data seen during training. For example, the majority of car images show cars having four wheels. If such image data is used to train a neural network, then it will fail to recognise three wheel cars (such as a Reliant Robin) because the neural network has not seen three wheel cars before. As a result, the neural network may instead mislabel an image of a three wheel car as a tricycle.
[85] In contrast, the present techniques enable prior knowledge to be input into the logic module. For example, the prior knowledge may be that three wheels are sufficient evidence to conclude that an image shows a car. As a result, during testing and deployment, the neurosymbolic model will correctly recognise a three-wheeled car even if no image of a three-wheeled car was used during training.
[86] While both conventional neural networks and the neurosymbolic model of the present techniques can be trained for image recognition, only the neurosymbolic model can advantageously be modified after training. Specifically, the behaviour of the neurosymbolic model can be changed or extended by a user after training, by changing the logical knowledge base. For example, a user may add a new object category to be recognised during image recognition by defining a new rule. This is possible because part of the inference/recognition process is performed by the logical module, which is configured by a set of input rules. Furthermore, since adding a new rule is a lightweight change, it can be done on-device rather than on a server, and no new training data, backpropagation or retraining is required.
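A minimal sketch of such a user-extensible knowledge base follows; the rule encoding and attribute names are illustrative assumptions only, but it shows why appending a rule changes behaviour without new data, backpropagation or retraining:

```python
# Hypothetical rule base: each rule maps a set of observed attributes
# (produced by the neural modules) to an object category.
RULES = [
    ({"wheels>=3", "engine", "seats"}, "car"),   # covers e.g. a Reliant Robin
    ({"wheels==2", "pedals"}, "bicycle"),
]

def classify(attributes: set[str]) -> str | None:
    # The logic module fires the first rule whose body is satisfied
    # by the attributes detected in the image.
    for body, category in RULES:
        if body <= attributes:  # subset test: all conditions hold
            return category
    return None

# Extending the model after training is a lightweight, on-device change:
RULES.append(({"wheels==2", "engine"}, "motorbike"))
print(classify({"wheels==2", "engine"}))  # motorbike, with no retraining
```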
[87] This is impossible to achieve with conventional neural networks: the functionality of a conventional neural network is fixed after training, and cannot be changed or extended without collecting and annotating additional data and retraining the model on a server.
[88] Figure 8 shows a flowchart of example steps to train a neurosymbolic ML model. The method begins by receiving a set of training data, the training data comprising: a plurality of data item pairs, each data item pair comprising an input data item and a final label for the input data item, and a set of logical rules corresponding to the training data (step S100).
[89] The method comprises inputting, into a logic module of the ML model, the final label for the input data item of a data item pair and the set of logical rules (step S102). The method comprises computing, using abduction performed by the logic module, a set of possible abduced intermediate labels for the input data item of the data item pair (step S104).
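As a toy illustration of step S104 (the digit-addition rule below is an assumption for illustration, not a rule recited in the disclosure), abduction can be read as enumerating every intermediate-label assignment that the logical rules map to the observed final label:

```python
from itertools import product

# Toy sketch of step S104: if the (assumed) rule base states that the
# final label is the sum of two digit images, abduction enumerates every
# pair of intermediate digit labels consistent with that final label.
def abduce_intermediate_labels(final_label: int) -> list[tuple[int, int]]:
    return [(a, b) for a, b in product(range(10), repeat=2)
            if a + b == final_label]

print(abduce_intermediate_labels(3))  # [(0, 3), (1, 2), (2, 1), (3, 0)]
```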
[90] The method comprises training, using an iterative process, a plurality of trainable neural modules of the ML model. This is shown in Figure 8 by the arrow connecting step S112 and step S106.
[91] The iterative training process of the method comprises inputting, into the plurality of trainable neural modules of the ML model, the input data item of the data item pair (step S106).
[92] The iterative training process of the method further comprises: outputting, using the neural modules, an intermediate label for the input data item (step S108); comparing the intermediate label output by the neural modules with the set of possible abduced intermediate labels computed using the logic module and determining whether the intermediate label matches any of the possible abduced intermediate labels (step S110); and updating the plurality of trainable neural modules, using backpropagation, to maximise a likelihood of a match between the intermediate label output by the neural modules and at least one intermediate label in the set of possible abduced intermediate labels computed by the logic module (step S112).
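A minimal sketch of how steps S106 to S112 might be composed into one training iteration is given below. It reuses the illustrative neg_log_wmc_loss and abduce_intermediate_labels helpers sketched above; the model and optimiser interfaces are assumptions rather than the claimed implementation.

```python
import torch

def training_iteration(model: torch.nn.Module,
                       optimiser: torch.optim.Optimizer,
                       inputs: torch.Tensor,
                       final_label: int) -> float:
    # Step S104 (computed once per data item pair): abduce the set of
    # candidate intermediate-label assignments from the final label.
    abduced = [list(a) for a in abduce_intermediate_labels(final_label)]
    # Steps S106/S108: run the inputs through the neural modules to get
    # per-input log-probabilities over the intermediate labels.
    log_probs = torch.log_softmax(model(inputs), dim=-1)
    # Steps S110/S112: score the outputs against the abduced set and
    # backpropagate to maximise the likelihood of a match.
    loss = neg_log_wmc_loss(log_probs, abduced)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()
```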
[93] Figure 9 shows a schematic diagram of a system 100 for implementing a ML model.
The system 100 comprises a server 102 for training a machine learning, ML, model 110.
[94] The server 102 comprises at least one processor 104 coupled to memory 106. The at least one processor 104 may comprise one or more of: a microprocessor, a microcontroller, and an integrated circuit. The memory 106 may comprise volatile memory, such as random access memory (RAM), for use as temporary memory, and/or non-volatile memory such as Flash, read only memory (ROM), or electrically erasable programmable ROM (EEPROM), for storing data, programs, or instructions, for example.
[95] The at least one processor 104 may be arranged to receive a set of training data 108, the training data comprising: a plurality of data item pairs, each data item pair comprising an input data item 108a and a final label 108b for the input data item, and a set of logical rules 109 corresponding to the training data.
[96] The processor(s) 104 may be arranged to: input, into a logic module 110a of the ML model 110, the final label 108b for the input data item 108a of a data item pair and the set of logical rules 109; and compute, using abduction performed by the logic module 110a, a set of possible abduced intermediate labels for the input data item 108a of the data item pair.
The processor(s) 104 may be arranged to: input, into the plurality of trainable neural modules 110b of the ML model 110, the input data item 108a of the data item pair; and output, using the neural modules, an intermediate label for the input data item 108a.
[97] The neural modules 110b may: compare the intermediate label output by the neural modules with the set of possible abduced intermediate labels computed using the logic module 110a and determine whether the intermediate label matches any of the possible abduced intermediate labels; and update the plurality of trainable neural modules, using backpropagation, to maximise a likelihood of a match between the intermediate label output by the neural modules and at least one intermediate label in the set of possible abduced intermediate labels computed by the logic module 110a.
[98] The server 102 may comprise one or more interfaces (not shown) that enable the server 102 to receive inputs and/or provide outputs. For example, the server 102 may comprise an interface to receive the training data and any constraints (e.g. integrity constraints and/or input-specific constraints). The server 102 may comprise an interface to enable the trained ML model 110 to be provided to end user devices, such as user electronic device 112.
Although a single user electronic device 112 is shown in Figure 9, it will be understood that the server 102 may be able to communicate with many (e.g. tens, hundreds, thousands, or millions of) user electronic devices 112.
[99] The system 100 may further comprise at least one user electronic device 112 for implementing a ML model, wherein after the training is complete and a trained ML model is obtained, the server provides the trained ML model 114 to the at least one user electronic device for use. The electronic device 112 may then run the trained ML model 114 on-device.
[100] The user electronic device 112 may be any one of: a smartphone, tablet, laptop, computer or computing device, virtual assistant device, a vehicle, a drone, an autonomous vehicle, a robot or robotic device, a robotic assistant, image capture system or device, an augmented reality system or device, a virtual reality system or device, a gaming system, an Internet of Things device, or a smart consumer device (such as a smart fridge). It will be understood that this is a non-exhaustive and non-limiting list of example devices.
[101] Those skilled in the art will appreciate that while the foregoing has described what is considered to be the best mode and, where appropriate, other modes of performing the present techniques, the present techniques should not be limited to the specific configurations and methods disclosed in this description of the preferred embodiment. Those skilled in the art will recognise that the present techniques have a broad range of applications, and that the embodiments may take a wide range of modifications without departing from any inventive concept as defined in the appended claims.

Claims (22)

  1. A computer-implemented method for training a machine learning, ML, model, the method comprising: receiving a set of training data, the training data comprising: a plurality of data item pairs, each data item pair comprising an input data item and a final label for the input data item, and a set of logical rules corresponding to the training data; inputting, into a logic module of the ML model, the final label for the input data item of a data item pair and the set of logical rules; computing, using abduction performed by the logic module, a set of possible abduced intermediate labels for the input data item of the data item pair; and training, using an iterative process, a plurality of trainable neural modules of the ML model by: inputting, into the plurality of trainable neural modules of the ML model, the input data item of the data item pair; outputting, using the neural modules, an intermediate label for the input data item; comparing the intermediate label output by the neural modules with the set of possible abduced intermediate labels computed using the logic module and determining how well the intermediate label matches any of the possible abduced intermediate labels; and updating the plurality of trainable neural modules, using backpropagation, to maximise a likelihood of a match between the intermediate label output by the neural modules and at least one intermediate label in the set of possible abduced intermediate labels computed by the logic module.
  2. The method as claimed in claim 1 further comprising: outputting, using the neural modules, a final intermediate label for the input data item of the data item pair, when training is complete.
  3. The method as claimed in claim 1 or 2 wherein computing, using abduction performed by the logic module, comprises computing a set of all possible abduced intermediate labels for the input data item of the data item pair.
  4. The method as claimed in claim 1 or 2 wherein computing, using abduction performed by the logic module, comprises computing a set of possible abduced intermediate labels for the input data item of the data item pair; and wherein, during each iteration of the training process, the updating of the neural modules comprises: providing, to the logic module, the intermediate label output for the input data item during a current iteration; and receiving, from the logic module, a sub-set of possible abduced intermediate labels which are closest to the intermediate label output by the neural modules, wherein the sub-set of possible abduced intermediate labels is used by the neural modules in the subsequent iteration.
  5. The method as claimed in any of claims 1 to 4 wherein the set of logical rules comprises a set of integrity constraints and wherein computing a set of possible abduced intermediate labels for the input data item comprises: identifying a plurality of possible abduced intermediate labels defining how to obtain the final label from the input data item; and filtering the plurality of possible abduced intermediate labels by retaining the possible abduced intermediate labels that satisfy the set of integrity constraints.
  6. The method as claimed in any of claims 1 to 4 wherein the set of logical rules comprises a set of integrity constraints and wherein computing a set of possible abduced intermediate labels for the input data item comprises: computing a plurality of possible abduced intermediate labels defining how to obtain the final label from the input data item which satisfy the set of integrity constraints.
  7. The method as claimed in any of claims 1 to 6 further comprising: receiving a set of input-specific constraints corresponding to the input data items of the training data set; wherein computing a set of possible abduced intermediate labels for the input data item comprises: identifying a plurality of possible abduced intermediate labels defining how to obtain the final label from the input data item; and filtering the plurality of possible abduced intermediate labels by retaining the possible abduced intermediate labels that satisfy the set of input-specific constraints.
  8. The method as claimed in any of claims 1 to 6 further comprising: receiving a set of input-specific constraints corresponding to the input data items of the training data set; wherein computing a set of possible abduced intermediate labels for the input data item comprises: computing a plurality of possible abduced intermediate labels defining how to obtain the final label from the input data item that satisfy the set of input-specific constraints.
  9. The method as claimed in any preceding claim wherein determining how well the intermediate label output by the neural modules matches the possible abduced intermediate labels comprises: measuring a closeness between the intermediate label output by the neural modules and each possible abduced intermediate label computed by the logic module; assigning a weight to each possible abduced intermediate label based on the measured closeness; and determining a loss function for training the neural modules using weighted model counting.
  10. The method as claimed in claim 9 wherein, during each iteration, updating the plurality of trainable neural modules comprises: adjusting weights assigned to each node of the neural modules to minimise the loss function.
  11. The method as claimed in any preceding claim wherein each data item pair may comprise an input image and a final label for the input image.
  12. A non-transitory data carrier carrying code which, when implemented on a processor, causes the processor to carry out the method of any of claims 1 to 11.
  13. A system for implementing a machine learning, ML, model, comprising: a server for training a machine learning, ML, model, the server comprising at least one processor coupled to memory, for: receiving a set of training data, the training data comprising: a plurality of data item pairs, each data item pair comprising an input data item and a final label for the input data item, and a set of logical rules corresponding to the training data; inputting, into a logic module of the ML model, the final label for the input data item of a data item pair and the set of logical rules; computing, using abduction performed by the logic module, a set of possible abduced intermediate labels for the input data item of the data item pair; training, using an iterative process, a plurality of trainable neural modules of the ML model by: inputting, into the plurality of trainable neural modules of the ML model, the input data item of the data item pair; outputting, using the neural modules, an intermediate label for the input data item; comparing the intermediate label output by the neural modules with the set of possible abduced intermediate labels computed using the logic module and determining how well the intermediate label matches any of the possible abduced intermediate labels; and updating the plurality of trainable neural modules, using backpropagation, to maximise a likelihood of a match between the intermediate label output by the neural modules and at least one intermediate label in the set of possible abduced intermediate labels computed by the logic module.
  14. The system as claimed in claim 13, further comprising: at least one user electronic device for implementing a ML model, wherein after the training is complete and a trained ML model is obtained, the server provides the trained ML model to the at least one user electronic device for use.
  15. The system as claimed in claim 13 or 14 wherein computing, using abduction performed by the logic module, comprises computing a set of all possible abduced intermediate labels for the input data item of the data item pair.
  16. The system as claimed in any of claims 13 to 15 wherein computing, using abduction performed by the logic module, comprises computing a set of all possible abduced intermediate labels for the input data item of the data item pair; and wherein, during each iteration of the training process, the updating of the neural modules comprises: providing, to the logic module, the intermediate label output for the input data item during a current iteration; and receiving, from the logic module, a sub-set of possible abduced intermediate labels which are closest to the intermediate label output by the neural modules, wherein the sub-set of possible abduced intermediate labels is used by the neural modules in the subsequent iteration.
  17. The system as claimed in any of claims 13 to 16 wherein the set of logical rules comprises a set of integrity constraints and wherein computing a set of possible abduced intermediate labels for the input data item comprises: computing a plurality of possible abduced intermediate labels defining how to obtain the final label from the input data item that satisfy the set of integrity constraints.
  18. The system as claimed in any of claims 13 to 17 wherein the processor is further configured to: receive a set of input-specific constraints corresponding to the input data item of the training data set; wherein computing a set of possible abduced intermediate labels for the input data item comprises: computing a plurality of possible abduced intermediate labels defining how to obtain the final label from the input data item that satisfy the set of input-specific constraints.
  19. The system as claimed in any of claims 13 to 18 wherein determining how well the intermediate label output by the neural modules matches any of the possible abduced intermediate labels comprises: measuring a closeness between the intermediate label output by the neural modules and each possible abduced intermediate label computed by the logic module; assigning a weight to each possible abduced intermediate label based on the measured closeness; and determining a loss function for training the neural modules using weighted model counting.
  20. The system as claimed in claim 19 wherein, during each iteration, updating the plurality of trainable neural modules comprises: adjusting weights assigned to each node of the neural modules to minimise the loss function.
  21. An apparatus for performing image recognition using a trained machine learning, ML, model, the ML model comprising a logic module and a plurality of neural modules, the apparatus comprising: at least one interface for receiving an image and a final label for the image; storage storing a trained ML model trained using the method recited in any of claims 1 to 8; and at least one processor coupled to memory and arranged to identify an intermediate label for each object in the received image, by inputting the received image and final label into the trained ML model.
  22. The apparatus as claimed in claim 21 wherein the at least one interface receives at least one new logical rule, and wherein the at least one processor stores the received at least one new logical rule; and inputs the at least one new logical rule into the logic module of the ML model.
GB2106984.4A 2020-06-01 2021-05-17 Method and system for efficient neural network training Pending GB2597352A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GR20200100297 2020-06-01

Publications (2)

Publication Number Publication Date
GB202106984D0 GB202106984D0 (en) 2021-06-30
GB2597352A true GB2597352A (en) 2022-01-26

Family

ID=76550504

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2106984.4A Pending GB2597352A (en) 2020-06-01 2021-05-17 Method and system for efficient neural network training

Country Status (1)

Country Link
GB (1) GB2597352A (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None *

Also Published As

Publication number Publication date
GB202106984D0 (en) 2021-06-30

Similar Documents

Publication Publication Date Title
Zeng et al. Gans-based data augmentation for citrus disease severity detection using deep learning
Paredes-Vallés et al. Unsupervised learning of a hierarchical spiking neural network for optical flow estimation: From events to global motion perception
Kim et al. Optimizing deeper spiking neural networks for dynamic vision sensing
Chalupka et al. Visual causal feature learning
US20230095606A1 (en) Method for training classifier, and data processing method, system, and device
US20220215227A1 (en) Neural Architecture Search Method, Image Processing Method And Apparatus, And Storage Medium
CN110587606B (en) Open scene-oriented multi-robot autonomous collaborative search and rescue method
WO2022007867A1 (en) Method and device for constructing neural network
CN112580795A (en) Neural network acquisition method and related equipment
CN111797970B (en) Method and device for training neural network
JP2020191088A (en) Neural network with layer to solve semidefinite programming problem
Albu Measuring customer behavior with deep convolutional neural networks
Ilioudi et al. Deep learning for object detection and segmentation in videos: Toward an integration with domain knowledge
GB2597352A (en) Method and system for efficient neural network training
Bezliudnyi et al. Convolutional neural network model and software for classification of typical pests
Costa et al. Comparative study of neural networks techniques in the context of cooperative observations
Wang et al. Interpreting neural policies with disentangled tree representations
Saleem et al. Obstacle-avoidance algorithm using deep learning based on rgbd images and robot orientation
Popov et al. Recognition of Dynamic Targets using a Deep Convolutional Neural Network
Grzeszick et al. Optimistic and pessimistic neural networks for scene and object recognition
Cullinan Revisiting the society of mind: Convolutional neural networks via multi-agent systems
US20230359208A1 (en) Computer Architecture for Identification of Nonlinear Control Policies
Bouti et al. Traffic Sign Detection: A Comparative Study Between CNN and RNN
Park Autonomous Navigation for Mobile Robots: Machine Learning-based Techniques for Obstacle Avoidance
Bezliudnyi et al. A MODEL OF CONVOLUTIONAL NEURAL NETWORK AND A SOFTWARE APPLICATION FOR TYPICAL INSECT PESTS RECOGNITION