WO2022243570A1 - Verifying neural networks - Google Patents

Verifying neural networks

Info

Publication number
WO2022243570A1
Authority
WO
WIPO (PCT)
Prior art keywords
constraints
network
layer
linear
neural network
Prior art date
Application number
PCT/EP2022/063919
Other languages
French (fr)
Inventor
Ben BATTEN
Panagiotis KOUVAROS
Jianglin LAN
Alessio LOMUSCIO
Yang Zhang
Original Assignee
Imperial College Innovations Limited
Application filed by Imperial College Innovations Limited
Publication of WO2022243570A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions

Definitions

  • the present disclosure relates to the verification of the consistency of the output of neural networks under variations to the input.
  • the present disclosure provides techniques for verifying the reliability of a neural network for the classification of objects in sensor data, such as image data.
  • Background Autonomous systems are forecasted to revolutionise key aspects of modern life including mobility, logistics, and beyond. While considerable progress has been made on the underlying technology, severe concerns remain about the safety and security of the autonomous systems under development.
  • One of the difficulties with forthcoming autonomous systems is that they incorporate complex components that are not programmed by engineers but are synthesised from data via machine learning methods, such as a neural network. Neural networks have been shown to be particularly sensitive to variations in their input.
  • neural networks currently used for image processing have been shown to be vulnerable to adversarial attacks in which the behaviour of a neural network can easily be manipulated by a minor change to its input, for example by presenting an “adversarial patch” to a small portion of the field of view of the image.
  • autonomous systems comprising neural networks in safety-critical areas, such as autonomous vehicles.
  • a network is said to be transformationally robust at a given input under a class of transformations if its output remains within a specified tolerance (e.g. one small enough to not cause a change in predicted class) when the input is subjected to any transformation in the class.
  • safeguards on acceptable behaviour of the ACAS XU unmanned aircraft collision avoidance system have been defined in terms which are equivalent to transformational robustness (in K. Julian, J. Lopez, J. Brush, M. Owen and M. Kochenderfer. Policy compression for aircraft collision avoidance systems. In Proceedings of the 35th Digital Avionics Systems Conference (DASC16), pages 1-10, 2016).
  • acceptable behaviour of image classifiers has been specified in terms of continuing to predict the same class when a particular image input is subjected to transformations which remain within a certain Lp-distance, or subjected to a certain class of affine and/or photometric transformations. Transformations may also include, for example: white noise changes to a given input (defined by an epsilon ball for an infinite norm); white noise changes to a given input given by any box constraints on some/all of the input dimensions; or any linear or non-linear transformation of the given input governed by a modification of the input described by a mathematical function or an algorithm.
  • Current methods for NN verification can be categorized into complete and incomplete approaches. Aside from computational considerations, complete approaches are guaranteed to resolve any verification query.
  • Incomplete approaches are normally based on various forms of convex approximations of the network and only guarantee that whenever they output that the network is safe, then that is indeed the case. While this typically enables faster computation, the looser this approximation is, the more likely it is that the method may not be able to verify the problem instance. As a result, the present objective in incomplete methods is the development of tighter approximations, which can be efficiently computed, thereby strengthening the efficacy of the methods in answering the verification problem.
  • Proposed complete methods include those based on mixed-integer linear programming (MILP), satisfiability modulo theories or bound propagation techniques coupled with input refinement. While these methods offer theoretical termination guarantees, at present they do not scale to the network sizes that incomplete approaches are able to address.
  • a method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining, for each layer of the network, a semidefinite constraint from the algebraic constraints for that layer; determining a set of interlayer constraints which constrain outputs of one or more of the layers to corresponding inputs of one or more adjacent layers; applying a semidefinite programming relaxation subject to the semidefinite constraints and the interlayer constraints across the range of inputs; based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
  • the dimensionality of the constraint may be significantly reduced, thereby reducing the computing resources required to apply the semidefinite programming relaxation.
  • the interlayer constraints may help to ensure that the interactions between layer outputs and inputs are properly modelled at the same time.
  • the set of interlayer constraints constrain all outputs of one or more of the layers to corresponding inputs of one or more adjacent layers.
  • the set of interlayer constraints constrain a subset of outputs of one or more of the layers to corresponding inputs of one or more adjacent layers.
  • the method further comprises determining one or more initial linear constraints based on a linear approximation of an activation function for one or more nodes of the neural network, wherein the applying a semidefinite programming relaxation is further subject to the one or more initial linear constraints.
  • the method further comprises determining, for each layer of the network, one or more further linear constraints based on an upper bound and a lower bound for each of two nodes from the network, wherein a first node is from the layer of the network and a second node is either from the layer of the network or from a layer of the network adjacent to the layer of the network, expressing the one or more further linear constraints as an upper bound and a lower bound for elements of a matrix representation of the layer of the network, and wherein the applying a semidefinite programming relaxation is further subject to the one or more further linear constraints.
  • determining, for each layer of the network, one or more further linear constraints expressed as an upper bound and a lower bound for elements of the matrix representation comprises calculating the upper and lower bounds given the range of inputs of the neural network and the one or more initial linear constraints.
  • Subjecting the semidefinite programming relaxation to the initial linear constraints can ensure that the semidefinite programming relaxation is tighter than linear programming relaxation.
  • the further linear constraints can tighten the semidefinite programming relaxation compared to semidefinite programming relaxation without the further linear constraints.
  • the semidefinite programming relaxation may be subjected to a portion of the further linear constraints to reduce computational cost.
  • the semidefinite programming relaxation may be iteratively repeated, wherein at each iteration one or more of the further linear constraints are added to the portion of the further linear constraints the semidefinite programming relaxation is subjected to.
  • the method further comprises determining, for each layer of the network, a non-linear constraint from the algebraic constraints for that layer, wherein the applying a semidefinite programming relaxation is further subject to the non-linear constraint for each layer of the network.
  • an objective value of the semidefinite programming relaxation determines the outcome of the semidefinite programming relaxation; and the objective value of the semidefinite programming relaxation is monotonically approached by an objective value sequence that converges to the objective value of the semidefinite programming relaxation, wherein a starting point of the objective value sequence is an objective value of the semidefinite programming relaxation not subject to the non-linear constraint for each layer of the network; and the objective value sequence is determined iteratively by solving an auxiliary convex semidefinite programming problem recursively, wherein a current objective value of the objective value sequence determined at an iteration is sequential to the objective values of the objective value sequence determined in prior iterations, wherein a current objective value of the auxiliary convex semidefinite programming problem is an objective value of the auxiliary convex semidefinite programming problem at the iteration.
  • the objective value of the auxiliary convex semidefinite programming problem is always greater than or equal to zero; and the objective value of the auxiliary convex semidefinite programming problem is equal to zero when the non-linear constraint for each layer of the network is satisfied.
  • Subjecting the semidefinite programming relaxation to the non-linear constraint for each layer provably can tighten the semidefinite programming relaxation compared to the semidefinite programming relaxation not subject to the non-linear constraint for each layer.
  • each objective value in the objective value sequence may be a tighter solution than the prior objective values in the sequence. The tightest solution of the sequence may be reached when the objective value sequence has converged to the objective value of the semidefinite programming relaxation subject to the non-linear constraints.
  • determining whether the neural network is robust across the range of inputs comprises: determining at each iteration, based on the current objective value of the objective value sequence, whether the neural network is robust across the range of inputs, if the neural network is robust across the range of inputs, providing as the outcome of the semidefinite programming relaxation that the neural network is robust across the range of inputs, if the neural network is unverified across the range of inputs, determining whether the current objective value of the auxiliary convex semidefinite programming problem is smaller than a predefined value, if the current objective value of the auxiliary convex semidefinite programming problem is smaller than a predefined value, providing as the output of the semidefinite programming relaxation that the neural network is not verifiable across the range of inputs.
  • the method further comprises removing terms associated with nodes which are inactive across the range of inputs from the semidefinite constraints.
  • the semidefinite constraints comprise positive semidefinite constraints.
  • the neural network is a feed forward neural network.
  • the nodes of the neural network may apply a Rectified Linear Unit (ReLU) activation function.
  • the neural network may be an image processing network which takes an image as input.
  • the neural network may be trained for an image classification, object detection, image reconstruction, or other image processing task.
  • the network may further be deployed for performing the image processing task, such as the image classification, object detection or image reconstruction task.
  • the network may perform the image processing task on an image. In such circumstances, it may be possible to provide guarantees on the appropriateness of the network to perform the image processing task correctly.
  • the neural network may be an audio processing network which takes a representation of an audio signal as input.
  • the neural network may be trained for a voice authentication, speech recognition, audio reconstruction, or other audio processing task.
  • the network may further be deployed for performing the audio processing task, such as the voice authentication, speech recognition or audio reconstruction task.
  • the network may perform the audio processing task.
  • the input to the neural network may be sensor data such as image data, audio data, LiDAR data, or other data.
  • the claimed process may act to improve the ability or reliability of a network in classifying data of this kind.
  • the neural network may be part of an AI system to evaluate creditworthiness or other risk or financial metrics, taking as input the relevant tabular information used to assess a financial decision.
  • the neural network may be trained for credit scoring of applicants for loan purposes.
  • the network may further be deployed for the decision making task in question.
  • the neural network may be a controller neural network which outputs a control signal for a physical device, such as an actuator.
  • the neural network may be trained for controlling a robot, vehicle, aircraft or plant.
  • the network may further be deployed for controlling the physical device, such as the actuator, robot, vehicle, aircraft or plant.
  • the network may control the physical device.
  • Other applications of the method above are in fraud monitoring, medical imaging, optical character recognition and generally whenever guarantees of transformational robustness aid in determining the robustness of the neural model.
  • a computer program product comprising computer executable instructions which, when executed by one or more processors, cause the one or more processors to carry out the method of the first aspect.
  • a system comprising one or more processors configured to carry out the method of the first aspect.
  • a method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining a semidefinite constraint from the algebraic constraints for the network; determining one or more linear constraints based on a linear approximation of an activation function for one or more nodes of the neural network; applying a semidefinite programming relaxation subject to the semidefinite constraints and the linear constraints across the range of inputs; based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
  • a method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining a semidefinite constraint from the algebraic constraints for the network; determining, for each layer of the network, one or more linear constraints based on an upper bound and a lower bound for each of two nodes from the network, wherein a first node is from the layer of the network and a second node is either from the layer of the network or from a layer of the network adjacent to the layer of the network, expressing the one or more linear constraints as an upper bound and a lower bound for elements of a matrix representation of the layer of the network, applying a semidefinite programming relaxation subject to the semidefinite constraints and the linear constraints across the range of inputs; and, based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
  • a method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining a semidefinite constraint from the algebraic constraints for the network; determining one or more non-linear constraints from the algebraic constraints for each layer of the network, wherein the applying a semidefinite programming relaxation is further subject to the one or more non-linear constraints; applying a semidefinite programming relaxation subject to the semidefinite constraints and the non-linear constraints across the range of inputs; based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
  • Figure 1 illustrates a set of transformations of an input
  • Figure 2 shows a method according to the present disclosure
  • Figure 3 illustrates the relative tightness of SDP and LP relaxations
  • Figure 4 illustrates the relative tightness of Layer SDP and a subset of RLT-SDP linear constraints
  • Figure 5 illustrates the relative tightness of Layer SDP and a subset of RLT-SDP linear constraints
  • Figure 6 shows a method according to the present disclosure
  • Figure 7 illustrates an example system capable of verifying a neural network
  • Figure 8 shows experimental results
  • Figures 9A and 9B show experimental results
  • Figure 10 shows experimental results.
  • the present disclosure is directed to the verification of a neural network and particularly to verifying consistency of neural network output across a range of potential inputs.
  • verification may offer a guarantee that a neural network’s outputs remain within a certain tolerance when a starting input to the neural network is varied across a range.
  • Transformations may include, for example: white noise changes to a given input (defined by an epsilon ball for an infinite norm); white noise changes to a given input given by any box constraints on some/all of the input dimensions; or any linear or non-linear transformation of the given input governed by a modification of the input described by a mathematical function or an algorithm.
  • the class of transformations may define the perturbations of the input for which the neural network output is to satisfy the output constraints.
  • the class of transformations may be defined in terms of a range for each component of the neural network’s input, within which the component is to vary.
  • the class of transformations may be defined by a bound on a global metric, such as by defining a maximum value for the l1-distance between the original input and the perturbed input.
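  • As an illustration of how such a range of inputs may be represented in practice, the following sketch (in Python; the helper name input_region and the [0, 1] pixel domain are illustrative assumptions, not part of the disclosure) encodes an epsilon ball for the infinite norm as elementwise box constraints on the input:

    import numpy as np

    def input_region(x_nominal: np.ndarray, eps: float, domain=(0.0, 1.0)):
        # Elementwise lower/upper bounds describing an l-infinity ball of
        # radius eps around a nominal input, intersected with the valid
        # input domain (e.g. the pixel range of an image).
        lo = np.clip(x_nominal - eps, domain[0], domain[1])
        hi = np.clip(x_nominal + eps, domain[0], domain[1])
        return lo, hi

    # Example: a flattened 28x28 greyscale image perturbed by up to 0.05.
    x_bar = np.random.rand(784)
    l0, u0 = input_region(x_bar, eps=0.05)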
  • the class of transformations may be specifically adapted to the task for which the network is trained: for example, for a network trained for image recognition, a class of affine or photometric transformations can be defined, for example in the manner described in WO 2020/109774 A1.
  • the class of transformations may be specified in terms of a set of algebraic constraints that are satisfied when applying any transformation in the class to the input.
  • the input and class of transformations may be chosen such that the input sufficiently unambiguously belongs to a particular class and the class of transformations define small enough perturbations that the neural network may be expected not to substantially change its output when the transformations are applied to the input.
  • Figure 1 depicts example affine (102-104), photometric (105-106) and random noise (110) transformations applied to an original image (101).
  • the transformations may be chosen such that the semantic content of the image is unchanged.
  • the set of output constraints define a maximum range within which the outputs of the neural network should vary if the transformational robustness property is to be satisfied.
  • any set of algebraic constraints that defines a region within which the neural network’s output should remain can be used as the set of output constraints.
  • the set of output constraints may be defined in terms of linear inequalities of the form aᵀy ≤ b, where y is the output of the network, a is a vector of coefficients, and b is a constant.
  • the set of output constraints can be defined using the neural network itself; for example, if the network provides for a classification stage, the set of output constraints may correspond to ensuring that the output remains in the same predicted class.
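  • For a classification network, one common way of encoding the requirement that the predicted class remains unchanged as a family of linear output constraints is sketched below; the helper name class_specifications and the margin convention (specification non-negative whenever the original class wins) are illustrative assumptions:

    import numpy as np

    def class_specifications(num_classes: int, true_label: int):
        # One specification vector c per competing class j: c encodes the
        # margin y[true_label] - y[j].  The output constraints are satisfied
        # when c @ y >= 0 for every such c over the whole input region.
        for j in range(num_classes):
            if j == true_label:
                continue
            c = np.zeros(num_classes)
            c[true_label] = 1.0
            c[j] = -1.0
            yield j, c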
  • the verification problem is as follows: given a nominal input, a linear function (also called the specification) on the network’s outputs, and a perturbation radius, the verification problem (1) is to determine whether the specification remains non-negative for every input within the perturbation radius of the nominal input, where distances are measured in the standard norm of a vector.
  • also called the specification
  • the network is said to be certifiably robust on input x and a given perturbation radius if the answer to the verification problem (1) is true for all perturbed inputs within that radius of x.
  • the optimal value of the resulting linear program (LP) relaxation is relatively easy to compute in practice.
  • the semidefinite relaxation utilizes a single positive semidefinite (PSD) constraint that couples all ReLU constraints in (2a) to obtain a convex SDP.
  • PSD positive semidefinite
  • the ReLU constraints (2a) are equivalently replaced with the following quadratic constraints: for each node, the post-activation value y and pre-activation value z satisfy y ≥ 0, y ≥ z and y(y − z) = 0.
  • Polynomial lifting and SDP-based hierarchies can be used to solve the resulting polynomial optimisation problem.
  • a lifting matrix P of monomials can be defined as in Raghunathan et al., 2018. Then, all the constraints in (5) and (6) become linear in terms of the elements of P.
  • SDP relaxation of (2): by relaxing the monomial matrix P to be positive semidefinite, we obtain an SDP relaxation of (2) as follows, where the same symbolic indexing P[·] as in Raghunathan et al., 2018 is adopted to index the elements of P.
  • (7a) and (7b) correspond to the ReLU constraints (5)
  • (7c) corresponds to the bounds on activation vectors in (6).
  • We denote the optimal value of (7) as ⁇ SDP,1.
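  • A minimal sketch of a global SDP relaxation of this kind, for a one-hidden-layer ReLU network and written with the CVXPY modelling library, is given below. The variable names, the use of the SCS solver and the exact encoding of the lifted constraints are assumptions made for illustration; the disclosure's formulation (7) and its indexing P[·] are not reproduced verbatim. A returned value greater than or equal to zero certifies the specification over the whole input region; a negative value is inconclusive because of the relaxation.

    import numpy as np
    import cvxpy as cp

    def global_sdp_margin(W0, b0, W1, b1, c, l0, u0):
        # Global SDP relaxation sketch for y = W1 ReLU(W0 x + b0) + b1:
        # returns a lower bound on min c @ y over l0 <= x <= u0.
        n0, n1 = W0.shape[1], W0.shape[0]
        d = 1 + n0 + n1                          # size of the lifting matrix
        P = cp.Variable((d, d), symmetric=True)

        x = P[0, 1:1 + n0]                       # first-order terms: input
        z = P[0, 1 + n0:]                        # first-order terms: hidden activation
        Xxx = P[1:1 + n0, 1:1 + n0]              # second-order blocks
        Xzx = P[1 + n0:, 1:1 + n0]
        Xzz = P[1 + n0:, 1 + n0:]

        cons = [P >> 0, P[0, 0] == 1]
        # ReLU constraints in lifted (quadratic) form: z >= 0, z >= W0 x + b0,
        # and the complementarity z * (z - W0 x - b0) = 0 written via P.
        cons += [z >= 0,
                 z >= W0 @ x + b0,
                 cp.diag(Xzz) == cp.diag(Xzx @ W0.T) + cp.multiply(b0, z)]
        # Input bounds encoded as (x - l0) * (u0 - x) >= 0 elementwise.
        cons += [cp.diag(Xxx) <= cp.multiply(l0 + u0, x) - cp.multiply(l0, u0),
                 x >= l0, x <= u0]

        prob = cp.Problem(cp.Minimize(c @ (W1 @ z + b1)), cons)
        prob.solve(solver=cp.SCS)
        return prob.value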
  • a method for verifying a neural network which adopts aspects of the above-referenced LP and SDP approaches, but is further improved by additional adaptations.
  • data is obtained defining a neural network, range of inputs and set of output constraints to verify.
  • the neural network may, for example, be an image classifier network. Such a network may be shown to classify a given image appropriately across the range of inputs.
  • the range of inputs may represent a region around that input for which it is desired that the output remains within the output constraints.
  • the range of inputs may comprise one or more of: white noise variations of an input; geometrical changes of an input; and colour, luminosity, contrast, and/or bias-field transformations of an input.
  • semidefinite constraints optionally positive semidefinite constraints, are adopted. However, unlike the SDP process described above, these semidefinite constraints are defined for each layer of the network rather than for the network as a whole. Consequently, significant computational benefits are realised when resolving these constraints. Further details of the definition of the semidefinite constraints are provided below.
  • one or more interlayer constraints are defined. These interlayer constraints couple outputs of network layers to corresponding inputs.
  • linear cut constraints are defined.
  • a linear constraint may provide further constraints to the approximation of the neural network based on the linear behaviour of the nodes within the exclusively activated or inactivated regions. Whereas conventional semidefinite constraints in these regions are approximate, by applying a linear constraint in such regions the overall tightness of the approximation can be improved.
  • a linear constraint may capture inter-layer and intra-layer dependencies between two nodes in the same or adjacent layers.
  • at step 250, SDP relaxations are applied to solve for the constraints defined in steps 220 to 240, thereby obtaining a minimum value of the specification objective as described above. Where the minimum obtained in this manner is equal to or greater than 0, the network can be verified across the range of inputs at step 260. Where the minimum is less than 0, it is not possible to verify the network (although it is possible that the network is itself robust across the range).
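  • The decision logic of steps 250 and 260 can be sketched as follows; the names are hypothetical, and the certified_margin callable stands in for whichever relaxation (for example the SDP sketch given earlier) is used to lower-bound each specification:

    def verify_input(certified_margin, specifications):
        # certified_margin(c) should return a certified lower bound on the
        # specification c over the range of inputs; the network is reported
        # robust only if every specification is certified non-negative.
        return all(certified_margin(c) >= 0 for _, c in specifications)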
  • a verified neural network may be deployed with a degree of certainty for tasks dependent on accurate perception.
  • an image classification neural network is used to control a device (such as an autonomous vehicle)
  • confidence that its outputs are not adversely affected by transformations such as those reflected in the range of inputs may be important for demonstrating the safety and/or efficacy of the device.
  • Further details of the definition of the constraints at steps 220 to 240 are provided below.
  • the adoption of linear cuts providing further constraints to the approximation of the neural network based on the linear behaviour of the nodes within the exclusively activated or inactivated regions may further be understood with reference to Figure 3, which illustrates how in certain cases the SDP relaxation in equation (7) (illustrated by the dashed line) may be looser than the LP relaxation in equation (4) (illustrated by the solid line).
  • the standard SDP relaxation (7) is inexact even for inactive/stable neurons, while the triangular relaxation becomes exact.
  • linear cuts based on a linear approximation of an activation function for one or more nodes of the neural network may be introduced into the process as further set of initial linear constraints at step 240.
  • this process comprises extending the relaxation to include the linear cut (4b) thereby tightening the relaxation.
  • the cut (4b) can be expressed in terms of the matrix P as follows and added to (7).
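  • A sketch of such an initial linear cut for a single ReLU layer is shown below (CVXPY notation; the function name and the assumption that pre-activation bounds lo < 0 < hi are available are illustrative). It is this upper "triangle" cut that keeps the relaxation at least as tight as the LP relaxation (4); in the global sketch above it could be added as cons += triangle_cut(z, W0 @ x + b0, lo, hi) with precomputed pre-activation bounds.

    import cvxpy as cp

    def triangle_cut(z, z_hat, lo, hi):
        # For ReLU neurons with pre-activation bounds [lo, hi] (lo < 0 < hi),
        # the triangle relaxation gives the linear upper cut
        #     z <= hi * (z_hat - lo) / (hi - lo),
        # where z_hat is the pre-activation expression and z the
        # post-activation variable in the relaxation.
        slope = hi / (hi - lo)
        return [z <= cp.multiply(slope, z_hat - lo)]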
  • Steps 220 and 230 are also effective to reduce the dimensionality of the PSD constraint in (9). These steps exploit the layer-wise cascading structure of NNs whereby each activation vector of a layer depends only on the previous layer’s activation vector. This can be understood using the equivalent quadratic formulation of (5).
  • a layer-based SDP relaxation at step 250 for the verification problem (2) can now be expressed as:
  • the layer-based SDP relaxation (14) employs multiple smaller PSD constraints for each layer. Smaller PSD constraints in an SDP can be considered to speed up its solution using off-the-shelf solvers.
  • the solution quality of (14) is equivalent to that from (9); that is to say, given a non-convex NN verification instance (2), the optimal values of (14) and (9) coincide.
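  • The following CVXPY sketch illustrates the structure of such a layer-based relaxation: one small lifting matrix per layer, each under its own PSD constraint, coupled by interlayer equality constraints on the shared activation blocks. Variable names, the bound handling and the SCS solver are illustrative assumptions; the exact constraints (10)-(14) of the disclosure are not reproduced.

    import cvxpy as cp

    def layer_sdp_margin(weights, biases, c, lowers, uppers):
        # weights/biases describe a fully-connected ReLU network with at
        # least one hidden layer; lowers[i]/uppers[i] are precomputed
        # elementwise bounds on the activation vector of layer i
        # (index 0 is the network input).
        L = len(weights)
        sizes = [weights[0].shape[1]] + [W.shape[0] for W in weights]
        # One lifting matrix per hidden layer, covering (1, x_i, x_{i+1}).
        P = [cp.Variable((1 + sizes[i] + sizes[i + 1],) * 2, symmetric=True)
             for i in range(L - 1)]
        cons = []
        for i in range(L - 1):
            W, b = weights[i], biases[i]
            ni = sizes[i]
            Pi = P[i]
            xi, xo = Pi[0, 1:1 + ni], Pi[0, 1 + ni:]
            Xii = Pi[1:1 + ni, 1:1 + ni]
            Xoi = Pi[1 + ni:, 1:1 + ni]
            Xoo = Pi[1 + ni:, 1 + ni:]
            cons += [Pi >> 0, Pi[0, 0] == 1,
                     # ReLU constraints of layer i in lifted form
                     xo >= 0, xo >= W @ xi + b,
                     cp.diag(Xoo) == cp.diag(Xoi @ W.T) + cp.multiply(b, xo),
                     # bounds on the layer's input activation vector
                     cp.diag(Xii) <= cp.multiply(lowers[i] + uppers[i], xi)
                                     - cp.multiply(lowers[i], uppers[i])]
            if i > 0:
                # Interlayer constraints: the activation vector shared by two
                # consecutive lifting matrices must agree, both in its linear
                # terms and in its quadratic block.
                prev, n_prev = P[i - 1], sizes[i - 1]
                cons += [xi == prev[0, 1 + n_prev:],
                         Xii == prev[1 + n_prev:, 1 + n_prev:]]
        z_last = P[-1][0, 1 + sizes[L - 2]:]
        prob = cp.Problem(cp.Minimize(c @ (weights[-1] @ z_last + biases[-1])), cons)
        prob.solve(solver=cp.SCS)
        return prob.value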
  • the result (14) is referred to hereinafter as Layer SDP.
  • the efficacy of incomplete NN verification methods depends both on the tightness of the utilized approximations and the computational efficiency of the method.
  • the Layer SDP result (14) can be further adapted for computational efficiency and tightness by adding or removing constraints.
  • variations (ii) and (iii) may also be applied to the global SDP relaxation formulated in (7) in analogy to their application to Layer SDP.
  • further relaxation of Layer SDP may be achieved via dropping equality constraints within the interlayer constraints of result (14).
  • the number of equality constraints (13) is quadratic in the number of neurons in each layer.
  • an SDP relaxation that uses only a subset of the constraints in (13) may be adopted at step 230.
  • at step 250, another layer-based SDP relaxation may be formed as follows:
  • the solution quality of (16) may in some cases be less precise than that of (14), but (16) is faster to solve and is still provably tighter than the LP relaxation (4).
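  • A sketch of the corresponding relaxed coupling is shown below: only the first-order interlayer equalities are retained and the quadratic-block equalities (part of the constraints (13)) are dropped, trading some tightness for speed. The function name and the full/partial switch are illustrative assumptions.

    import cvxpy as cp

    def interlayer_constraints(P_prev, P_cur, n_prev, n_cur, full=True):
        # Couple two consecutive per-layer lifting matrices on their shared
        # activation vector.  With full=False only the linear (first-order)
        # equalities are kept, giving a faster but looser relaxation.
        cons = [P_cur[0, 1:1 + n_cur] == P_prev[0, 1 + n_prev:]]
        if full:
            cons.append(P_cur[1:1 + n_cur, 1:1 + n_cur]
                        == P_prev[1 + n_prev:, 1 + n_prev:])
        return cons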
  • one or more further linear constraints capturing inter-layer and intra-layer dependencies between two nodes in the same or adjacent layers are added at step 240. These further linear constraints may be applied to global SDP (7) or Layer SDP (14).
  • the further linear constraints may be applied additionally or alternatively to the initial linear constraints based on the linear behaviour of the nodes within the exclusively activated or inactivated regions expressed by (8). Adding these further linear constraints tightens the SDP relaxation. In some embodiments, only a subset of the further linear constraints may be added to the SDP relaxation, thereby reducing the computational cost of the method.
  • the further linear constraints are determined from an upper bound and a lower bound for each of two nodes from the network, wherein a first node is from a first layer of the network and a second node is either from the first layer of the network or from a layer of the network adjacent to the first layer of the network.
  • the further linear constraints are expressed as an upper bound and a lower bound for elements of the lifting matrix P.
  • the method aims to bound elements of the matrix Pi for each layer.
  • the constraints in (17) are linear and could be directly added to (14). However, they introduce new inequalities, thereby increasing the computational effort required to solve the verification problem. Therefore, herein efficient strategies for imposing the constraints in (17) are presented.
  • the method uses (i) reformulation-linearization technique (RLT) to construct valid further linear cut constraints that are provably stronger than (17), and (ii) provides a computationally-efficient strategy for integrating the linear cut constraints with the Layer SDP relaxation (14).
  • RLT reformulation-linearization technique
  • An analogous set of constraints may be formulated for lifting matrix P, and the technique applied to global SDP (7).
  • valid further linear cut constraints are constructed using RLT.
  • RLT involves the construction of valid linear cuts on the lifting matrices by using products of the existing linear constraints in (14) on the original variables. Under the bound constraints and (12a) on Layer SDP (14), the variables satisfy products of those bounds, which can be used to construct further linear cut constraints.
  • Layer SDP relaxation (14) also has other existing linear constraints (11a) and (12b), where (12b) was obtained as an initial linear constraint from triangle relaxation constraints (4).
  • (11a) and (12b) can be used to construct the new constraints (20). Linear cut constraint (20a) is weaker than the existing constraints, while (20c) is weaker than the conjunction of existing constraints (11a), (11b) and (12b). Adding the linear cut constraint (20b) can tighten the Layer SDP relaxation: only its off-diagonal elements cut the feasible region, while the diagonal elements are implied by (11b). Therefore, including (20b) in the Layer SDP relaxation (14) can tighten the SDP relaxation.
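  • The sketch below shows the generic form of such reformulation-linearization cuts: products of existing bound constraints on two variables are expanded, and the bilinear term is replaced by the corresponding element of the lifting matrix. These are McCormick-style inequalities; exactly which products are retained in the disclosure's constraints (19)-(21) is not reproduced here, and the helper name rlt_cuts is an assumption.

    def rlt_cuts(X, x, lo, hi, pairs):
        # X is a lifted second-order block (a CVXPY expression), x the
        # corresponding first-order vector, lo/hi elementwise bounds on x,
        # and pairs the index pairs (a, b) for which cuts are generated.
        cons = []
        for a, b in pairs:
            cons += [
                # (x_a - l_a)(x_b - l_b) >= 0
                X[a, b] - lo[b] * x[a] - lo[a] * x[b] + lo[a] * lo[b] >= 0,
                # (u_a - x_a)(u_b - x_b) >= 0
                X[a, b] - hi[b] * x[a] - hi[a] * x[b] + hi[a] * hi[b] >= 0,
                # (x_a - l_a)(u_b - x_b) >= 0
                -X[a, b] + hi[b] * x[a] + lo[a] * x[b] - lo[a] * hi[b] >= 0,
                # (u_a - x_a)(x_b - l_b) >= 0
                -X[a, b] + lo[b] * x[a] + hi[a] * x[b] - hi[a] * lo[b] >= 0,
            ]
        return cons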
  • Figure 4 shows the feasible region of the triple after adding linear cuts (19b), (19c) and (21).
  • adding each linear cut removes a portion of the relaxation region.
  • the Layer RLT-SDP relaxation (22) offers a provably tighter bound than the Layer SDP relaxation (14), as expressed by inequality (23). Inequality (23) holds even when only a portion of the further linear constraints (19b), (19c) and (21) is added to Layer SDP (14).
  • the semidefinite programming relaxation may be iteratively repeated, wherein at each iteration one or more of the further linear constraints (e.g. (19b), (19c) and (21)) are added to the portion of the further linear constraints to which the semidefinite programming relaxation is subjected.
  • Algorithm 0 describes an example of an efficient implementation of the Layer RLT-SDP relaxation.
  • the portion of linear constraints added at each iteration is set by choosing the sequence.
  • the sequence and the maximum number of iterations can be adapted to the computational power available. In some implementations, a different sequence can be chosen for each individual layer.
  • the sequence is constant across all layers.
  • an ordering matrix stores the ordering (in descending order) of the elements in each row of the corresponding weight matrix. The ordering ensures that the portion of the linear cut constraints with larger influence on shrinking the feasible region of the SDP relaxation is added first. This is based on the following consideration: for neuron m at layer i + 1, its pre-activation is the inner product of the m-th row of the weight matrix (a row vector) with the activation vector of layer i, plus the corresponding bias.
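  • A sketch of this ordering and selection heuristic is given below (plain NumPy). Ordering by absolute weight magnitude per row, and selecting a fixed fraction of cuts per neuron, are illustrative assumptions consistent with the consideration above.

    import numpy as np

    def cut_ordering(W: np.ndarray) -> np.ndarray:
        # For each neuron (row of W), return column indices ordered by
        # decreasing absolute weight: cuts touching the largest-magnitude
        # weights have the greatest influence on the neuron's pre-activation
        # and are therefore added first.
        return np.argsort(-np.abs(W), axis=1)

    def select_cut_pairs(W: np.ndarray, fraction: float):
        # Keep only the first `fraction` of cuts for each row, following
        # the ordering above (a hypothetical selection strategy).
        order = cut_ordering(W)
        k = max(1, int(round(fraction * W.shape[1])))
        return [(m, j) for m in range(W.shape[0]) for j in order[m, :k]]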
  • the exemplary method here for tightening Layer SDP (14) subject to initial linear constraints (12b) by subjecting Layer SDP to further linear constraints may be analogously applied to global SDP (7), SDP2 (9) or Layer SDP not subject to initial linear constraints (12b).
  • the SDP relaxation is further tightened.
  • one or more non-linear constraints are determined from the algebraic constraints on the output of each layer of the network that tighten the semidefinite programming relaxation.
  • the semidefinite programming relaxation is Layer SDP and a non-linear constraint is determined for each layer of the network from the algebraic constraints for that layer.
  • a tighter semidefinite programming relaxation can verify more non-convex NN verification instances.
  • non-linear constraints require solving a non-convex semidefinite programming relaxation.
  • Such non-convex problems are generally much more computationally expensive than convex semidefinite programming problems, requiring more computational resources and being slower to solve.
  • a method is provided that solves the semidefinite programming relaxation subject to one or more non-linear constraints computationally efficiently.
  • the semidefinite programming relaxation not subject to the non-linear constraints, optionally Layer SDP, is solved. If this semidefinite programming relaxation verifies the neural network is robust across the range of inputs, no further action is required. Otherwise, at step 620 of the method it is determined that the semidefinite programming relaxation not subject to non-linear constraints does not verify the neural network as robust across the range of inputs.
  • one or more non-linear constraints are determined from the algebraic constraints on the output of each layer of the neural network.
  • the semidefinite programming relaxation is Layer SDP
  • a non-linear constraint of the same algebraic form is determined for each layer of the neural network from the algebraic constraints for that layer.
  • the subsequent method steps circumvent the non-convexity issue by an iterative process that recursively solves an auxiliary convex SDP problem of around the same size as (14) and iteratively generates an objective value sequence that initialises from the objective value of the convex Layer SDP relaxation and monotonically converges to the objective value of the non-convex relaxation.
  • the method sets the first current objective value of the objective value sequence to the objective value of the semidefinite programming relaxation not subject to non-linear constraints, and constructs the user-specified constant vectors of the non-linear constraints.
  • the user specified constant vectors are constructed such as to ensure a solution of the auxiliary convex SDP problem can be used to calculate the current objective value of the objective value sequence at each iteration.
  • the method solves the auxiliary convex semidefinite programming relaxation and determines the current objective value of the objective value sequence at the iteration from the solution of the auxiliary convex semidefinite programming relaxation.
  • the method determines if the outcome of the semidefinite programming relaxation determined by the current objective value of the objective value sequence verifies the neural network is robust across the range of inputs. If the neural network is robust, at step 670, the method outputs that the neural network is robust across the range of inputs.
  • the method determines at step 680, if the objective value of the auxiliary convex semidefinite programming problem is smaller than or equal to a predetermined value.
  • the predetermined value ensures the method makes its determination within a user-defined tolerance. If the answer is “Yes”, the method determines that the neural network cannot be verified across the range of inputs at step 690. If at step 680 the answer is “No”, the method returns to step 650, completing an iteration of the method. Further detail on steps 640 to 690 in an exemplary embodiment is provided below.
  • the semidefinite programming relaxation not subject to non-linear constraints is Layer SDP according to (14) and the non-linear constraint for each layer of the network is (24e).
  • although the non-convex layer SDP relaxation (24) is generally hard to solve, its optimal objective value is bounded below by the optimal value of the convex layer SDP relaxation (14), and this lower bound can be computed efficiently.
  • the auxiliary convex SDP problem has the form of problem (25).
  • the weight is a user-specified positive constant. Its value is set so as to penalise more heavily the SDP relaxations of the first L−1 layers. This is useful to obtain a tighter bound on the neural network output, as the output is influenced by the SDP relaxations of the first L−1 layers.
  • the scalars can be chosen as any non-zero constants. Their choice is iteratively updated for every repetition of step 650, as will be discussed later.
  • the vectors have fixed values and are constructed, at step 640, from Algorithm 1 by exploiting the activation pattern of the neural network.
  • the iterative loop encapsulated by steps 650 to 690 iteratively updates the value of the scalars and generates the objective value sequence that converges to the objective value of the non-convex relaxation.
  • An exemplary iterative algorithm that outputs the current objective value at a final iteration, reached when the current objective value of the auxiliary convex semidefinite programming problem at an iteration is smaller than a predefined tolerance, is Algorithm 2.
  • the iterative algorithm is based on solving the auxiliary convex SDP problem (25) at each iteration, with a scalar that is updated as the iterations proceed.
  • the initial value of the scalar is set from the optimal objective value of the layer SDP relaxation (14) determined at step 610 (see Line 2).
  • in Algorithm 2, for each given scalar value the auxiliary SDP problem (25) is solved to obtain its objective value (see Lines 5 and 6). At each iteration, the obtained optimal objective value of problem (25) is used to update the scalar (see Line 7). The iteration is terminated when this objective value is smaller than a prescribed tolerance (see Line 8). Algorithm 2 outputs the final objective value, which is used to determine whether the neural network is robust across the range of inputs.
  • the sequence generated by Algorithm 2 is monotone and convergent; the objective value of the auxiliary problem approaches zero as the sequence converges, so it can be used to check when the sequence has converged, enabling its use as a stopping criterion in Algorithm 2.
  • the objective value sequence generated by Algorithm 2 correspondingly monotonically increases and converges. Thus, every current objective value in the objective value sequence is a valid lower bound on the optimal value of the verification problem. Moreover, the calculated objective values at all iterations are at least as good as the Layer SDP objective value (14) and converge to the optimal objective value of the non-convex layer SDP relaxation (24). In this sense, the proposed iterative algorithm is an efficient method to solve the non-convex layer SDP relaxation (24), which would otherwise be hard to solve directly.
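  • The control flow of this iterative scheme (steps 610 to 690 of Figure 6, in the spirit of Algorithm 2) can be sketched as follows. The concrete solvers and the scalar update rule are deliberately left as callables, since the auxiliary problem (25) and the update of Line 7 are problem-specific and are not reproduced here.

    def iterative_sdp(solve_layer_sdp, solve_auxiliary_sdp, update_scalar,
                      tol=1e-4, max_iter=50):
        # solve_layer_sdp() -> objective value of the convex Layer SDP (14).
        # solve_auxiliary_sdp(scalar) -> (aux_objective, new_bound), i.e. the
        #   objective of the auxiliary convex SDP (25) and the corresponding
        #   element of the objective value sequence.
        # update_scalar(scalar, aux_objective) -> updated scalar (Line 7).
        bound = solve_layer_sdp()            # starting point of the sequence
        if bound >= 0:
            return "robust", bound           # steps 610/620: already verified
        scalar = bound                       # initialisation derived from Layer SDP
        for _ in range(max_iter):
            aux_obj, bound = solve_auxiliary_sdp(scalar)   # steps 650/660
            if bound >= 0:
                return "robust", bound       # step 670: verified at this iteration
            if aux_obj <= tol:
                return "unverified", bound   # steps 680/690: converged, not verifiable
            scalar = update_scalar(scalar, aux_obj)
        return "unverified", bound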
  • when Algorithm 2 is applied in the method of Figure 6, the iteration is executed only when the Layer SDP objective value does not verify robustness, i.e., when the method determines at step 620 that Layer SDP does not verify the neural network as robust across the range of inputs.
  • FIG. 7 illustrates an example system capable of verifying a neural network.
  • Such a system comprises at least one processor 402, which may receive data from at least one input 404 and provide data to at least one output 406.
  • the processor may be configured to perform the method outlined above. Results: The benefits of the approaches described above have been demonstrated experimentally, as illustrated in Figures 8, 9A, 9B and 10.
  • Beta-crown Efficient bound propagation with per-neuron split constraints for complete and incomplete neural network verification.
  • ICLR19 International Conference on Learning Representations
  • AI2: Gehr, T.; Mirman, M.; Drachsler-Cohen, D.; Tsankov, P.; Chaudhuri, S.; and Vechev, M. 2018. AI2: Safety and robustness certification of neural networks with abstract interpretation. In IEEE Symposium on Security and Privacy (SP18).
  • (9) and (14) are referred to as LayerSDP, and its relaxed version (16) as FastSDP.
  • the standard LP relaxation (4) is also illustrated as a benchmark.
  • the formulation (7) is denoted as SDP-IP.
  • the lower and upper bounds were computed using a symbolic interval propagation algorithm.
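  • For completeness, a plain (non-symbolic) interval propagation sketch is shown below; the symbolic variant used in the experiments is tighter, but the principle of deriving per-layer lower and upper bounds is the same. Function and variable names are illustrative.

    import numpy as np

    def interval_bounds(weights, biases, l, u):
        # Interval bound propagation through a fully-connected ReLU network:
        # given elementwise input bounds [l, u], compute pre-activation
        # bounds for every layer.
        bounds = []
        for W, b in zip(weights, biases):
            W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
            z_lo = W_pos @ l + W_neg @ u + b
            z_hi = W_pos @ u + W_neg @ l + b
            bounds.append((z_lo, z_hi))
            # ReLU gives the next layer's input bounds.
            l, u = np.maximum(z_lo, 0.0), np.maximum(z_hi, 0.0)
        return bounds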
  • PGD projected gradient descent
  • the convex relaxations are converted into a standard conic optimization before passing them to a numerical solver.
  • An automatic transformation from the convex relaxations into standard conic optimization was implemented.
  • the resulting LP/SDPs were then solved by MOSEK (see ApS Mosek. The mosek optimization toolbox for matlab manual, 2015).
  • the neural networks considered comprised eight fully connected ReLU networks trained on the MNIST dataset. To facilitate the comparison with existing tools, experiments were divided into three groups: 1) One self-trained NN with two hidden layers, each having 64 neurons; no adversarial training was used. The perturbation radius was varied from 0.01 to 0.05; 2) Three NNs from [Raghunathan et al., 2018]: MLP-SDP, MLP-LP, and MLP-Adv; 3) Four deep NNs from G. Singh, T. Gehr, M. Puschel, and M. Vechev. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL):41, 2019.
  • (Andersen, E. D. and Andersen, K. D. The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm. In High Performance Optimization, pages 197–232. Springer.) Results obtained are compared against presently available state-of-the-art (SoA) methods and tools.
  • two groups of two-input, two-output, fully-connected random ReLU NNs generated by using the method in (Fazlyab, M.; Morari, M.; and Pappas, G. J. 2020. Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite programming).
  • RLT-SDP hereafter used interchangeably with Layer RLT-SDP
  • Figure 9B illustrates that adding a larger proportion of linear cuts yields a tighter over-approximation, along with an increase in runtime. Adding the same percentage of linear cuts leads to a more significant tightness improvement on larger networks (with larger L) than on smaller ones. For each network, as the percentage of linear cuts increases, the tightness improvement becomes less significant, but the runtime increase becomes more significant. In particular, it is found experimentally that the first 20% of linear cuts contributes most significantly to the improvement in overall tightness of the method. We evaluated the impact of network width by using the models in Group 2 and observed very similar behaviour of the method. These experimental results clearly confirm the tightness relation and demonstrate the efficiency of Algorithm 0.
  • arXiv preprint arXiv:2009.04131, 2020; hereafter referred to as [Li et al., 2020]), which were trained using CROWN-IBP (Zhang, H.; Chen, H.; Xiao, C.; Gowal, S.; Stanforth, R.; Li, B.; Boning, D.; and Hsieh, C.-J. 2019).
  • they were tested under the perturbations 0.1 and 0.3, respectively.
  • the optimisation problems were modelled by using the toolbox YALMIP (Lofberg, J. Yalmip: A toolbox for modeling and optimization in matlab. In IEEE International Conference on Robotics and Automation (ICRA04), pp. 284–289. IEEE, 2004) and solved using the SDP solver MOSEK (Andersen, E. D. and Andersen, K. D. The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm. In High performance optimization, pp. 197–232. Springer, 2000). To run the algorithm, suitable parameter values are chosen.
  • IterSDP is evaluated on several fully-connected ReLU NNs trained on the MNIST dataset (where “m×n” means a NN with m−1 hidden layers each having n neurons): 1) One 3×50 network self-trained with no adversarial training, tested with perturbation radius from 0.01 to 0.09; 2) The three small size networks MLP-Adv, MLP-LP, and MLP-SDP from [Raghunathan et al., 2018] are tested under the same perturbation as in [Raghunathan et al., 2018] and the experiment illustrated in Figure 8.
  • 3) a medium size network 6×100 is from [Singh et al., 2019a] and evaluated under the same perturbation as in [Singh et al., 2019a], [Müller et al., 2021] and the experiment illustrated in Figure 8.
  • 4) Two large size networks 8×1024-0.1 and 8×1024-0.3 are from (Li, L., Qi, X., Xie, T., and Li, B. SoK: Certified robustness for deep neural networks. arXiv preprint arXiv:2009.04131, 2020; hereafter referred to as [Li et al., 2020]).
  • Figure 10 shows the computational results for the 3×50 network under different perturbation radii, using the methods IterSDP, LP, SDP-IP and LayerSDP.
  • the IterSDP method outperforms the baselines across all the perturbation values, confirming the expected ordering of the relaxations. Notably, IterSDP improves the verified robustness up to the PGD bounds for several perturbation values. IterSDP requires more runtime (about twice as much) compared to LayerSDP, but it is still computationally cheaper than SDP-IP. This is expected since Algorithm 2 uses LayerSDP for initialisation and solves the auxiliary SDP, whose size is similar to the layer SDP relaxation.
  • Table 3 reports the verified robustness (percentage of images that are verified to be robust) and runtime (average solver time for verifying an image) for each method.
  • the PGD upper bounds of MLP-Adv, MLP-LP, MLP-SDP and 6×100 are reiterated from Table 1 for direct comparison, while those of 8×1024-0.1 and 8×1024-0.3 are from [Li et al., 2020].
  • the results show that IterSDP is more precise than LayerSDP under the same bounds and all other baseline methods for all the networks.
  • One exception is the MLP-LP network, for which the methods IterSDP, LayerSDP, SDP-IP and LP all reach the PGD upper bound.
  • IterSDP increases the number of verified instances by 20% for the 6×100 network. For all the other networks, IterSDP obtained a number of verified cases that is close to or the same as the PGD upper bound. It is also worth mentioning that IterSDP outperforms the SoA complete methods MILP and AI2 according to the numbers reported in [Li et al., 2020]: MILP verified 67% (respectively, 7%) for 8×1024-0.1 (respectively, 8×1024-0.3), and AI2 verified 52% (respectively, 16%) for 8×1024-0.1 (respectively, 8×1024-0.3).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

There are provided processes for verifying the performance of neural networks across a range of inputs. The neural network comprises nodes arranged in a plurality of layers, and a disclosed process comprises the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining, for each layer of the network, a semidefinite constraint from the algebraic constraints for that layer; determining a set of interlayer constraints which constrain outputs of one or more of the layers to corresponding inputs of one or more adjacent layers; applying a semidefinite programming relaxation subject to the semidefinite constraints and the interlayer constraints across the range of inputs; and based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.

Description

Verifying Neural Networks

Field

The present disclosure relates to the verification of the consistency of the output of neural networks under variations to the input. In particular, but not exclusively, the present disclosure provides techniques for verifying the reliability of a neural network for the classification of objects in sensor data, such as image data.

Background

Autonomous systems are forecasted to revolutionise key aspects of modern life including mobility, logistics, and beyond. While considerable progress has been made on the underlying technology, severe concerns remain about the safety and security of the autonomous systems under development. One of the difficulties with forthcoming autonomous systems is that they incorporate complex components that are not programmed by engineers but are synthesised from data via machine learning methods, such as a neural network. Neural networks have been shown to be particularly sensitive to variations in their input. For example, neural networks currently used for image processing have been shown to be vulnerable to adversarial attacks in which the behaviour of a neural network can easily be manipulated by a minor change to its input, for example by presenting an “adversarial patch” to a small portion of the field of view of the image. At the same time, there is an increasing trend to deploy autonomous systems comprising neural networks in safety-critical areas, such as autonomous vehicles. These two aspects taken together call for the development of rigorous methods to systematically verify the conformance of autonomous systems based on learning-enabled components to a defined specification.

Often, such a specification can be defined in terms of robustness to one or more transformations at one or more inputs – formally, a network is said to be transformationally robust at a given input under a class of transformations if its output remains within a specified tolerance (e.g. one small enough to not cause a change in predicted class) when the input is subjected to any transformation in the class. For example, safeguards on acceptable behaviour of the ACAS XU unmanned aircraft collision avoidance system have been defined in terms which are equivalent to transformational robustness (in K. Julian, J. Lopez, J. Brush, M. Owen and M. Kochenderfer. Policy compression for aircraft collision avoidance systems. In Proceedings of the 35th Digital Avionics Systems Conference (DASC16), pages 1-10, 2016). In other examples, acceptable behaviour of image classifiers has been specified in terms of continuing to predict the same class when a particular image input is subjected to transformations which remain within a certain Lp-distance, or subjected to a certain class of affine and/or photometric transformations. Transformations may also include, for example: white noise changes to a given input (defined by an epsilon ball for an infinite norm); white noise changes to a given input given by any box constraints on some/all of the input dimensions; or any linear or non-linear transformation of the given input governed by a modification of the input described by a mathematical function or an algorithm.

Current methods for NN verification can be categorized into complete and incomplete approaches. Aside from computational considerations, complete approaches are guaranteed to resolve any verification query.
Incomplete approaches are normally based on various forms of convex approximations of the network and only guarantee that whenever they output that the network is safe, then that is indeed the case. While this typically enables faster computation, the looser this approximation is, the more likely it is that the method may not be able to verify the problem instance. As a result, the present objective in incomplete methods is the development of tighter approximations, which can be efficiently computed, thereby strengthening the efficacy of the methods in answering the verification problem.

Proposed complete methods include those based on mixed-integer linear programming (MILP), satisfiability modulo theories or bound propagation techniques coupled with input refinement. While these methods offer theoretical termination guarantees, at present they do not scale to the network sizes that incomplete approaches are able to address. Incomplete methods are typically based on bound propagation, duality, and semidefinite program (SDP) relaxations. A common theme in this research is the linear program (LP) relaxation for the univariate ReLU function. A foundational relaxation is the triangle relaxation from R. Ehlers, Formal verification of piece-wise linear feed-forward neural networks, In ATVA17, volume 10482 of Lecture Notes in Computer Science, pages 269–286, Springer, 2017 (referred to hereinafter as “Ehlers et al., 2017”). This gives a tight convex relaxation of the univariate ReLU function and forms the basis of many of the cited methods. It has been recently shown that the efficacy of these methods is intrinsically limited by the same convex relaxation barrier, which is characterised by the tightness of the triangular relaxation. Another way to bypass the barrier is to seek alternative stronger relaxations beyond LPs, such as SDPs. It has been empirically observed that the SDP relaxation is much tighter than LP relaxations. However, SDPs are computationally harder to solve. An example of the SDP approach can be found in A. Raghunathan, J. Steinhardt, and P. Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In NeurIPS18, pages 10877–10887, 2018 (referred to hereinafter as “Raghunathan et al., 2018”). There is therefore an ongoing need to provide computationally efficient solutions to the verification problem while at the same time maximising the efficacy of these methods.

Summary

According to a first aspect of the present disclosure, there is provided a method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining, for each layer of the network, a semidefinite constraint from the algebraic constraints for that layer; determining a set of interlayer constraints which constrain outputs of one or more of the layers to corresponding inputs of one or more adjacent layers; applying a semidefinite programming relaxation subject to the semidefinite constraints and the interlayer constraints across the range of inputs; and, based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
By determining semidefinite constraints for each layer rather than a semidefinite constraint across the entire network, the dimensionality of the constraint may be significantly reduced, thereby reducing the computing resources required to apply the semidefinite programming relaxation. The interlayer constraints may help to ensure that the interactions between layer outputs and inputs are properly modelled at the same time. Optionally, the set of interlayer constraints constrain all outputs of one or more of the layers to corresponding inputs of one or more adjacent layers. Alternatively, the set of interlayer constraints constrain a subset of outputs of one or more of the layers to corresponding inputs of one or more adjacent layers. Optionally, the method further comprises determining one or more initial linear constraints based on a linear approximation of an activation function for one or more nodes of the neural network, wherein the applying a semidefinite programming relaxation is further subject to the one or more initial linear constraints. Optionally, the method further comprises determining, for each layer of the network, one or more further linear constraints based on an upper bound and a lower bound for each of two nodes from the network, wherein a first node is from the layer of the network and a second node is either from the layer of the network or from a layer of the network adjacent to the layer of the network, expressing the one or more further linear constraints as an upper bound and a lower bound for elements of a matrix representation of the layer of the network, and wherein the applying a semidefinite programming relaxation is further subject to the one or more further linear constraints. Optionally, determining, for each layer of the network, one or more further linear constraints expressed as an upper bound and a lower bound for elements of the matrix representation comprises calculating the upper and lower bounds given the range of inputs of the neural network and the one or more initial linear constraints. Subjecting the semidefinite programming relaxation to the initial linear constraints can ensure that the semidefinite programming relaxation is tighter than a linear programming relaxation. Moreover, the further linear constraints can tighten the semidefinite programming relaxation compared to a semidefinite programming relaxation without the further linear constraints. Optionally, the semidefinite programming relaxation may be subjected to a portion of the further linear constraints to reduce computational cost. Optionally, the semidefinite programming relaxation may be iteratively repeated, wherein at each iteration one or more of the further linear constraints are added to the portion of the further linear constraints to which the semidefinite programming relaxation is subjected. Optionally, the method further comprises determining, for each layer of the network, a non-linear constraint from the algebraic constraints for that layer, wherein the applying a semidefinite programming relaxation is further subject to the non-linear constraint for each layer of the network.
Optionally, an objective value of the semidefinite programming relaxation determines the outcome of the semidefinite programming relaxation; and the objective value of the semidefinite programming relaxation is monotonically approached by an objective value sequence that converges to the objective value of the semidefinite programming relaxation, wherein a starting point of the objective value sequence is an objective value of the semidefinite programming relaxation not subject to the non-linear constraint for each layer of the network; and the objective value sequence is determined iteratively by solving an auxiliary convex semidefinite programming problem recursively, wherein a current objective value of the objective value sequence determined at an iteration is sequential to the objective values of the objective value sequence determined in prior iterations, and wherein a current objective value of the auxiliary convex semidefinite programming problem is an objective value of the auxiliary convex semidefinite programming problem at the iteration. Optionally, the objective value of the auxiliary convex semidefinite programming problem is always greater than or equal to zero; and the objective value of the auxiliary convex semidefinite programming problem is equal to zero when the non-linear constraint for each layer of the network is satisfied. Subjecting the semidefinite programming relaxation to the non-linear constraint for each layer can provably tighten the semidefinite programming relaxation compared to the semidefinite programming relaxation not subject to the non-linear constraint for each layer. Moreover, each objective value in the objective value sequence may be a tighter solution than the prior objective values in the sequence. The tightest solution of the sequence may be reached when the objective value sequence has converged to the objective value of the semidefinite programming relaxation subject to the non-linear constraints. Optionally, the determining, based on an outcome of the semidefinite programming relaxation, whether the neural network is robust across the range of inputs comprises: determining at each iteration, based on the current objective value of the objective value sequence, whether the neural network is robust across the range of inputs; if the neural network is robust across the range of inputs, providing as the outcome of the semidefinite programming relaxation that the neural network is robust across the range of inputs; if the neural network is unverified across the range of inputs, determining whether the current objective value of the auxiliary convex semidefinite programming problem is smaller than a predefined value; and, if the current objective value of the auxiliary convex semidefinite programming problem is smaller than the predefined value, providing as the outcome of the semidefinite programming relaxation that the neural network is not verifiable across the range of inputs. Determining at each iteration, based on the current objective value of the objective value sequence, whether the neural network is robust across the range of inputs can allow the iteration loop to be terminated when the neural network is verified. This can save computational power by reducing the number of semidefinite programming instances that are solved for each verification instance. Optionally, the method further comprises removing terms associated with nodes which are inactive across the range of inputs from the semidefinite constraints.
Optionally, the semidefinite constraints comprise positive semidefinite constraints. In some preferred examples, the neural network is a feed-forward neural network. Optionally, the nodes of the neural network may apply a Rectified Linear Unit (ReLU) activation function. In some example implementations, the neural network may be an image processing network which takes an image as input. For example, the neural network may be trained for an image classification, object detection, image reconstruction, or other image processing task. In such implementations, if the neural network is determined to be transformationally robust, the network may further be deployed for performing the image processing task, such as the image classification, object detection or image reconstruction task. In particular, if the neural network is determined to be transformationally robust, the network may perform the image processing task on an image. In such circumstances, it may be possible to provide guarantees on the appropriateness of the network to perform the image processing task correctly. In other example implementations, the neural network may be an audio processing network which takes a representation of an audio signal as input. For example, the neural network may be trained for a voice authentication, speech recognition, audio reconstruction, or other audio processing task. In such implementations, if the neural network is determined to be transformationally robust, the network may further be deployed for performing the audio processing task, such as the voice authentication, speech recognition or audio reconstruction task. In particular, if the neural network is determined to be transformationally robust, the network may perform the audio processing task. In such circumstances, it may be possible to provide guarantees on the appropriateness of the network to perform the audio processing task correctly. While the above example implementations refer to image processing or audio processing, the skilled person will recognise that the claimed approach may apply to other inputs; for example, the input to the neural network may be sensor data such as image data, audio data, LiDAR data, or other data. In general, the claimed process may act to improve the ability or reliability of a network in classifying data of this kind. In other example implementations, the neural network may be part of an AI system to evaluate creditworthiness or other risk or financial metrics, and takes as input the relevant tabular information used to assess a financial decision. For example, the neural network may be trained for credit scoring of applicants for loan purposes. In such implementations, if the neural network is determined to be transformationally robust, the network may further be deployed for the decision making task in question. In particular, if the neural network is determined to be transformationally robust, guarantees may be given to the relevant regulators on the appropriateness of the network to perform the decision making task correctly. In yet other example implementations, the neural network may be a controller neural network which outputs a control signal for a physical device, such as an actuator. For example, the neural network may be trained for controlling a robot, vehicle, aircraft or plant.
In such implementations, if the neural network is determined to be transformationally robust, the network may further be deployed for controlling the physical device, such as the actuator, robot, vehicle, aircraft or plant. In particular, if the neural network is determined to be transformationally robust, the network may control the physical device. Other applications of the method above are in fraud monitoring, medical imaging, optical character recognition and generally whenever guarantees of transformational robustness aid in determining the robustness of the neural model. According to a further aspect, there may be provided a computer program product comprising computer executable instructions which, when executed by one or more processors, cause the one or more processors to carry out the method of the first aspect. There may also be provided a system comprising one or more processors configured to carry out the method of the first aspect. According to a first still further aspect of the present disclosure, there may be provided a method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining a semidefinite constraint from the algebraic constraints for the network; determining one or more linear constraints based on a linear approximation of an activation function for one or more nodes of the neural network; applying a semidefinite programming relaxation subject to the semidefinite constraints and the linear constraints across the range of inputs; based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs. According to a second still further aspect of the present disclosure, there may be provided a method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining a semidefinite constraint from the algebraic constraints for the network; determining, for each layer of the network, one or more linear constraints based on an upper bound and a lower bound for each of two nodes from the network, wherein a first node is from the layer of the network and a second node is either from the layer of the network or from a layer of the network adjacent to the layer of the network; expressing the one or more linear constraints as an upper bound and a lower bound for elements of a matrix representation of the layer of the network; applying a semidefinite programming relaxation subject to the semidefinite constraints and the linear constraints across the range of inputs; based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
According to a third still further aspect of the present disclosure, there may be provided a method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining a semidefinite constraint from the algebraic constraints for the network; determining one or more non-linear constraints from the algebraic constraints for each layer of the network; applying a semidefinite programming relaxation subject to the semidefinite constraints and the one or more non-linear constraints across the range of inputs; based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs. The skilled person will recognise that optional features of the first aspect may also apply to any of the still further aspects. Moreover, there may be provided a computer program product comprising computer executable instructions which, when executed by one or more processors, cause the one or more processors to carry out the method of any of the still further aspects. There may also be provided a system comprising one or more processors configured to carry out the method of any of the still further aspects. Brief Description of the figures Examples of the present disclosure will be presented with reference to the accompanying drawings in which: Figure 1 illustrates a set of transformations of an input; Figure 2 shows a method according to the present disclosure; Figure 3 illustrates the relative tightness of SDP and LP relaxations; Figure 4 illustrates the relative tightness of Layer SDP and a subset of RLT-SDP linear constraints; Figure 5 illustrates the relative tightness of Layer SDP and a subset of RLT-SDP linear constraints; Figure 6 shows a method according to the present disclosure; Figure 7 illustrates an example system capable of verifying a neural network; Figure 8 shows experimental results; Figures 9A and 9B show experimental results; Figure 10 shows experimental results. Detailed Description The present disclosure is directed to the verification of a neural network and particularly to verifying consistency of neural network output across a range of potential inputs. In other words, verification may offer a guarantee that a neural network’s outputs remain within a certain tolerance when a starting input to the neural network is varied across a range. For example, consider a baseline input subject to a class of transformations. Transformations may include, for example: white noise changes to a given input (defined by an epsilon ball for an infinite norm); white noise changes to a given input given by any box constraints on some/all of the input dimensions; or any linear or non-linear transformation of the given input governed by a modification of the input described by a mathematical function or an algorithm. The class of transformations may define the perturbations of the input for which the neural network output is to satisfy the output constraints. In some embodiments, the class of transformations may be defined in terms of a range for each component of the neural network’s input, within which the component is to vary.
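As a purely illustrative sketch of this per-component formulation, the range of inputs for an epsilon-ball perturbation can be represented by element-wise lower and upper bounds (the function name and the clipping to a [0, 1] data range below are assumptions, not part of the disclosure):

import numpy as np

def input_range_from_epsilon_ball(x_nominal, epsilon, clip_min=0.0, clip_max=1.0):
    # Per-component lower/upper bounds describing the range of inputs to verify:
    # the l-infinity ball of radius epsilon around the nominal input,
    # intersected with the valid data range (e.g. [0, 1] for normalised pixels).
    lower = np.clip(x_nominal - epsilon, clip_min, clip_max)
    upper = np.clip(x_nominal + epsilon, clip_min, clip_max)
    return lower, upper

# Example: a 28x28 grayscale image flattened to a vector, epsilon = 0.03.
x0 = np.random.rand(784)
l0, u0 = input_range_from_epsilon_ball(x0, 0.03)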
In other embodiments, the class of transformations may be defined by a bound on a global metric, such as by defining a maximum value for the l1-distance between the original input and the perturbed input. In yet other embodiments, the class of transformations may be specifically adapted to the task for which the network is trained: for example, for a network trained for image recognition, a class of affine or photometric transformations can be defined, for example in the manner described in WO 2020/109774 A1. In general, the class of transformations may be specified in terms of a set of algebraic constraints that are satisfied when applying any transformation in the class to the input. Typically, the input and class of transformations may be chosen such that the input sufficiently unambiguously belongs to a particular class and the class of transformations define small enough perturbations that the neural network may be expected not to substantially change its output when the transformations are applied to the input. A visually illustrative example of this is provided in Figure 1, which depicts example affine (102-104), photometric (105-106) and random noise (110) transformations applied to an original image (101). As can be seen, the transformations may be chosen such that the semantic content of the image is unchanged. The set of output constraints define a maximum range within which the outputs of the neural network should vary if the transformational robustness property is to be satisfied. In general, any set of algebraic constraints that defines a region within which the neural network’s output should remain can be used as the set of output constraints. For example, the set of output constraints may be defined in terms of linear inequalities of the form
\[ a^\top y \le b, \]
where $y$
is the output of the network, a is a vector of coefficients, and b is a constant. In some embodiments, the set of output constraints can be defined using the neural network itself; for example, if the network provides for a classification stage, the set of output constraints may correspond to ensuring that the output remains in the same predicted class. To understand the notation adopted in the following description, consider feed-forward ReLU neural networks (NNs). We consider an L-layer feed-forward NN to be represented by
the recursion $\hat{x}_{i+1} = W_i x_i + b_i$, $x_{i+1} = \mathrm{ReLU}(\hat{x}_{i+1})$ for $i = 0, \dots, L-1$, where $\hat{x}_i$ and $x_i$ are used to denote the pre-activation and activation vectors of the $i$-th layer, and define the NN output as $f(x_0) := W_L x_L + b_L$. Here $W_i \in \mathbb{R}^{n_{i+1} \times n_i}$ and $b_i \in \mathbb{R}^{n_{i+1}}$
are the weights and biases, respectively, $n_0 = d$, $n_{L+1} = m$ are input and output dimensions, and the ReLU function is defined as $\mathrm{ReLU}(z) = \max(z, 0)$ for $z \in \mathbb{R}$ (the ReLU function is applied element-wise). We focus on classification networks whereby an input $x_0$ is assigned to the class associated with the network output with the highest value: $\mathrm{class}(x_0) := \arg\max_{j \in \{1,\dots,m\}} f(x_0)_j$.
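A minimal sketch of this notation, under the assumption that the network weights are available as NumPy arrays (the helper names are illustrative only):

import numpy as np

def forward(weights, biases, x0):
    # Forward pass of a feed-forward ReLU network; weights/biases hold
    # (W_0, b_0), ..., (W_L, b_L), with the final affine map producing the
    # output logits f(x0) = W_L x_L + b_L.
    x = x0
    pre_activations, activations = [], [x0]
    for W, b in zip(weights[:-1], biases[:-1]):
        z = W @ x + b              # pre-activation of the next layer
        x = np.maximum(z, 0.0)     # element-wise ReLU activation
        pre_activations.append(z)
        activations.append(x)
    logits = weights[-1] @ x + biases[-1]
    return logits, pre_activations, activations

def predicted_class(weights, biases, x0):
    logits, _, _ = forward(weights, biases, x0)
    return int(np.argmax(logits))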
In this context, one can define the verification problem as follows: given a network $f : \mathbb{R}^d \to \mathbb{R}^m$, a nominal input $\bar{x}_0 \in \mathbb{R}^d$, a linear function $\phi$, also called the specification, on the network’s outputs, and a perturbation radius $\epsilon > 0$, the verification problem is to determine whether
\[ \phi(f(x_0)) \ge 0 \quad \text{for all } x_0 \text{ with } \|x_0 - \bar{x}_0\|_\infty \le \epsilon, \qquad (1) \]
where $\|\cdot\|_\infty$ denotes the standard $\ell_\infty$ norm of a vector. In particular, we hereafter focus on the local adversarial robustness problem whereby the specification is $\phi(f(x_0)) = f(x_0)_{i^*} - f(x_0)_i$ for a target label $i$, where $i^*$ is the label assigned to the nominal input. A network is said to be certifiably robust on input $\bar{x}_0$ and perturbation radius $\epsilon$ if the answer to the verification problem (1) is true for all target labels $i \ne i^*$.
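For illustration, the difference-of-logits specification above can be assembled as one coefficient vector per adversarial target; the sketch below assumes a specification of exactly this form (names are illustrative):

import numpy as np

def robustness_specifications(num_classes, true_label):
    # One linear specification per adversarial target label:
    # phi(y) = c^T y encodes y[true_label] - y[target]; the network is
    # certifiably robust if every such phi stays non-negative over the
    # whole input range.
    specs = []
    for target in range(num_classes):
        if target == true_label:
            continue
        c = np.zeros(num_classes)
        c[true_label] = 1.0
        c[target] = -1.0
        specs.append((target, c))
    return specs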
This problem can be answered by solving the optimisation problem
\[ \gamma^* := \min_{x_0, \dots, x_L} \ \phi(f(x_0)) \quad \text{s.t.} \quad x_{i+1} = \mathrm{ReLU}(W_i x_i + b_i), \ i \in [L], \ \ (2a) \qquad \|x_0 - \bar{x}_0\|_\infty \le \epsilon, \ \ (2b) \qquad (2) \]
where $f(x_0) = W_L x_L + b_L$ as above, and $[L]$ denotes $\{0, 1, \dots, L-1\}$. The verification problem is true if the optimal value $\gamma^*$ of (2) is positive. The optimisation problem is however non-convex because of (2a) and is therefore generally difficult to solve. To obtain a tractable convex relaxation of the problem, we derive an outer-approximation of the feasible region $(x_0, x_1, \dots, x_L)$ in (2) using a convex set $\mathcal{D}$. This relaxes (2) to a convex problem
\[ \gamma_{\mathcal{D}} := \min_{(x_0, \dots, x_L) \in \mathcal{D}} \ \phi(f(x_0)), \qquad (3) \]
which provides a valid lower bound $\gamma_{\mathcal{D}} \le \gamma^*$. If $\gamma_{\mathcal{D}} \ge 0$, then the answer to the verification problem (1) is true. If however $\gamma_{\mathcal{D}} < 0$
, then the verification problem cannot be decided. In order to demonstrate the benefits of the approach proposed by the present disclosure, it is useful to consider existing approaches: the triangle relaxation described in Ehlers et al., 2017; and the semidefinite relaxation described in Raghunathan et al., 2018. The triangle relaxation approximates a single univariate ReLU function z = max{x,0} with its convex hull. Specifically, the ReLU constraints (2a) are approximated by a set of linear constraints
\[ x_{i+1} \ge W_i x_i + b_i, \quad x_{i+1} \ge 0, \ \ (4a) \qquad x_{i+1} \le k_{i+1} \odot (W_i x_i + b_i - l_{i+1}), \ \ (4b) \qquad i \in [L], \qquad (4) \]
where $\odot$ denotes the Hadamard product, and $k_{i+1} := u_{i+1} \oslash (u_{i+1} - l_{i+1})$, in which $l_{i+1}, u_{i+1} \in \mathbb{R}^{n_{i+1}}$ are upper and lower bounds of the pre-activation variable for any input satisfying (2b), i.e. $l_{i+1} \le W_i x_i + b_i \le u_{i+1}$. These bounds can be computed using interval propagation methods. The optimal value $\gamma_{LP}$ of the resulting linear program (LP) relaxation is relatively easy to compute in practice. However, the quality of the LP relaxation (4) is intrinsically limited, i.e., there is always a positive gap $\gamma^* - \gamma_{LP} > 0$
for many practical NNs, referred to as the convex relaxation barrier. The semidefinite relaxation utilizes a single positive semidefinite (PSD) constraint that couples all ReLU constraints in (2a) to obtain a convex SDP. In this approach, the ReLU constraints (2a) are equivalently replaced with the following quadratic constraints
\[ x_{i+1} \ge W_i x_i + b_i, \quad x_{i+1} \ge 0, \ \ (5a) \qquad x_{i+1} \odot (x_{i+1} - W_i x_i - b_i) = 0, \ \ (5b) \qquad i \in [L]. \qquad (5) \]
Further, the input constraint (2b) as well as the lower and upper bounds $\underline{x}_i \le x_i \le \overline{x}_i$ on the activation vectors $x_i$ (which can be obtained using interval propagation methods) can be reformulated as quadratic constraints
\[ (x_i - \underline{x}_i) \odot (x_i - \overline{x}_i) \le 0, \qquad i = 0, 1, \dots, L, \qquad (6) \]
where $i = 0$ corresponds to the $\ell_\infty$ constraint (2b), with $\underline{x}_0 = \bar{x}_0 - \epsilon\mathbf{1}$ and $\overline{x}_0 = \bar{x}_0 + \epsilon\mathbf{1}$. Polynomial lifting and SDP-based hierarchies can be used to solve the resulting polynomial optimisation problem. Specifically a lifting matrix $P$ of monomials
\[ P := \begin{bmatrix} 1 \\ x \end{bmatrix}\begin{bmatrix} 1 \\ x \end{bmatrix}^{\!\top}, \qquad x := [x_0^\top, x_1^\top, \dots, x_L^\top]^\top, \]
can be defined as in Raghunathan et al., 2018. Then, all the constraints in (5) and (6) become linear in terms of the elements of $P$. By relaxing the monomial matrix $P$ to be any symmetric matrix satisfying $P \succeq 0$ and $P[1] = 1$, we obtain an SDP relaxation of (2) as follows
\[ \gamma_{SDP,1} := \min_{P \succeq 0,\ P[1]=1} \ \phi\big(W_L P[x_L] + b_L\big) \ \ \text{s.t.} \ \ (7a)\ P[x_{i+1}] \ge 0,\ P[x_{i+1}] \ge W_i P[x_i] + b_i; \ \ (7b)\ \mathrm{diag}\big(P[x_{i+1} x_{i+1}^\top]\big) = \mathrm{diag}\big(W_i P[x_i x_{i+1}^\top]\big) + b_i \odot P[x_{i+1}]; \ \ (7c)\ \mathrm{diag}\big(P[x_i x_i^\top]\big) \le (\underline{x}_i + \overline{x}_i) \odot P[x_i] - \underline{x}_i \odot \overline{x}_i, \qquad (7) \]
where the same symbolic indexing P[·] as Raghunathan et al., 2018 is adopted to index the elements of P. In this case (7a) and (7b) correspond to the ReLU constraints (5), and (7c) corresponds to the bounds on activation vectors in (6). We denote the optimal value of (7) as γSDP,1. We always have γ* ≥ γSDP,1, where the equality is achieved if the optimal solution P to (7) is of rank one. Referring to Figure 2, a method is provided for verifying a neural network which adopts aspects of the above-referenced LP and SDP approaches, but is further improved by additional adaptations. At step 210 of the method, data is obtained defining a neural network, range of inputs and set of output constraints to verify. The neural network may, for example, be an image classifier network. Such a network may be shown to classify a given image appropriately. The range of inputs may represent a region around that image for which it is desired that the output remains within the output constraints. For example, the range of inputs may comprise one or more of: white noise variations of an input; geometrical changes of an input; and colour, luminosity, contrast, and/or bias-field transformations of an input. At step 220, semidefinite constraints, optionally positive semidefinite constraints, are adopted. However, unlike the SDP process described above, these semidefinite constraints are defined for each layer of the network rather than for the network as a whole. Consequently, significant computational benefits are realised when resolving these constraints. Further details of the definition of the semidefinite constraints are provided below. In order that appropriate conditionality is retained between the layers of the network, at step 230 one or more interlayer constraints are defined. These interlayer constraints couple outputs of network layers to corresponding inputs. Where a neural network comprises N layers, outputs of each layer n (for n in the range 1 to N - 1) are coupled to the inputs of subsequent layer n + 1 using the interlayer constraints. At step 240, linear cut constraints are defined. Many different types of linear cut constraints may be provided. For example, a linear constraint may provide further constraints to the approximation of the neural network based on the linear behaviour of the nodes within the exclusively activated or inactivated regions. Whereas conventional semidefinite constraints in these regions are approximate, by applying a linear constraint in such regions the overall tightness of the approximation can be improved. In another example, a linear constraint may capture inter-layer and intra-layer dependencies between two nodes in the same or adjacent layers. Applying such linear constraints capturing dependencies between nodes increases the tightness of the semidefinite programming relaxation, as conventional methods do not capture these dependencies. At step 250, SDP relaxations are applied to solve for the constraints defined in steps 220 to 240, thereby obtaining a minimum value of γ as described above. Where γ obtained in this manner is equal to or greater than 0, the network can be verified across the range of inputs at step 260. Where γ is less than 0 it is not possible to verify the network (although it is possible that the network is itself robust across the range). A verified neural network may be deployed with a degree of certainty for tasks dependent on accurate perception.
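The overall flow of steps 210 to 260 may be summarised by the following skeleton, given only as a hedged sketch: the callable passed in is assumed to encapsulate steps 220 to 250 (per-layer semidefinite constraints, interlayer constraints, linear cuts and the SDP solve) and is not an API defined in this disclosure.

def verify_over_input_range(layers, input_bounds, specifications, solve_layer_sdp):
    # solve_layer_sdp is a placeholder callable returning the optimal value
    # gamma of the relaxation for one specification over the input range.
    for spec in specifications:
        gamma = solve_layer_sdp(layers, input_bounds, spec)
        if gamma < 0:
            # The relaxation could not rule out a violation: not verified
            # (the network may still be robust, but this incomplete method
            # cannot decide).
            return False
    # Every specification keeps a non-negative lower bound: verified (step 260).
    return True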
For example, where an image classification neural network is used to control a device (such as an autonomous vehicle), confidence that its outputs are not adversely affected by transformations such as those reflected in the range of inputs may be important for demonstrating the safety and/or efficacy of the device. Further details of the definition of the constraints at steps 220 to 240 are provided below. For example, with respect to step 240, the adoption of linear cuts providing further constraints to the approximation of the neural network based on the linear behaviour of the nodes within the exclusively activated or inactivated regions may further be understood with reference to Figure 3, which illustrates how in certain cases the SDP relaxation in equation (7) (illustrated by the dashed line) may be looser than the LP relaxation in equation (4) (illustrated by the solid line). In particular, Figure 3 shows LP- and SDP-based outer approximations of
the single-neuron ReLU set $\{(\hat{x}, x) : x = \max(\hat{x}, 0),\ l \le \hat{x} \le u\}$.
From left to right Figure 3 shows: 1) unstable neuron l = −4, u = 1; 2) inactive neuron l = −4, u = 0; 3) strictly active neuron l = 0, u = 1. The standard SDP relaxation (7) is inexact even for inactive/stable neurons, while the triangular relaxation becomes exact. To address this, linear cuts based on a linear approximation of an activation function for one or more nodes of the neural network may be introduced into the process as further set of initial linear constraints at step 240. In the context of the SDP relaxation (7), this process comprises extending the relaxation to include the linear cut (4b) thereby tightening the relaxation. We express the cut (4b) in terms of the matrix P as follows
\[ P[x_{i+1}] \le k_{i+1} \odot \big(W_i P[x_i] + b_i - l_{i+1}\big), \qquad i \in [L], \qquad (8) \]
and add it to (7). This leads to the following SDP relaxation for the verification problem (2):
\[ \gamma_{SDP,2} := \min_{P \succeq 0,\ P[1]=1} \ \phi\big(W_L P[x_L] + b_L\big) \quad \text{s.t.} \quad (7a)\text{--}(7c), \ (8). \qquad (9) \]
Due to the linear cuts (8), the new SDP relaxation (9) is tighter than both the original SDP relaxation (7) and the standard triangle LP relaxation (4). Further benefits may arise from this approach, since given the activation pattern (i.e. which neurons are in a stable state of activation across the range of inputs) once the linear cuts are applied then the activation pattern can be used to reduce the dimensionality of the PSD constraint. Particularly, given lower and upper bounds on the pre-activation vector
$\hat{x}_{i+1}$ (i.e., $l_{i+1} \le \hat{x}_{i+1} \le u_{i+1}$), it is known that the constraints (4) for stable neurons of the $(i+1)$-th layer become exact and can be simplified: 1) if the $k$th neuron is strictly active, i.e., $l_{i+1}(k) \ge 0$, then $x_{i+1}(k) = \hat{x}_{i+1}(k)$; or 2) if the neuron is inactive, i.e., $u_{i+1}(k) \le 0$, then $x_{i+1}(k) = 0$. The information regarding inactive neurons can also be removed in (9) since $P[x_{i+1}](k)$ becomes zero thanks to the linear cuts (8). This effectively reduces the dimension of the PSD constraint $P \succeq 0$
without altering the optimal value. In many practical cases, a significant portion of the neurons are stable under a given verification query, especially when small perturbation radiuses B are considered. Thus, adding the linear cuts (8) not only makes the SDP relaxation (9) theoretically stronger but also computationally easier. In an alternative approach, some of these advantages may be provided by first pruning the inactive neurons to form a new NN and then apply the SDP (7) to this newly pruned NN. Steps 220 and 230 are also effective to reduce the dimensionality of the PSD constraint in (9). These steps exploit the layer-wise cascading structure of NNs whereby each activation vector of a layer depends only on the previous layer’s activation vector. This can be understood using the equivalent quadratic formulation of (5). Instead of using a single big matrix P as in (7), we introduce, at step 220, multiple matrices of monomials Pi for each i ∈ [L]:
\[ P_i := \begin{bmatrix} 1 \\ x_i \\ x_{i+1} \end{bmatrix}\begin{bmatrix} 1 \\ x_i \\ x_{i+1} \end{bmatrix}^{\!\top}, \qquad i \in [L]. \qquad (10) \]
Then, the constraints (5a)-(5b) become linear in $P_i$:
\[ P_i[x_{i+1}] \ge 0, \quad P_i[x_{i+1}] \ge W_i P_i[x_i] + b_i, \ \ (11a) \qquad \mathrm{diag}\big(P_i[x_{i+1} x_{i+1}^\top]\big) = \mathrm{diag}\big(W_i P_i[x_i x_{i+1}^\top]\big) + b_i \odot P_i[x_{i+1}]. \ \ (11b) \qquad (11) \]
Also, (7c) and (8) (thus reflecting the linear cuts of step 240) can be written with respect to $P_i$ as
\[ \mathrm{diag}\big(P_i[x_j x_j^\top]\big) \le (\underline{x}_j + \overline{x}_j) \odot P_i[x_j] - \underline{x}_j \odot \overline{x}_j, \ j \in \{i, i+1\}, \ \ (12a) \qquad P_i[x_{i+1}] \le k_{i+1} \odot \big(W_i P_i[x_i] + b_i - l_{i+1}\big). \ \ (12b) \qquad (12) \]
Upon relaxing the monomial matrices, we need to consider the input-output consistency among the $P_i$'s. Accordingly, interlayer constraints are introduced at step 230, i.e.,
\[ P_i[x_{i+1}] = P_{i+1}[x_{i+1}], \quad P_i[x_{i+1} x_{i+1}^\top] = P_{i+1}[x_{i+1} x_{i+1}^\top], \qquad i \in [L-1], \qquad (13) \]
where $P_i[x_{i+1}]$ and $P_i[x_{i+1} x_{i+1}^\top]$ denote the blocks of $P_i$ associated with the first- and second-order monomials in $x_{i+1}$. Accordingly, a layer-based SDP relaxation at step 250 for the verification problem (2) can now be expressed as:
\[ \gamma_{LayerSDP} := \min_{P_0, \dots, P_{L-1}} \ \phi\big(W_L P_{L-1}[x_L] + b_L\big) \quad \text{s.t.} \quad (11), (12), (13), \quad P_i \succeq 0, \ P_i[1] = 1, \ i \in [L]. \qquad (14) \]
Instead of one single big PSD constraint of network size in (9), the layer-based SDP relaxation (14) employs multiple smaller PSD constraints for each layer. Smaller PSD constraints in an SDP generally speed up its solution using off-the-shelf solvers. Moreover, the solution quality of (14) is equivalent to that of (9). That is to say, given a non-convex NN verification instance (2), we have that
$\gamma_{LayerSDP} = \gamma_{SDP,2} \le \gamma^*$. In the following, the result (14) is often referred to as Layer SDP, with its optimal value interchangeably denoted $\gamma_{LayerSDP}$.
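As a hedged illustration of the layer-based relaxation (14) on a toy network (one hidden ReLU layer, so a single lifting matrix and no interlayer constraints), the following sketch uses the cvxpy modelling library; the concrete weights, bounds and helper names are assumptions made for the example, not values from the disclosure.

import cvxpy as cp
import numpy as np

# Tiny network: 2 inputs, one hidden ReLU layer with 2 neurons, 2 outputs.
W0 = np.array([[1.0, -1.0], [0.5, 1.0]]); b0 = np.array([0.1, -0.2])
W1 = np.array([[1.0, -1.0], [-1.0, 1.0]]); b1 = np.zeros(2)

x_nom = np.array([0.5, 0.5]); eps = 0.4
l0, u0 = x_nom - eps, x_nom + eps                         # input bounds
Wp, Wn = np.maximum(W0, 0), np.minimum(W0, 0)
l1 = Wp @ l0 + Wn @ u0 + b0; u1 = Wp @ u0 + Wn @ l0 + b0  # pre-activation bounds
k1 = u1 / (u1 - l1)                                       # triangle-cut slopes
a0, c0 = l0, u0                                           # activation bounds, layer 0
a1, c1 = np.maximum(l1, 0), np.maximum(u1, 0)             # activation bounds, layer 1

# Lifting matrix P_0 for the monomial vector [1, x_0, x_1] (5 x 5, PSD).
P = cp.Variable((5, 5), PSD=True)
x0v, x1v = P[0, 1:3], P[0, 3:5]
X00, X01, X11 = P[1:3, 1:3], P[1:3, 3:5], P[3:5, 3:5]

cons = [P[0, 0] == 1,
        x1v >= 0, x1v >= W0 @ x0v + b0,                              # (11a)
        cp.diag(X11) == cp.diag(W0 @ X01) + cp.multiply(b0, x1v),    # (11b)
        cp.diag(X00) <= cp.multiply(a0 + c0, x0v) - a0 * c0,         # (12a), layer 0
        cp.diag(X11) <= cp.multiply(a1 + c1, x1v) - a1 * c1,         # (12a), layer 1
        x1v <= cp.multiply(k1, W0 @ x0v + b0 - l1)]                  # (12b) linear cut
# For deeper networks one P_i per layer would be used, tied together by the
# interlayer consistency constraints (13) on the shared x_{i+1} blocks.

c = np.array([1.0, -1.0])                    # specification: logit 0 minus logit 1
gamma = cp.Problem(cp.Minimize((c @ W1) @ x1v + float(c @ b1)), cons).solve(solver=cp.SCS)
print("lower bound gamma:", gamma, "verified:", gamma is not None and gamma >= 0)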
The efficacy of incomplete NN verification methods depends both on the tightness of the utilized approximations and the computational efficiency of the method. The Layer SDP result (14) can be further adapted for computational efficiency and tightness by adding or removing constraints. In the following, we describe three exemplary variations on Layer SDP: (i) a relaxation of the method via dropping equality constraints within the interlayer constraints, (ii) a tightening of the approximation via adding further linear cut constraints at step 240 and (iii) a tightening of the approximation via adding a non-linear constraint for each layer of the network as further described in the context of Figure 6. While these variations are described as alternatives, it is understood that each of these variations can be applied either additionally or alternatively to each other. Further, the variations (ii) and (iii) may also be applied to the global SDP relaxation formulated in (7) in analogy to their application to Layer SDP. In one exemplary variation, further relaxation of Layer SDP may be achieved via dropping equality constraints within the interlayer constraints of result (14). The number of equality constraints (13) is quadratic in the number of neurons in each layer. However, an SDP relaxation that uses only a subset of the constraints in (13) may be adopted at step 230. In particular, if at step 230 the interlayer constraints are constructed using a linear number of consistency constraints as
\[ P_i[x_{i+1}] = P_{i+1}[x_{i+1}], \quad \mathrm{diag}\big(P_i[x_{i+1} x_{i+1}^\top]\big) = \mathrm{diag}\big(P_{i+1}[x_{i+1} x_{i+1}^\top]\big), \qquad i \in [L-1]. \qquad (15) \]
Then at step 250 another layer-based SDP relaxation may be formed as follows:
\[ \gamma_{FastSDP} := \min_{P_0, \dots, P_{L-1}} \ \phi\big(W_L P_{L-1}[x_L] + b_L\big) \quad \text{s.t.} \quad (11), (12), (15), \quad P_i \succeq 0, \ P_i[1] = 1, \ i \in [L]. \qquad (16) \]
The solution quality of (16) may in some cases be less precise than (14) but will be faster to solve and it is still provably better than the LP relaxation (4), i.e., $\gamma_{LP} \le \gamma_{FastSDP} \le \gamma_{LayerSDP} \le \gamma^*$.
In a second exemplary variation on semidefinite programming relaxation, one or more further linear constraints capturing inter-layer and intra-layer dependencies between two nodes in the same or adjacent layers are added at step 240. These further linear constraints may be applied to global SDP (7) or Layer SDP (14). Moreover, the further linear constraints may be applied additionally or alternatively to the initial linear constraints based on the linear behaviour of the nodes within the exclusively activated or inactivated regions expressed by (8). Adding these further linear constraints tightens the SDP relaxation. In some embodiments, only a subset of the further linear constraints may be added to the SDP relaxation, thereby reducing the computational cost of the method. The further linear constraints are determined from an upper bound and a lower bound for each of two nodes from the network, wherein a first node is from a first layer of the network and a second node is either from the first layer of the network or from a layer of the network adjacent to the first layer of the network. Subsequently, the further linear constraints are expressed as an upper bound and a lower bound for elements of the lifting matrix P. In the context of Layer SDP (14), the method aims to bound elements of the matrix Pi for each layer. First we denote a few terms for each layer of a neural network:
$\zeta_i := [1, x_i^\top, x_{i+1}^\top]^\top$, together with elementwise lower and upper bounds $\underline{\zeta}_i \le \zeta_i \le \overline{\zeta}_i$ obtained from the activation bounds. Since $P_i = \zeta_i \zeta_i^\top$, we have
\[ \min\big\{\underline{\zeta}_i\underline{\zeta}_i^{\top}, \underline{\zeta}_i\overline{\zeta}_i^{\top}, \overline{\zeta}_i\underline{\zeta}_i^{\top}, \overline{\zeta}_i\overline{\zeta}_i^{\top}\big\} \le \zeta_i\zeta_i^\top \le \max\big\{\underline{\zeta}_i\underline{\zeta}_i^{\top}, \underline{\zeta}_i\overline{\zeta}_i^{\top}, \overline{\zeta}_i\underline{\zeta}_i^{\top}, \overline{\zeta}_i\overline{\zeta}_i^{\top}\big\}, \]
where the minimum and maximum are taken elementwise. These non-linear constraints can be reformulated as linear constraints on the elements of $P_i$:
\[ \min\{\cdots\} \le P_i \le \max\{\cdots\}. \qquad (17) \]
The method aims to bound the elements of $P_i$ within the region given in (17). The constraints in (17) are linear and could be directly added to (14). However, they introduce on the order of $(1 + n_i + n_{i+1})^2$
new inequalities, thereby increasing the computational effort required to solve the verification problem. Therefore, herein efficient strategies for imposing the constraints in (17) are presented. The method uses (i) reformulation-linearization technique (RLT) to construct valid further linear cut constraints that are provably stronger than (17), and (ii) provides a computationally-efficient strategy for integrating the linear cut constraints with the Layer SDP relaxation (14). An analogous set of constraints may be formulated for lifting matrix P, and the technique applied to global SDP (7). In an embodiment implementing Layer SDP (14), valid further linear cut constraints are constructed using RLT. RLT involves the construction of valid linear cuts on the lifting matrices
$P_i$ by using products of the existing linear constraints in (14) on the original variables $\{x_i\}_{i=0}^{L}$. Under the activation bound constraints and (12a) on Layer SDP (14), the variables $x_i$ and $x_{i+1}$ satisfy $x_j - \underline{x}_j \ge 0$ and $\overline{x}_j - x_j \ge 0$ for $j \in \{i, i+1\}$. These can be used to construct the constraints $(x_j - \underline{x}_j)(x_k - \underline{x}_k)^\top \ge 0$, $(\overline{x}_j - x_j)(\overline{x}_k - x_k)^\top \ge 0$ and $(x_j - \underline{x}_j)(\overline{x}_k - x_k)^\top \ge 0$ for $j, k \in \{i, i+1\}$. By using (10), these non-linear constraints are linearized as the linear cut constraints (18a)–(18d) on the elements of $P_i$.
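For intuition only, the interval bounds underlying this RLT-style product bounding can be computed as the elementwise extremes of the four corner products (a minimal NumPy sketch under the stated box-bound assumption; the function name is illustrative):

import numpy as np

def mccormick_product_bounds(lx, ux, ly, uy):
    # Interval bounds on the outer product x y^T for box-bounded vectors:
    # each entry lies between the minimum and maximum of the four corner
    # products of the elementwise bounds.
    corners = np.stack([
        np.outer(lx, ly), np.outer(lx, uy),
        np.outer(ux, ly), np.outer(ux, uy),
    ])
    return corners.min(axis=0), corners.max(axis=0)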
The linear cut constraints (18a)–(18d) are stronger than (17). The existing constraints of (14), in particular (12a), are already stronger than the first part of (18a), while (12a) is stronger than the diagonal components of the second part of (18a). Therefore, the targeted bounding (17) can be realized by adding to the Layer SDP relaxation (14) the linear cut constraints (19a), (19b) and (19c) for each $i \in [L]$, where (19a) collects the intra-layer cuts (for the pairs $(x_i, x_i)$ and $(x_{i+1}, x_{i+1})$), (19b) and (19c) collect the inter-layer cuts (for the pair $(x_i, x_{i+1})$), and the diagonal components of (19a) are redundant. The above shows that adding the linear cut constraints in (19) to the Layer SDP relaxation (14) is efficient to bound the elements of $P_i$ and subsequently the matrix $P_i$ itself. Layer SDP relaxation (14) also has other existing linear constraints (11a) and (12b), where (12b) was obtained as an initial linear constraint from the triangle relaxation constraints (4). In some embodiments, (11a) and (12b) can be used to construct, in the same RLT manner, the new constraints (20a)–(20c). Linear cut constraint (20a) is weaker than an existing constraint of (14), while (20c) is weaker than the conjunction of the existing constraints (11a), (11b) and (12b). Adding the linear cut constraint (20b) can tighten the Layer SDP relaxation, but only if its off-diagonals cut the feasible region, while the diagonals are implied by (11b). Therefore, including (20b) in the Layer SDP relaxation (14) can tighten the SDP relaxation. By recalling that, under the interlayer constraints (13), $P_i$ and $P_{i+1}$ share the blocks associated with $x_{i+1}$, the constraints (19a) and (20b) are merged as a single linear cut constraint (21) for each $i \in [L]$, with a corresponding additional cut needed for the input layer. Integrating the linear cut constraints (19b), (19c) and (21) into (14) yields the Layer RLT-SDP relaxation (22), which minimises the objective of (14) subject to the constraints of (14) together with
(19b), (19c), (21). In embodiments where the initial linear constraints (12b) are not applied, Layer RLT- SDP may be formulated using further linear cut constraints (19a) in place of (21). Moreover, analogous constraints may be constructed for global SDP relaxation (7). Considering now an exemplary embodiment according to RLT-SDP relaxation (22), simple numerical examples in Figure 4 show that adding each of linear cuts (19b), (19c) and (21) shrinks the relaxation region of
the lifted variables $P_i$ and thus tightens the Layer SDP relaxation. In particular, Figure 4 shows the feasible region of the triple $(x_i, x_{i+1}, P_i[x_i x_{i+1}^\top])$ obtained by adding linear cuts (19b), (19c) and (21) for a single neuron with pre-activation bounds $l$ and $u$. Left to right columns: 1) inactive neuron ($u \le 0$); 2) unstable neuron ($l < 0 < u$); 3) strictly active neuron ($l \ge 0$). For all cases, adding each linear cut removes a portion of the relaxation region. In fact, the Layer RLT-SDP relaxation (22) offers a provably tighter bound than the layer SDP relaxation (14), that is
\[ \gamma_{LayerSDP} \le \gamma_{RLT\text{-}SDP} \le \gamma^*. \qquad (23) \]
Inequality (23) holds even when only a portion of the further linear constraints (19b), (19c) and (21) are added to Layer SDP (14). In a computationally efficient implementation of the RLT-SDP relaxation, the semidefinite programming relaxation may be iteratively repeated, wherein at each iteration one or more of the further linear constraints (e.g. (19b), (19c) and (21), or (19a)) are added to the portion of the further linear constraints to which the semidefinite programming relaxation is subjected. In a Layer RLT-SDP relaxation according to (22), the number of linear inequalities introduced by (19b), (19c) and (21) for each layer (after removing redundant diagonal components) is considerably smaller than the number introduced by (17), with extra linear inequalities needed only for the input layer. Compared to directly imposing the constraints (17) (which introduces a much larger number of
inequalities), adding (19b), (19c) and (21) has a lower computational burden, especially for large neural networks. To further increase computational efficiency of adding (19b), (19c) and (21) a strategy is deployed based on two observations: • The linear cut constraints (19b) and (19c) capture inter-layer dependencies (i.e. terms
$P_i[x_i x_{i+1}^\top]$). Since $\hat{x}_{i+1} = W_i x_i + b_i$, the dependencies are also reflected in the weighting matrix $W_i$. Hence, the structure of $W_i$ can be exploited to efficiently add (19b) and (19c). • The linear cut constraint (21) captures the intra-layer interactions (i.e. terms within a single layer), which cannot be clearly indicated by the neural network parameters (weights or biases). Therefore, for an efficient implementation of Layer RLT-SDP a portion of the linear cut constraints (19b) and (19c) is used.
Algorithm 0: iterative addition of a selected portion of the linear cut constraints (19b) and (19c) to the Layer RLT-SDP relaxation (22), with the cuts ordered by the magnitude of the corresponding weights.
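The following is a hedged Python sketch of an Algorithm 0-style loop, based on the description in the paragraphs that follow; the function names, the per-round cut counts and the solver callable are assumptions made for illustration and are not the exact procedure of the filing.

import numpy as np

def cut_order_by_weight(W):
    # Column indices of W ordered by descending |weight|, per output neuron:
    # inputs multiplied by larger-magnitude weights influence the pre-activation
    # more, so their inter-layer cuts (19b)/(19c) are added first.
    return np.argsort(-np.abs(W), axis=1)

def iterative_rlt_sdp(weights, solve_relaxation, cuts_per_round=(1, 2, 4), target=0.0):
    # solve_relaxation(selected_cut_indices) is assumed to solve the Layer
    # RLT-SDP relaxation (22) with only the selected portion of cuts and to
    # return its optimal value gamma; the loop stops early once gamma >= target.
    orderings = [cut_order_by_weight(W) for W in weights]
    gamma = None
    for n_cuts in cuts_per_round:
        selected = [order[:, :n_cuts] for order in orderings]
        gamma = solve_relaxation(selected)
        if gamma >= target:
            break  # verified with the cuts added so far
    return gamma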
Algorithm 0 describes an example of an efficient implementation of the Layer RLT-SDP relaxation. The portion of linear constraints added at each iteration are set by choosing the sequence
of how many cuts are added per neuron at each iteration. This sequence and the maximum number of iterations can be adapted to the computational power available. In some implementations, a different sequence can be chosen for each individual layer. In Algorithm 0 the sequence is constant across all layers. An ordering matrix stores the ordering (in descending order of absolute value) of the elements in each row of the weight matrix $W_i$.
The ordering ensures that the portion of the linear cut constraints with larger influences on shrinking the feasible region of the SDP relaxation are added first. This is based on the consideration as follows: For neuron m at layer i + 1, its pre-activation is
$\hat{x}_{i+1}(m) = W_i(m,:)\,x_i + b_i(m)$, where $W_i(m,:)$ is a row vector. Let $W_i(m,j)$ and $W_i(m,k)$ be any two elements of $W_i(m,:)$, whose corresponding inputs are $x_i(j)$ and $x_i(k)$ respectively. If $|W_i(m,j)| > |W_i(m,k)|$, then, when compared to the linear cuts about $x_i(k)$, the linear cuts about $x_i(j)$ have a bigger influence on the feasible region of $\hat{x}_{i+1}(m)$.
Figure 5 provides an example for this, where it is seen that the linear cut constraints about
$x_i(j)$ contribute more than those about $x_i(k)$ in shrinking the feasible region of $\hat{x}_{i+1}(m)$. The feasible region of $x_i(j)$, $x_i(k)$ and $\hat{x}_{i+1}(m)$ is shown where a part or all of the linear cuts (19b) and (19c) are added. Adding only the linear cuts about $x_i(j)$ (i.e. the input associated with the larger-magnitude element of $W_i(m,:)$) yields a feasible region close to the one obtained with the full set of constraints.
Algorithm 0 has the following property: The relation holds
$\gamma_{LayerSDP} \le \gamma_{RLT\text{-}SDP}^{(k)} \le \gamma^*$ under any choice of the sequence of added cuts. At any given iteration $k$ of Algorithm 0, we have that the bound $\gamma_{RLT\text{-}SDP}^{(k)}$ obtained with the cuts added so far is at least as tight as $\gamma_{LayerSDP}$. At each iteration, the layer RLT-SDP relaxation (22) is solved with only the currently selected number of
linear constraints. This is computationally lighter than the problem obtained by adding all the inequalities in (19b) and (19c). Furthermore, before running the algorithm, we can also remove the inactive neurons and simplify the constraints of stable neurons to reduce the sizes of the constraints
(e.g. the per-layer PSD constraints $P_i \succeq 0$).
This can be realised by examining the activation pattern of the neural network under a given verification query and will not relax the solution. The exemplary method here for tightening Layer SDP (14) subject to initial linear constraints (12b), by subjecting Layer SDP to further linear constraints, may be analogously applied to global SDP (7), SDP relaxation (9) or Layer SDP not subject to initial linear constraints (12b). In any of these variations, the SDP relaxation is further tightened. In another exemplary variation on semidefinite programming relaxation, one or more non-linear constraints that tighten the semidefinite programming relaxation are determined from the algebraic constraints on the output of each layer of the network. In a preferred embodiment, the semidefinite programming relaxation is Layer SDP and a non-linear constraint is determined for each layer of the network from the algebraic constraints for that layer. A tighter semidefinite programming relaxation can verify more non-convex NN verification instances. Generally, non-linear constraints require solving a non-convex semidefinite programming relaxation. Such non-convex problems are generally much more computationally expensive than convex semidefinite programming problems, requiring more computational resources and being slower to solve. Referring to Figure 6, a method is provided that solves the semidefinite programming relaxation subject to one or more non-linear constraints computationally efficiently. At step 610 the semidefinite programming relaxation not subject to the non-linear constraints, optionally Layer SDP, is solved. If this semidefinite programming relaxation verifies the neural network is robust across the range of inputs no further action is required. Otherwise, at step 620 of the method it is determined that the semidefinite programming relaxation not subject to non-linear constraints does not verify the neural network as robust across the range of inputs. At step 630 of the method, one or more non-linear constraints are determined from the algebraic constraints on the output of each layer of the neural network. If the semidefinite programming relaxation is Layer SDP, a non-linear constraint of the same algebraic form is determined for each layer of the neural network from the algebraic constraints for that layer. A promising way to reduce the relaxation gap of SDPs, i.e. tighten SDPs, is to introduce constraints to enforce the rank condition rank(P)=1 implied by (7), achieving
$\gamma_{SDP,1} = \gamma^*$ when this condition is fulfilled. This condition carries through to (9) and can be reformulated in (14) as $\mathrm{rank}(P_i) = 1$, $i \in [L]$. Approaches to tighten the SDP relaxation (7) using this condition have introduced the non-convex cuts $v^\top P\, v \le (e_1^\top P\, v)^2$, where $v$ is a given constant vector and $e_1$ is the first standard basis vector, so that $e_1^\top P\, v$ involves only the first-order monomials in $P$ (the reverse inequality already follows from $P \succeq 0$ and $P[1] = 1$).
In an exemplary embodiment, we introduce a non-linear constraint (24e) to each Pi in (14) and obtain non-convex layer SDP relaxation:
\[ \gamma_{nc} := \min_{P_0, \dots, P_{L-1}} \ \phi\big(W_L P_{L-1}[x_L] + b_L\big) \quad \text{s.t.} \quad \text{the constraints of (14)}, \quad v_i^\top P_i\, v_i \le (e_1^\top P_i\, v_i)^2, \ i \in [L], \ \ (24e) \qquad (24) \]
where $v_i$, $i \in [L]$, are user-specified constant vectors. For a given verification instance, the Layer SDP relaxation (24) fulfils $\gamma_{LayerSDP} \le \gamma_{nc} \le \gamma^*$.
Due to the non-linear constraint (24e), the Layer SDP relaxation (24) is non-convex and is harder to solve than the original Layer SDP relaxation (14). The subsequent method steps circumvent the non-convexity issue by an iterative process that recursively solves an auxiliary convex SDP problem of around the same size as (14) and iteratively generates an objective value sequence that initialises from $\gamma_{LayerSDP}$ and monotonically converges to $\gamma_{nc}$.
At step 640, the method sets the first current objective value of the objective value sequence to the objective value of the semidefinite programming relaxation not subject to non-linear constraints, and constructs the user specified constant vectors of the non- linear constraints. The user specified constant vectors are constructed such as to ensure a solution of the auxiliary convex SDP problem can be used to calculate the current objective value of the objective value sequence at each iteration. In an exemplary embodiment, where the semidefinite programming relaxation not subject to the non- linear constraints is Layer SDP the user specified constant vectors are the vectors vi of (24e). At step 650, the method solves the auxiliary convex semidefinite programming relaxation and determines the current objective value of the objective value sequence at the iteration from the solution of the auxiliary convex semidefinite programming relaxation. At step 660, the method determines if the outcome of the semidefinite programming relaxation determined by the current objective value of the objective value sequence verifies the neural network is robust across the range of inputs. If the neural network is robust, at step 670, the method outputs that the neural network is robust across the range of inputs. Outputting, that the network is robust as soon as this is determined by the method saves computational power in avoiding calculating to
$\gamma_{nc}$
even when the verification instance can be resolved at an earlier stage. If the neural network cannot be verified at step 660, the method determines at step 680, if the objective value of the auxiliary convex semidefinite programming problem is smaller than or equal to a predetermined value. The predetermined value ensures the method determines
$\gamma_{nc}$
within a user defined tolerance. If the answer is “Yes” the method determines that the neural network cannot be verified across the range of inputs at step 690. If at step 680 the answer is “No”, the method returns to step 650 completing an iteration of the method. Further detail on steps 640 to 690 in an exemplary embodiment are provided below. In this exemplary embodiment, the semidefinite programming relaxation not subject to non-linear constraints is Layer SDP according to (14) and the non-linear constraint for each layer of the network is (24e). Although the non-convex layer SDP relaxation (24) is generally hard to solve, its optimal objective value
$\gamma_{nc}$ is bounded below by $\gamma_{LayerSDP}$. This lower bound $\gamma_{LayerSDP}$ can be efficiently computed from the convex layer SDP relaxation (14). Hence, we can set $\gamma_{LayerSDP}$ as a start point to search for the value of $\gamma_{nc}$ at step 640. This inspires us to generate an objective value sequence $\{\gamma_k\}$ by solving an auxiliary convex SDP problem recursively. The sequence is bounded by $\gamma_{LayerSDP} \le \gamma_k \le \gamma_{nc} \le \gamma^*$ and can converge to $\gamma_{nc}$. Thereby, the objective value sequence is always tighter than $\gamma_{LayerSDP}$, remains a valid lower bound to $\gamma_{nc}$ and $\gamma^*$, and
can be used for NN robustness verification. The auxiliary convex SDP problem has the form of:
a convex program, denoted (25), which minimises over the feasible set of (14) a non-negative weighted measure (25a) of the violation of the non-linear constraints (24e), subject to an additional equality constraint (25c) that couples the objective of (14) to a scalar $\beta$. In (25a) the weight $\alpha$ is a user-specified positive constant. Its value is set larger than 1 to penalise more heavily the SDP relaxations of the first $L-1$ layers. This is useful to obtain a tighter neural network output, as it is influenced by the SDP relaxations of the first $L-1$ layers. The scalars appearing in (25) can be chosen as any non-zero constants. The choice of $\beta$ is iteratively updated for every repetition of step 650, as will be discussed later. In (25c), the vectors $v_i$ have fixed values and are constructed, at step 640, from Algorithm 1 by exploiting the activation pattern of the neural network.
Algorithm 1: construction of the vectors $v_i$ from the activation pattern of the neural network under the given verification query.
The construction of the vectors $v_i$ by Algorithm 1 ensures that the equality constraint (25c) and the non-negativity of the objective of (25) always hold. Thus, at step 650, solving the auxiliary convex SDP relaxation allows us to determine the current objective value of the objective value sequence from its solution. The optimal objective value $\eta(\beta)$ of the auxiliary SDP problem (25) has the following properties: 1) $\eta(\beta) \ge 0$ for any given $\beta$; 2) $\eta(\beta) = 0$ if and only if the feasible solution satisfies the non-linear constraints (24e). Further properties of (25) relate the choice of the scalar $\beta$ to the value $\gamma_{nc}$. Therefore, if we choose the scalar $\beta$ suitably, then solving the auxiliary SDP problem (25) gives the current objective value $\gamma_k$
at step 650. In this embodiment, the iterative loop encapsulated by steps 650 to 690 iteratively updates the value of
$\beta$ and generates the objective value sequence $\{\gamma_k\}$ that monotonically converges to $\gamma_{nc}$. An exemplary iterative algorithm that outputs the current objective value $\gamma_k$ at a final iteration, when the current objective value $\eta_k$ of the auxiliary convex semidefinite programming problem at an iteration $k$ is smaller than a predefined value $\varepsilon$, is Algorithm 2.
Algorithm 2: iterative solution of the auxiliary convex SDP problem (25), updating the scalar $\beta$ at each iteration until the tolerance $\varepsilon$ is met.
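A hedged sketch of an Algorithm 2-style loop is given below; the two solver callables, their return values and the update rule are assumptions made for illustration, standing in for the auxiliary problem (25) and the update described in the following paragraph.

def iterative_nonconvex_layer_sdp(solve_layer_sdp, solve_auxiliary_sdp,
                                  tolerance=1e-4, max_iterations=20):
    # solve_layer_sdp() returns the optimal value of the convex Layer SDP (14);
    # solve_auxiliary_sdp(beta) solves the auxiliary convex SDP (25) for the
    # current scalar beta and returns (eta, beta_next), where eta >= 0 measures
    # the violation of the non-linear constraints (24e).
    gamma = solve_layer_sdp()          # step 610: start from gamma_LayerSDP
    if gamma >= 0:
        return gamma, True             # already verified, no iteration needed
    beta = gamma
    for _ in range(max_iterations):
        eta, beta = solve_auxiliary_sdp(beta)   # step 650: solve (25), update beta
        gamma = beta                            # current objective value gamma_k
        if gamma >= 0:
            return gamma, True         # steps 660/670: verified, stop early
        if eta <= tolerance:
            return gamma, False        # steps 680/690: converged, cannot verify
    return gamma, False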
The iterative algorithm is based on solving the auxiliary convex SDP problem (25) at each iteration with the scalar $\beta$ that is changed with the iterations. The initial value of $\beta$ is set as $\beta_0 = \gamma_{LayerSDP}$, where $\gamma_{LayerSDP}$ is the optimal objective value of the layer SDP relaxation (14) determined at step 610 (see Line 2). For each given $\beta_k$, the auxiliary SDP problem (25) is solved to obtain the objective value $\eta_k$ (see Lines 5 and 6). At each iteration, the obtained optimal objective value $\eta_k$ of problem (25) is used to update the value of $\beta_{k+1}$ (see Line 7). The iteration is terminated when $\eta_k$ is smaller than a prescribed tolerance $\varepsilon$ (see Line 8). Algorithm 2 outputs the objective value $\gamma_k$
which is used to determine whether the neural network is robust across the range of inputs. The sequence generated by Algorithm 2 has the properties:
the sequence $\{\beta_k\}$ is monotonically non-decreasing and bounded above by $\gamma_{nc}$. Therefore, the sequence satisfies $\beta_k \le \gamma_{nc}$ for all $k$ and converges to $\gamma_{nc}$ as the tolerance $\varepsilon$ is made sufficiently small. Therefore, $\eta_k$ can be used to check when the sequence converges, enabling the use of $\eta_k \le \varepsilon$ as a stopping criterion in Algorithm 2. The objective value sequence $\{\gamma_k\}$ generated by Algorithm 2 correspondingly has the property $\gamma_{LayerSDP} \le \gamma_k \le \gamma_{nc}$ and monotonically increases to converge to $\gamma_{nc}$. Thus, every current objective value $\gamma_k$ in the objective value sequence is a valid lower bound to $\gamma_{nc}$ and subsequently to $\gamma^*$. Moreover, the calculated objective values at all iterations are at least as good as $\gamma_{LayerSDP}$ and converge to the optimal objective value $\gamma_{nc}$ of the non-convex layer SDP relaxation (24). In this sense, the proposed iterative algorithm is an efficient method to solve the non-convex layer SDP relaxation (24), which would otherwise be hard to solve directly. When Algorithm 2 is applied in the method of Figure 6, the iteration is executed only when $\gamma_{LayerSDP} < 0$
, i.e., the method determines at step 620 that the Layer SDP does not verify the neural network as robust across the range of inputs. Further method steps 660 and 670 are incorporated in Algorithm 2 such that the iteration is terminated whenever a positive value of $\gamma_k$ is found, even though the condition $\eta_k \le \varepsilon$ as required at step
Figure 7 illustrates an example system capable of verifying a neural network. Such a system comprises at least one processor 402, which may receive data from at least one input 404 and provide data to at least one output 406. The processor may be configured to perform the method outlined above.

Results

The benefits of the approaches described above have been demonstrated experimentally, as illustrated in Figures 8, 9A, 9B and 10. In particular, the described approaches have been compared against other state-of-the-art (SoA) methods, including: 1) SDP-based methods: the SDP formulation in [Raghunathan et al., 2018] and the advanced SDP-FO algorithm described in S. Dathathri et al. Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming. In NeurIPS20, 2020. 2) Recent SoA LP relaxation methods, referred to as kPoly (G. Singh, T. Gehr, M. Püschel, and M. Vechev. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL):41, 2019), OptC2V (C. Tjandraatmadja, R. Anderson, J. Huchette, W. Ma, K. Patel, and J. Vielma. The convex relaxation barrier, revisited: Tightened single-neuron relaxations for neural network verification. In NeurIPS20, 2020; hereafter referred to as [Tjandraatmadja et al., 2020]), IBP (Gowal, S., Dvijotham, K. D., Stanforth, R., Bunel, R., Qin, C., Uesato, J., Arandjelovic, R., Mann, T., and Kohli, P. Scalable verified training for provably robust image classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (IEEE/CVF19), pp. 4842–4851, 2019) and PRIMA (Müller, M. N., Makarchuk, G., Singh, G., Püschel, M., and Vechev, M. PRIMA: Precise and general neural network certification via multi-neuron convex relaxations. arXiv preprint arXiv:2103.03638v2, 2021; hereafter referred to as [Müller et al., 2021]). 3) Complete methods: β-CROWN (Wang, S., Zhang, H., Xu, K., Lin, X., Jana, S., Hsieh, C.-J., and Kolter, J. Z. Beta-CROWN: Efficient bound propagation with per-neuron split constraints for complete and incomplete neural network verification. arXiv preprint arXiv:2103.06624v1, 2021), MILP (Tjeng, V., Xiao, K., and Tedrake, R. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations (ICLR19), 1–21, 2019) and AI2 (Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., and Vechev, M. AI2: Safety and robustness certification of neural networks with abstract interpretation. In IEEE Symposium on Security and Privacy (SP18), 3–18. IEEE, 2018).

In a first experiment, illustrated in Figure 8, the standard robustness verification problem for image classifiers is addressed: given a correctly classified image, verify that the NN returns the same label for all inputs within an ℓ∞ perturbation of radius є around the image. Formally, given an image x̄ with a label y and a radius є, a neural network is verified to be robust on the perturbation ball around x̄ if (2) is positive for all inputs x with ||x − x̄||∞ ≤ є. For LP- and SDP-based relaxation methods, we solve (2) multiple times, once for every potential adversarial target label, and check whether the lower bound is positive.
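The per-target certification loop just described can be sketched as follows. The helper lower_bound_margin is a hypothetical placeholder for solving (2) with the chosen relaxation (LP, SDP-IP, LayerSDP or FastSDP), and the ten-class setting reflects MNIST.

```python
def verify_image(image, true_label, num_classes, eps, lower_bound_margin):
    """Certify one image: robust on the eps-ball iff, for every adversarial
    target label, the relaxation's lower bound on the margin between the true
    label and that target is positive (i.e. problem (2) has a positive bound).
    """
    for target in range(num_classes):
        if target == true_label:
            continue
        lb = lower_bound_margin(image, true_label, target, eps)
        if lb <= 0:
            return False      # this target cannot be excluded: not certified
    return True               # every adversarial target excluded: verified robust

# Verified accuracy over a test set is then the fraction of correctly
# classified images for which verify_image(...) returns True.
```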
We consider the formulation (7), originally proposed in [Raghunathan et al., 2018], and our SDP formulations from (9), (14), and (16) for the experiments. Note that (9) and (14) are equivalent. In these results, (9) and (14) are referred to as LayerSDP, and the relaxed version (16) as FastSDP. The standard LP relaxation (4) is also included as a benchmark. The formulation (7) is denoted SDP-IP. The lower and upper bounds for the nodes of the network were computed using a symbolic interval propagation algorithm. To obtain an upper bound on the verified accuracy, a projected gradient descent (PGD) algorithm is run. For numerical computation, the convex relaxations are converted into standard conic optimisation form before being passed to a numerical solver; an automatic transformation from the convex relaxations into standard conic form was implemented. The resulting LPs/SDPs were then solved by MOSEK (see ApS Mosek. The MOSEK optimization toolbox for MATLAB manual, 2015). The time reported by MOSEK is presented for comparison.
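In their simplest form, such per-node bounds can be obtained with plain interval arithmetic, propagated layer by layer. The sketch below illustrates this simpler variant; the symbolic interval propagation used in the experiments is tighter, but follows the same layer-by-layer pattern.

```python
import numpy as np

def interval_bounds(weights, biases, x_lower, x_upper):
    """Propagate elementwise lower/upper bounds through a fully connected
    ReLU network using interval arithmetic.

    weights, biases: per-layer parameters as numpy arrays.
    x_lower, x_upper: bounds on the input (e.g. the eps-ball around an image).
    Returns per-layer pre-activation lower and upper bounds.
    """
    lowers, uppers = [], []
    l, u = x_lower, x_upper
    for W, b in zip(weights, biases):
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        pre_l = W_pos @ l + W_neg @ u + b   # worst case, sign by sign
        pre_u = W_pos @ u + W_neg @ l + b
        lowers.append(pre_l)
        uppers.append(pre_u)
        l, u = np.maximum(pre_l, 0.0), np.maximum(pre_u, 0.0)  # ReLU clamp
    return lowers, uppers
```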
The neural networks considered comprised eight fully connected ReLU networks trained on the MNIST dataset. To facilitate the comparison with existing tools, the experiments were divided into three groups: 1) one self-trained NN with two hidden layers, each having 64 neurons, trained without adversarial training, for which the perturbation radius є was varied from 0.01 to 0.05; 2) three NNs from [Raghunathan et al., 2018]: MLP-SDP, MLP-LP, and MLP-Adv; and 3) four deep NNs from G. Singh, T. Gehr, M. Püschel, and M. Vechev. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL):41, 2019. For each network, the first 100 images of the MNIST test set were verified, and those incorrectly classified were excluded. The experiments were performed on an Intel(R) i9-10850K CPU 3.60 GHz machine with 32 GB of RAM, except for SDP-FO, which was carried out on an Intel i7-1065G7 with 15 GB RAM.

Figure 8 reports the verified accuracy for the 64×2 network with different perturbation radii є using different verifiers: LayerSDP and FastSDP, SDP-IP, and standard LP. As expected, the LayerSDP approach achieves higher verified robust accuracy than SDP-IP and LP across the different values of є. Interestingly, SDP-IP verified fewer images than the standard LP relaxation for certain values of є, while also requiring longer runtimes. This indicates that such behaviour persists in practical NN verification, confirming the tightness of LayerSDP. Furthermore, a combination of inactive neuron pruning and layer decomposition made LayerSDP and FastSDP two orders of magnitude faster to solve than SDP-IP. We observe that SDP-FO verified fewer images than SDP-IP using similar computational time.

The results in Table 1 demonstrate that LayerSDP is also much faster than SDP-IP, while being more precise than the LP baseline across the networks considered. Furthermore, for the robustly trained NNs (MLP-Adv, MLP-SDP, MLP-LP), LayerSDP achieved a very good verified accuracy compared to PGD, matching the PGD bound for MLP-SDP and MLP-LP. Compared to the SoA LP-based methods kPoly and OptC2V, LayerSDP significantly improved the verified accuracy for the 6×100 and 6×200 networks, while remaining competitive for the other two networks. The results suggest that the linear cuts in kPoly and OptC2V could potentially be combined with LayerSDP to obtain an even stronger relaxation.
Table 1 (reproduced as an image in the original document): verified accuracy and runtime per image of each verification method on the networks considered.
Time is reported as runtime per image (in seconds). †: These results are taken from previously reported values; dashes (–) indicate that previously reported numbers are unavailable. The verified accuracies obtained with this implementation of SDP-FO were slightly lower than previously reported numbers due to different hyper-parameters; SDP-FO failed to verify any instance within the maximum number of iterations for the 6×100, 9×100, 6×200 and 9×200 models. ∗: To facilitate the comparison of time consumption, SDP-IP was run over three images for these networks on the experimental equipment and an average time was taken.

Two further sets of experiments were carried out to evaluate the precision and scalability of the Layer RLT-SDP relaxation (22) as well as Algorithm 0. These two experiments were run on a Linux machine with an Intel i9-10920X 3.5 GHz 12-core CPU and 128 GB RAM. The optimisation problems were modelled using YALMIP (Lofberg, J. 2004. YALMIP: A toolbox for modeling and optimization in MATLAB. In IEEE International Conference on Robotics and Automation (ICRA04), 284–289. IEEE) and solved using MOSEK (Andersen, E. D., and Andersen, K. D. 2000. The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm. In High performance optimization, 197–232. Springer). The results obtained are compared against presently available SoA methods and tools.

The first of the two experiments, illustrated in Figures 9A and 9B, evaluates the efficacy of the implementation strategy. Two groups of two-input, two-output, fully-connected random ReLU NNs, generated using the method in (Fazlyab, M., Morari, M., and Pappas, G. J. 2020. Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite programming. IEEE Transactions on Automatic Control, doi:10.1109/TAC.2020.3046193; referred to as [Fazlyab, Morari, and Pappas 2020]), are considered. Group 1 had four models with L = 4, 6, 8, 10 hidden layers, respectively, and 15 neurons for each hidden layer. Group 2 had four three-layer models, with ni = 10, 15, 50, 100 neurons per hidden layer, respectively. Both the network depth and width are investigated by using RLT-SDP (hereafter used interchangeably with Layer RLT-SDP) to obtain an over-approximation of the feasible output region of the neural network for a given input set. The test inputs were random values within [0, 1], and the heuristic method in [Fazlyab, Morari, and Pappas 2020] was adopted to compute the over-approximations. Algorithm 0 was run with a prescribed sequence of linear-cut percentages and a maximum iteration count kmax.
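A plausible reading of how Algorithm 0 is used in these experiments is sketched below: the relaxation is re-solved with an increasing fraction of RLT linear cuts, for at most kmax solves, stopping early once the desired bound is obtained. The helper solve_rlt_sdp and the early-stopping criterion are illustrative assumptions; the precise algorithm is as defined earlier in the description.

```python
def run_algorithm0(cut_fractions, k_max, solve_rlt_sdp):
    """Solve the Layer RLT-SDP relaxation (22) with an increasing fraction of
    linear cuts, up to k_max additional solves, stopping early once a positive
    bound (a verified property) is obtained.

    solve_rlt_sdp(p) is a hypothetical helper that builds and solves (22) with
    a fraction p of the RLT linear cuts and returns the resulting bound;
    p = 0 recovers LayerSDP.
    """
    bound = solve_rlt_sdp(0.0)            # p1 = 0: plain LayerSDP
    if bound > 0:
        return True, bound
    for k, p in enumerate(cut_fractions, start=1):
        if k > k_max:
            break
        bound = solve_rlt_sdp(p)          # tighten with a larger share of cuts
        if bound > 0:
            return True, bound
    return False, bound

# e.g. the configuration used later for Table 2: run_algorithm0([0.1, 0.2], 2, solve_rlt_sdp)
```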
Without linear cuts (p1 = 0), RLT-SDP is equivalent to LayerSDP. We first studied the impact of network depth on the proposed verification method by using the models in Group 1. Figure 9A shows over-approximations of the feasible output region obtained by solving RLT-SDP with different percentages of linear cuts (percentages indicated in the legend) for networks with different numbers of hidden layers L; the 0% case is LayerSDP. For all four models considered, adding a larger percentage of linear cuts yields a tighter over-approximation. As the number of hidden layers L increases, LayerSDP becomes looser and the effect of adding linear cuts becomes more significant. The figures show that, across all models, even using just 20% of the linear cuts considerably reduces the over-approximation. To further analyse the gain in approximation versus the corresponding increase in computational complexity, we considered two metrics: the improvement in approximation (or tightness) and the runtime increase. The former is the relative reduction in the feasible output region obtained by RLT-SDP compared with LayerSDP; the latter is the relative increase in their runtime. Figure 9B shows the tightness improvement and runtime increase obtained by solving RLT-SDP with different percentages of linear cuts for networks with different numbers of hidden layers L; again, the 0% case is LayerSDP. As expected, Figure 9B illustrates that adding a larger proportion of linear cuts yields a tighter over-approximation, along with an increase in runtime. Adding the same percentage of linear cuts leads to a more significant tightness improvement on larger networks (with larger L) than on smaller ones. For each network, as the percentage of linear cuts increases, the tightness improvement becomes less significant, while the runtime increase becomes more significant. In particular, it was found experimentally that the first 20% of linear cuts contributes most significantly to the improvement in the overall tightness of the method. We evaluated the impact of network width by using the models in Group 2 and observed very similar behaviour of the method. These experimental results clearly confirm the expected tightness relations and demonstrate the efficiency of Algorithm 0. Further, the addition of 20% of the linear cuts can be sufficient to considerably improve the precision of the SDP approach without incurring the higher computational costs associated with larger problems.

In the second of the two experiments, RLT-SDP is compared to SoA methods. Three groups of fully connected ReLU neural networks trained on the MNIST dataset are considered:
• Small NNs: MLP-Adv, MLP-LP and MLP-SDP from [Raghunathan et al., 2018], tested under the same perturbation є = 0.1 as in [Raghunathan et al., 2018] and the experiment illustrated in Figure 8.
• Medium NNs: models 6 × 100 and 9 × 100 from (Singh, G., Gehr, T., Püschel, M., and Vechev, M. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL):41:1–41:30, 2019; hereafter referred to as [Singh et al., 2019a]), evaluated under the same є = 0.026 and є = 0.015 as in [Singh et al., 2019a], [Tjandraatmadja et al., 2020], [Müller et al., 2021] and the experiment illustrated in Figure 8.
• Large NNs: models 8 × 1024-0.1 and 8 × 1024-0.3 from (Li, L., Qi, X., Xie, T., and Li, B. SoK: Certified robustness for deep neural networks. arXiv preprint arXiv:2009.04131, 2020; hereafter referred to as [Li et al., 2020]), which were trained using CROWN-IBP (Zhang, H., Chen, H., Xiao, C., Gowal, S., Stanforth, R., Li, B., Boning, D., and Hsieh, C.-J. 2019. Towards stable and efficient training of verifiably robust neural networks. arXiv preprint arXiv:1906.06316) with adversarial attack є = 0.1 and 0.3, respectively. As in [Li et al., 2020], they were tested under the perturbations є = 0.1 and 0.3, respectively.
To evaluate the efficiency of the proposed RLT-SDP method against the SoA, we benchmarked the technique on the neural networks trained on the MNIST dataset described above. All experiments were run on the first 100 images of the dataset. The results obtained are reported in Table 2, where the runtime is the solver time. The PGD upper bounds of MLP-Adv, MLP-LP, MLP-SDP, 6 × 100 and 9 × 100 are reiterated from Table 1, while those of 8 × 1024-0.1 and 8 × 1024-0.3 are from [Li et al., 2020]. Motivated by the experiment illustrated in Figures 9A and 9B, we ran Algorithm 0 with the sequence {0.1, 0.2} and kmax = 2. As in LayerSDP, we further optimised RLT-SDP by removing inactive neurons in the first step. The results show that RLT-SDP based on the interval arithmetic bounds is more precise than LayerSDP under the same bounds and than all other baseline methods for all the networks, with one exception: the 9 × 100 network, for which β-CROWN achieves the highest precision. By using the tighter symbolic bound propagation (Botoeva, E., Kouvaros, P., Kronqvist, J., Lomuscio, A., and Misener, R. 2020. Efficient verification of neural networks via dependency analysis. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI20), 3291–3299), RLT-SDP significantly outperformed all the incomplete and complete baseline methods. As expected, we found RLT-SDP to be significantly more computationally demanding than LayerSDP across all the networks; however, it was still faster than SDP-IP for MLP-Adv, MLP-LP, MLP-SDP, 6×100 and 9×100. Neither SDP-IP nor SDP-FO could verify the large networks 8 × 1024-0.1 and 8 × 1024-0.3, and SDP-FO failed to verify 6 × 100 and 9 × 100. These results confirm that RLT-SDP remains competitive in terms of computational efficiency. We note that the runtime of LayerSDP in Table 2 is larger than that reported in Table 1.
This is because we directly solved the layer SDP relaxation (14), without implementing SparseCoLO (Fujisawa, K., et al. 2009. User's manual for SparseCoLO: Conversion methods for sparse conic-form linear optimization problems. Dept. of Math. and Comp. Sci., Japan, Tech. Rep., 152–8552) or the automatic model transformation used for Table 1. While these methods reduce computation times, they are equally applicable to RLT-SDP. Hence, the results presented in Table 2 provide a like-for-like comparison between LayerSDP and RLT-SDP.
Table 2 (reproduced as an image in the original document): certified robustness and solver runtime of RLT-SDP and the baseline methods on the tested networks.
†: Results taken from previous reports or from Table 1. -: Previously reported number unavailable. ⋄: The method fails to verify any instance. *: The runtime is estimated by running over five images using the same interval arithmetic bounds. |: The certified robustness values on the left and right are obtained using interval arithmetic bounds and symbolic interval propagation, respectively.

In a final experiment, illustrated in Figure 10, we compare the iterative method described in Figure 6 for solving the non-convex semidefinite programming problem (24), hereafter referred to as IterSDP, against the SoA incomplete methods for verification: β-CROWN, LP, IBP, OptC2V, LayerSDP formulated in (14), SDP-IP and SDP-FO. The experiments were conducted on a Linux machine running an Intel i9-10920X 3.5 GHz 12-core CPU with 128 GB RAM. The optimisation problems were modelled using the toolbox YALMIP (Lofberg, J. YALMIP: A toolbox for modeling and optimization in MATLAB. In IEEE International Conference on Robotics and Automation (ICRA04), pp. 284–289. IEEE, 2004) and solved using the SDP solver MOSEK (Andersen, E. D. and Andersen, K. D. The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm. In High performance optimization, pp. 197–232. Springer, 2000). To run Algorithm 2, suitable values of its parameters are chosen.

IterSDP is evaluated on several fully-connected ReLU NNs trained on the MNIST dataset (where "m × n" means a NN with m − 1 hidden layers, each having n neurons): 1) one 3 × 50 network, self-trained with no adversarial training, tested with perturbation radius є from 0.01 to 0.09; 2) the three small networks MLP-Adv, MLP-LP, and MLP-SDP from [Raghunathan et al., 2018], tested under the same perturbation є = 0.1 as in [Raghunathan et al., 2018] and the experiment illustrated in Figure 8; 3) a medium size network 6 × 100 from [Singh et al., 2019a], evaluated under the same perturbation as in [Singh et al., 2019a], [Müller et al., 2021] and the experiment illustrated in Figure 8; and 4) two large networks 8 × 1024-0.1 and 8 × 1024-0.3 from [Li et al., 2020], which were trained using CROWN-IBP (Zhang, H., Chen, H., Xiao, C., Gowal, S., Stanforth, R., Li, B., Boning, D., and Hsieh, C.-J. Towards stable and efficient training of verifiably robust neural networks. arXiv preprint arXiv:1906.06316, 2019) with adversarial attack є = 0.1 and 0.3, respectively. As in [Li et al., 2020], they were tested under perturbations є = 0.1 and 0.3, respectively.
To ensure a fair comparison, the proposed IterSDP method and re-implementations of the baseline methods LayerSDP, SDP-IP and LP were run using the same interval arithmetic bounds. All experiments were run on the first 100 images of the dataset.

Figure 10 shows the computational results for the 3 × 50 network under different perturbation radii є using the methods IterSDP, LP, SDP-IP and LayerSDP. The IterSDP method outperforms the baselines across all є values, which confirms the expected tightness relation between the relaxations. Notably, IterSDP improves the verified robustness up to the PGD bounds for several values of є. IterSDP requires more runtime (about twice as much) than LayerSDP, but is still computationally cheaper than SDP-IP. This is expected, since Algorithm 2 uses LayerSDP for initialisation and solves an auxiliary SDP whose size is similar to the layer SDP relaxation.

Table 3 reports the verified robustness (percentage of images that are verified to be robust) and runtime (average solver time for verifying an image) for each method. The PGD upper bounds of MLP-Adv, MLP-LP, MLP-SDP and 6 × 100 are reiterated from Table 1 for direct comparison, while those of 8 × 1024-0.1 and 8 × 1024-0.3 are from [Li et al., 2020]. The results show that IterSDP is more precise than LayerSDP under the same bounds and than all other baseline methods for all the networks. One exception is the MLP-LP network, for which IterSDP, LayerSDP, SDP-IP and LP all reach the PGD upper bound. Remarkably, IterSDP increases the number of verified instances by 20% for the 6 × 100 network. For all the other networks, IterSDP verified a number of cases close to or equal to the PGD upper bound. It is also worth mentioning that IterSDP outperforms the SoA complete methods MILP and AI2 according to the numbers reported in [Li et al., 2020]: MILP verified 67% (respectively, 7%) of instances for 8 × 1024-0.1 (respectively, 8 × 1024-0.3), and AI2 verified 52% (respectively, 16%). As expected, IterSDP needs more runtime (around twice as much) than LayerSDP across all the networks; however, it was still faster than SDP-IP for MLP-Adv, MLP-LP, MLP-SDP and 6 × 100. Neither SDP-IP nor SDP-FO could verify the large networks 8 × 1024-0.1 and 8 × 1024-0.3. These results confirm that the proposed IterSDP significantly improves the verification precision, whilst retaining competitive computational efficiency.
Table 3 (reproduced as an image in the original document): verified robustness and average runtime per image of IterSDP and the baseline methods on the tested networks.
Time is reported as runtime per image (in seconds). *: These results are obtained by re-implementing the corresponding methods based on the same interval arithmetic bounds as our method. †: These numbers are directly taken from previously reported values. -: Previously reported numbers are unavailable. ⋄: The methods fail to verify any instance.
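Throughout these experiments, the PGD upper bound on verified accuracy comes from running a projected gradient descent attack: if PGD finds a misclassified point inside the є-ball, that image cannot be verified robust by any sound method. A minimal sketch of such an attack is given below; the use of PyTorch and the step size and iteration count are illustrative assumptions, not the settings of the reported experiments.

```python
import torch

def pgd_attack(model, x, y, eps, alpha=0.01, steps=40):
    """Minimal l-infinity PGD attack: ascend the cross-entropy loss and project
    back onto the eps-ball around x and the valid image range [0, 1].
    """
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                   # ascend the loss
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps) # project onto the eps-ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)                  # keep a valid image
    return x_adv

# An image counts towards the PGD upper bound only if
# model(pgd_attack(model, x, y, eps)).argmax(dim=1) still equals y.
```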

Claims

1. A method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining, for each layer of the network, a semidefinite constraint from the algebraic constraints for that layer; determining a set of interlayer constraints which constrain outputs of one or more of the layers to corresponding inputs of one or more adjacent layers; applying a semidefinite programming relaxation subject to the semidefinite constraints and the interlayer constraints across the range of inputs; and based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
2. A method according to claim 1, wherein the set of interlayer constraints constrain all outputs of one or more of the layers to corresponding inputs of one or more adjacent layers.
3. A method according to claim 1, wherein the set of interlayer constraints constrain a subset of outputs of one or more of the layers to corresponding inputs of one or more adjacent layers.
4. A method according to any one of the preceding claims, further comprising determining one or more initial linear constraints based on a linear approximation of an activation function for one or more nodes of the neural network, wherein the applying a semidefinite programming relaxation is further subject to the one or more initial linear constraints.
5. A method according to any one of the preceding claims, further comprising determining, for each layer of the network, one or more further linear constraints based on an upper bound and a lower bound for each of two nodes from the network, wherein a first node is from the layer of the network and a second node is either from the layer of the network or from a layer of the network adjacent to the layer of the network, expressing the one or more further linear constraints as an upper bound and a lower bound for elements of a matrix representation of the layer of the network, and wherein the applying a semidefinite programming relaxation is further subject to the one or more further linear constraints.
6. A method according to claim 5 when dependent on claim 4, wherein determining, for each layer of the network, one or more further linear constraints expressed as an upper bound and a lower bound for elements of the matrix representation comprises calculating the upper and lower bounds given the range of inputs of the neural network and the one or more initial linear constraints.
7. A method according to any one of the preceding claims, further comprising determining, for each layer of the network, a non-linear constraint from the algebraic constraints for that layer, wherein the applying a semidefinite programming relaxation is further subject to the non-linear constraint for each layer of the network.
8. A method according to claim 7, wherein an objective value of the semidefinite programming relaxation determines the outcome of the semidefinite programming relaxation; and the objective value of the semidefinite programming relaxation is monotonically approached by an objective value sequence that converges to the objective value of the semidefinite programming relaxation, wherein a starting point of the objective value sequence is an objective value of the semidefinite programming relaxation not subject to the non-linear constraint for each layer of the network; and the objective value sequence is determined iteratively by solving an auxiliary convex semidefinite programming problem recursively, wherein a current objective value of the objective value sequence determined at an iteration is sequential to the objective values of the objective value sequence determined in prior iterations, wherein a current objective value of the auxiliary convex semidefinite programming problem is an objective value of the auxiliary convex semidefinite programming problem at the iteration.
9. A method according to claim 8, wherein the objective value of the auxiliary convex semidefinite programming problem is always greater than or equal to zero; and the objective value of the auxiliary convex semidefinite programming problem is equal to zero when the non-linear constraint for each layer of the network is satisfied.
10. A method according to any one of the preceding claims, further comprising removing terms associated with nodes which are inactive across the range of inputs from the semidefinite constraints.
11. A method according to any one of the preceding claims, wherein the semidefinite constraints comprise positive semidefinite constraints.
12. A method according to any one of the preceding claims, wherein the neural network is a feed forward neural network.
13. A method according to any one of the preceding claims, wherein the nodes of the neural network apply a Rectified Linear Unit (ReLU) activation function.
14. A method according to any of the preceding claims, wherein the neural network is an image processing neural network which takes an image as input.
15. A method according to any of claims 1-13, wherein the neural network is a controller neural network for controlling a physical device.
16. A computer program product comprising computer executable instructions which, when executed by one or more processors, cause the one or more processors to carry out the method of any one of the preceding claims.
17. A perception system comprising one or more processors configured to carry out the method of any one of claims 1 to 15.
PCT/EP2022/063919 2021-05-21 2022-05-23 Verifying neural networks WO2022243570A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB2107304.4 2021-05-21
GBGB2107304.4A GB202107304D0 (en) 2021-05-21 2021-05-21 Verifying neural networks

Publications (1)

Publication Number Publication Date
WO2022243570A1 true WO2022243570A1 (en) 2022-11-24

Family

ID=76637762

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2022/063919 WO2022243570A1 (en) 2021-05-21 2022-05-23 Verifying neural networks

Country Status (2)

Country Link
GB (1) GB202107304D0 (en)
WO (1) WO2022243570A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023250435A1 (en) * 2022-06-22 2023-12-28 Ntt Research, Inc. Remote execution verification with reduced resource requirements

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
A. RAGHUNATHAN; J. STEINHARDT; P. LIANG: "Semidefinite relaxations for certifying robustness to adversarial examples", NeurIPS18, 2018, pages 10877-10887
BOTOEVA, E.; KOUVAROS, P.; KRONQVIST, J.; LOMUSCIO, A.; MISENER, R.: "Efficient verification of neural networks via dependency analysis", In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI20), 2020, pages 3291-3299, XP055827255, DOI: 10.1609/aaai.v34i04.5729
K. JULIAN; J. LOPEZ; J. BRUSH; M. OWEN; M. KOCHENDERFER: "Policy compression for aircraft collision avoidance systems", Proceedings of the 35th Digital Avionics Systems Conference (DASC16), 2016, pages 1-10, XP033019348, DOI: 10.1109/DASC.2016.7778091
LOFBERG, J.: "YALMIP: A toolbox for modeling and optimization in MATLAB", IEEE International Conference on Robotics and Automation (ICRA04), IEEE, 2004, pages 284-289
SUMANTH DATHATHRI ET AL: "Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 22 October 2020 (2020-10-22), XP081793254 *

Also Published As

Publication number Publication date
GB202107304D0 (en) 2021-07-07

Similar Documents

Publication Publication Date Title
Gordaliza et al. Obtaining fairness using optimal transport theory
Ghodsi et al. Safetynets: Verifiable execution of deep neural networks on an untrusted cloud
Xu et al. Optimization of graph neural networks: Implicit acceleration by skip connections and more depth
Nilsson et al. Synthesis of separable controlled invariant sets for modular local control design
Mathiesen et al. Safety certification for stochastic systems via neural barrier functions
Huang et al. Quantifying epistemic uncertainty in deep learning
Pfrommer et al. TaSIL: Taylor series imitation learning
Gurevin et al. Enabling retrain-free deep neural network pruning using surrogate lagrangian relaxation
WO2022243570A1 (en) Verifying neural networks
KR20220083833A (en) Systems and methods with robust deep generation models
Ngo et al. Adaptive anomaly detection for internet of things in hierarchical edge computing: A contextual-bandit approach
Guo et al. Eager falsification for accelerating robustness verification of deep neural networks
Mohan et al. Structure in reinforcement learning: A survey and open problems
Dushatskiy et al. A novel surrogate-assisted evolutionary algorithm applied to partition-based ensemble learning
Cyr et al. Multilevel initialization for layer-parallel deep neural network training
Quindlen et al. Active sampling-based binary verification of dynamical systems
US20210182631A1 (en) Classification using hyper-opinions
Al-Hyari et al. An adaptive analytic FPGA placement framework based on deep-learning
US11494634B2 (en) Optimizing capacity and learning of weighted real-valued logic
Lechner et al. Quantization-aware interval bound propagation for training certifiably robust quantized neural networks
Cai et al. Ensemble-in-One: Learning Ensemble within Random Gated Networks for Enhanced Adversarial Robustness
Smolensky Overview: Computational, Dynamical, and Statistical Perspectives on the Processing and Learning Problems in Neural Network Theory
Newton et al. Rational Neural Network Controllers
US20220366226A1 (en) Methods and systems for compressing a trained neural network and for improving efficiently performing computations of a compressed neural network
Tan et al. Weighted neural tangent kernel: A generalized and improved network-induced kernel

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22732429

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22732429

Country of ref document: EP

Kind code of ref document: A1