WO2022243570A1 - Verifying neural networks - Google Patents
- Publication number
- WO2022243570A1 (PCT/EP2022/063919)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Definitions
- The present disclosure relates to the verification of the consistency of the output of neural networks under variations to the input.
- In particular, the present disclosure provides techniques for verifying the reliability of a neural network for the classification of objects in sensor data, such as image data.
- Background: Autonomous systems are forecast to revolutionise key aspects of modern life, including mobility, logistics, and beyond. While considerable progress has been made on the underlying technology, severe concerns remain about the safety and security of the autonomous systems under development.
- One of the difficulties with forthcoming autonomous systems is that they incorporate complex components that are not programmed by engineers but are synthesised from data via machine learning methods, such as neural networks. Neural networks have been shown to be particularly sensitive to variations in their input.
- For example, neural networks currently used for image processing have been shown to be vulnerable to adversarial attacks, in which the behaviour of a neural network can easily be manipulated by a minor change to its input, for example by presenting an “adversarial patch” in a small portion of the field of view of the image.
- This is a serious concern for autonomous systems comprising neural networks in safety-critical areas, such as autonomous vehicles.
- A network is said to be transformationally robust at a given input under a class of transformations if its output remains within a specified tolerance (e.g. one small enough to not cause a change in predicted class) when the input is subjected to any transformation in the class.
- For example, safeguards on acceptable behaviour of the ACAS XU unmanned aircraft collision avoidance system have been defined in terms which are equivalent to transformational robustness (K. Julian, J. Lopez, J. Brush, M. Owen and M. Kochenderfer. Policy compression for aircraft collision avoidance systems. In Proceedings of the 35th Digital Avionics Systems Conference (DASC16), pages 1-10, 2016).
- Similarly, acceptable behaviour of image classifiers has been specified in terms of continuing to predict the same class when a particular image input is subjected to transformations which remain within a certain Lp-distance, or subjected to a certain class of affine and/or photometric transformations. Transformations may also include, for example: white noise changes to a given input (defined by an epsilon ball for the infinity norm); white noise changes to a given input given by any box constraints on some or all of the input dimensions; or any linear or non-linear transformation of the given input governed by a modification of the input described by a mathematical function or an algorithm.
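For concreteness, the first two perturbation classes above (the epsilon ball for the infinity norm and per-dimension box constraints) can be expressed as simple membership tests. This is an illustrative sketch, not anything prescribed by the disclosure:

```python
def in_linf_ball(x, x0, eps):
    """True if x lies within the epsilon ball (infinity norm) around x0."""
    return max(abs(a - b) for a, b in zip(x, x0)) <= eps

def in_box(x, lower, upper):
    """True if x satisfies per-dimension box constraints lower <= x <= upper."""
    return all(l <= a <= u for a, l, u in zip(x, lower, upper))

x0 = [0.5, 0.2, 0.8]
x = [0.52, 0.18, 0.83]
print(in_linf_ball(x, x0, eps=0.05))  # True: every component moved by at most 0.05
print(in_box(x, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))  # True: x stays inside the unit box
```

A verification method must certify the network's behaviour over *every* point of such a region, not merely over sampled points.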
- Current methods for NN verification can be categorized into complete and incomplete approaches. Aside from computational considerations, complete approaches are guaranteed to resolve any verification query.
- Incomplete approaches are normally based on various forms of convex approximations of the network and only guarantee that whenever they output that the network is safe, then that is indeed the case. While this typically enables faster computation, the looser this approximation is, the more likely it is that the method may not be able to verify the problem instance. As a result, the present objective in incomplete methods is the development of tighter approximations, which can be efficiently computed, thereby strengthening the efficacy of the methods in answering the verification problem.
- Proposed complete methods include those based on mixed-integer linear programming (MILP), satisfiability modulo theories or bound propagation techniques coupled with input refinement. While these methods offer theoretical termination guarantees, at present they do not scale to the network sizes that incomplete approaches are able to address.
- A method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining, for each layer of the network, a semidefinite constraint from the algebraic constraints for that layer; determining a set of interlayer constraints which constrain outputs of one or more of the layers to corresponding inputs of one or more adjacent layers; applying a semidefinite programming relaxation subject to the semidefinite constraints and the interlayer constraints across the range of inputs; and, based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
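The claimed steps can be summarised as a control-flow sketch. All names below (`build_layer_psd`, `build_interlayer`, `solve_sdp`) are hypothetical placeholders for the components described above, not identifiers from the disclosure:

```python
def verify_network(layers, build_layer_psd, build_interlayer, solve_sdp):
    """Layer-wise verification pipeline: one semidefinite block per layer,
    coupled by interlayer constraints, resolved by a single SDP relaxation."""
    constraints = [build_layer_psd(layer) for layer in layers]  # one PSD block per layer
    constraints += build_interlayer(layers)  # couple each layer's outputs to the next layer's inputs
    gamma = solve_sdp(constraints)
    # A non-negative objective certifies robustness over the whole input range;
    # a negative value leaves the instance unverified (the network may still be robust).
    return "robust" if gamma >= 0 else "unverified"

# Toy stand-ins, purely to exercise the control flow.
layers = ["layer1", "layer2", "layer3"]
demo = verify_network(
    layers,
    build_layer_psd=lambda layer: ("psd", layer),
    build_interlayer=lambda ls: [("link", a, b) for a, b in zip(ls, ls[1:])],
    solve_sdp=lambda constraints: 0.25,  # pretend the solver returns gamma = 0.25
)
print(demo)  # robust
```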
- In this way, the dimensionality of the constraint may be significantly reduced, thereby reducing the computing resources required to apply the semidefinite programming relaxation.
- The interlayer constraints, meanwhile, may help to ensure that the interactions between layer outputs and inputs are properly modelled at the same time.
- In some embodiments, the set of interlayer constraints constrains all outputs of one or more of the layers to corresponding inputs of one or more adjacent layers.
- In other embodiments, the set of interlayer constraints constrains a subset of the outputs of one or more of the layers to corresponding inputs of one or more adjacent layers.
- the method further comprises determining one or more initial linear constraints based on a linear approximation of an activation function for one or more nodes of the neural network, wherein the applying a semidefinite programming relaxation is further subject to the one or more initial linear constraints.
- In some embodiments, the method further comprises determining, for each layer of the network, one or more further linear constraints based on an upper bound and a lower bound for each of two nodes from the network, wherein a first node is from the layer of the network and a second node is either from the same layer or from an adjacent layer of the network, expressing the one or more further linear constraints as an upper bound and a lower bound for elements of a matrix representation of the layer of the network, and wherein the applying a semidefinite programming relaxation is further subject to the one or more further linear constraints.
- In some embodiments, determining, for each layer of the network, the one or more further linear constraints expressed as an upper bound and a lower bound for elements of the matrix representation comprises calculating the upper and lower bounds given the range of inputs of the neural network and the one or more initial linear constraints.
- Subjecting the semidefinite programming relaxation to the initial linear constraints can ensure that the semidefinite programming relaxation is tighter than the linear programming relaxation.
- the further linear constraints can tighten the semidefinite programming relaxation compared to semidefinite programming relaxation without the further linear constraints.
- the semidefinite programming relaxation may be subjected to a portion of the further linear constraints to reduce computational cost.
- the semidefinite programming relaxation may be iteratively repeated, wherein at each iteration one or more of the further linear constraints are added to the portion of the further linear constraints the semidefinite programming relaxation is subjected to.
- the method further comprises determining, for each layer of the network, a non-linear constraint from the algebraic constraints for that layer, wherein the applying a semidefinite programming relaxation is further subject to the non-linear constraint for each layer of the network.
- an objective value of the semidefinite programming relaxation determines the outcome of the semidefinite programming relaxation; and the objective value of the semidefinite programming relaxation is monotonically approached by an objective value sequence that converges to the objective value of the semidefinite programming relaxation, wherein a starting point of the objective value sequence is an objective value of the semidefinite programming relaxation not subject to the non-linear constraint for each layer of the network; and the objective value sequence is determined iteratively by solving an auxiliary convex semidefinite programming problem recursively, wherein a current objective value of the objective value sequence determined at an iteration is sequential to the objective values of the objective value sequence determined in prior iterations, wherein a current objective value of the auxiliary convex semidefinite programming problem is an objective value of the auxiliary convex semidefinite programming problem at the iteration.
- the objective value of the auxiliary convex semidefinite programming problem is always greater than or equal to zero; and the objective value of the auxiliary convex semidefinite programming problem is equal to zero when the non-linear constraint for each layer of the network is satisfied.
- Subjecting the semidefinite programming relaxation to the non-linear constraint for each layer can provably tighten the relaxation compared to the semidefinite programming relaxation not subject to the non-linear constraint for each layer.
- each objective value in the objective value sequence may be a tighter solution than the prior objective values in the sequence. The tightest solution of the sequence may be reached when the objective value sequence has converged to the objective value of the semidefinite programming relaxation subject to the non-linear constraints.
- In some embodiments, determining whether the neural network is robust across the range of inputs comprises: determining at each iteration, based on the current objective value of the objective value sequence, whether the neural network is robust across the range of inputs; if the neural network is robust across the range of inputs, providing as the outcome of the semidefinite programming relaxation that the neural network is robust across the range of inputs; if the neural network remains unverified across the range of inputs, determining whether the current objective value of the auxiliary convex semidefinite programming problem is smaller than a predefined value; and, if the current objective value of the auxiliary convex semidefinite programming problem is smaller than the predefined value, providing as the output of the semidefinite programming relaxation that the neural network is not verifiable across the range of inputs.
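The iterative decision procedure described above can be sketched as a loop over the two objective sequences. The sequences passed in are illustrative stand-ins for the patent's solver calls (the monotone relaxation bounds and the matching auxiliary objectives, which are always non-negative and reach zero when the non-linear constraints are satisfied):

```python
def iterate_tightening(gammas, aux_objectives, tol=1e-6):
    """Walk the monotone objective-value sequence until either the bound
    certifies robustness or the auxiliary objective (near-)vanishes, meaning
    the non-linear constraints are satisfied and the bound cannot improve."""
    for gamma, aux in zip(gammas, aux_objectives):
        if gamma >= 0:
            return "robust"          # current bound already certifies the property
        if aux < tol:
            return "not verifiable"  # bound will not improve further
    return "inconclusive"            # iteration budget exhausted

# The bound improves monotonically and finally certifies the property.
print(iterate_tightening([-1.0, -0.4, 0.2], [0.9, 0.5, 0.2]))  # robust
# The auxiliary objective vanishes first: the relaxation cannot improve further.
print(iterate_tightening([-1.0, -0.9], [0.5, 1e-9]))           # not verifiable
```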
- the method further comprises removing terms associated with nodes which are inactive across the range of inputs from the semidefinite constraints.
- the semidefinite constraints comprise positive semidefinite constraints.
- the neural network is a feed forward neural network.
- the nodes of the neural network may apply a Rectified Linear Unit (ReLU) activation function.
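A feed-forward ReLU network of the kind targeted by the method can be sketched in a few lines. The weights below are arbitrary illustrative values, and for brevity this sketch applies ReLU at every layer including the last:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def forward(x, layers):
    """Forward pass of a feed-forward ReLU network.
    layers is a list of (W, b) pairs, with W given as a list of rows."""
    for W, b in layers:
        x = relu([sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
                  for row, b_i in zip(W, b)])
    return x

# Tiny 2-2-1 network with hand-picked weights (illustrative only).
layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, -0.1]),
          ([[1.0, 2.0]], [0.0])]
print(forward([1.0, 0.5], layers))
```

Verification asks whether outputs of this composition stay within given constraints for *every* input in a region, which is hard precisely because each ReLU is piecewise linear.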
- the neural network may be an image processing network which takes an image as input.
- the neural network may be trained for an image classification, object detection, image reconstruction, or other image processing task.
- the network may further be deployed for performing the image processing task, such as the image classification, object detection or image reconstruction task.
- the network may perform the image processing task on an image. In such circumstances, it may be possible to provide guarantees on the appropriateness of the network to perform the image processing task correctly.
- the neural network may be an audio processing network which takes a representation of an audio signal as input.
- the neural network may be trained for a voice authentication, speech recognition, audio reconstruction, or other audio processing task.
- the network may further be deployed for performing the audio processing task, such as the voice authentication, speech recognition or audio reconstruction task.
- the network may perform the audio processing task.
- the input to the neural network may be sensor data such as image data, audio data, LiDAR data, or other data.
- the claimed process may act to improve the ability or reliability of a network in classifying data of this kind.
- In some embodiments, the neural network may be part of an AI system used to evaluate creditworthiness or other risk or financial metrics, taking as input the relevant tabular information used to assess a financial decision.
- the neural network may be trained for credit scoring of applicants for loan purposes.
- the network may further be deployed for the decision making task in question.
- the neural network may be a controller neural network which outputs a control signal for a physical device, such as an actuator.
- the neural network may be trained for controlling a robot, vehicle, aircraft or plant.
- the network may further be deployed for controlling the physical device, such as the actuator, robot, vehicle, aircraft or plant.
- the network may control the physical device.
- Other applications of the method above are in fraud monitoring, medical imaging, optical character recognition and generally whenever guarantees of transformational robustness aid in determining the robustness of the neural model.
- a computer program product comprising computer executable instructions which, when executed by one or more processors, cause the one or more processors to carry out the method of the first aspect.
- a system comprising one or more processors configured to carry out the method of the first aspect.
- A method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining a semidefinite constraint from the algebraic constraints for the network; determining one or more linear constraints based on a linear approximation of an activation function for one or more nodes of the neural network; applying a semidefinite programming relaxation subject to the semidefinite constraints and the linear constraints across the range of inputs; and, based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
- A method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining a semidefinite constraint from the algebraic constraints for the network; determining, for each layer of the network, one or more linear constraints based on an upper bound and a lower bound for each of two nodes from the network, wherein a first node is from the layer of the network and a second node is either from the same layer or from an adjacent layer of the network; expressing the one or more linear constraints as an upper bound and a lower bound for elements of a matrix representation of the layer of the network; applying a semidefinite programming relaxation subject to the semidefinite constraints and the linear constraints across the range of inputs; and, based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
- A method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining a semidefinite constraint from the algebraic constraints for the network; determining one or more non-linear constraints from the algebraic constraints for each layer of the network; applying a semidefinite programming relaxation subject to the semidefinite constraints and the non-linear constraints across the range of inputs; and, based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
- Figure 1 illustrates a set of transformations of an input
- Figure 2 shows a method according to the present disclosure
- Figure 3 illustrates the relative tightness of SDP and LP relaxations
- Figure 4 illustrates the relative tightness of Layer SDP and a subset of RLT-SDP linear constraints
- Figure 5 illustrates the relative tightness of Layer SDP and a subset of RLT-SDP linear constraints
- Figure 6 shows a method according to the present disclosure
- Figure 7 illustrates an example system capable of verifying a neural network
- Figure 8 shows experimental results
- Figures 9A and 9B show experimental results
- Figure 10 shows experimental results.
- the present disclosure is directed to the verification of a neural network and particularly to verifying consistency of neural network output across a range of potential inputs.
- In particular, verification may offer a guarantee that a neural network’s outputs remain within a certain tolerance when a starting input to the neural network is varied across a range.
- Transformations may include, for example: white noise changes to a given input (defined by an epsilon ball for the infinity norm); white noise changes to a given input given by any box constraints on some or all of the input dimensions; or any linear or non-linear transformation of the given input governed by a modification of the input described by a mathematical function or an algorithm.
- the class of transformations may define the perturbations of the input for which the neural network output is to satisfy the output constraints.
- the class of transformations may be defined in terms of a range for each component of the neural network’s input, within which the component is to vary.
- The class of transformations may be defined by a bound on a global metric, such as by defining a maximum value for the l1-distance between the original input and the perturbed input.
- the class of transformations may be specifically adapted to the task for which the network is trained: for example, for a network trained for image recognition, a class of affine or photometric transformations can be defined, for example in the manner described in WO 2020/109774 A1.
- the class of transformations may be specified in terms of a set of algebraic constraints that are satisfied when applying any transformation in the class to the input.
- The input and class of transformations may be chosen such that the input unambiguously belongs to a particular class and the class of transformations defines perturbations small enough that the neural network may be expected not to substantially change its output when the transformations are applied to the input.
- Figure 1 depicts example affine (102-104), photometric (105-106) and random noise (110) transformations applied to an original image (101).
- the transformations may be chosen such that the semantic content of the image is unchanged.
- the set of output constraints define a maximum range within which the outputs of the neural network should vary if the transformational robustness property is to be satisfied.
- any set of algebraic constraints that defines a region within which the neural network’s output should remain can be used as the set of output constraints.
- The set of output constraints may be defined in terms of linear inequalities of the form aᵀy ≥ b, where y is the output of the network, a is a vector of coefficients, and b is a constant.
- the set of output constraints can be defined using the neural network itself; for example, if the network provides for a classification stage, the set of output constraints may correspond to ensuring that the output remains in the same predicted class.
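For instance, a "same predicted class" specification for a classifier reduces to one linear constraint per competitor class. The sketch below is illustrative and uses generic names rather than the patent's notation:

```python
def satisfies(y, a, b):
    """Check one linear output constraint a . y >= b."""
    return sum(ai * yi for ai, yi in zip(a, y)) >= b

def same_class_constraints(num_classes, predicted):
    """One constraint per competitor class k: y[predicted] - y[k] >= 0,
    i.e. a = e_predicted - e_k and b = 0."""
    cons = []
    for k in range(num_classes):
        if k == predicted:
            continue
        a = [0.0] * num_classes
        a[predicted], a[k] = 1.0, -1.0
        cons.append((a, 0.0))
    return cons

logits = [2.1, 0.3, 1.7]
print(all(satisfies(logits, a, b)
          for a, b in same_class_constraints(3, predicted=0)))  # True: class 0 stays on top
```

Verification then requires these inequalities to hold for every input in the perturbation region, not just for the nominal logits shown here.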
- The verification problem may be stated as follows: given a trained network f, a nominal input x̄, a linear function γ on the network's outputs, also called the specification, and a perturbation radius ε, the verification problem is to determine whether γ(f(x)) ≥ 0 (1) holds for all x with ‖x − x̄‖ ≤ ε, where ‖·‖ denotes the standard norm of a vector.
- The network is said to be certifiably robust on input x̄ and perturbation radius ε if the answer to the verification problem (1) is true for all such x.
- The optimal value γ_LP of the resulting linear program (LP) relaxation is relatively easy to compute in practice.
- the semidefinite relaxation utilizes a single positive semidefinite (PSD) constraint that couples all ReLU constraints in (2a) to obtain a convex SDP.
- The ReLU constraints in (2a) are equivalently replaced with the following quadratic constraints:
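The standard quadratic rewriting of y = ReLU(x) consists of y ≥ 0, y ≥ x and the complementarity condition y(y − x) = 0; together these hold exactly when y is the ReLU of x. An illustrative numeric check:

```python
def relu(x):
    return max(0.0, x)

def satisfies_quadratic(x, y, tol=1e-12):
    """y = ReLU(x) iff y >= 0, y >= x and the complementarity y*(y - x) = 0."""
    return y >= -tol and y >= x - tol and abs(y * (y - x)) <= tol

for x in (-2.0, -0.5, 0.0, 0.7, 3.0):
    assert satisfies_quadratic(x, relu(x))
# A point with y strictly between 0 and x violates the constraints and is rejected.
assert not satisfies_quadratic(1.0, 0.5)
print("quadratic ReLU characterisation holds on all samples")
```

It is this exact quadratic form that the lifting matrix P then linearises, so that the only remaining non-convexity is the rank condition that the SDP relaxation drops.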
- Polynomial lifting and SDP-based hierarchies can be used to solve the resulting polynomial optimisation problem.
- a lifting matrix P of monomials can be defined as in Raghunathan et al., 2018. Then, all the constraints in (5) and (6) become linear in terms of the elements of P.
- SDP relaxation of (2): By relaxing the monomial matrix P to be positive semidefinite, we obtain an SDP relaxation of (2) as follows, where the same symbolic indexing P[·] as in Raghunathan et al., 2018 is adopted to index the elements of P.
- Constraints (7a) and (7b) correspond to the ReLU constraints (5);
- constraint (7c) corresponds to the bounds on the activation vectors in (6).
- We denote the optimal value of (7) as γ_SDP,1.
- a method for verifying a neural network which adopts aspects of the above-referenced LP and SDP approaches, but is further improved by additional adaptations.
- data is obtained defining a neural network, range of inputs and set of output constraints to verify.
- The neural network may, for example, be an image classifier network. Such a network may be shown to classify a given image appropriately.
- the range of inputs may represent a region around that input for which it is desired that the output remains within the output constraints.
- the range of inputs may comprise one or more of: white noise variations of an input; geometrical changes of an input; and colour, luminosity, contrast, and/or bias-field transformations of an input.
- At step 220, semidefinite constraints, optionally positive semidefinite constraints, are adopted. However, unlike the SDP process described above, these semidefinite constraints are defined for each layer of the network rather than for the network as a whole. Consequently, significant computational benefits are realised when resolving these constraints. Further details of the definition of the semidefinite constraints are provided below.
- one or more interlayer constraints are defined. These interlayer constraints couple outputs of network layers to corresponding inputs.
- linear cut constraints are defined.
- a linear constraint may provide further constraints to the approximation of the neural network based on the linear behaviour of the nodes within the exclusively activated or inactivated regions. Whereas conventional semidefinite constraints in these regions are approximate, by applying a linear constraint in such regions the overall tightness of the approximation can be improved.
- a linear constraint may capture inter-layer and intra-layer dependencies between two nodes in the same or adjacent layers.
- At step 250, SDP relaxations are applied to solve for the constraints defined in steps 220 to 240, thereby obtaining a minimum value of γ as described above. Where the γ obtained in this manner is equal to or greater than 0, the network can be verified across the range of inputs at step 260. Where γ is less than 0, it is not possible to verify the network (although it is possible that the network is nonetheless robust across the range).
- a verified neural network may be deployed with a degree of certainty for tasks dependent on accurate perception.
- an image classification neural network is used to control a device (such as an autonomous vehicle)
- confidence that its outputs are not adversely affected by transformations such as those reflected in the range of inputs may be important for demonstrating the safety and/or efficacy of the device.
- Further details of the definition of the constraints at steps 220 to 240 are provided below.
- the adoption of linear cuts providing further constraints to the approximation of the neural network based on the linear behaviour of the nodes within the exclusively activated or inactivated regions may further be understood with reference to Figure 3, which illustrates how in certain cases the SDP relaxation in equation (7) (illustrated by the dashed line) may be looser than the LP relaxation in equation (4) (illustrated by the solid line).
- the standard SDP relaxation (7) is inexact even for inactive/stable neurons, while the triangular relaxation becomes exact.
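The triangular relaxation discussed above can be sketched directly. For a neuron with pre-activation bounds l ≤ x ≤ u that straddle zero, the triangle constraints are y ≥ 0, y ≥ x and y ≤ u(x − l)/(u − l); for stable neurons (u ≤ 0 or l ≥ 0) the relaxation collapses to the exact ReLU value. An illustrative sketch:

```python
def triangle_bounds(x, l, u):
    """Lower/upper bound on y = ReLU(x) under the triangular LP relaxation,
    given pre-activation bounds l <= x <= u."""
    if u <= 0:                    # inactive (stable) neuron: y is exactly 0
        return 0.0, 0.0
    if l >= 0:                    # active (stable) neuron: y is exactly x
        return x, x
    lo = max(0.0, x)              # from y >= 0 and y >= x
    hi = u * (x - l) / (u - l)    # chord joining (l, 0) and (u, u)
    return lo, hi

lo, hi = triangle_bounds(0.5, -1.0, 1.0)
print(lo, hi)   # the relaxation leaves slack: 0.5 <= y <= 0.75
lo, hi = triangle_bounds(-0.2, -1.0, -0.1)
print(lo, hi)   # stable inactive neuron: the bound is exact, y = 0
```

This exactness on stable neurons is precisely what the added linear cuts restore to the SDP relaxation.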
- Accordingly, linear cuts based on a linear approximation of an activation function for one or more nodes of the neural network may be introduced into the process as a further set of initial linear constraints at step 240.
- This process comprises extending the relaxation to include the linear cut (4b), thereby tightening the relaxation.
- The cut (4b) may be expressed in terms of the matrix P as follows and added to (7):
- Steps 220 and 230 are also effective to reduce the dimensionality of the PSD constraint in (9). These steps exploit the layer-wise cascading structure of NNs whereby each activation vector of a layer depends only on the previous layer’s activation vector. This can be understood using the equivalent quadratic formulation of (5).
- a layer-based SDP relaxation at step 250 for the verification problem (2) can now be expressed as:
- The layer-based SDP relaxation (14) employs multiple smaller PSD constraints, one for each layer. Smaller PSD constraints in an SDP can be expected to speed up its solution using off-the-shelf solvers.
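The computational motivation can be seen from the block sizes alone. The layer widths below and the pairing of consecutive layers into blocks are illustrative assumptions, not figures from the disclosure:

```python
def global_psd_dim(layer_widths):
    """Side length of the single lifting matrix coupling all activations."""
    return 1 + sum(layer_widths)

def layer_psd_dims(layer_widths):
    """Side lengths of per-layer blocks, each coupling one layer with the
    next (a common layer-wise blocking; illustrative)."""
    return [1 + a + b for a, b in zip(layer_widths, layer_widths[1:])]

widths = [784, 256, 256, 10]   # e.g. a small MNIST-sized classifier
print(global_psd_dim(widths))  # one 1307 x 1307 PSD constraint
print(layer_psd_dims(widths))  # versus blocks of side 1041, 513 and 267
```

Since interior-point SDP solvers scale poorly in the PSD block size, replacing one large block with several smaller ones substantially reduces the per-iteration cost.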
- the solution quality of (14) is equivalent to that of (9). That is to say, given a non-convex NN verification instance (2), we have that .
- the result (14) is hereafter referred to as Layer SDP.
- the efficacy of incomplete NN verification methods depends both on the tightness of the utilized approximations and the computational efficiency of the method.
- the Layer SDP result (14) can be further adapted for computational efficiency and tightness by adding or removing constraints.
- variations (ii) and (iii) may also be applied to the global SDP relaxation formulated in (7) in analogy to their application to Layer SDP.
- further relaxation of Layer SDP may be achieved via dropping equality constraints within the interlayer constraints of result (14).
- the number of equality constraints (13) is quadratic in the number of neurons in each layer.
- an SDP relaxation that uses only a subset of the constraints in (13) may be adopted at step 230.
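The difference in constraint counts is easy to sketch. In the sketch below, the "diagonal" subset is one illustrative choice of subset, not necessarily the selection used by the method:

```python
from itertools import combinations_with_replacement

def full_equalities(n):
    """Index pairs (j, k) for the full set of interlayer equality
    constraints (13) on a layer of n neurons: O(n^2) of them."""
    return list(combinations_with_replacement(range(n), 2))

def diagonal_equalities(n):
    """An illustrative cheap subset: only the (j, j) equalities, O(n)."""
    return [(j, j) for j in range(n)]

n = 64
assert len(full_equalities(n)) == n * (n + 1) // 2   # quadratic growth: 2080
assert len(diagonal_equalities(n)) == n              # linear growth: 64
```

Dropping equality constraints can only enlarge the feasible region, so any such subset still yields a valid (if looser) relaxation.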
- at step 250, another layer-based SDP relaxation may be formed as follows:
- the solution quality of (16) may in some cases be less precise than that of (14), but (16) is faster to solve and still provably tighter than the LP relaxation (4), i.e.,
- one or more further linear constraints capturing inter-layer and intra-layer dependencies between two nodes in the same or adjacent layers are added at step 240. These further linear constraints may be applied to global SDP (7) or Layer SDP (14).
- the further linear constraints may be applied additionally or alternatively to the initial linear constraints based on the linear behaviour of the nodes within the exclusively activated or inactivated regions expressed by (8). Adding these further linear constraints tightens the SDP relaxation. In some embodiments, only a subset of the further linear constraints may be added to the SDP relaxation, thereby reducing the computational cost of the method.
- the further linear constraints are determined from an upper bound and a lower bound for each of two nodes from the network, wherein a first node is from a first layer of the network and a second node is either from the first layer of the network or from a layer of the network adjacent to the first layer of the network.
- the further linear constraints are expressed as an upper bound and a lower bound for elements of the lifting matrix P.
- the method aims to bound elements of the matrix Pi for each layer.
- the constraints in (17) are linear and could be directly added to (14). However, they introduce new inequalities, thereby increasing the computational effort required to solve the verification problem. Therefore, herein efficient strategies for imposing the constraints in (17) are presented.
- the method (i) uses the reformulation-linearization technique (RLT) to construct valid further linear cut constraints that are provably stronger than (17), and (ii) provides a computationally-efficient strategy for integrating the linear cut constraints with the Layer SDP relaxation (14).
- RLT reformulation-linearization technique
- An analogous set of constraints may be formulated for lifting matrix P, and the technique applied to global SDP (7).
- valid further linear cut constraints are constructed using RLT.
- RLT involves the construction of valid linear cuts on the lifting matrices by using products of the existing linear constraints in (14) on the original variables. Under the constraints and (12a) on Layer SDP (14), the variables satisfy: These can be used to construct the constraints: .
- Layer SDP relaxation (14) also has other existing linear constraints (11a) and (12b), where (12b) was obtained as an initial linear constraint from triangle relaxation constraints (4).
- (11a) and (12b) can be used to construct the new constraints: Linear cut constraint (20a) is weaker than the existing constraint while (20c) is weaker than the conjunction of existing constraints (11a), (11b) and (12b). Adding the linear cut constraint (20b) can tighten the Layer SDP relaxation, but only if its off-diagonals cut the feasible region, while the diagonals are implied by (11b). Therefore, including (20b) in the Layer SDP relaxation (14) can tighten the SDP relaxation.
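The classical RLT products behind such cuts can be sketched as follows. This is a generic illustration over a single lifted entry P_jk; the cut set shown is the standard McCormick one built from bound constraints, and is not claimed to match the patent's numbered constraints (19)-(21) exactly:

```python
def rlt_cuts(lj, uj, lk, uk):
    """RLT/McCormick cuts on (xj, xk, Pjk), where Pjk stands for the
    product xj*xk in the lifted matrix. Each cut is (a, b, c, d),
    meaning a*xj + b*xk + c*Pjk <= d."""
    return [
        # (xj - lj)(xk - lk) >= 0  ->  lk*xj + lj*xk - Pjk <= lj*lk
        (lk, lj, -1.0, lj * lk),
        # (uj - xj)(uk - xk) >= 0  ->  uk*xj + uj*xk - Pjk <= uj*uk
        (uk, uj, -1.0, uj * uk),
        # (xj - lj)(uk - xk) >= 0  ->  -uk*xj - lj*xk + Pjk <= -lj*uk
        (-uk, -lj, 1.0, -lj * uk),
        # (uj - xj)(xk - lk) >= 0  ->  -lk*xj - uj*xk + Pjk <= -uj*lk
        (-lk, -uj, 1.0, -uj * lk),
    ]

def satisfies(xj, xk, Pjk, cuts, tol=1e-9):
    return all(a * xj + b * xk + c * Pjk <= d + tol for a, b, c, d in cuts)

cuts = rlt_cuts(-1.0, 2.0, 0.0, 3.0)
# An exact lifting Pjk = xj*xk always satisfies the cuts...
assert satisfies(1.0, 2.0, 2.0, cuts)
# ...while a lifted value far from any admissible product is cut off.
assert not satisfies(1.0, 2.0, 20.0, cuts)
```

Because each cut is the linearization of a product of two valid inequalities, the cuts are valid for every feasible point of the original problem, so adding them can only shrink the relaxed region.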
- Figure 4 shows the feasible region of the triple obtained by adding linear cuts (19b), (19c) and (21), with .
- adding each linear cut removes a portion of the relaxation region.
- the Layer RLT-SDP relaxation (22) offers a provably tighter bound than the Layer SDP relaxation (14). Inequality (23) holds even when only a portion of the further linear constraints (19b), (19c) and (21) are added to Layer SDP (14).
- the semidefinite programming relaxation may be iteratively repeated, wherein at each iteration one or more of the further linear constraints (e.g.
- Algorithm 0 describes an example of an efficient implementation of the Layer RLT-SDP relaxation.
- the portion of linear constraints added at each iteration is set by choosing the sequence
- the sequence and the maximum iteration count can be adapted to the computational power available. In some implementations, a different sequence can be chosen for each individual layer.
- the sequence is constant across all layers.
- the matrix stores the ordering (in descending order) of the elements in each row of The ordering ensures that the portion of the linear cut constraints with larger influences on shrinking the feasible region of the SDP relaxation are added first. This is based on the following consideration: for neuron m at layer i + 1, its pre-activation is , where is a row vector.
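The descending-order heuristic can be sketched with numpy. This is an illustrative sketch; taking the plain weight magnitude |W| as the influence measure is an assumption made here for concreteness:

```python
import numpy as np

def cut_ordering(W):
    """For each neuron (row of W), return the input indices sorted by
    descending |weight|: cuts touching heavily-weighted inputs are
    assumed to shrink the feasible region most, so they come first."""
    return np.argsort(-np.abs(W), axis=1)

def first_fraction(order, frac):
    """Keep only the first `frac` fraction of cuts per neuron."""
    k = max(1, int(round(frac * order.shape[1])))
    return order[:, :k]

W = np.array([[0.1, -3.0, 0.5],
              [2.0,  0.0, -1.0]])
order = cut_ordering(W)
assert order[0].tolist() == [1, 2, 0]   # |-3.0| > |0.5| > |0.1|
assert order[1].tolist() == [0, 2, 1]   # |2.0| > |-1.0| > |0.0|
assert first_fraction(order, 1 / 3).tolist() == [[1], [0]]
```

With this ordering, taking the first 20% of the cuts per neuron corresponds to the experimentally most productive portion reported in the results below.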
- the exemplary method here for tightening Layer SDP (14) subject to initial linear constraints (12b) by subjecting Layer SDP to further linear constraints may be analogously applied to global SDP (7), SDP2 (9) or Layer SDP not subject to initial linear constraints (12b).
- the SDP relaxation is further tightened.
- one or more non-linear constraints are determined from the algebraic constraints on the output of each layer of the network that tighten the semidefinite programming relaxation.
- the semidefinite programming relaxation is Layer SDP and a non-linear constraint is determined for each layer of the network from the algebraic constraints for that layer.
- a tighter semidefinite programming relaxation can verify more non-convex NN verification instances.
- non-linear constraints require solving a non-convex semidefinite programming relaxation.
- Such non-convex problems are generally much more computationally expensive than convex semidefinite programming problems, requiring more computational resources and being slower to solve.
- a method is provided that solves the semidefinite programming relaxation subject to one or more non-linear constraints computationally efficiently.
- the semidefinite programming relaxation not subject to the non-linear constraints, optionally Layer SDP, is solved. If this semidefinite programming relaxation verifies that the neural network is robust across the range of inputs, no further action is required. Otherwise, at step 620 of the method, it is determined that the semidefinite programming relaxation not subject to non-linear constraints does not verify the neural network as robust across the range of inputs.
- one or more non-linear constraints is determined from the algebraic constraints on the output of each layer of the neural network.
- the semidefinite programming is Layer SDP
- a non-linear constraint of the same algebraic form is determined for each layer of the neural network from the algebraic constraints for that layer.
- the subsequent method steps circumvent the non-convexity issue by an iterative process that recursively solves an auxiliary convex SDP problem of around the same size as (14) and iteratively generates an objective value sequence that initializes from and monotonically converges to
- the method sets the first current objective value of the objective value sequence to the objective value of the semidefinite programming relaxation not subject to non-linear constraints, and constructs the user specified constant vectors of the non- linear constraints.
- the user specified constant vectors are constructed such as to ensure a solution of the auxiliary convex SDP problem can be used to calculate the current objective value of the objective value sequence at each iteration.
- the method solves the auxiliary convex semidefinite programming relaxation and determines the current objective value of the objective value sequence at the iteration from the solution of the auxiliary convex semidefinite programming relaxation.
- the method determines if the outcome of the semidefinite programming relaxation determined by the current objective value of the objective value sequence verifies the neural network is robust across the range of inputs. If the neural network is robust, at step 670, the method outputs that the neural network is robust across the range of inputs.
- the method determines at step 680, if the objective value of the auxiliary convex semidefinite programming problem is smaller than or equal to a predetermined value.
- the predetermined value ensures the method terminates within a user-defined tolerance. If the answer is “Yes”, the method determines at step 690 that the neural network cannot be verified across the range of inputs. If at step 680 the answer is “No”, the method returns to step 650, completing an iteration of the method. Further detail on steps 640 to 690 in an exemplary embodiment is provided below.
- the semidefinite programming relaxation not subject to non-linear constraints is Layer SDP according to (14) and the non-linear constraint for each layer of the network is (24e).
- although the non-convex layer SDP relaxation (24) is generally hard to solve, its optimal objective value is bounded below. This lower bound can be efficiently computed from the convex layer SDP relaxation (14).
- the auxiliary convex SDP problem has the form of: where
- the weight is a user-specified positive constant. Its value is set to 1 to penalize the SDP relaxations of the first L-1 layers more heavily. This is useful to obtain a tighter bound on the neural network output, as the output is influenced by the SDP relaxations of the first L-1 layers.
- the scalars can be chosen as any non-zero constants. The choice is iteratively updated for every repetition of step 650, as will be discussed later.
- the vectors have fixed values and are constructed, at step 640, from Algorithm 1 by exploiting the activation pattern of the neural network.
- the iterative loop encapsulated by steps 650 to 690 iteratively updates the value of and generates the objective value sequence that converges to
- an exemplary iterative algorithm is Algorithm 2, which outputs the current objective value at a final iteration when the current objective value of the auxiliary convex semidefinite programming problem falls below a predefined tolerance.
- the iterative algorithm is based on solving the auxiliary convex SDP problem (25) at each iteration with the scalar that is changed with the iterations.
- the initial value of is set as , where is the optimal objective value of the layer SDP relaxation (14) determined at step 610 (see Line 2).
- in Algorithm 2, for each given value, the auxiliary SDP problem (25) is solved to obtain the objective value (see Lines 5 and 6). At each iteration, the obtained optimal objective value of problem (25) is used to update the value of (see Line 7). The iteration is terminated when it is smaller than a prescribed tolerance (see Line 8). Algorithm 2 outputs the objective value, which is used to determine whether the neural network is robust across the range of inputs.
- the sequence generated by Algorithm 2 has the properties: Therefore, the sequence satisfies and converges to by setting . This can be used to check when the sequence converges, enabling its use as a stopping criterion in Algorithm 2.
- the objective value sequence generated by Algorithm 2 correspondingly has the property and monotonically increases to converge to . Thus, every current objective value in the objective value sequence is a valid lower bound. Moreover, the calculated objective values at all iterations are at least as good as, and converge to, the optimal objective value of the non-convex layer SDP relaxation (24). In this sense, the proposed iterative algorithm is an efficient method to solve the non-convex layer SDP relaxation (24), which would otherwise be hard to solve directly.
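The overall loop can be sketched schematically in Python. A toy stand-in replaces the actual auxiliary SDP solve of (25); only the control flow mirrors the text — warm start from the convex Layer SDP value, monotone updates of the lower bound, and a tolerance-based stopping criterion on the auxiliary objective:

```python
def iterative_lower_bounds(solve_auxiliary, gamma0, eps=1e-6, max_iter=100):
    """Schematic of the Algorithm-2 style loop (a sketch, not the
    patent's exact pseudocode). Starting from the convex relaxation
    value gamma0, repeatedly solve the auxiliary problem, read off a new
    lower bound, and stop once the auxiliary objective (the violation
    of the non-linear constraints) falls below eps."""
    gamma = gamma0
    history = [gamma]
    for _ in range(max_iter):
        aux_obj, new_gamma = solve_auxiliary(gamma)
        history.append(new_gamma)
        gamma = new_gamma
        if aux_obj <= eps:
            break
    return gamma, history

# A toy stand-in for the auxiliary SDP: each solve halves the remaining
# gap to an (unknown) limit value, mimicking monotone convergence.
LIMIT = 1.0
def toy_solve(gamma):
    new_gamma = gamma + 0.5 * (LIMIT - gamma)
    return LIMIT - new_gamma, new_gamma

gamma, hist = iterative_lower_bounds(toy_solve, gamma0=0.0, eps=1e-6)
assert all(b >= a for a, b in zip(hist, hist[1:]))   # monotone increase
assert abs(gamma - LIMIT) < 1e-5                     # converged near limit
```

In the real method, `solve_auxiliary` is a convex SDP of about the same size as (14), so each iteration costs roughly one Layer SDP solve.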
- when Algorithm 2 is applied in the method of Figure 6, the iteration is executed only when the method determines at step 620 that the Layer SDP does not verify the neural network as robust across the range of inputs.
- FIG. 7 illustrates an example system capable of verifying a neural network.
- Such a system comprises at least one processor 402, which may receive data from at least one input 404 and provide data to at least one output 406.
- the processor may be configured to perform the method outlined above.
Results
- The benefits of the approaches described above have been demonstrated experimentally, as illustrated in Figures 8, 9A, 9B and 10.
- Beta-crown Efficient bound propagation with per-neuron split constraints for complete and incomplete neural network verification.
- ICLR19 International Conference on Learning Representations
- AI 2 Gehr, T.; Mirman, M.; Drachsler-Cohen, D.; Tsankov, P.; Chaudhuri, S.; and Vechev, M. 2018.
- AI2 Safety and robustness certification of neural networks with abstract interpretation.
- SP18 IEEE Symposium on Security and Privacy
- (9) and (14) are referred to as LayerSDP, and its relaxed version (16) as FastSDP.
- the standard LP relaxation (4) is also illustrated as a benchmark.
- the formulation (7) is denoted as SDP-IP.
- the lower and upper bounds were computed using a symbolic interval propagation algorithm.
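A plain (non-symbolic) interval propagation can be sketched as follows; the symbolic variant mentioned above tracks linear expressions instead of raw intervals and yields tighter bounds:

```python
import numpy as np

def interval_propagate(layers, x_lo, x_hi):
    """Propagate elementwise input bounds [x_lo, x_hi] through a ReLU
    network given as a list of (W, b) pairs, returning pre-activation
    bounds (l, u) for every layer. A plain interval (IBP) sketch; the
    symbolic variant in the text is tighter but more involved."""
    lo, hi = np.asarray(x_lo, float), np.asarray(x_hi, float)
    bounds = []
    for W, b in layers:
        W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
        l = W_pos @ lo + W_neg @ hi + b   # worst-case low pre-activation
        u = W_pos @ hi + W_neg @ lo + b   # worst-case high pre-activation
        bounds.append((l, u))
        lo, hi = np.maximum(l, 0), np.maximum(u, 0)   # apply ReLU
    return bounds

W1 = np.array([[1.0, -1.0]]); b1 = np.array([0.0])
(l, u), = interval_propagate([(W1, b1)], [-0.1, -0.1], [0.1, 0.1])
assert np.allclose(l, [-0.2]) and np.allclose(u, [0.2])
```

These per-neuron bounds (l, u) are exactly what the triangle relaxation and the RLT cuts described earlier consume.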
- PGD projected gradient descent
- the convex relaxations are converted into a standard conic optimization before passing them to a numerical solver.
- An automatic transformation from the convex relaxations into standard conic optimization was implemented.
- the resulting LP/SDPs were then solved by MOSEK (see ApS Mosek. The mosek optimization toolbox for matlab manual, 2015).
- the neural networks considered comprised eight fully connected ReLU networks trained on the MNIST dataset. To facilitate the comparison with existing tools, experiments were divided into three groups: 1) One self-trained NN with two hidden layers, each having 64 neurons; no adversarial training was used. The perturbation radius B was varied from 0.01 to 0.05; 2) Three NNs from [Raghunathan et al., 2018]: MLP-SDP, MLP-LP, and MLP-Adv.3) Four deep NNs from G. Singh, T. Gehr, M. Puschel, and M. Vechev. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL):41, 2019.
- (Andersen, E. D. and Andersen, K. D. The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm. In High performance optimization, 197–232. Springer, 2000). Results obtained are compared against presently available state-of-the-art (SoA) methods and tools.
- two groups of two-input, two-output, fully-connected random ReLU NNs generated by using the method in (Fazlyab, M.; Morari, M.; and Pappas, G. J. 2020. Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite programming).
- RLT-SDP hereafter used interchangeably with Layer RLT-SDP
- Figure 9B illustrates that adding a larger proportion of linear cuts yields a tighter over-approximation, along with an increase in runtime. Adding the same percentage of linear cuts leads to a more significant tightness improvement on larger networks (with larger L) than on smaller ones. For each network, as the percentage of linear cuts increases, the tightness improvement becomes less significant, but the runtime increase becomes more significant. In particular, it is found experimentally that the first 20% of linear cuts contributes most significantly to the improvement in overall tightness of the method. We evaluated the impact of network width by using the models in Group 2 and observed very similar behaviour of the method. These experimental results clearly confirm the expected tightness relation and demonstrate the efficiency of Algorithm 0.
- arXiv preprint arXiv:2009.04131, 2020 hereafter referred to as [Li et al., 2020]), which were trained using CROWN-IBP (Zhang, H.; Chen, H.; Xiao, C.; Gowal, S.; Stanforth, R.; Li, B.; Boning, D.; and Hsieh, C.-J.2019.
- CROWN-IBP Zhang, H.; Chen, H.; Xiao, C.; Gowal, S.; Stanforth, R.; Li, B.; Boning, D.; and Hsieh, C.-J.2019.
- they were tested under perturbations of 0.1 and 0.3, respectively.
- the optimisation problems were modelled by using the toolbox YALMIP (Lofberg, J. Yalmip: A toolbox for modeling and optimization in matlab. In IEEE International Conference on Robotics and Automation (ICRA04), pp. 284–289. IEEE, 2004) and solved using the SDP solver MOSEK (Andersen, E. D. and Andersen, K. D. The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm. In High performance optimization, pp. 197–232. Springer, 2000). To run the algorithm, suitable parameter values are chosen.
- IterSDP is evaluated on several fully-connected ReLU NNs trained on the MNIST dataset (where “m × n” means a NN with m − 1 hidden layers, each having n neurons): 1) One 3 × 50 network, self-trained with no adversarial training, tested with perturbation radius ε from 0.01 to 0.09. 2) The three small-size networks MLP-Adv, MLP-LP, and MLP-SDP from [Raghunathan et al., 2018] are tested under the same perturbation as in [Raghunathan et al., 2018] and the experiment illustrated in Figure 8.
- a medium-size network 6 × 100 is from [Singh et al., 2019a] and evaluated under the same perturbation as in [Singh et al., 2019a], [Müller et al., 2021] and the experiment illustrated in Figure 8. 4)
- Two large-size networks 8 × 1024-0.1 and 8 × 1024-0.3 are from (Li, L., Qi, X., Xie, T., and Li, B. Sok: Certified robustness for deep neural networks. arXiv preprint arXiv:2009.04131, 2020; hereafter referred to as [Li et al., 2020]).
- Figure 10 shows the computational results of the 3 × 50 network under different perturbation radii ε, using the methods IterSDP, LP, SDP-IP and LayerSDP.
- the IterSDP method outperforms the baselines across all the ε values, confirming the expected relation. Notably, IterSDP improves the verified robustness up to the PGD bounds for several ε values. IterSDP requires more runtime (about twice as much) compared to LayerSDP, but it is still computationally cheaper than SDP-IP. This is expected, since Algorithm 2 uses LayerSDP to initialise and solves the auxiliary SDP, whose size is similar to the layer SDP relaxation.
- Table 3 reports the verified robustness (percentage of images that are verified to be robust) and runtime (average solver time for verifying an image) for each method.
- the PGD upper bounds of MLP-Adv, MLP-LP, MLP-SDP and 6 × 100 are reiterated from Table 1 for direct comparison, while those of 8 × 1024-0.1 and 8 × 1024-0.3 are from [Li et al., 2020].
- the results show that, under the same bounds, IterSDP is more precise than LayerSDP and all other baseline methods for all the networks.
- One exception is the MLP-LP network, for which the methods IterSDP, LayerSDP, SDP-IP and LP all reach the PGD upper bound.
- IterSDP increases the number of verified instances by 20% for the 6 × 100 network. For all the other networks, IterSDP obtained a number of verified cases close to or the same as the PGD upper bound. It is also worth mentioning that IterSDP outperforms the SoA complete methods MILP and AI2 according to the numbers reported in [Li et al., 2020]: MILP verified 67% (respectively, 7%) for 8 × 1024-0.1 (respectively, 8 × 1024-0.3), and AI2 verified 52% (respectively, 16%) for 8 × 1024-0.1 (respectively, 8 × 1024-0.3).
Abstract
There are provided processes for verifying the performance of neural networks across a range of inputs. The neural network comprises nodes arranged in a plurality of layers, and a disclosed process comprises the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining, for each layer of the network, a semidefinite constraint from the algebraic constraints for that layer; determining a set of interlayer constraints which constrain outputs of one or more of the layers to corresponding inputs of one or more adjacent layers; applying a semidefinite programming relaxation subject to the semidefinite constraints and the interlayer constraints across the range of inputs; and based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
Description
Verifying Neural Networks
Field
The present disclosure relates to the verification of the consistency of the output of neural networks under variations to the input. In particular, but not exclusively, the present disclosure provides techniques for verifying the reliability of a neural network for the classification of objects in sensor data, such as image data.
Background
Autonomous systems are forecasted to revolutionise key aspects of modern life including mobility, logistics, and beyond. While considerable progress has been made on the underlying technology, severe concerns remain about the safety and security of the autonomous systems under development. One of the difficulties with forthcoming autonomous systems is that they incorporate complex components that are not programmed by engineers but are synthesised from data via machine learning methods, such as neural networks. Neural networks have been shown to be particularly sensitive to variations in their input. For example, neural networks currently used for image processing have been shown to be vulnerable to adversarial attacks, in which the behaviour of a neural network can easily be manipulated by a minor change to its input, for example by presenting an “adversarial patch” to a small portion of the field of view of the image. At the same time, there is an increasing trend to deploy autonomous systems comprising neural networks in safety-critical areas, such as autonomous vehicles. These two aspects taken together call for the development of rigorous methods to systematically verify the conformance of autonomous systems based on learning-enabled components to a defined specification. Often, such a specification can be defined in terms of robustness to one or more transformations at one or more inputs – formally, a network is said to be transformationally robust at a given input under a class of transformations if its output remains within a specified tolerance (e.g.
one small enough to not cause a change in predicted class) when the input is subjected to any transformation in the class. For example, safeguards on acceptable behaviour of the ACAS XU unmanned aircraft collision avoidance system have been defined in terms which are equivalent to transformational robustness (in K. Julian, J. Lopez, J. Brush, M. Owen and M. Kochenderfer. Policy compression for aircraft collision avoidance systems. In
Proceedings of the 35th Digital Avionics Systems Conference (DASC16), pages 1-10, 2016). In other examples, acceptable behaviour of image classifiers has been specified in terms of continuing to predict the same class when a particular image input is subjected to transformations which remain within a certain Lp-distance, or subjected to a certain class of affine and/or photometric transformations. Transformations may also include, for example: white noise changes to a given input (defined by an epsilon ball for an infinite norm); white noise changes to a given input given by any box constraints on some/all of the input dimensions; or any linear or non-linear transformation of the given input governed by a modification of the input described by a mathematical function or an algorithm.
Current methods for NN verification can be categorized into complete and incomplete approaches. Aside from computational considerations, complete approaches are guaranteed to resolve any verification query. Incomplete approaches are normally based on various forms of convex approximations of the network and only guarantee that whenever they output that the network is safe, then that is indeed the case. While this typically enables faster computation, the looser this approximation is, the more likely it is that the method may not be able to verify the problem instance. As a result, the present objective in incomplete methods is the development of tighter approximations, which can be efficiently computed, thereby strengthening the efficacy of the methods in answering the verification problem. Proposed complete methods include those based on mixed-integer linear programming (MILP), satisfiability modulo theories or bound propagation techniques coupled with input refinement. While these methods offer theoretical termination guarantees, at present they do not scale to the network sizes that incomplete approaches are able to address.
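For the most common of these specifications, the epsilon-ball input range for an L-infinity perturbation reduces to an elementwise box, which can be sketched as follows (assuming inputs normalised to [0, 1], as is typical for image data):

```python
import numpy as np

def linf_ball(x, eps, domain=(0.0, 1.0)):
    """Elementwise box [lo, hi] describing an epsilon-ball around input x
    in the infinity norm, clipped to the valid data domain (e.g. the
    pixel range [0, 1] for images)."""
    x = np.asarray(x, dtype=float)
    lo = np.clip(x - eps, *domain)
    hi = np.clip(x + eps, *domain)
    return lo, hi

lo, hi = linf_ball([0.0, 0.5, 1.0], eps=0.1)
assert np.allclose(lo, [0.0, 0.4, 0.9])
assert np.allclose(hi, [0.1, 0.6, 1.0])
```

Such a box is exactly the "range of inputs" over which the verification methods described below are applied.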
Incomplete methods are typically based on bound propagation, duality, and semidefinite program (SDP) relaxations. A common theme in this research is the linear program (LP) relaxation for the univariate ReLU function. A foundational relaxation is the triangle relaxation from R. Ehlers, Formal verification of piece-wise linear feed-forward neural networks, In ATVA17, volume 10482 of Lecture Notes in Computer Science, pages 269–286, Springer, 2017 (referred to hereinafter as “Ehlers et al., 2017”). This gives a tight convex relaxation of the univariate ReLU function and forms the basis of many of the cited methods. It has been recently shown that the efficacy of
these methods is intrinsically limited by the same convex relaxation barrier, which is characterised by the tightness of the triangular relaxation. Another way to bypass the barrier is to seek alternative stronger relaxations beyond LPs, such as SDPs. It has been empirically observed that the SDP relaxation is much tighter than LP relaxations. However, SDPs are computationally harder to solve. An example of the SDP approach can be found in A. Raghunathan, J. Steinhardt, and P. Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In NeurIPS18, pages 10877–10887, 2018 (referred to hereinafter as “Raghunathan et al., 2018”). There is therefore an ongoing need to provide computationally efficient solutions to the verification problem while at the same time maximising the efficacy of these methods.
Summary
According to a first aspect of the present disclosure, there is provided a method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining, for each layer of the network, a semidefinite constraint from the algebraic constraints for that layer; determining a set of interlayer constraints which constrain outputs of one or more of the layers to corresponding inputs of one or more adjacent layers; applying a semidefinite programming relaxation subject to the semidefinite constraints and the interlayer constraints across the range of inputs; and, based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
By determining semidefinite constraints for each layer rather than a semidefinite constraint across the entire network, the dimensionality of the constraint may be significantly reduced, thereby reducing the computing resources required to apply the
semidefinite programming relaxation. The interlayer constraints may help to provide that the interaction between layer outputs and inputs is properly modelled at the same time. Optionally, the set of interlayer constraints constrain all outputs of one or more of the layers to corresponding inputs of one or more adjacent layers. Alternatively, the set of interlayer constraints constrain a subset of outputs of one or more of the layers to corresponding inputs of one or more adjacent layers. Optionally, the method further comprises determining one or more initial linear constraints based on a linear approximation of an activation function for one or more nodes of the neural network, wherein the applying a semidefinite programming relaxation is further subject to the one or more initial linear constraints. Optionally, the method further comprises determining, for each layer of the network, one or more further linear constraints based on an upper bound and a lower bound for each of two nodes from the network, wherein a first node is from the layer of the network and a second node is either from the layer of the network or from a layer of the network adjacent to the layer of the network, expressing the one or more further linear constraints as an upper bound and a lower bound for elements of a matrix representation of the layer of the network, and wherein the applying a semidefinite programming relaxation is further subject to the one or more further linear constraints. Optionally, determining, for each layer of the network, one or more further linear constraints expressed as an upper bound and a lower bound for elements of the matrix representation comprises calculating the upper and lower bounds given the range of inputs of the neural network and the one or more initial linear constraints.
Subjecting the semidefinite programming relaxation to the initial linear constraints can ensure that the semidefinite programming relaxation is tighter than the linear programming relaxation. Moreover, the further linear constraints can tighten the semidefinite programming relaxation compared to the semidefinite programming relaxation without the further linear constraints. Optionally, the semidefinite programming relaxation may be subjected to a portion of the further linear constraints to reduce computational cost. Optionally, the semidefinite programming relaxation may be iteratively repeated, wherein at each iteration one or more of the further linear
constraints are added to the portion of the further linear constraints the semidefinite programming relaxation is subjected to. Optionally, the method further comprises determining, for each layer of the network, a non-linear constraint from the algebraic constraints for that layer, wherein the applying a semidefinite programming relaxation is further subject to the non-linear constraint for each layer of the network. Optionally, an objective value of the semidefinite programming relaxation determines the outcome of the semidefinite programming relaxation; and the objective value of the semidefinite programming relaxation is monotonically approached by an objective value sequence that converges to the objective value of the semidefinite programming relaxation, wherein a starting point of the objective value sequence is an objective value of the semidefinite programming relaxation not subject to the non-linear constraint for each layer of the network; and the objective value sequence is determined iteratively by solving an auxiliary convex semidefinite programming problem recursively, wherein a current objective value of the objective value sequence determined at an iteration is sequential to the objective values of the objective value sequence determined in prior iterations, wherein a current objective value of the auxiliary convex semidefinite programming problem is an objective value of the auxiliary convex semidefinite programming problem at the iteration. Optionally, the objective value of the auxiliary convex semidefinite programming problem is always greater than or equal to zero; and the objective value of the auxiliary convex semidefinite programming problem is equal to zero when the non-linear constraint for each layer of the network is satisfied. 
Subjecting the semidefinite programming relaxation to the non-linear constraint for each layer can provably tighten the semidefinite programming relaxation compared to the semidefinite programming relaxation not subject to the non-linear constraint for each layer. Moreover, each objective value in the objective value sequence may be a tighter solution than the prior objective values in the sequence. The tightest solution of the sequence may be reached when the objective value sequence has converged to the
objective value of the semidefinite programming relaxation subject to the non-linear constraints. Optionally, the determining, based on an outcome of the semidefinite programming relaxation, whether the neural network is robust across the range of inputs comprises: determining at each iteration, based on the current objective value of the objective value sequence, whether the neural network is robust across the range of inputs; if the neural network is robust across the range of inputs, providing as the outcome of the semidefinite programming relaxation that the neural network is robust across the range of inputs; if the neural network is unverified across the range of inputs, determining whether the current objective value of the auxiliary convex semidefinite programming problem is smaller than a predefined value; and, if the current objective value of the auxiliary convex semidefinite programming problem is smaller than the predefined value, providing as the outcome of the semidefinite programming relaxation that the neural network is not verifiable across the range of inputs. The determining at each iteration, based on the current objective value of the objective value sequence, whether the neural network is robust across the range of inputs can allow the iteration loop to be terminated when the neural network is verified. This can save computational power by reducing the number of semidefinite programming instances that are solved for each verification instance. Optionally, the method further comprises removing terms associated with nodes which are inactive across the range of inputs from the semidefinite constraints. Optionally, the semidefinite constraints comprise positive semidefinite constraints. In some preferred examples, the neural network is a feed-forward neural network. Optionally, the nodes of the neural network may apply a Rectified Linear Unit (ReLU) activation function.
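The early-termination logic described above can be sketched in pure Python. Here `solve_relaxation` and `solve_auxiliary` are hypothetical stand-ins for actual SDP solves (they are assumptions for illustration, not part of the disclosure):

```python
def verify_iteratively(solve_relaxation, solve_auxiliary, max_iters=10, tol=1e-6):
    """Sketch of the iterative scheme: at each iteration a relaxation is
    solved; a non-negative objective verifies the network, while a small
    auxiliary objective signals that no further tightening is possible."""
    for _ in range(max_iters):
        gamma = solve_relaxation()      # current objective value in the sequence
        if gamma >= 0:
            return "robust"             # network verified; loop terminates early
        aux = solve_auxiliary()         # auxiliary convex SDP objective (>= 0)
        if aux < tol:
            return "not verifiable"     # sequence has converged without verifying
    return "undecided"

# Stub solvers standing in for actual SDP solves: the objective value
# sequence monotonically increases towards its limit.
seq = iter([-0.8, -0.3, 0.1])
aux_seq = iter([0.5, 0.2, 0.0])
result = verify_iteratively(lambda: next(seq), lambda: next(aux_seq))
```

The loop terminates on the third iteration here, since the objective value sequence reaches a non-negative value before the auxiliary objective falls below the tolerance.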
In some example implementations, the neural network may be an image processing network which takes an image as input. For example, the neural network may be
trained for an image classification, object detection, image reconstruction, or other image processing task. In such implementations, if the neural network is determined to be transformationally robust, the network may further be deployed for performing the image processing task, such as the image classification, object detection or image reconstruction task. In particular, if the neural network is determined to be transformationally robust, the network may perform the image processing task on an image. In such circumstances, it may be possible to provide guarantees on the appropriateness of the network to perform the image processing task correctly. In other example implementations, the neural network may be an audio processing network which takes a representation of an audio signal as input. For example, the neural network may be trained for a voice authentication, speech recognition, audio reconstruction, or other audio processing task. In such implementations, if the neural network is determined to be transformationally robust, the network may further be deployed for performing the audio processing task, such as the voice authentication, speech recognition or audio reconstruction task. In particular, if the neural network is determined to be transformationally robust, the network may perform the audio processing task. In such circumstances, it may be possible to provide guarantees on the appropriateness of the network to perform the audio processing task correctly. While the above example implementations refer to image processing or audio processing, the skilled person will recognise that the claimed approach may apply to other inputs; for example, the input to the neural network may be sensor data such as image data, audio data, LiDAR data, or other data. In general, the claimed process may act to improve the ability or reliability of a network in classifying data of this kind.
In other example implementations, the neural network may be part of an AI system to evaluate creditworthiness or other risk or financial metrics, and takes as input the relevant tabular information used to assess a financial decision. For example, the neural network may be trained for credit scoring of applicants for loan purposes. In such implementations, if the neural network is determined to be transformationally robust, the network may further be deployed for the decision making task in question. In particular, if the neural network is determined to be transformationally robust, guarantees may be given to the relevant regulators on the appropriateness of the network to perform the decision making task correctly.
In yet other example implementations, the neural network may be a controller neural network which outputs a control signal for a physical device, such as an actuator. For example, the neural network may be trained for controlling a robot, vehicle, aircraft or plant. In such implementations, if the neural network is determined to be transformationally robust, the network may further be deployed for controlling the physical device, such as the actuator, robot, vehicle, aircraft or plant. In particular, if the neural network is determined to be transformationally robust, the network may control the physical device. Other applications of the method above are in fraud monitoring, medical imaging, optical character recognition and generally whenever guarantees of transformational robustness aid in determining the robustness of the neural model. According to a further aspect, there may be provided a computer program product comprising computer executable instructions which, when executed by one or more processors, cause the one or more processors to carry out the method of the first aspect. There may also be provided a system comprising one or more processors configured to carry out the method of the first aspect. 
According to a first still further aspect of the present disclosure, there may be provided a method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining a semidefinite constraint from the algebraic constraints for the network; determining one or more linear constraints based on a linear approximation of an activation function for one or more nodes of the neural network; applying a semidefinite programming relaxation subject to the semidefinite constraints and the linear constraints across the range of inputs; and, based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
According to a second still further aspect of the present disclosure, there may be provided a method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining a semidefinite constraint from the algebraic constraints for the network; determining, for each layer of the network, one or more linear constraints based on an upper bound and a lower bound for each of two nodes from the network, wherein a first node is from the layer of the network and a second node is either from the layer of the network or from a layer of the network adjacent to the layer of the network; expressing the one or more linear constraints as an upper bound and a lower bound for elements of a matrix representation of the layer of the network; applying a semidefinite programming relaxation subject to the semidefinite constraints and the linear constraints across the range of inputs; and, based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
According to a third still further aspect of the present disclosure, there may be provided a method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining a semidefinite constraint from the algebraic constraints for the network; determining one or more non-linear constraints from the algebraic constraints for each layer of the network; applying a semidefinite programming relaxation subject to the semidefinite constraints and the non-linear constraints across the range of inputs; and, based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
The skilled person will recognise that optional features of the first aspect may also apply to any of the still further aspects. Moreover, there may be provided a computer program product comprising computer executable instructions which, when executed by one or more processors, cause the one or more processors to carry out the method of any of the still further aspects. There may also be provided a system comprising one or more processors configured to carry out the method of any of the still further aspects.

Brief Description of the figures

Examples of the present disclosure will be presented with reference to the accompanying drawings, in which: Figure 1 illustrates a set of transformations of an input; Figure 2 shows a method according to the present disclosure; Figure 3 illustrates the relative tightness of SDP and LP relaxations; Figure 4 illustrates the relative tightness of Layer SDP and a subset of RLT-SDP linear constraints; Figure 5 illustrates the relative tightness of Layer SDP and a subset of RLT-SDP linear constraints; Figure 6 shows a method according to the present disclosure; Figure 7 illustrates an example system capable of verifying a neural network; Figure 8 shows experimental results; Figures 9A and 9B show experimental results; and Figure 10 shows experimental results.

Detailed Description

The present disclosure is directed to the verification of a neural network and particularly to verifying consistency of neural network output across a range of potential inputs. In other words, verification may offer a guarantee that a neural network's outputs remain within a certain tolerance when a starting input to the neural network is varied across a range. For example, consider a baseline input subject to a class of transformations. Transformations may include, for example: white noise changes to a given input (defined by an epsilon ball for an infinite norm); white noise changes to a given input given by any box constraints on some/all of the input
dimensions; or any linear or non-linear transformation of the given input governed by a modification of the input described by a mathematical function or an algorithm. The class of transformations may define the perturbations of the input for which the neural network output is to satisfy the output constraints. In some embodiments, the class of transformations may be defined in terms of a range for each component of the neural network's input, within which the component is to vary. In other embodiments, the class of transformations may be defined by a bound on a global metric, such as by defining a maximum value for the l1-distance between the original input and the perturbed input. In yet other embodiments, the class of transformations may be specifically adapted to the task for which the network is trained: for example, for a network trained for image recognition, a class of affine or photometric transformations can be defined, for example in the manner described in WO 2020/109774 A1. In general, the class of transformations may be specified in terms of a set of algebraic constraints that are satisfied when applying any transformation in the class to the input. Typically, the input and class of transformations may be chosen such that the input sufficiently unambiguously belongs to a particular class and the class of transformations defines small enough perturbations that the neural network may be expected not to substantially change its output when the transformations are applied to the input. A visually illustrative example of this is provided in Figure 1, which depicts example affine (102-104), photometric (105-106) and random noise (110) transformations applied to an original image (101). As can be seen, the transformations may be chosen such that the semantic content of the image is unchanged.
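As an illustration of the first kind of transformation class (an epsilon ball for the infinity norm, intersected with box constraints on the input domain), the per-component input ranges can be computed as follows. The function name and the [0, 1] clipping domain are illustrative assumptions:

```python
def linf_input_ranges(x, eps, clip=(0.0, 1.0)):
    """Component-wise lower/upper bounds for all perturbations of the
    nominal input x within an epsilon ball for the infinity norm,
    clipped to a valid input domain (e.g. pixel intensities in [0, 1])."""
    lo, hi = clip
    lower = [max(lo, xi - eps) for xi in x]
    upper = [min(hi, xi + eps) for xi in x]
    return lower, upper

# Three input components; the second sits near the top of the valid domain,
# so its upper bound is clipped to 1.0.
lower, upper = linf_input_ranges([0.2, 0.95, 0.5], eps=0.1)
```

Each component of the perturbed input is then free to vary independently between its lower and upper bound, which is exactly the box-constraint form of the range of inputs referred to above.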
The set of output constraints defines a maximum range within which the outputs of the neural network should vary if the transformational robustness property is to be satisfied. In general, any set of algebraic constraints that defines a region within which the neural network's output should remain can be used as the set of output constraints. For example, the set of output constraints may be defined in terms of linear inequalities of the form a^T y ≤ b, where y is the output of the network, a is a vector of coefficients, and b is a constant. In some embodiments, the set of output constraints can be defined using the neural network itself; for example, if the network provides for a classification stage, the set of output constraints may correspond to ensuring that the output remains in the same predicted class.
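A minimal sketch of checking such output constraints, assuming a plain list representation of the network output (the helper names are illustrative, not part of the disclosure):

```python
def satisfies_linear_constraint(y, a, b):
    """Check a single output constraint of the form a . y <= b,
    where a is a vector of coefficients and b is a constant."""
    return sum(ai * yi for ai, yi in zip(a, y)) <= b

def same_predicted_class(y, target):
    """Output constraint derived from the network itself: the predicted
    class (argmax of the output vector) must remain the target class."""
    return max(range(len(y)), key=y.__getitem__) == target

y = [0.1, 2.0, -0.5]
ok = satisfies_linear_constraint(y, a=[1.0, 0.0, 1.0], b=1.0)  # 0.1 - 0.5 <= 1.0
cls_ok = same_predicted_class(y, target=1)                     # argmax is index 1
```

Verification then asks whether such checks hold not just for one output, but for every output the network can produce over the whole range of inputs.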
To understand the notation adopted in the following description, consider feed-forward ReLU neural networks (NNs). We consider an L-layer feed-forward NN to be represented by the equations x̂_i = W_i x_{i−1} + b_i and x_i = ReLU(x̂_i) for i = 1, ..., L, where we write x̂_i and x_i to denote the pre-activation and activation vectors of the i-th layer, and define the NN output as f(x_0) := W_{L+1} x_L + b_{L+1}. Here W_i and b_i are the weights and biases, respectively, n_0 = d, n_{L+1} = m are the input and output dimensions, and the ReLU function is defined as ReLU(z) = max(z, 0) for z ∈ ℝ (the ReLU function is applied element-wise). We focus on classification networks whereby an input x_0 is assigned to the class associated with the network output with the highest value: class(x_0) = argmax_{j ∈ [m]} f(x_0)_j.
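The forward pass implied by this notation can be sketched as follows; the tiny weights are illustrative only, and the helper names are assumptions:

```python
def relu(v):
    # ReLU applied element-wise
    return [max(z, 0.0) for z in v]

def affine(W, b, x):
    # Compute W x + b for a weight matrix given as a list of rows
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def forward(weights, biases, x0):
    """Compute pre-activations x_hat_i = W_i x_{i-1} + b_i and activations
    x_i = ReLU(x_hat_i) for the hidden layers, then the affine output."""
    x = x0
    pre_activations, activations = [], []
    for W, b in zip(weights[:-1], biases[:-1]):
        x_hat = affine(W, b, x)       # pre-activation vector of the layer
        x = relu(x_hat)               # activation vector of the layer
        pre_activations.append(x_hat)
        activations.append(x)
    out = affine(weights[-1], biases[-1], x)  # network output f(x0)
    return out, pre_activations, activations

# Tiny one-hidden-layer example (weights are illustrative, not from the source)
W = [[[1.0, -1.0], [0.5, 0.5]], [[1.0, 1.0]]]
b = [[0.0, -0.25], [0.1]]
out, pre, act = forward(W, b, [1.0, 0.5])
```

The pre-activation and activation vectors recorded here are exactly the quantities that the relaxations below place constraints on.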
In this context, one can define the verification problem as follows: given an L-layer feed-forward NN f, a nominal input x̄ ∈ ℝ^d, a linear function φ, also called the specification, on the network's outputs, and a perturbation radius ε > 0, the verification problem is to determine whether

φ(f(x_0)) > 0 for all x_0 such that ‖x_0 − x̄‖_∞ ≤ ε, (1)

where ‖·‖_∞ denotes the standard ℓ_∞ norm of a vector. In particular, we hereafter focus on the local adversarial robustness problem whereby the specification is φ(f(x_0)) = f(x_0)_ȳ − f(x_0)_i for a target label i, where ȳ is the label predicted for the nominal input x̄. A network is said to be certifiably robust on input x̄ and perturbation radius ε if the answer to the verification problem (1) is true for all target labels i ≠ ȳ. This problem can be answered by solving the optimisation problem

γ⋆ := min_{x_0, ..., x_L} φ(f(x_0)) subject to
x_i = ReLU(W_i x_{i−1} + b_i), i ∈ [L], (2a)
‖x_0 − x̄‖_∞ ≤ ε, (2b)

where f(x_0) = W_{L+1} x_L + b_{L+1} and [L] denotes the set {1, ..., L}. The verification problem is true if the optimal value γ⋆ of (2) is positive. The optimisation problem is however non-convex because of (2a) and is therefore generally difficult to solve. To obtain a tractable convex relaxation of the problem, we derive an outer-approximation of the feasible region (x_0, x_1, ..., x_L) in (2) using a convex set D. This relaxes (2) to a convex problem

γ_D := min_{(x_0, ..., x_L) ∈ D} φ(f(x_0)), (3)

which provides a valid lower bound γ_D ≤ γ⋆. If γ_D > 0, then the answer to the verification problem (1) is true. If however γ_D ≤ 0
, then the verification problem cannot be decided. In order to demonstrate the benefits of the approach proposed by the present disclosure, it is useful to consider existing approaches: the triangle relaxation described in Ehlers et al., 2017; and the semidefinite relaxation described in Raghunathan et al., 2018. The triangle relaxation approximates a single univariate ReLU function z = max{x, 0} with its convex hull. Specifically, the ReLU constraints (2a) are approximated by a set of linear constraints

x_{i+1} ≥ x̂_{i+1}, x_{i+1} ≥ 0, (4a)
x_{i+1} ≤ k_{i+1} ⊙ (x̂_{i+1} − l_{i+1}), (4b)

for i = 0, ..., L − 1, where ⊙ denotes the Hadamard product, k_{i+1} := u_{i+1}/(u_{i+1} − l_{i+1}) (with the division taken element-wise), and l_{i+1}, u_{i+1} are lower and upper bounds of the pre-activation variable x̂_{i+1} for any input satisfying (2b). These bounds can be computed using interval propagation methods. The optimal value γLP of the resulting linear program (LP) relaxation is relatively easy to compute in practice. However, the quality of the LP relaxation (4) is intrinsically limited, i.e., there is always a positive gap γ⋆ − γLP for many practical NNs, referred to as the convex relaxation barrier.
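A minimal sketch of the interval propagation mentioned above, for a single affine layer; the function name and example weights are illustrative assumptions:

```python
def interval_propagate(W, b, lower, upper):
    """Propagate element-wise input bounds through one affine layer:
    each pre-activation bound takes the worst case, sign-wise, over the
    box [lower, upper] of the previous layer's activations."""
    l_out, u_out = [], []
    for row, bi in zip(W, b):
        lo = bi + sum(w * (lower[j] if w > 0 else upper[j]) for j, w in enumerate(row))
        hi = bi + sum(w * (upper[j] if w > 0 else lower[j]) for j, w in enumerate(row))
        l_out.append(lo)
        u_out.append(hi)
    return l_out, u_out

# Pre-activation bounds l, u for one neuron over the input box [0, 1]^2;
# applying ReLU afterwards clamps the bounds at zero: [max(l, 0), max(u, 0)].
l1, u1 = interval_propagate([[1.0, -2.0]], [0.5], lower=[0.0, 0.0], upper=[1.0, 1.0])
```

Repeating this layer by layer (clamping through each ReLU) yields the bounds l_i, u_i used by the triangle constraints (4) and the relaxations below.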
The semidefinite relaxation utilizes a single positive semidefinite (PSD) constraint that couples all ReLU constraints in (2a) to obtain a convex SDP. In this approach, the ReLU constraints (2a) are equivalently replaced with the following quadratic constraints

x_{i+1} ≥ 0, x_{i+1} ≥ x̂_{i+1}, (5a)
x_{i+1} ⊙ x_{i+1} = x_{i+1} ⊙ x̂_{i+1}, i = 0, ..., L − 1. (5b)

Further, the input constraint (2b) as well as the lower and upper bounds l̄_i, ū_i on the activation vectors x_i (which can be obtained using interval propagation methods) can be reformulated as quadratic constraints

x_i ⊙ x_i ≤ (l̄_i + ū_i) ⊙ x_i − l̄_i ⊙ ū_i, i = 0, 1, ..., L, (6)

where i = 0 corresponds to the input constraint (2b). Polynomial lifting and SDP-based hierarchies can be used to solve the resulting polynomial optimisation problem. Specifically a lifting matrix P of monomials

P := [1; x_0; x_1; ...; x_L] [1; x_0; x_1; ...; x_L]^T

can be defined as in Raghunathan et al., 2018. Then, all the constraints in (5) and (6) become linear in terms of the elements of P. By relaxing the monomial matrix P to be merely positive semidefinite, rather than rank one, we obtain an SDP relaxation of (2) as follows:

γSDP,1 := min φ(W_{L+1} P[x_L] + b_{L+1}) subject to
P[x_{i+1}] ≥ 0, P[x_{i+1}] ≥ W_{i+1} P[x_i] + b_{i+1}, i = 0, ..., L − 1, (7a)
diag(P[x_{i+1} x_{i+1}^T]) = diag(W_{i+1} P[x_i x_{i+1}^T]) + b_{i+1} ⊙ P[x_{i+1}], i = 0, ..., L − 1, (7b)
diag(P[x_i x_i^T]) ≤ (l̄_i + ū_i) ⊙ P[x_i] − l̄_i ⊙ ū_i, i = 0, ..., L, (7c)
P ⪰ 0, P[1] = 1, (7d)
where the same symbolic indexing P[·] as in Raghunathan et al., 2018 is adopted to index the elements of P. In this case (7a) and (7b) correspond to the ReLU constraints (5), and (7c) corresponds to the bounds on activation vectors in (6). We denote the optimal value of (7) as γSDP,1. We always have γ⋆ ≥ γSDP,1, where the equality is achieved if the optimal solution P to (7) is of rank one.

Referring to Figure 2, a method is provided for verifying a neural network which adopts aspects of the above-referenced LP and SDP approaches, but is further improved by additional adaptations. At step 210 of the method, data is obtained defining a neural network, a range of inputs and a set of output constraints to verify. The neural network may, for example, be an image classifier network. Such a network may be shown to classify an image appropriately for a given image. The range of inputs may represent a region around that input for which it is desired that the output remains within the output constraints. For example, the range of inputs may comprise one or more of: white noise variations of an input; geometrical changes of an input; and colour, luminosity, contrast, and/or bias-field transformations of an input. At step 220, semidefinite constraints, optionally positive semidefinite constraints, are adopted. However, unlike the SDP process described above, these semidefinite constraints are defined for each layer of the network rather than for the network as a whole. Consequently, significant computational benefits are realised when resolving these constraints. Further details of the definition of the semidefinite constraints are provided below. In order that appropriate conditionality is retained between the layers of the network, at step 230 one or more interlayer constraints are defined. These interlayer constraints couple outputs of network layers to corresponding inputs.
Where a neural network comprises N layers, outputs of each layer n (for n in the range 1 to N -1) are coupled to the inputs of subsequent layer n + 1 using the interlayer constraints. At step 240, linear cut constraints are defined. Many different types of linear cut constraints may be provided. For example, a linear constraint may provide further constraints to the approximation of the neural network based on the linear behaviour of the nodes within the exclusively activated or inactivated regions. Whereas conventional
semidefinite constraints in these regions are approximate, applying a linear constraint in such regions can improve the overall tightness of the approximation. In another example, a linear constraint may capture inter-layer and intra-layer dependencies between two nodes in the same or adjacent layers. Applying such linear constraints capturing dependencies between nodes increases the tightness of the semidefinite programming relaxation, as conventional methods do not capture these dependencies. At step 250, SDP relaxations are applied to solve for the constraints defined in steps 220 to 240, thereby obtaining a minimum value of γ as described above. Where the γ obtained in this manner is equal to or greater than 0, the network can be verified across the range of inputs at step 260. Where γ is less than 0, it is not possible to verify the network (although it is possible that the network is itself robust across the range). A verified neural network may be deployed with a degree of certainty for tasks dependent on accurate perception. For example, where an image classification neural network is used to control a device (such as an autonomous vehicle), confidence that its outputs are not adversely affected by transformations such as those reflected in the range of inputs may be important for demonstrating the safety and/or efficacy of the device. Further details of the definition of the constraints at steps 220 to 240 are provided below. For example, with respect to step 240, the adoption of linear cuts providing further constraints to the approximation of the neural network based on the linear behaviour of the nodes within the exclusively activated or inactivated regions may further be understood with reference to Figure 3, which illustrates how in certain cases the SDP relaxation in equation (7) (illustrated by the dashed line) may be looser than the LP relaxation in equation (4) (illustrated by the solid line).
In particular, Figure 3 shows LP-based and SDP-based outer approximations of the ReLU set {(x̂, x) : x = ReLU(x̂), l ≤ x̂ ≤ u}. From left to right, Figure 3 shows: 1) an unstable neuron, l = −4, u = 1; 2) an inactive neuron, l = −4, u = 0; 3) a strictly active neuron, l = 0, u = 1. The standard SDP relaxation (7) is inexact even for inactive/stable neurons, while the triangle relaxation becomes exact.
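The three neuron categories of Figure 3, together with the triangle relaxation's upper bound from (4), can be illustrated with a short sketch; the helper names are assumptions, not part of the disclosure:

```python
def relax_neuron(l, u):
    """Classify a neuron from its pre-activation bounds [l, u]."""
    if u <= 0:
        return "inactive"   # ReLU output is exactly 0 on [l, u]
    if l >= 0:
        return "active"     # ReLU is the identity on [l, u]
    return "unstable"       # a genuine relaxation is needed

def triangle_upper(l, u):
    """Upper bound of the triangle relaxation for an unstable neuron:
    x <= k * (x_hat - l), with slope k = u / (u - l)."""
    k = u / (u - l)
    return lambda x_hat: k * (x_hat - l)

# The unstable case shown in Figure 3: l = -4, u = 1
upper = triangle_upper(-4.0, 1.0)
```

At the endpoints the triangle's upper bound touches the ReLU graph exactly (0 at x̂ = l and u at x̂ = u), which is why the relaxation is exact for stable neurons and only loose in the unstable case.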
To address this, linear cuts based on a linear approximation of an activation function for one or more nodes of the neural network may be introduced into the process as a further set of initial linear constraints at step 240. In the context of the SDP relaxation (7), this process comprises extending the relaxation to include the linear cut (4b), thereby tightening the relaxation. We express the cut (4b) in terms of the matrix P as follows

P[x_{i+1}] ≤ k_{i+1} ⊙ (W_{i+1} P[x_i] + b_{i+1} − l_{i+1}), i = 0, ..., L − 1, (8)

and add it to (7). This leads to the following SDP relaxation for the verification problem (2):

γSDP,2 := min φ(W_{L+1} P[x_L] + b_{L+1}) subject to (7a), (7b), (7c), (8), P ⪰ 0, P[1] = 1. (9)
Due to the linear cuts (8), the new SDP relaxation (9) is tighter than both the original SDP relaxation (7) and the standard triangle LP relaxation (4). Further benefits may arise from this approach, since given the activation pattern (i.e. which neurons are in a stable state of activation across the range of inputs) once the linear cuts are applied, the activation pattern can be used to reduce the dimensionality of the PSD constraint. Particularly, given lower and upper bounds l_{i+1}, u_{i+1} on the pre-activation vector x̂_{i+1}, it is known that the constraints (4) for stable neurons of the (i+1)-th layer become exact and can be simplified: 1) if the k-th neuron is strictly active, i.e., l_{i+1}(k) ≥ 0, then the ReLU acts as the identity and x_{i+1}(k) = x̂_{i+1}(k); or 2) if the neuron is inactive, i.e., u_{i+1}(k) ≤ 0, then x_{i+1}(k) = 0. The information regarding inactive neurons can also be removed in (9) since P[x_{i+1}](k) becomes zero thanks to the linear cuts (8). This effectively reduces the dimension of the PSD constraint P ⪰ 0 without altering the optimal value.
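The dimensionality reduction from inactive neurons can be illustrated by pruning them from a layer's weight matrix; the helper and the small weights below are illustrative assumptions:

```python
def prune_inactive(W, b, u, W_next):
    """Remove neurons whose pre-activation upper bound is non-positive:
    their ReLU output is identically zero over the whole input range, so
    the corresponding rows of (W, b) and columns of the next layer's
    weight matrix W_next can be dropped without changing the network."""
    keep = [k for k, uk in enumerate(u) if uk > 0]
    W_pruned = [W[k] for k in keep]
    b_pruned = [b[k] for k in keep]
    W_next_pruned = [[row[k] for k in keep] for row in W_next]
    return W_pruned, b_pruned, W_next_pruned

W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.0]
u = [-0.5, 2.0, 1.0]            # first neuron is inactive across the input range
W_next = [[1.0, 2.0, 3.0]]
W_p, b_p, W_next_p = prune_inactive(W, b, u, W_next)
```

Each pruned neuron removes one row and column from the lifted matrix, so the PSD constraint to be solved shrinks accordingly.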
In many practical cases, a significant portion of the neurons are stable under a given verification query, especially when small perturbation radii ε are considered. Thus, adding the linear cuts (8) not only makes the SDP relaxation (9) theoretically stronger but also computationally easier. In an alternative approach, some of these advantages may be provided by first pruning the inactive neurons to form a new NN and then applying the SDP (7) to this newly pruned NN. Steps 220 and 230 are also effective to reduce the dimensionality of the PSD constraint in (9). These steps exploit the layer-wise cascading structure of NNs whereby each activation vector of a layer depends only on the previous layer's activation vector. This can be understood using the equivalent quadratic formulation of (5). Instead of using a single big matrix P as in (7), we introduce, at step 220, multiple matrices of monomials P_i for each i ∈ [L]:

P_i := [1; x_{i−1}; x_i] [1; x_{i−1}; x_i]^T, i ∈ [L]. (10)

Then, the constraints (5a)-(5b) become linear in P_i:

P_i[x_i] ≥ 0, P_i[x_i] ≥ W_i P_i[x_{i−1}] + b_i, (11a)
diag(P_i[x_i x_i^T]) = diag(W_i P_i[x_{i−1} x_i^T]) + b_i ⊙ P_i[x_i]. (11b)

Also, (7c) and (8) (thus reflecting the linear cuts of step 240) can be written with respect to P_i as

diag(P_i[x_{i−1} x_{i−1}^T]) ≤ (l̄_{i−1} + ū_{i−1}) ⊙ P_i[x_{i−1}] − l̄_{i−1} ⊙ ū_{i−1}, (12a)
P_i[x_i] ≤ k_i ⊙ (W_i P_i[x_{i−1}] + b_i − l_i). (12b)

Upon relaxing the monomial matrices, we need to consider the input-output consistency among the P_i's. Accordingly, interlayer constraints are introduced at step 230, i.e.,

P_i[x_i] = P_{i+1}[x_i], P_i[x_i x_i^T] = P_{i+1}[x_i x_i^T], i ∈ [L − 1], (13)

where P_{i+1}[x_i] and P_{i+1}[x_i x_i^T] denote the blocks of P_{i+1} associated with the monomials x_i and x_i x_i^T, respectively. Accordingly, a layer-based SDP relaxation at step 250 for the verification problem (2) can now be expressed as:

γLayerSDP := min φ(W_{L+1} P_L[x_L] + b_{L+1}) subject to (11a), (11b), (12a), (12b), (13), P_i ⪰ 0, P_i[1] = 1, i ∈ [L]. (14)
Instead of one single big PSD constraint of network size in (9), the layer-based SDP relaxation (14) employs multiple smaller PSD constraints, one for each layer. Smaller PSD constraints in an SDP can considerably speed up its solution using off-the-shelf solvers. Moreover, the solution quality of (14) is equivalent to that from (9). That is to say, given a non-convex NN verification instance (2), we have that γSDP,2 = γLayerSDP. In the following, the result (14) is often referred to as Layer SDP, with its optimal value interchangeably referred to as γLayerSDP.
The efficacy of incomplete NN verification methods depends both on the tightness of the utilized approximations and the computational efficiency of the method. The Layer SDP result (14) can be further adapted for computational efficiency and tightness by adding or removing constraints. In the following, we describe three exemplary variations on Layer SDP: (i) a relaxation of the method via dropping equality constraints within the interlayer constraints, (ii) a tightening of the approximation via adding further linear cut constraints at step 240 and (iii) a tightening of the approximation via adding a non-linear constraint for each layer of the network as further described in the context of Figure 6. While these variations are described as alternatives, it is understood that each of these variations can be applied either additionally or alternatively to each other. Further, the variations (ii) and (iii) may also be applied to the global SDP relaxation formulated in (7) in analogy to their application to Layer SDP.
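To see why layer-wise PSD blocks are cheaper than one network-sized block, the respective matrix dimensions can be compared; the layer widths below are illustrative assumptions:

```python
def psd_block_sizes(layer_widths):
    """Compare the single PSD constraint of the global SDP (dimension
    1 + sum of all layer widths) against the per-layer blocks of
    Layer SDP (each of dimension 1 + n_{i-1} + n_i)."""
    n = layer_widths                 # [n_0, n_1, ..., n_L]
    global_size = 1 + sum(n)
    layer_sizes = [1 + n[i - 1] + n[i] for i in range(1, len(n))]
    return global_size, layer_sizes

# Illustrative widths: a 784-dimensional input and two hidden layers of 128
g, blocks = psd_block_sizes([784, 128, 128, 10])
```

Since the cost of solving an SDP grows superlinearly with the dimension of its PSD constraint, several small blocks are typically far cheaper to solve than the one large block they replace, which is the computational motivation for the layer-based relaxation.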
In one exemplary variation, further relaxation of Layer SDP may be achieved by dropping equality constraints within the interlayer constraints of result (14). The number of equality constraints (13) is quadratic in the number of neurons in each layer. However, an SDP relaxation that uses only a subset of the constraints in (13) may be adopted at step 230. In particular, at step 230 the interlayer constraints may be constructed using a linear number of consistency constraints as

P_i[x_i] = P_{i+1}[x_i], diag(P_i[x_i x_i^T]) = diag(P_{i+1}[x_i x_i^T]), i ∈ [L − 1]. (15)

Then at step 250 another layer-based SDP relaxation may be formed as follows:

min φ(W_{L+1} P_L[x_L] + b_{L+1}) subject to (11a), (11b), (12a), (12b), (15), P_i ⪰ 0, P_i[1] = 1, i ∈ [L]. (16)

The solution quality of (16) may in some cases be less precise than (14), but it will be faster to solve and it is still provably better than the LP relaxation (4), i.e., γ⋆ ≥ γ(14) ≥ γ(16) ≥ γLP.
In a second exemplary variation on the semidefinite programming relaxation, one or more further linear constraints capturing inter-layer and intra-layer dependencies between two nodes in the same or adjacent layers are added at step 240. These further linear constraints may be applied to the global SDP (7) or to Layer SDP (14). Moreover, the further linear constraints may be applied additionally or alternatively to the initial linear constraints based on the linear behaviour of the nodes within the exclusively activated or inactivated regions expressed by (8). Adding these further linear constraints tightens the SDP relaxation. In some embodiments, only a subset of the further linear constraints may be added to the SDP relaxation, thereby reducing the computational cost of the method. The further linear constraints are determined from an upper bound and a lower bound for each of two nodes from the network, wherein a first node is from a first layer of the
network and a second node is either from the first layer of the network or from a layer of the network adjacent to the first layer of the network. Subsequently, the further linear constraints are expressed as an upper bound and a lower bound for elements of the lifting matrix P. In the context of Layer SDP (14), the method aims to bound elements of the matrix P_i for each layer. First we denote a few terms for each layer of a neural network: the stacked vector z_i := [x_{i−1}; x_i] and its element-wise lower and upper bounds ℓ_i and υ_i, such that ℓ_i ≤ z_i ≤ υ_i. Since ℓ_i ≤ z_i ≤ υ_i, we have, element-wise,

min{ℓ_i ℓ_i^T, ℓ_i υ_i^T, υ_i ℓ_i^T, υ_i υ_i^T} ≤ z_i z_i^T ≤ max{ℓ_i ℓ_i^T, ℓ_i υ_i^T, υ_i ℓ_i^T, υ_i υ_i^T}.

These non-linear constraints can be reformulated as linear constraints on the elements of P_i:

min{ℓ_i ℓ_i^T, ℓ_i υ_i^T, υ_i ℓ_i^T, υ_i υ_i^T} ≤ P_i[z_i z_i^T] ≤ max{ℓ_i ℓ_i^T, ℓ_i υ_i^T, υ_i ℓ_i^T, υ_i υ_i^T}. (17)
The method aims to bound the relevant elements of the matrix P_i within the region given in (17). The constraints in (17) are linear and could be directly added to (14). However, they introduce a number of new inequalities that grows quadratically with the layer width, thereby increasing the computational effort required to solve the verification problem. Therefore, herein efficient strategies for imposing the constraints in (17) are presented. The method uses (i) the reformulation-linearization technique (RLT) to construct valid further linear cut constraints that are provably stronger than (17), and (ii) provides a computationally-efficient strategy for integrating the linear cut constraints with the Layer SDP relaxation (14). An analogous set of constraints may be formulated for the lifting matrix P, and the technique applied to the global SDP (7). In an embodiment implementing Layer SDP (14), valid further linear cut constraints are constructed using RLT. RLT involves the construction of valid linear cuts on the lifting matrices
by using products of the existing linear constraints in (14) on the original variables {x_i}_{i=0}^{L}. Under the constraints
and (12a) on Layer SDP (14), the variables
satisfy:
These can be used to construct the constraints:
. By using (10), these non-linear constraints are linearized as
The linear cut constraints (18a) – (18d) are stronger than (17). The existing constraints (12a) and
are stronger than the first part of (18a); while (12a) is stronger than the diagonal components of the second part of (18a). Therefore, the targeted bounding (17) can be realized by adding to the Layer SDP relaxation (14) the following linear cut constraints for each
(where in general
denotes a sequence of nonzero integers from 0 to b):
where the diagonal components of (19a) are redundant. The above shows that adding the linear cut constraints in (19) to the Layer SDP relaxation (14) is efficient to bound
and subsequently the matrix Pi. Layer SDP relaxation (14) also has other existing linear constraints (11a) and (12b), where (12b) was obtained as an initial linear constraint from triangle relaxation constraints (4). In some embodiments, (11a) and (12b) can be used to construct the new constraints:
Linear cut constraint (20a) is weaker than the existing constraint
while (20c) is weaker than the conjunction of existing constraints (11a), (11b) and (12b). Adding the linear cut constraint (20b) can tighten the Layer SDP relaxation, but only if its off-diagonals cut the feasible region, while the diagonals are implied by (11b). Therefore, including (20b) in the Layer SDP relaxation (14) can tighten the SDP
relaxation. By defining and recalling that
under (13), the constraints (19a) and (20b) are merged as a linear cut constraint for each
:
When
is also needed. Integrating the linear cut constraints (19b), (19c) and (21) into (14) yields the Layer RLT-SDP relaxation:
subject to:
(19b), (19c), (21). In embodiments where the initial linear constraints (12b) are not applied, Layer RLT-SDP may be formulated using the further linear cut constraints (19a) in place of (21). Moreover, analogous constraints may be constructed for the global SDP relaxation (7). Considering now an exemplary embodiment according to the RLT-SDP relaxation (22), simple numerical examples in Figure 4 show that adding each of the linear cuts (19b), (19c) and (21) shrinks the relaxation region of
and thus tightens the Layer SDP relaxation. In particular, Figure 4 shows the feasible region of the triple
by adding linear cuts (19b), (19c) and (21), with
. Left to right columns: 1) inactive neuron
2) unstable neuron
3) strictly active neuron
For all cases, adding each linear cut removes a portion of the relaxation region. In fact, the Layer RLT-SDP relaxation (22) offers a provably tighter bound than the Layer SDP relaxation (14), that is
Inequality (23) holds even when only a portion of the further linear constraints (19b), (19c) and (21) are added to Layer SDP (14). In a computationally efficient implementation of the RLT-SDP relaxation, the semidefinite programming relaxation may be iteratively repeated, wherein at each iteration one or more of the further linear constraints (e.g. (19b), (19c) and (21), or (19a)) are added to the portion of the further linear constraints the semidefinite programming relaxation is subjected to. In a Layer RLT-SDP relaxation according to (22), the number of linear inequalities introduced by (19b), (19c) and (21) for each
and
(by removing diagonals), respectively. For
, extra linear inequalities are needed. The total number of inequalities for each
is
, and for
Compared to directly imposing constraints (17) (which introduces
inequalities), adding (19b), (19c) and (21) has a lower computational burden, especially for large neural networks. To further increase the computational efficiency of adding (19b), (19c) and (21), a strategy is deployed based on two observations: • The linear cut constraints (19b) and (19c) capture inter-layer dependencies (i.e. terms
Since
, the dependencies are also reflected in the weighting matrix Wi. Hence, the structure of Wi can be exploited to efficiently add (19b) and (19c). • The linear cut constraint (21) captures the intra-layer interactions, i.e. terms which cannot be directly indicated by the neural network parameters (weights or biases). Therefore, for an efficient implementation of Layer RLT-SDP, a portion of the linear cut constraints (19b) and (19c) is used.
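As an illustration only (not part of the claimed method), the weight-magnitude ordering underlying this selection strategy may be sketched as follows; the function name, the selection fraction p and the example matrix are all hypothetical:

```python
import numpy as np

def select_interlayer_cuts(W, p):
    """For each row of the weight matrix W, keep the fraction p of
    inter-layer cut indices with the largest absolute weights.

    Returns a list of (row, col) index pairs: the cuts to add first,
    since a larger |W[m, j]| means input j has a bigger influence on
    the pre-activation of neuron m in the next layer."""
    selected = []
    n_keep = max(1, int(np.ceil(p * W.shape[1])))
    for m, row in enumerate(np.abs(W)):
        # descending order of |W[m, :]|, mirroring the ordering matrix of Algorithm 0
        order = np.argsort(-row)
        selected.extend((m, int(j)) for j in order[:n_keep])
    return selected

W = np.array([[0.1, -2.0, 0.5, 0.05],
              [1.5,  0.2, -0.3, 3.0]])
print(select_interlayer_cuts(W, 0.5))  # -> [(0, 1), (0, 2), (1, 3), (1, 0)]
```

With p = 0.5, half of the cuts per row are retained, largest |weight| first, reflecting the ordering described above.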
Algorithm 0 describes an example of an efficient implementation of the Layer RLT-SDP relaxation. The portion of linear constraints added at each iteration is set by choosing the sequence
The sequence
and the maximum iteration
can be adapted to the computational power available. In some implementations, a different sequence
can be chosen for each individual layer. In Algorithm 0 the sequence
is constant across all layers. The matrix
stores the ordering (in descending order) of the elements in each row of
The ordering ensures that the portion of the linear cut constraints with the larger influence on shrinking the feasible region of the SDP relaxation is added first. This is based on the following consideration: for neuron m at layer i + 1, its pre-activation is
, where is a row vector. Let
and
be any two elements of
and their corresponding inputs are
and
respectively. If
, when compared to those linear cuts about
the linear cuts about
have a bigger influence on the feasible region of
Figure 5 provides an example for this, where it is seen that the linear cut constraints about
contribute more than
in shrinking the feasible region of
. The feasible region of
, , and
is shown where a part or all of linear cuts (19b) and (19c) are added. Adding only the linear cuts about
(i.e. the larger element of
yields a feasible region close to one with full constraints on
Algorithm 0 has the following property: The relation holds
under any choice of
At any given iteration k of Algorithm 0, we have that
At each iteration, the layer RLT-SDP relaxation (22) is solved with a total number of
linear constraints. This is computationally lighter than the problem obtained by adding all the inequalities in (19b) and (19c). Furthermore, before running the algorithm, we can also remove the inactive neurons and simplify the constraints of stable neurons to reduce the sizes of the constraints
This can be realised by examining the activation pattern of the neural network under a given verification query and will not relax the solution. The exemplary method here for tightening Layer SDP (14) subject to initial linear constraints (12b), by subjecting Layer SDP to further linear constraints, may be analogously applied to global SDP (7), SDP2 (9) or Layer SDP not subject to initial linear constraints (12b). In any of these variations, the SDP relaxation is further tightened. In another exemplary variation on semidefinite programming relaxation, one or more non-linear constraints are determined from the algebraic constraints on the output of each layer of the network that tighten the semidefinite programming relaxation. In a preferred embodiment, the semidefinite programming relaxation is Layer SDP and a non-linear constraint is determined for each layer of the network from the algebraic constraints for that layer. A tighter semidefinite programming relaxation can verify more non-convex NN verification instances. Generally, non-linear constraints require solving a non-convex semidefinite programming relaxation. Such non-convex problems are generally much more computationally expensive than convex semidefinite programming problems, requiring more computational resources and being slower to solve. Referring to Figure 6, a method is provided that solves the semidefinite programming relaxation subject to one or more non-linear constraints computationally efficiently. At step 610 the semidefinite programming relaxation not subject to the non-linear constraints, optionally Layer SDP, is solved. If this semidefinite programming relaxation verifies the neural network is robust across the range of inputs, no further
action is required. Otherwise, at step 620 of the method it is determined that the semidefinite programming relaxation not subject to non-linear constraints does not verify the neural network as robust across the range of inputs. At step 630 of the method, one or more non-linear constraints are determined from the algebraic constraints on the output of each layer of the neural network. If the semidefinite programming relaxation is Layer SDP, a non-linear constraint of the same algebraic form is determined for each layer of the neural network from the algebraic constraints for that layer. A promising way to reduce the relaxation gap of SDPs, i.e. to tighten SDPs, is to introduce constraints to enforce the rank condition rank(P)=1 implied by (7), achieving
γ when this condition is fulfilled. This condition carries through to (9) and can be reformulated in (14) as rank (Pi)=1,
. Approaches to tighten SDP relaxation (7) using this condition have introduced the non-convex cuts
, where
and
In an exemplary embodiment, we introduce a non-linear constraint (24e) for each Pi in (14) and obtain the non-convex Layer SDP relaxation:
where
, , are user-specified constant vectors. For a given verification instance, the Layer SDP relaxation (24) fulfils
Due to the non-linear constraint (24e), Layer SDP relaxation (24) is non-convex and is harder to solve than the original Layer SDP relaxation (14). The subsequent method steps circumvent the non-convexity issue by an iterative process that recursively solves an auxiliary convex SDP problem of around the same size as (14) and iteratively
generates an objective value sequence that initializes from
and monotonically converges to
At step 640, the method sets the first current objective value of the objective value sequence to the objective value of the semidefinite programming relaxation not subject to non-linear constraints, and constructs the user-specified constant vectors of the non-linear constraints. The user-specified constant vectors are constructed so as to ensure a solution of the auxiliary convex SDP problem can be used to calculate the current objective value of the objective value sequence at each iteration. In an exemplary embodiment, where the semidefinite programming relaxation not subject to the non-linear constraints is Layer SDP, the user-specified constant vectors are the vectors vi of (24e). At step 650, the method solves the auxiliary convex semidefinite programming relaxation and determines the current objective value of the objective value sequence at the iteration from the solution of the auxiliary convex semidefinite programming relaxation. At step 660, the method determines if the outcome of the semidefinite programming relaxation determined by the current objective value of the objective value sequence verifies the neural network is robust across the range of inputs. If the neural network is robust, at step 670, the method outputs that the neural network is robust across the range of inputs. Outputting that the network is robust as soon as this is determined saves computational power by avoiding calculating to
even when the verification instance can be resolved at an earlier stage. If the neural network cannot be verified at step 660, the method determines at step 680, if the objective value of the auxiliary convex semidefinite programming problem is smaller than or equal to a predetermined value. The predetermined value ensures the method determines
within a user-defined tolerance. If the answer is “Yes”, the method determines that the neural network cannot be verified across the range of inputs at step 690. If at step 680 the answer is “No”, the method returns to step 650, completing an iteration of the method. Further detail on steps 640 to 690 in an exemplary embodiment is provided below. In this exemplary embodiment, the semidefinite programming relaxation not subject to non-linear constraints is Layer SDP according to (14) and the non-linear constraint for each layer of the network is (24e).
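For illustration only, the overall flow of steps 610 to 690 may be sketched as follows. The two solver callables are placeholders for a numerical SDP solver (e.g. MOSEK), and the update of the current objective value is a simplified stand-in for the update performed at step 650; none of the names are taken from the claims:

```python
def verify_with_nonconvex_tightening(solve_layer_sdp, solve_auxiliary_sdp,
                                     eps=1e-6, max_iter=50):
    """Sketch of the verification flow of Figure 6 (steps 610-690).

    solve_layer_sdp():       returns the objective of the convex Layer SDP (14)
    solve_auxiliary_sdp(mu): returns (objective, residual) of the auxiliary
                             convex SDP for the current value mu
    Both callables stand in for calls to an external SDP solver."""
    gamma = solve_layer_sdp()         # step 610: solve the convex relaxation first
    if gamma > 0:
        return "robust"               # verified without non-linear constraints
    mu = gamma                        # step 640: initialise the objective sequence
    for _ in range(max_iter):         # steps 650-690: iterative tightening
        obj, residual = solve_auxiliary_sdp(mu)
        mu = mu + obj                 # step 650: illustrative objective-value update
        if mu > 0:                    # steps 660/670: early exit once verified
            return "robust"
        if residual <= eps:           # steps 680/690: converged without verifying
            return "unknown"
    return "unknown"
```

The early exit mirrors step 670: the loop terminates as soon as a positive objective value is found, even before the convergence tolerance of step 680 is reached.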
Although the non-convex layer SDP relaxation (24) is generally hard to solve, its optimal objective value
is bounded below by
This lower bound
can be efficiently computed from the convex Layer SDP relaxation (14). Hence, we can set
as a start point to search for the value of
at step 640. This inspires us to generate an objective value sequence
by solving an auxiliary convex SDP problem recursively. The sequence is bounded by
and can converge to
Thereby, the objective value sequence is always tighter than
and remains a valid bound to and
can be used for NN robustness verification. The auxiliary convex SDP problem has the form:
where
In (25a) the weight α is a user-specified positive constant. Its value is set as
1 to penalize more heavily the SDP relaxations of the first L-1 layers. This is useful to obtain a tighter neural network output, as it is influenced by the SDP relaxations of the first L-1 layers. The scalars
, can be chosen as any non-zero constants. The choice of
is iteratively updated for every repetition of step 650, as will be discussed later. In (25c), the vectors
have fixed values and are constructed, at step 640, from Algorithm 1 by exploiting the activation pattern of the neural network.
The construction of the vectors by Algorithm 1 ensures that the equality constraint (25c) and
always hold. Thus, at step 650, solving the auxiliary convex SDP relaxation allows us to determine the current objective value of the objective value sequence as
The optimal objective value of the auxiliary SDP problem (25) has the following properties: 1)
for any given
2) if and only if the feasible solution satisfies
3) When
. 4) If
is chosen to satisfy
then
. Therefore, if we choose the scalar
such that
, then solving the auxiliary SDP problem (25) gives the objective value
at step 650.
In this embodiment, the iterative loop encapsulated by steps 650 to 690 iteratively updates the value of
and generates the objective value sequence that
converges to
An exemplary iterative algorithm that outputs the current objective value at a final iteration
q when a current objective value of the auxiliary convex
semidefinite programming problem at an iteration k is smaller than a predefined value ε is Algorithm 2.
The iterative algorithm is based on solving the auxiliary convex SDP problem (25) at each iteration with a scalar that is updated across the iterations. The initial value is set as
, where
is the optimal objective value of the layer SDP relaxation (14) determined at step 610 (see Line 2). For each given
the auxiliary SDP problem (25) is solved to obtain the objective value
(see Lines 5 and 6). At each iteration, the obtained optimal objective value
of problem (25) is used to update the value of
(see Line 7). The iteration is terminated when
is smaller than a prescribed tolerance
(see Line 8). Algorithm 2 outputs the objective value
which is used to determine whether the neural network is robust across the range of inputs. The sequence generated by Algorithm 2 has the properties:
Therefore, the sequence satisfies
and converges
to
by setting
Therefore,
can be used to check when the sequence converges, enabling the use of as a stopping criterion in
Algorithm 2. The objective value sequence generated by Algorithm 2 correspondingly has the
property
and monotonically increases to converge to
Thus, every current objective value
in the objective value sequence is a valid lower bound to and subsequently y∗. Moreover, the calculated
objective values at all iterations are at least as good as
and converge to the optimal objective value
of the non-convex layer SDP relaxation (24). In this sense, the proposed iterative algorithm is an efficient method to solve the non-convex layer SDP relaxation (24), which would otherwise be hard to solve directly. When Algorithm 2 is applied in the method of Figure 6, the iteration is executed only when
, i.e., the method determines at step 620 that the Layer SDP does not verify the neural network as robust across the range of inputs. Further method steps 660 and 670 are incorporated in Algorithm 2 such that the iteration is terminated whenever a positive value of is found, even though
as required at step
680 is not reached yet. This approach reduces computational cost, as the layer SDP approach can already verify a considerable number of instances. Also, it is clear that the use of our iterative algorithm can increase the number of instances that can be verified. Figure 7 illustrates an example system capable of verifying a neural network. Such a system comprises at least one processor 402, which may receive data from at least one input 404 and provide data to at least one output 406. The processor may be configured to perform the method outlined above.

Results
The benefits of the approaches described above have been demonstrated experimentally, as illustrated in Figures 8, 9A, 9B and 10. In particular, the described approaches have been compared against other state-of-the-art (SoA) convex relaxation methods, including: 1) the SDP formulation in [Raghunathan et al., 2018] and the advanced SDP-FO algorithm described in S. Dathathri et al., Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming, NeurIPS20, 2020. 2) Recent SoA LP relaxation methods, referred to as kPoly (G. Singh, T. Gehr, M. Püschel, and M. Vechev. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL):41, 2019), OptC2V (C. Tjandraatmadja, R. Anderson, J. Huchette, W. Ma, K. Patel, and J. Vielma. The convex relaxation barrier, revisited: Tightened single-neuron relaxations for neural network verification. In NeurIPS20, 2020; hereafter referred to as [Tjandraatmadja et al., 2020]), IBP (Gowal, S., Dvijotham, K. D., Stanforth, R., Bunel, R., Qin, C., Uesato, J., Arandjelovic, R., Mann, T., and Kohli, P. Scalable verified training for provably robust image classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (IEEE/CVF19), pp. 4842–4851, 2019) and PRIMA (Müller, M. N., Makarchuk, G., Singh, G., Püschel, M., and Vechev, M. PRIMA: Precise and general neural network certification via multi-neuron convex relaxations. arXiv preprint arXiv:2103.03638v2, 2021; referred to as [Müller et al., 2021]). 3) Complete methods: β-CROWN (Wang, S., Zhang, H., Xu, K., Lin, X., Jana, S., Hsieh, C.-J., and Kolter, J. Z. Beta-CROWN: Efficient bound propagation with per-neuron split constraints for complete and incomplete neural network verification. arXiv preprint arXiv:2103.06624v1, 2021), MILP (Tjeng, V.; Xiao, K.; and Tedrake, R. 2019. Evaluating robustness of neural networks with mixed integer programming.
In International Conference on Learning Representations (ICLR19), 1–21) and AI2 (Gehr, T.; Mirman, M.; Drachsler-Cohen, D.; Tsankov, P.; Chaudhuri, S.; and Vechev, M. 2018. AI2: Safety and robustness certification of neural networks with abstract interpretation. In IEEE Symposium on Security and Privacy (SP18), 3–18. IEEE). In a first experiment illustrated in Figure 8, the standard robustness verification problem for image classifiers is addressed: given a correctly classified image, verify that the NN returns the same label for all inputs within an
perturbation of radius ε. Formally, given an image
with a label ( and a radius
, a neural network is verified to be robust on
, if y⋆ in (2) is positive for all potential adversarial targets. For LP- and SDP-based relaxation
methods, we solve (2) multiple times for every potential adversarial target
and check whether the lower bound is positive. We consider the formulation (7), originally proposed in Raghunathan et al., 2018 and our SDP formulations from (9), (14), and (16) for experiments. Note that (9) and (14) are equivalent. In these results, (9) and (14) are referred to as LayerSDP, and its relaxed version (16) as FastSDP. The standard LP relaxation (4) is also illustrated as a benchmark. The formulation (7) is denoted as SDP-IP. The lower and upper bounds
were computed using a symbolic interval propagation algorithm. To get an upper bound on verified accuracy, a projected gradient descent (PGD) algorithm is run. For numerical computation, the convex relaxations are converted into a standard conic optimization before passing them to a numerical solver. An automatic transformation from the convex relaxations into standard conic optimization was implemented. The resulting LPs/SDPs were then solved by MOSEK (see ApS Mosek. The MOSEK optimization toolbox for MATLAB manual, 2015). The time reported by MOSEK is presented for comparison. The neural networks considered comprised eight fully connected ReLU networks trained on the MNIST dataset. To facilitate the comparison with existing tools, experiments were divided into three groups: 1) One self-trained NN with two hidden layers, each having 64 neurons; no adversarial training was used. The perturbation radius ε was varied from 0.01 to 0.05. 2) Three NNs from [Raghunathan et al., 2018]: MLP-SDP, MLP-LP, and MLP-Adv. 3) Four deep NNs from G. Singh, T. Gehr, M. Püschel, and M. Vechev. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL):41, 2019. For each network, the first 100 images of the MNIST test set were verified and those incorrectly classified were excluded. The experiments were performed on an Intel(R) i9-10850K CPU 3.60 GHz machine with 32 GB of RAM, except for SDP-FO which was carried out on an Intel i7-1065G7 with 15 GB RAM. Figure 8 reports the verified accuracy for the 64×2 network with different perturbation radii ε using different verifiers: LayerSDP and FastSDP, SDP-IP, and standard LP. As
expected, the LayerSDP approach offers better robust accuracies than SDP-IP and LP across different ε. Interestingly, it is observed that SDP-IP verified fewer images than the standard LP relaxation when
and required a longer time. This indicates that the behaviour in Figure 8 persists in practical NN verification, confirming the tightness of LayerSDP. Furthermore, a combination of inactive neuron pruning and layer decomposition made LayerSDP and FastSDP two orders of magnitude faster to solve than SDP-IP. We observe that SDP-FO verified fewer images than SDP-IP using similar computational time. The results in Table 1 demonstrate that LayerSDP is also much faster than SDP-IP, while being more precise than the LP baseline across the networks considered. Furthermore, for the robustly trained NNs (MLP-Adv, MLP-SDP, MLP-LP), LayerSDP achieved a very good verified accuracy compared to PGD, matching it for MLP-SDP and MLP-LP. Compared to the SoA LP-based methods, kPoly and OptC2V, LayerSDP significantly improved the verified accuracy for the 6×100 and 6×200 networks, while remaining competitive for the other two networks. The results suggest that the linear cuts in kPoly and OptC2V can potentially be combined in LayerSDP to obtain a stronger relaxation.

Table 1
Time is reported as runtime per image (in seconds). † These results are taken from previously reported values; dashes (–) indicate previously reported numbers are unavailable. The verified accuracies obtained through this implementation of SDP-FO were slightly lower than previously reported numbers due to different hyper-parameters. SDP-FO failed to verify any instance within the maximum iterations on the 6 × 100, 9 × 100, 6 × 200 or 9 × 200 models.
∗: To facilitate time consumption comparison, SDP-IP was run over three images for these networks on the experimental equipment and an average time was taken. Two further sets of experiments were carried out to evaluate the precision and scalability of the Layer RLT-SDP relaxation (22) as well as Algorithm 0. These two experiments were run on a Linux machine with an Intel i9-10920X 3.5 GHz 12-core CPU with 128 GB RAM. The optimisation problems were modelled by using YALMIP (Löfberg, J. 2004. YALMIP: A toolbox for modeling and optimization in MATLAB. In IEEE International Conference on Robotics and Automation (ICRA04), 284–289. IEEE) and solved using MOSEK (Andersen, E. D.; and Andersen, K. D. 2000. The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm. In High performance optimization, 197–232. Springer). The results obtained are compared against presently available SoA methods and tools. In the first of the two experiments, which evaluates the efficacy of the implementation strategy as illustrated in Figures 9A and 9B, two groups of two-input, two-output, fully-connected random ReLU NNs generated by using the method in (Fazlyab, M.; Morari, M.; and Pappas, G. J. 2020. Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite programming. IEEE Transactions on Automatic Control, doi:10.1109/TAC.2020.3046193; referred to as [Fazlyab, Morari, and Pappas 2020]) are considered. Group 1 had four models with L = 4, 6, 8, 10 hidden layers, respectively, and 15 neurons per hidden layer. Group 2 had four three-layer models, with ni = 10, 15, 50, 100 neurons per hidden layer, respectively. Both the network depth and width are investigated by using RLT-SDP (hereafter used interchangeably with Layer RLT-SDP) to obtain an over-approximation of the feasible output region of the neural network for a given input set.
The test inputs were random values within [0, 1] and the heuristic method in [Fazlyab, Morari, and Pappas 2020] was adopted to compute the over-approximations. Algorithm 0 was run with
and
. Without linear cuts (p1 = 0), RLT-SDP is equivalent to LayerSDP. We first studied the impact of network depth on the verification method here proposed by using the models in Group 1. Figure 9A shows over-approximations of the feasible
output region by solving RLT-SDP with different percentages of linear cuts (percentages indicated in the legend) for networks with different numbers of hidden layers L. The 0% case is LayerSDP. For all four models considered, adding a larger percentage of linear cuts yields a tighter over-approximation. As the number of hidden layers L increases, LayerSDP becomes looser and the effect of adding linear cuts becomes more significant. The figures show that across all models, even using just 20% of the linear cuts considerably reduces the over-approximation. To further analyse the gain in approximation versus the corresponding increase in computational complexity, we considered two metrics: the improvement in approximation (or tightness) and the runtime increase. The former is the relative reduction in the feasible output regions obtained by RLT-SDP and LayerSDP; the latter is the relative increase in their runtime. Figure 9B shows the tightness improvement and runtime increase obtained by solving RLT-SDP with different percentages of linear cuts for networks with different numbers of hidden layers L. The 0% case is LayerSDP. As expected, Figure 9B illustrates that adding a larger proportion of linear cuts yields a tighter over-approximation, along with an increase in runtime. Adding the same percentage of linear cuts leads to a more significant tightness improvement on larger networks (with larger L) than on smaller ones. For each network, as the percentage of linear cuts increases, the tightness improvement becomes less significant, but the runtime increase becomes more significant. In particular, it is found experimentally that the first 20% of linear cuts contributes most significantly to the improvement in the overall tightness of the method. We evaluated the impact of network width by using the models in Group 2 and observed very similar behaviour of the method. These experiment results clearly confirm
y∗ and demonstrate the efficiency of Algorithm 0. Further, the addition of 20% of the linear cuts can be sufficient to considerably improve the precision of the SDP approach without incurring the higher computational costs associated with larger problems. In the second of the two experiments, RLT-SDP is compared to SoA methods. Three groups of fully connected ReLU neural networks trained on the MNIST dataset are considered.
• Small NNs: MLP-Adv, MLP-LP and MLP-SDP from [Raghunathan et al., 2018], tested under the same perturbation є = 0.1 as in [Raghunathan et al., 2018] and the experiment illustrated in Figure 8. • Medium NNs: Models 6 × 100 and 9 × 100 from (Singh, G., Gehr, T., Püschel, M., and Vechev, M. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL):41:1–41:30, 2019; hereafter referred to as [Singh et al., 2019a]), evaluated under the same є = 0.026 and є = 0.015 as in [Singh et al., 2019a], [Tjandraatmadja et al., 2020], [Müller et al., 2021] and the experiment illustrated in Figure 8. • Large NNs: Models 8 × 1024-0.1 and 8 × 1024-0.3 from (Li, L., Qi, X., Xie, T., and Li, B. SoK: Certified robustness for deep neural networks. arXiv preprint arXiv:2009.04131, 2020; hereafter referred to as [Li et al., 2020]), which were trained using CROWN-IBP (Zhang, H.; Chen, H.; Xiao, C.; Gowal, S.; Stanforth, R.; Li, B.; Boning, D.; and Hsieh, C.-J. 2019. Towards stable and efficient training of verifiably robust neural networks. arXiv preprint arXiv:1906.06316) with adversarial attack є = 0.1, 0.3, respectively. As in [Li et al., 2020], they were tested under the perturbations є = 0.1, 0.3, respectively. To evaluate the efficiency of the proposed RLT-SDP method against the SoA, we benchmarked the technique on the neural networks built on the MNIST dataset described above. All experiments were run on the first 100 images of the dataset. The results obtained are reported in Table 2, where the runtime is the solver time. The PGD upper bounds of MLP-Adv, MLP-LP, MLP-SDP, 6 × 100 and 9 × 100 are reiterated from Table 1, while those of 8 × 1024-0.1 and 8 × 1024-0.3 are from [Li et al., 2020]. Motivated by the experiment illustrated in Figures 9A and 9B, we ran Algorithm 0 with the sequence {0.1, 0.2} and kmax = 2. As in LayerSDP, we further optimised RLT-SDP by removing inactive neurons in the first step.
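The inactive-neuron removal mentioned above can be illustrated with a minimal sketch, in which the pre-activation interval bounds `low`/`high` are assumed given (e.g. from interval arithmetic) and all names are illustrative rather than taken from the described implementation:

```python
import numpy as np

def prune_inactive(W, b, low, high):
    """Remove provably inactive ReLU neurons from a layer.

    A neuron whose upper pre-activation bound is <= 0 always outputs 0,
    so its row of W and entry of b can be dropped (the corresponding
    columns of the next layer's weight matrix would be dropped too,
    which is not shown here)."""
    keep = high > 0   # boolean mask of neurons that can ever be active
    return W[keep], b[keep], keep

W = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([0.5, -9.0])
low = np.array([-1.0, -12.0])
high = np.array([2.0, -3.0])
W2, b2, keep = prune_inactive(W, b, low, high)
print(keep)  # [ True False]: the second neuron is provably inactive
```

Because pruning only removes neurons that are constant zero for all admissible inputs, it shrinks the relaxation without relaxing the solution, consistent with the statement above.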
The results show that RLT-SDP based on the interval arithmetic bounds is more precise than LayerSDP under the same bounds and than all other baseline methods for all the networks. One exception is the 9 × 100 network, for which β-CROWN achieves the highest precision. By using the tighter symbolic bound propagation (Botoeva, E.; Kouvaros, P.; Kronqvist, J.; Lomuscio, A.; and Misener, R. 2020. Efficient verification of neural networks via dependency analysis. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI20), 3291–3299), RLT-SDP significantly outperformed all the incomplete/complete baseline methods.
As expected, we found RLT-SDP to be significantly more computationally demanding than LayerSDP across all the networks. However, it was still faster than SDP-IP for MLP-Adv, MLP-LP, MLP-SDP, 6×100 and 9×100. Neither SDP-IP nor SDP-FO could verify the large networks 8 × 1024-0.1 and 8 × 1024-0.3. SDP-FO failed to verify 6 × 100 and 9 × 100. These results confirm that RLT-SDP remains competitive in terms of computational efficiency. We note that the runtime of LayerSDP in Table 2 is larger than that reported in Table 1. This is because we directly solved the layer SDP relaxation (14), without implementing SparseCoLO (Fujisawa, K.; et al. 2009. User's manual for SparseCoLO: Conversion methods for sparse conic-form linear optimization problems. Dept. of Math. and Comp. Sci., Japan, Tech. Rep., 152–8552) or the automatic model transformation as in Table 1. While these methods reduce computation times, they are equally applicable to RLT-SDP. Hence, the results presented in Table 2 provide a like-for-like comparison between LayerSDP and RLT-SDP.

Table 2
†: Results taken from previous reports or Table 1. -: Previously reported number unavailable. ⋄: The methods fail to verify any instance. *: The runtime is estimated by running over five images using the same interval arithmetic bounds. |: The certified robustness values on the left and right are obtained using interval arithmetic bounds and symbolic interval propagation, respectively. In a final experiment illustrated in Figure 10, we compare the iterative method described in Figure 6 to solve the non-convex semidefinite programming problem (24), hereafter referred to as IterSDP, against the SoA incomplete methods for verification: β-CROWN, LP, IBP, OptC2V, LayerSDP formulated in (14), SDP-IP and SDP-FO.
The experiments were conducted on a Linux machine running an Intel i9-10920X 3.5 GHz 12-core CPU with 128 GB RAM. The optimisation problems were modelled by using the toolbox YALMIP (Löfberg, J. YALMIP: A toolbox for modeling and optimization in MATLAB. In IEEE International Conference on Robotics and Automation (ICRA04), pp. 284–289. IEEE, 2004) and solved using the SDP solver MOSEK (Andersen, E. D. and Andersen, K. D. The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm. In High performance optimization, pp. 197–232. Springer, 2000). To run Algorithm
are chosen. IterSDP is evaluated on several fully-connected ReLU NNs trained on the MNIST dataset (where “m × n” means a NN with m − 1 hidden layers each having n neurons): 1) One 3 × 50 network self-trained with no adversarial training, tested with perturbation radii ε from 0.01 to 0.09. 2) The three small networks MLP-Adv, MLP-LP, and MLP-SDP, from [Raghunathan et al., 2018], are tested under the same perturbation
as in [Raghunathan et al., 2018] and the experiment illustrated in Figure 8. 3) A medium-size network 6 × 100 from [Singh et al., 2019a] is evaluated under the same
as in [Singh et al., 2019a], [Müller et al., 2021] and the experiment illustrated in Figure 8; 4) two large-size networks 8 × 1024-0.1 and 8 × 1024-0.3 from (Li, L., Qi, X., Xie, T., and Li, B. SoK: certified robustness for deep neural networks. arXiv preprint arXiv:2009.04131, 2020; hereafter referred to as [Li et al., 2020]). They were trained using CROWN-IBP (Zhang, H., Chen, H., Xiao, C., Gowal, S., Stanforth, R., Li, B., Boning, D., and Hsieh, C.-J. Towards stable and efficient training of verifiably robust neural networks. arXiv preprint arXiv:1906.06316, 2019) with adversarial attack radii
respectively. As in [Li et al., 2020], they were tested under perturbation radii
respectively. To ensure a fair comparison, the proposed method IterSDP and the re-implemented baseline methods LayerSDP, SDP-IP and LP were all run using the same interval arithmetic bounds. All experiments were run on the first 100 images of the dataset.
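The interval arithmetic bounds used to put all methods on an equal footing can be obtained by simple interval propagation through the network's affine and ReLU layers. A minimal numpy sketch (illustrative only; the function name and structure are ours, not taken from the patent):

```python
import numpy as np

def interval_bounds(weights, biases, x_lo, x_hi):
    """Propagate elementwise interval bounds through a fully-connected ReLU NN.

    For each affine layer W x + b, the interval image is computed using the
    positive part W+ = max(W, 0) and negative part W- = min(W, 0); the ReLU
    then clips the bounds at zero before the next layer.
    Returns a list of per-layer (pre-activation lower, upper) bound pairs.
    """
    lo, hi = np.asarray(x_lo, dtype=float), np.asarray(x_hi, dtype=float)
    bounds = []
    for i, (W, b) in enumerate(zip(weights, biases)):
        Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
        pre_lo = Wp @ lo + Wn @ hi + b   # worst-case low: positive weights take lo
        pre_hi = Wp @ hi + Wn @ lo + b   # worst-case high: positive weights take hi
        bounds.append((pre_lo, pre_hi))
        if i < len(weights) - 1:         # ReLU applies to hidden layers only
            lo, hi = np.maximum(pre_lo, 0.0), np.maximum(pre_hi, 0.0)
    return bounds
```

Tighter schemes such as the symbolic interval propagation mentioned above refine these bounds, but the interval version suffices to seed all the compared methods identically.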
Figure 10 shows the computational results for the 3 × 50 network under different perturbation radii ε, using the methods IterSDP, LP, SDP-IP and LayerSDP. The IterSDP method outperforms the baselines across all ε values. This confirms the relation
Notably, IterSDP improves the verified robustness up to the PGD bounds for several ε values. IterSDP requires more runtime (about twice as much) than LayerSDP, but is still computationally cheaper than SDP-IP. This is expected, since Algorithm 2 uses LayerSDP for initialisation and solves an auxiliary SDP whose size is similar to the layer SDP relaxation.

Table 3 reports the verified robustness (percentage of images verified to be robust) and runtime (average solver time for verifying an image) for each method. The PGD upper bounds of MLP-Adv, MLP-LP, MLP-SDP and 6 × 100 are reiterated from Table 1 for direct comparison, while those of 8 × 1024-0.1 and 8 × 1024-0.3 are from [Li et al., 2020]. The results show that, under the same bounds, IterSDP is more precise than LayerSDP and all other baseline methods for all the networks. One exception is the MLP-LP network, for which IterSDP, LayerSDP, SDP-IP and LP all reach the PGD upper bound. Remarkably, IterSDP increases the number of verified instances by 20% for the 6 × 100 network. For all the other networks, IterSDP verified a number of cases close to or equal to the PGD upper bound. It is also worth noting that IterSDP outperforms the SoA complete methods MILP and AI2 according to the numbers reported in [Li et al., 2020]: MILP verified 67% (respectively, 7%) for 8 × 1024-0.1 (respectively, 8 × 1024-0.3), and AI2 verified 52% (respectively, 16%) for the same two networks. As expected, IterSDP needs around twice the runtime of LayerSDP across all the networks. However, IterSDP was faster than SDP-IP for MLP-Adv, MLP-LP, MLP-SDP and 6 × 100. Neither SDP-IP nor SDP-FO could verify the large networks 8 × 1024-0.1 and 8 × 1024-0.3. These results confirm that the proposed IterSDP significantly improves verification precision, whilst retaining competitive computational efficiency. Table 3
Time is reported as runtime per image (in seconds). *: These results are obtained by re-implementing the corresponding methods based on the same interval arithmetic bounds as our method. †: These numbers are directly taken from previously reported values. -: Previously reported numbers are unavailable. ⋄: The methods fail to verify any instance.
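For context, the PGD upper bound used throughout these tables is an empirical ceiling: whenever projected gradient descent finds an adversarial example for an image, no sound verifier can certify that image, so the fraction of images PGD fails to attack upper-bounds every verified-robustness figure. A toy numpy sketch of an L∞ PGD attack on a one-hidden-layer ReLU classifier (our own illustrative code, not the attack implementation used in the experiments):

```python
import numpy as np

def pgd_attack(W1, b1, W2, b2, x0, label, eps, steps=50, lr=0.05):
    """L-infinity PGD against a one-hidden-layer ReLU classifier.

    Performs sign-gradient ascent on the margin (strongest rival logit
    minus true-class logit) and projects back into the eps-ball around x0
    after each step. Returns an adversarial input if found, else None.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        pre = W1 @ x + b1
        logits = W2 @ np.maximum(pre, 0.0) + b2
        if np.argmax(logits) != label:
            return x                                   # attack succeeded
        masked = logits.copy()
        masked[label] = -np.inf
        rival = int(np.argmax(masked))                 # strongest wrong class
        # manual backprop of (logits[rival] - logits[label]) w.r.t. x
        g_logits = np.zeros_like(logits)
        g_logits[rival], g_logits[label] = 1.0, -1.0
        g_x = W1.T @ ((W2.T @ g_logits) * (pre > 0))
        x = x + lr * np.sign(g_x)                      # ascent step
        x = np.clip(x, x0 - eps, x0 + eps)             # project into eps-ball
    logits = W2 @ np.maximum(W1 @ x + b1, 0.0) + b2
    return x if np.argmax(logits) != label else None
```

An image counts towards the PGD upper bound exactly when this kind of search returns None for it.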
Claims
1. A method for verifying a neural network comprising nodes arranged in a plurality of layers, comprising the steps of: obtaining data representing a trained neural network, a set of algebraic constraints on the output of each layer of the network, and a range of inputs to the neural network over which the algebraic constraints are to be verified, such that the data defines a verification problem; determining, for each layer of the network, a semidefinite constraint from the algebraic constraints for that layer; determining a set of interlayer constraints which constrain outputs of one or more of the layers to corresponding inputs of one or more adjacent layers; applying a semidefinite programming relaxation subject to the semidefinite constraints and the interlayer constraints across the range of inputs; and based on the outcome of the semidefinite programming relaxation, determining whether the neural network is robust across the range of inputs.
2. A method according to claim 1, wherein the set of interlayer constraints constrain all outputs of one or more of the layers to corresponding inputs of one or more adjacent layers.
3. A method according to claim 1, wherein the set of interlayer constraints constrain a subset of outputs of one or more of the layers to corresponding inputs of one or more adjacent layers.
4. A method according to any one of the preceding claims, further comprising determining one or more initial linear constraints based on a linear approximation of an activation function for one or more nodes of the neural network, wherein the applying a semidefinite programming relaxation is further subject to the one or more initial linear constraints.
5. A method according to any one of the preceding claims, further comprising determining, for each layer of the network, one or more further linear constraints based on an upper bound and a lower bound for each of two nodes from the network, wherein a first node is from the layer of the network and a second node is
either from the layer of the network or from a layer of the network adjacent to the layer of the network, expressing the one or more further linear constraints as an upper bound and a lower bound for elements of a matrix representation of the layer of the network, and wherein the applying a semidefinite programming relaxation is further subject to the one or more further linear constraints.
6. A method according to claim 5 when dependent on claim 4, wherein determining, for each layer of the network, one or more further linear constraints expressed as an upper bound and a lower bound for elements of the matrix representation comprises calculating the upper and lower bounds given the range of inputs of the neural network and the one or more initial linear constraints.
7. A method according to any one of the preceding claims, further comprising determining, for each layer of the network, a non-linear constraint from the algebraic constraints for that layer, wherein the applying a semidefinite programming relaxation is further subject to the non-linear constraint for each layer of the network.
8. A method according to claim 7, wherein an objective value of the semidefinite programming relaxation determines the outcome of the semidefinite programming relaxation; and the objective value of the semidefinite programming relaxation is monotonically approached by an objective value sequence that converges to the objective value of the semidefinite programming relaxation, wherein a starting point of the objective value sequence is an objective value of the semidefinite programming relaxation not subject to the non-linear constraint for each layer of the network; and the objective value sequence is determined iteratively by solving an auxiliary convex semidefinite programming problem recursively, wherein a current objective value of the objective value sequence determined at an iteration is sequential to the objective values of the objective value sequence determined in prior iterations, wherein a current objective value of the auxiliary convex semidefinite programming problem is an objective value of the auxiliary convex semidefinite programming problem at the iteration.
9. A method according to claim 8, wherein the objective value of the auxiliary convex semidefinite programming problem is always greater than or equal to zero; and the objective value of the auxiliary convex semidefinite programming problem is equal to zero when the non-linear constraint for each layer of the network is satisfied.
10. A method according to any one of the preceding claims, further comprising removing terms associated with nodes which are inactive across the range of inputs from the semidefinite constraints.
11. A method according to any one of the preceding claims, wherein the semidefinite constraints comprise positive semidefinite constraints.
12. A method according to any one of the preceding claims, wherein the neural network is a feed forward neural network.
13. A method according to any one of the preceding claims, wherein the nodes of the neural network apply a Rectified Linear Unit (ReLU) activation function.
14. A method according to any of the preceding claims, wherein the neural network is an image processing neural network which takes an image as input.
15. A method according to any of claims 1-13, wherein the neural network is a controller neural network for controlling a physical device.
16. A computer program product comprising computer executable instructions which, when executed by one or more processors, cause the one or more processors to carry out the method of any one of the preceding claims.
17. A perception system comprising one or more processors configured to carry out the method of any one of claims 1 to 15.
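To make the per-layer semidefinite constraint of claim 1 concrete: for a ReLU layer $y = \max(Wx + b, 0)$, one standard construction in the SDP-relaxation literature cited above (e.g. Raghunathan et al., 2018) lifts the layer's input and output into a moment matrix — this is an illustrative formulation, and the patent's own relaxation (14) may differ in detail:

$$
P \;=\; \begin{pmatrix} 1 & x^\top & y^\top \\ x & X & M \\ y & M^\top & Y \end{pmatrix} \succeq 0,
$$

where, in the exact rank-one case, $X = xx^\top$, $M = xy^\top$ and $Y = yy^\top$. The ReLU relation is then imposed through constraints that are linear in the entries of $P$:

$$
y \ge 0, \qquad y \ge Wx + b, \qquad Y_{ii} \;=\; \sum_j W_{ij} M_{ji} + b_i\, y_i \quad \text{for all } i,
$$

the last of which encodes the complementarity $y_i \left( y_i - (Wx+b)_i \right) = 0$ elementwise. The interlayer constraints of claim 1 then tie the $y$-block of one layer's matrix to the $x$-block of the next layer's matrix.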
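The initial linear constraints of claim 4, derived from a linear approximation of the activation function, are for ReLU commonly the "triangle" relaxation: given pre-activation bounds $l < 0 < u$, the output $y = \max(x, 0)$ satisfies $y \ge 0$, $y \ge x$ and $y \le u(x-l)/(u-l)$. A small numpy sketch (illustrative; the function names are ours):

```python
def relu_triangle(l, u):
    """Return triples (a, b, c) encoding a*x + b*y <= c for the standard
    triangle relaxation of y = max(x, 0) when x is known to lie in [l, u]
    with l < 0 < u (the "unstable" case)."""
    # y >= 0            ->   0*x - 1*y <= 0
    # y >= x            ->   1*x - 1*y <= 0
    # y <= u(x-l)/(u-l) ->  -u*x + (u-l)*y <= -u*l   (the upper chord)
    return [(0.0, -1.0, 0.0), (1.0, -1.0, 0.0), (-u, u - l, -u * l)]

def satisfies(constraints, x, y, tol=1e-9):
    """Check a point (x, y) against all half-plane constraints."""
    return all(a * x + b * y <= c + tol for a, b, c in constraints)
```

Every point on the true ReLU graph satisfies all three half-planes, while points above the upper chord are cut off, which is what makes the relaxation useful as an initial linear constraint.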
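Claim 10's removal of terms for inactive nodes can be read together with precomputed bounds: a ReLU unit whose pre-activation upper bound is non-positive over the whole input range outputs zero for every input in that range, so its rows can be dropped before the semidefinite constraint is assembled, shrinking the SDP. A sketch under that assumption (our own naming, not the patent's):

```python
import numpy as np

def prune_inactive(W, b, pre_hi):
    """Drop ReLU units that are provably inactive over the verified range.

    A unit whose pre-activation upper bound is <= 0 outputs 0 for every
    input in the range, so the corresponding row of (W, b) can be removed
    before building the per-layer semidefinite constraint.
    Returns the reduced (W, b) and the boolean mask of retained units.
    """
    active = np.asarray(pre_hi) > 0.0
    return np.asarray(W)[active], np.asarray(b)[active], active
```

Downstream layers would use the same mask to drop the matching input columns, keeping the reduced network consistent.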
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB2107304.4 | 2021-05-21 | ||
GBGB2107304.4A GB202107304D0 (en) | 2021-05-21 | 2021-05-21 | Verifying neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022243570A1 true WO2022243570A1 (en) | 2022-11-24 |
Family
ID=76637762
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2022/063919 WO2022243570A1 (en) | 2021-05-21 | 2022-05-23 | Verifying neural networks |
Country Status (2)
Country | Link |
---|---|
GB (1) | GB202107304D0 (en) |
WO (1) | WO2022243570A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023250435A1 (en) * | 2022-06-22 | 2023-12-28 | Ntt Research, Inc. | Remote execution verification with reduced resource requirements |
- 2021-05-21: GB priority application GBGB2107304.4A (publication GB202107304D0), not active (ceased)
- 2022-05-23: PCT application PCT/EP2022/063919 (publication WO2022243570A1), active (application filing)
Non-Patent Citations (5)
Title |
---|
A. RAGHUNATHANJ. STEINHARDTP. LIANG: "Semidefinite relaxations for certifying robustness to adversarial examples", NEURIPS18, 2018, pages 10877 - 10887 |
BOTOEVA, E.KOUVAROS, P.KRONQVIST, J.LOMUSCIO, A.MISENER, R.: "Efficient verification of neural networks via dependency analysis", IN PROCEEDINGS OF THE AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI20, 2020, pages 3291 - 3299, XP055827255, DOI: 10.1609/aaai.v34i04.5729 |
K. JULIANJ. LOPEZJ. BRUSHM. OWENM. KOCHENDERFER: "Policy compression for aircraft collision avoidance systems", PROCEEDINGS OF THE 35TH DIGITAL AVIONICS SYSTEMS CONFERENCE (DASC16, 2016, pages 1 - 10, XP033019348, DOI: 10.1109/DASC.2016.7778091 |
LOFBERG, J. YALMIP: "IEEE International Conference on Robotics and Automation (ICRA04", 2004, IEEE, article "YALMIP: A toolbox for modeling and optimization in MATLAB", pages: 284 - 289 |
SUMANTH DATHATHRI ET AL: "Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 22 October 2020 (2020-10-22), XP081793254 * |
Also Published As
Publication number | Publication date |
---|---|
GB202107304D0 (en) | 2021-07-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gordaliza et al. | Obtaining fairness using optimal transport theory | |
Ghodsi et al. | Safetynets: Verifiable execution of deep neural networks on an untrusted cloud | |
Xu et al. | Optimization of graph neural networks: Implicit acceleration by skip connections and more depth | |
Nilsson et al. | Synthesis of separable controlled invariant sets for modular local control design | |
Mathiesen et al. | Safety certification for stochastic systems via neural barrier functions | |
Huang et al. | Quantifying epistemic uncertainty in deep learning | |
Pfrommer et al. | TaSIL: Taylor series imitation learning | |
Gurevin et al. | Enabling retrain-free deep neural network pruning using surrogate lagrangian relaxation | |
WO2022243570A1 (en) | Verifying neural networks | |
KR20220083833A (en) | Systems and methods with robust deep generation models | |
Ngo et al. | Adaptive anomaly detection for internet of things in hierarchical edge computing: A contextual-bandit approach | |
Guo et al. | Eager falsification for accelerating robustness verification of deep neural networks | |
Mohan et al. | Structure in reinforcement learning: A survey and open problems | |
Dushatskiy et al. | A novel surrogate-assisted evolutionary algorithm applied to partition-based ensemble learning | |
Cyr et al. | Multilevel initialization for layer-parallel deep neural network training | |
Quindlen et al. | Active sampling-based binary verification of dynamical systems | |
US20210182631A1 (en) | Classification using hyper-opinions | |
Al-Hyari et al. | An adaptive analytic FPGA placement framework based on deep-learning | |
US11494634B2 (en) | Optimizing capacity and learning of weighted real-valued logic | |
Lechner et al. | Quantization-aware interval bound propagation for training certifiably robust quantized neural networks | |
Cai et al. | Ensemble-in-One: Learning Ensemble within Random Gated Networks for Enhanced Adversarial Robustness | |
Smolensky | Overview: Computational, Dynamical, and Statistical Perspectives on the Processing and Learning Problems in Neural Network Theory | |
Newton et al. | Rational Neural Network Controllers | |
US20220366226A1 (en) | Methods and systems for compressing a trained neural network and for improving efficiently performing computations of a compressed neural network | |
Tan et al. | Weighted neural tangent kernel: A generalized and improved network-induced kernel |
Legal Events
Code | Title | Description
---|---|---
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22732429; Country of ref document: EP; Kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
122 | Ep: pct application non-entry in european phase | Ref document number: 22732429; Country of ref document: EP; Kind code of ref document: A1