WO2022113083A1 - Method and system for visualizing neural network output - Google Patents

Method and system for visualizing neural network output

Info

Publication number
WO2022113083A1
Authority
WO
WIPO (PCT)
Prior art keywords
attribution
tensor
layer
neural network
input data
Prior art date
Application number
PCT/IL2021/051411
Other languages
French (fr)
Inventor
Shir GUR
Lior Wolf
Original Assignee
Ramot At Tel-Aviv University Ltd.
Priority date
Filing date
Publication date
Application filed by Ramot At Tel-Aviv University Ltd. filed Critical Ramot At Tel-Aviv University Ltd.
Publication of WO2022113083A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models
    • G06N 5/045 Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/94 Hardware or software architectures specially adapted for image or video understanding
    • G06V 10/945 User interactive design; Environments; Toolboxes

Definitions

  • the neural network model is typically received as computer code instructions, and the method is capable of executing, by a computer, the computer code instructions using the received input data as the input to the neural network.
  • the method continues to 12 at which the neural network is fed with the input data, and the outputs (e.g., feature maps) obtained from each layer of the neural network are stored in a computer readable medium; a sketch of this operation is given below.
  • Alternatively, the method can receive the outputs of the layers of the neural network from an external source (e.g., a computer readable medium, a remote site, a cloud facility, etc.), in which case operations 11 and 12 can be skipped.
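  • For illustration, the following is a minimal PyTorch sketch of operation 12, saving each layer's output with forward hooks. The hook-based capture, the leaf-module filter, and the VGG-19 usage example are implementation assumptions, not part of the claimed method.

```python
import torch
import torchvision.models as models

def capture_feature_maps(model, x):
    """Run a forward pass and store the output (feature map) of every leaf
    layer, as in operation 12 of the flowchart."""
    feature_maps, hooks = {}, []
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf layers only
            hooks.append(module.register_forward_hook(
                lambda mod, inp, out, name=name: feature_maps.update({name: out.detach()})))
    y = model(x)  # autograd stays enabled so the gradient stream can run later
    for h in hooks:
        h.remove()
    return y, feature_maps

# Usage sketch (in practice pre-trained VGG-19 weights would be loaded):
vgg = models.vgg19().eval()
logits, feats = capture_feature_maps(vgg, torch.randn(1, 3, 224, 224))
```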
  • the method proceeds to 13 at which, for each layer of the neural network, an attribution tensor C(n) and a set of gradients ∇x(n) of the layer's feature map x(n) are calculated.
  • the attribution tensor can be calculated by propagating across the layers of the neural network.
  • the propagation through a given layer optionally and preferably uses the feature map of the layer, the weights of the layer, and the input class attribution map of the layer.
  • the propagation is by means of the Layer-wise Relevance Propagation (LRP) described in Binder et al., 2016.
  • the propagation satisfies a conservation rule, such as, but not limited to, a rule that a sum of elements of the attribution tensor is constant.
  • the set of feature gradients can also be calculated by propagation across the layers of the neural network.
  • the set is calculated using the derivative chain rule; see, for example, the chain rule defined in EQ. 5 of the Examples section that follows. Both streams are illustrated in the sketch below.
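  • The following is a minimal sketch of the two streams for a single linear layer. The LRP-0/ε rule shown is one standard member of the LRP family and is an assumption here (the patent's exact rule is given by its equations); the function name and the ε stabilizer are likewise illustrative. Note the conservation check at the end.

```python
import torch

def lrp_linear(x, weight, rel_out, eps=1e-9):
    """Relevance propagation through a linear layer z = x @ weight.T using the
    standard LRP-0/epsilon rule (an assumed concrete instance of EQ. 1).
    x: (d_in,), weight: (d_out, d_in), rel_out: (d_out,) -> returns (d_in,)."""
    z = x @ weight.t()                        # pre-activations z_k
    s = rel_out / (z + eps * torch.sign(z))   # stabilized ratio R_k / z_k
    return x * (s @ weight)                   # redistribute relevance to inputs

x = torch.randn(8, requires_grad=True)
w = torch.randn(4, 8)
rel_out = torch.rand(4)
rel_in = lrp_linear(x, w, rel_out)
# Conservation rule: total relevance is preserved (up to the eps stabilizer):
print(rel_in.sum().item(), rel_out.sum().item())

# The gradient stream (the chain rule of EQ. 5) is one autograd call; for a
# scalar class score `score` computed from x:
#   grad_x = torch.autograd.grad(score, x, retain_graph=True)[0]
```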
  • the attribution tensor C(n) is corrected based on a factorization Fx of the input feature map x(n) of the layer, as well as a factorization F∇x of the feature-map gradients of the layer.
  • the attribution tensor C(n) can be corrected by applying a shift to C(n) using a residual attribution tensor r(n) calculated using the factorizations Fx and F∇x. It was found by the Inventors that an attribution tensor corrected using such a shift can be indicative of the membership levels of the elements of the input data in the predicted class.
  • the factorizations Fx and F∇x are based on the attribution tensor C(n) as calculated prior to the correction.
  • the factorization Fx can be calculated based on the feature map x(n) and the attribution tensor C(n), and the factorization F∇x can be calculated based on the feature-map gradients ∇x(n) and the attribution tensor C(n).
  • the method calculates a first attribution tensor C(n) and a second attribution tensor A(n), wherein the factorizations are based on the first attribution tensor C(n) but not the second attribution tensor A(n), and wherein the correction at 14 is based on both tensors C(n) and A(n).
  • the residual attribution tensor r(n) can be calculated using both tensors C(n) and A(n), e.g., according to the equation given in the Examples section that follows.
  • the second attribution tensor A(n) can be calculated by means of propagation, in a similar manner to the calculation of the first attribution tensor C(n) (e.g., using LRP), except that instead of using the feature map of the layer, the propagation uses an all-ones tensor having the same shape as the feature map.
  • the second attribution tensor is an input-agnostic influence tensor, which is calculated based only on the weights of the layer and the input class attribution map of the layer.
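  • A minimal sketch of the two per-layer tensors for a linear layer is given below. The generic propagation helper follows the LRP-style rule above; taking absolute values for C(n) and replacing the feature map with an all-ones tensor for A(n) follow the text, while the use of absolute weights inside A(n) and all function names are assumptions.

```python
import torch

def propagate(x, weight, rel_out, eps=1e-9):
    """Generic attribution propagation through a linear layer (cf. EQ. 1)."""
    z = x @ weight.t()
    s = rel_out / (z + eps * torch.sign(z))
    return x * (s @ weight)

def absolute_influence(x, weight, rel_out):
    """First attribution tensor C(n): propagation with the absolute values of
    the input feature map and of the weights (the "absolute influence")."""
    return propagate(x.abs(), weight.abs(), rel_out)

def input_agnostic_influence(x, weight, rel_out):
    """Second attribution tensor A(n): the feature map is replaced by an
    all-ones tensor of the same shape, so only the weights and the incoming
    class attribution map determine the result (input-agnostic)."""
    return propagate(torch.ones_like(x), weight.abs(), rel_out)
```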
  • the method proceeds to 15 at which an output is generated in which elements of the input data are highlighted according to the corrected attribution tensor.
  • Representative examples of generated outputs are provided in the Examples section that follows (see, for example, FIGs. 1B, 1C, 1E, 1F, 2A-B, 5, 6B-F, and 7-13).
  • FIG. 15 is a schematic illustration of a client computer 130 having a hardware processor 132, which typically comprises an input/output (I/O) circuit 134, a hardware central processing unit (CPU) 136 (e.g., a hardware microprocessor), and a hardware memory 138 which typically includes both volatile memory and non-volatile memory.
  • CPU 136 is in communication with I/O circuit 134 and memory 138.
  • Client computer 130 preferably comprises a graphical user interface (GUI) 142 in communication with processor 132.
  • I/O circuit 134 preferably communicates information in appropriately structured form to and from GUI 142.
  • FIG. 15 also shows a server computer 150, which can similarly include a hardware processor 152, an I/O circuit 154, a hardware CPU 156, and a hardware memory 158.
  • I/O circuits 134 and 154 of client 130 and server 150 computers can operate as transceivers that communicate information with each other via a wired or wireless communication.
  • client 130 and server 150 computers can communicate via a network 140, such as a local area network (LAN), a wide area network (WAN) or the Internet.
  • Server computer 150 can, in some embodiments, be part of a cloud computing resource of a cloud computing facility in communication with client computer 130 over the network 140.
  • GUI 142 and processor 132 can be integrated together within the same housing or they can be separate units communicating with each other.
  • GUI 142 can optionally and preferably be part of a system including a dedicated CPU and I/O circuits (not shown) to allow GUI 142 to communicate with processor 132.
  • Processor 132 issues to GUI 142 graphical and textual output generated by CPU 136.
  • Processor 132 also receives from GUI 142 signals pertaining to control commands generated by GUI 142 in response to user input.
  • GUI 142 can be of any type known in the art, such as, but not limited to, a keyboard and a display, a touch screen, and the like.
  • Client 130 and server 150 computers can further comprise one or more computer-readable storage media 144, 164, respectively.
  • Media 144 and 164 are preferably non-transitory storage media storing computer code instructions for executing the method as further detailed herein, and processors 132 and 152 execute these code instructions.
  • the code instructions can be run by loading the respective code instructions into the respective execution memories 138 and 158 of the respective processors 132 and 152.
  • Each of storage media 144 and 164 can store program instructions which, when read by the respective processor, cause the processor to receive the neural network, input data, and optionally the target class, as further detailed hereinabove.
  • the program instructions can also cause the processor to feed the input data to the neural network, to calculate the attribution tensor and the feature gradients, to correct the attribution tensor, and to generate an output as further detailed hereinabove.
  • all the input is received by computer 130 locally, for example, using GUI 142 and/or storage 144.
  • computer 130 can execute the operations of the method described herein.
  • Alternatively, computer 130 can transmit the received neural network, input data, and optionally the target class to computer 150 via communication network 140, in which case computer 150 executes the operations of the method described herein and transmits the generated output or corrected tensor back to computer 130 for generating a displayed output on GUI 142.
  • At least a portion of the input is stored in storage 164, and that input is received by computer 130 over communication network 140.
  • computer 130 can execute the operations of the method described herein using the input received from computer 150 over communication network 140.
  • the phrase "consisting essentially of" means that the composition, method or structure may include additional ingredients, steps and/or parts, but only if the additional ingredients, steps and/or parts do not materially alter the basic and novel characteristics of the claimed composition, method or structure.
  • the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof.
  • various embodiments of this invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range.
  • whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range.
  • the phrases "ranging/ranges between" a first indicated number and a second indicated number and "ranging/ranges from" a first indicated number "to" a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
  • This Example integrates a gradient-based method and an attribution-based method, yielding a technique that provides per-class explainability.
  • the technique of the present embodiments back-projects the per pixel local influence, in a manner that is guided by the local attributions, while correcting for salient features that would otherwise bias the explanation.
  • This Example describes extensive experiments in which the ability of the technique of the present embodiments to provide class-specific visualization, as opposed to visualizing just the predicted label, was demonstrated. Using an unsupervised procedure, the technique is also successful in demonstrating that self-supervised methods learn semantic information.
  • Some methods can be divided into two families: (i) gradient-based maps, which consider the gradient signal as it is computed by the conventional backpropagation approach and (ii) relevance propagation, which projects high-level activations back to the input domain, mostly based on the deep Taylor decomposition.
  • the two families are used for different purposes, and are evaluated by different sets of experiments and performance metrics.
  • gradient based methods, such as Grad-CAM, provide class-specific explanations, but at a coarse spatial resolution (cf. FIG. 2C in the Examples section).
  • Attribution based methods provide adequate visualization at the input image level, and have bipartite results, but lack in visualizing class-specific explanations.
  • This Example presents a technique for class-specific visualization of deep image recognition models.
  • the technique overcomes the limitations of the conventional methods, by combining ideas from both families of methods, and accumulating across the layers both gradient information, and relevance attribution.
  • the method corrects for what is referred to herein as the saliency bias. This bias draws the attention of the network towards the salient activations, and can prevent visualizing other image objects.
  • the technique of the present embodiments is particularly useful for determining the pixels of the image that belong to the class predicted by the neural network.
  • the Inventors found that the technique identifies regions of multiple classes in each image, and not just the prominent class, as shown in FIGs. 1A-F, which show visualizations of outputs of a trained multi-layer neural network according to some embodiments of the present invention for a pre-trained VGG-19.
  • FIGs. 1A and 1D show input images,
  • FIGs. 1B and 1E show heat maps generated for the top label, and
  • FIGs. 1C and 1F show the heat maps for the 2nd highest prediction.
  • the technique described in this Example combines gradient and attribution techniques, and uses attribution guided factorization for extracting informative class-specific attributions from the input feature map and its gradients.
  • the technique corrects for the saliency bias.
  • the technique provides improvement in performance in both negative perturbation and in segmentation-based evaluation.
  • the former is often used to evaluate attribution methods, and the latter is often used to evaluate gradient-based methods.
  • This Example also demonstrates that self-supervised networks implicitly learn semantic segmentation information.
  • layer n−1 is downstream of layer n; layer N processes the input, and layer 1 produces the final output.
  • x(n−1) = L(n)(x(n), θ(n)) is defined to be the result of applying layer L(n) on x(n).
  • the relevancy of layer L(n) is given by R(n) and is also known as the attribution.
  • the generic attribution propagation rule is defined, for two tensors X and Q, as (EQ. 1):
    \( R_i^{(n)} = \sum_j \frac{X_i\, Q_{ij}}{\sum_{i'} X_{i'}\, Q_{i'j}}\, R_j^{(n-1)} \)
    Typically, X is related to the layer's input x(n), and Q to its weights θ(n).
  • the presented embodiments create a separation between the positively contributing regions, or the foreground, and the negatively contributing ones, referred to as the background. This is optionally and preferably employed for both the activations and the gradients during propagation. Ideally, the relevant data would be partitioned into two clusters, one for the positive contributions and one for the negative contributions. In this Example, the data is divided spatially between positive and negative locations, in accordance with the sign of a partition map \( \Phi \in \mathbb{R}^{H \times W} \).
  • a tensor \( Y \in \mathbb{R}^{C \times H \times W} \) is re-written as a matrix of the form \( Y \in \mathbb{R}^{C \times HW} \).
  • the matrix \( H \in [0, 1]^{C \times HW} \) is a positive matrix.
  • the foreground and background weights are obtained as \( [W_f; W_b] = \big( (R^T R)^{-1} R^T H \big)^{+} \), with \( W_f, W_b \in (\mathbb{R}^{+})^{HW} \).
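  • A minimal sketch of this factorization step: the least-squares solve followed by the positive part, as in the formula above. The construction of the guidance matrix R from the attribution is described by the patent's equations and is assumed given here; the shapes and toy example are illustrative.

```python
import torch

def guided_factorization(H, R):
    """Foreground/background factorization: solve H ~ R @ W in least squares
    and keep the positive part, W = ((R^T R)^{-1} R^T H)^+.
    H: (C, HW) matrix with entries in [0, 1]; R: (C, 2) guidance matrix.
    Returns W_f, W_b, each a non-negative vector of length HW."""
    W = torch.linalg.solve(R.t() @ R, R.t() @ H)  # normal-equations solution, (2, HW)
    W = W.clamp(min=0)                            # (...)^+ positive part
    return W[0], W[1]                             # foreground / background weights

# Toy usage: 16 channels, an 8x8 spatial grid flattened to HW = 64.
H = torch.rand(16, 64)
R = torch.rand(16, 2)
w_f, w_b = guided_factorization(H, R)
```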
  • the network M outputs a score vector \( y \in \mathbb{R}^{|\mathcal{C}|} \), obtained before applying the softmax operator.
  • the method of the present embodiments explains where (spatially) in the image I lies the support for class t.
  • the method optionally and preferably comprises two streams: gradients and attribution propagation. In each layer, the previous values of the two streams are used, and the current layer's input gradient and attribution are computed.
  • the method employs propagating attribution (for example, using EQ. 1), factorizes the activations and the gradients in a manner that is guided by the attribution, and performs attribution aggregation and value shifting such that the conservation rule is preserved.
  • the shift splits the neurons into those with positive attributions and those with negative attributions.
  • let F(n) and F(n−1) be the output and input class attribution maps of layer L(n), respectively.
  • the first attribution tensor considers the absolute influence C(n), defined by applying the propagation rule of EQ. 1 with the absolute values of the feature map x(n) and of the weights θ(n).
  • the second attribution tensor computes the input-agnostic influence A(n), obtained by replacing the feature map in the propagation rule with an all-ones tensor 1 of the shape of x(n).
  • the input-agnostic propagation is selected because features in shallow layers, such as edges, are more local and less semantic. It therefore reduces the sensitivity to texture.
  • the factorizations of both the input feature map of layer L(n) and its gradients are optionally and preferably computed, in addition to C(n).
  • the gradient branch is defined by the chain rule in EQ. 5, where the gradient ∇x(n) of the feature map is considered.
  • the factorization results in foreground and background partitions, using guidance from C(n).
  • This partition follows the principles of the attribution properties, where positive values are part of class t, and negatives otherwise.
  • the following attribution guided factorization is therefore employed (EQ. 7), with respect to both x(n) and ∇x(n); note that only the positive values of the factorization update are considered, and that the two results are normalized by their maximal value.
  • the input-gradient interaction is defined as the element-wise product x(n) ⊙ ∇x(n). The residual attribution r(n) is then defined by all attributions other than C(n): the input-agnostic influence A(n), the difference between foreground and background weights, and the input-gradient interaction. A sketch of the resulting correction is given below.
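  • A hedged sketch of the correction step, combining the pieces above. The equal weighting of the three residual terms, the broadcastable shapes, and the conservation-preserving rescaling are simplifying assumptions; the exact combination is given by the patent's equations.

```python
import torch

def correct_attribution(C, A, w_f, w_b, x, grad_x, eps=1e-9):
    """Shift the attribution tensor C(n) by the residual r(n), which collects
    the attributions other than C(n): the input-agnostic influence A(n), the
    foreground-background weight difference, and the input-gradient
    interaction. All tensors are assumed broadcastable to a common shape."""
    interaction = x * grad_x              # element-wise input-gradient interaction
    r = A + (w_f - w_b) + interaction     # residual attribution r(n)
    corrected = C + r                     # shift
    # rescale so the total attribution is unchanged (conservation rule)
    return corrected * (C.sum() / (corrected.sum() + eps))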
  • for self-supervised models, the classifier of the self-supervised task itself is relied upon; this has nothing to do with the classes of the datasets. For each image, the image that is closest to it in the penultimate layer's activations is considered. The logits of the self-supervised task of the image to be visualized and of its nearest neighbor are subtracted, to emphasize what is unique to the current image. Explainability methods are then used on the predicted class of the self-supervised task.
  • The method of the present embodiments was implemented as computer instructions employing the following procedure, referred to as Procedure 1.
  • line 1 of the instructions is a forward-pass in which the intermediate feature maps are saved
  • line 4 of the instructions is the initial attribution at the first linear layer
  • line 6 of the instructions is the absolute influence
  • line 10 of the instructions is the residual
  • line 13 represents the shift by the residual
  • lines 16-19 of the instructions are, respectively, the input agnostic attribution, the factorization of the input feature map, the factorization of input feature map gradients, and the recalculation of the residual.
  • FIGs. 2A and 2B present sample visualization on a representative set of images for networks trained on ImageNet, using VGG-19 and ResNet-50, respectively. These figures provide visualization of the top-predicted class.
  • The improved visualization quality provided by the method of the present embodiments is evident.
  • CLRP and SGLRP, which apply LRP twice, have volatile outputs.
  • RAP is the most consistent, other than the method of the present embodiments, but falls behind in object coverage and boundaries
  • the method of the present embodiments produces relatively complete regions with clear boundaries between positive and negative regions.
  • FIG. 2C presents results for a sample image.
  • LRP, FullGrad and RAP output similar visualizations for both classes.
  • Grad-CAM clearly shows a coarse region of the target class, but lacks the spatial resolution.
  • CLRP and SGLRP both achieve class separation, and yet, they are highly biased toward image edges, and do not present a clear separation between the object and its background.
  • the method of the present embodiments provides the clearest visualization, which is both highly correlated with the target class, and is less sensitive toward edges.
  • the method of the present embodiments was evaluated using three datasets: (i) the validation set of ImageNet [Russakovsky et al., 2015] (ILSVRC) 2012, consisting of 50K images from 1000 classes, (ii) an annotated subset of ImageNet called ImageNet-Segmentation [Guillaumin et al., 2014] containing 4,276 images from 445 categories, and (iii) the PASCAL-VOC 2012 dataset, depicting 20 foreground object classes and one background class, and containing 10,582 images for training, 1,449 images for validation and 1,456 images for testing.
  • the negative perturbation test is composed of two stages. First, a pre-trained network is used to generate the visualizations of the ImageNet validation set.
  • the Inventors used the VGG-19 architecture, trained on the full ImageNet training set.
  • the Inventors masked out an increasing portion of the image, starting from lowest to highest values, determined by the explainability method.
  • In the second stage, the Inventors computed the mean accuracy of the pre-trained network on the masked images; a sketch of this test is given below. This test was repeated twice: once for the explanation of the top-1 predicted class, and once for the ground truth class.
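  • A minimal sketch of the negative perturbation protocol described above. Zero-masking, the batch layout, and the fraction grid are assumptions; the reported score is the area under the resulting accuracy curve.

```python
import torch

def negative_perturbation_curve(model, images, labels, heatmaps,
                                fractions=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Mask out an increasing fraction of each image, starting from the pixels
    with the LOWEST attribution values, and record the mean accuracy at each
    step. images: (B, C, H, W); heatmaps: (B, H, W); labels: (B,)."""
    accuracies = []
    for frac in fractions:
        masked = images.clone()
        for i in range(images.shape[0]):
            order = heatmaps[i].flatten().argsort()   # lowest attribution first
            drop = order[: int(frac * order.numel())]
            masked[i].view(images.shape[1], -1)[:, drop] = 0.0
        with torch.no_grad():
            preds = model(masked).argmax(dim=1)
        accuracies.append((preds == labels).float().mean().item())
    return accuracies
```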
  • the results are presented in FIGs. 3A and 3B and in Table 1, providing Area Under the Curve (AUC) results for the two negative perturbation tests, showing results for predicted and target class.
  • the class-agnostic methods either perform worse or experience insignificant change on the target class test.
  • the rightmost column, designated "Ours" corresponds to the method of the present embodiments.
  • As shown in Table 1, the method of the present embodiments achieves the best performance across both tests, where the margin is highest when removing 40%-80% of the pixels.
  • the Inventors compared each to the ground truth segmentation maps of the ImageNet-Segmentation dataset, and the PASCAL-VOC 2012, evaluating by pixel-accuracy and mean average-precision. The goal was to demonstrate the ability of each method without follow-up training.
  • the Inventors employed the pre-trained VGG-19 classifier trained on the ImageNet training set, computed the explanation for the top-predicted class, and compared it to the ground truth mask provided in the dataset.
  • the Inventors trained a multi-label classifier on the PASCAL-VOC 2012 training set, and considered labels with a probability larger than 0.5 to extract the explainability maps.
  • the Inventors considered the positive part as the segmentation map of that object.
  • the Inventors thresholded the obtained maps at the mean value to obtain the segmentation map (a sketch of this scoring is given below). Results are reported in Tables 2A and 2B, below, providing quantitative segmentation results on ImageNet and PASCAL-VOC 2012. The rightmost column in Table 2B, designated "Ours", corresponds to the method of the present embodiments.
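  • A minimal sketch of the segmentation scoring: the mean-value threshold follows the text; pixel accuracy is shown, while average precision would additionally rank pixels by their raw attribution values. The function name is an assumption.

```python
import torch

def segmentation_scores(heatmap, gt_mask):
    """Threshold an explanation map at its mean value to obtain a binary
    segmentation and compute pixel accuracy against the ground-truth mask.
    heatmap: (H, W) attribution map; gt_mask: (H, W) binary tensor."""
    pred = (heatmap > heatmap.mean()).float()
    pixel_accuracy = (pred == gt_mask).float().mean().item()
    return pred, pixel_accuracy
```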
  • Tables 2A and 2B demonstrate an advantage of the method of the present embodiments over all nine baseline methods, for all datasets and metrics. Other methods seem to work well only in one of the datasets or present a trade-off between the two metrics.
  • FIG. 4A shows the segmentation performance for the ImageNet ground truth class of the completely unsupervised SSL methods using different explainability methods.
  • the explainability method of the present embodiments outperforms the baselines in both mAP and pixel accuracy, except for RotNet where the mAP is considerably better and the pixel accuracy is slightly lower.
  • FIG. 4B shows the increase of segmentation performance for RotNet with the method of the present embodiments, as deeper layers of SSL Alexnet are visualized, using a supervised linear post-training.
  • Table 3 below, compares RAP and the method of the present embodiments in the predicted SSL class (not ImageNet) negative perturbation setting.
  • Table 3 provides AUC for negative perturbation tests for self-supervised methods - SeLa and SCAN.
  • (A) denotes the method of the present embodiments.
  • RAP which is the best baseline in Table 1, is used as a baseline.
  • FIGs. 6A-F show the visualization obtained for the image on the left when training a supervised linear classifier after each layer and applying the method of the present embodiments.
  • FIG. 6A shows the original image.
  • FIG. 6B shows the visualization after the first layer.
  • FIG. 6C shows the visualization after the second layer.
  • FIG. 6D shows the visualization after the third layer.
  • FIG. 6E shows the visualization after the fourth layer.
  • FIG. 6F shows the visualization after the last layer.
  • For this evaluation, the Inventors employed the classifier with which the SSL model was trained.
  • The computer instructions used for this evaluation, referred to as SSNN, provide self-supervised explainability by adopting nearest neighbors; a sketch is given after the following list.
  • L_I is the latent vector of image I,
  • L_N is the nearest neighbor of L_I,
  • the subtraction of L_N from L_I emphasizes the unique elements of L_I,
  • the assignment to v (line 5) is a forward pass with the new latent vector, and
  • t is the class with the highest probability.
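  • A minimal sketch of SSNN under these definitions. The helper name, the Euclidean nearest-neighbor search, and the `head` classifier handle are assumptions.

```python
import torch

def ssnn_target_class(latents, head, i):
    """Find the nearest neighbor of image i in the penultimate-layer latents,
    subtract it to emphasize what is unique to image i, run a forward pass of
    the SSL classifier head on the result, and return the top class t.
    latents: (N, d); head: the classifier the SSL task was trained with."""
    L_I = latents[i]
    dists = (latents - L_I).norm(dim=1)
    dists[i] = float('inf')            # exclude the image itself
    L_N = latents[dists.argmin()]      # nearest-neighbor latent
    v = head(L_I - L_N)                # line 5: forward pass with the new latent
    return v.argmax().item()           # t: class with the highest probability
```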
  • FIG. 7 shows visualization of two different classes for VGG-19
  • FIG. 8 shows visualization of two additional classes for VGG-19.
  • FIGs. 9A-H show visualization of the top predicted class of a VGG-19 ImageNet trained network
  • FIGs. 10A-G show visualization of the top predicted class of a ResNet-50 ImageNet trained network.
  • FIG. 11 shows the visualization after each layer using linear probes on the RotNet AlexNet model.
  • FIG. 12 shows results obtained on ResNet-50 by simply projecting the self-labeling label with the highest probability (a simplified version of the procedure of the present embodiments that does not involve the nearest neighbor computation), as well as results for the aforementioned SSNN computer instructions.
  • FIG. 13 shows results obtained similarly to FIG. 12, except that ResNet-50 was trained in a self-supervised regime using SCAN.
  • This Example describes an explainability method that outputs class-dependent explanations that are clearer and more exact than those presented by the many existing methods tested.
  • the method of the present embodiments is based on combining attribution methods and gradient methods. This combination is done, on equal grounds, through the usage of a non-negative matrix factorization technique that partitions the image into foreground and background regions.
  • This Example also describes a procedure for evaluating the explainability of SSL methods.

Abstract

A method of visualizing an output of a trained multi-layer neural network comprises feeding input data to the neural network. For each layer of the neural network, an attribution tensor and a set of feature gradients are calculated. The attribution tensor is corrected based on factorizations of an input feature map of the layer and of the feature gradients, and an output in which elements of the input data are highlighted according to the corrected attribution tensor is generated.

Description

METHOD AND SYSTEM FOR VISUALIZING NEURAL NETWORK OUTPUT
RELATED APPLICATION
This application claims the benefit of priority of U.S. Provisional Patent Application No. 63/118,654 filed on November 26, 2020, the contents of which are incorporated herein by reference in their entirety.
FIELD AND BACKGROUND OF THE INVENTION
The present invention, in some embodiments thereof, relates to visualization and, more particularly, but not exclusively, to a method and system for visualizing neural network output, for example, for the purpose of explaining image classification performed by the neural network.
Neural network visualization techniques mark image locations by their relevancy to the network's classification. Typically, the visualization is by highlighting image regions that mostly affect the classification, providing a characterization of the neural network known as "explainability". Some techniques for providing explainability are found in Selvaraju et al., In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618-626 (2017), Smilkov et al., arXiv:1706.03825 (2017), Srinivas et al., In: Advances in Neural Information Processing Systems, pp. 4126-4135 (2019), Sundararajan et al., In: Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 3319-3328, JMLR.org (2017), Bach et al., PloS One 10(7), e0130140 (2015), Gu et al., In: Asian Conference on Computer Vision, pp. 119-134, Springer (2018), Iwana et al., arXiv:1908.04351 (2019), and Nam et al., arXiv:1904.00605 (2019).
SUMMARY OF THE INVENTION
According to an aspect of some embodiments of the present invention there is provided a method of visualizing an output of a trained multi-layer neural network. The method comprises: feeding input data to the neural network; for each layer of the neural network: calculating an attribution tensor and a set of feature gradients for the layer, and correcting the attribution tensor based on factorizations of an input feature map of the layer and of the feature gradients; and generating an output in which elements of the input data are highlighted according to the corrected attribution tensor.
According to some embodiments of the invention the factorizations are based on the attribution tensor as calculated prior to the correction. According to some embodiments of the invention the method comprises calculating a first attribution tensor C(n) and a second attribution tensor A(n), wherein the factorizations are based on the first attribution tensor but not the second attribution tensor, and wherein the correction is based also on the second attribution tensor.
According to some embodiments of the invention the calculation of the attribution tensor is by propagation across layers of the neural network.
According to some embodiments of the invention the propagation satisfies a conservation rule. According to some embodiments of the invention the conservation rule is that a sum of elements of the attribution tensor is constant.
According to some embodiments of the invention the calculation of the set of feature gradients is by propagation across layers of the neural network.
According to some embodiments of the invention the method comprises calculating a residual tensor for the layer using the factorizations, wherein the correction of the attribution tensor is based on the residual tensor. According to some embodiments of the invention the correction comprises shifting the attribution tensor by the residual tensor.
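In symbols, the correction described in the preceding paragraph can be restated as follows (the tilde notation for the corrected tensor is introduced here for clarity and does not appear in the claims):

```latex
% Shift correction: the corrected attribution tensor is the attribution
% tensor shifted by the residual tensor computed from the factorizations.
\tilde{C}^{(n)} = C^{(n)} + r^{(n)}
```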
According to some embodiments of the invention the input data comprise an image.
According to some embodiments of the invention the image is a medical image.
According to some embodiments of the invention the input data describe electric, magnetic, or ultrasound signals received from a human or animal subject.
According to some embodiments of the invention the input data describe an acoustic signal.
According to some embodiments of the invention the input data comprise a corpus of text.
According to some embodiments of the invention the input data comprise natural language data.
According to some embodiments of the invention the corpus of text is a programming language source code.
According to some embodiments of the invention the input data comprise an object code, optionally and preferably an object code generated by compiler software.
According to some embodiments of the invention the input data comprise bioinformatics data.
According to an aspect of some embodiments of the present invention there is provided a computer software product. The computer software product comprises a computer-readable medium in which program instructions are stored, which instructions, when read by a data processor, cause the data processor to execute the method as delineated above and optionally and preferably as further detailed below.
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.
Implementation of the method and/or system of embodiments of the invention can involve performing or completing selected tasks manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of embodiments of the method and/or system of the invention, several selected tasks could be implemented by hardware, by software or by firmware or by a combination thereof using an operating system.
For example, hardware for performing selected tasks according to embodiments of the invention could be implemented as a chip or a circuit. As software, selected tasks according to embodiments of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In an exemplary embodiment of the invention, one or more tasks according to exemplary embodiments of method and/or system as described herein are performed by a data processor, such as a computing platform for executing a plurality of instructions. Optionally, the data processor includes a volatile memory for storing instructions and/or data and/or a non-volatile storage, for example, a magnetic hard-disk and/or removable media, for storing instructions and/or data. Optionally, a network connection is provided as well. A display and/or a user input device such as a keyboard or mouse are optionally provided as well.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings and images. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced. In the drawings:
FIGs. 1A-F show visualizations of outputs of a trained multi-layer neural network according to some embodiments of the present invention for a pre-trained VGG-19;
FIGs. 2A-C show comparisons between methods for VGG-19 (FIG. 2A) and ResNet-50 (FIG. 2B), and visualization of two different classes for VGG-19;
FIGs. 3A and 3B show negative perturbation results on an ImageNet validation set for predicted (FIG. 3A) and target (FIG. 3B) classes. Shown is the change in accuracy when removing a fraction of the image according to the attribution value, starting from lowest to highest. FIGs. 4A and 4B show quantitative results for self-supervised methods in the segmentation task. FIG. 4A shows a comparison of different explainability methods for SCAN, SeLa and RotNet, and FIG. 4B shows per-layer performance of RotNet using linear probes.
FIG. 5 shows method decomposition on the image of FIG. 1A of the paper. Visualization of the different elements of the method for a pre-trained VGG-19 network. Predicted top-class is "dog", propagated class is "cat". (1), (3), (5), and (11) are convolution layer numbers.
FIGs. 6A-F show the visualization obtained for the image on the left when training a supervised linear classifier after each layer and applying the method of the present embodiments.
FIG. 7 shows visualization of two different classes for VGG-19.
FIG. 8 shows visualization of additional two different classes for VGG-19. FIGs. 9A-H show visualization of a top predicted class of a VGG-19 ImageNet trained network.
FIGs. 10A-G show visualization of a top predicted class of a ResNet-50 ImageNet trained network.
FIG. 11 shows visualization after each layer using linear probes on a RotNet AlexNet model.
FIG. 12 shows results obtained on ResNet-50 by projecting the self-labeling label with the highest probability, and results obtained using self-supervised explainability that adopts nearest neighbors (SSNN).
FIG. 13 shows visualization of different explainability methods on ResNet-50 trained in a self-supervised regime using the SCAN method.
FIG. 14 is a flowchart diagram of a method suitable for visualizing an output of a trained multi-layer neural network according to various exemplary embodiments of the present invention. FIG. 15 is a schematic illustration of a computing platform suitable for executing selected operations of the method according to some embodiments of the present invention.
DESCRIPTION OF SPECIFIC EMBODIMENTS OF THE INVENTION
The present invention, in some embodiments thereof, relates to visualization and, more particularly, but not exclusively, to a method and system for visualizing neural network output, for example, for the purpose of explaining image classification performed by the neural network.
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the Examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.
The present embodiments comprise a technique for class-specific visualization of a neural network, particularly a neural network having a plurality of hidden layers, referred to in the literature as a deep neural network. In some embodiments, the deep neural network is a convolutional neural network (CNN). In various exemplary embodiments of the invention the neural network is an image recognition neural network, more preferably an image recognition deep neural network.
The technique of the present embodiments accumulates across the layers of the network both gradient information, and relevance attribution, in a manner that provides a per-class explainability.
"Per-class explainability" means identifying in the input data (the data that are fed to the neural network) those elements of the input data that belong to the class predicted by the neural network, and optionally and preferably the extent (e.g., membership level) by which each element of the input data belong to this class. For example, when the neural network is an image recognition neural network and the input data is an image, the technique of the present embodiments determines those pixels of the image that belong to the class predicted by the neural network.
The Inventors found that conventional techniques suffer from a limitation referred to herein as "saliency bias". The saliency bias is caused by misidentification of salient activations of the network, which oftentimes prevents visualizing other elements or groups of elements of the data (e.g., image objects) that belong to the predicted class, and therefore biases the explanation.
The Inventors discovered and successfully applied a technique that overcomes this limitation by correcting for salient features that would otherwise bias the explanation. In various exemplary embodiments of the invention the correction is by means of a first attribution tensor C(n), which is calculated on a per-layer basis, and which describes the absolute influence of the input feature map and weights to the respective layer on the attribution map of an adjacent layer (e.g., the next layer) of the network. The correction is optionally and preferably applied both to a calculated difference between foreground and background weights of the respective layer, and to an input-gradient interaction calculated for the respective layer, for example, by performing an element-wise multiplication of each element of the input feature map of the respective layer by the element's gradient; a sketch of this interaction is given below.
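For illustration, a minimal PyTorch sketch of the input-gradient interaction just described. The function name is an assumption; `score` is assumed to be a scalar class score computed from `feature_map` within the same autograd graph.

```python
import torch

def input_gradient_interaction(feature_map, score):
    """Element-wise product of a layer's input feature map with its own
    gradient with respect to the class score, as described above.
    feature_map must participate in the graph of the scalar `score`."""
    grad = torch.autograd.grad(score, feature_map, retain_graph=True)[0]
    return feature_map * grad
```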
Referring now to the drawings, FIG. 14 is a flowchart diagram of a method suitable for visualizing an output of a trained multi-layer neural network according to various exemplary embodiments of the present invention. It is to be understood that, unless otherwise defined, the operations described hereinbelow can be executed either contemporaneously or sequentially in many combinations or orders of execution. Specifically, the ordering of the flowchart diagrams is not to be considered as limiting. For example, two or more operations, appearing in the following description or in the flowchart diagrams in a particular order, can be executed in a different order (e.g., a reverse order) or substantially contemporaneously. Additionally, several operations described below are optional and may not be executed.
At least part of the operations described herein can be implemented by a data processing system, e.g., a dedicated circuitry or a general purpose processor, configured for executing the operations described below. At least part of the operations can be implemented by a cloud computing facility at a remote location.
Computer programs implementing the method of the present embodiments can commonly be distributed to users by a communication network or on a distribution medium such as, but not limited to, a floppy disk, a CD-ROM, a flash memory device and a portable hard drive. From the communication network or distribution medium, the computer programs can be copied to a hard disk or a similar intermediate storage medium. The computer programs can be run by loading the code instructions either from their distribution medium or their intermediate storage medium into the execution memory of the computer, configuring the computer to act in accordance with the method of this invention. During operation, the computer can store in a memory data structures or values obtained by intermediate calculations and pull these data structures or values for use in subsequent operation. All these operations are well-known to those skilled in the art of computer systems. Processing operations described herein may be performed by means of a processor circuit, such as a DSP, microcontroller, FPGA, ASIC, etc., or any other conventional and/or dedicated computing system.
The method of the present embodiments can be embodied in many forms. For example, it can be embodied on a tangible medium such as a computer for performing the method operations. It can be embodied on a computer readable medium, comprising computer readable instructions for carrying out the method operations. It can also be embodied in an electronic device having digital computer capabilities arranged to run the computer program on the tangible medium or execute the instructions on a computer readable medium.
The method begins at 10 and optionally and preferably continues to 11 at which a trained neural network and input data for the neural network are received. Optionally, the method also receives the target class for which the neural network has been trained. The neural network model and input data can be read from a computer readable medium or downloaded from a remote site or a cloud facility, as desired.
Preferably, but not necessarily, the input data is an image. Representative examples of images that can be used as the input data include, without limitation, a visible light image, an infrared image, a wide dynamic range image, and a medical image (e.g., MRI, CT, ultrasound). Thus, the input data can describe electric, magnetic, or ultrasound signals received from a human or animal subject, which signals can be provided in the form of an image. The input data can also describe an acoustic signal, such as an acoustic signal received from a human or animal subject, or an acoustic signal received from the environment or from a non-living object. Also contemplated are embodiments in which the input data comprise a corpus of text, such as, but not limited to, a programming language source code, or data pertaining to natural language. Further contemplated are embodiments in which the input data comprise an object code, optionally and preferably an object code generated by compiler software. Additionally contemplated are embodiments in which the input data comprise bioinformatics data.
The neural network model is typically received as computer code instructions, and the method is capable of executing, by a computer, the computer code instructions using the received input data as the input for the neural network.
The method continues to 12 at which the neural network is fed with the input data, and the outputs (e.g., feature maps) obtained from each layer of the neural network are stored in a computer readable medium. Alternatively, the method can receive the outputs of the layers of the neural network from an external source (e.g., a computer readable medium, a remote site, a cloud facility, etc.), in which case operations 11 and 12 can be skipped.
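By way of a non-limiting illustration, the following sketch shows one possible way of storing the per-layer feature maps during the forward pass, using PyTorch forward hooks. The use of PyTorch, the torchvision VGG-19 model, and all names in the sketch are assumptions made for the example only:

import torch
import torchvision

# Illustrative sketch only: capture the feature map x(n) of every
# convolutional layer while the network is fed with the input data.
model = torchvision.models.vgg19(pretrained=True).eval()
feature_maps = {}  # layer index -> stored feature map

def make_hook(idx):
    def hook(module, inputs, output):
        feature_maps[idx] = output.detach()  # store x(n) for later use
    return hook

for idx, layer in enumerate(model.features):
    layer.register_forward_hook(make_hook(idx))

image = torch.randn(1, 3, 224, 224)  # stand-in for the received input data
with torch.no_grad():
    logits = model(image)  # operation 12: the hooks fill feature_maps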
The method proceeds to 13 at which, for each layer of the neural network, an attribution tensor C(n) and a set of gradients ∇x(n) of the feature map x(n) of the layer are calculated.
The attribution tensor can be calculated by propagating across the layers of the neural network. The propagation through a given layer optionally and preferably uses the feature map of the layer, the weights of the layer, and the input class attribution map of the layer. In various exemplary embodiments of the invention the propagation is by means of the Layer-wise Relevance Propagation (LRP) described in Binder et al., 2016. Preferably, the propagation satisfies a conservation rule, such as, but not limited to, a rule that a sum of elements of the attribution tensor is constant.
The set of feature gradients can also be calculated by propagation across the layers of the neural network. Typically, such a set is calculated using the derivative chain-rule, see, for example, the chain-rule defined in EQ. 5 of the Examples section that follows.
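As a non-limiting example, the following sketch obtains the feature-map gradients with automatic differentiation instead of an explicitly coded chain rule, by differentiating the score of the target class. PyTorch and all names used are assumptions made for the example only:

import torch

def feature_gradients(model, image, target_class):
    # Illustrative sketch only: retain the gradient of each intermediate
    # feature map and back-propagate the target-class score through it.
    acts, handles = [], []
    def hook(module, inputs, output):
        output.retain_grad()      # keep the gradient of this non-leaf tensor
        acts.append(output)
    for m in model.features:
        handles.append(m.register_forward_hook(hook))
    logits = model(image)
    logits[0, target_class].backward()   # gradients of the class score
    grads = [a.grad for a in acts]       # the set of gradients of each x(n)
    for h in handles:
        h.remove()
    return acts, grads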
At 14, the attribution tensor C(n) is corrected based on a factorization Fx of the input feature map x(n) of the layer, as well as a factorization F∇x of the feature map gradients ∇x(n) of the layer. For example, the attribution tensor C(n) can be corrected by applying a shift to C(n) using a residual attribution tensor r(n) calculated using the factorizations Fx and F∇x. It was found by the Inventors that an attribution tensor corrected using such a shift can be indicative of the membership levels of the elements of the input data to the predicted class.
In some embodiments of the present invention the factorizations Fx and F∇x are based on the attribution tensor C(n) as calculated prior to the correction. For example, the factorization Fx can be calculated based on the feature map x(n) and the attribution tensor C(n), and the factorization F∇x can be calculated based on the feature map gradients ∇x(n) and the attribution tensor C(n).
In some embodiments of the present invention the method calculates a first attribution tensor C(n) and a second attribution tensor A(n), wherein the factorizations are based on the first attribution tensor C(n) but not the second attribution tensor A(n), and wherein the correction at 14 is based on both tensors C(n) and A(n). For example, the residual attribution tensor r(n) can be calculated using both tensors C(n) and A(n), e.g., according to EQ. 13 of the Examples section that follows, but the factorizations Fx and F∇x can be calculated based on C(n) but not based on A(n), e.g., according to EQ. 11 of the Examples section that follows. The second attribution tensor A(n) can be calculated by means of propagation, in a similar manner to the calculation of the first attribution tensor C(n) (e.g., using LRP), except that instead of using the feature map of the layer, the propagation uses an all-ones tensor having the same shape as the feature map. In these embodiments, the second attribution tensor is an input-agnostic influence tensor, which is calculated based only on the weights of the layer and the input class attribution map of the layer.
The method proceeds to 15 at which an output in which elements of the input data are highlighted according to the corrected attribution tensor is generated. Representative examples of generated outputs are provided in the Examples section that follows (see, for example, FIGs. 1B, 1C, 1E, 1F, 2A-B, 5, 6B-F, and 7-13).
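For illustration purposes only, the following sketch shows one possible way of generating such an output for image input, by overlaying the corrected attribution tensor on the image as a signed heat map. The plotting library and all names used are assumptions made for the example only:

import numpy as np
import matplotlib.pyplot as plt

def show_explanation(image_hwc, attribution_hw):
    # Illustrative sketch only: rescale the corrected attribution to [-1, 1]
    # and overlay it, so positive (in-class) and negative regions are
    # highlighted with opposite colors.
    a = attribution_hw / (np.abs(attribution_hw).max() + 1e-9)
    plt.imshow(image_hwc)
    plt.imshow(a, cmap="bwr", vmin=-1, vmax=1, alpha=0.5)
    plt.axis("off")
    plt.show()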
The method ends at 16.
FIG. 15 is a schematic illustration of a client computer 130 having a hardware processor 132, which typically comprises an input/output (I/O) circuit 134, a hardware central processing unit (CPU) 136 (e.g., a hardware microprocessor), and a hardware memory 138 which typically includes both volatile memory and non-volatile memory. CPU 136 is in communication with I/O circuit 134 and memory 138. Client computer 130 preferably comprises a graphical user interface (GUI) 142 in communication with processor 132. I/O circuit 134 preferably communicates information in appropriately structured form to and from GUI 142. Also shown is a server computer 150 which can similarly include a hardware processor 152, an I/O circuit 154, a hardware CPU 156, and a hardware memory 158. I/O circuits 134 and 154 of client 130 and server 150 computers can operate as transceivers that communicate information with each other via a wired or wireless communication. For example, client 130 and server 150 computers can communicate via a network 140, such as a local area network (LAN), a wide area network (WAN) or the Internet. Server computer 150 can, in some embodiments, be a part of a cloud computing resource of a cloud computing facility in communication with client computer 130 over the network 140.
GUI 142 and processor 132 can be integrated together within the same housing or they can be separate units communicating with each other. GUI 142 can optionally and preferably be part of a system including a dedicated CPU and I/O circuits (not shown) to allow GUI 142 to communicate with processor 132. Processor 132 issues to GUI 142 graphical and textual output generated by CPU 136. Processor 132 also receives from GUI 142 signals pertaining to control commands generated by GUI 142 in response to user input. GUI 142 can be of any type known in the art, such as, but not limited to, a keyboard and a display, a touch screen, and the like. Client 130 and server 150 computers can further comprise one or more computer-readable storage media 144, 164, respectively. Media 144 and 164 are preferably non-transitory storage media storing computer code instructions for executing the method as further detailed herein, and processors 132 and 152 execute these code instructions. The code instructions can be run by loading the respective code instructions into the respective execution memories 138 and 158 of the respective processors 132 and 152.
Each of storage media 144 and 164 can store program instructions which, when read by the respective processor, cause the processor to receive the neural network, input data, and optionally the target class, as further detailed hereinabove. The program instructions can also cause the processor to feed the input data to the neural network, to calculate the attribution tensor and the feature gradients, to correct the attribution tensor, and to generate an output as further detailed hereinabove.
In some embodiments of the present invention all the input is received by computer 130 locally, for example, using GUI 142 and/or storage 144. In these embodiments computer 130 can execute the operations of the method described herein. Alternatively, computer 130 can transmit the received neural network, input data, and optionally the target class to computer 150 via communication network 140, in which case computer 150 executes the operations of the method described herein, and transmits the generated output or corrected tensor back to computer 130 for generating a displayed output on GUI 142.
In some embodiments of the present invention at least a portion of the input is stored in storage 164, and that input is received by computer 130 over communication network 140. In these embodiments, computer 130 can execute the operations of the method described herein using the input received from computer 150 over communication network 140.
As used herein the term "about" refers to ±10%.
The terms "comprises", "comprising", "includes", "including", “having” and their conjugates mean "including but not limited to".
The term "consisting of" means "including and limited to".
The term "consisting essentially of" means that the composition, method or structure may include additional ingredients, steps and/or parts, but only if the additional ingredients, steps and/or parts do not materially alter the basic and novel characteristics of the claimed composition, method or structure.
As used herein, the singular form "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof. Throughout this application, various embodiments of this invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases “ranging/ranges between” a first indicate number and a second indicate number and “ranging/ranges from” a first indicate number “to” a second indicate number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
Various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below find experimental support in the following examples.
EXAMPLES
Reference is now made to the following examples, which together with the above descriptions illustrate some embodiments of the invention in a non-limiting fashion.
This Example integrates a gradient-based method and an attribution-based method, and provides a technique that achieves per-class explainability. The technique of the present embodiments back-projects the per-pixel local influence, in a manner that is guided by the local attributions, while correcting for salient features that would otherwise bias the explanation. This Example describes extensive experiments in which the ability of the technique of the present embodiments to provide class-specific visualization, as opposed to just the predicted label, was demonstrated. Using an unsupervised procedure, the technique is also successful in demonstrating that self-supervised methods learn semantic information.
The most common class of explainability methods for image classifiers visualizes the reason behind the classification of the network as a heatmap. These methods can make the rationale of the decision accessible to humans, leading to higher confidence in the ability of the classifier to focus on the relevant parts of the image, and not on spurious associations, and helping debug the model. In addition to the human user, the "computer user" can also benefit from such methods, which can seed image segmentation techniques, or help focus generative image models, among other tasks.
Some methods can be divided into two families: (i) gradient-based maps, which consider the gradient signal as it is computed by the conventional backpropagation approach, and (ii) relevance propagation, which projects high-level activations back to the input domain, mostly based on the deep Taylor decomposition. The two families are used for different purposes, and are evaluated by different sets of experiments and performance metrics. As shown hereinunder, gradient-based methods, such as Grad-CAM, are able to provide a class-specific visualization for deep layers, but fail to do so for the input image, and also provide a unilateral result. Attribution-based methods provide adequate visualization at the input image level, and have bipartite results, but lack in visualizing class-specific explanations.
This Example presents a technique for class specific visualization of deep image recognition models. The technique overcomes the limitations of the conventional methods, by combining ideas from both families of methods, and accumulating across the layers both gradient information, and relevance attribution. The method corrects for what is referred to herein as the saliency bias. This bias draws the attention of the network towards the salient activations, and can prevent visualizing other image objects.
The technique of the present embodiments is particularly useful for determining the pixels of the image that belong to the class predicted by the neural network. The Inventors found that the technique identifies regions of multiple classes in each image, and not just the prominent class, as shown in FIGs. 1A-F, which show visualizations of outputs of a trained multi-layer neural network according to some embodiments of the present invention for a pre-trained VGG-19. FIGs. 1A and 1D show input images, FIGs. 1B and 1E show heat maps generated for the top label, and FIGs. 1C and 1F show the heat maps for the 2nd highest prediction.
The technique described in this Example combines gradient and attribution techniques, and uses attribution-guided factorization for extracting informative class-specific attributions from the input feature map and its gradients. The technique corrects for the saliency bias. As demonstrated below, the technique provides improvement in performance in both negative perturbation and in segmentation-based evaluation. The former is often used to evaluate attribution methods, and the latter is often used to evaluate gradient-based methods. This Example also demonstrates that self-supervised networks implicitly learn semantic segmentation information.
Propagation Methods
The building blocks of attribution propagation and gradient propagation that are used in the techniques of the present embodiments will now be defined.
Attribution Propagation
Let x(n), θ(n) be the input feature map and weights of layer L(n), respectively, where n=1,...,N is the layer index in a network consisting of N layers. Herein, layer n-1 is downstream of layer n, layer N processes the input, and layer 1 produces the final output.
Let x(n-1) = L(n)(x(n), θ(n)) be the result of applying layer L(n) on x(n). The relevancy of layer L(n) is given by R(n) and is also known as the attribution.
The generic attribution propagation rule is defined, for two tensors X and Θ, as:

Rj(n) = G(X, Θ, R(n-1))j = Σi (Xj Θji / Σj' Xj' Θj'i) Ri(n-1)    (EQ. 1)
Typically, X is related to the layer's input x(n), and Θ to its weights θ(n). Layer-wise Relevance Propagation (LRP) [Binder et al. 2016] can be written in this notation by setting X=x(n)+ and Θ=θ(n)+, where t+ = max(0, t) for a tensor t.
Note that EQ. 1 satisfies the conservation rule:

Σj Rj(n) = Σi Ri(n-1)    (EQ. 2)
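By way of a non-limiting illustration, the following sketch instantiates EQs. 1-2 for a single fully connected layer with X=x(n)+ and Θ=θ(n)+ (the LRP rule above). NumPy and all names used are assumptions made for the example only:

import numpy as np

def propagate_relevance(x, weight, r_prev, eps=1e-9):
    # EQ. 1 with X = x(n)+ and Theta = theta(n)+; x: (J,), weight: (I, J),
    # r_prev: relevance R(n-1) of the layer output, shape (I,).
    X = np.clip(x, 0, None)
    W = np.clip(weight, 0, None)
    z = W @ X                      # per-output-neuron denominator
    s = r_prev / (z + eps)         # redistribute R(n-1) over the inputs
    return X * (W.T @ s)           # R(n)

x = np.array([1.0, 2.0, 0.5])
w = np.abs(np.random.randn(4, 3))
r_prev = np.array([0.1, 0.4, 0.3, 0.2])
r = propagate_relevance(x, w, r_prev)
assert np.isclose(r.sum(), r_prev.sum())  # conservation rule of EQ. 2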
Let y be the output vector of a classification network with C classes, and let yt represent the specific value of class t∈C. LRP defines R(0) ∈ R|C| to be a zeros vector, except for index t, where Rt(0) = yt. Similarly, the Contrastive LRP (CLRP) [Gu, Yang, and Tresp 2018] and the Softmax-Gradient LRP (SGLRP) [Iwana, Kuroki, and Uchida 2019] methods calculate the difference between two LRP results, initialized with two opposing R(0), propagating relevance from the "target" class and from the "rest" of the classes, respectively. CLRP is defined as:
RCLRP = (Rt(N) − N · Rrest(N))+    (EQ. 3)

where Rt(N) and Rrest(N) are the two propagation results, and N = Σj (Rt(N))j / Σj (Rrest(N))j is referred to as a normalization factor.
Δ-Shift: EQ. 1 presents a generic propagation rule that satisfies the conservation rule in EQ. 2. However, in many cases, it is advantageous to add a residual signal denoting another type of attribution. The Δ-shift corrects for the resulting deviation from the conservation rule. Given a generic propagation result R(n) satisfying EQ. 2, and a residual tensor r(n), the Δ-shift is defined as follows:

Δshift(R(n), r(n)) = R(n) + r(n) − (Σj rj(n) / |{j : Rj(n) ≠ 0}|) · 1[R(n) ≠ 0]    (EQ. 4)
Note that the sum of the residual signal is divided by the number of non-zero neurons. While not formulated this way, the Relative Attributing Propagation (RAP) method [Nam et al. 2019] employs the type of correction defined in EQ. 4.
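By way of a non-limiting illustration, the following sketch implements the Δ-shift of EQ. 4; the names used are assumptions made for the example only:

import numpy as np

def delta_shift(R, r):
    # EQ. 4: add the residual, then spread its total sum over the non-zero
    # neurons of R, so that the overall sum (EQ. 2) is unchanged.
    nz = (R != 0)
    shifted = R + r
    shifted[nz] -= r.sum() / nz.sum()  # assumes R has at least one non-zero
    return shifted

R = np.array([0.5, 0.0, 0.3, 0.2])
r = np.array([0.1, 0.0, -0.2, 0.3])
out = delta_shift(R, r)
assert np.isclose(out.sum(), R.sum())  # conservation preserved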
Gradient Propagation:
Let ℒ be the loss of a neural network. The input feature gradients ∇x(n) of layer L(n), with respect to ℒ, are defined by the chain rule as follows:

∇x(n) := ∂ℒ/∂x(n) = (∂x(n-1)/∂x(n))ᵀ ∇x(n-1)    (EQ. 5)
Methods such as FullGrad [Srinivas and Fleuret 2019] and SmoothGrad [Smilkov et al. 2017] use the raw gradients, as defined in EQ. 5, for visualization. Grad-CAM [Selvaraju et al. 2017], on the other hand, performs a weighted combination of the input feature gradients, in order to obtain a class specific visualization, defined as follows:
Grad-CAM(x(n), ∇x(n)) = (Σc∈C (1/(HW) Σh,w ∇xc,h,w(n)) xc(n))+    (EQ. 6)

where ∇xc,h,w(n) is the specific value of the gradient C-channel tensor ∇x(n) at channel c and pixel (h, w), and xc(n) is the entire channel, which is a matrix of size H×W.
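As a non-limiting example, the following sketch computes the weighted combination of EQ. 6; NumPy and the names used are assumptions made for the example only:

import numpy as np

def grad_cam(x, grad_x):
    # x, grad_x: (C, H, W) feature map of a layer and its gradients (EQ. 5).
    alpha = grad_x.mean(axis=(1, 2))               # one weight per channel c
    cam = np.tensordot(alpha, x, axes=([0], [0]))  # sum_c alpha_c * x_c
    return np.maximum(cam, 0)                      # keep positive evidence only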
The presented embodiments create a separation between the positively contributing regions, or the foreground, and the negatively contributing ones, referred to as the background. This is optionally and preferably employed for both the activations and the gradients during propagation. Ideally, the relevant data would be partitioned into two clusters, one for the positive contributions and one for the negative contributions. In this Example the data is divided spatially between positive and negative locations, in accordance with the sign of a partition map Φ ∈ R^(H×W).
A tensor Y ∈ R^(C×H×W) is re-written as a matrix of the form Y ∈ R^(C×HW). The Heaviside function of Y is optionally and preferably computed using a sigmoid function H=sigmoid(Y). The matrix H ∈ [0, 1]^(C×HW) is a positive matrix. According to some embodiments of the present invention the following two-class non-negative matrix factorization is employed: H=RW, where W ∈ (R+)^(2×HW) contains the spatial mixing weights, and the representative matrix R=[Rb Rf] is defined by the mean of each class in the data tensor H based on the assignment of Φ:

Rf = Σp H:,p 1[Φp > 0] / Σp 1[Φp > 0],  Rb = Σp H:,p 1[Φp ≤ 0] / Σp 1[Φp ≤ 0]    (EQ. 7)

where Rf, Rb ∈ (R+)^C, C is the channel dimension, p runs over the HW spatial locations, and ⊙ denotes the Hadamard product.
The matrix W of positive weights is estimated by least squares: W = [Wb Wf] = ((RᵀR)⁻¹RᵀH)+, with Wf, Wb ∈ (R+)^HW. Combining the foreground weights Wf with the background weights Wb into the same axis is optionally and preferably done by using both negative and positive values, leading to the following operator:

F(Y, Φ) = Wf − Wb    (EQ. 8)

where F is a function that receives Y and Φ as inputs.
In various exemplary embodiments of the invention F(Y, Φ) is further normalized, to allow multiple streams to be integrated together:

F̄(Y, Φ) = F(Y, Φ) / max|F(Y, Φ)|    (EQ. 9)
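By way of a non-limiting illustration, the following sketch implements the two-class factorization of EQs. 7-9; NumPy and all names used are assumptions made for the example only, and a small ridge term is added purely for numerical stability:

import numpy as np

def guided_factorization(Y, phi, eps=1e-9):
    # Y: (C, H, W) tensor to factorize; phi: (H, W) partition map.
    C, H, W = Y.shape
    Hmat = 1.0 / (1.0 + np.exp(-Y.reshape(C, H * W)))  # H = sigmoid(Y)
    fg = phi.reshape(-1) > 0                           # partition by sign of phi
    Rf = Hmat[:, fg].mean(axis=1) if fg.any() else np.zeros(C)
    Rb = Hmat[:, ~fg].mean(axis=1) if (~fg).any() else np.zeros(C)
    R = np.stack([Rb, Rf], axis=1)                     # representatives (EQ. 7)
    Wmat = np.linalg.solve(R.T @ R + eps * np.eye(2), R.T @ Hmat)
    Wmat = np.clip(Wmat, 0, None)                      # W = ((R^T R)^-1 R^T H)+
    F = (Wmat[1] - Wmat[0]).reshape(H, W)              # Wf - Wb (EQ. 8)
    return F / (np.abs(F).max() + eps)                 # normalization (EQ. 9)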
The Integrated Method
Let M be a multiclass CNN classifier (C labels), and I=x(N) be the input image. The network M outputs a score vector y ∈ R|C| obtained before applying the softmax operator. Given any target class t, the method of the present embodiments explains where (spatially) in the image I lies the support for class t. The method optionally and preferably comprises two streams, gradient and attribution propagation. In each layer, the previous values of the two streams are used, and the current layer's input gradient and attribution are computed.
In various exemplary embodiments of the invention the method employs propagating attribution (for example, using EQ. 1), factorizes the activations and the gradients in a manner that is guided by the attribution, and performs attribution aggregation and value shifting such that the conservation rule is preserved. The shift splits the neurons into those with positive and those with negative attributions.
Initial Attribution Propagation
Let Φ(n) and Φ(n-1) be the output and input class attribution maps of layer L(n), respectively.
The following initial attribution for explaining decision t is employed. Let y := x(0) be the output vector of the classification network (logits), from which the initial attribution Φ(1) is computed (EQ. 10).
In this formulation, the pseudo-probabilities of vector y are replaced with another vector, in which the class t for which an explanation is desired is highlighted, and the rest of the classes are scored by the closeness of their assigned probability to that of t. This way, the explanation is no longer dominated by the predicted class.
Class Attribution Propagation
Φ(n-1) is optionally and preferably propagated through L(n), for example, according to EQ. 1, using two attribution tensors.
The first attribution tensor considers the absolute influence C(n), defined by EQ. 1 with absolute-valued inputs:

C(n) = G(|x(n)|, |θ(n)|, Φ(n-1))
The second attribution tensor computes the input-agnostic influence A(n) according to EQ. 1:

A(n) = G(1, |θ(n)|, Φ(n-1))
where 1 is an all-ones tensor of the shape of x(n). The input-agnostic propagation is selected because features in shallow layers, such as edges, are more local and less semantic. It therefore reduces the sensitivity to texture.
Residual Update
The factorizations of both the input feature map of layer L(n) and its gradients are optionally and preferably computed, in addition to C(n). This branch is defined by the chain rule in EQ. 5, where ℒ = yt is considered. The factorization results in foreground and background partitions, using guidance from C(n). This partition follows the principles of the attribution properties, where positive values are part of class t, and negatives otherwise. The following attribution-guided factorization is therefore employed (EQ. 7 with respect to x(n) and ∇x(n)):

Fx(n) = (F̄(x(n), C(n)))+,  F∇x(n) = (F̄(∇x(n), C(n)))+    (EQ. 11)
Note that the positive values of the factorization update are considered, and that the two results are normalized by their maximal value. The input-gradient interaction is defined as:

Φx∇x(n) = x(n) ⊙ ∇x(n)    (EQ. 12)
The residual attribution r(n) is then defined by all attributions other than C(n), combining the factorization terms Fx(n) and F∇x(n) with the input-gradient interaction of EQ. 12 (EQ. 13).
It is observed that both Fx(n) and F∇x(n) are affected by the input feature map, resulting in the saliency bias effect. As a result, their sum is penalized according to C(n), in a manner that emphasizes positive attribution regions.
It is noted that the sum of the residual r(n) is generally non-zero, and the residual is optionally and preferably compensated for so as to preserve the conservation rule. In these embodiments the Δ-shift (EQ. 4) is performed as further detailed hereinabove, resulting in the attribution:

Φ(n) = Δshift(C(n), r(n))    (EQ. 14)
Explaining Self-Supervised Learning (SSL)
While SSL greatly reduces the need for labeled samples, no explainability method was applied to verify that these models, which are often based on image augmentations, do not ignore localized image features.
Since no label information is used, the classifier of the self-supervised task itself is relied upon. This has nothing to do with the classes of the datasets. For each image, the image that is closest to it in the penultimate layer's activations is considered. The logits of the self-supervised task of the image to be visualized and its nearest neighbor are subtracted, to emphasize what is unique to the current image. Explainability methods are then used on the predicted class of the self-supervised task.
Experiments
The method of the present embodiments was implemented as computer instructions employing the following procedure, referred to as Procedure 1.
Procedure 1 (pseudocode listing; a simplified sketch is given below)
Note that line 1 of the instructions is a forward pass in which the intermediate feature maps are saved, line 4 of the instructions is the initial attribution (first linear layer), line 6 of the instructions is the absolute influence, line 10 of the instructions is the residual, line 13 represents the shift by the residual, and lines 16-19 of the instructions are, respectively, the input-agnostic attribution, the factorization of the input feature map, the factorization of the input feature map gradients, and the recalculation of the residual.
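By way of a non-limiting illustration, the following sketch outlines the per-layer loop structure of Procedure 1, reusing the propagate_relevance and delta_shift sketches given above. The residual here is a simplified stand-in for EQs. 11-13 (only the input-gradient interaction gated by the positive regions of C(n) is used), and all names are assumptions made for the example only:

import numpy as np

def explain(weights, feature_maps, gradients, phi_init):
    # weights[n]: (I, J) matrix of layer n; feature_maps[n] and gradients[n]:
    # (J,) vectors saved in the forward pass; phi_init: initial attribution.
    phi = phi_init
    for W, x, gx in zip(weights, feature_maps, gradients):
        C = propagate_relevance(np.abs(x), np.abs(W), phi)        # absolute influence
        A = propagate_relevance(np.ones_like(x), np.abs(W), phi)  # input-agnostic
        # simplified residual; the full method also uses A and the factorized
        # streams Fx and F_grad-x of EQ. 11 in the residual of EQ. 13
        residual = np.clip(x * gx, 0, None) * (C > 0)
        phi = delta_shift(C, residual)
    return phi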
For the linear layers of the network, the residual was calculated using only the input-gradient interaction term Φx∇x(n) of EQ. 12.
Qualitative Evaluation
FIGs. 2A and 2B present sample visualizations on a representative set of images for networks trained on ImageNet, using VGG-19 and ResNet-50, respectively. These figures provide visualization of the top-predicted class. The preferred visualization quality provided by the method of the present embodiments is evident. One can observe that: (i) LRP, FullGrad and Grad-CAM output only positive results, wherein LRP edges are most significant, and in all three, the threshold between the object and background is ambiguous; (ii) CLRP and SGLRP, which apply LRP twice, have volatile outputs; (iii) RAP is the most consistent, other than the method of the present embodiments, but falls behind in object coverage and boundaries; (iv) the method of the present embodiments produces relatively complete regions with clear boundaries between positive and negative regions.
In order to test whether each method is class-agnostic or not, images containing two clearly seen objects were fed to the classifier, and each object class was propagated separately. FIG. 2C presents results for a sample image. As shown, LRP, FullGrad and RAP output similar visualizations for both classes. Grad-CAM, on the other hand, clearly shows a coarse region of the target class, but lacks the spatial resolution. CLRP and SGLRP both achieve class separation, and yet, they are highly biased toward image edges, and do not present a clear separation between the object and its background. The method of the present embodiments provides the clearest visualization, which is both highly correlated with the target class, and is less sensitive toward edges.
Quantitative Experiments:
Two experiment settings were employed, negative perturbation and segmentation tests. The method of the present embodiments was evaluated using three datasets: (i) the validation set of ImageNet [Russakovsky et al., 2015] (ILSVRC) 2012, consisting of 50K images from 1000 classes, (ii) an annotated subset of ImageNet called ImageNet-Segmentation [Guillaumin et al., 2014] containing 4,276 images from 445 categories, and (iii) the PASCAL-VOC 2012 dataset, depicting 20 foreground object classes and one background class, and containing 10,582 images for training, 1,449 images for validation and 1,456 images for testing.
Negative Perturbation Experiments:
The negative perturbation test is composed of two stages, first, a pre-trained network is used to generate the visualizations of the ImageNet validation set. In the present Example, the Inventors used the VGG-19 architecture, trained on the full ImageNet training set. Second, the Inventors masked out an increasing portion of the image, starting from lowest to highest values, determined by the explainability method. At each step, the Inventors computed the mean accuracy of the pre-trained network. This test was repeated twice: once for the explanation of the top-1 predicted class, and once for the ground truth class. The results are presented in FIGs. 3A and 3B and in Table 1, providing Area Under the Curve (AUC) results for the two negative perturbation tests, showing results for predicted and target class. The class-agnostic methods either perform worse or experience insignificant change on the target class test. The rightmost column, designated "Ours" corresponds to the method of the present embodiments.
Table 1
As shown, the method of the present embodiments achieves the best performance across both tests, where the margin is highest when removing 40%-80% of the pixels.
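By way of a non-limiting illustration, the following sketch shows the masking step of the negative perturbation protocol for a single image; all names used are assumptions made for the example only (in the experiments the accuracy is averaged over the full validation set):

import numpy as np

def negative_perturbation(predict, image, relevance, fractions, true_label):
    # image: (C, H, W); relevance: (H, W) explanation of the image;
    # predict: function returning the classifier's label for an image.
    order = np.argsort(relevance.reshape(-1))   # least relevant pixels first
    hits = []
    for f in fractions:                         # e.g. 0.1, 0.2, ..., 0.9
        masked = image.reshape(image.shape[0], -1).copy()
        masked[:, order[: int(f * order.size)]] = 0  # mask lowest-valued pixels
        hits.append(float(predict(masked.reshape(image.shape)) == true_label))
    return hits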
Semantic Segmentation Metrics:
To evaluate the segmentation quality obtained by each explainability method, the Inventors compared each to the ground truth segmentation maps of the ImageNet- Segmentation dataset, and the PASCAL-VOC 2012, evaluating by pixel-accuracy and mean average-precision. The goal was to demonstrate the ability of each method without follow-up training. For the first dataset, the Inventors employed the pre-trained VGG19 classifier trained on ImageNet training set, and computed the explanation for the top-predicted class and compared it to the ground truth mask provided in the dataset. For the second, the Inventors trained a multi-label classifier on the PASCAL-VOC 2012 training set, and considered labels with a probability larger than 0.5 to extract the explainability maps. For methods that provide both positive and negative values (Gradient SHAP, LRPαβ, RAP, CLRP, SGLRP, and the method of the present embodiments), the Inventors considered the positive part as the segmentation map of that object. For methods that provide only positive values (Integrated Grad, Smooth Grad, Full Grad, GradCAM, LRP, Meaningful Perturbation), the Inventors thresholded the obtained maps at the mean value to obtain the segmentation map. Results are reported in Tables 2A and 2B, below, providing quantitative segmentation results on ImageNet and PASCAL-VOC 2012. The rightmost column in Table 2B, designated "Ours," corresponds to the method of the present embodiments.
Table 2A
Table 2B
Tables 2A and 2B demonstrate an advantage of the method of the present embodiments over all nine baseline methods, for all datasets and metrics. Other methods seem to work well only in one of the datasets or present a trade-off between the two metrics.
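For illustration purposes only, the following sketch scores one explanation map against a ground truth mask under the two metrics, using the thresholding conventions described above; scikit-learn and the names used are assumptions made for the example only:

import numpy as np
from sklearn.metrics import average_precision_score

def segmentation_scores(relevance, gt_mask, signed=True):
    # signed maps: the positive part is the predicted segmentation mask;
    # positive-only maps: threshold at the mean value of the map.
    pred = relevance > 0 if signed else relevance > relevance.mean()
    pixel_acc = (pred == gt_mask).mean()
    ap = average_precision_score(gt_mask.reshape(-1), relevance.reshape(-1))
    return pixel_acc, ap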
Explainability for Self-Supervised Models:
Three SSL models were used: ResNet-50 trained with either SCAN [Van Gansbeke et al. 2020] or SeLa [Asano et al. 2019b], and an AlexNet [Asano et al. 2019a], which is denoted as RotNet.
FIG. 4A shows the segmentation performance for the ImageNet ground truth class of the completely unsupervised SSL methods using different explainability methods. For all models the explainability method of the present embodiments outperforms the baselines in both mAP and pixel accuracy, except for RotNet where the mAP is considerably better and the pixel accuracy is slightly lower. FIG. 4B shows the increase of segmentation performance for RotNet with the method of the present embodiments, as deeper layers of SSL Alexnet are visualized, using a supervised linear post-training.
Table 3, below, compares RAP and the method of the present embodiments in the predicted SSL class (not ImageNet) negative perturbation setting. Table 3 provides AUC for negative perturbation tests for the self-supervised methods SeLa and SCAN. In Table 3, (A) denotes the method of the present embodiments. RAP, which is the best baseline in Table 1, is used as a baseline.
Table 3
As demonstrated in Table 3, the method of the present embodiments has superior performance. SSL results also seem correlated with the fully supervised ones (note that the architecture and the processing of the fully connected layer are different from those used in Table 1). Two alternatives to the novel SSL procedure are also presented in Table 3. In one, denoted (Σ), the difference from the neighbor is replaced with a sum. In the other, denoted "w/o", no comparison to the neighbor takes place.
Ablation Study: By repeatedly employing normalization, the method of the present embodiments is kept parameter-free. Table 4 presents negative perturbation results for methods that are obtained by removing one component out of the complete method. The complete method is provided on the first line and denoted "Our." Also presented are the results of a similar method in which the attribution-guided residual term r is replaced by a GradCAM term. As shown, each of these modifications damages the performance to some degree. Without any residual term, the method is slightly worse than RAP, while a partial residual term further hurts performance.
Table 4
The different components of the method are visualized in FIG. 5. Note that: (i) across all components, the semantic information is best visible in deeper layers of the network, whereas the residual information becomes more texture-oriented at shallower layers; (ii) the difference between Fx and F∇x is mostly visible in the out-of-class regions: Fx is derived from the data directly and is biased toward input-specific activations, resulting in highlights from the class that is not being visualized from layer 5 onward; (iii) the input-agnostic data term A is more blurry than C, as a result of using the 1 tensor as input. It can, therefore, serve as a regularization term that is less susceptible to image edges.
Additional Results
Tables 5 and 6, below, provide additional results for two different Grad-CAM variants, considering visualization from shallower layers, as denoted by Grad-CAM* = 4 layers shallower and Grad-CAM** = 8 layers shallower. An example of the different outputs is shown in FIG. 5, where the semantic information of Grad-CAM is less visible as one goes shallower, whereas in the method of the present embodiments, the semantic information becomes more fine-grained. Also shown are the inferior results of Gradient SHAP and DeepLIFT SHAP for negative perturbation.
Table 5
Table 6
Self-Supervised Learning
The method of the present embodiments was evaluated on three recent models. First, the Inventors show the performance of the method of the present embodiments on an AlexNet model, trained by RotNet in a completely self-supervised fashion, which depends only on data augmentations, specifically predicting the 2D rotation that is applied to the input image. In order to evaluate the quality of the learned features, the Inventors used (only for the RotNet experiments) a supervised linear post-training in order to visualize deeper layers. FIGs. 6A-F show the visualization obtained for the image on the left when training a supervised linear classifier after each layer and applying the method of the present embodiments. FIG. 6A shows the original image. FIG. 6B shows the visualization after the first layer. FIG. 6C shows the visualization after the second layer. FIG. 6D shows the visualization after the third layer. FIG. 6E shows the visualization after the fourth layer. FIG. 6F shows the visualization after the last layer. As shown, early layers learn edge-like patterns and deeper layers learn more complicated patterns; more visualizations are shown at the end of the supplementary material.
In order to evaluate SSL methods without any additional supervision, the Inventors employed the classifier with which the SSL model is trained. The computer instructions used for this evaluation, referred to as SSNN, provide self-supervised explainability by adopting nearest neighbors, and are described below.
In the SSNN procedure (presented as a pseudocode listing in the original publication), LI is the latent vector of image I, {LS} is the set of latent vectors of all images in S, LN is the nearest neighbor of LI, the subtraction of LN from LI emphasizes the unique elements of LI, the assignment v (line 5) is a forward pass with the new latent vector, and t is the class with the highest probability.
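By way of a non-limiting illustration, the following sketch implements the nearest-neighbor step of SSNN; PyTorch and all names used are assumptions made for the example only:

import torch

def ssnn_target(encoder, head, images, index):
    # encoder: self-supervised backbone; head: the self-supervised task
    # classifier; images: (S, C, H, W) batch; index: image to be explained.
    with torch.no_grad():
        latents = encoder(images)              # (S, D) latent vectors
        li = latents[index]
        dists = (latents - li).norm(dim=1)
        dists[index] = float("inf")            # exclude the image itself
        ln = latents[dists.argmin()]           # nearest neighbor L_N
        v = head(li - ln)                      # forward pass with L_I - L_N
    return v.argmax().item()                   # class t to be explained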
Additional Results - Multi Label
FIG. 7 shows visualization of two different classes for VGG-19, and FIG. 8 shows visualization of an additional two different classes for VGG-19.
Additional Results - Top Class
FIGs. 9A-H show visualization of the top predicted class of a VGG-19 ImageNet trained network, and FIGs. 10A-G show visualization of the top predicted class of a ResNet-50 ImageNet trained network.
Additional Results - AlexNet Probes
FIG. 11 shows the visualization after each layer using linear probes on the RotNet AlexNet model.
Additional Results - Self-labeling (SeLa) Method
For each explainability method, FIG. 12 shows results obtained on ResNet-50 by simply projecting the self-labeling label with the highest probability (a simplified version of the procedure of the present embodiments that does not involve the nearest neighbor computation), as well as the results for the aforementioned SSNN computer instructions.
Additional Results - SCAN Method
FIG. 13 shows results obtained similarly to FIG. 12, except that ResNet-50 was trained in a self-supervised regime using SCAN.
This Example describes an explainability method that outputs class-dependent explanations that are clearer and more exact than those presented by the many existing methods tested. The method of the present embodiments is based on combining attribution methods and gradient methods. This combination is done, on equal grounds, through the usage of a non-negative matrix factorization technique that partitions the image into foreground and background regions. This Example also describes a procedure for evaluating the explainability of SSL methods.
Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.
It is the intent of the applicant(s) that all publications, patents and patent applications referred to in this specification are to be incorporated in their entirety by reference into the specification, as if each individual publication, patent or patent application was specifically and individually noted when referenced that it is to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting. In addition, any priority document(s) of this application is/are hereby incorporated herein by reference in its/their entirety.
REFERENCES
Adebayo, J.; Gilmer, J.; Muelly, M.; Goodfellow, I; Hardt, M.; and Kim, B. 2018. Sanity checks for saliency maps. In Advances in Neural Information Processing Systems, 9505- 9515.
Ahn, J.; Cho, S.; and Kwak, S. 2019. Weakly supervised learning of instance segmentation with inter-pixel relations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2209-2218.
Asano, Y. M.; Rupprecht, C.; and Vedaldi, A. 2019a. A critical analysis of selfsupervision, or what we can learn from a single image. arXiv preprint arXiv:1904.13132.
Asano, Y. M.; Rupprecht, C.; and Vedaldi, A. 2019b. Selflabelling via simultaneous clustering and representation learning. arXiv preprint arXiv:1911.05371
Bach, S.; Binder, A.; Montavon, G.; Klauschen, F.; Müller, K.-R.; and Samek, W. 2015. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one 10(7): e0130140.
Binder, A.; Montavon, G.; Lapuschkin, S.; Müller, K.-R.; and Samek, W. 2016. Layer- wise relevance propagation for neural networks with local renormalization layers. In International Conference on Artificial Neural Networks, 63-71. Springer.
Dabkowski, P.; and Gal, Y. 2017. Real time image saliency for black box classifiers. In Advances in Neural Information Processing Systems, 6970-6979.
Erhan, D.; Bengio, Y.; Courville, A.; and Vincent, P. 2009. Visualizing higher-layer features of a deep network. University of Montreal 1341(3): 1.
Fong, R.; Patrick, M.; and Vedaldi, A. 2019. Understanding deep networks via extremal perturbations and smooth masks. In Proceedings of the IEEE International Conference on Computer Vision, 2950-2958.
Fong, R. C.; and Vedaldi, A. 2017. Interpretable explanations of black boxes by meaningful perturbation. In Proceedings of the IEEE International Conference on Computer Vision, 3429-3437.
Gao, M.; Chen, H.; Zheng, S.; and Fang, B. 2016. A factorization based active contour model for texture segmentation. In 2016 IEEE International Conference on Image Processing (ICIP), 4309-4313. IEEE.
Gu, J.; Yang, Y.; and Tresp, V. 2018. Understanding individual decisions of cnns via contrastive backpropagation. In Asian Conference on Computer Vision, 119-134. Springer.
Guillaumin, M.; Küttel, D.; and Ferrari, V. 2014. Imagenet auto-annotation with segmentation propagation. International Journal of Computer Vision 110(3): 328-348.
Hoyer, L.; Munoz, M.; Katiyar, P.; Khoreva, A.; and Fischer, V. 2019. Grid saliency for context explanations of semantic segmentation. In Advances in Neural Information Processing Systems, 6462-6473.
Huang, Z.; Wang, X.; Wang, J.; Liu, W.; and Wang, J. 2018. Weakly- supervised semantic segmentation network with deep seeded region growing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7014-7023.
Iwana, B. K.; Kuroki, R.; and Uchida, S. 2019. Explaining Convolutional Neural Networks using Softmax Gradient Layer-wise Relevance Propagation. arXiv preprint arXiv:1908.04351.
Kindermans, P.-J.; Schütt, K. T.; Alber, M.; Müller, K.-R.; Erhan, D.; Kim, B.; and Dähne, S. 2017. Learning how to explain neural networks: PatternNet and PatternAttribution. arXiv preprint arXiv:1705.05598.
Lundberg, S. M.; and Lee, S.-I. 2017. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, 4765-4774.
Mahendran, A.; and Vedaldi, A. 2016. Visualizing deep convolutional neural networks using natural pre-images. International Journal of Computer Vision 120(3): 233-255.
Montavon, G.; Lapuschkin, S.; Binder, A.; Samek, W.; and Müller, K.-R. 2017. Explaining nonlinear classification decisions with deep taylor decomposition. Pattern Recognition 65: 211-222.
Nam, W.-J.; Gur, S.; Choi, J.; Wolf, L.; and Lee, S.-W. 2019. Relative Attributing Propagation: Interpreting the Comparative Contributions of Individual Units in Deep Neural Networks. arXiv preprint arXiv:1904.00605.
Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. 2015. Imagenet large scale visual recognition challenge. International journal of computer vision 115(3): 211- 252.
Selvaraju, R. R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; and Batra, D. 2017. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, 618-626.
Shrikumar, A.; Greenside, P.; and Kundaje, A. 2017. Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, 3145-3153. JMLR.org.
Shrikumar, A.; Greenside, P.; Shcherbina, A.; and Kundaje, A. 2016. Not just a black box: Learning important features through propagating activation differences. arXiv preprint arXiv:1605.01713.
Simonyan, K.; Vedaldi, A.; and Zisserman, A. 2013. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.
Smilkov, D.; Thorat, N.; Kim, B.; Viégas, F.; and Wattenberg, M. 2017. SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825.
Srinivas, S.; and Fleuret, F. 2019. Full-gradient representation for neural network visualization. In Advances in Neural Information Processing Systems, 4126-4135.
Sundararajan, M.; Taly, A.; and Yan, Q. 2017. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, 3319-3328. JMLR.org.
Van Gansbeke, W.; Vandenhende, S.; Georgoulis, S.; Proesmans, M.; and Van Gool, L. 2020. SCAN: Learning to Classify Images without Labels. In European Conference on Computer Vision (ECCV).
Wang, Y.; Zhang, J.; Kan, M.; Shan, S.; and Chen, X. 2019. Self-supervised Scale Equivariant Network for Weakly Supervised Semantic Segmentation. arXiv preprint arXiv:1909.03714.
Yuan, J.; Wang, D.; and Cheriyadat, A. M. 2015. Factorization-based texture segmentation. IEEE Transactions on Image Processing 24(11): 3488-3497.
Zeiler, M. D.; and Fergus, R. 2014. Visualizing and understanding convolutional networks. In European conference on computer vision, 818-833. Springer.
Zhang, J.; Bargal, S. A.; Lin, Z.; Brandt, J.; Shen, X.; and Sclaroff, S. 2018. Top-down neural attention by excitation backprop. International Journal of Computer Vision 126(10): 1084- 1102.
Zhou, B.; Bau, D.; Oliva, A.; and Torralba, A. 2018. Interpreting deep visual representations via network dissection. IEEE transactions on pattern analysis and machine intelligence.
Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; and Torralba, A. 2016. Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2921-2929.

Claims

WHAT IS CLAIMED IS:
1. A method of visualizing an output of a trained multi-layer neural network, the method comprising: feeding input data to the neural network; for each layer of said neural network: calculating an attribution tensor and a set of feature gradients for said layer, and correcting said attribution tensor based on factorizations of an input feature map of said layer and of said feature gradients; and generating an output in which elements of said input data are highlighted according to said corrected attribution tensor.
2. The method according to claim 1, wherein said factorizations are based on said attribution tensor as calculated prior to said correction.
3. The method according to claim 2, comprising calculating a first attribution tensor and a second attribution tensor, wherein said factorizations are based on said first attribution tensor but not said second attribution tensor, and wherein said correction is based also on said second attribution tensor.
4. The method according to any of claims 1-3, wherein said calculating said attribution tensor is by propagation across layers of said neural network.
5. The method according to claim 4, wherein said propagation satisfies a conservation rule.
6. The method according to claim 5, wherein said conservation rule is that a sum of elements of said attribution tensor is constant.
7. The method according to any of claims 1-6, wherein said calculating said set of feature gradients is by propagation across layers of said neural network.
8. The method according to any of claims 1-7, comprising calculating a residual tensor for said layer using said factorizations, wherein said correcting is based on said residual tensor.
9. The method according to claim 8, wherein said correcting comprises shifting said attribution tensor by said residual tensor.
10. The method according to any of claims 1-9, wherein said input data comprise an image.
11. The method according to claim 10, wherein said image is a medical image.
12. The method according to any of claims 1-9, wherein said input data describe electric, magnetic, or ultrasound signals received from a human or animal subject.
13. The method according to any of claims 1-9, wherein said input data describe an acoustic signal.
14. The method according to any of claims 1-9, wherein said input data comprise a corpus of text.
15. The method according to claim 14, wherein said corpus of text is a programming language source code.
16. The method according to any of claims 1-9, wherein said input data comprise an object code, optionally and preferably an object code generated by compiler software.
17. The method according to any of claims 1-9, wherein said input data comprise bioinformatics data.
18. The method according to any of claims 1-9, wherein said input data comprise natural language data.
19. A computer software product, comprising a computer-readable medium in which program instructions are stored, which instructions, when read by a data processor, cause the data processor to execute the method according to any of claims 1-17.
PCT/IL2021/051411 2020-11-26 2021-11-26 Method and system for visualizing neural network output WO2022113083A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063118654P 2020-11-26 2020-11-26
US63/118,654 2020-11-26

Publications (1)

Publication Number Publication Date
WO2022113083A1 true WO2022113083A1 (en) 2022-06-02

Family

ID=81754211

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2021/051411 WO2022113083A1 (en) 2020-11-26 2021-11-26 Method and system for visualizing neural network output

Country Status (1)

Country Link
WO (1) WO2022113083A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274788A (en) * 2023-10-07 2023-12-22 南开大学 Sonar image target positioning method, system, electronic equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
REBUFFI SYLVESTRE-ALVISE; FONG RUTH; JI XU; VEDALDI ANDREA: "There and Back Again: Revisiting Backpropagation Saliency Methods", 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 13 June 2020 (2020-06-13), pages 8836 - 8845, XP033803685, DOI: 10.1109/CVPR42600.2020.00886 *
SELVARAJU RAMPRASAATH R.; COGSWELL MICHAEL; DAS ABHISHEK; VEDANTAM RAMAKRISHNA; PARIKH DEVI; BATRA DHRUV: "Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization", 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 21 March 2017 (2017-03-21), pages 618 - 626, XP033282917, DOI: 10.1109/ICCV.2017.74 *
SUNDARARAJAN MUKUND, TALY ANKUR, YAN QIQI: "Axiomatic Attribution for Deep Networks", PROCEEDINGS OF THE 34 TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING, 3 March 2017 (2017-03-03), pages 1 - 10, XP055935675 *
WOO-JEOUNG NAM; JAESIK CHOI; SEONG-WHAN LEE: "Relative Attributing Propagation: Interpreting the Comparative Contributions of Individual Units in Deep Neural Networks", ARXIV.ORG, 1 April 2019 (2019-04-01), pages 1 - 11, XP081163192 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274788A (en) * 2023-10-07 2023-12-22 南开大学 Sonar image target positioning method, system, electronic equipment and storage medium
CN117274788B (en) * 2023-10-07 2024-04-30 南开大学 Sonar image target positioning method, system, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Nam et al. Relative attributing propagation: Interpreting the comparative contributions of individual units in deep neural networks
Graziani et al. Concept attribution: Explaining CNN decisions to physicians
Dvořák et al. Local structure prediction with convolutional neural networks for multimodal brain tumor segmentation
Kandemir et al. Computer-aided diagnosis from weak supervision: A benchmarking study
Arteta et al. Interactive object counting
Zhao et al. Automatic cytoplasm and nuclei segmentation for color cervical smear image using an efficient gap-search MRF
Shen et al. Multiple instance subspace learning via partial random projection tree for local reflection symmetry in natural images
JP2019531783A5 (en)
Gur et al. Visualization of supervised and self-supervised neural networks via attribution guided factorization
Yang et al. 3d segmentation of glial cells using fully convolutional networks and k-terminal cut
Chen et al. Diagnose like a pathologist: Weakly-supervised pathologist-tree network for slide-level immunohistochemical scoring
Shafei et al. Segmentation of images with separating layers by fuzzy c-means and convex optimization
Ashour et al. Genetic algorithm-based initial contour optimization for skin lesion border detection
Xing et al. Robust selection-based sparse shape model for lung cancer image segmentation
WO2022113083A1 (en) Method and system for visualizing neural network output
Hacıefendioğlu et al. CAM-K: a novel framework for automated estimating pixel area using K-Means algorithm integrated with deep learning based-CAM visualization techniques
Rutter et al. Automated object tracing for biomedical image segmentation using a deep convolutional neural network
Sun et al. Perceptual multi-channel visual feature fusion for scene categorization
Liu et al. SSHMT: Semi-supervised hierarchical merge tree for electron microscopy image segmentation
Paiva et al. Fast semi-supervised image segmentation by novelty selection
Osokin et al. Perceptually inspired layout-aware losses for image segmentation
Nishida et al. Robust cell particle detection to dense regions and subjective training samples based on prediction of particle center using convolutional neural network
US20230260106A1 (en) Detecting robustness of machine learning models in clinical workflows
US20230073223A1 (en) Method for detecting anomalies in images using a plurality of machine learning programs
Ersin Yumer et al. Co-segmentation of textured 3D shapes with sparse annotations

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21897325

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21897325

Country of ref document: EP

Kind code of ref document: A1