US12124958B2 - Idempotence-constrained neural network - Google Patents
- Publication number
- US12124958B2 (application US16/748,871)
- Authority
- US
- United States
- Prior art keywords
- neural network
- idempotent
- matrix
- idempotence
- linear approximation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- the invention relates generally to a method for training of a neural network, and more specifically, to a method enforcing an idempotent-constrained characteristic during training of a neural network.
- the invention relates further to a system for enforcing an idempotent-constrained characteristic during training of a neural network, and a computer program product.
- neural networks have been proposed as universal approximators of functions. Their ability to learn rather than formalize a transformation from multiple input-output pairs, and biasing the definition of the function, makes them suitable for a variety of applications.
- a neural network fΘ with parameters Θ approximates a function f when it is trained with examples of input (x) and output (x′) signals. Due to the probabilistic approach and because the neural network training is subject to some training errors, there is no guarantee that fΘ(x) is exactly x′. There is also no guarantee that all information of interest in x will be completely preserved by fΘ(x).
- a computer-implemented method for enforcing an idempotent-constrained characteristic during training of a neural network may be provided.
- the method may comprise training of a neural network by minimizing a loss function.
- the loss function may comprise an additional term imposing an idempotence-based regularization to the neural network during the training.
- a system for enforcing an idempotent-constrained characteristic during training of a neural network may comprise a neural network trainable by minimizing a loss function, wherein the loss function comprises an additional term imposing an idempotence-based regularization to the neural network during the training.
- embodiments may take the form of a related computer program product, accessible from a computer-usable or computer-readable medium providing program code for use by, or in connection with, a computer or any instruction execution system.
- a computer-usable or computer-readable medium may be any apparatus that may contain means for storing, communicating, propagating or transporting the program for use by, or in connection with, the instruction execution system, apparatus, or device.
- Neural networks are frequently the state-of-the-art for inherently idempotent operations, such as image de-noising or signal convolution.
- most of the literature that proposes a neural network to perform such operations relies on minimizing a functional loss, implicitly assuming that, if the training of the network fΘ is good enough, there is no need to further impose idempotence. However, this assumption does not hold even for a good approximation of the function.
- FIG. 1 shows a block diagram of an embodiment of the inventive computer-implemented method for enforcing an idempotent-constrained characteristic during training of a neural network.
- FIG. 2 shows a block diagram of an embodiment of the training process of the neural network.
- FIG. 3 shows a block diagram of an embodiment of the novel method portion of a determination of the additional term.
- FIG. 4 shows an embodiment of the system for enforcing an idempotent-constrained characteristic during training of a neural network.
- FIG. 5 shows an embodiment of a computing system comprising the system according to FIG. 4 .
- the term ‘idempotence’ may denote the property of a transformation whereby the operation may be applied multiple times without changing the result beyond that of the initial application. Consequently, the term ‘idempotent-constrained’ may denote, in particular in the context of a training of a neural network, that input data may result in output data of the neural network which directly correspond to each other. This may, e.g., be used for a “de-noising” of sound input data or image input data. Hence, such a neural network may function as a “clarifier filter”.
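- The definition of idempotence can be checked numerically. The following is a minimal sketch, not taken from the patent: the helper `is_idempotent` and the two toy transformations (`project`, `damp`) are hypothetical names chosen for illustration. It also shows why a transformation that is close to the identity need not be idempotent.

```python
import numpy as np

def is_idempotent(f, x, tol=1e-9):
    """Return True if applying f twice gives the same result as applying it once."""
    once = f(x)
    twice = f(once)
    return bool(np.allclose(once, twice, atol=tol))

# Hypothetical example 1: orthogonal projection onto the first coordinate axis.
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])
project = lambda v: v @ P

# Hypothetical example 2: a mild damping, close to the identity but not idempotent.
damp = lambda v: 0.9 * v

x = np.array([[3.0, 4.0]])
print(is_idempotent(project, x))  # True
print(is_idempotent(damp, x))     # False
```

- The damping example loses 10% of the signal on every application, which is the kind of fading that an idempotence constraint is meant to prevent.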
- the term ‘neural network’, in particular artificial neural network (ANN), may denote a computing system that is inspired by, but not identical to, biological neural networks comprising artificial neurons. Such systems may be trained to perform tasks by considering examples, generally without being programmed with task-specific rules.
- a plurality of artificial neurons, i.e., nodes, may be organized in a plurality of network layers: an input layer, one or more (for deep NNs) hidden layer(s) and an output layer.
- often, neural networks are trained as classifier systems.
- here, however, the domain of the input data is identical or equivalent to the domain of the output data: sound data are transformed to sound data, image data are transformed to image data, etc.
- the different nodes of the different layers may have connections—often denoted as edges—from one layer to the next layer and may carry a weight factor characterizing a signal amplification or signal damping.
- a loss function result may be minimized by a back propagation process, feeding back the output signals of the output layer of the neural network in order to—step by step—adjust the weight factors of the nodes within the different layers of the neural network.
- the term ‘loss function’ may denote a function adapted for determining a difference between an input value and a desired output value of the function.
- the term ‘additional term’ may denote a further term added to the loss function which may ensure an idempotent characteristic of the transformation function of the neural network.
- the term ‘idempotence-based regularization’ may denote an adjustment of the behavior of the neural network such that its characteristic is trimmed to an idempotent characteristic of the neural network.
- if the NN is used as an instrument to remove noise from, e.g., an image, the idempotent behavior ensures that the original image will not fade away; only the noise will be reduced.
- the term ‘matrix of vectorized input data’ may denote a matrix comprising the vectors of input data, wherein each input vector may comprise one training example.
- Images as training data may themselves be described as a matrix.
- the rows or the columns of the matrix of the image may be strung together to build a single vector.
- a plurality of such strung-together vectors may build the matrix of vectorized input data.
- all training data may be put as a single batch in one matrix.
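- The vectorization just described can be sketched as follows. This is an illustrative assumption rather than the patent's implementation: the function name `vectorize_batch` is hypothetical, each image is flattened row by row, and one training example is stored per row of the resulting matrix X.

```python
import numpy as np

def vectorize_batch(images):
    """Flatten each image row by row and stack the vectors as rows of X.

    Assumption: `images` has shape (n_examples, height, width, ...).
    """
    images = np.asarray(images, dtype=float)
    n_examples = images.shape[0]
    return images.reshape(n_examples, -1)

# Two hypothetical 3x4 "images" filled with consecutive numbers.
batch = np.arange(2 * 3 * 4).reshape(2, 3, 4)
X = vectorize_batch(batch)
print(X.shape)  # (2, 12)
```

- The same reshape works unchanged for higher-dimensional tensors, e.g., color images with a trailing channel axis.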
- the term ‘Jacobian matrix’ may denote the matrix of all first-order partial derivatives of a vector-valued function. If this matrix is square, i.e., if the function takes the same number of variables as input as the number of vector components of its output, both the matrix and its determinant are referred to as the Jacobian.
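- As an illustration of the Jacobian matrix, the sketch below estimates it by central finite differences for a small toy function. The names `numerical_jacobian`, `toy_net` and the weight matrix `W` are hypothetical; in practice an automatic-differentiation framework would typically be used instead.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Estimate J[i, j] = d f_i / d x_j at x by central finite differences."""
    x = np.asarray(x, dtype=float)
    n_out = np.asarray(f(x)).size
    J = np.zeros((n_out, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2 * eps)
    return J

# Hypothetical one-layer "network" with a tanh activation.
W = np.array([[0.5, -1.0],
              [2.0, 0.3]])
toy_net = lambda v: np.tanh(W @ v)

x0 = np.array([0.1, -0.2])
J = numerical_jacobian(toy_net, x0)
print(J.shape)  # (2, 2): square, because input and output dimensions match
```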
- the term ‘spectral restriction’ may denote that a related matrix may only have eigenvalues within a predefined range.
- the term ‘scaling’ may denote a multiplication by a real value.
- the factor may be above 1, i.e., scaling up, or below 1, i.e., scaling down.
- the term ‘Singular Value Decomposition’ may denote a factorization of a real or complex matrix. It may denote the generalization of the eigendecomposition of a positive semidefinite normal matrix, e.g., a symmetric matrix with non-negative eigenvalues, to any m×n matrix via an extension of the polar decomposition.
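- A short sketch of the Singular Value Decomposition of a non-square matrix, using NumPy's `numpy.linalg.svd`. The matrix `A` is an arbitrary example chosen for illustration.

```python
import numpy as np

# Arbitrary 2x3 example matrix (m = 2, n = 3).
A = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 1.0]])

# Reduced SVD: A = U @ diag(s) @ Vt, with singular values s sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_rebuilt = U @ np.diag(s) @ Vt

print(U.shape, s.shape, Vt.shape)  # (2, 2) (2,) (2, 3)
print(np.allclose(A, A_rebuilt))   # True
```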
- the proposed computer-implemented method—and a related system—for enforcing an idempotent-constrained characteristic during training of a neural network may offer multiple advantages, contributions and technical effects:
- the proposed concept may allow restricting a training of a neural network to an idempotent behavior or characteristic at a global level—i.e., from the input layer to the output layer—of the neural network. This may imply that such an idempotent characteristic may not only be achievable from one layer of a neural network to another layer but across the complete neural network from the input layer to the output layer.
- the domain of the input data is identical or equivalent to the domain of the output data of the neural network.
- the proposed concept may be applied without any restrictions to any neural network. Thus, no additional constraints may have to be applied to the definition of the neural network.
- the proposed concept may be independent of the architecture of the neural network.
- the proposed concept is also independent of the dimensionality or type of input data.
- the method proposed here is also adapted to converge quickly, within few iterations, to a usable idempotent matrix, and it generalizes the concept of preserving information after successive applications.
- the additional term $\mathcal{L}_{P^*}(X)$ may be determined by:
- the matrix of input data may comprise a complete batch of input data. For example, if the input data comprise images, the pixel information of each image may be vectorized, i.e., the rows or columns of the image pixel matrix may be strung together one after another into a single vector. All such vectorized images may then build the matrix of input data. This principle may also be applicable to other higher-dimensional tensor data as input data.
- the domain of the input data vector is equivalent to the domain of the output vector.
- an image as input data remains an image as output data.
- the neural network is not used as classifier but more as a signal filter.
- the network parameters are the network hyper-parameters in a broad sense including, e.g., individual weights of the nodes, number of layers, connections between the nodes, etc., simply all parameters of the neural network.
- the method may also comprise determining the idempotent matrix P* by determining a linear approximation of the network function.
- the method may also comprise determining the linear approximation using an inverse or pseudo-inverse matrix of the input data, i.e., of the matrix X (the pseudo-inverse may be more flexible and is not constrained to a specific matrix form), or using the Jacobian matrix of the neural network function fΘ.
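- The pseudo-inverse variant of the linear approximation can be sketched as a least-squares fit. This is a hedged illustration under stated assumptions: one example is stored per row of X, the network is replaced by a hypothetical toy function `toy_network`, and `linear_approximation` is a name introduced here. The Moore-Penrose pseudo-inverse then yields the matrix P0 for which X @ P0 is closest to the network output in the Frobenius norm.

```python
import numpy as np

def linear_approximation(X, network):
    """Least-squares matrix P0 such that X @ P0 approximates network(X)."""
    return np.linalg.pinv(X) @ network(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))           # 50 hypothetical examples, 4 features each
W = rng.normal(size=(4, 4))            # hypothetical weights of the toy network
toy_network = lambda Z: np.tanh(Z @ W)

P0 = linear_approximation(X, toy_network)
residual = toy_network(X) - X @ P0

# For a least-squares solution the residual is orthogonal to the columns of X,
# so the following value is zero up to rounding error.
print(np.abs(X.T @ residual).max())
```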
- the method may also comprise determining whether the linear approximation complies with the spectral restrictions of idempotent matrices, or of matrices that can be approximated to idempotence, i.e., whether the eigenvalues of the matrix lie in a given range. For example, the eigenvalues may lie in the range of −0.5 to 1.5.
- the method may comprise: upon determining that the linear approximation does not comply with spectral restrictions of idempotent matrices, scaling singular values of a Singular Value Decomposition of the linear approximation. Thereby, a regularization step on the linear approximation may be performed.
- by ‘scaling’, an up-scaling or down-scaling may be meant, i.e., a multiplication by real values larger or smaller than 1.
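- The spectral check and the scaling of singular values might look as follows. The text does not state the exact scaling rule, so this sketch makes assumptions that are labeled in the code: the check uses the real parts of the eigenvalues and the example range of −0.5 to 1.5, and the scaling applies one common factor so that the largest singular value does not exceed a hypothetical `target` of 1. This bounds the eigenvalue magnitudes by 1, so the upper limit is met, while the lower limit is not guaranteed for every matrix.

```python
import numpy as np

def complies_with_spectral_restriction(P, low=-0.5, high=1.5):
    """Check that the real parts of all eigenvalues lie strictly inside (low, high)."""
    real_parts = np.linalg.eigvals(P).real
    return bool(np.all((real_parts > low) & (real_parts < high)))

def scale_singular_values(P, target=1.0):
    """Assumed rule: scale all singular values by one factor so the largest is <= target."""
    U, s, Vt = np.linalg.svd(P)
    if s.max() <= target:
        return P.copy()
    return U @ np.diag(s * (target / s.max())) @ Vt

# Hypothetical linear approximation with one eigenvalue outside the allowed range.
P = np.diag([3.0, 0.2])
P_scaled = scale_singular_values(P)

print(complies_with_spectral_restriction(P))         # False
print(complies_with_spectral_restriction(P_scaled))  # True
```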
- the stop condition may suppress an infinite loop.
- the procedure thus comprises two steps: (i) firstly, it is ensured that the approximation can be made idempotent (by checking and, if required, scaling its spectral properties), and (ii) secondly, an iterative process actually achieves idempotence in a few iterations, starting from the point where step (i) left off.
- fΘ is a non-linear neural network function, and P0 is its linear approximation after the scaling, i.e., compliant with the spectral restrictions; if required, further iterations P1, P2, . . . , P* are performed until P* emerges as the linear approximation of fΘ that is also idempotent.
- the method may also comprise: upon determining that the linear approximation complies with spectral restrictions and idempotence—i.e., meeting a stop condition, i.e., an error below a threshold value—feeding the additional term back to a back-propagation step of the training of the neural network.
- the additional term may be used in the normal back-propagation step of the regular training of the NN.
- the idempotent regularization may be a Procrustes-based regularization, i.e., a regularization based on the principles of Procrustes analysis, i.e., finding an optimal rotation and/or reflection (i.e., the optimal orthogonal linear transformation) for a Procrustes superimposition of one object onto another. This will lay the basis for the idempotence characteristic.
- the domain of an input vector to the neural network may be equivalent to a domain of the output vector of the neural network.
- the input and the output data may relate to image data, sound data, etc.
- the neural network may be used as a filter instead of a classifier.
- This last term may impose a Procrustes-based regularization at global scale to the neural network without any restriction on the network architecture, making the approach extremely versatile.
- $\mathcal{L}_{P^*}$ poses the problem of solving, during each loss evaluation, an optimization problem with no analytical solution. Since the loss needs to be determined at each mini-batch step, it is mandatory to find an efficient way to approximate P* via an iterative method.
- the non-idempotent matrix P0 may be reduced to a close idempotent one using a rapidly converging sequence. Thereby, one can reduce the matrix $P_0^2 - P_0$ to 0 by minimizing the semi-definite scalar $\operatorname{tr}((P_0^2 - P_0)^2)$ by means of its derivative.
- a candidate P0′ should minimize $\lVert f_\Theta(X) - X P_0' \rVert_F^2$, and it has to be fast to determine.
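- One well-known iteration of this kind is McWeeny purification, P ← 3P² − 2P³. It is used here as a stand-in and is an assumption: the extracted text does not reproduce the patent's own sequence. For a symmetric P, the gradient of tr((P² − P)²) is 2(2P − I)(P² − P), and a gradient step of size 1/2 gives exactly this update. The function name `purify`, the tolerance, and the example matrix are hypothetical.

```python
import numpy as np

def purify(P0, tol=1e-10, max_iter=50):
    """McWeeny purification (assumed stand-in): iterate P <- 3 P^2 - 2 P^3."""
    P = P0.copy()
    for _ in range(max_iter):
        P2 = P @ P
        if np.linalg.norm(P2 - P) < tol:  # stop condition: P is idempotent within tol
            break
        P = 3.0 * P2 - 2.0 * (P2 @ P)
    return P

# Hypothetical symmetric start matrix with eigenvalues 0.9, 0.8 and 0.1.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
P0 = Q @ np.diag([0.9, 0.8, 0.1]) @ Q.T

P_star = purify(P0)
print(np.linalg.norm(P_star @ P_star - P_star))  # near zero after a few iterations
```

- For a symmetric start matrix with eigenvalues between 0 and 1, eigenvalues below 0.5 are driven to 0 and those above 0.5 to 1. The scalar map x → 3x² − 2x³ sends −0.5 to 1 and 1.5 to 0, which is consistent with the example range of −0.5 to 1.5 mentioned for the spectral restrictions.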
- FIG. 1 shows a block diagram of a preferred embodiment of the computer-implemented method 100 for enforcing an idempotent-constrained characteristic during training of a neural network.
- This characteristic is independent from the network architecture, i.e., independent from the network structure and the dimensionality of the input data. This characteristic is in particular valid on a global level of the neural network, i.e., across all layers and not for one or a few layers only.
- the method 100 comprises training, 102 , of a neural network by minimizing a loss function.
- This can be a standard loss function, e.g., a mean-squared-error function, a mean-absolute-error function, a binary cross-entropy function, and the like.
- the loss function comprises also using, 104 , an additional term imposing an idempotence-based regularization to the neural network during the training. It may again be emphasized that the idempotence-based regularization does not apply to only parts of the layers of the neural network but the complete network.
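- How a standard loss and the additional idempotence term may be combined is sketched below. This is an illustration under assumptions: the base loss is a mean squared error, the additional term is the Frobenius norm of fΘ(X) − XP*, and the weighting factor `weight` is hypothetical, since the text does not state how the two terms are balanced.

```python
import numpy as np

def idempotence_term(X, X_out, P_star):
    """Frobenius norm of f(X) - X @ P*, where X_out holds the network output f(X)."""
    return np.linalg.norm(X_out - X @ P_star)  # Frobenius norm by default for 2-D input

def total_loss(X_out, X_target, X, P_star, weight=0.1):
    """Mean squared error plus the weighted idempotence term (weight is an assumption)."""
    base = np.mean((X_out - X_target) ** 2)
    return base + weight * idempotence_term(X, X_out, P_star)

# Hypothetical batch with one example per row, and an idempotent projection P*.
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
P_star = np.diag([1.0, 0.0])
clean = X @ P_star   # an output that matches X @ P* exactly
noisy = clean + 0.5  # an output that deviates from it

print(total_loss(clean, clean, X, P_star))
print(total_loss(noisy, clean, X, P_star))
```

- The first value is zero because both terms vanish; the second is larger because the output deviates both from the target and from the idempotent linear map.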
- FIG. 2 shows a block diagram 200 of an embodiment of the training process of the neural network (not shown).
- the input data are loaded, 204 , from a storage of training data 206 .
- the batch of input data is transferred to the novel method portion 218 —relating to the additional term—which in turn delivers the result of the additional term determination as additional input to the back propagation process 212 of the main process flow. If a determined stop condition is not reached—reference numeral 214 (case “no”)—the flow goes back to the start of the process; otherwise (case of “yes”), the process stops (reference numeral 216 ).
- FIG. 3 shows a block diagram of an embodiment of the novel method portion 218 of a determination of the additional term.
- This novel method portion 218 describes an efficient partial method to approximate P*, which is initially unknown.
- Input for this method portion 218 comes from the main process described in FIG. 2 from the process steps “load batch” 204 and “neural network” 208 .
- the result of the determination in the method portion 218 is delivered back to this step “back propagate” 212 of the main process flow according to FIG. 2 .
- all parameters of the neural network are used.
- a linear approximation to the network function f ⁇ is obtained, 302 .
- two ways are possible: using an inverse or pseudo-inverse of the input data, or using the Jacobian of the network function as linear approximation.
- the process proceeds to a determination 310 of whether the matrix is already idempotent (stop condition met, 310 ). If that is the case (case “yes”), the idempotent loss is determined, 312 , and fed back to the main flow (compare FIG. 2 , 212 ).
- a regularization step 306 is performed by scaling the singular values of the singular value decomposition of the linear approximation.
- an iterative partial method 308 quickly converges to an idempotent matrix (as shown in the above-mentioned theoretical section of this document). This iterative process is stopped when the stop condition is met, 310 , e.g., if the matrix is fully idempotent, i.e., if the error after sequential application of the matrix is below a given threshold value.
- FIG. 4 shows a block diagram of an embodiment of the system 400 for enforcing an idempotent-constrained characteristic during training of the neural network.
- the system 400 comprises a neural network system 402 trainable by minimizing a loss function.
- the loss function comprises an additional term imposing an idempotence-based regularization to the neural network during the training.
- a first determination unit 404 for determining the idempotent matrix P* by determining a linear approximation of the network function
- a second determination unit 406 for determining the idempotent matrix P* by determining a linear approximation of the neural network function
- a third determination unit 408 for determining the linear approximation using an inverse or pseudo-inverse matrix of the input data or using the Jacobian matrix of the neural network function f ⁇
- a fourth determination unit for determining if the linear approximation complies with spectral restrictions of idempotent matrices, wherein the fourth determination unit is also adapted for triggering, upon determining that the linear approximation does not comply with spectral restrictions of idempotent matrices, means for scaling, in particular a scaling unit 410 adapted for scaling singular values of a Singular Value Decomposition of the linear approximation, thereby performing a regularization step on the linear approximation.
- Embodiments of the invention may be implemented together with virtually any type of computer, regardless of the platform being suitable for storing and/or executing program code.
- FIG. 5 shows, as an example, a computing system 500 suitable for executing program code related to the proposed method.
- the computing system 500 is only one example of a suitable computer system, and is not intended to suggest any limitation as to the scope of use or functionality of embodiments of the invention described herein. Regardless, the computer system 500 is capable of being implemented and/or performing any of the functionality set forth hereinabove.
- in the computer system 500 , there are components which are operational with numerous other general purpose or special purpose computing system environments or configurations.
- Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with computer system/server 500 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.
- Computer system/server 500 may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system 500 .
- program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types.
- Computer system/server 500 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network.
- program modules may be located in both local and remote computer system storage media, including memory storage devices.
- computer system/server 500 is shown in the form of a general-purpose computing device.
- the components of computer system/server 500 may include, but are not limited to, one or more processors or processing units 502 , a system memory 504 , and a bus 506 that couples various system components including system memory 504 to the processor 502 .
- Bus 506 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
- Computer system/server 500 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 500 , and includes both volatile and non-volatile media as well as removable and non-removable media.
- the system memory 504 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 508 and/or cache memory 510 .
- Computer system/server 500 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
- a storage system 512 may be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a ‘hard drive’).
- a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a ‘floppy disk’), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media may be provided.
- each can be connected to bus 506 by one or more data media interfaces.
- memory 504 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
- the program/utility having a set (at least one) of program modules 516 , may be stored in memory 504 by way of example, and not limiting, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating systems, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment.
- Program modules 516 generally carry out the functions and/or methodologies of embodiments of the invention, as described herein.
- the computer system/server 500 may also communicate with one or more external devices 518 such as a keyboard, a pointing device, a display 520 , etc.; one or more devices that enable a user to interact with computer system/server 500 ; and/or any devices (e.g., network card, modem, etc.) that enable computer system/server 500 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 514 . Still yet, computer system/server 500 may communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 522 .
- network adapter 522 may communicate with the other components of the computer system/server 500 via bus 506 .
- It should be understood that, although not shown, other hardware and/or software components could be used in conjunction with computer system/server 500 . Examples include (but are not limited to): microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.
- a system 400 for enforcing an idempotent-constrained characteristic during training of a neural network may be attached to the bus system 506 .
- the present invention may be embodied as a system, a method, and/or a computer program product.
- the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
- the medium may be an electronic, magnetic, optical, electromagnetic, infrared or a semi-conductor system for a propagation medium.
- Examples of a computer-readable medium may include a semi-conductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk.
- Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W), DVD, and other digital optical disk formats.
- the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
- the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disk read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
- a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
- the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
- a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
- Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
- the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
- These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatuses, or another device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatuses, or another device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures.
- two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Algebra (AREA)
- Geometry (AREA)
- Computer Hardware Design (AREA)
- Character Discrimination (AREA)
- Image Analysis (AREA)
Abstract
Description
- $\mathcal{L}_\Theta^{P^*}(X) = \lVert X' - XP^* \rVert = \lVert f_\Theta(X) - XP^* \rVert$, wherein
- X = a matrix of vectorized input data for the neural network,
- X′ = a matrix of vectorized output data of the neural network,
- fΘ = the neural network function with network parameters Θ,
- P* = an idempotent matrix that maps X to fΘ(X) with a close (in particular, the closest) approximation that may constrain the mapping to idempotence, and
- ∥ · ∥ is the norm. This may be, e.g., the square norm or the Frobenius norm. The Frobenius norm, or Schur norm, is a matrix norm based on the Euclidean norm.
$P_{i+1} = P_i^2\,(3 - 2P_i)$
until a stop condition is met. The stop condition may suppress an infinite loop.
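The iteration and its stop condition can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the function name `purify`, the tolerance `tol`, the iteration cap `max_iter`, and the 2×2 example matrix are all assumptions introduced here. The scalar 3 in the update is read as three times the identity matrix.

```python
import numpy as np

def purify(P0, tol=1e-10, max_iter=100):
    """Apply P <- P^2 (3I - 2P) until P is idempotent within `tol`.

    `tol` and `max_iter` are illustrative stop-condition parameters,
    not values taken from the text.
    """
    P = np.array(P0, dtype=float)
    identity = np.eye(P.shape[0])
    n_iter = 0
    # Stop condition 2: the iteration cap rules out an infinite loop.
    while n_iter < max_iter:
        # Stop condition 1: P^2 is close enough to P.
        if np.linalg.norm(P @ P - P, ord="fro") < tol:
            break
        P = P @ P @ (3.0 * identity - 2.0 * P)
        n_iter += 1
    return P, n_iter

# Hypothetical starting matrix: symmetric, close to a rank-1 projector.
P0 = np.array([[0.9, 0.1],
               [0.1, 0.1]])
P_inf, steps = purify(P0)
```

For this starting matrix both eigenvalues lie between 0 and 1, so the iterate settles on a rank-1 idempotent matrix well before the cap is reached.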
$\mathcal{L}_\Theta^{F}(X, X') = \lVert f_\Theta(X) - X' \rVert_F^2$
Although this approach may find an accurate approximation of the function fΘ, it does not guarantee the idempotence of the transformation fΘ.
$\mathcal{L}_\Theta'(X') = \lVert f_\Theta(X') - X' \rVert_F^2$
This term may impose a strong requirement of learning an identity mapping when the input is the target. While this constraint, at the optimum, may guarantee the idempotence of the transformation fΘ, it might hinder the ability to learn a complex transformation.
$\mathcal{L}_\Theta^{P+}(X') = \lVert f_\Theta(X') - XP^* \rVert_F^2$.
Thereby, P* may represent an idempotent matrix that maps X to fΘ(X′) as accurately as possible:
$P^* = \operatorname{argmin}\,\{P \in \mathbb{R}^{p \times p} \mid P^2 = PP = P\}$ (Eq. A)
$P \in I(p)$
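The three loss terms written above can be compared on a toy example. The sketch below is only an illustration under stated assumptions: the function names, the stand-in "networks" `f_proj` and `f_half`, and the hand-picked diagonal projector `P_star` are hypothetical, and `P_star` is chosen by hand rather than obtained by solving Eq. A. Each loss follows the corresponding expression as written in the text.

```python
import numpy as np

def frobenius_sq(A):
    """Squared Frobenius norm of a matrix."""
    return float(np.sum(np.square(A)))

def reconstruction_loss(f, X, X_target):
    # || f(X) - X' ||_F^2
    return frobenius_sq(f(X) - X_target)

def identity_loss(f, X_target):
    # || f(X') - X' ||_F^2
    return frobenius_sq(f(X_target) - X_target)

def projection_loss(f, X, X_target, P_star):
    # || f(X') - X P* ||_F^2
    return frobenius_sq(f(X_target) - X @ P_star)

rng = np.random.default_rng(0)
P_star = np.diag([1.0, 1.0, 0.0])   # hand-picked idempotent matrix
X = rng.standard_normal((5, 3))     # toy inputs
X_target = X @ P_star               # toy targets in the range of P_star

def f_proj(A):
    """Idempotent stand-in for the network: f(f(A)) == f(A)."""
    return A @ P_star

def f_half(A):
    """Non-idempotent stand-in: applying it twice shrinks A again."""
    return 0.5 * A
```

With the idempotent stand-in all three terms vanish on this data, whereas `f_half` leaves a positive identity term, which is the behavior the identity constraint is meant to penalize.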
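$P_{n+1} = P_n^2\,(3 - 2P_n)$, (Eq. C)
$\operatorname{tr}\big((P_\infty - P_0)^2\big) \ll \operatorname{tr}(P_\infty) < p$.
$-\tfrac{1}{2} \le \lambda_i(P_0) \le \tfrac{3}{2} \quad \forall i = 1, \ldots, p$
$0 \le \lambda_i(P_\infty) \le 1 \quad \forall i = 1, \ldots, p$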
Proof. One may start by defining:
$B_P(a, b) = \{a \le \lambda_i(P) \le b \quad \forall i = 1, \ldots, p\}$
$P_1 = P_0^2\,(3 - 2P_0)$
$B_{P_0}(l_b, u_b) \Rightarrow B_{P_1}(0, 1)$
$B_{P_n}(0, 1) \subset B_{P_n}(l_b, u_b)$
one can trivially prove by induction that:
$B_{P_0}(l_b, u_b) \Rightarrow B_{P_\infty}(0, 1)$
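The first implication in this proof can be checked on scalars. The short derivation below is a sketch that assumes P0 is diagonalizable with real eigenvalues (for instance, symmetric), so that one update step acts on every eigenvalue through the scalar map g, and that (l_b, u_b) = (−1/2, 3/2) as in the eigenvalue bound stated earlier. Both assumptions are made here for illustration.

```latex
\begin{align*}
g(\lambda) &= \lambda^2 (3 - 2\lambda) = 3\lambda^2 - 2\lambda^3, &
g'(\lambda) &= 6\lambda (1 - \lambda), \\
g\!\left(-\tfrac{1}{2}\right) &= \tfrac{3}{4} + \tfrac{1}{4} = 1, &
g(0) &= 0, \\
g(1) &= 1, &
g\!\left(\tfrac{3}{2}\right) &= \tfrac{27}{4} - \tfrac{27}{4} = 0.
\end{align*}
```

Because g′ changes sign only at 0 and 1, g is monotone on each of the intervals [−1/2, 0], [0, 1] and [1, 3/2], and the endpoint values above show that every eigenvalue in [−1/2, 3/2] is mapped into [0, 1]. The points 0 and 1 are fixed points with g′ = 0, which is consistent with the eigenvalues of the limit being attracted to 0 or 1.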
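$P_0 = U \Sigma_b V^*$, (Eq. D)
wherein $[\Sigma_b]_{ij} = [\Sigma]_{ij} / \big(2\sigma_1(P_0)\big)$ and $P_0 = U \Sigma V^*$.
$|\lambda_i(P_0)| \le \tfrac{1}{2} \quad \forall i = 1, \ldots, p$.
$\lVert Pv \rVert = \lVert U \Sigma V^* v \rVert = \lVert \Sigma V^* v \rVert \le \sigma_1(P) \lVert V^* v \rVert = \sigma_1(P) \lVert v \rVert$.
$|\lambda_i(P)|\,\lVert v \rVert \le \sigma_1(P) \lVert v \rVert \Rightarrow |\lambda_i(P)| \le \sigma_1(P) \quad \forall i = 1, \ldots, p$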
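The rescaling in Eq. D can be sketched numerically. This is a hedged NumPy illustration: the function name `scale_initial_matrix`, the random 4×4 test matrix, and the zero-matrix guard are assumptions added here, not part of the source.

```python
import numpy as np

def scale_initial_matrix(P0):
    """Divide the singular values of P0 by twice the largest one (Eq. D)."""
    U, s, Vh = np.linalg.svd(P0)      # P0 = U diag(s) Vh, s sorted descending
    if s[0] == 0.0:
        return np.zeros_like(P0)      # zero matrix: nothing to rescale
    s_b = s / (2.0 * s[0])            # largest singular value becomes 1/2
    return U @ np.diag(s_b) @ Vh

rng = np.random.default_rng(0)
P0_raw = rng.standard_normal((4, 4))  # hypothetical unscaled starting matrix
P0_scaled = scale_initial_matrix(P0_raw)

largest_sv = np.linalg.svd(P0_scaled, compute_uv=False)[0]
eig_moduli = np.abs(np.linalg.eigvals(P0_scaled))
```

Since every eigenvalue modulus is bounded by the largest singular value, `eig_moduli` stays at or below 1/2 after scaling, which places the starting matrix inside the range required by the eigenvalue bound.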
- A random batch is taken as a target $X' \in \mathbb{R}^{m \times p}$,
- A Gaussian perturbation is added to X′ to generate noisy examples $X \in \mathbb{R}^{m \times p}$,
- A trained network is further assumed and is simulated by adding a fraction of a Gaussian perturbation of the same magnitude as the one used to generate X. The resulting leaky denoiser is fΘ(X).
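The synthetic setup listed above can be sketched as follows. The details are assumptions made for illustration: the target batch is drawn from a standard normal distribution, the simulated network output is the target plus a fresh Gaussian draw scaled by a hypothetical `leak` fraction, and the names `make_leaky_denoiser_batch`, `sigma`, `leak`, and the batch dimensions are not taken from the source.

```python
import numpy as np

def make_leaky_denoiser_batch(m=64, p=8, sigma=0.1, leak=0.2, seed=0):
    """Return (X_target, X_noisy, f_X) for a simulated leaky denoiser.

    Assumptions: Gaussian targets, noise standard deviation `sigma`, and a
    denoiser that leaves a fraction `leak` of fresh noise on the target.
    """
    rng = np.random.default_rng(seed)
    X_target = rng.standard_normal((m, p))                        # target X'
    X_noisy = X_target + sigma * rng.standard_normal((m, p))      # noisy input X
    f_X = X_target + leak * sigma * rng.standard_normal((m, p))   # simulated f(X)
    return X_target, X_noisy, f_X

X_target, X_noisy, f_X = make_leaky_denoiser_batch()
input_error = np.linalg.norm(X_noisy - X_target)
output_error = np.linalg.norm(f_X - X_target)
```

Under these assumptions `output_error` is roughly `leak` times `input_error`, which mimics a network that removes most, but not all, of the perturbation.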
Claims (15)
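$\mathcal{L}_\Theta^{P+}(X') = \lVert X' - XP^* \rVert = \lVert f_\Theta(X) - XP^* \rVert$, wherein
$\mathcal{L}_\Theta^{P^*}(X) = \lVert X' - XP^* \rVert = \lVert f_\Theta(X) - XP^* \rVert$, wherein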
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/748,871 US12124958B2 (en) | 2020-01-22 | 2020-01-22 | Idempotence-constrained neural network |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/748,871 US12124958B2 (en) | 2020-01-22 | 2020-01-22 | Idempotence-constrained neural network |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20210224656A1 (en) | 2021-07-22 |
| US12124958B2 (en) | 2024-10-22 |
Family
ID=76857918
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/748,871 Active 2043-01-10 US12124958B2 (en) | 2020-01-22 | 2020-01-22 | Idempotence-constrained neural network |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US12124958B2 (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12288160B2 (en) * | 2021-06-16 | 2025-04-29 | International Business Machines Corporation | Transfer learning with basis scaling and pruning |
| CN118586443A (en) * | 2024-08-08 | 2024-09-03 | 中国人民解放军国防科技大学 | A distributed model aggregation method and system for preventing dimensionality collapse |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20140058991A1 (en) | 2012-08-27 | 2014-02-27 | Georges Harik | Method for improving efficiency in an optimizing predictive model using stochastic gradient descent |
| US20140222739A1 (en) * | 2011-09-21 | 2014-08-07 | Brain Corporation | Apparatus and methods for gating analog and spiking signals in artificial neural networks |
| US20160071526A1 (en) | 2014-09-09 | 2016-03-10 | Analog Devices, Inc. | Acoustic source tracking and selection |
| US20170132512A1 (en) | 2015-11-06 | 2017-05-11 | Google Inc. | Regularizing machine learning models |
| US20170228433A1 (en) * | 2016-02-04 | 2017-08-10 | Microsoft Technology Licensing, Llc | Method and system for diverse set recommendations |
| US20190197670A1 (en) | 2017-12-27 | 2019-06-27 | Facebook, Inc. | Automatic Image Correction Using Machine Learning |
| US20190272309A1 (en) * | 2018-03-05 | 2019-09-05 | Electronics And Telecommunications Research Institute | Apparatus and method for linearly approximating deep neural network model |
Non-Patent Citations (11)
| Title |
|---|
| Bunne et al., "Learning Generative Models Across Incomparable Spaces", Proceedings of the 36th International Conference on Machine Learning, PMLR 97, May 15, 2019, 16 pages. |
| Burger et al., "Image denoising: Can plain Neural Networks compete with BM3D?", 2012 IEEE, pp. 2392-2399. |
| Haynes, "Linear-scaling methods in ab initio quantum-mechanical calculations", A dissertation submitted for the degree of Doctor of Philosophy at the University of Cambridge, Jul. 1998, 169 pages. |
| He et al., "Deep Residual Learning for Image Recognition", arXiv:1512.03385v1, Dec. 10, 2015, 12 pages. |
| Lecun et al., "The MNIST Database of handwritten digits", printed Jan. 7, 2020, 8 pages, http://yann.lecun.com/exdb/mnist/. |
| Moens, "Musical Style Transfer with Generative Neural Network Models," master's dissertation, Ghent U. (2019). (Year: 2019). * |
| Ronneberger et al., "U-Net: Convolutional Networks for Biomedical Image Segmentation", arXiv:1505.04597v1, May 18, 2015, 8 pages. |
| Wang et al., "Dilated Deep Residual Network for Image Denoising", 2017 International Conference on Tools with Artificial Intelligence, pp. 1272-1279. |
| Wang et al., "Orthogonal and Idempotent Transformations for Learning Deep Neural Networks", arXiv:1707.05974v1, Jul. 19, 2017, 10 pages. |
| Wikipedia, Idempotence, archive captured Dec. 21, 2019, http://web.archive.org/web/20191221060706/https://en.wikipedia.org/wiki/Idempotence. (Year: 2019). * |
| Xu et al., "Deep Convolutional Neural Network for Image Deconvolution", NIPS'14: Proceedings of the 27th International Conference on Neural Information Processing Systems—vol. 1, Dec. 2014, pp. 1790-1798. |
Also Published As
| Publication number | Publication date |
|---|---|
| US20210224656A1 (en) | 2021-07-22 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FONCUBIERTA RODRIGUEZ, ANTONIO;MANICA, MATTEO;CADOW, JORIS;REEL/FRAME:051578/0827 Effective date: 20200121 |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION COUNTED, NOT YET MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | ZAAB | Notice of allowance mailed | Free format text: ORIGINAL CODE: MN/=. |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |