WO2019217835A1  Training of photonic neural networks through in situ backpropagation  Google Patents
 Publication number
 WO2019217835A1 (PCT/US2019/031747)
 Authority
 WO
 WIPO (PCT)
 Prior art keywords
 set
 input
 photonic
 oius
 ann
 Prior art date
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F1/00—Details not covered by groups G06F3/00 – G06F13/00 and G06F21/00

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
 G06N3/00—Computer systems based on biological models
 G06N3/02—Computer systems based on biological models using neural network models

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
 G06N3/00—Computer systems based on biological models
 G06N3/02—Computer systems based on biological models using neural network models
 G06N3/08—Learning methods

 H—ELECTRICITY
 H03—BASIC ELECTRONIC CIRCUITRY
 H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
 H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes

 H—ELECTRICITY
 H03—BASIC ELECTRONIC CIRCUITRY
 H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
 H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
 H03M13/03—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
 H03M13/05—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
 H03M13/11—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits

 H—ELECTRICITY
 H03—BASIC ELECTRONIC CIRCUITRY
 H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
 H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
 H03M13/03—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
 H03M13/05—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
 H03M13/13—Linear codes

 G—PHYSICS
 G02—OPTICS
 G02F—DEVICES OR ARRANGEMENTS, THE OPTICAL OPERATION OF WHICH IS MODIFIED BY CHANGING THE OPTICAL PROPERTIES OF THE MEDIUM OF THE DEVICES OR ARRANGEMENTS FOR THE CONTROL OF THE INTENSITY, COLOUR, PHASE, POLARISATION OR DIRECTION OF LIGHT, e.g. SWITCHING, GATING, MODULATING OR DEMODULATING; TECHNIQUES OR PROCEDURES FOR THE OPERATION THEREOF; FREQUENCYCHANGING; NONLINEAR OPTICS; OPTICAL LOGIC ELEMENTS; OPTICAL ANALOGUE/DIGITAL CONVERTERS
 G02F1/00—Devices or arrangements for the control of the intensity, colour, phase, polarisation or direction of light arriving from an independent light source, e.g. switching, gating, or modulating; Nonlinear optics
 G02F1/01—Devices or arrangements for the control of the intensity, colour, phase, polarisation or direction of light arriving from an independent light source, e.g. switching, gating, or modulating; Nonlinear optics for the control of the intensity, phase, polarisation or colour
 G02F1/21—Devices or arrangements for the control of the intensity, colour, phase, polarisation or direction of light arriving from an independent light source, e.g. switching, gating, or modulating; Nonlinear optics for the control of the intensity, phase, polarisation or colour by interference
Abstract
Description
Training of Photonic Neural Networks Through in situ Backpropagation
STATEMENT OF FEDERALLY SPONSORED RESEARCH
[0001] This invention was made with Government support under contract FA95501710002 awarded by the Air Force Office of Scientific Research. The Government has certain rights in the invention.
CROSS-REFERENCE TO RELATED APPLICATIONS
[0002] The present application claims the benefit of and priority to U.S. Provisional Patent Application No. 62/669,899 entitled "Training of Photonic Neural Networks Through in Situ Backpropagation", filed May 10, 2018, and to U.S. Provisional Patent Application No. 62/783,992 entitled "Training of Photonic Neural Networks Through in Situ Backpropagation", filed December 21, 2018. The disclosures of U.S. Provisional Patent Application Serial Nos. 62/669,899 and 62/783,992 are herein incorporated by reference in their entirety.
FIELD OF THE INVENTION
[0003] The present invention generally relates to photonic neural networks and more specifically relates to training of photonic neural networks through in situ backpropagation.
BACKGROUND
[0004] Recently, integrated optics has gained interest as a hardware platform for implementing machine learning algorithms, including artificial neural networks (ANNs), which rely heavily on matrix-vector multiplications that may be done efficiently in photonic circuits. Artificial neural networks, and machine learning in general, are becoming ubiquitous for an impressively large number of applications. This has brought ANNs into the focus of research in not only computer science, but also electrical engineering, with hardware specifically suited to perform neural network operations actively being developed. There are significant efforts in constructing artificial neural network architectures using various electronic solid-state platforms, but ever since the conception of ANNs, a hardware implementation using optical signals has also been considered. Photonic implementations benefit from the fact that, due to the non-interacting nature of photons, linear operations, like the repeated matrix multiplications found in every neural network algorithm, can be performed in parallel, and at a lower energy cost, when using light as opposed to electrons.
[0005] Many implementations of photonic neural networks are trained using a model of the system simulated on a regular computer, but this can be inefficient for two reasons. First, this strategy depends entirely on the accuracy of the model representation of the physical system. Second, unless one is interested in deploying a large number of identical, fixed copies of the ANN, any advantage in speed or energy associated with using the photonic circuit is lost if the training must be done on a regular computer. Alternatively, training using a brute force, in situ computation of the gradient of the objective function has been proposed. However, this strategy involves sequentially perturbing each individual parameter of the circuit, which is highly inefficient for large systems.
SUMMARY OF THE INVENTION
[0006] Systems and methods for training photonic neural networks in accordance with embodiments of the invention are illustrated. One embodiment includes a method for training a set of one or more optical interference units (OIUs) of a photonic artificial neural network (ANN), wherein the method includes calculating a loss for an original input to the photonic ANN, computing an adjoint input based on the calculated loss, measuring intensities for a set of one or more phase shifters in the set of OIUs when the computed adjoint input and the original input are interfered with each other within the set of OIUs, computing a gradient from the measured intensities, and tuning phase shifters of the OIU based on the computed gradient.
[0007] In a further embodiment, computing the adjoint input includes sending the calculated loss through output ports of the ANN.
[0008] In still another embodiment, the intensity for each phase shifter of the set of phase shifters is measured after the phase shifter.
[0009] In a still further embodiment, the set of OIUs includes a mesh of controllable Mach-Zehnder interferometers (MZIs) integrated in a silicon photonic circuit.
[0010] In yet another embodiment, computing the adjoint input includes sending the calculated loss through output ports of the ANN.
[0011] In a yet further embodiment, the loss is the result of a mean squared cost function.
[0012] In another additional embodiment, the ANN is a feedforward ANN.
[0013] In a further additional embodiment, the ANN is a recurrent neural network (RNN).
[0014] In another embodiment again, the ANN further includes a set of one or more activation units, wherein each OIU performs a linear operation and each activation unit performs a nonlinear function on the input.
[0015] In a further embodiment again, computing the adjoint input includes linearizing at least one activation unit of the set of activation units prior to sending an input through the ANN.
[0016] In still yet another embodiment, the method further includes steps for performing dropout during training by shutting off channels in the activation units.
[0017] In a still yet further embodiment, nonlinear functions of the ANN are performed using an electronic circuit.
[0018] In still another additional embodiment, each OIU of the set of OIUs includes a same number of input ports and output ports.
[0019] Additional embodiments and features are set forth in part in the description that follows, and in part will become apparent to those skilled in the art upon examination of the specification or may be learned by the practice of the invention. A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings, which form a part of this disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] The description and claims will be more fully understood with reference to the following figures and data graphs, which are presented as exemplary embodiments of the invention and should not be construed as a complete recitation of the scope of the invention.
[0021] Figure 1 illustrates an example of a schematic for an artificial neural network (ANN).
[0022] Figure 2 illustrates an example of operations in an ANN.
[0023] Figure 3 illustrates a schematic of a process for experimental measurement of gradient information.
[0024] Figure 4 conceptually illustrates a process for experimental measurement of gradient information.
[0025] Figure 5 illustrates a numerical demonstration of a time-reversal procedure.
[0026] Figure 6 illustrates how a time-reversal interference technique could be performed for a layer embedded in a network.
[0027] Figure 7 conceptually illustrates a process for using a time-reversal interference method to measure sensitivities.
[0028] Figure 8 illustrates how a time-reversal interference technique can be applied without internal coherent detection and preparation.
[0029] Figure 9 illustrates a schematic of a recurrent neural network (RNN).
DETAILED DESCRIPTION
[0030] Turning now to the drawings, systems and methods in accordance with certain embodiments of the invention can be used to train photonic neural networks. In some embodiments, methods can compute the gradient of the cost function of a photonic artificial neural network (ANN) by use of only in situ intensity measurements. Processes in accordance with several embodiments of the invention physically implement the adjoint variable method (AVM). Furthermore, methods in accordance with a number of embodiments of the invention scale in constant time with respect to the number of parameters, which allows for backpropagation to be efficiently implemented in a hybrid opto-electronic network. Although many of the examples described herein are described with reference to a particular hardware implementation of a photonic ANN, one skilled in the art will recognize that methods and systems can be readily applied to other photonic platforms without departing from the heart of the invention.
[0031] Currently, there is no efficient protocol for the training of photonic neural networks, which is a crucial step for any machine learning application, and should ideally be performed on the same platform. Methods in accordance with a number of embodiments of the invention enable highly efficient, in situ training of a photonic neural network. In a variety of embodiments, adjoint methods may be used to derive the photonic analogue of the backpropagation algorithm, which is the standard method for computing gradients of conventional neural networks. Gradients in accordance with a number of embodiments of the invention may be obtained exactly by performing intensity measurements within the device.
[0032] Training protocols in accordance with many embodiments of the invention can greatly simplify the implementation of backpropagation. Beyond the training of photonic machine learning implementations, methods in accordance with some embodiments of the invention may also be of broader interest to experimental sensitivity analysis of photonic systems and the optimization of reconfigurable optics platforms, among other applications.
Photonic Neural Networks
[0033] In its most general case, a feed-forward ANN maps an input vector to an output vector via an alternating sequence of linear operations and elementwise nonlinear functions of the vectors, also called 'activations'. A cost function, \mathcal{L}, is defined over the outputs of the ANN and the matrix elements involved in the linear operations are tuned to minimize \mathcal{L} over a number of training examples via gradient-based optimization. The 'backpropagation algorithm' is typically used to compute these gradients analytically by sequentially utilizing the chain rule from the output layer backwards to the input layer.
[0034] A photonic hardware platform implemented in accordance with certain embodiments of the invention is illustrated in Figure 1. The boxed regions correspond to optical interference units (OIUs) 105 that perform a linear operation represented by the matrix \hat{W}_{l}. Each OIU can include a number of integrated phase shifters (e.g., 110) illustrated as rounded shapes within each OIU. In many embodiments, integrated phase shifters can be used to control an OIU and train a network. Photonic hardware platform 100 also includes nonlinear activations 115 represented as f_{l}(·).
[0035] In this example, photonic element 100 performs linear operations using optical interference units (OIUs). OIUs in accordance with several embodiments of the invention are meshes of controllable Mach-Zehnder interferometers (MZIs) integrated in a silicon photonic circuit. By tuning the phase shifters integrated in the MZIs, any unitary N × N operation on the input can be implemented, which finds applications both in classical and quantum photonics. In photonic ANNs in accordance with some embodiments of the invention, OIUs can be used for each linear matrix-vector multiplication. In certain embodiments, nonlinear activations can be performed using an electronic circuit, which involves measuring the optical state before activation, performing the nonlinear activation function on an electronic circuit such as a digital computer, and preparing the resulting optical state to be injected to the next stage of the ANN.
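As an illustration of why tuning phase shifters inside an MZI yields a unitary operation, the following minimal sketch builds the 2 × 2 transfer matrix of a single MZI and checks that it is lossless. The coupler convention, the placement of both phase shifters on the upper arm, and the names mzi, theta, and phi are assumptions made for this sketch and are not taken from the specification.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two square complex matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mzi(theta, phi):
    """2x2 transfer matrix of one MZI: two 50:50 couplers and two phase shifters.

    Assumed convention (not from the source): coupler = [[1, i], [i, 1]] / sqrt(2),
    with each phase shifter acting on the upper arm only.
    """
    s = 1 / math.sqrt(2)
    coupler = [[s, 1j * s], [1j * s, s]]
    inner = [[cmath.exp(1j * theta), 0], [0, 1]]  # phase shifter between the couplers
    outer = [[cmath.exp(1j * phi), 0], [0, 1]]    # phase shifter at the input
    return matmul(coupler, matmul(inner, matmul(coupler, outer)))


def is_unitary(u, tol=1e-12):
    """Check U^dagger U = I, i.e. that no power is lost."""
    n = len(u)
    dagger = [[u[j][i].conjugate() for j in range(n)] for i in range(n)]
    prod = matmul(dagger, u)
    return all(abs(prod[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))


u = mzi(theta=0.7, phi=-1.3)
print(is_unitary(u))
```

Larger N × N unitaries are obtained by arranging many such 2 × 2 blocks in a mesh; the mesh layout itself is outside the scope of this sketch.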
[0036] In the description of this example, the OIU is described by a number, N, of single-mode waveguide input ports coupled to the same number of single-mode output ports through a linear and lossless device. In certain embodiments, the device may also be extended to operate on differing numbers of inputs and outputs. OIUs in accordance with some embodiments of the invention implement directional propagation such that all power flows exclusively from the input ports to the output ports. In its most general form, devices implement the linear operation
    Z_{out} = \hat{W} X_{in},    (1)
where X_{in} and Z_{out} are the modal amplitudes at the input and output ports, respectively, and \hat{W}, or the transfer matrix, is the off-diagonal block of the system's full scattering matrix,
    \begin{pmatrix} X_{out} \\ Z_{out} \end{pmatrix} = \begin{pmatrix} 0 & \hat{W}^{T} \\ \hat{W} & 0 \end{pmatrix} \begin{pmatrix} X_{in} \\ Z_{in} \end{pmatrix}.    (2)
[0037] The diagonal blocks are zero because forward-only propagation is assumed, while the off-diagonal blocks are the transpose of each other because a reciprocal system is assumed. Z_{in} and X_{out} correspond to the input and output modal amplitudes, respectively, if the device were run in reverse, i.e., sending a signal in from the output ports.
Operation and training with backpropagation
[0038] A key requirement for the utility of any ANN platform is the ability to train the network using algorithms such as error backpropagation. Such training typically demands significant computational time and resources and it is generally desirable for error backpropagation to be implemented on the same platform.
[0039] The operation and gradient computation in an ANN in accordance with an embodiment of the invention is illustrated in Figure 2. In this example, propagation through a square cell (e.g., 215) corresponds to matrix multiplication, while propagation through a rounded region (e.g., 220) corresponds to activation. The \odot symbol (e.g., 225) indicates elementwise vector multiplication.
[0040] The top row 205 corresponds to the forward propagation steps in the operation and training of an ANN. For the forward propagation step, processes in accordance with numerous embodiments of the invention begin with an initial input to the system, X_{0}, and perform a linear operation on this input using an OIU represented by the matrix \hat{W}_{1}. In several embodiments, processes can apply an elementwise nonlinear activation, f_{1}(·), on the outputs, giving the input to the next layer. This process can be repeated for each layer l until the output layer, L. Written compactly, for l = 1 ... L,
    X_{l} = f_{l}(\hat{W}_{l} X_{l-1}) \equiv f_{l}(Z_{l}).    (3)
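The forward propagation just described, in which each layer applies a linear operation followed by an elementwise activation, can be sketched in a few lines. This is an illustrative software model only; the function names, the example matrices, and the squaring activation are hypothetical and stand in for the OIUs and activation units of the hardware.

```python
def matvec(w, x):
    """Complex matrix-vector product with nested lists."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def forward(weights, activations, x0):
    """Return the per-layer linear outputs Z_l and layer outputs X_l for l = 1..L."""
    xs, zs = [x0], []
    for w, f in zip(weights, activations):
        z = matvec(w, xs[-1])            # linear step performed by the OIU
        zs.append(z)
        xs.append([f(zi) for zi in z])   # elementwise activation
    return zs, xs


def square(z):
    """Hypothetical activation used only for this example."""
    return z * z


# Hypothetical two-layer example; the matrices are illustrative only.
w1 = [[0, 1j], [1j, 0]]
w2 = [[1, 0], [0, -1]]
zs, xs = forward([w1, w2], [square, square], [1 + 0j, 2 + 0j])
print(xs[-1])
```

Storing every Z_{l} and X_{l} during the forward pass is a design choice of the sketch: both are needed later when the gradients are computed layer by layer.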
[0041] Once forward propagation is completed, a cost function is computed to train the network. Cost function \mathcal{L} is an explicit function of the outputs from the last layer, \mathcal{L} = \mathcal{L}(X_{L}). To train the network, cost functions can be minimized with respect to the linear operators, \hat{W}_{l}, which may be adjusted by tuning the integrated phase shifters within the OIUs in accordance with some embodiments of the invention. In a variety of embodiments, training methods can operate without resorting to an external model of the system, while allowing for the tuning of each parameter to be done in parallel, therefore scaling significantly better with respect to the number of parameters when compared to a brute force gradient computation method.
[0042] Once a cost (or loss) function is computed, backward propagation is performed to adjust the model based on the computed loss. Bottom row 210 of Figure 2 illustrates the backward propagation steps. In a number of embodiments, backpropagation processes can derive an expression for the gradient of the cost function with respect to the permittivities of the phase shifters in the OIUs. In the following, \epsilon_{l} is the permittivity of a single phase shifter in layer l, as the same derivation holds for each of the phase shifters present in that layer. Note that \hat{W}_{l} has an explicit dependence on \epsilon_{l}, but all field components in the subsequent layers also depend implicitly on \epsilon_{l}.
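[0043] As a demonstration, a mean squared cost function is calculated
    \mathcal{L} = \frac{1}{2} (X_{L} - T)^{\dagger} (X_{L} - T),    (4)
where T is a complex-valued target vector corresponding to the desired output of a system given input X_{0}.
[0044] Starting from the last layer in the circuit, the derivatives of the cost function with respect to the permittivity of the phase shifters in the last layer \epsilon_{L} are given by
    d\mathcal{L}/d\epsilon_{L} = \mathcal{R}\{ (X_{L} - T)^{\dagger} (dX_{L}/d\epsilon_{L}) \}    (5)
        = \mathcal{R}\{ (\Gamma_{L} \odot f_{L}'(Z_{L}))^{T} (d\hat{W}_{L}/d\epsilon_{L}) X_{L-1} \}    (6)
        \equiv \mathcal{R}\{ \delta_{L}^{T} (d\hat{W}_{L}/d\epsilon_{L}) X_{L-1} \},    (7)
where \odot is elementwise vector multiplication, defined such that, for vectors a and b, the ith element of the vector a \odot b is given by a_{i}b_{i}. \mathcal{R}\{·\} gives the real part, and f_{l}'(·) is the derivative of the lth layer activation function with respect to its (complex) argument. The vector \delta_{L} \equiv \Gamma_{L} \odot f_{L}' is defined in terms of the error vector \Gamma_{L} \equiv (X_{L} - T)^{*}.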
[0045] For any layer l < L, the chain rule can be used to perform a recursive calculation of the gradients,
    \Gamma_{l} = \hat{W}_{l+1}^{T} \delta_{l+1},    (8)
    \delta_{l} = \Gamma_{l} \odot f_{l}'(Z_{l}),    (9)
    d\mathcal{L}/d\epsilon_{l} = \mathcal{R}\{ \delta_{l}^{T} (d\hat{W}_{l}/d\epsilon_{l}) X_{l-1} \}.    (10)
[0046] This process is illustrated in the backward propagation of the second row 210 of Figure 2, which computes the \delta_{l} vectors sequentially from the output layer to the input layer. The computation of \delta_{l} requires performing the operation \Gamma_{l} = \hat{W}_{l+1}^{T} \delta_{l+1}, which corresponds physically to sending \delta_{l+1} into the output end of the OIU in layer l + 1. In this way, processes in accordance with many embodiments of the invention 'backpropagate' the vectors \delta_{l} and \Gamma_{l} physically through the entire circuit.
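The backward recursion can be checked numerically in software. The following is a minimal sketch, not the in situ procedure itself: it assumes a hypothetical one-parameter layer (a phase shifter followed by a fixed 50:50 coupler), differentiates with respect to the phase theta rather than the permittivity, and uses a hypothetical holomorphic activation f(z) = z + 0.1 z^2. The gradient from the recursion is compared against a central finite difference of the loss.

```python
import cmath


def matvec(w, x):
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def transpose(w):
    return [list(col) for col in zip(*w)]


def f(z):
    """Hypothetical holomorphic activation."""
    return z + 0.1 * z * z


def df(z):
    """Complex derivative of f."""
    return 1 + 0.2 * z


def layer(theta):
    """Hypothetical layer: phase shifter on the upper arm, then a 50:50 coupler."""
    s = 2 ** -0.5
    e = cmath.exp(1j * theta)
    w = [[s * e, 1j * s], [1j * s * e, s]]
    dw = [[1j * s * e, 0], [-s * e, 0]]  # dW/dtheta
    return w, dw


def loss_and_grads(thetas, x0, target):
    """Forward pass, then the backward recursion for delta_l and dL/dtheta_l."""
    layers = [layer(t) for t in thetas]
    xs, zs = [x0], []
    for w, _ in layers:
        zs.append(matvec(w, xs[-1]))
        xs.append([f(z) for z in zs[-1]])
    loss = 0.5 * sum(abs(a - t) ** 2 for a, t in zip(xs[-1], target))
    gamma = [(a - t).conjugate() for a, t in zip(xs[-1], target)]  # Gamma_L
    grads = [0.0] * len(layers)
    for l in reversed(range(len(layers))):
        w, dw = layers[l]
        delta = [g * df(z) for g, z in zip(gamma, zs[l])]          # delta for this layer
        grads[l] = sum(d * y for d, y in zip(delta, matvec(dw, xs[l]))).real
        gamma = matvec(transpose(w), delta)                        # Gamma for the layer before
    return loss, grads


thetas, x0, target = [0.3, -0.8], [0.6 + 0.2j, -0.4j], [0.5 + 0j, 0.1j]
loss, grads = loss_and_grads(thetas, x0, target)
h = 1e-6
fd = []
for l in range(len(thetas)):
    plus, minus = list(thetas), list(thetas)
    plus[l] += h
    minus[l] -= h
    fd.append((loss_and_grads(plus, x0, target)[0]
               - loss_and_grads(minus, x0, target)[0]) / (2 * h))
print(grads, fd)
```

The finite-difference loop perturbs one parameter at a time, which is exactly the brute-force strategy that the recursion avoids; it appears here only as a correctness check.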
[0047] In some embodiments, training a photonic ANN relies on the ability to create arbitrary complex inputs. Processes in accordance with several embodiments of the invention require an integrated intensity detection scheme to occur in parallel and with virtually no loss. In numerous embodiments, this can be implemented by integrated, transparent photodetectors.
[0048] The problem of overfitting is one that can be addressed by 'regularization' in any practical realization of a neural network. Photonic ANNs in accordance with various embodiments of the invention provide a convenient alternative approach to regularization based on 'dropout'. In various embodiments, in a dropout procedure, certain nodes can be probabilistically and temporarily 'deleted' from the network during train time, which has the effect of forcing the network to find alternative paths to solve the problem at hand. This has a strong regularization effect and has become popular in conventional ANNs. Dropout in accordance with some embodiments of the invention can be implemented in the photonic ANN by 'shutting off' channels in the activation functions during training. Specifically, at each time step and for each layer l and element i, one may set f_{l}(Z_{i}) = 0 with some fixed probability.
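A software analogue of shutting off activation channels is sketched below. The function name, the drop_prob parameter, and the seeded generator are assumptions for illustration; in the hardware described here the channel would be switched off in the activation unit itself.

```python
import random


def activation_with_dropout(z, f, drop_prob, rng):
    """Apply f elementwise, zeroing each channel independently with probability drop_prob."""
    return [0j if rng.random() < drop_prob else f(zi) for zi in z]


def square(v):
    """Hypothetical activation used only for this example."""
    return v * v


rng = random.Random(0)  # seeded only to make the sketch repeatable
z = [1 + 1j, 0.5j, -2 + 0j, 0.3 + 0j]
out = activation_with_dropout(z, square, drop_prob=0.5, rng=rng)
print(out)
```

A fresh mask is drawn on every call, matching the description of dropout being applied at each time step during training only.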
[0049] Specific processes for training photonic neural networks in accordance with embodiments of the invention are described above; however, one skilled in the art will recognize that any number of processes can be utilized as appropriate to the requirements of specific applications in accordance with embodiments of the invention.
[0050] For example, the discussion above assumes that the functions f_{l}(·) are holomorphic. For each element of input Z_{l}, labeled z, this means that the derivative of f_{l}(z) with respect to its complex argument is well defined. In other words, the derivative
    df_{l}/dz = \lim_{\Delta z \to 0} [ f_{l}(z + \Delta z) - f_{l}(z) ] / \Delta z    (11)
does not depend on the direction from which \Delta z approaches 0 in the complex plane.
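The direction-independence condition can be probed numerically. The sketch below compares the difference quotient taken along the real axis with the one taken along the imaginary axis for one holomorphic and one non-holomorphic function; both example functions and the step size h are choices made for illustration.

```python
import cmath


def directional_derivative(f, z, angle, h=1e-6):
    """Finite-difference quotient with the step taken along exp(i*angle)."""
    dz = h * cmath.exp(1j * angle)
    return (f(z + dz) - f(z)) / dz


def square(v):
    """Holomorphic example: f(z) = z**2."""
    return v * v


def modulus(v):
    """Non-holomorphic example: f(z) = |z|."""
    return complex(abs(v))


z = 0.4 + 0.3j
mismatch = {}
for name, func in (("z**2", square), ("|z|", modulus)):
    along_real = directional_derivative(func, z, 0.0)
    along_imag = directional_derivative(func, z, cmath.pi / 2)
    mismatch[name] = abs(along_real - along_imag)
print(mismatch)
```

For z**2 the two quotients agree up to the step size, while for |z| they differ by an amount of order one, which is why the derivation needs a separate treatment for such activations.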
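[0051] In numerous embodiments, the backpropagation derivation can be extended to non-holomorphic activation functions. In the backpropagation algorithm, the change in the mean-squared loss function with respect to the permittivity of a phase shifter in the last layer OIU as written in Eq. (5) is
    d\mathcal{L}/d\epsilon_{L} = \mathcal{R}\{ \Gamma_{L}^{T} (dX_{L}/d\epsilon_{L}) \},    (12)
where the error vector was defined as \Gamma_{L} \equiv (X_{L} - T)^{*} for simplicity and X_{L} = f_{L}(\hat{W}_{L} X_{L-1}) is the output of the final layer.
[0052] To evaluate this expression for non-holomorphic activation functions, f_{L}(Z) and its argument can be split into their real and imaginary parts
    f_{L}(Z) = u(\alpha, \beta) + i v(\alpha, \beta),
where i is the imaginary unit and \alpha and \beta are the real and imaginary parts of Z_{L}, respectively.
[0053] Evaluating dX_{L}/d\epsilon_{L} gives the following via the chain rule
    df/d\epsilon = (du/d\alpha) \odot (d\alpha/d\epsilon) + (du/d\beta) \odot (d\beta/d\epsilon) + i (dv/d\alpha) \odot (d\alpha/d\epsilon) + i (dv/d\beta) \odot (d\beta/d\epsilon),
where the layer index has been dropped for simplicity. Here, terms of the form dx/dy correspond to elementwise differentiation of the vector x with respect to the vector y. For example, the ith element of the vector dx/dy is given by dx_{i}/dy_{i}.
[0054] Now, inserting into Eq. (12):
    d\mathcal{L}/d\epsilon_{L} = \mathcal{R}\{ \Gamma_{L}^{T} [ (du/d\alpha + i dv/d\alpha) \odot (d\alpha/d\epsilon_{L}) + (du/d\beta + i dv/d\beta) \odot (d\beta/d\epsilon_{L}) ] \}.
[0055] The real and imaginary parts of \Gamma_{L} are defined as \Gamma_{R} and \Gamma_{I}, respectively. Inserting the definitions of \alpha and \beta in terms of \hat{W}_{L} and X_{L-1}, namely \alpha = \mathcal{R}\{ \hat{W}_{L} X_{L-1} \} and \beta = \mathcal{I}\{ \hat{W}_{L} X_{L-1} \} with \mathcal{I}\{·\} the imaginary part, and doing some algebra:
    d\mathcal{L}/d\epsilon_{L} = ( \Gamma_{R} \odot du/d\alpha - \Gamma_{I} \odot dv/d\alpha )^{T} \mathcal{R}\{ (d\hat{W}_{L}/d\epsilon_{L}) X_{L-1} \} + ( \Gamma_{R} \odot du/d\beta - \Gamma_{I} \odot dv/d\beta )^{T} \mathcal{I}\{ (d\hat{W}_{L}/d\epsilon_{L}) X_{L-1} \}.
[0056] Finally, the expression simplifies to
    d\mathcal{L}/d\epsilon_{L} = \mathcal{R}\{ [ \Gamma_{R} \odot (du/d\alpha - i du/d\beta) + \Gamma_{I} \odot (-dv/d\alpha + i dv/d\beta) ]^{T} (d\hat{W}_{L}/d\epsilon_{L}) X_{L-1} \}.    (20)
[0057] As a check, if the conditions for f_{L}(Z) to be holomorphic are set, namely
    du/d\alpha = dv/d\beta  and  du/d\beta = -dv/d\alpha,
Eq. (20) simplifies to
    d\mathcal{L}/d\epsilon_{L} = \mathcal{R}\{ (\Gamma_{L} \odot f_{L}'(Z_{L}))^{T} (d\hat{W}_{L}/d\epsilon_{L}) X_{L-1} \}
as before.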
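[0058] This derivation may be similarly extended to any layer l in the network. For holomorphic activation functions, whereas the \delta vectors were defined as
    \delta_{l} = \Gamma_{l} \odot f_{l}'(Z_{l}),
for non-holomorphic activation functions, the respective definition is
    \delta_{l} = \Gamma_{R} \odot (du/d\alpha - i du/d\beta) + \Gamma_{I} \odot (-dv/d\alpha + i dv/d\beta),
where \Gamma_{R} and \Gamma_{I} are the respective real and imaginary parts of \Gamma_{l}, u and v are the real and imaginary parts of f_{l}(·), and \alpha and \beta are the real and imaginary parts of Z_{l}, respectively.
[0059] This can be written more simply as
    \delta_{l} = \mathcal{R}\{ \Gamma_{l} \odot \partial f_{l}/\partial\alpha \} - i \mathcal{R}\{ \Gamma_{l} \odot \partial f_{l}/\partial\beta \}.
[0060] In polar coordinates where Z = r \exp(i\phi) and f = f(r, \phi), this equation becomes
    \delta_{l} = \exp(-i\phi) \odot ( \mathcal{R}\{ \Gamma_{l} \odot \partial f_{l}/\partial r \} - i (1/r) \odot \mathcal{R}\{ \Gamma_{l} \odot \partial f_{l}/\partial\phi \} ),
where all operations are elementwise.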
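The compact non-holomorphic definition of the \delta vector can be sanity-checked in a few lines. The sketch below evaluates it from the partial derivatives of f with respect to the real part (alpha) and imaginary part (beta) of its argument, then confirms that it reduces to Gamma times f' for a holomorphic example. The function name and both example activations are illustrative assumptions.

```python
def delta_nonholomorphic(gamma, df_dalpha, df_dbeta):
    """delta = Re{Gamma * df/dalpha} - i Re{Gamma * df/dbeta}, elementwise."""
    return [(g * fa).real - 1j * (g * fb).real
            for g, fa, fb in zip(gamma, df_dalpha, df_dbeta)]


z = [0.4 + 0.3j, -1.0 + 0.5j]
gamma = [0.2 - 0.7j, 0.9 + 0.1j]

# Check 1: holomorphic f(z) = z**2, where df/dalpha = 2z and df/dbeta = 2iz.
delta = delta_nonholomorphic(gamma, [2 * v for v in z], [2j * v for v in z])
expected = [g * 2 * v for g, v in zip(gamma, z)]  # Gamma * f'(z)
print(max(abs(d - e) for d, e in zip(delta, expected)))

# Check 2: non-holomorphic f(z) = |z|**2, where df/dalpha = 2*alpha and df/dbeta = 2*beta.
delta_abs = delta_nonholomorphic(gamma, [2 * v.real for v in z], [2 * v.imag for v in z])
print(delta_abs)
```

In the second case the result equals 2 Re{Gamma} times the complex conjugate of z, so only the real part of the error vector contributes, as expected for a real-valued activation.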
Gradient computation
[0061] Computing gradient terms of the form d\hat{W}_{l}/d\epsilon_{l}, which contain derivatives with respect to the permittivity of the phase shifters in the OIUs, can be a crucial step in training an ANN. In certain embodiments, gradients can be expressed as the solution to an electromagnetic adjoint problem.
[0062] OIUs used to implement the matrix \hat{W}_{l}, relating the complex mode amplitudes of input and output ports, can be described using first-principles electrodynamics. This can allow for the gradient to be computed with respect to each \epsilon_{l}, as these are the physically adjustable parameters in the system in accordance with some embodiments of the invention. Assuming a source at frequency \omega, at steady state, Maxwell's equations take the form
    [ \nabla \times \nabla \times - k_{0}^{2} \hat{\epsilon}_{r} ] e = -i \mu_{0} \omega j,    (33)
which can be written more succinctly as
    \hat{A}(\epsilon_{r}) e = b.    (34)
Here, \hat{\epsilon}_{r} describes the spatial distribution of the relative permittivity (\epsilon_{r}), k_{0}, with k_{0}^{2} = \omega^{2}/c^{2}, is the free-space wavenumber, e is the electric field distribution, j is the electric current density, and \hat{A} = \hat{A}^{T} due to Lorentz reciprocity. Eq. (34) is the starting point of the finite-difference frequency-domain (FDFD) simulation technique, where it is discretized on a spatial grid, and the electric field e is solved given a particular permittivity distribution, \epsilon_{r}, and source, b.
[0063] To relate this formulation to the transfer matrix \hat{W}, source terms b_{i}, i \in 1 ... 2N, can be defined, which correspond to a source placed in one of the input or output ports. In this example, it is assumed that there are a total of N input and N output waveguides. The spatial distribution of the source term, b_{i}, matches the mode of the ith single-mode waveguide. Thus, the electric field amplitude in port i is given by b_{i}^{T} e, and a relationship can be established between e and X_{in}, as
    X_{in,i} = b_{i}^{T} e    (35)
for i = 1 ... N over the input port indices, where X_{in,i} is the ith component of X_{in}. Or more compactly,
    X_{in} \equiv \hat{P}_{in} e.    (36)
Similarly,
    Z_{out,i} = b_{i+N}^{T} e    (37)
for i+N = (N+1) ... 2N over the output port indices, or,
    Z_{out} \equiv \hat{P}_{out} e,    (38)
where the rows of \hat{P}_{in} and \hat{P}_{out} are the b_{i}^{T} of the input and output ports, respectively, and, with this notation, Eq. (1) becomes
    \hat{W} \hat{P}_{in} e = \hat{P}_{out} e.    (39)
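[0064] Based on the above, the cost function gradient in Eq. (10) can be evaluated. In particular, with Eqs. (34) and (39),
    d\mathcal{L}/d\epsilon_{l} = \mathcal{R}\{ \delta_{l}^{T} \hat{P}_{out} (de/d\epsilon_{l}) \} = -\mathcal{R}\{ \delta_{l}^{T} \hat{P}_{out} \hat{A}^{-1} (d\hat{A}/d\epsilon_{l}) \hat{A}^{-1} b_{x,l-1} \}.    (40)
Here b_{x,l-1} is the modal source profile that creates the input field amplitudes X_{l-1} at the input ports.
[0065] The key insight of the adjoint variable method is that the expression can be interpreted as an operation involving the field solutions of two electromagnetic simulations, which can be referred to as the 'original' (og) and the 'adjoint' (aj)
    \hat{A} e_{og} = b_{x,l-1},    (41)
    \hat{A} e_{aj} = \hat{P}_{out}^{T} \delta_{l},    (42)
using the symmetric property of \hat{A}.
[0066] Eq. (40) can now be expressed in a compact form as
    d\mathcal{L}/d\epsilon_{l} = -\mathcal{R}\{ e_{aj}^{T} (d\hat{A}/d\epsilon_{l}) e_{og} \}.    (43)
[0067] If it is assumed that this phase shifter spans a set of points, r_{\phi}, in the system, then, from Eq. (33),
    d\hat{A}/d\epsilon_{l} = -k_{0}^{2} \sum_{r' \in r_{\phi}} \hat{\delta}_{r,r'},    (44)
where \hat{\delta}_{r,r'} is the Kronecker delta.
[0068] Inserting this into Eq. (43), the gradient is given by the overlap of the two fields over the phase-shifter positions
    d\mathcal{L}/d\epsilon_{l} = k_{0}^{2} \mathcal{R}\{ \sum_{r \in r_{\phi}} e_{aj}(r) e_{og}(r) \}.    (45)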
[0069] A schematic illustration of methods in accordance with many embodiments of the invention for experimental measurement of gradient information is illustrated in three stages 305-315 in Figure 3. The box region 320 represents the OIU. The ovals (e.g., 350) represent tunable phase shifters. Computation of the gradient is illustrated with respect to phase shifters 360 and 365.
[0070] In the first stage 305, the original set of amplitudes X_{l-1} is sent through the OIU 320. The constant intensity term |e_{og}|^{2} is measured at each phase shifter. The second stage 310 shows that the adjoint mode amplitudes, given by \delta_{l}, are sent through the output side of the OIU. X_{TR}^{*} is recorded from the opposite side, as well as |e_{aj}|^{2} in each phase shifter. In the third stage 315, X_{l-1} + X_{TR} is sent through the OIU, interfering e_{og} and e_{aj}^{*} inside the device and recovering the gradient information for all phase shifters simultaneously.
[0071] Methods in accordance with numerous embodiments of the invention compute the gradient from the previous section through in situ intensity measurements. Specifically, an intensity pattern is generated with the form \mathcal{R}\{ e_{og} e_{aj} \}, matching that of Eq. (45). Interfering e_{og} and e_{aj}^{*} directly in the system results in the intensity pattern:
    I = |e_{og} + e_{aj}^{*}|^{2} = |e_{og}|^{2} + |e_{aj}|^{2} + 2 \mathcal{R}\{ e_{og} e_{aj} \},    (46)
the last term of which matches Eq. (45). Thus, in many embodiments, the gradient can be computed purely through intensity measurements if the field e_{aj}^{*} can be generated in the OIU.
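The intensity bookkeeping can be sketched in software as follows. Here complex numbers stand in for the fields at the phase-shifter positions and abs(...)**2 stands in for a photodetector reading; the function name and the sample field values are hypothetical.

```python
def product_term(e_og, e_aj):
    """Recover Re{e_og * e_aj} at each point from three intensity readings."""
    out = []
    for a, b in zip(e_og, e_aj):
        i_og = abs(a) ** 2                     # original field alone
        i_aj = abs(b) ** 2                     # adjoint field alone
        i_both = abs(a + b.conjugate()) ** 2   # original plus time-reversed adjoint
        out.append(0.5 * (i_both - i_og - i_aj))
    return out


# Hypothetical field samples at two phase-shifter positions.
e_og = [0.3 + 0.4j, -0.2 + 0.1j]
e_aj = [0.5 - 0.1j, 0.7 + 0.6j]
measured = product_term(e_og, e_aj)
direct = [(a * b).real for a, b in zip(e_og, e_aj)]
print(measured, direct)
```

Only intensities enter product_term, which is the point of the technique: no phase-sensitive detection is needed at the phase shifters.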
[0072] The adjoint field e_{aj}, as defined in Eq. (42), is sourced by the adjoint amplitudes δ_{l}, meaning that it physically corresponds to a mode sent into the system from the output ports. As complex conjugation in the frequency domain corresponds to time-reversal of the fields, e^{*}_{aj} is expected to be sent in from the input ports. Formally, to generate e^{*}_{aj}, a set of input source amplitudes, X_{TR}, is found such that the output port source amplitudes, Ŵ_{l} X_{TR}, are equal to the complex conjugate of the adjoint amplitudes, or δ^{*}_{l}. Using the unitarity property of transfer matrix Ŵ_{l} for a lossless system, along with the fact that […] output modes, the input mode amplitudes for the time-reversed adjoint can be computed as
X^{*}_{TR} = Ŵ^{T}_{l} δ_{l}. (47)
As discussed earlier, Ŵ^{T}_{l} is the transfer matrix from output ports to input ports. Thus, X_{TR} can be experimentally determined by sending δ_{l} into the device output ports, measuring the output at the input ports, and taking the complex conjugate of the result.
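The time-reversal step can be illustrated with a small numerical sketch. It assumes an idealized lossless, reciprocal 2×2 unit built from two 50:50 couplers and two phase shifters (the helper `mzi` and all numeric values are hypothetical, chosen only for illustration). Sending δ_l backward is modeled as multiplication by the transpose of the transfer matrix, and the check is that the conjugated result, sent forward, produces δ_l^* at the output ports.

```python
import cmath
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def mzi(theta, phi):
    """Unitary 2x2 transfer matrix of an idealized lossless MZI (assumed model)."""
    s = 1 / math.sqrt(2)
    coupler = [[s, 1j * s], [1j * s, s]]
    p_theta = [[cmath.exp(1j * theta), 0], [0, 1]]
    p_phi = [[cmath.exp(1j * phi), 0], [0, 1]]
    return matmul(matmul(coupler, p_theta), matmul(coupler, p_phi))

W = mzi(0.7, -1.3)                 # stands in for the OIU transfer matrix
delta = [0.4 + 0.2j, -0.1 + 0.9j]  # hypothetical adjoint amplitudes

# Send delta into the output ports; the output-to-input transfer is W^T.
measured_at_inputs = matvec(transpose(W), delta)      # this is X_TR^*
x_tr = [z.conjugate() for z in measured_at_inputs]    # conjugate externally

# Sending X_TR forward should reproduce conj(delta) at the output ports.
forward_output = matvec(W, x_tr)
```

In this model the result follows from unitarity alone, since Ŵ (Ŵ^T δ)^* = Ŵ Ŵ^† δ^* = δ^*.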
[0073] A process 400 for experimentally measuring a gradient of an OIU layer in an ANN with respect to the permittivities of the OIU layer's integrated phase shifters is conceptually illustrated in Figure 4. Process 400 sends (405) original field amplitudes X_{l-1} through the OIU layer and measures the intensities at each phase shifter. Process 400 sends (410) δ_{l} into the output ports of the OIU layer and measures the intensities at each phase shifter. Process 400 computes (415) the time-reversed adjoint input field amplitudes. In numerous embodiments, time-reversed adjoint input field amplitudes are calculated as in Eq. (47). Process 400 interferes (420) the original and time-reversed adjoint fields in the device and measures the resulting intensities at each phase shifter. Process 400 computes (425) a gradient from the measured intensities. In a number of embodiments, processes compute gradients by subtracting the constant intensity terms measured from the original field amplitudes and δ_{l} (e.g., at steps 405 and 410 of process 400) and multiplying by k^{2}_{0} to recover the gradient, as in Eq. (45).
[0074] In many embodiments, the isolated forward and adjoint steps are performed separately, storing the intensities at each phase shifter for each step, and then subtracting this information from the final interference intensity. In a variety of embodiments, rather than storing these constant intensities, processes can introduce a low-frequency modulation on top of one of the two interfering fields, such that the product term of Eq. (46) can be directly measured from the low-frequency signal.
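One way such a modulation could work is sketched below. The specification does not state the modulation format, so this example assumes a slow phase ramp applied to the time-reversed adjoint field and in-phase (lock-in style) demodulation of the detected intensity; the function name and field values are hypothetical. Under that assumption the constant intensity terms average out and the demodulated signal equals Re{e_og e_aj} without any stored reference measurements.

```python
import cmath
import math

def lock_in_product_term(e_og, e_aj, n_samples=64):
    """Estimate Re{e_og * e_aj} from a slowly modulated intensity signal."""
    acc = 0.0
    for n in range(n_samples):
        phase = 2 * math.pi * n / n_samples               # modulation phase over one period
        e_tr = e_aj.conjugate() * cmath.exp(1j * phase)   # modulated time-reversed field
        intensity = abs(e_og + e_tr) ** 2                 # what a photodetector records
        acc += intensity * math.cos(phase)                # in-phase demodulation
    return acc / n_samples

# Hypothetical field values at one phase shifter.
e_og = 0.8 * cmath.exp(0.4j)
e_aj = 0.3 * cmath.exp(-1.7j)
estimate = lock_in_product_term(e_og, e_aj)
```

The constant terms |e_og|^2 and |e_aj|^2 multiply a cosine that averages to zero over a full period, so only the product term survives the demodulation.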
[0075] Specific processes for experimentally measuring a gradient of an OIU layer in an ANN in accordance with embodiments of the invention are described above; however, one skilled in the art will recognize that any number of processes can be utilized as appropriate to the requirements of specific applications in accordance with embodiments of the invention.
[0076] A numerical demonstration of a time-reversal procedure is illustrated in six panels 505-530 of Figure 5. In this example, the procedure is performed with a series of FDFD simulations of an OIU implementing a 3×3 unitary matrix. These simulations are intended to represent the gradient computation corresponding to one OIU in a single layer, l, of a neural network with input X_{l-1} and delta vector δ_{l}. In these simulations, absorbing boundary conditions are used on the outer edges of the system to eliminate back-reflections.
[0077] The first panel 505 shows a relative permittivity distribution for three MZIs arranged to perform a 3×3 linear operation. Boxes (e.g., 550) represent where variable phase shifters could be placed in this system. As an example, the gradient information for a layer with an input X_{l-1} of unit amplitude and δ_{l} = [0 1 0]^{T} is computed, corresponding to the bottom left and middle right port, respectively.
[0078] The second panel 510 illustrates the real part of a simulated electric field E_{z} corresponding to injection from the bottom left port. Specifically, panel 510 shows the real part of e_{og}, corresponding to the original, forward field.
[0079] In the third panel 515, the real part of the adjoint E_{z} is shown, corresponding to injection from the middle right port. The third panel 515 shows the real part of the adjoint field, e_{aj}, corresponding to the cost function
[0080] The fourth panel 520 shows a time-reversed adjoint field in accordance with some embodiments of the invention that can be fed in through all three ports on the left. Panel 520 shows the real part of the time-reversed copy of e_{aj} as computed by the method described in the previous section, in which X^{*}_{TR} is sent in through the input ports. There is excellent agreement, up to a constant, between the complex conjugate of the field pattern of panel 515 and the field pattern of panel 520.
[0081] The fifth panel 525 shows gradient information as obtained directly by the adjoint method, normalized by its maximum absolute value. Panel 525 shows the gradient of the objective function with respect to the permittivity of each point of space in the system, as computed with the adjoint method, described in Eq. (45).
[0082] In the sixth panel 530, the gradient information as obtained by methods in accordance with a variety of embodiments of the invention is shown, normalized by its maximum absolute value. Namely, the field pattern from panel 510 is interfered with the time-reversed adjoint field of panel 520 and the constant intensity terms are subtracted from the resulting intensity pattern. In certain embodiments, the results are then multiplied by an appropriate set of constants. Panels 525 and 530 match with high precision.
[0083] In a realistic system, the gradient must be constant for any stretch of waveguide between waveguide couplers because the interfering fields are at the same frequency and are traveling in the same direction. Thus, there should be no distance dependence in the corresponding intensity distribution. This is largely observed in this simulation, although small fluctuations are visible because of the proximity of the waveguides and the sharp bends, which were needed to make the structure compact enough for simulation within a reasonable time. In practice, the importance of this constant intensity is that it can be detected after each phase shifter, instead of inside of it. Intensity measurements in accordance with some embodiments of the invention can occur in the waveguide regions directly after the phase shifters, which eliminates the need for phase shifter and photodetector components at the same location.
[0084] Numerically generated systems in accordance with many embodiments of the invention experience a power transmission of only 59% due to radiative losses and backscattering caused by very sharp bends and staircasing of the structure in the simulation. Nevertheless, the time-reversal interference procedure still reconstructs the adjoint sensitivity with very good fidelity. Furthermore, a reasonable amount of this loss is non-uniform due to the asymmetry of the structure.
Choice and implementation of activation functions
[0085] A major drawback of saturable absorption is that it is fundamentally lossy. Depending on the threshold power and waveguide implementation, an attenuation per layer of at least 1 dB can be expected. In a large-scale photonic ANN with many layers, the compounding attenuation from each layer can bring the signal levels below the optical noise floor. Moreover, this scheme may require lowered activation power thresholds for successively deeper layers, which can be challenging to achieve for a fixed hardware platform and saturable absorption mechanism.
[0086] It is therefore of substantial interest to develop activation functions that are not subject to the above limitations. Additionally, for simple implementation of the backpropagation algorithm described in this work, activation functions in accordance with various embodiments of the invention can have derivatives that allow the operation δ_{l} = Γ_{l} ⊙ f′(Z_{l}) to be performed simply in the photonic circuit.
[0087] A possible alternative to saturable absorption is the rectified linear unit (ReLU) activation, which is a common activation function in conventional real-valued ANNs with several complex-valued variants. For example, the complex ReLU (cReLU) variant passes its input through only if the input is above a power threshold. This function is convenient because it is holomorphic (away from the discontinuity) and its derivative is simply 0 below the power threshold and 1 above the power threshold. Therefore, forward and backward propagation steps can be performed on the same hardware. For forward propagation, one would first measure the power of the waveguide mode with an electro-optic circuit and close the channel if it is below a threshold. For backward propagation, one would simply leave this channel either closed or open, depending on the forward pass.
[0088] The modReLU variant similarly first checks whether the input is above a power threshold. If not, 0 is returned. Otherwise, it returns the input but with the power threshold subtracted from its amplitude.
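The two variants can be sketched in a few lines. This is an illustrative model only: the function names are hypothetical, and for modReLU it is assumed that a single amplitude threshold t is used both for the check and for the subtraction (so the corresponding power threshold would be t squared), which the text above does not spell out. The backward helper applies δ_l = Γ_l ⊙ f′(Z_l) for cReLU using the 0/1 gate fixed by the forward pass.

```python
def crelu(z, power_threshold):
    """Pass z unchanged if its power |z|^2 is above the threshold, else block it."""
    return z if abs(z) ** 2 > power_threshold else 0j

def crelu_gate(z, power_threshold):
    """Derivative of cReLU away from the discontinuity: 1 if open, 0 if closed."""
    return 1.0 if abs(z) ** 2 > power_threshold else 0.0

def modrelu(z, amplitude_threshold):
    """Keep the phase of z and reduce its amplitude by the threshold (assumed form)."""
    amplitude = abs(z)
    if amplitude <= amplitude_threshold:
        return 0j
    return (amplitude - amplitude_threshold) * z / amplitude

def crelu_backward(gamma, z_forward, power_threshold):
    """delta_l = Gamma_l (elementwise) f'(Z_l), with gates set by the forward pass."""
    return [g * crelu_gate(z, power_threshold) for g, z in zip(gamma, z_forward)]

# Hypothetical layer signals: the first channel is below threshold, the second above.
z_forward = [0.1 + 0.0j, 2.0 + 0.0j]
x_next = [crelu(z, 1.0) for z in z_forward]
delta = crelu_backward([1 + 1j, 2 + 0j], z_forward, 1.0)
```

Because the cReLU gate is just an open or closed channel, the same gate state serves both propagation directions, which is the property the backward pass relies on.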
[0089] Specific activation functions in accordance with embodiments of the invention are described above; however, one skilled in the art will recognize that any number of activation functions can be utilized as appropriate to the requirements of specific applications in accordance with embodiments of the invention.
Avoiding internal coherent detection and reconstruction
[0090] As presented thus far, the time-reversal interference method requires coherent detection and state preparation inside of the device. This introduces potential technical challenges and the need for additional components within the device. To mitigate this issue, methods in accordance with some embodiments of the invention (also referred to as 'linearization methods') can recover gradient information without needing coherent detection within the device. In a variety of embodiments, methods can allow one to work entirely on the outsides of the ANN by selectively turning off or 'linearizing' the activation functions between layers in a controlled fashion. However, such methods can require an activation that may implement backpropagation through some protocol that, itself, does not require internal coherent measurement and state preparation.
[0091] A time-reversal interference method for a single linear OIU embedded in the network is described above. As described, time-reversal interference methods can require coherent detection to obtain X^{*}_{TR}, external computation of X_{TR}, and subsequent coherent preparation of X_{TR} in the device. Thus, implementing this in a multi-layered network could require additional elements between each layer, complicating the experimental setup. An example of such a time-reversal interference technique for a layer embedded in a network is illustrated in Figure 6. In this example, coherent detection, external computation, and state preparation are required between layers 610 and 615, which can be undesirable.
[0092] To overcome this issue, methods in accordance with various embodiments of the invention can obtain the sensitivity of the internal layers without needing coherent detection or preparation within the network. The strategy relies on the ability to 'linearize' the activations, such that they become represented as simple transmission for both forward and backward passes. For a linearized activation given by f_{l}(Z_{l}) = Z_{l}, Z_{l} = X_{l} and δ_{l} = Γ_{l}, which greatly simplifies both forward and backward propagation.
[0093] A process for tuning a linear layer Ŵ_{l} is conceptually illustrated in Figure 7. Process 700 linearizes (705) all activations after Ŵ_{l} and performs a forward pass with original input X_{0}. Process 700 records (710) intensities within Ŵ_{l} and measures the output. Process 700 linearizes (715) all activations in the network, sends the complex conjugate of the measured output into the output end of the network, and measures output b. Sending b^{*} into the input end of the network recreates the desired X_{l-1} at the input of Ŵ_{l}. Process 700 linearizes (720) all activations in the network before Ŵ_{l} and performs backpropagation, sending Γ_{L} into the output end of the network. Process 700 measures (725) intensities in Ŵ_{l} for subtraction and the output c. Sending c^{*} into the input end of the network recreates the desired X_{TR} at the input end of Ŵ_{l} and δ^{*}_{l} at the output end of Ŵ_{l}. Process 700 inputs (730) b^{*} + c^{*} into the completely linearized network, which reproduces the desired interference term X_{l-1} + X_{TR} at the input end of Ŵ_{l}. Process 700 measures (735) the intensities in Ŵ_{l} and computes the sensitivity. Methods for computing the sensitivity based on the measured intensities are described in greater detail above.
[0094] By linearizing the activation functions, large sections of the network can be made unitary. This allows the encoding of information about the internal fields in the externally measured states a, b, c. These states can be used to recreate the desired interference terms needed to obtain gradients as intensity measurements.
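The linearization protocol can be exercised on a toy model. The sketch below makes several assumptions that are not taken from the specification: a three-layer network of idealized lossless, reciprocal 2×2 units (the hypothetical helper `mzi`), cReLU-style activations modeled as 0/1 gates whose state is fixed by an ordinary forward pass, backward propagation modeled as multiplication by the transpose, and arbitrary numeric values. It computes the external states a, b, and c and checks that injecting b^{*} + c^{*} into the fully linearized network reproduces X_{l-1} + X_{TR} just before the layer being tuned.

```python
import cmath
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def conj(v):
    return [z.conjugate() for z in v]

def gate(mask, v):
    return [m * z for m, z in zip(mask, v)]

def mzi(theta, phi):
    """Unitary 2x2 transfer matrix of an idealized lossless MZI (assumed model)."""
    s = 1 / math.sqrt(2)
    coupler = [[s, 1j * s], [1j * s, s]]
    return matmul(matmul(coupler, [[cmath.exp(1j * theta), 0], [0, 1]]),
                  matmul(coupler, [[cmath.exp(1j * phi), 0], [0, 1]]))

W = [mzi(0.3, 1.1), mzi(-0.8, 0.5), mzi(1.9, -0.4)]  # W_1, W_2, W_3
L, l = 3, 2                 # network depth and (1-based) layer being tuned
THRESHOLD = 0.2             # hypothetical cReLU power threshold
ONES = [1, 1]               # mask of a linearized (pass-through) activation
x0 = [0.6 + 0.0j, 0.8j]     # original input X_0
gamma_L = [0.5 - 0.2j, 0.1 + 0.7j]  # hypothetical output error signal

# Ordinary forward pass: record the gate masks, X_{l-1}, and Z_l.
masks, x = [], x0
for k in range(L):
    if k == l - 1:
        x_prev = x                      # X_{l-1}, the input of W_l
    z = matvec(W[k], x)
    if k == l - 1:
        z_l = z                         # Z_l, the output of W_l
    masks.append([1 if abs(zi) ** 2 > THRESHOLD else 0 for zi in z])
    x = gate(masks[k], z)

def run_forward(v, active):
    """Input end to output end; active[k] says whether f_{k+1} is switched on."""
    for k in range(L):
        v = gate(masks[k] if active[k] else ONES, matvec(W[k], v))
    return v

def run_backward(v, active):
    """Output end to input end through gates and transposed transfer matrices."""
    for k in reversed(range(L)):
        v = matvec(transpose(W[k]), gate(masks[k] if active[k] else ONES, v))
    return v

before = [k < l - 1 for k in range(L)]   # only activations before W_l are on
after = [k >= l - 1 for k in range(L)]   # only activations from f_l onward are on
none = [False] * L                       # fully linearized network

a = run_forward(x0, before)              # activations after W_l linearized
b = run_backward(conj(a), none)          # a* sent into the output end
c = run_backward(gamma_L, after)         # Gamma_L sent into the output end
probe = [p + q for p, q in zip(conj(b), conj(c))]   # b* + c*

# Field just before W_l when the probe enters the fully linearized network.
at_input = probe
for k in range(l - 1):
    at_input = matvec(W[k], at_input)

# Reference: X_{l-1} + X_TR, with X_TR = (W_l^T delta_l)^* from ordinary backprop.
delta = gamma_L
for k in reversed(range(l, L)):
    delta = matvec(transpose(W[k]), gate(masks[k], delta))
delta_l = gate(masks[l - 1], delta)
x_tr = conj(matvec(transpose(W[l - 1]), delta_l))
target = [p + q for p, q in zip(x_prev, x_tr)]
```

Only x0, gamma_L, a, b, c, and the probe cross the boundary of the network in this sketch; x_prev, z_l, and delta_l are computed solely to check the result.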
[0095] Methods in accordance with several embodiments of the invention can be used in a general network. In order to compute gradients for layer l of L, each OIU can be described by a matrix Ŵ_{l} and each activation can be described by an F̂_{l} operator in the forward propagation and B̂_{l} in the backward propagation. Note that even though the activations are nonlinear, depending on the result of a forward pass through the network, they can be set and then may resemble linear operations.
[0096] For the time-reversal interference method to work, an input to the system can be produced such that the mode X_{l-1} + X_{TR} is created right before Ŵ_{l}. Equivalently, the state Z_{l} + δ^{*}_{l} can be produced directly after Ŵ_{l}. Using the notation introduced, these states can be explicitly described as
[0097] A stage diagram of a procedure for using the time-reversal interference method to measure sensitivities in the OIUs without internal coherent detection or preparation is illustrated in six stages 805-830 of Figure 8. In the first stage 805, the gradient measurement with respect to the phase shifters is shown in the l = 3 layer, which requires creating an input state to recreate X_{2} + X_{TR} at the input of Ŵ_{3}. Empty ovals 850 correspond to 'linearized' activations, which can be represented as an identity matrix. The first stage 805 shows that all of the channels following layer l are linearized. The output of the system when sending in the original input, X_{0}, is computed, which is labeled a.
[0098] The second stage 810 shows that all of the activation channels are linearized. Then, a^{*} is sent into the output end of the network and the output is measured, which is labeled b.
The complex conjugate of b is given by
[0099] The third stage 815 shows that sending b^{*} into the input end of the network recreates the desired X_{l-1} at the input of Ŵ_{l}. In the fourth stage 820, only the activation channels before layer l are linearized and Γ_{L} is input into the output end of the system, measuring c.
[0100] The fifth stage 825 shows that sending c^{*} into the input end of the network recreates the desired X_{TR} at the input end of Ŵ_{l} and δ^{*}_{l} at the output end of Ŵ_{l}. The sixth stage 830 shows that inputting b^{*} + c^{*} will output Z_{l} + δ^{*}_{l} directly after layer l, which will be sufficient for training via the time-reversal intensity measurement technique. To make the derivation clearer, these two terms are split up and the total output is defined as o ≡ o_{1} + o_{2} where
Inserting the form of b^{*} from Eq. (55) into the expression for o_{1},
as desired. Similarly, inserting the form of c^{*} from Eq. (57) into the expression for o_{2},
also as desired.
[0101] Thus, inputting b^{*} + c^{*} will give o = Z_{l} + δ^{*}_{l} at the output of layer l. Equivalently, this same input reproduces X_{l-1} + X_{TR} at the input end of layer l, which is what is needed for time-reversal sensitivity measurements. The derivation presented here holds for an arbitrary choice of activation function, even those not representable by a linear operator.
Other Applications
[0102] In addition to the feedforward ANNs discussed in this work, methods in accordance with some embodiments of the invention can be used to train recurrent neural networks (RNNs), which are commonly used to analyze sequential data, such as natural language or time series inputs. Recurrent networks, as diagrammed in Figure 9, have a single linear layer (a single OIU) and activation function. A time series of inputs to the system is provided from the left. The output of this layer may be read out with a portion routed back into the input end of the network. In many embodiments, this splitting and routing may be done simply in the photonic platform using an integrated beam splitter. RNNs may also be trained using the same backpropagation method as described above.
[0103] Methods in accordance with several embodiments of the invention can be used to train Convolutional Neural Networks (CNNs). CNNs are another popular class of neural network that is commonly used to analyze image data, where there are many inputs with a large amount of correlation between adjacent elements. In CNNs, the matrix multiplications are replaced by convolution operations between the input values and a trainable kernel. This convolution may be performed on a mesh of MZIs similar to those presented here, but with far fewer MZI units necessary. In various embodiments, backpropagation for CNNs may be performed using methods as described above.
[0104] The training scheme presented here can also be used to tune a single OIU without nonlinearity, which may find a variety of applications. Specifically, the linear unit can be trained to map a set of input vectors {X_{i}} to a corresponding set of output vectors {Y_{i}}. One example for an objective function for this problem is
where φ is a vector containing all the degrees of freedom. Other choices that also maximize the overlap between X_{i} and Y_{i} for every i are, of course, also possible.
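As an illustration of what such an objective could look like, the sketch below uses a least-squares mismatch between the transformed inputs and the targets, summed over all pairs. This specific form is an assumption made for the example, and the function name and test vectors are hypothetical. The example evaluates it for a two-port mode sorter: the loss vanishes for the matrix that maps each orthonormal input onto its own output port and is nonzero for an untrained (identity) unit.

```python
import math

def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

def mapping_loss(W, inputs, targets):
    """Assumed objective: 0.5 * sum_i || Y_i - W X_i ||^2."""
    total = 0.0
    for x, y in zip(inputs, targets):
        out = matvec(W, x)
        total += 0.5 * sum(abs(o - t) ** 2 for o, t in zip(out, y))
    return total

s = 1 / math.sqrt(2)
inputs = [[s, s], [s, -s]]     # two orthonormal input modes X_1, X_2
targets = [[1, 0], [0, 1]]     # unit vectors for output ports 1 and 2

untrained = [[1, 0], [0, 1]]   # identity: no sorting
sorter = [[s, s], [s, -s]]     # rows are the conjugated input modes

loss_untrained = mapping_loss(untrained, inputs, targets)
loss_sorter = mapping_loss(sorter, inputs, targets)
```

In a device, the entries of W would be functions of the phase-shifter settings φ, and the gradient of the loss with respect to φ is what the in situ measurement would supply.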
[0105] One application of this training is a mode sorter for sorting out a number of arbitrary, orthogonal modes. In particular, for an N×N linear system, N orthogonal inputs X_{i} can be chosen. Define Y_{i} = B_{i} as the unit vector for the i-th output port. While this specific problem may also be solved sequentially by existing methods, gradient-based implementations in accordance with some embodiments of the invention may have benefits in speed and scaling with number of parameters in the system, especially when the degrees of freedom can be updated in parallel.
[0106] Training protocols in accordance with numerous embodiments of the invention may also be used for optimizing a single (potentially random) input into a desired output, i.e.
This type of problem arises, for example, when considering optical control systems for tuning the power delivery system for dielectric laser accelerators. In this application, a series of waveguides carry power to a central accelerator structure. However, it is likely that these signals will initially consist of randomly distributed powers and phases due to the coupling, splitting, and transport stages earlier in the process. Thus, the OIUs can be used as a phase and amplitude sorting element, where now X is an initially random amplitude and phase in each waveguide, and δ is a vector of target amplitudes and phases for optimal delivery to the dielectric laser accelerator. The adjoint field is directly given by the radiation from an electron beam, so the target vector may be generated physically by sending in a test electron beam. In several embodiments, a similar system can be used for electron beam focusing and bunching applications.
[0107] In numerous embodiments, OIUs can also be used to implement reconfigurable optical quantum computations, with various applications in quantum information processing. In such systems, linear training with classical coherent light can be used to configure the quantum gates, e.g., by setting up a specific matrix Ŵ described by complete, orthonormal sets {X_{i}} and {Y_{i}}. After the tuning, systems in accordance with numerous embodiments of the invention can be run in the quantum photonic regime.
[0108] Methods in accordance with numerous embodiments of the invention work by physically propagating an adjoint field and interfering its time-reversed copy with the original field. In a number of embodiments, the gradient information can then be directly obtained as an in situ intensity measurement. While processes are described in the context of ANNs, one skilled in the art will recognize that the processes are broadly applicable to any reconfigurable photonic system. Such a setup can be used to tune phased arrays, optical delivery systems for dielectric laser accelerators, or other systems that rely on large meshes of integrated optical phase shifters. Methods in accordance with some embodiments of the invention can implement the nonlinear elements of the ANN in the optical domain.
[0109] Although the present invention has been described in certain specific aspects, many additional modifications and variations would be apparent to those skilled in the art. It is therefore to be understood that the present invention may be practiced otherwise than specifically described. Thus, embodiments of the present invention should be considered in all respects as illustrative and not restrictive.
Claims
Priority Applications (4)
Application Number  Priority Date  Filing Date  Title 

US201862669899P true  20180510  20180510  
US62/669,899  20180510  
US201862783992P true  20181221  20181221  
US62/783,992  20181221 
Publications (1)
Publication Number  Publication Date 

WO2019217835A1 true WO2019217835A1 (en)  20191114 
Family
ID=68467574
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

PCT/US2019/031747 WO2019217835A1 (en)  20180510  20190510  Training of photonic neural networks through in situ backpropagation 
Country Status (1)
Country  Link 

WO (1)  WO2019217835A1 (en) 
Citations (5)
Publication number  Priority date  Publication date  Assignee  Title 

US6374385B1 (en) *  19980526  20020416  Nokia Mobile Phones Limited  Method and arrangement for implementing convolutional decoding 
US20040107172A1 (en) *  20010925  20040603  Ruibo Wang  Optical pulsecoupled artificial neurons 
US20080154815A1 (en) *  20061016  20080626  Lucent Technologies Inc.  Optical processor for an artificial neural network 
US20150261058A1 (en) *  20130219  20150917  The University Of Bristol  Optical source 
US20170351293A1 (en) *  20160602  20171207  Jacques Johannes Carolan  Apparatus and Methods for Optical Neural Network 

Legal Events
Date  Code  Title  Description 

121  Ep: the epo has been informed by wipo that ep was designated in this application 
Ref document number: 19799982 Country of ref document: EP Kind code of ref document: A1 