CN117436486A

CN117436486A - Optical convolution neural network based on thin film lithium niobate and silicon mixture

Info

Publication number: CN117436486A
Application number: CN202311424796.9A
Authority: CN
Inventors: 董建绩; 周浩军; 吴波
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2023-10-31
Filing date: 2023-10-31
Publication date: 2024-01-23

Abstract

The invention discloses an optical convolutional neural network based on a hybrid of thin-film lithium niobate and silicon, in the field of optical computing. The network includes an input information loading layer, a convolutional layer, a pooling layer, a nonlinear layer, and a fully connected layer. Each layer works as follows:
- The input information loading layer spectrally shapes an optical frequency comb to load the input vector.
- The convolutional layer performs the convolution with a thin-film lithium niobate phase modulator.
- The pooling layer performs average pooling with silicon-based filters.
- The nonlinear layer uses photodetectors (PDs) to drive active micro-ring modulators, which apply a nonlinear operation to the optical signal.
- The fully connected layer performs matrix-vector multiplication with a silicon-based micro-ring array.

The network converges through online training and can perform tasks such as image classification and image processing. The hybrid thin-film lithium niobate and silicon design offers high computing power, low energy consumption, and an ultra-compact footprint. It is also scalable and greatly improves the performance of optical matrix computation.

Description

Optical convolution neural network based on thin film lithium niobate and silicon mixture

Technical Field

The invention belongs to the technical field of optical computing, and particularly relates to an optical convolutional neural network based on a hybrid of thin-film lithium niobate and silicon.

Background

With the advent of the era of big data and artificial intelligence, demand for computing power, and the energy consumed to provide it, has grown exponentially. However, conventional integrated circuits are approaching the limit of Moore's law. Electronic transistors can no longer be shrunk, and their power consumption reduced, at the historical pace. As a result, the growing demand for computing power cannot be met, and the heat generated by transistors has become difficult to dissipate. To address this challenge, entirely new computing architectures are being explored.

Photonic devices have attracted attention for their high bandwidth, low crosstalk, and low power consumption, and are widely used in communications and computing. Convolution occupies a central place in big data processing and artificial intelligence algorithms and consumes enormous computational resources. Thanks to their multiple physical dimensions and parallel transmission, optical devices are widely used for convolution, for example in optical convolutional neural networks and image processing. Integrated optical convolution is expected to become an important acceleration engine for future CPUs.

At present, optical convolutional neural networks fall into four categories according to how the convolution is implemented: matrix multiplication, delay accumulation, spatial Fourier transform, and frequency-domain convolution.

The first two categories compute dot products with various integrated photonic devices. Both require the input data to be repeatedly copied and shifted, and this redundant computation significantly increases memory usage and memory access. Moreover, the number of devices these schemes need grows with the scale of the convolution, which compromises chip compactness.

The third and fourth categories rely on Fourier transforms. The third category loads the input vector using spatial degrees of freedom, which severely limits how far the input vector can be scaled. The fourth category currently needs complex electronic Fourier transforms to load the input data, and it also suffers from low bandwidth.
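The redundancy of the matrix-multiplication approach can be made concrete with a short sketch. It is written in Python with NumPy and is only an illustration, not part of the invention. The sketch unrolls a 1-D input into overlapping patches, the "im2col" step commonly used to turn a convolution into a matrix product. Each input element is copied into up to k rows, so the data that must be stored and moved grows roughly k-fold. The 8-element input and 3-tap kernel are arbitrary example values.

```python
import numpy as np

def im2col_1d(x, k):
    """Stack every length-k window of x as one row (valid positions, stride 1)."""
    n_out = len(x) - k + 1
    return np.stack([x[i:i + k] for i in range(n_out)])

x = np.arange(1.0, 9.0)               # example input vector, 8 elements
kernel = np.array([1.0, 0.0, -1.0])   # example 3-tap kernel

patches = im2col_1d(x, len(kernel))   # 6 rows x 3 columns: 18 stored values for 8 inputs
y_matmul = patches @ kernel           # CNN-style convolution (correlation) as a matrix product
y_direct = np.correlate(x, kernel, mode="valid")

print(patches.size, x.size)           # duplicated storage versus original storage
```

With a 2-D k×k kernel, the duplication factor approaches k². This is the overhead that the frequency-domain approach described here is intended to avoid.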

Disclosure of Invention

In view of the above shortcomings of the prior art and the need for improvement, the invention provides an optical convolutional neural network based on a hybrid of thin-film lithium niobate and silicon. It aims to solve the computational redundancy, large spatial footprint, and low operating speed of existing integrated optical convolution architectures.

To achieve the above object, the invention provides an optical convolutional neural network based on a hybrid of thin-film lithium niobate and silicon. The network comprises an input information loading layer, a convolutional layer, a pooling layer, a nonlinear layer, and a fully connected layer.

The input information loading layer comprises an optical frequency comb light source and an optical frequency comb spectral shaping unit. The comb source can be implemented in several ways:
- A distributed feedback (DFB) laser generates single-wavelength light, which a thin-film lithium niobate micro-ring modulator driven by a single-frequency microwave signal converts into a comb.
- A phase modulator (PM) followed by an intensity modulator (IM) generates the comb.
- Micro-rings of other materials (such as Si₃N₄ or AlGaAsOI) generate the comb.
- A multi-wavelength source with equally spaced frequencies is used directly.

The spectral shaping unit shapes the intensities of the comb teeth into the input vector of the neural network. A field programmable gate array (FPGA) module can do this by applying voltages through a digital-to-analog converter (DAC) to an array of thin-film lithium niobate micro-ring resonators. Each micro-ring corresponds to one comb tooth and sets that tooth's intensity. Other spectral shaping devices, such as a wavelength division multiplexer (WDM) + IM + WDM, can also be used.
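Spectral shaping can be pictured as one transmission factor per comb tooth. The sketch below assumes passive shaping (attenuation only) and known per-tooth powers, for example measured once after comb generation. Both are assumptions; the mapping from a transmission value to a micro-ring or WDM+IM drive voltage is device-specific and omitted. The function name `shaping_transmissions` is hypothetical.

```python
import numpy as np

def shaping_transmissions(comb_power, x):
    """Per-tooth power transmission mapping an uneven comb onto the input vector x.

    Passive shaping can only attenuate, so x is loaded up to a common scale factor:
    the largest scale at which every tooth can still reach its target power.
    Returns (transmissions in [0, 1], scale), with transmissions * comb_power == scale * x.
    """
    comb_power = np.asarray(comb_power, dtype=float)
    x = np.asarray(x, dtype=float)
    if np.any(x < 0):
        raise ValueError("intensity encoding requires non-negative inputs")
    active = x > 0
    if not np.any(active):
        return np.zeros_like(x), 1.0
    scale = float(np.min(comb_power[active] / x[active]))
    return scale * x / comb_power, scale

comb_power = np.array([1.0, 0.8, 1.2, 0.9])   # example tooth powers (mW)
x = np.array([0.2, 1.0, 0.5, 0.0])            # example input vector
t, scale = shaping_transmissions(comb_power, x)
```

Because intensities cannot be negative, only non-negative inputs are accepted here. Signed data needs the two-path, balanced-detection arrangement described later for the real-number domain.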

The convolutional layer is implemented by a single thin-film lithium niobate phase modulator driven by a high-speed radio frequency (RF) signal from an arbitrary waveform generator (AWG). The optical frequency comb carrying the input vector enters this phase modulator, where the convolution is performed. The repetition frequency of the AWG signal must equal that of the signal driving the thin-film lithium niobate micro-ring modulator, that is, the frequency spacing of the input vector. To produce the desired target convolution kernel, the AWG is controlled by the FPGA.
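The next sketch numerically illustrates why a phase modulator driven at the comb spacing performs a convolution. The modulator multiplies the field by h(t). Because h(t) repeats at the comb spacing, its Fourier coefficients k_n form a discrete kernel, and the output comb-line amplitudes equal the input amplitudes convolved with that kernel. The sketch assumes an ideal, lossless phase modulator and a baseband field. The two-harmonic drive waveform is hypothetical, chosen only for illustration, and is not the AWG signal used in the invention.

```python
import numpy as np

N = 256                       # samples per modulation period (numerical choice)
t = np.arange(N) / N          # time normalized to one period 1/f_rep

def line_amplitudes(field):
    """Comb-line amplitude at n*f_rep is FFT bin n (negative n wrap around)."""
    return np.fft.fft(field) / N

# Input vector on five comb teeth at n = 0..4 (example values), baseband around the carrier.
a = np.array([0.2, 1.0, 0.5, 0.7, 0.3])
e_in = np.exp(2j * np.pi * np.outer(t, np.arange(a.size))) @ a

# Hypothetical AWG drive: harmonics of f_rep only, so h(t) repeats at the comb spacing.
phi = 0.8 * np.cos(2 * np.pi * t) + 0.3 * np.cos(4 * np.pi * t + 0.5)
h = np.exp(1j * phi)          # ideal phase modulator: |h| = 1

K = 8                                                   # keep kernel taps n = -K..K
k_full = line_amplitudes(h)
kernel = np.concatenate([k_full[-K:], k_full[:K + 1]])  # ordered n = -K..K

pred = np.convolve(a, kernel)                           # output line n sits at index n + K
out = line_amplitudes(e_in * h)                         # what the modulator produces
measured = np.concatenate([out[-K:], out[:a.size + K]])

print(np.max(np.abs(measured - pred)))                  # only the truncated kernel tails differ
```

Since |h(t)| = 1, the squared magnitudes of the kernel taps sum to 1. In other words, the phase modulator redistributes power among comb lines rather than absorbing it. How many lines the kernel spans grows with the drive amplitude and with the highest harmonic the AWG can produce.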

The pooling layer is implemented by flat-top filters of a given bandwidth. The filter can consist of several groups of micro-rings. Each group can be a single micro-ring, or several micro-rings connected in series or in parallel, forming a flat-top passband of the required bandwidth. Each group in turn extracts the wavelengths that carry several adjacent convolution results, which realizes average pooling. The pooling layer can also be implemented by other flat-top filters of suitable bandwidth.
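In the intensity picture, a flat-top passband that drops w adjacent comb lines onto one photodetector sums their powers. Average pooling is then that sum divided by w. The sketch below assumes ideal rectangular passbands, non-overlapping groups, and equal detector responsivity across the band, all of which are simplifications. The function name `average_pool` is hypothetical.

```python
def average_pool(intensities, window):
    """Average pooling as performed by flat-top drop filters followed by photodetectors.

    Each group of `window` adjacent comb-line intensities is summed (what one PD
    measures behind one flat-top passband) and divided by the group size.
    Trailing lines that do not fill a complete group are dropped.
    """
    n_groups = len(intensities) // window
    return [sum(intensities[g * window:(g + 1) * window]) / window for g in range(n_groups)]

print(average_pool([1.0, 3.0, 2.0, 2.0, 5.0, 7.0], 2))
```

In hardware, the 1/w factor is just a constant gain. It can be absorbed into the following layer's weights.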

The nonlinear layer is implemented by active silicon-based micro-rings driven by photodetectors (PDs). The PDs are on-chip germanium-silicon photodetectors, and the active micro-rings are PN-doped micro-ring modulators. Each group of output wavelengths from the pooling layer enters a different PD, which converts the light intensity into a photocurrent. The photocurrents drive the respective active micro-rings to produce the nonlinearity.

The active micro-rings are cascaded along one waveguide. Its input light source is built in the same way as in the input information loading layer: an initial comb is generated and then shaped, so that all wavelengths entering the active micro-ring array have the same intensity. Each active micro-ring imprints its nonlinear response onto its corresponding wavelength.

The fully connected layer is composed of a silicon-based micro-ring array. On-chip beam splitters divide the pooling result (N wavelengths) equally into M copies. Each copy enters a weight bank of N cascaded micro-rings, in which each micro-ring sets the weight of one wavelength. A PD at the output sums the weighted intensities, realizing an M×N matrix-vector multiplication. Other schemes based on incoherent superposition of wavelength intensities, such as MZIs or phase-change materials, can also implement the fully connected layer.
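The incoherent micro-ring weight bank described above can be modeled as a matrix of transmissions in [0, 1] acting on wavelength intensities. The sketch below assumes an ideal lossless 1-to-M splitter and no inter-channel crosstalk. The function name `microring_fc` is hypothetical. Because passive micro-rings only attenuate, every weight here is non-negative; signed weights would need an additional scheme, such as balanced detection.

```python
import numpy as np

def microring_fc(x, W):
    """Idealized M x N matrix-vector product on a micro-ring weight bank.

    x : N non-negative wavelength intensities from the pooling layer.
    W : M x N micro-ring transmissions in [0, 1], one ring per wavelength per row.
    The 1/M factor models an ideal lossless 1-to-M power splitter.
    """
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    if np.any((W < 0) | (W > 1)):
        raise ValueError("a passive micro-ring transmission must lie in [0, 1]")
    M = W.shape[0]
    copies = np.tile(x / M, (M, 1))      # each row receives 1/M of every wavelength's power
    return (W * copies).sum(axis=1)      # one PD per row sums its weighted intensities

y = microring_fc([1.0, 2.0, 3.0], [[1.0, 0.0, 0.5], [0.5, 0.5, 0.5]])
```

The result equals (W @ x) / M. The splitting loss appears as a uniform 1/M scale that training can absorb.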

Further, the convolutional neural network converges through online training. The output port of the network is connected to the FPGA through an analog-to-digital converter (ADC). The FPGA controls the convolution kernel generated by the AWG and, through DACs, the micro-ring arrays, so it controls both the input information loading and the matrix of the fully connected layer; together these form a feedback loop. The DAC chip outputs analog voltages that control the thermal phase shifters and modulators. The ADC chip converts the analog electrical signals detected by the photodetectors into digital signals for further processing by the FPGA. Online training uses a gradient descent algorithm to adjust the network parameters.
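The text does not say how the FPGA obtains gradients. One common way to run gradient descent when the loss can only be measured is to perturb each parameter and difference the measured losses. The sketch below assumes central finite differences, which is a swapped-in technique and not necessarily the method of the invention. The callable `measure_loss` is a hypothetical stand-in for the hardware loop: write parameters through the DACs/AWG, read the PD outputs through the ADC, and compute the loss. Here it is replaced by a simulated quadratic.

```python
import numpy as np

def finite_difference_gd(measure_loss, params, lr=0.1, delta=1e-3, steps=200):
    """Gradient descent using only loss measurements (central differences).

    Each step costs 2 * len(params) measurements; returns final params and loss history.
    """
    params = np.array(params, dtype=float)
    history = []
    for _ in range(steps):
        grad = np.zeros_like(params)
        for i in range(params.size):
            e = np.zeros_like(params)
            e[i] = delta
            grad[i] = (measure_loss(params + e) - measure_loss(params - e)) / (2 * delta)
        params -= lr * grad
        history.append(measure_loss(params))
    return params, history

# Simulated "hardware": the measured loss is minimized at a hidden target setting.
target = np.array([0.3, -0.5, 0.8])

def simulated_loss(p):
    return float(np.sum((p - target) ** 2))

p_opt, hist = finite_difference_gd(simulated_loss, np.zeros(3))
```

The measurement count per step scales with the number of parameters. This is one reason gradient-free optimizers, such as the particle swarm algorithm mentioned below, are also attractive for hardware-in-the-loop training.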

Further, the network can perform convolution in the real-number domain. The input signal is split into two paths, and each path is convolved by its own thin-film lithium niobate PM. The two results are pooled separately and then subtracted by a balanced detector, completing a real-valued convolution.

Furthermore, the convolutional, pooling, nonlinear, and fully connected layers can be arranged, combined, and expanded arbitrarily. The light source is replenished at the nonlinear layer, where a fresh comb is supplied to the active micro-rings, so the network can be extended to more layers. This enables larger and more complex deep convolutional neural networks.

The optical frequency comb light source and the spectral shaping unit can be thin-film lithium niobate devices integrated on the hybrid platform, or discrete off-chip devices.

The thin-film lithium niobate modulators, silicon-based micro-rings, PDs, and other components are integrated on a hybrid silicon/thin-film lithium niobate platform. Thin-film lithium niobate implements the optical frequency comb source and the convolutional layer. Silicon implements the pooling, nonlinear, and fully connected layers.

The FPGA module comprises an FPGA chip, a DAC chip, and an ADC chip. The FPGA chip controls the DAC and ADC chips and executes basic arithmetic and the particle swarm algorithm. The DAC chip outputs analog voltages that control the thermal phase shifters and modulators. The ADC chip converts the analog electrical signals detected by the photodetectors into digital signals for further processing by the FPGA.

The detector array may be an on-chip germanium-silicon PIN photodetector or an off-chip discrete III-V PIN photodetector that functions to convert output light intensity information into voltage information.

Further, in practical use the particle swarm algorithm optimizes the voltages applied to the heater electrodes so as to minimize the loss function. Its application is not limited to optical neural networks.
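A minimal particle swarm optimizer over heater voltages is sketched below. The inertia and acceleration coefficients (w, c1, c2), swarm size, and voltage limits are illustrative assumptions rather than values from the invention. The loss is a simulated stand-in for a hardware measurement, and the function name `pso` is hypothetical. The swarm only needs loss values, not gradients, which suits a loop where each evaluation means setting voltages and reading detectors.

```python
import numpy as np

def pso(loss, lower, upper, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best particle swarm optimizer over a box of electrode voltages."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()                                   # best position seen by each particle
    pbest_val = np.array([loss(p) for p in pos])
    g = pbest[np.argmin(pbest_val)].copy()               # best position seen by the swarm
    g_val = float(pbest_val.min())
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lower, upper)           # keep voltages within safe limits
        vals = np.array([loss(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        if pbest_val.min() < g_val:
            g_val = float(pbest_val.min())
            g = pbest[np.argmin(pbest_val)].copy()
    return g, g_val

# Simulated loss: minimized at hypothetical optimal heater voltages.
v_target = np.array([1.2, 2.5, 0.7])

def heater_loss(v):
    return float(np.sum((v - v_target) ** 2))

v_best, best_loss = pso(heater_loss, lower=[0.0, 0.0, 0.0], upper=[3.0, 3.0, 3.0])
```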

In general, compared with the prior art, the above technical solutions conceived by the invention achieve the following beneficial effects:

1. Conventional schemes convert convolution into matrix multiplication, which introduces redundancy. In contrast, the proposed optical convolutional neural network makes full use of existing wavelength resources and avoids performing complex Fourier transforms on an electronic computer.

2. The prior art suffers from a large spatial footprint and poor scalability. In the proposed network, the size of the input vector depends only on the number of comb wavelengths, and the size of the convolution kernel depends only on the AWG sampling rate and drive power.

3. The proposed network combines the lithium niobate and silicon platforms. Some modules can be integrated monolithically or in hybrid form, and others can be used as discrete modules, so each material is chosen for the functions it performs best.

Drawings

Fig. 1 is a schematic structural diagram of the optical convolutional neural network based on a hybrid of thin-film lithium niobate and silicon according to an embodiment of the invention.

Fig. 2 is a schematic diagram of how the nonlinear layer is implemented in the optical convolutional neural network based on a hybrid of a thin-film lithium niobate chip and a silicon-based chip according to an embodiment of the invention. In (a), the input voltage of the active micro-ring modulator is below the threshold voltage; in (b), it is above the threshold voltage.

Fig. 3 is a schematic diagram of the convolution architecture of the optical convolutional neural network based on a hybrid of thin-film lithium niobate and silicon according to an embodiment of the invention.

Fig. 4 compares experimental and target results for several sets of convolution calculations performed with the optical convolutional neural network based on a hybrid of thin-film lithium niobate and silicon according to an embodiment of the invention. Panels (a) and (e) are the input vectors. Panels (b) and (f) compare the convolution kernels obtained by experimental training with the target kernels. Panels (c) and (g) compare the experimental convolution results with the target results. Panels (d) and (h) are the spectra of the convolution results.

Fig. 5 is a schematic diagram of the real-number-domain convolution architecture of the optical convolutional neural network based on a hybrid of thin-film lithium niobate and silicon according to an embodiment of the invention.

Fig. 6 is the experimental architecture of the optical convolutional neural network based on a hybrid of thin-film lithium niobate and silicon according to an embodiment of the invention.

Fig. 7 shows the results of the optical convolutional neural network based on a hybrid of thin-film lithium niobate and silicon according to an embodiment of the invention on the MNIST handwritten digit dataset. They are compared with simulations of the same architecture on an electronic computer. Panels (a) and (b) are the iteration curve and confusion matrix of online training for binary classification, and (c) and (d) are the same for four-class classification. Panels (e) and (f) are the iteration curve and confusion matrix of binary classification on the electronic computer, and (g) and (h) are the same for four-class classification.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not interfere with each other.

As shown in Fig. 1, the optical convolutional neural network based on a hybrid of a thin-film lithium niobate chip and a silicon-based chip according to an embodiment of the invention comprises an input information loading layer 1, a convolutional layer 2, a pooling layer 3, a nonlinear layer 4, and a fully connected layer 5.

In the input information loading layer 1, the optical frequency comb light source generates a multi-wavelength comb. The input information is loaded by the optical frequency comb spectral shaping unit, which the FPGA controls through a DAC.

In the convolutional layer 2, a thin-film lithium niobate phase modulator convolves the comb carrying the input information in the frequency domain. The convolution kernel is applied to the PM as a signal generated by the AWG.

In the pooling layer 3, silicon-based passive micro-rings filter the convolution results group by group, completing the average pooling operation.

In the nonlinear layer 4, PDs detect the pooling results and convert them into electrical signals that drive silicon-based active micro-rings. Another set of DFB lasers and a thin-film lithium niobate micro-ring produce a new optical frequency comb, which is fed into the micro-rings so that the nonlinear operation can be performed.

In the fully connected layer 5, the comb produced by the nonlinear operation enters a micro-ring array for matrix-vector multiplication. The FPGA controls the micro-ring weights, and PDs at the micro-ring drop ports perform the final detection, completing the fully connected operation.

Finally, the output is fed back to the FPGA through the ADC, and a gradient descent algorithm adjusts the network parameters, realizing online training of the optical convolutional neural network.

Fig. 2 shows how the nonlinear layer 4 is realized in the optical convolutional neural network based on a hybrid of thin-film lithium niobate chips and silicon-based chips. V_M is the input voltage of the active micro-ring modulator (MRM), which drives a forward-biased PN junction. Assume the resonant wavelength λ_res of the MRM is initially aligned with the wavelength λ_laser of the supplied light.

As shown in Fig. 2(a), when V_M is below the threshold voltage V_TH (corresponding to the built-in potential of the micro-ring's PN junction), the PN junction stays off and no carriers are injected. λ_res therefore remains aligned with λ_laser. The output optical power P_out stays low because the MRM's notch response suppresses the supplied light.

As shown in Fig. 2(b), when the drive current is large enough for V_M to exceed V_TH, the PN junction turns on. The injected carriers change the refractive index of the waveguide in the junction, so λ_res shifts and the output optical power increases.
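The threshold behavior in Fig. 2 can be sketched with a Lorentzian notch whose resonance starts to shift once the drive voltage exceeds V_TH. Several elements of this sketch are assumptions, not device data: the linear optical-power-to-voltage conversion, the linear shift above threshold, and every numerical parameter. A real forward-biased junction responds more gradually. The function name `mrm_activation` is hypothetical.

```python
def mrm_activation(p_in, v_th=0.7, gain=1.0, shift_per_volt=0.2, fwhm=0.1, depth=0.95, p_laser=1.0):
    """Output power of an idealized PD-driven MRM nonlinear unit versus detected input power.

    p_in           : optical power reaching the PD (mW)
    gain           : PD + load conversion from optical power to drive voltage V_M (V/mW)
    v_th           : threshold voltage V_TH, roughly the PN-junction built-in potential (V)
    shift_per_volt : resonance shift per volt above threshold (nm/V)
    fwhm, depth    : Lorentzian notch width (nm) and fraction of power removed on resonance
    p_laser        : power of the supplied comb line, initially on resonance (mW)
    """
    v_m = gain * p_in
    detuning = shift_per_volt * max(0.0, v_m - v_th)    # no carrier injection below V_TH
    transmission = 1.0 - depth / (1.0 + (2.0 * detuning / fwhm) ** 2)
    return p_laser * transmission

for p in (0.0, 0.5, 0.8, 1.0, 1.5, 3.0):
    print(p, round(mrm_activation(p), 3))
```

The output stays at the notch floor until the threshold and then rises toward full transmission. This produces a ReLU-like response that saturates at high input power.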

Fig. 3 illustrates the principle of optical frequency flow convolution based on a high-speed lithium niobate modulator. In the experiment, a laser generates single-wavelength light, and a thin-film lithium niobate IM and PM turn it into an optical frequency comb. A programmable optical filter (WaveShaper, WS) then sets the power of each input wavelength to load the input information. A thin-film lithium niobate PM performs the convolution, with the convolution kernel supplied by an RF signal from the AWG. Let the input light be C_in(ω), the modulation function (i.e., the convolution kernel) be K(ω), and the output light be C_out(ω). The convolution calculation can then be expressed as:

K(ω) is the Fourier transform of the phase modulation function h(t), expressed as:

where a_m is the amplitude of the m-th order component of the modulation signal and J_n is the n-th order Bessel function of the first kind. In the frequency domain, the equation can be rewritten in the form of a convolution:
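For reference, below is the standard Jacobi-Anger derivation for a phase modulator driven by a periodic signal, written with the symbols used above. It is a sketch under stated assumptions and may differ in convention from the equations of the original filing. It assumes an ideal phase modulator and a drive made of harmonics of the comb spacing Ω with amplitudes a_m. The harmonic phases θ_m and line amplitudes c_p, k_q are notation added here.

```latex
% Time domain: the phase modulator multiplies the comb field by h(t)
c_{\mathrm{out}}(t) = c_{\mathrm{in}}(t)\,h(t), \qquad
h(t) = \exp\!\Big[\, i \sum_{m=1}^{M} a_m \cos\!\big(m\Omega t + \theta_m\big) \Big]

% Jacobi-Anger expansion applied to each harmonic
e^{\,i a \cos x} = \sum_{n=-\infty}^{\infty} i^{\,n} J_n(a)\, e^{\,i n x}
\;\;\Rightarrow\;\;
h(t) = \sum_{q=-\infty}^{\infty} k_q\, e^{\,i q \Omega t}, \qquad
k_q = \sum_{n_1 + 2n_2 + \dots + M n_M = q} \; \prod_{m=1}^{M} i^{\,n_m} J_{n_m}(a_m)\, e^{\,i n_m \theta_m}

% Spectra: K(omega) is a comb of weights k_q; the input comb carries the input vector c_p
K(\omega) \propto \sum_{q} k_q\, \delta(\omega - q\Omega), \qquad
C_{\mathrm{in}}(\omega) = \sum_{p} c_p\, \delta(\omega - \omega_0 - p\Omega)

% Product in time = convolution in frequency = discrete convolution of comb-line amplitudes
C_{\mathrm{out}}(\omega) \propto (C_{\mathrm{in}} * K)(\omega)
\;\;\Longleftrightarrow\;\;
c^{\mathrm{out}}_r = \sum_{p} c_p\, k_{r-p}
```

For a single-tone drive (M = 1), this reduces to k_q = i^q J_q(a_1) e^{iqθ_1}, the familiar Bessel-weighted sidebands of phase modulation.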

To obtain a target convolution kernel, single-wavelength light can be fed directly into the thin-film lithium niobate PM. Convolving a single comb line with the kernel reproduces the kernel itself, so the output spectrum directly shows the current kernel. The output is converted into an electrical vector by a filter and a PD and passed to the FPGA. The FPGA uses a gradient descent algorithm to adjust the AWG in a feedback loop until the iteration converges, at which point the output is the target convolution kernel. The WS can then be adjusted to load any input vector into the PM, completing an arbitrary convolution with that kernel.

Fig. 4 shows the results of two sets of convolution experiments. We first obtained the convolution kernels shown in Fig. 4(b) and (f) using the online training method above, where the correlation coefficient is defined as:

Then, two input optical frequency combs (Fig. 4(a) and (e)) were fed in to obtain the two corresponding convolution results (Fig. 4(c) and (g)). As shown in Fig. 4(d) and (h), the two sets of convolutions were performed at frequency spacings of 4 GHz and 8 GHz, respectively.
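The correlation coefficient used to compare a trained kernel with its target is not reproduced above. The sketch below assumes it is the standard Pearson correlation coefficient, which is an assumption. Pearson correlation ignores overall scale and offset, which is convenient when optical powers carry an arbitrary detection gain. The function name `correlation_coefficient` is hypothetical.

```python
import math

def correlation_coefficient(measured, target):
    """Pearson correlation between a measured vector and a target vector."""
    n = len(measured)
    if n != len(target) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx = sum(measured) / n
    my = sum(target) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, target))
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in target)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)

print(correlation_coefficient([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```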

Fig. 5 shows the architecture for real-number-domain calculation based on optical frequency flow convolution with a high-speed thin-film lithium niobate modulator. Image processing and similar fields often require real-valued convolution kernels, which contain negative weights, for operations such as edge extraction. We therefore slightly modify the architecture of Fig. 3 so that it can perform real-domain convolution.

The data loading part is the same as in Fig. 3. Only in the convolution part is the input comb split into two paths. The upper and lower lithium niobate PMs convolve it with the positive and negative parts of the kernel, respectively. Each result is filtered by a micro-ring filter, and a balanced detector subtracts the two, completing the real-domain convolution.

When training a real-domain kernel, we do not need to know the kernel that the AWG signal produces on each individual lithium niobate PM. We only need to make the difference of the two paths approach the target real-domain kernel.
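The two-path scheme can be illustrated by splitting a real kernel into non-negative positive and negative parts and subtracting the two path outputs. The sketch treats each path as a linear, non-negative kernel acting on intensities. This is an idealization of the actual complex-field phase-modulator convolution followed by detection. The edge-detection kernel and input are example values, and the function names are hypothetical.

```python
import numpy as np

def split_real_kernel(k):
    """Split a real kernel into non-negative parts with k == k_pos - k_neg."""
    k = np.asarray(k, dtype=float)
    return np.clip(k, 0.0, None), np.clip(-k, 0.0, None)

def balanced_convolution(x, k_pos, k_neg):
    """Idealized two-path convolution: upper PM path minus lower PM path."""
    upper = np.convolve(x, k_pos)   # path through the upper PM, detected
    lower = np.convolve(x, k_neg)   # path through the lower PM, detected
    return upper - lower            # balanced photodetector output

x = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])   # example input with two edges
k = np.array([1.0, 0.0, -1.0])                       # example edge-detection kernel
k_pos, k_neg = split_real_kernel(k)
y = balanced_convolution(x, k_pos, k_neg)
```

The split is not unique. Adding the same non-negative offset to both paths leaves the difference unchanged, which is consistent with training only the difference rather than each path's kernel.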

Fig. 6 shows the experimental architecture of the optical convolutional neural network based on a hybrid of thin-film lithium niobate and silicon. The input information loading layer 1 and the convolutional layer 2 have the same structure as in Fig. 3. The pooling layer 3 completes average pooling with flat-top filters of a given bandwidth, each formed by two micro-rings connected in parallel. Because of limitations of the experimental equipment, the subsequent nonlinear and fully connected units are implemented in the electrical domain.

Here we mainly demonstrate classification on the MNIST handwritten digit dataset. Each 28×28 image is compressed to 5×5, flattened into a 25×1 vector, and loaded onto the input optical frequency comb. The thin-film lithium niobate PM performs the convolution, and pooling by the silicon-based micro-ring filters yields an 8×1 vector (binary classification) or a 16×1 vector (four-class classification). After PD detection, the signals pass through a nonlinear activation function (NAF) module and a fully connected (FC) module and are fed back to the FPGA. The FPGA then controls the AWG and the WS with a gradient descent algorithm to perform online training.

With 30 images as the training set, the iteration processes are shown in Fig. 7(a) (binary classification) and Fig. 7(c) (four-class classification). After training, the confusion matrices obtained with 200 images as the test set are shown in Fig. 7(b) (binary) and Fig. 7(d) (four-class). The accuracies are 95.5% and 71.5%, respectively. These are close to the accuracies of 97.5% and 75.5% obtained by training and testing the same single-layer structure on an electronic computer (Fig. 7(e)-(h)).
These results show that the optical convolutional neural network based on a hybrid of thin-film lithium niobate and silicon is fully feasible.

The invention provides an optical convolutional neural network based on a hybrid of thin-film lithium niobate and silicon, whose convolution scale is not limited. Frequency flow convolution avoids the data redundancy of conventional matrix multiplication. The input data make full use of existing wavelength resources, and complex Fourier transforms on an electronic computer are avoided. In the fully on-chip, hybrid-integrated optical convolutional neural network built this way, thin-film lithium niobate and silicon are each used for the functions they perform best. This enables TOPS-level computing power with mW-level power consumption. The invention lays a foundation for future heterogeneous integrated optical computing architectures.

It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. An optical convolutional neural network based on a mixture of thin film lithium niobate and silicon, characterized by comprising an input information loading layer, a convolutional layer, a pooling layer, a nonlinear layer, and a fully connected layer;

the input information loading layer is used for encoding input information onto the optical frequency comb teeth through spectral shaping, the convolution layer is used for completing convolution calculation by using the thin film lithium niobate phase modulator, the pooling layer is used for carrying out average pooling on convolution results, the nonlinear layer is used for carrying out nonlinear operation on the average pooling results, and the full-connection layer is used for completing matrix vector multiplication.

2. The optical convolutional neural network based on a mixture of thin film lithium niobate and silicon according to claim 1, wherein the input information loading layer comprises an optical frequency comb light source for generating a frequency comb and an optical frequency comb spectral shaping module for shaping the intensity of the generated optical frequency comb teeth into an input vector of the neural network.

3. An optical convolutional neural network based on a mixture of thin film lithium niobate and silicon according to claim 1, wherein the convolutional layer comprises a single thin film lithium niobate phase modulator driven by a high-speed radio frequency signal generated by an AWG; the optical frequency comb loaded with the input vector enters the thin film lithium niobate phase modulator driven by the AWG signal to perform the convolution calculation, and the repetition frequency of the AWG signal is required to equal the repetition frequency of the driving signal of the input information loading layer, i.e., the frequency spacing of the input vector.

4. The optical convolutional neural network based on the thin film lithium niobate and silicon mixture as claimed in claim 1, wherein the pooling layer comprises a plurality of groups of micro-ring arrays, each group being a single micro-ring or a plurality of micro-rings connected in series or in parallel to form a flat-top filter with a given bandwidth, so that each group sequentially filters out the wavelengths carrying adjacent convolution results, thereby achieving average pooling.

5. An optical convolutional neural network based on a thin film lithium niobate and silicon mixture according to claim 1, wherein the nonlinear layer comprises an active silicon-based micro-ring, the active silicon-based micro-ring is driven by a PD, each set of output wavelengths of the pooling layer enters a different PD to convert light intensity information into photocurrents, the photocurrents respectively drive a plurality of active silicon-based micro-rings to generate nonlinearities, and each active micro-ring loads a nonlinear response onto a corresponding wavelength.

6. The optical convolutional neural network based on the mixture of thin film lithium niobate and silicon according to claim 1, wherein the fully connected layer comprises M×N silicon-based micro-rings; the N wavelengths output by the pooling layer are equally divided into M parts, each part enters a weight bank of N cascaded silicon-based micro-rings, each micro-ring sets the weight of one wavelength, and a PD sums the output intensities to realize M×N matrix-vector multiplication.

7. The optical convolutional neural network based on the mixture of thin film lithium niobate and silicon according to claim 1, wherein the convolutional neural network utilizes on-line training to complete the convergence of the network, the output port of the network is connected with an FPGA through an ADC, the FPGA controls the convolutional vector generated by the AWG, and is connected with a silicon-based micro-ring array through a DAC, so as to control the input information loading and the matrix of the full connection layer, thereby forming a feedback system.

8. An optical convolutional neural network based on a mixture of thin film lithium niobate and silicon according to claim 3, wherein the input signal is divided into two paths, each path completes a convolution calculation with its own thin film lithium niobate phase modulator, and the results are pooled separately and then subtracted by a balanced detector to complete the convolution calculation in the real-number domain.

9. The optical convolutional neural network based on the mixture of thin film lithium niobate and silicon according to claim 1, wherein the convolutional layer, the pooling layer, the nonlinear layer and the full-connection layer of the network are expanded by arbitrary permutation and combination or arbitrary cascade, so as to realize the layer number expansion of the deep convolutional neural network, thereby realizing the convolutional neural network with larger scale and more complexity.