CN113869508A - Optical neural network training method, system, storage medium and equipment - Google Patents

Optical neural network training method, system, storage medium and equipment

Info

Publication number
CN113869508A
CN113869508A (application number CN202111035185.6A)
Authority
CN
China
Prior art keywords
parameter
neural network
samples
loss function
optical neural
Prior art date
Legal status
Pending
Application number
CN202111035185.6A
Other languages
Chinese (zh)
Inventor
黄萍
吴睿振
陈静静
王凛
Current Assignee
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Original Assignee
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority date
Filing date
Publication date
Application filed by Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority to CN202111035185.6A
Publication of CN113869508A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/067 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using optical means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

The invention provides an optical neural network training method, system, storage medium and equipment. The method comprises the following steps: inputting a plurality of samples in a training data set into an optical neural network whose parameter takes an initial value in the first iteration, to obtain a plurality of output data; obtaining a loss function based on the plurality of output data and the plurality of labels in the training data set corresponding to the plurality of samples; obtaining the derivative of the loss function with respect to the parameter, based on the plurality of samples, through a zeroth-order gradient estimation algorithm; obtaining the value of the parameter for the next iteration from the derivative of the loss function and the parameter through a first-order optimization algorithm, and inputting the plurality of samples into the optical neural network with the parameter set to that value; and, if the current iteration is the last one, obtaining the updated parameter from the derivative of the loss function and the parameter of the last iteration through the first-order optimization algorithm, and obtaining the trained optical neural network based on the updated parameter. The invention improves the inference accuracy and robustness of the optical neural network.

Description

Optical neural network training method, system, storage medium and equipment
Technical Field
The present invention relates to the field of optical neural network technology, and in particular, to an optical neural network training method, system, storage medium, and device.
Background
In recent years, demand for optical computing technology has grown rapidly, for two reasons. First, as Moore's law gradually breaks down while the big-data era keeps raising the requirements on the power consumption and speed of computing systems, the high speed and low power consumption of optical computing have drawn increasing attention. Second, the inherent parallelism of optical computing, together with the development of algorithms and hardware architectures such as the optical neural network, offers the most promising answer to the computing-power demands of artificial-intelligence applications such as image recognition, speech recognition, and virtual reality. Optical computing can be divided into analog optical computing and digital optical computing. The most typical example of analog optical computing is the Fourier transform: Fourier-related computations such as convolution are needed in fields such as image processing. Computing a Fourier transform on a conventional computer is very expensive, whereas the passage of light through a lens is itself a Fourier-transform process that takes almost no time at all. Digital optical computing combines light with optical devices to form classical logic gates, builds a computing system similar in principle to conventional digital electronics, and realizes computation through combinations of logic-gate operations.
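To make the lens analogy concrete, the following minimal numpy sketch (our illustration of the convolution theorem, not part of the patent) verifies that a circular convolution computed directly agrees with the pointwise product of Fourier transforms, which is the operation a lens performs at the speed of light:

```python
import numpy as np

# Illustration only: the convolution theorem behind lens-based analog
# optical computing. Circular convolution in the spatial domain equals a
# pointwise product in the Fourier domain.
rng = np.random.default_rng(0)
n = 16
image = rng.standard_normal((n, n))
kernel = rng.standard_normal((n, n))

# Fourier route: two FFTs, one pointwise product, one inverse FFT.
conv_fft = np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel)))

# Direct circular convolution for comparison (far more expensive).
flipped = kernel[::-1, ::-1]
direct = np.zeros_like(image)
for u in range(n):
    for v in range(n):
        direct[u, v] = np.sum(image * np.roll(np.roll(flipped, u + 1, 0), v + 1, 1))

assert np.allclose(conv_fft, direct)
```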
Conventional ONN (Optical Neural Network) training trains the model on a pure software engine, learns the model parameters with the back-propagation (BP) algorithm, and then maps the trained model onto optical devices for inference through SVD of the weight matrices and parameterization over an MZI (Mach-Zehnder interferometer) topology. The advantage of this approach is that the weight matrices can be trained in a noise-free environment using existing deep-learning toolkits. However, it suffers from the following drawbacks in efficiency, performance, and robustness:
1) Pure software-based ONN training is limited by digital-computer performance; considering also the computational cost of matrix decomposition and parameterization, digital computers are inefficient at simulating ONN architectures.
2) Optical devices introduce noise due to manufacturing variations, control inaccuracies, thermal effects, and so on; for lack of accurate modeling of these non-idealities, purely software-trained ONN models suffer severe performance degradation and poor robustness.
Disclosure of Invention
In view of the above, an objective of the present invention is to provide an optical neural network training method, system, storage medium and device, so as to solve the problems of low performance and poor robustness of an optical neural network trained in the prior art.
Based on the above purpose, the present invention provides an optical neural network training method, which comprises the following steps:
inputting a plurality of samples in a training data set into an optical neural network with a parameter of a first iteration as an initial value to obtain a plurality of output data corresponding to the plurality of samples respectively;
obtaining a loss function based on the plurality of output data and a plurality of labels in the training data set corresponding to the plurality of samples respectively;
obtaining a derivative of a loss function for the parameter based on the plurality of samples by a zeroth order gradient estimation algorithm;
obtaining a numerical value of a parameter of the next iteration based on a derivative of the loss function and the parameter through a first-order optimization algorithm, and inputting a plurality of samples into an optical neural network with the parameter being the numerical value;
and in response to the current iteration being the last one, obtaining the updated parameter through the first-order optimization algorithm based on the derivative of the loss function and the parameter of the last iteration, and obtaining a trained optical neural network based on the updated parameter.
In some embodiments, deriving the derivative of the loss function for the parameter based on the plurality of samples by a zeroth order gradient estimation algorithm comprises:
the derivative of the loss function with respect to the parameter is obtained based on a plurality of samples by a coordinate-wise gradient estimation algorithm.
In some embodiments, the first-order optimization algorithm comprises a stochastic gradient descent algorithm.
In some embodiments, obtaining the value of the parameter for the next iteration based on the derivative of the loss function and the parameter by a first order optimization algorithm comprises:
and obtaining the numerical value of the parameter of the next iteration through a random gradient descent algorithm based on the derivative of the loss function, the parameter and a preset learning rate.
In some embodiments, deriving the loss function based on the plurality of output data and a plurality of labels in the training data set corresponding to the plurality of samples, respectively, includes:
and calculating cross entropy loss functions among a plurality of output data and a plurality of labels respectively corresponding to the plurality of samples in the training data set, and taking the cross entropy loss functions as loss functions.
In some embodiments, inputting the plurality of samples in the training data set into the optical neural network for the first iteration with the parameter as the initial value comprises:
and extracting a part of data in the data set as a training data set, and inputting a plurality of samples in the training data set into the optical neural network with the initial value of the parameter of the first iteration.
In some embodiments, the method further comprises:
and extracting another part of data in the data set as a test data set, and testing the trained optical neural network by using the test data set.
In another aspect of the present invention, there is also provided an optical neural network training system, including:
the output data acquisition module is configured to input a plurality of samples in the training data set into the optical neural network with the parameter of the first iteration as an initial value so as to obtain a plurality of output data corresponding to the plurality of samples respectively;
a loss function obtaining module configured to obtain a loss function based on the plurality of output data and a plurality of labels in the training data set, the labels corresponding to the plurality of samples, respectively;
a derivative obtaining module configured to obtain a derivative of a loss function with respect to the parameter based on a plurality of samples by a zeroth order gradient estimation algorithm;
the parameter updating module is configured to obtain a numerical value of a parameter of the next iteration based on a derivative of the loss function and the parameter through a first-order optimization algorithm, and input a plurality of samples into the optical neural network with the parameter as the numerical value; and
and the training completion module is configured to, in response to the current iteration being the last one, obtain the updated parameter through the first-order optimization algorithm based on the derivative of the loss function and the parameter of the last iteration, and obtain the trained optical neural network based on the updated parameter.
In yet another aspect of the present invention, there is also provided a computer readable storage medium storing computer program instructions which, when executed by a processor, implement any one of the methods described above.
In yet another aspect of the present invention, a computer device is provided, which includes a memory and a processor, the memory storing a computer program, the computer program executing any one of the above methods when executed by the processor.
The invention has at least the following beneficial technical effects:
the optical neural network training method provided by the embodiment of the invention utilizes the characteristics of low delay, high bandwidth and high energy efficiency of an optical device, avoids a large amount of photoelectric conversion or complex derivation operation on the optical device, and realizes the training of the all-optical neural network; meanwhile, the influence of partial noise can be compensated by directly training the model based on the optical device, and the inference accuracy and robustness of the optical neural network are improved.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an optical neural network training method provided in an embodiment of the present invention;
FIG. 2 is a schematic diagram of an optical neural network training system provided in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of a computer-readable storage medium for implementing an optical neural network training method according to an embodiment of the present invention;
fig. 4 is a schematic hardware structure diagram of a computer device for executing the optical neural network training method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used for distinguishing two non-identical entities with the same name or non-identical parameters; "first" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention. Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In view of the above-mentioned objectives, a first aspect of the embodiments of the present invention provides an embodiment of an optical neural network training method. Fig. 1 is a schematic diagram illustrating an embodiment of an optical neural network training method provided by the present invention. As shown in fig. 1, the embodiment of the present invention includes the following steps:
step S10, inputting a plurality of samples in the training data set into the optical neural network with the parameter of the first iteration as an initial value to obtain a plurality of output data corresponding to the plurality of samples respectively;
step S20, obtaining a loss function based on the plurality of output data and a plurality of labels in the training data set corresponding to the plurality of samples, respectively;
step S30, obtaining a derivative of a loss function related to the parameter based on a plurality of samples through a zeroth order gradient estimation algorithm;
step S40, obtaining the numerical value of the parameter of the next iteration through a first-order optimization algorithm based on the derivative of the loss function and the parameter, and inputting a plurality of samples into the optical neural network with the parameter as the numerical value;
and step S50, in response to the current iteration being the last one, obtaining the updated parameter through the first-order optimization algorithm based on the derivative of the loss function and the parameter of the last iteration, and obtaining a trained optical neural network based on the updated parameter.
In this embodiment, two schemes address the noise introduced by optical devices in practical applications. One is to model and compensate the noise, but existing compensation methods are highly redundant and not suitable for large-scale MZI (Mach-Zehnder interferometer) arrays. The other is compensation through self-reconfiguration, that is, model training is performed directly on the optical device: the original model parameters are mapped to the parameters of the MZI array, and training and inference run directly on the noisy device, so that the influence of part of the noise cancels out. Therefore, given the low latency, high bandwidth, and high energy efficiency of optical devices, ONN training is carried out directly on the optical device; that is, the gradient-descent optimization process is realized on the optical device itself, making full use of the ultrafast photonic chip to accelerate training. Since the parameterization process is differentiable, by the chain rule:
$$\frac{\partial L}{\partial \Phi}=\frac{\partial L}{\partial W}\cdot\frac{\partial W}{\partial \Phi}$$

where $W(\Phi)$ denotes the weight matrix realized by the MZI phase parameterization.
Representing all programmable MZI phases as $\Phi$, training the ONN means continually updating $\Phi$ according to the following gradient-descent equation so as to minimize the loss function, where $\alpha$ is the learning rate and $\partial L/\partial\Phi$ is the first derivative of the loss function with respect to the parameter:

$$\Phi^{(t+1)}=\Phi^{(t)}-\alpha\,\frac{\partial L}{\partial\Phi^{(t)}}$$
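Written out as code, this is the familiar gradient-descent step on the phase vector. The sketch below is purely illustrative: a toy quadratic loss with a closed-form gradient stands in for the ONN loss, which on real hardware is available only through function queries (the point of the zeroth-order methods introduced next).

```python
import numpy as np

# Illustrative only: gradient descent on a phase vector Phi, with a toy
# quadratic loss L(Phi) = ||Phi - Phi*||^2 standing in for the ONN loss.
alpha = 0.1                            # learning rate (alpha above)
rng = np.random.default_rng(1)
phi = rng.standard_normal(8)           # programmable MZI phases Phi
phi_star = np.zeros(8)                 # toy optimum

for _ in range(100):
    grad = 2.0 * (phi - phi_star)      # dL/dPhi for the toy loss
    phi = phi - alpha * grad           # Phi <- Phi - alpha * dL/dPhi

print(np.linalg.norm(phi - phi_star))  # close to 0: converged
```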
because the operation types which can be realized by the optical device are limited so far, the operation complexity of the first derivative with gradient reduction is extremely high and is difficult to realize on the optical device, if the derivative is calculated based on the traditional electric signal, model parameter updating is carried out on the optical device, and a large amount of photoelectric conversion is needed, the zero-order optimization (ZOO) operation is proposed to replace the first derivative operation, and the training of the all-optical neural network is completed based on the operation which can be realized on the optical device by sampling, matrix multiplication, summation and the like.
Zeroth-order optimization (ZOO) is a family of optimization methods developed for model training when the relation between the loss function and the trainable parameters is unknown (e.g., reinforcement learning, or black-box attacks on DNNs). ZOO performs gradient-based parameter updates using only the outputs of the neural network; it can handle higher-dimensional problems than traditional black-box methods (e.g., Bayesian optimization), and it can be integrated with state-of-the-art first-order optimization algorithms by substituting the zeroth-order gradient estimate for the true gradient. Common ZOO algorithms include the following:
1) Random gradient estimation (RandGradEst)

$$\hat{\nabla}_{\Phi}L=\frac{d}{S\mu}\sum_{i=1}^{S}\big[L(\Phi+\mu u;\,x_{i})-L(\Phi;\,x_{i})\big]\,u$$

where $L$ is the loss function of the optical neural network, $\Phi$ is the model parameter, $\hat{\nabla}_{\Phi}L$ is the zeroth-order approximation of the first derivative, $d$ is the number of optimization variables, $S$ is the batch size, $\mu>0$ is the smoothing parameter, and $u$ is randomly sampled from the uniform distribution on the unit sphere. The training samples of one batch share a single $u$.
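A minimal Python sketch of this estimator follows; the function name and the assumption that loss(phi) already averages over the batch are ours, not the patent's:

```python
import numpy as np

def rand_grad_est(loss, phi, mu=1e-2, rng=None):
    """Sketch of the random gradient estimator (RandGradEst) above.

    loss(phi) is treated as a black box and is assumed to already average
    over the S samples of the batch; the whole batch shares one direction
    u, as stated in the text. Two function queries per estimate.
    """
    rng = rng or np.random.default_rng()
    d = phi.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)          # uniform direction on the unit sphere
    return (d / mu) * (loss(phi + mu * u) - loss(phi)) * u
```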
2) Averaged random gradient estimation (AvgGradEst)

$$\hat{\nabla}_{\Phi}L=\frac{d}{S\mu q}\sum_{i=1}^{S}\sum_{j=1}^{q}\big[L(\Phi+\mu u_{j};\,x_{i})-L(\Phi;\,x_{i})\big]\,u_{j}$$

A sampling factor $q$ is introduced, and each parameter update is based on the average over $q$ sampled directions $u_{j}$. Compared with random gradient estimation, $q$ times as many samples are needed and a single iteration is more expensive, but the variance of the gradient estimate is smaller, convergence is faster, and the final converged accuracy is higher.
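A corresponding sketch of the averaged estimator, under the same assumptions as above (names are ours, and loss(phi) is batch-averaged):

```python
import numpy as np

def avg_grad_est(loss, phi, q=10, mu=1e-2, rng=None):
    """Sketch of AvgGradEst: mean of q random-direction estimates.

    Costs q extra function queries per update but has lower variance than
    a single random estimate, as the text notes.
    """
    rng = rng or np.random.default_rng()
    d = phi.size
    base = loss(phi)                 # one shared query at phi
    est = np.zeros(d)
    for _ in range(q):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)       # uniform direction on the unit sphere
        est += (d / mu) * (loss(phi + mu * u) - base) * u
    return est / q
```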
Training directly on the optical device with the back-propagation algorithm would require a large number of matrix multiplications, especially when the model is large. The zeroth-order optimization method avoids the expensive derivative computations of traditional back-propagation and estimates the first derivative using only operations such as sampling, function queries, and simple matrix multiplications. Parameter updates for the optical neural network can be realized by combining any zeroth-order gradient estimation algorithm with a first-order optimization algorithm (e.g., GD or SGD).
The optical neural network training method provided by the embodiments of the invention exploits the low latency, high bandwidth, and high energy efficiency of optical devices, avoids massive optoelectronic conversion and complex derivative computations on the optical device, and realizes training of an all-optical neural network; at the same time, training the model directly on the optical device compensates for the influence of part of the noise and improves the inference accuracy and robustness of the optical neural network.
In some embodiments, deriving the derivative of the loss function for the parameter based on the plurality of samples by a zeroth order gradient estimation algorithm comprises: the derivative of the loss function with respect to the parameter is obtained based on a plurality of samples by a coordinate-wise gradient estimation algorithm.
In some embodiments, the first-order optimization algorithm comprises a stochastic gradient descent algorithm.
In some embodiments, obtaining the value of the parameter for the next iteration based on the derivative of the loss function and the parameter through a first-order optimization algorithm comprises: obtaining the value of the parameter for the next iteration through the stochastic gradient descent algorithm based on the derivative of the loss function, the parameter, and a preset learning rate.
In some embodiments, deriving the loss function based on the plurality of output data and the plurality of labels in the training data set corresponding to the plurality of samples includes: calculating the cross-entropy loss function between the plurality of output data and the plurality of labels corresponding to the plurality of samples in the training data set, and taking the cross-entropy loss function as the loss function.
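For illustration, a standard numpy formulation of the batch cross-entropy between raw network outputs and integer labels is sketched below; the patent does not fix a particular variant, so the softmax-based form here is an assumption:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy between network outputs and integer labels.

    logits: (n, classes) raw model outputs; labels: (n,) class indices.
    Uses the numerically stabilized log-softmax formulation.
    """
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(labels.size), labels].mean()
```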
Specifically, the most commonly used first-order optimization algorithm in neural network training is stochastic gradient descent (SGD): each iteration performs an update based on a random batch of samples, and the gradient estimated by SGD is an unbiased estimate of the true gradient. However, a single estimate of the ZOO algorithm has large variance, which is superimposed on the estimation variance of the SGD algorithm itself, so the accuracy of the finally converged solution needs improvement.
Combining the zeroth-order gradient estimation algorithms above with SGD yields the RandGradEst-SGD and AvgGradEst-SGD algorithms; since their high estimation variance slows convergence and limits the final converged accuracy, the coordinate-wise gradient estimation algorithm is instead combined with SGD for ONN training, to reduce the estimation variance and improve the final converged accuracy. The coordinate-wise gradient estimation algorithm is as follows:
$$\hat{\nabla}_{\Phi}L=\sum_{l=1}^{d}\frac{1}{2\mu_{l}n}\sum_{i=1}^{n}\big[L(\Phi+\mu_{l}e_{l};\,x_{i})-L(\Phi-\mu_{l}e_{l};\,x_{i})\big]\,e_{l}$$

where $d$ is the number of optimization variables, $n$ is the number of batch samples, $e_{l}$ is the standard basis vector whose $l$-th coordinate is 1 and whose other coordinates are 0, and $\mu_{l}>0$ is the coordinate-wise smoothing parameter. The different zeroth-order gradient estimation algorithms can be compared through the mean squared error between the estimated gradient $\hat{\nabla}_{\Phi}L$ and the true gradient $\nabla_{\Phi}L$, i.e. $\mathbb{E}\big[\|\hat{\nabla}_{\Phi}L-\nabla_{\Phi}L\|^{2}\big]$; this comparison shows that the estimation error of the coordinate-wise gradient estimation algorithm (CoordGradEst) is smaller than that of the other two zeroth-order gradient estimation algorithms.
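A sketch of CoordGradEst follows; for simplicity it uses a single smoothing parameter mu for all coordinates rather than per-coordinate values, and treats loss(phi) as a black-box, batch-averaged query (both simplifications are ours):

```python
import numpy as np

def coord_grad_est(loss, phi, mu=1e-2):
    """Sketch of the coordinate-wise zeroth-order estimator (CoordGradEst).

    Uses a central difference along every standard basis vector e_l;
    loss(phi) is assumed to average over the n samples of the batch.
    Costs 2*d queries per estimate but has lower variance than the
    random-direction estimators above.
    """
    d = phi.size
    grad = np.zeros(d)
    for l in range(d):
        e = np.zeros(d)
        e[l] = 1.0                   # standard basis vector e_l
        grad[l] = (loss(phi + mu * e) - loss(phi - mu * e)) / (2.0 * mu)
    return grad
```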
Combining this zeroth-order estimator with SGD gives the following parameter-update steps:
1. Randomly select $n$ samples and their corresponding labels $\{x_{i}, y_{i}\}$ from the training data set, where $x_{i}$ and $y_{i}$ denote the samples and labels selected for the $i$-th iteration;
2. Based on the current model parameter $\Phi^{(i-1)}$ of the optical neural network (the initial value at the first iteration being $\Phi^{(0)}$) and the training samples $x_{i}$, compute the model output $\hat{y}_{i}$;
3. Compute the cross-entropy loss function between the model output $\hat{y}_{i}$ and the labels $y_{i}$, denoted $L(x_{i};\Phi^{(i-1)})$;
4. Compute the derivative of the loss function with respect to the model parameters over the $n$ samples, $\hat{\nabla}_{\Phi}L(x_{i};\Phi^{(i-1)})$, using the coordinate-wise gradient estimation algorithm above;
5. Update the model parameters according to the stochastic gradient descent algorithm:

$$\Phi^{(i)}=\Phi^{(i-1)}-\alpha\,\hat{\nabla}_{\Phi}L(x_{i};\Phi^{(i-1)})$$

6. Increment the iteration index: $i \leftarrow i+1$.

The above steps are iterated for $i=1,2,\dots,T$. Since the parameter used in the $T$-th iteration is $\Phi^{(T-1)}$, the parameter $\Phi^{(T)}$ computed by the stochastic gradient descent update after the $T$-th iteration is the updated parameter, i.e., the final optical neural network model parameter.
In some embodiments, inputting the plurality of samples in the training data set into the optical neural network whose parameter takes its initial value in the first iteration comprises: extracting a part of the data in a data set as the training data set, and inputting the plurality of samples in the training data set into the optical neural network whose parameter takes its initial value in the first iteration.
In some embodiments, the method further comprises: and extracting another part of data in the data set as a test data set, and testing the trained optical neural network by using the test data set.
For training the optical neural network, a general, comprehensive (covering convolution, pooling, nonlinearities, and other operations), mature ANN (Artificial Neural Network) model of moderate complexity and a public general-purpose data set are needed to complete the related work, so the industry-standard LeNet and MNIST are selected. The MNIST data set is a classical small image-classification data set organized by the American National Institute of Standards and Technology (NIST), collected from the handwriting of 250 different people, half of them high-school students and half employees of the Census Bureau. The data set was collected in the hope that algorithms could achieve recognition of handwritten digits. MNIST contains 70,000 pictures of handwritten digits; each picture consists of 28×28 pixels, and each pixel is represented by a gray value. Of these, 60,000 samples serve as the training data set and 10,000 samples as the test data set. Each sample has a corresponding label, a single decimal digit giving the category of the picture. The data set is widely used in machine learning and deep learning to test the effectiveness of algorithms such as linear classifiers, k-nearest neighbors, support vector machines (SVMs), neural networks, convolutional neural networks, and so on. Data sets other than the above may also be used.
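A minimal data-preparation sketch follows. The patent prescribes no software framework, so the use of torchvision here to fetch MNIST and expose the standard 60,000/10,000 train/test split is our assumption:

```python
# Sketch only: fetch MNIST and expose the 60,000/10,000 split described
# above. The choice of torchvision is ours; the patent names no framework.
import torch
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()   # 28x28 grayscale -> (1, 28, 28) in [0, 1]
train_set = datasets.MNIST("./data", train=True,  download=True, transform=to_tensor)
test_set  = datasets.MNIST("./data", train=False, download=True, transform=to_tensor)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
test_loader  = torch.utils.data.DataLoader(test_set,  batch_size=1000)

print(len(train_set), len(test_set))  # 60000 10000
```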
In a second aspect of the embodiments of the present invention, an optical neural network training system is further provided. Fig. 2 is a schematic diagram of an embodiment of an optical neural network training system provided by the present invention. As shown in fig. 2, an optical neural network training system includes: an output data obtaining module 10, configured to input a plurality of samples in the training data set into an optical neural network with a parameter of the first iteration as an initial value, so as to obtain a plurality of output data corresponding to the plurality of samples respectively; a loss function obtaining module 20 configured to obtain a loss function based on the plurality of output data and a plurality of labels in the training data set, the labels corresponding to the plurality of samples, respectively; a derivative obtaining module 30 configured to obtain a derivative of the loss function with respect to the parameter based on a plurality of samples by a zeroth order gradient estimation algorithm; a parameter updating module 40 configured to obtain a value of a parameter of a next iteration based on a derivative of the loss function and the parameter by a first-order optimization algorithm, and input a plurality of samples into the optical neural network having the parameter as the value; and a training completion module 50 configured to obtain, in response to the last iteration being reached, an updated parameter based on the derivative of the loss function and the parameter of the last iteration through a first-order optimization algorithm, and obtain a trained optical neural network based on the updated parameter.
The optical neural network training system of the embodiment of the invention exploits the low latency, high bandwidth, and high energy efficiency of optical devices, avoids massive optoelectronic conversion and complex derivative computations on the optical device, and realizes training of an all-optical neural network; at the same time, training the model directly on the optical device compensates for the influence of part of the noise and improves the inference accuracy and robustness of the optical neural network.
In a third aspect of the embodiment of the present invention, a computer-readable storage medium is further provided, and fig. 3 is a schematic diagram of a computer-readable storage medium for implementing an optical neural network training method according to an embodiment of the present invention. As shown in fig. 3, the computer-readable storage medium 3 stores computer program instructions 31. The computer program instructions 31, when executed by a processor, implement the method of any of the embodiments described above.
It is to be understood that all embodiments, features and advantages set forth above with respect to the optical neural network training method according to the present invention apply equally, without conflict therewith, to the optical neural network training system and to the storage medium according to the present invention.
In a fourth aspect of the embodiments of the present invention, there is further provided a computer device, including a memory 402 and a processor 401, where the memory stores a computer program, and the computer program, when executed by the processor, implements the method of any one of the above embodiments.
Fig. 4 is a schematic hardware structure diagram of an embodiment of a computer device for executing the optical neural network training method according to the present invention. Taking the computer device shown in fig. 4 as an example, the computer device includes a processor 401 and a memory 402, and may further include: an input device 403 and an output device 404. The processor 401, the memory 402, the input device 403 and the output device 404 may be connected by a bus or other means, and fig. 4 illustrates an example of a connection by a bus. The input device 403 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the optical neural network training system. The output device 404 may include a display device such as a display screen.
The memory 402, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the optical neural network training method in the embodiments of the present application. The memory 402 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by use of the optical neural network training method, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 402 may optionally include memory located remotely from processor 401, which may be connected to local modules via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor 401 executes various functional applications of the server and data processing by running nonvolatile software programs, instructions and modules stored in the memory 402, so as to implement the optical neural network training method of the above method embodiment.
Finally, it should be noted that the computer-readable storage medium (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosed embodiments of the invention, including the claims, is limited to these examples; within the spirit of the embodiments of the invention, technical features of the above embodiments or of different embodiments may also be combined, and many other variations of different aspects of the embodiments exist as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like made within the spirit and principles of the embodiments of the present invention shall be included within the protection scope of the embodiments of the present invention.

Claims (10)

1. An optical neural network training method is characterized by comprising the following steps:
inputting a plurality of samples in a training data set into an optical neural network with a parameter of a first iteration as an initial value to obtain a plurality of output data corresponding to the plurality of samples respectively;
obtaining a loss function based on the plurality of output data and a plurality of labels in the training data set corresponding to the plurality of samples, respectively;
obtaining a derivative of the loss function with respect to the parameter based on the plurality of samples by a zeroth order gradient estimation algorithm;
obtaining a numerical value of a parameter of the next iteration based on the derivative of the loss function and the parameter through a first-order optimization algorithm, and inputting the plurality of samples into an optical neural network with the parameter as the numerical value;
and in response to the current iteration being the last one, obtaining the updated parameter through the first-order optimization algorithm based on the derivative of the loss function and the parameter of the last iteration, and obtaining a trained optical neural network based on the updated parameter.
2. The method of claim 1, wherein obtaining a derivative of the loss function for the parameter based on the plurality of samples by a zeroth order gradient estimation algorithm comprises:
deriving a derivative of the loss function with respect to the parameter based on the plurality of samples by a coordinate-wise gradient estimation algorithm.
3. The method of claim 1, wherein the first order optimization algorithm comprises a stochastic gradient descent algorithm.
4. The method of claim 3, wherein obtaining the value of the parameter for the next iteration based on the derivative of the loss function and the parameter by a first order optimization algorithm comprises:
and obtaining the numerical value of the parameter of the next iteration through the random gradient descent algorithm based on the derivative of the loss function, the parameter and a preset learning rate.
5. The method of claim 1, wherein deriving a loss function based on the plurality of output data and a plurality of labels in the training data set corresponding to the plurality of samples, respectively, comprises:
and calculating cross entropy loss functions among the output data and the labels corresponding to the samples in the training data set, and taking the cross entropy loss functions as the loss functions.
6. The method of claim 1, wherein inputting the plurality of samples in the training data set into the optical neural network for the first iteration with the parameters at the initial values comprises:
and extracting a part of data in the data set as a training data set, and inputting a plurality of samples in the training data set into the optical neural network with the initial value of the parameter of the first iteration.
7. The method of claim 6, further comprising:
and extracting another part of data in the data set as a test data set, and testing the trained optical neural network by using the test data set.
8. An optical neural network training system, comprising:
the output data acquisition module is configured to input a plurality of samples in the training data set into the optical neural network with the parameter of the first iteration as an initial value so as to obtain a plurality of output data corresponding to the plurality of samples respectively;
a loss function obtaining module configured to obtain a loss function based on the plurality of output data and a plurality of labels in the training data set, the labels corresponding to the plurality of samples, respectively;
a derivative obtaining module configured to obtain a derivative of the loss function with respect to the parameter based on the plurality of samples by a zeroth order gradient estimation algorithm;
the parameter updating module is configured to obtain a numerical value of a parameter of a next iteration based on the derivative of the loss function and the parameter through a first-order optimization algorithm, and input the plurality of samples into an optical neural network with the parameter as the numerical value; and
and the training completion module is configured to, in response to the current iteration being the last one, obtain the updated parameter through the first-order optimization algorithm based on the derivative of the loss function and the parameter of the last iteration, and obtain a trained optical neural network based on the updated parameter.
9. A computer-readable storage medium, characterized in that computer program instructions are stored which, when executed by a processor, implement the method according to any one of claims 1-7.
10. A computer device comprising a memory and a processor, characterized in that the memory has stored therein a computer program which, when executed by the processor, performs the method according to any one of claims 1-7.
CN202111035185.6A (priority 2021-09-05, filed 2021-09-05) · Optical neural network training method, system, storage medium and equipment · Pending · CN113869508A (en)

Priority Applications (1)

Application Number: CN202111035185.6A · Priority Date: 2021-09-05 · Filing Date: 2021-09-05 · Title: Optical neural network training method, system, storage medium and equipment

Publications (1)

Publication Number: CN113869508A · Publication Date: 2021-12-31

Family

ID=78989701

Family Applications (1)

Application Number: CN202111035185.6A · Priority Date: 2021-09-05 · Filing Date: 2021-09-05 · Title: Optical neural network training method, system, storage medium and equipment

Country Status (1)

Country: CN · Publication: CN113869508A (en)


Cited By (7)

* Cited by examiner, † Cited by third party

CN114399038A * (priority 2022-03-24, published 2022-04-26), Suzhou Inspur Intelligent Technology Co., Ltd. - Optical neural network training method, device, equipment and medium
WO2023179374A1 (published 2023-09-28) - Method and apparatus for training optical neural network, and device and medium
CN114676635A * (priority 2022-03-31, published 2022-06-28), The Chinese University of Hong Kong, Shenzhen - Optical resonant cavity reverse design and optimization method based on reinforcement learning
CN116384460A * (priority 2023-03-29, published 2023-07-04), Tsinghua University - Robust optical neural network training method and device, electronic equipment and medium
CN116384460B (published 2024-06-11), Tsinghua University - Robust optical neural network training method and device, electronic equipment and medium
CN117057407A * (priority 2023-08-21, published 2023-11-14), Zhejiang University - Training method for crosstalk-oriented wavelength division multiplexing optical neural network
CN117313815A * (priority 2023-09-19, published 2023-12-29), Chongqing University of Posts and Telecommunications - Progressive training method for optimizing ONNs phase configuration of MZI


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination