CN117891023A - Photonic chip, heterogeneous computing system, precision adjusting method and product - Google Patents
- Publication number: CN117891023A
- Application number: CN202410295481.7A
- Authority: CN (China)
- Prior art keywords: calculation, chip, computing, parameter set, digital
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/0675 — Physical realisation of neural networks, neurons or parts of neurons using electro-optical, acousto-optical or opto-electronic means
- G02B6/12 — Light guides of the optical waveguide type of the integrated circuit kind
- G02B2006/12142 — Modulator
- G02B2006/12147 — Coupler
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application relates to the technical field of artificial intelligence and discloses a photonic chip, a heterogeneous computing system, a precision adjustment method, and related products. The photonic chip comprises a computing unit, which includes at least: a first coupler for splitting a received laser signal; two modulators, each connected to an external digital-to-analog converter, which receive target parameters quantized with adjustable precision by the digital-to-analog converter and encode the optical signals, generating the input data and weights of the current network layer; a phase shifter for changing the phase to generate positive and negative encodings; and a balanced detector, which performs the linear computation to generate a first calculation result, converts it into a photocurrent, and transmits it to an analog-to-digital converter for quantization with adjustable precision. The photonic chip provided by the application improves the computing efficiency of the chip while ensuring computational accuracy.
Description
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to a photonic chip, a heterogeneous computing system, a precision adjustment method, and related products.
Background
The neural network is an important computational model in the field of artificial intelligence. In recent years it has been widely applied in scenarios such as image processing, speech recognition, and natural language processing, driving the intelligent development of society. Chip technologies such as the CPU (Central Processing Unit), GPU (Graphics Processing Unit), and TPU (Tensor Processing Unit) provide powerful computational support for neural network models. However, as Moore's law slows, the bottleneck of the von Neumann architecture becomes prominent and chip power consumption gradually increases. To improve the computing efficiency of chips and reduce their power consumption, quantization of computation precision has been proposed as an improvement.
Common neural networks include fully connected neural networks, convolutional neural networks, and recurrent neural networks. A neural network is composed of a large number of artificial neurons and involves both linear and nonlinear computation. In the linear part, multiply-accumulate operations are performed on the network's input data and weights; in the nonlinear part, a nonlinear function transforms the result of the linear computation. In precision quantization of a neural network, the weights and activation values are converted from floating-point numbers to low-precision representations during computation; quantizing the weights and activations reduces the network's storage footprint and computational load, improving data-processing efficiency. However, lower precision introduces rounding errors that can degrade the accuracy of the computation result and hence of the network. How to improve the computing efficiency of the chip while guaranteeing high-accuracy computation is therefore a problem to be solved.
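The rounding-error trade-off described above can be made concrete with a minimal sketch (illustrative only, not the patent's method): uniform symmetric quantization maps values onto a b-bit grid and back, and the reconstruction error shrinks as the bit width grows.

```python
import numpy as np

def quantize_dequantize(x, bits):
    """Uniform symmetric quantization to the given bit width, then back to float."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    scale = np.max(np.abs(x)) / levels    # largest magnitude maps to the top level
    q = np.clip(np.round(x / scale), -levels, levels)
    return q * scale                      # low-precision approximation of x

rng = np.random.default_rng(0)
w = rng.standard_normal(1000)
for b in (4, 8, 16):
    err = np.max(np.abs(w - quantize_dequantize(w, b)))
    print(f"{b:2d}-bit max reconstruction error: {err:.6f}")
```

Each added bit roughly halves the worst-case error, which is why a per-layer adjustable bit width can trade accuracy for storage and computation.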
Disclosure of Invention
In view of this, the present application aims to provide a photonic chip, a heterogeneous computing system, a precision adjustment method, and related products, so as to improve the computing efficiency of the chip while maintaining high accuracy in neural network computation.
To achieve the above purpose, the technical solution of the application is as follows:
The first aspect of the embodiments of the present application provides a photonic chip for executing linear computation in a neural network computing task, comprising a computing unit; the computing unit includes at least:
a first coupler for splitting the received laser signal into a first optical signal and a second optical signal;
two modulators, each connected to an external digital-to-analog converter, for receiving target parameters quantized with adjustable precision by the digital-to-analog converter and encoding the first optical signal and the second optical signal, generating the input data and weights of the current network layer; the target parameters are the calculation result of the previous network layer and the weights of the current network layer;
a phase shifter for changing the phase of the first optical signal and the second optical signal to generate positive and negative encodings;
a balanced detector for connecting with an external analog-to-digital converter, performing linear computation based on the first optical signal and the second optical signal to generate a first calculation result, converting the first calculation result into a photocurrent, and transmitting it to the analog-to-digital converter for quantization with adjustable precision.
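The signal flow of such a computing unit can be sketched numerically. This is a toy model under strong assumptions — the modulators encode signed amplitudes set by the DACs, the combiner is an ideal 50:50 coupler, and detection is noiseless; the patent's actual circuit may differ:

```python
import numpy as np

def compute_unit(x, w, power=1.0):
    """Toy model of one computing unit: the first coupler splits the laser
    field into two arms, each modulator encodes a magnitude while the phase
    shifter encodes the sign (0 or pi phase), and the balanced detector
    reports the difference of the two combined intensities, which is
    proportional to x * w."""
    a = np.sqrt(power / 2.0)            # 50:50 split of the laser field
    e1 = a * x                          # signed amplitude: modulator + phase shifter
    e2 = a * w
    p_plus = abs(e1 + e2) ** 2 / 2.0    # the two outputs of the combining coupler
    p_minus = abs(e1 - e2) ** 2 / 2.0
    return p_plus - p_minus             # balanced photocurrent = power * x * w

print(compute_unit(0.5, -0.8))          # ≈ -0.4, i.e. 0.5 * (-0.8)
```

The sign survives because the phase shifter flips which detector receives the larger intensity, which is why the balanced pair (rather than a single photodiode) can represent negative products.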
A second aspect of the embodiments of the present application provides a photonic chip for executing linear computation in a neural network computing task, comprising:
at least two computing units as in the photonic chip according to the first aspect of the embodiments of the present application, all computing units being connected in parallel;
a third coupler connected to all the computing units, for splitting the received laser signal and transmitting a branch to each computing unit.
A third aspect of the embodiments of the present application provides a heterogeneous computing system for performing neural network computing tasks; the system comprises: an optical module and an electrical module;
The optical module includes: a laser for providing a continuous optical signal or a pulsed optical signal; the photonic chip according to the first aspect of the embodiment of the present application is configured to perform linear computation in the neural network computation task, and generate a first computation result;
the electrical module includes:
the first computing chip is used for executing summation computation and nonlinear computation in the neural network computing task and generating a second computing result;
The first control chip is used for setting the quantization bit widths of the digital-to-analog converter and the analog-to-digital converter in the system, and for controlling the computing unit and the first computing chip to execute the neural network computing task;
The first memory chip is used for storing the weight, the first calculation result and the second calculation result;
The analog-to-digital converter is used for quantizing the first calculation result and transmitting it to the first computing chip;
the digital-to-analog converter is used for quantizing the second calculation result of the previous network layer and the weights of the current network layer, and transmitting them respectively to the two modulators of the computing unit.
Optionally, the heterogeneous computing system includes one analog-to-digital converter and two digital-to-analog converters; one digital-to-analog converter acquires the second calculation result of the previous network layer from the first memory chip, and the other acquires the weights of the current network layer from the first memory chip.
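Putting the pieces together, one layer of such a heterogeneous system can be sketched as follows. This is hypothetical: the elementwise product mapping, the tanh nonlinearity, and the shared DAC bit width are assumptions made for illustration, not details from the patent:

```python
import numpy as np

def quantize(x, bits):
    """Adjustable-precision quantization, standing in for the DACs/ADCs."""
    levels = 2 ** (bits - 1) - 1
    m = float(np.max(np.abs(x)))
    scale = (m if m > 0 else 1.0) / levels
    return np.clip(np.round(x / scale), -levels, levels) * scale

def layer_forward(prev_result, weights, dac_bits, adc_bits):
    """One network layer on the heterogeneous system: the DACs quantize the
    previous layer's result and the current weights for the photonic chip,
    the computing units produce the products (first calculation result), the
    ADC quantizes them, and the electrical chip performs the summation and
    nonlinearity (second calculation result)."""
    x = quantize(prev_result, dac_bits)      # DAC: previous-layer result
    w = quantize(weights, dac_bits)          # DAC: current-layer weights
    first = quantize(w * x, adc_bits)        # photonic products -> ADC
    return np.tanh(first.sum(axis=1))        # electrical summation + nonlinearity
```

Because every crossing between the optical and electrical domains passes through a converter, the per-converter bit widths are exactly the knobs the precision adjustment method below searches over.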
A fourth aspect of the embodiment of the present application provides a heterogeneous computing system for performing a neural network computing task; the system comprises: an optical module and an electrical module;
The optical module includes: a laser for providing a continuous optical signal or a pulsed optical signal; the photonic chip according to the second aspect of the embodiment of the present application is configured to perform linear computation in the neural network computation task, and generate a first computation result;
the electrical module includes:
the first computing chip is used for executing summation computation and nonlinear computation in the neural network computing task and generating a second computing result;
The first control chip is used for setting the quantization bit widths of the digital-to-analog converters and the analog-to-digital converters in the system, and for controlling the photonic chip and the first computing chip to execute the neural network computing task;
The first memory chip is used for storing the weight, the first calculation result and the second calculation result;
each analog-to-digital converter is used for quantizing the first calculation result output by one computing unit and transmitting it to the first computing chip;
each digital-to-analog converter is used for quantizing the second calculation result of the previous network layer corresponding to one computing unit and the weights of the current network layer, and transmitting them respectively to the two modulators of that computing unit.
Optionally, the number of digital-to-analog converters is twice the number of computing units in the photonic chip; each digital-to-analog converter is used for quantizing input data or weights received by one modulator;
The number of the analog-to-digital converters is equal to the number of the computing units in the photonic chip; each analog-to-digital converter is used for quantizing the first calculation result output by one calculation unit.
Optionally, the digital-to-analog converters and the analog-to-digital converters have the same adjustable range of quantization bit width, which does not exceed 1-16 bits.
Optionally, the electrical module includes two first memory chips, one of which is connected to all the analog-to-digital converters and is used for storing the first calculation result; the other is connected with all the digital-to-analog converters and is used for storing the weight and the second calculation result.
A fifth aspect of an embodiment of the present application provides a heterogeneous computing system for performing a neural network computing task, including:
a plurality of cores, each used for executing the computation of a different part of a network layer, or each used for executing the computation of a different network layer; wherein each core comprises:
the photonic chip according to the first aspect or the second aspect of the embodiments of the present application is configured to perform linear computation in the neural network computation task, and generate a first computation result;
the first computing chip is used for executing summation computation and nonlinear computation in the neural network computing task and generating a second computing result;
The first memory chip is used for storing the first calculation result, the second calculation result, and the parameters used in the core;
the analog-to-digital converters are used for quantizing the first calculation results output by the photonic chip and transmitting them to the first memory chip; the number of analog-to-digital converters is equal to the number of computing units in the chip;
the digital-to-analog converters are used for quantizing the second calculation result of the previous network layer and the weights of the current network layer, and transmitting them to the photonic chip; the number of digital-to-analog converters is twice the number of computing units in the chip;
the system further comprises:
a laser for providing a continuous optical signal or a pulsed optical signal to all cores;
the second memory chip is used for storing all the weights of the neural network computing task and the computation results of all the cores;
the second control chip is used for setting the quantization bit widths of the digital-to-analog converters and the analog-to-digital converters in all the cores, and for controlling all the cores to execute the neural network computing task.
Optionally, the adjustable ranges of the quantization bit widths of the digital-to-analog converters and analog-to-digital converters in the system are of two kinds, neither of which exceeds 1-16 bits.
Optionally, the number of cores whose digital-to-analog and analog-to-digital converters have the larger adjustable quantization bit-width range is smaller than the number of cores with the smaller adjustable range.
According to a sixth aspect of the embodiments of the present application, there is provided a precision adjustment method applied to a heterogeneous computing system as provided in the third aspect, the fourth aspect or the fifth aspect of the embodiments of the present application, including:
Randomly initializing a sample task and a corresponding quantization parameter set; the quantization parameter set includes the quantization bit widths of the digital-to-analog converter and the analog-to-digital converter corresponding to each computing unit in the heterogeneous computing system, and the quantization bit width of the first computing chip; the sample task is a neural network computing task for pre-training, generated based on the target neural network computing task to be executed;
executing the sample task with the heterogeneous computing system, iteratively updating the quantization parameter set by an optimization algorithm, judging whether the current quantization parameter set meets the standard based on an evaluation index of the optimization algorithm, and stopping the updating of the quantization parameter set when it is judged to meet the standard; the optimization algorithm is any one or a combination of the following: a genetic algorithm, a particle swarm algorithm, or a deep reinforcement learning algorithm;
determining, based on the quantization parameter set that meets the standard, the calculation precision of the heterogeneous computing system when executing the target neural network computing task.
Optionally, iteratively updating the quantization parameter set by an optimization algorithm and judging whether the current quantization parameter set meets the standard based on an evaluation index of the optimization algorithm includes:
executing, with the heterogeneous computing system, a plurality of sample tasks having the same structure but different quantization parameter sets, and after each round of execution, calculating the fitness of each sample task and comparing it with a set evaluation index; the fitness is related to the accuracy of the sample task and the parameter compression ratio;
when the fitness of no sample task reaches the evaluation index, judging that the current quantization parameter sets do not meet the standard, performing selection, crossover, and mutation operations on the quantization parameter sets with a genetic algorithm to update them, and re-executing the updated sample tasks with the heterogeneous computing system;
when the fitness of any sample task reaches the evaluation index, judging that the current quantization parameter set meets the standard and stopping the updating of the quantization parameter set.
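A minimal sketch of this genetic-algorithm variant (the fitness weighting, population handling, and bit-width encoding are illustrative assumptions, not taken from the patent):

```python
import random

BITS = range(1, 17)       # adjustable quantization bit widths, 1-16 bits
N_PARAMS = 8              # parameters per quantization set (assumed size)

def fitness(param_set, run_sample_task):
    """Fitness relates sample-task accuracy to the parameter compression
    ratio; the 0.8/0.2 weighting is an assumption, not from the patent."""
    accuracy = run_sample_task(param_set)                 # assumed to return [0, 1]
    compression = 1.0 - sum(param_set) / (16 * len(param_set))
    return 0.8 * accuracy + 0.2 * compression

def genetic_step(population, scores, mutation_rate=0.1):
    """Selection, crossover, and mutation over quantization parameter sets."""
    ranked = [p for _, p in sorted(zip(scores, population),
                                   key=lambda sp: sp[0], reverse=True)]
    parents = ranked[: max(2, len(ranked) // 2)]          # selection: keep the best
    children = []
    while len(children) < len(population):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, N_PARAMS)               # single-point crossover
        child = a[:cut] + b[cut:]
        child = [random.choice(BITS) if random.random() < mutation_rate else g
                 for g in child]                          # mutation
        children.append(child)
    return children
```

Each generation keeps the parameter sets whose accuracy/compression trade-off scored best and recombines them, so the search drifts toward narrow bit widths that still preserve task accuracy.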
Optionally, iteratively updating the quantization parameter set by an optimization algorithm and judging whether the current quantization parameter set meets the standard based on an evaluation index of the optimization algorithm includes:
executing, with the heterogeneous computing system, a plurality of sample tasks having the same structure but different quantization parameter sets, and after each round of execution, calculating the fitness of each sample task and comparing it with a set evaluation index; the fitness is related to the accuracy of the sample task and the parameter compression ratio;
when the fitness of no sample task reaches the evaluation index, judging that the current quantization parameter sets do not meet the standard, updating the quantization parameter sets and the corresponding update speeds with a particle swarm algorithm, and re-executing the updated sample tasks with the heterogeneous computing system;
when the fitness of any sample task reaches the evaluation index, judging that the current quantization parameter set meets the standard and stopping the updating of the quantization parameter set.
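The particle-swarm variant maintains, for each candidate quantization parameter set, a position (the bit widths) and an update speed (velocity). A hedged sketch with conventional PSO coefficients (the coefficients and integer clamping are assumptions, not from the patent):

```python
import random

def pso_step(positions, velocities, personal_best, global_best,
             w=0.7, c1=1.5, c2=1.5, lo=1, hi=16):
    """One particle-swarm update: each quantization parameter set (position)
    moves according to its update speed (velocity), pulled toward its own
    best-known set and the swarm's best set; bit widths stay in [lo, hi]."""
    for pos, vel, pbest in zip(positions, velocities, personal_best):
        for i in range(len(pos)):
            r1, r2 = random.random(), random.random()
            vel[i] = (w * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])         # personal-best pull
                      + c2 * r2 * (global_best[i] - pos[i]))  # global-best pull
            pos[i] = min(hi, max(lo, round(pos[i] + vel[i]))) # clamp to 1-16 bits
    return positions, velocities
```

Unlike the genetic variant, the swarm keeps a continuous velocity per parameter, so bit widths change gradually rather than by crossover jumps.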
Optionally, the precision adjustment method further includes:
When the number of updates of the quantization parameter set reaches a first threshold, the quantization parameter set currently closest to the evaluation index is judged to meet the standard, and the updating of the quantization parameter set is stopped.
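The two stopping conditions — any parameter set reaching the evaluation index, or the update count reaching the first threshold — can be combined in a small driver loop (illustrative; `evaluate` and `step` are hypothetical stand-ins for the fitness computation and the genetic or particle-swarm update):

```python
def adjust_precision(population, evaluate, step, target, max_updates=50):
    """Iterate an optimization step over quantization parameter sets; stop
    when any set's fitness reaches the evaluation index (target), or when
    the update count reaches the first threshold (max_updates), in which
    case the set currently closest to the index is taken as meeting the
    standard."""
    best_set, best_score = None, float("-inf")
    for _ in range(max_updates):
        scores = [evaluate(p) for p in population]
        top = max(range(len(scores)), key=scores.__getitem__)
        if scores[top] > best_score:
            best_set, best_score = list(population[top]), scores[top]
        if best_score >= target:           # standard reached: stop updating
            break
        population = step(population, scores)
    return best_set
```

The returned parameter set then fixes the DAC/ADC bit widths used when the system executes the target neural network computing task.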
According to a seventh aspect of the embodiments of the present application, there is provided a precision adjustment device for implementing the precision adjustment method provided in the sixth aspect, the device comprising:
an initialization module configured to randomly initialize a sample task and a corresponding quantization parameter set; the quantization parameter set includes the quantization bit widths of the digital-to-analog converter and the analog-to-digital converter corresponding to each computing unit in the system, and the quantization bit width of the first computing chip; the sample task is a neural network computing task for pre-training, generated based on the target neural network computing task to be executed;
an adjustment module configured to execute the sample task with the heterogeneous computing system, iteratively update the quantization parameter set by an optimization algorithm, judge whether the current quantization parameter set meets the standard based on an evaluation index of the optimization algorithm, and stop updating the quantization parameter set when it is judged to meet the standard; the optimization algorithm is any one or a combination of the following: a genetic algorithm, a particle swarm algorithm, or a deep reinforcement learning algorithm;
a setting module configured to determine, based on the quantization parameter set that meets the standard, the calculation precision of the heterogeneous computing system when executing the target neural network computing task.
According to an eighth aspect of the embodiments of the present application, there is provided a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the precision adjustment method provided in the sixth aspect.
According to a ninth aspect of the embodiments of the present application, there is provided an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the precision adjustment method provided in the sixth aspect are implemented.
With the photonic chip provided by the application, the coupler splits the laser signal and the modulators encode it. The two modulators are connected to external digital-to-analog converters and receive, after quantization by those converters, the calculation result of the previous network layer and the weights of the current network layer; the previous layer's result serves as the input data of the current layer. The phase shifter implements positive and negative encoding, and the balanced detector finally executes the linear computation of the current layer to obtain a first calculation result, which is sent to an external analog-to-digital converter for quantization. Both the digital-to-analog and analog-to-digital converters are precision-adjustable devices. Using them to apply mixed-precision quantization to the input data, weights, and output data of the photonic chip flexibly controls the computation precision of multiple parameters, ensuring computational accuracy while reducing the amount of computation and storage in the neural network computing process, thereby improving the computing efficiency of the chip.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a photonic chip according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a photonic chip according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a heterogeneous computing system according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a heterogeneous computing system according to an embodiment of the present application;
FIG. 5 is a flow chart of a precision adjustment method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a precision adjustment device according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the application. All other embodiments obtained by a person skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In various embodiments of the present application, it should be understood that the sequence numbers of the following processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with some aspects as detailed herein.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
Conventional electronic chips face increasingly serious problems such as high power consumption, and photonic chips have in recent years been studied as an alternative computing substrate. Because photons propagate at the speed of light, photonic transmission has extremely low latency, and photonic devices consume very little energy, so photonic chips offer higher speed and lower power consumption. Silicon-based photonic technology fabricates photonic devices and photonic chips on silicon and silicon-based substrate materials, and has the advantages of CMOS process compatibility and small size. Silicon-based photonic chips for neural network computation are mainly combined with electronic chips to carry out neural network computing tasks, which can significantly increase computing power while reducing energy consumption.
The application provides a photonic chip based on silicon photonic technology for performing the linear computation in a neural network computing task. The photonic chip is connected to external precision-adjustable quantization devices, a DAC and an ADC, so that the quantization bit widths of the input data, output data, and weights of each network layer can be flexibly adjusted during computation. This reduces parameter and memory usage while preserving computational accuracy, thereby improving computing efficiency.
The application will be described in detail below with reference to the drawings in connection with embodiments.
The photon chip provided by the embodiment is used for executing linear calculation in a neural network calculation task and comprises a calculation unit; the computing unit at least comprises:
the first coupler is used for splitting the received laser signal into a first optical signal and a second optical signal;
two modulators, each connected to an external DAC, for receiving target parameters quantized by the DAC with adjustable precision and encoding the first optical signal and the second optical signal to produce the input data and the weight of the current network layer; the target parameters are the calculation result of the previous network layer and the weight of the current network layer;
a phase shifter for changing the phase of the first optical signal and the second optical signal to generate positive and negative codes;
The balanced detector is used for connecting with an external ADC; performing linear computation based on the first optical signal and the second optical signal to generate a first calculation result; converting the first calculation result into a photocurrent; and transmitting the photocurrent to the ADC for quantization with adjustable precision.
In this embodiment, a silicon-based photonic chip based on a single computing unit is provided for performing linear computation in a neural network computing task. Fig. 1 is a schematic diagram of a photonic chip according to an embodiment of the present application. As shown in fig. 1, the silicon-based photonic chip includes a computing unit including: a1 x 2 coupler (first coupler), a modulator, a phase shifter, a 2 x 2 coupler (second coupler) and a balanced detector. In practical applications, the first coupler and the second coupler may be multimode interference couplers or directional couplers. In the silicon-based photon chip, the modulator can be a Mach-Zehnder modulator or a micro-ring modulator, and the phase shifter can be a thermo-optic phase shifter or an electro-optic phase shifter.
Two modulators respectively perform amplitude encoding on two beams of light: one beam represents the input data x and the other represents the weight parameter w. After amplitude encoding by the modulator, the optical signal $E_x$ representing the input data is:

$E_x = A_x e^{i(\omega t + \varphi_x)}$

The optical signal $E_w$ representing the weight is:

$E_w = A_w e^{i(\omega t + \varphi_w)}$

where $A_x$ and $A_w$ respectively represent the amplitudes of the two optical signals, $\varphi_x$ and $\varphi_w$ are their phases, $\omega$ is the frequency of the light, $e$ is the natural constant, $i$ is the imaginary unit, and $t$ is time. The two optical signals interfere in the second coupler, are split into two paths, and are detected respectively by the two photodiodes of the balanced detector. The output current $I_{out}$ of the balanced detector is:

$I_{out} = I_1 - I_2 \propto A_x A_w \cos(\varphi_x - \varphi_w)$

where $I_1$ and $I_2$ are the photocurrents converted from the two optical signals, and $\Delta\varphi = \varphi_x - \varphi_w$ is the phase difference between them. From this expression, the output current $I_{out}$ of the balanced detector is proportional to the product of the amplitudes of the two optical signals. In this embodiment, the amplitudes of the two optical signals are encoded to represent the input data and the weight of a network layer in the neural network, so that the output current $I_{out}$ is proportional to the product of the weight w and the input data x; the linear computation of the network layer is thus realized by the balanced detector, with the calculation result converted into a photocurrent.
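The signed-multiplication principle above can be sketched numerically. This is a minimal idealized model (unit responsivity, lossless devices are assumptions, not the patent's hardware): magnitudes are carried by the amplitudes and signs by phases of 0 or π, so the balanced-detector current is a signed product.

```python
import numpy as np

def balanced_detector_current(x, w, responsivity=1.0):
    """Idealized photocurrent of one computing unit: A_x*A_w*cos(phi_x - phi_w)."""
    phi_x = 0.0 if x >= 0 else np.pi  # sign of x encoded in the phase
    phi_w = 0.0 if w >= 0 else np.pi  # sign of w encoded in the phase
    return responsivity * abs(x) * abs(w) * np.cos(phi_x - phi_w)

print(balanced_detector_current(0.5, -0.8))  # ≈ -0.4, i.e. 0.5 * (-0.8)
```

With both phases restricted to 0 or π, cos(Δφ) is exactly ±1, which is why the first phase shifter suffices for positive and negative encoding.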
In this embodiment, the two optical signals pass through two phase shifters. The first phase shifter is used to change the phase of the optical signal: by setting $\varphi_x$ and $\varphi_w$ to 0 or π, positive and negative encoding and signed computation of the input data and weights can be realized. A second phase shifter is disposed after the first phase shifter to phase-compensate the optical signal.
In this embodiment, the modulator is connected to an external DAC (Digital-to-Analog Converter), receives the calculation result of the previous network layer and the weight of the current network layer to be calculated, uses the calculation result of the previous network layer as the input data of the current network layer, and performs amplitude encoding on the optical signal based on the input data and the weight. Precise adjustment of the input data and weight of the network layer is achieved through a DAC with a preset quantization bit width. The balanced detector is connected to an external ADC (Analog-to-Digital Converter), and the output first calculation result is transmitted in the form of a photocurrent to an ADC with a preset quantization bit width, which quantizes the result of the linear calculation in the network layer (i.e., the first calculation result).
In this embodiment, the photonic chip receives the target parameters input from the outside through the modulator, outputs the first calculation result through the balance detector, and realizes more accurate precision control on the input data, the weight and the output data in the network layer by connecting the external ADC and the DAC with adjustable precision, thereby improving the calculation efficiency while maintaining high accuracy.
Based on the same inventive concept, an embodiment of the present application provides a photonic chip for performing linear computation in a neural network computation task, including:
At least two computing units in the photonic chip described in the above embodiments, all of the computing units being connected in parallel;
The third coupler is respectively connected with all the computing units; the third coupler is used for branching and transmitting the received laser signals to each computing unit.
Fig. 2 is a schematic diagram of a photonic chip according to an embodiment of the present application. As shown in fig. 2, in this embodiment the photonic chip is a silicon-based photonic chip including a plurality of computing units connected in parallel, so that parallel computation of the neural network can be implemented; specifically, each computing unit performs the product computation of different input data and corresponding weights in the neural network and outputs its own first calculation result.
In this embodiment, the laser signal sent by the laser is split by the third coupler (a 1×N coupler) into N paths according to the number N of computing units and input to each computing unit. Alternatively, the third coupler may employ a multimode interference coupler or a directional coupler. By adopting a photonic chip with a parallel architecture of multiple computing units, parallel processing of multiple linear computations in the neural network is realized, further improving computing efficiency.
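The parallel architecture can be sketched as follows. This is an assumed ideal model (lossless splitting, unit responsivity): each of the N units produces one signed product as a photocurrent, and the accumulation happens later in the electrical domain (in the first computing chip, per the system embodiments below).

```python
import numpy as np

def parallel_linear_layer(x, w):
    """N computing units in parallel: per-unit signed products, summed electrically."""
    x, w = np.asarray(x, float), np.asarray(w, float)
    # sign of each product x_k * w_k is encoded as a phase of 0 or pi
    phases = np.where(x * w >= 0, 0.0, np.pi)
    partial = np.abs(x) * np.abs(w) * np.cos(phases)  # N photocurrents
    return partial.sum()  # electrical-domain accumulation

print(parallel_linear_layer([0.2, -0.5, 1.0], [0.4, 0.8, -0.3]))  # ≈ -0.62
```

The result matches the dot product of the inputs and weights, which is the linear computation of one network-layer neuron distributed across N units.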
Based on the same inventive concept, an embodiment of the present application provides a heterogeneous computing system for performing neural network computing tasks; the system comprises: an optical module and an electrical module;
The optical module includes: a laser for providing a continuous optical signal or a pulsed optical signal; the photonic chip in the above embodiment is configured to perform linear computation in the neural network computation task to generate a first computation result; the photon chip comprises a calculation unit;
the electrical module includes:
the first computing chip is used for executing summation computation and nonlinear computation in the neural network computing task and generating a second computing result;
The first control chip is used for setting the quantization bit widths of the DAC and the ADC in the system; controlling the computing unit and the first computing chip to execute the neural network computing task;
The first memory chip is used for storing the weight, the first calculation result and the second calculation result;
The ADC is used for quantizing the first calculation result and transmitting it to the first computing chip;
and the DAC is used for quantizing the second calculation result of the previous network layer and the weight of the current network layer, and transmitting them respectively to the two modulators of the computing unit.
FIG. 3 is a schematic diagram illustrating a heterogeneous computing system according to an embodiment of the present application. As shown in fig. 3, in the present embodiment, the heterogeneous computing system includes an optical module and an electrical module. The optical module includes: the laser provides continuous optical signals or pulse optical signals for the photonic chip, and the photonic chip is used for performing linear calculation of the neural network, and in the embodiment, the photonic chip comprises a calculation unit.
The electrical module comprises the following electronic components: the device comprises a DAC, an ADC, a first computing chip, a first control chip and a first storage chip. The DAC and the ADC are used for realizing the mutual conversion of the digital signal and the analog signal. In this embodiment, the first control chip flexibly sets the quantization bit width of the DAC and the ADC within an adjustable range, so as to implement calculation of each parameter in the neural network according to the required accuracy.
The DAC is connected with two modulators of the photon chip, and controls the optical signals output by the two modulators in the photon chip to realize the quantization of input data and weight. The ADC is connected to a balance detector in the photon chip, and obtains a linear calculation result (namely a first calculation result) output by the balance detector, so that output data is quantized.
The first computing chip performs the summation calculation and the nonlinear function (activation function) calculation of the neural network in the electrical domain; the obtained second calculation result (i.e., the calculation result of the network layer) is quantized by the DAC and then transmitted to the photonic chip as the input data of the next network layer.
In this embodiment, the first control chip further performs overall control on the photoelectric architecture, and controls the photonic chip and the first computing chip to perform a neural network computing task.
The first memory chip stores a first calculation result, a second calculation result and a weight of the neural network.
Optionally, the DAC and the ADC both communicate with the first memory chip: the DAC reads the weight and the second calculation result from the first memory chip, and the ADC writes the first calculation result into the first memory chip. Caching data in the calculation process through the memory chip improves system stability.
As one embodiment of the present application, the heterogeneous computing system includes:
one ADC and two DACs; one DAC is used for acquiring the second calculation result of the previous network layer from the first memory chip, and the other DAC is used for acquiring the weight of the current network layer from the first memory chip.
In the above embodiment, one ADC and two DACs are included, each DAC being connected to one of the two modulators of the computing unit. One DAC acquires the second calculation result of the previous network layer from the first memory chip, quantizes it, and transmits it to one modulator as the input data of the current network layer; the other DAC acquires the weight of the current network layer from the first memory chip, quantizes it, and transmits it to the other modulator as the weight of the current network layer. Separate quantization of the input data, weight, and output data of the network layer is realized through one ADC and two DACs, and the calculation precision of each parameter is controlled by its corresponding DAC or ADC, enabling more flexible precision adjustment.
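The separate adjustable-precision quantizers can be modeled in software. A uniform quantizer over a symmetric [-1, 1] full scale is an assumption for illustration; real DAC/ADC transfer functions depend on the devices used.

```python
import numpy as np

def quantize(value, bits, full_scale=1.0):
    """Uniformly quantize value to 2**bits - 1 levels over [-full_scale, full_scale]."""
    levels = 2 ** bits - 1
    clipped = np.clip(value, -full_scale, full_scale)
    code = np.round((clipped + full_scale) / (2 * full_scale) * levels)
    return code / levels * 2 * full_scale - full_scale

# independent bit widths for the three quantizers of one computing unit
x_q = quantize(0.61803, bits=4)    # DAC bit width for the input data
w_q = quantize(-0.33333, bits=6)   # DAC bit width for the weight
y_q = quantize(x_q * w_q, bits=8)  # ADC bit width for the linear result
print(x_q, w_q, y_q)
```

Raising a parameter's bit width shrinks its quantization step and error; the precision-adjustment method described later searches over exactly these per-parameter bit widths.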
Based on the same inventive concept, an embodiment of the present application provides a heterogeneous computing system for performing neural network computing tasks; the system comprises: an optical module and an electrical module;
the optical module includes: a laser for providing a continuous optical signal or a pulsed optical signal; the photonic chip in the above embodiment is configured to perform linear computation in the neural network computation task to generate a first computation result; the photonic chip comprises at least two computing units;
the electrical module includes:
the first computing chip is used for executing summation computation and nonlinear computation in the neural network computing task and generating a second computing result;
the first control chip is used for setting the quantization bit widths of the DAC and the ADC in the system; controlling the photon chip and the first computing chip to execute the neural network computing task;
The first memory chip is used for storing the weight, the first calculation result and the second calculation result;
A plurality of ADCs, wherein each ADC is configured to quantize a first calculation result output by one calculation unit and transmit the first calculation result to the first calculation chip;
and a plurality of DACs, wherein each DAC is used for quantizing the second calculation result of the previous network layer and the weight of the current network layer corresponding to one computing unit, and transmitting them respectively to the two modulators of that computing unit.
In one embodiment, a heterogeneous computing system includes an optical module and an electrical module. The optical module includes: the laser provides continuous optical signals or pulse optical signals for the photonic chip, and the photonic chip is used for performing linear calculation of the neural network.
In this embodiment, the electrical module includes a plurality of DACs and ADCs, with one DAC and one ADC corresponding to each computing unit, used to quantize that unit's input data, weights, and output data, supporting the usage scenario of parallel computation across multiple computing units. One DAC is connected simultaneously to both modulators of a computing unit, so the input data and weight are quantized uniformly, and the output data is quantized by one ADC; this reduces the number of adjustable-precision devices in the system and saves hardware cost.
As one embodiment of the present application, the number of DACs is twice the number of calculation units in the photonic chip; each DAC is configured to quantize input data or weights received by one modulator;
the number of the ADCs is equal to the number of the computing units in the photonic chip; each ADC is configured to quantize a first calculation result output from one calculation unit.
In this embodiment, each computing unit corresponds to two DACs and one ADC, and the two DACs are respectively connected to two modulators of the computing unit, and quantize input data and weights respectively.
As one implementation of the application, the DAC and the ADC have the same adjustable range of quantization bit width, and this range does not exceed 1-16 bits.
In the above embodiment, the adjustable range of the quantization bit width of all DACs and ADCs in the heterogeneous computing system is kept consistent, and may be any range within 1-16 bits. For example, the quantization bit width of all DACs and ADCs in the system may be adjustable over 1-8 bits.
As one embodiment of the present application, the electrical module includes two first memory chips, one of which is connected to all ADCs and is used for storing the first calculation result; and the other is connected with all the DACs and used for storing the weight and the second calculation result.
In this embodiment, the electrical module may include two first memory chips, which are respectively configured to store the first calculation result output by the photonic chip, and the second calculation result and the weight output by the electronic calculation chip. And the data of the optical module and the electrical module are stored separately through the two storage chips, so that the throughput of data transmission between the photon chip and the first computing chip is improved, and the efficiency of the computing system is further improved.
Optionally, in practical application, a plurality of first memory chips can be deployed in the electrical module according to the requirement of the data storage amount, so as to improve the efficiency of parallel data transmission and further improve the performance of the system.
Based on the same inventive concept, an embodiment of the present application provides a heterogeneous computing system for performing a neural network computing task, including:
The multiple cores are respectively used for executing calculation of different parts in the network layer; or, each kernel is used for executing calculation of different network layers respectively; wherein each core comprises:
the photonic chip in the above embodiment is configured to perform linear computation in the neural network computation task to generate a first computation result;
the first computing chip is used for executing summation computation and nonlinear computation in the neural network computing task and generating a second computing result;
the first memory chip is used for storing the first calculation result, the second calculation result, and the parameters used in the kernel;
the ADC is used for quantizing a first calculation result output by the photonic chip and transmitting it to the first memory chip; the number of ADCs is equal to the number of computing units in the photonic chip;
the DAC is used for quantizing the second calculation result of the previous network layer and the weight of the current network layer and transmitting them to the photonic chip; the number of DACs is twice the number of computing units in the photonic chip;
the system further comprises:
a laser for providing a continuous optical signal or a pulsed optical signal to all cores;
the second memory chip is used for storing all the weights of the neural network computing task and the calculation results of all kernels;
the second control chip is used for setting the quantization bit widths of the DAC and the ADC in all the kernels; and controlling all the kernels to execute the neural network computing task.
FIG. 4 is a schematic diagram illustrating a heterogeneous computing system according to an embodiment of the present application. As shown in fig. 4, the heterogeneous computing system in this embodiment includes a plurality of cores, each of which includes the photonic chip, the first computing chip, the first memory chip, the DAC with adjustable precision, and the ADC disclosed in the above embodiment. The first memory chip is used for storing parameters such as weight, input data, calculation results and the like required by the kernel in calculation.
The system also comprises a laser, a second control chip and a second storage chip, wherein the second control chip is used for setting the quantization bit widths of the DAC and the ADC in all the cores, and the second storage chip is used for storing the calculation result of each core and parameters such as the weight of the neural network.
When the neural network computing task is executed, a part of computing tasks are distributed to each core through a second control chip according to actual requirements, and each core is controlled to execute the corresponding computing task. Under the condition that the calculation amount of calculation tasks is large, controlling each kernel to execute calculation of different parts in a network layer; in the case of a smaller calculation amount of calculation tasks, each core is controlled to independently process the calculation of one network layer.
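The second control chip's distribution policy described above can be sketched as follows. The cost threshold and the round-robin assignment for light layers are illustrative assumptions; the patent does not specify a concrete scheduling rule.

```python
def distribute(layer_costs, n_cores, split_threshold):
    """Assign each layer either to all cores (split) or to a single core."""
    plan = {}
    for layer, cost in enumerate(layer_costs):
        if cost > split_threshold:
            # heavy layer: every core computes a different part of it
            plan[layer] = list(range(n_cores))
        else:
            # light layer: one core handles the whole layer (round-robin)
            plan[layer] = [layer % n_cores]
    return plan

print(distribute([900, 120, 80, 950], n_cores=4, split_threshold=500))
# layers 0 and 3 are split across all 4 cores; layers 1 and 2 each get one core
```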
In this embodiment, by integrating multiple cores in the system, the computing data amount of the large-scale neural network can be handled, the computing tasks are processed in parallel by the multiple cores, the performance of the system is improved, and the quantization bit widths of all DACs and ADCs are set by the second control chip, so that the input data, the weight and the output data of each computing unit in each core are individually quantized, the parameter precision control with high flexibility is realized, the computing amount and the memory amount are reduced as much as possible under the condition of keeping high accuracy, and the performance of the system is improved.
In one embodiment, the photonic and electronic chips in a multi-core architecture may be fabricated in different CMOS process nodes and integrated in a 2.5-dimensional or 3-dimensional fashion. In the 2.5-dimensional integration mode, the photonic chip and the electronic chip are integrated on an interposer by flip-chip bonding, and signal connection is achieved through metal wiring on the interposer. In the 3-dimensional integration mode, the photonic chip serves as the interposer and is stacked vertically with the electronic chip and the package substrate; the electronic chip is integrated on top of the photonic chip by flip-chip bonding, and the photonic chip is connected to the package substrate at the bottom through through-silicon vias (TSVs) or wire bonding.
As one embodiment of the application, the adjustable range of the quantization bit width of the DAC and the ADC in the system comprises two types, wherein the adjustable range of any quantization bit width is not more than 1-16 bits.
In one embodiment, two tunable ranges of quantization bit widths are included in a heterogeneous computing system of a multi-core architecture. The adjustable ranges of the quantization bit widths of the DAC and the ADC between different kernels can be the same or different; the adjustable ranges of the quantization bit widths of the DAC and the ADC in the same kernel can be the same or different, and the adjustable range of any quantization bit width does not exceed 1-16 bits. For example, the system includes two quantization bit width adjustable ranges of 1-8bit and 1-16 bit.
In this embodiment, the two kinds of quantization bit width adjustable ranges are set to cope with the calculation tasks with different precision requirements.
As one embodiment of the present application, the number of cores whose DACs and ADCs have a smaller adjustable range of quantization bit width is greater than the number of cores whose DACs and ADCs have a larger adjustable range.
In this embodiment, DACs and ADCs with different adjustable ranges of quantization bit width are deployed based on the different quantization bit width requirements of different layers of the neural network. For example, a photonic convolutional neural network comprises 2 photonic convolution layers, 2 pooling layers, and 3 photonic fully-connected layers. The first photonic convolution layer and the last photonic fully-connected layer have higher precision requirements and need larger quantization bit widths to ensure calculation accuracy, while the middle layers have relatively lower precision requirements and can use smaller quantization bit widths to reduce device power consumption.
For a neural network, the middle layers contain far more parameters than the first and last layers, and most of these parameters do not require a high quantization bit width. Thus, more cores with a small adjustable precision range and fewer cores with a large adjustable precision range can be deployed in the system. For example, in a system comprising 4 cores, 3 cores with the smaller adjustable range of quantization precision are deployed, in which the quantization bit widths of the DACs and ADCs are adjustable over 1-8 bits, and 1 core with the larger adjustable range is deployed, in which the quantization bit widths of the DACs and ADCs are adjustable over 1-16 bits.
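The 4-core example above can be sketched as a placement rule. The "first core whose range fits" policy is an assumption for illustration; a real scheduler would also balance load across the 8-bit cores.

```python
CORE_MAX_BITS = [8, 8, 8, 16]  # 3 small-range cores, 1 large-range core

def place(required_bits):
    """Map each layer to the first core whose adjustable range covers its need."""
    placement = {}
    for layer, bits in enumerate(required_bits):
        for core, max_bits in enumerate(CORE_MAX_BITS):
            if bits <= max_bits:
                placement[layer] = core
                break
    return placement

# first and last layers need > 8 bits; the middle layers do not
print(place([12, 6, 4, 6, 14]))  # {0: 3, 1: 0, 2: 0, 3: 0, 4: 3}
```

Only the precision-critical first and last layers land on the single 1-16 bit core, matching the rationale for deploying fewer large-range cores.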
In this embodiment, cores with two different adjustable ranges of quantization bit width are deployed in the system, which effectively reduces system power consumption and saves cost while meeting the demands of large-scale mixed-precision neural network computing tasks.
Based on the same inventive concept, an embodiment of the present application provides a precision adjustment method. Referring to fig. 5, fig. 5 is a flowchart of a precision adjusting method according to an embodiment of the application. As shown in fig. 5, the method is applied to the heterogeneous computing system provided in the above embodiment. The method comprises the following steps:
S11: randomly initializing a sample task and a corresponding quantization parameter set; the quantization parameter set includes the quantization bit widths of the DAC and the ADC corresponding to each computing unit in the heterogeneous computing system, and the quantization bit width of the first computing chip; the sample task is a neural network computing task for pre-training, generated based on the target neural network computing task to be executed;
s12: executing the sample task by adopting the heterogeneous computing system, carrying out iterative updating on the quantization parameter set by an optimization algorithm, judging whether the current quantization parameter set meets the standard or not based on an evaluation index of the optimization algorithm, and stopping updating the quantization parameter set under the condition that the current quantization parameter set is judged to meet the standard; the optimization algorithm is any one or a combination of a plurality of the following algorithms: genetic algorithm, particle swarm algorithm, or deep reinforcement learning algorithm;
S13: determining the calculation precision of the heterogeneous computing system for executing the target neural network computing task based on the standard-meeting quantization parameter set.
In this embodiment, before executing the target neural network computing task, the heterogeneous computing system adaptively adjusts quantization bit widths of all DACs, ADCs and first computing chips in the system, and determines computing accuracy suitable for the target neural network computing task, so as to reduce the computing amount and memory capacity of the system while maintaining high accuracy.
In this embodiment, a sample task is generated in advance according to a target neural network computing task to be executed, and is used to adjust quantization bit widths of each DAC, ADC and first computing chip in the system, so as to adjust computing accuracy when the heterogeneous computing system executes the target neural network computing task. The sample task may be a neural network calculation task with a smaller calculation amount generated based on the target neural network calculation task, or may directly use the target neural network calculation task as the sample task.
In this embodiment, the process of adjusting the calculation accuracy of the system is as follows:
(1) Determine the adjustable range of the quantization bit width corresponding to the parameters of each network layer in the sample task (namely the input data, the weight, the output data of the linear calculation, and the activation function calculation result). In the system, the DAC is used to quantize input data and weights, the ADC is used to quantize the output data of the linear calculation, and the first computing chip is used to quantize the activation function calculation result (i.e., the nonlinear calculation result). The same or different adjustable ranges of quantization bit width may be employed for different network layers. For example, the adjustable range of each DAC, ADC, and first computing chip corresponding to every network layer can be 1-16 bits; or, for the first and last network layers with higher calculation accuracy requirements, an adjustable range of 1-16 bits is adopted, while the remaining middle network layers adopt an adjustable range of 1-8 bits to reduce system power consumption.
Alternatively, the quantization devices respectively corresponding to the parameters of the neural network may be set to an adjustable range using different quantization bit widths.
(2) Encode each selectable quantization bit width in the adjustable range; an integer encoding mode, a binary encoding mode, or the like can be selected. For example, for an adjustable range of 1-8 bits, there are 8 selectable quantization bit widths, thus generating 8 different quantization bit width encodings.
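The two encoding modes named in step (2) can be sketched for a 1-8 bit range; the exact code-word layout is an assumption, since the patent only names the encoding families.

```python
widths = list(range(1, 9))  # 8 selectable bit widths in the 1-8 bit range

integer_codes = {w: w for w in widths}                    # integer encoding: 1..8
binary_codes = {w: format(w - 1, "03b") for w in widths}  # 3-bit binary: '000'..'111'

print(integer_codes[5], binary_codes[5])  # 5 '100'
```

Integer codes suit mutation operators that perturb a bit width directly; binary codes suit bit-level crossover in a genetic algorithm.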
(3) Randomly initialize the sample task and the corresponding quantization parameter set, assigning every parameter in the quantization parameter set a random initial quantization bit width. The quantization parameter set includes all quantization parameters in the network of the sample task, namely the quantization bit widths corresponding to all parameters in the network.
(4) And executing the sample task which is initialized by adopting the heterogeneous computing system, and carrying out iterative updating on the quantization parameter set by an optimization algorithm to obtain the optimal quantization parameter set. The optimization algorithm can be one of genetic algorithm, particle swarm algorithm, deep reinforcement learning algorithm and the like, or a combination of a plurality of algorithms.
(5) And when the calculation result of the sample task meets the evaluation index of the optimization algorithm, judging that the quantization parameter set meets the standard, and ending optimization. The evaluation index may be one or more indexes such as identification accuracy of the neural network on the data set, parameter compression ratio compared with the high-quantization bit-width neural network, and the like. And determining the calculation precision of the heterogeneous calculation system when processing the target neural network calculation task according to the standard quantization parameter set.
In this embodiment, an optimization algorithm iteratively updates the quantization parameter set of each sample task (neural network), and the most appropriate calculation precision for the target neural network calculation task is determined according to the evaluation index, so that the heterogeneous computing system maintains high accuracy when processing the target neural network calculation task while reducing the calculation and storage amounts as much as possible and improving calculation efficiency.
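Steps (1)-(5) above can be sketched as a generic optimization loop. This is a minimal illustration, not the patented implementation: `evaluate` stands in for executing the sample task on the heterogeneous computing system and returning the evaluation index, `update` stands in for one optimizer step, and all names are assumptions.

```python
import random

def adjust_precision(ranges, evaluate, update, target, max_iter=50, seed=0):
    rng = random.Random(seed)
    # (3) randomly assign an initial bit width to every quantization parameter
    params = {name: rng.choice(list(r)) for name, r in ranges.items()}
    for _ in range(max_iter):
        score = evaluate(params)          # run the sample task, get the index
        if score >= target:               # (5) evaluation index met: stop
            return params, score
        params = update(params, rng)      # (4) one optimizer update step
    return params, evaluate(params)       # fall back after the iteration cap
```

A toy `evaluate`/`update` pair (for instance, one that rewards bit widths below some bound and one that decrements widths) is enough to exercise the loop; a real system would plug in the genetic, particle swarm, or deep reinforcement learning updates described below.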
As one embodiment of the present application, iteratively updating the quantization parameter set through an optimization algorithm and judging, based on the evaluation index of the optimization algorithm, whether the current quantization parameter set meets the standard includes:
Executing, by the heterogeneous computing system, a plurality of sample tasks with the same structure but different quantization parameter sets, and after one round of execution, calculating the fitness of each sample task and comparing it with a set evaluation index; the fitness is related to the accuracy of the sample task and to the parameter compression ratio;
If the fitness of none of the sample tasks reaches the evaluation index, judging that the current quantization parameter sets do not meet the standard, performing the selection, crossover, and mutation operations of a genetic algorithm on the quantization parameter sets to update them; re-executing the updated sample tasks on the heterogeneous computing system;
If the fitness of any sample task reaches the evaluation index, judging that the current quantization parameter set meets the standard, and stopping updating the quantization parameter set.
In one embodiment, the process of adjusting the computational accuracy of the system based on the genetic algorithm is as follows:
(1) Determine the adjustable range of the quantization bit width for the parameters of each network layer in the sample task. The same or different adjustable ranges of quantization bit width may be adopted for different network layers.
(2) Encode each selectable quantization bit width in the adjustable range; an integer encoding mode, a binary encoding mode, or the like may be selected.
(3) Randomly initialize a plurality of sample tasks with the same structure and their corresponding quantization parameter sets, and randomly assign an initial quantization bit width to every parameter in each quantization parameter set. Each quantization parameter set includes the quantization bit widths of all parameters in the neural network (i.e., the computing task).
(4) Execute the sample tasks on the heterogeneous computing system and optimize the quantization parameter sets through a genetic algorithm, which specifically includes the following substeps:
1) Take the fitness as the evaluation index, and calculate the fitness of each neural network (i.e., sample task) after each round of calculation finishes. The fitness f is associated with both the accuracy and the parameter compression ratio, so that higher accuracy of the neural network and smaller quantization bit widths of its parameters are balanced through the fitness, reducing the consumption of computing resources. The functional expression of the fitness f is as follows:
f = α·(A0 − Aq)/A0 + β·(Σi Pi·bi)/(B·Σi Pi)
wherein A0 is the accuracy of the unquantized neural network on the dataset, Aq is the accuracy of the neural network after precision quantization, Pi is the number of neural network parameters quantized with bit width bi, and B is the bit width of the unquantized reference network. The first term of the fitness function evaluates the change in accuracy of the neural network after quantization, and the second term evaluates the degree of parameter compression. α and β are the weight coefficients of the first term and the second term respectively: α is the weight coefficient of the accuracy term, and β is the weight coefficient of the parameter compression ratio. In this embodiment, calculating the fitness allows the accuracy and the quantization bit-width set to be considered simultaneously during the iterative update, so that a balance between accuracy and parameter compression ratio is achieved;
2) Perform a "selection" operation on the quantization parameter sets. For example, a roulette algorithm may be used to select the next-generation neural networks and their corresponding quantization parameter sets;
3) Perform a "crossover" operation on the quantization parameter sets: randomly pair the quantization bit-width genotypes of the selected new-generation neural networks, then perform a genotype crossover operation;
4) Perform a "mutation" operation on the quantization parameter sets: randomly fluctuate the quantization bit widths of all parameters (all input data, weights, output data of the linear calculation, and activation function calculation results) in the quantization bit-width sets of the new-generation neural networks to realize random genotype mutation.
(5) When the calculation result of a neural network meets the fitness index, judge that the current quantization parameter set of that sample task meets the standard and stop further optimization. The calculation precision of the heterogeneous computing system when executing the target neural network calculation task is determined according to the quantization bit width corresponding to each parameter in the quantization parameter set. In practical applications, the fitness index may be set as needed; for example, it may be set to 0.03, and the quantization parameter set is judged to meet the standard when the fitness f reaches 0.03.
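Substeps 1)-4) can be sketched as one generation of a genetic update over bit-width genotypes. This is a hedged illustration, not the patented implementation: the exact fitness form, the ALPHA/BETA values, FULL_BITS, the mutation rate, and the (max − f) roulette transform for a lower-is-better fitness are all assumptions.

```python
import random

FULL_BITS = 16            # assumed bit width B of the unquantized reference network
ALPHA, BETA = 0.5, 0.5    # assumed weight coefficients of the two fitness terms

def fitness(acc_full, acc_quant, widths):
    """Lower is better: small accuracy loss plus small relative bit widths."""
    accuracy_loss = (acc_full - acc_quant) / acc_full
    compression = sum(widths) / (FULL_BITS * len(widths))
    return ALPHA * accuracy_loss + BETA * compression

def roulette_select(population, fitnesses, rng):
    """'Selection': roulette weighted by (max - f), since lower f is better."""
    worst = max(fitnesses) + 1e-9
    weights = [worst - f for f in fitnesses]
    return rng.choices(population, weights=weights, k=len(population))

def crossover(parent_a, parent_b, rng):
    """'Crossover': single-point exchange of two bit-width genotypes."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(genotype, bit_range, rng, rate=0.1):
    """'Mutation': random fluctuation of bit widths within the adjustable range."""
    return [rng.choice(list(bit_range)) if rng.random() < rate else b
            for b in genotype]

def next_generation(population, fitnesses, bit_range, rng):
    selected = roulette_select(population, fitnesses, rng)
    rng.shuffle(selected)                        # random pairwise pairing
    children = []
    for a, b in zip(selected[::2], selected[1::2]):
        children.append(mutate(crossover(a, b, rng), bit_range, rng))
        children.append(mutate(crossover(b, a, rng), bit_range, rng))
    return children
```

With a population of four genotypes, `next_generation` returns four offspring whose bit widths stay inside the adjustable range; iterating until the fitness index is met reproduces the loop of step (5).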
As one embodiment of the present application, iteratively updating the quantization parameter set through an optimization algorithm and judging, based on the evaluation index of the optimization algorithm, whether the current quantization parameter set meets the standard includes:
Executing, by the heterogeneous computing system, a plurality of sample tasks with the same structure but different quantization parameter sets, and after one round of execution, calculating the fitness of each sample task and comparing it with a set evaluation index; the fitness is related to the accuracy of the sample task and to the parameter compression ratio;
If the fitness of none of the sample tasks reaches the evaluation index, judging that the current quantization parameter sets do not meet the standard, and updating the quantization parameter sets and the corresponding update speeds with a particle swarm algorithm; re-executing the updated sample tasks on the heterogeneous computing system;
If the fitness of any sample task reaches the evaluation index, judging that the current quantization parameter set meets the standard, and stopping updating the quantization parameter set.
In one embodiment, the process of adjusting the computational accuracy of the system based on the particle swarm algorithm is as follows:
(1) Determine the adjustable range of the quantization bit width for the parameters of each network layer in the sample task. The same or different adjustable ranges of quantization bit width may be adopted for different network layers.
(2) Encode each selectable quantization bit width in the adjustable range; an integer encoding mode, a binary encoding mode, or the like may be selected.
(3) Randomly initialize a plurality of sample tasks with the same structure and their corresponding quantization parameter sets, and randomly assign an initial quantization bit width to every parameter in each quantization parameter set. Each quantization parameter set includes the quantization bit widths of all parameters in the neural network (i.e., the computing task).
(4) Execute the sample tasks on the heterogeneous computing system and optimize the quantization parameter sets through a particle swarm algorithm, which specifically includes the following substeps:
1) Take the fitness as the evaluation index, and calculate the fitness of each neural network after each round of calculation finishes. In this embodiment, the fitness is calculated in the same way as in the above embodiment and is not described again here. The fitness f is associated with both the accuracy and the parameter compression ratio, so that higher accuracy of the neural network and smaller quantization bit widths of its parameters are balanced through the fitness, reducing the consumption of computing resources;
2) Update the individual historical optimal position pbest and the group optimal position gbest of each neural network;
3) Update the quantization bit-width set and the bit-width update speed of each neural network according to the individual historical optimal position pbest and the group optimal position gbest. Assume that in the t-th iteration the quantization bit-width update speed of the i-th neural network is v_i^t, its quantization bit-width set is x_i^t, its individual historical optimal position is pbest_i^t, and the group optimal position is gbest^t. Then in the (t+1)-th iteration, the quantization bit-width update speed of this neural network is:
v_i^(t+1) = w·v_i^t + c1·r1·(pbest_i^t − x_i^t) + c2·r2·(gbest^t − x_i^t)
and its quantization bit-width set is:
x_i^(t+1) = x_i^t + v_i^(t+1)
wherein w is the inertia weight, r1 and r2 are random numbers in the [0,1] interval, c1 is the self-learning factor, and c2 is the global learning factor.
(5) When the calculation result of a neural network meets the fitness index, judge that the current quantization parameter set of that neural network meets the standard and stop further optimization. The calculation precision of the heterogeneous computing system when executing the target neural network calculation task is determined according to the quantization bit width corresponding to each parameter in the quantization parameter set. In practical applications, the fitness index may be set as needed; for example, it may be set to 0.03, and the quantization parameter set is judged to meet the standard when the fitness f reaches 0.03.
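The velocity and position updates of substep 3) can be sketched as one particle-swarm step over a bit-width set. This is a minimal illustration: the w/c1/c2 values, and the rounding and clamping of updated positions to the adjustable range (real bit widths must stay integral and in range), are assumptions not stated in the text.

```python
import random

W, C1, C2 = 0.7, 1.5, 1.5   # assumed inertia weight, self- and global learning factors

def pso_step(widths, velocity, pbest, gbest, low, high, rng):
    """One particle-swarm update of a quantization bit-width set."""
    new_widths, new_velocity = [], []
    for x, v, p, g in zip(widths, velocity, pbest, gbest):
        r1, r2 = rng.random(), rng.random()          # random numbers in [0, 1]
        v_next = W * v + C1 * r1 * (p - x) + C2 * r2 * (g - x)
        x_next = min(high, max(low, round(x + v_next)))  # keep widths valid
        new_velocity.append(v_next)
        new_widths.append(x_next)
    return new_widths, new_velocity
```

When a particle already sits at both its individual and group optimum, the step leaves it in place; otherwise the bit widths drift toward pbest and gbest while remaining inside the adjustable range.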
As one embodiment of the present application, the precision adjustment method further includes:
When the number of updates of the quantization parameter set reaches a first threshold, judging that the quantization parameter set currently closest to the evaluation index meets the standard, and stopping updating the quantization parameter set.
In one embodiment, in addition to ending the optimization when the calculation result of the neural network meets the evaluation index of the optimization algorithm, a first threshold (i.e., a maximum number of iterations) may be set; when the iterative updating of the quantization parameter set reaches the maximum number of iterations, the optimization stops, preventing excessive computing resources from being consumed in adjusting the quantization bit widths, or the process from falling into an endless loop.
Optionally, in practical applications, the maximum number of iterations should be determined comprehensively according to the optimization algorithm and the size of the corresponding sample task. If the network structure of the sample task is relatively simple, a smaller number of iterations, such as 30, can be set; if the neural network is complex, a larger number of iterations, such as 50, is needed.
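The first-threshold safeguard can be sketched as a capped loop that remembers the parameter set closest to the evaluation index. All names are illustrative; here `evaluate` is assumed to return the distance to the evaluation index, so smaller is better.

```python
def optimize_with_cap(candidates, evaluate, update, target, max_iter, rng):
    best, best_score = None, float("inf")
    for _ in range(max_iter):
        for c in candidates:
            s = evaluate(c)
            if s < best_score:            # track the set closest to the index
                best, best_score = c, s
        if best_score <= target:          # evaluation index met: stop early
            return best, best_score
        candidates = update(candidates, rng)
    return best, best_score               # iteration cap reached: keep best seen
```

Either exit path returns a usable quantization parameter set: the early return when the evaluation index is met, or the best candidate seen so far once the maximum number of iterations is reached.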
Based on the same inventive concept, an embodiment of the present application provides an accuracy adjusting device. Referring to fig. 6, fig. 6 is a schematic diagram of an accuracy adjustment device 100 according to an embodiment of the application. As shown in fig. 6, the apparatus includes:
An initialization module 101 configured to randomly initialize a sample task and a corresponding quantization parameter set; the quantization parameter set includes: the quantization bit widths of the DAC and the ADC corresponding to each computing unit in the system, and the quantization bit width of the first computing chip; the sample task is a neural network computing task generated, based on the target neural network computing task to be executed, for pre-training;
The adjustment module 102 is configured to execute the sample task by adopting a heterogeneous computing system, perform iterative update on the quantization parameter set by an optimization algorithm, judge whether the current quantization parameter set meets the standard or not based on an evaluation index of the optimization algorithm, and stop updating the quantization parameter set under the condition that the current quantization parameter set is judged to meet the standard; the optimization algorithm is any one or a combination of a plurality of the following algorithms: genetic algorithm, particle swarm algorithm, or deep reinforcement learning algorithm;
A setting module 103, configured to determine, based on the set of quantized parameters that meet the criteria, a calculation accuracy of the heterogeneous computing system when performing the target neural network calculation task.
As an embodiment of the present application, the adjustment module 102 is specifically configured to perform the following steps:
Executing, by the heterogeneous computing system, a plurality of sample tasks with the same structure but different quantization parameter sets, and after one round of execution, calculating the fitness of each sample task and comparing it with a set evaluation index; the fitness is related to the accuracy of the sample task and to the parameter compression ratio;
If the fitness of none of the sample tasks reaches the evaluation index, judging that the current quantization parameter sets do not meet the standard, performing the selection, crossover, and mutation operations of a genetic algorithm on the quantization parameter sets to update them; re-executing the updated sample tasks on the heterogeneous computing system;
If the fitness of any sample task reaches the evaluation index, judging that the current quantization parameter set meets the standard, and stopping updating the quantization parameter set.
As an embodiment of the present application, the adjustment module 102 is specifically configured to perform the following steps:
Executing, by the heterogeneous computing system, a plurality of sample tasks with the same structure but different quantization parameter sets, and after one round of execution, calculating the fitness of each sample task and comparing it with a set evaluation index; the fitness is related to the accuracy of the sample task and to the parameter compression ratio;
If the fitness of none of the sample tasks reaches the evaluation index, judging that the current quantization parameter sets do not meet the standard, and updating each quantization parameter set and the corresponding update speed with a particle swarm algorithm; re-executing the updated sample tasks on the heterogeneous computing system;
If the fitness of any sample task reaches the evaluation index, judging that the current quantization parameter set meets the standard, and stopping updating the quantization parameter set.
As an embodiment of the present application, the adjusting module 102 is further configured to determine that the quantization parameter set currently closest to the evaluation index meets the standard when the number of times of updating the quantization parameter set reaches the first threshold, and stop updating the quantization parameter set.
Based on the same inventive concept, an embodiment of the present application provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the precision adjustment method according to any of the above embodiments of the present application.
Based on the same inventive concept, an embodiment of the present application provides an electronic device. Fig. 7 is a schematic diagram of an electronic device 200 according to an embodiment of the application. As shown in fig. 7, the electronic device includes: a memory, a processor and a computer program stored on the memory and executable on the processor, which when executed implements the steps of the accuracy adjustment method according to any of the above embodiments of the present application.
The specific manner in which the various modules perform operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method and will not be described in detail here.
The foregoing description of the preferred embodiments of the application is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the application.
For the purposes of simplicity of explanation, the methodologies are shown as a series of acts, but one of ordinary skill in the art will recognize that the present application is not limited by the order of acts described, as some acts may, in accordance with the present application, occur in other orders and concurrently. Further, those skilled in the art will recognize that the embodiments described in the specification are all of the preferred embodiments, and that the acts and components referred to are not necessarily required by the present application.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the application be construed as including the preferred embodiment and all such variations and modifications as fall within the scope of the embodiments of the application.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or terminal device that comprises the element.
The photonic chip, heterogeneous computing system, precision adjusting method and product provided by the application are described in detail above, and specific examples are applied to illustrate the principles and embodiments of the application, and the description of the above examples is only used to help understand the method and core idea of the application; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present application, the present description should not be construed as limiting the present application in view of the above.
Claims (18)
1. A photonic chip, characterized by comprising a computing unit, wherein the computing unit is configured to perform linear calculation in a neural network calculation task; the computing unit at least comprises:
the first coupler is used for splitting the received laser signal into a first optical signal and a second optical signal;
the two modulators are respectively configured to be connected with an external digital-to-analog converter, to receive target parameters quantized with adjustable precision by the digital-to-analog converter, and to encode the first optical signal and the second optical signal, generating the input data and the weight of the current network layer; the target parameters are the calculation result of the previous network layer and the weight of the current network layer;
a phase shifter for changing the phase of the first optical signal and the second optical signal to generate positive and negative codes;
the balanced detector is configured to be connected with an external analog-to-digital converter; to perform linear calculation based on the first optical signal and the second optical signal, generating a first calculation result; and to convert the first calculation result into a photocurrent and transmit the photocurrent to the analog-to-digital converter for quantization with adjustable precision.
2. A photonic chip for performing linear computations in a neural network computational task, comprising:
at least two computing units in the photonic chip of claim 1, all of the computing units being connected in parallel;
The third coupler is respectively connected with all the computing units; the third coupler is used for branching and transmitting the received laser signals to each computing unit.
3. A heterogeneous computing system for performing neural network computing tasks; the system comprises: an optical module and an electrical module;
The optical module includes: a laser for providing a continuous optical signal or a pulsed optical signal; the photonic chip of claim 1, configured to perform linear computation in the neural network computation task, generating a first computation result;
the electrical module includes:
the first computing chip is used for executing summation computation and nonlinear computation in the neural network computing task and generating a second computing result;
The first control chip is used for setting the quantization bit widths of the digital-to-analog converter and the analog-to-digital converter in the system, and for controlling the computing unit and the first computing chip to execute the neural network computing task;
The first memory chip is used for storing the weight, the first calculation result and the second calculation result;
The analog-to-digital converter is used for quantizing the first calculation result and transmitting the first calculation result to the first computing chip;
The digital-to-analog converter is used for quantizing the second calculation result of the previous network layer and the weight of the current network layer, and transmitting them to the two modulators of the computing unit respectively.
4. The heterogeneous computing system according to claim 3, comprising: one analog-to-digital converter and two digital-to-analog converters; one of the digital-to-analog converters is used for acquiring the second calculation result of the previous network layer from the first memory chip, and the other digital-to-analog converter is used for acquiring the weight of the current network layer from the first memory chip.
5. A heterogeneous computing system for performing neural network computing tasks; the system comprises: an optical module and an electrical module;
the optical module includes: a laser for providing a continuous optical signal or a pulsed optical signal; the photonic chip of claim 2, configured to perform linear computation in the neural network computation task, generating a first computation result;
the electrical module includes:
the first computing chip is used for executing summation computation and nonlinear computation in the neural network computing task and generating a second computing result;
The first control chip is used for setting the quantization bit widths of the digital-to-analog converters and the analog-to-digital converters in the system, and for controlling the photonic chip and the first computing chip to execute the neural network computing task;
The first memory chip is used for storing the weight, the first calculation result and the second calculation result;
Each analog-to-digital converter is used for quantizing the first calculation result output by one computing unit and transmitting the first calculation result to the first computing chip;
and each digital-to-analog converter is used for quantizing the second calculation result of the previous network layer corresponding to one computing unit and the weight of the current network layer, and transmitting them to the two modulators of the computing unit respectively.
6. The heterogeneous computing system of claim 5, wherein the number of digital-to-analog converters is twice the number of computing units in the photonic chip; each digital-to-analog converter is used for quantizing input data or weights received by one modulator;
The number of the analog-to-digital converters is equal to the number of the computing units in the photonic chip; each analog-to-digital converter is used for quantizing the first calculation result output by one calculation unit.
7. The heterogeneous computing system of claim 5 or 6, wherein the adjustable ranges of the quantization bit widths of the digital-to-analog converters and the analog-to-digital converters are the same and do not exceed 1-16 bits.
8. The heterogeneous computing system of claim 5 or 6, wherein the electrical module comprises two first memory chips, one of which is coupled to all of the analog-to-digital converters for storing the first computation result; the other is connected with all the digital-to-analog converters and is used for storing the weight and the second calculation result.
9. A heterogeneous computing system for performing neural network computing tasks, comprising:
The multiple cores are respectively used for executing calculation of different parts in the network layer; or, each kernel is used for executing calculation of different network layers respectively; wherein each core comprises:
the photonic chip of claim 1 or 2, configured to perform linear computation in the neural network computation task, and generate a first computation result;
the first computing chip is used for executing summation computation and nonlinear computation in the neural network computing task and generating a second computing result;
The first memory chip is used for storing the first calculation result, the second calculation result, and the parameters used in the kernel;
The analog-to-digital converters are used for quantizing the first calculation result output by the photonic chip and transmitting it to the first memory chip; the number of the analog-to-digital converters is equal to the number of the computing units in the photonic chip;
The digital-to-analog converters are used for quantizing the second calculation result of the previous network layer and the weight of the current network layer, and transmitting them to the photonic chip; the number of the digital-to-analog converters is twice the number of the computing units in the photonic chip;
the system further comprises:
a laser for providing a continuous optical signal or a pulsed optical signal to all cores;
the second memory chip is used for storing all the weights of the neural network computing task and the calculation results of all the kernels;
the second control chip is used for setting the quantization bit widths of the digital-to-analog converters and the analog-to-digital converters in all the cores, and for controlling all the kernels to execute the neural network computing task.
10. The heterogeneous computing system of claim 9, wherein the adjustable ranges of the quantization bit widths of the digital-to-analog converters and the analog-to-digital converters in the system include two types, and the adjustable range of either type does not exceed 1-16 bits.
11. The heterogeneous computing system of claim 10, wherein the number of cores whose digital-to-analog converters and analog-to-digital converters have the larger adjustable range of quantization bit widths is smaller than the number of cores whose converters have the smaller adjustable range.
12. A precision adjustment method, applied to the heterogeneous computing system of any one of claims 3-11, comprising:
randomly initializing a sample task and a corresponding quantization parameter set; the quantization parameter set includes the quantization bit widths of the digital-to-analog converter and the analog-to-digital converter corresponding to each computing unit in the heterogeneous computing system, and the quantization bit width of the first computing chip; the sample task is a neural network computing task generated for pre-training based on the target neural network computing task to be executed;
executing the sample task with the heterogeneous computing system, iteratively updating the quantization parameter set by an optimization algorithm, judging, based on an evaluation index of the optimization algorithm, whether the current quantization parameter set meets the standard, and stopping the update of the quantization parameter set when it is judged to meet the standard; the optimization algorithm is any one or a combination of the following: a genetic algorithm, a particle swarm algorithm, or a deep reinforcement learning algorithm;
and determining, based on the quantization parameter set that meets the standard, the calculation precision of the heterogeneous computing system when executing the target neural network computing task.
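The three steps of claim 12 (random initialization, optimizer-driven updates, stop on meeting the evaluation index) can be sketched as a loop. This is a simplified stand-in: a plain random search replaces the genetic / particle-swarm / reinforcement-learning optimizers named in the claim, and `evaluate` is a hypothetical callable that runs a sample task with a given bit-width set and returns a fitness score:

```python
import random

def adjust_precision(evaluate, num_params, threshold,
                     population=8, max_updates=100, bit_range=(1, 16)):
    """Hedged sketch of the claimed precision-adjustment loop.
    `evaluate`, `threshold`, and all hyperparameters are illustrative
    assumptions; the real method would plug in one of the claimed
    optimizers instead of random sampling."""
    lo, hi = bit_range
    best, best_fit = None, float("-inf")
    for _ in range(max_updates):
        for _ in range(population):
            # One candidate quantization parameter set: a bit width per
            # DAC/ADC pair plus one for the digital computing chip.
            params = [random.randint(lo, hi) for _ in range(num_params)]
            fit = evaluate(params)
            if fit >= threshold:        # any candidate meeting the index stops the search
                return params, fit
            if fit > best_fit:
                best, best_fit = params, fit
    # Fallback of claim 15: after the update budget, keep the set
    # closest to the evaluation index.
    return best, best_fit
```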
13. The method according to claim 12, wherein iteratively updating the quantization parameter set by an optimization algorithm and judging, based on an evaluation index of the optimization algorithm, whether the current quantization parameter set meets the standard comprises:
executing, with the heterogeneous computing system, a plurality of sample tasks having the same structure but different quantization parameter sets, and after each round of execution, calculating the fitness of each sample task and comparing it with a set evaluation index; the fitness is related to the accuracy of the sample task and the parameter compression ratio;
when the fitness of no sample task reaches the evaluation index, judging that the current quantization parameter set does not meet the standard, applying the screening, crossover, and mutation operations of a genetic algorithm to update the quantization parameter sets, and re-executing the updated sample tasks with the heterogeneous computing system;
and when the fitness of any sample task reaches the evaluation index, judging that the current quantization parameter set meets the standard and stopping the update of the quantization parameter set.
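One genetic-algorithm round of claim 13 (screening, crossover, mutation over quantization parameter sets) might look like the following sketch; the tournament selection, single-point crossover, and the mutation probability are all illustrative assumptions, not details from the claim:

```python
import random

def ga_step(population, fitnesses, bit_range=(1, 16), p_mut=0.1):
    """Hedged sketch of one GA update over quantization parameter sets
    (each a list of bit widths). Hyperparameters are assumptions."""
    lo, hi = bit_range

    def select():
        # Screening: binary tournament on fitness.
        i, j = random.sample(range(len(population)), 2)
        return population[i] if fitnesses[i] >= fitnesses[j] else population[j]

    next_pop = []
    while len(next_pop) < len(population):
        a, b = select(), select()
        cut = random.randrange(1, len(a))        # single-point crossover
        child = a[:cut] + b[cut:]
        child = [random.randint(lo, hi) if random.random() < p_mut else g
                 for g in child]                 # per-gene mutation
        next_pop.append(child)
    return next_pop
```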
14. The method according to claim 12, wherein iteratively updating the quantization parameter set by an optimization algorithm and judging, based on an evaluation index of the optimization algorithm, whether the current quantization parameter set meets the standard comprises:
executing, with the heterogeneous computing system, a plurality of sample tasks having the same structure but different quantization parameter sets, and after each round of execution, calculating the fitness of each sample task and comparing it with a set evaluation index; the fitness is related to the accuracy of the sample task and the parameter compression ratio;
when the fitness of no sample task reaches the evaluation index, judging that the current quantization parameter set does not meet the standard, updating the quantization parameter sets and their corresponding update velocities with a particle swarm algorithm, and re-executing the updated sample tasks with the heterogeneous computing system;
and when the fitness of any sample task reaches the evaluation index, judging that the current quantization parameter set meets the standard and stopping the update of the quantization parameter set.
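The particle-swarm variant of claim 14 updates both the quantization parameter sets (positions) and their update velocities. A minimal sketch of one swarm step, assuming standard inertia/attraction coefficients (the claim does not specify them) and rounding to valid integer bit widths:

```python
import random

def pso_step(positions, velocities, personal_best, global_best,
             w=0.7, c1=1.5, c2=1.5, bit_range=(1, 16)):
    """Hedged sketch of one particle-swarm update: each quantization
    parameter set is a particle; velocities move it toward its personal
    best and the swarm's global best, and positions are rounded and
    clamped to legal bit widths. Coefficients are assumptions."""
    lo, hi = bit_range
    new_pos, new_vel = [], []
    for x, v, p in zip(positions, velocities, personal_best):
        nv = [w * vi + c1 * random.random() * (pi - xi)
              + c2 * random.random() * (gi - xi)
              for vi, xi, pi, gi in zip(v, x, p, global_best)]
        nx = [min(hi, max(lo, round(xi + vi_new)))
              for xi, vi_new in zip(x, nv)]
        new_pos.append(nx)
        new_vel.append(nv)
    return new_pos, new_vel
```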
15. The precision adjustment method according to claim 13 or 14, further comprising:
when the number of updates of the quantization parameter set reaches a first threshold, judging that the quantization parameter set currently closest to the evaluation index meets the standard, and stopping the update of the quantization parameter set.
16. A precision adjustment apparatus for implementing the precision adjustment method according to any one of claims 12-15, comprising:
an initialization module configured to randomly initialize a sample task and a corresponding quantization parameter set; the quantization parameter set includes the quantization bit widths of the digital-to-analog converter and the analog-to-digital converter corresponding to each computing unit in the system, and the quantization bit width of the first computing chip; the sample task is a neural network computing task generated for pre-training based on the target neural network computing task to be executed;
an adjustment module configured to execute the sample task with the heterogeneous computing system, iteratively update the quantization parameter set by an optimization algorithm, judge, based on an evaluation index of the optimization algorithm, whether the current quantization parameter set meets the standard, and stop updating the quantization parameter set when it is judged to meet the standard; the optimization algorithm is any one or a combination of the following: a genetic algorithm, a particle swarm algorithm, or a deep reinforcement learning algorithm;
and a setting module configured to determine, based on the quantization parameter set that meets the standard, the calculation precision of the heterogeneous computing system when executing the target neural network computing task.
17. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 12-15.
18. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 12-15.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410295481.7A CN117891023B (en) | 2024-03-15 | 2024-03-15 | Photonic chip, heterogeneous computing system, precision adjusting method and product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410295481.7A CN117891023B (en) | 2024-03-15 | 2024-03-15 | Photonic chip, heterogeneous computing system, precision adjusting method and product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117891023A true CN117891023A (en) | 2024-04-16 |
CN117891023B CN117891023B (en) | 2024-05-31 |
Family
ID=90644421
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410295481.7A Active CN117891023B (en) | 2024-03-15 | 2024-03-15 | Photonic chip, heterogeneous computing system, precision adjusting method and product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117891023B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118550355A (en) * | 2024-07-30 | 2024-08-27 | 光本位科技(上海)有限公司 | Method and system for adjusting link resolution in photoelectric hybrid computing system and storage medium |
Citations (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018200391A (en) * | 2017-05-26 | 2018-12-20 | 日本電信電話株式会社 | Optical signal processing circuit |
CN109639359A (en) * | 2019-01-07 | 2019-04-16 | 上海交通大学 | Photon neural network convolutional layer chip based on micro-ring resonator |
CN109784486A (en) * | 2018-12-26 | 2019-05-21 | 中国科学院计算技术研究所 | A kind of optical neural network processor and its training method |
US20190319634A1 (en) * | 2018-04-14 | 2019-10-17 | Shanghai Jiao Tong University | High-speed and high-precision photonic analog-to-digital conversion device and method for realizing intelligent signal processing using the same |
CN110503196A (en) * | 2019-08-26 | 2019-11-26 | 光子算数(北京)科技有限责任公司 | A kind of photon neural network chip and data processing system |
US20190370644A1 (en) * | 2018-06-04 | 2019-12-05 | Lightmatter, Inc. | Convolutional layers for neural networks using programmable nanophotonics |
WO2020147282A1 (en) * | 2019-01-16 | 2020-07-23 | 南方科技大学 | Image recognition method based on optical neural network structure, apparatus and electronic device |
US20200363660A1 (en) * | 2018-03-02 | 2020-11-19 | Nippon Telegraph And Telephone Corporation | Optical Signal Processing Device |
CN112232504A (en) * | 2020-09-11 | 2021-01-15 | 联合微电子中心有限责任公司 | Photon neural network |
CN112232503A (en) * | 2020-06-09 | 2021-01-15 | 联合微电子中心有限责任公司 | Computing device, computing method, and computing system |
CN112823359A (en) * | 2019-01-14 | 2021-05-18 | 光子智能股份有限公司 | Photoelectric computing system |
CN113159306A (en) * | 2018-06-05 | 2021-07-23 | 光子智能股份有限公司 | Photoelectric computing system |
WO2021201773A1 (en) * | 2020-04-03 | 2021-10-07 | Nanyang Technological University | Apparatus and method for implementing a complex-valued neural network |
US20210357737A1 (en) * | 2018-11-12 | 2021-11-18 | Ryan HAMERLY | Large-Scale Artificial Neural-Network Accelerators Based on Coherent Detection and Optical Data Fan-Out |
CN113890620A (en) * | 2020-07-01 | 2022-01-04 | 浙江大学 | Silicon substrate photonic neural network based on tunable filter and modulation method thereof |
US20220045757A1 (en) * | 2020-08-06 | 2022-02-10 | Celestial Ai Inc. | Coherent photonic computing architectures |
CN114037070A (en) * | 2022-01-07 | 2022-02-11 | 苏州浪潮智能科技有限公司 | Optical signal processing method, photonic neural network chip and design method thereof |
CN114520694A (en) * | 2022-04-21 | 2022-05-20 | 苏州浪潮智能科技有限公司 | Computing chip, system and data processing method |
US20220263582A1 (en) * | 2020-12-17 | 2022-08-18 | Celestial Ai Inc. | Balanced photonic architectures for matrix computations |
CN115241074A (en) * | 2021-04-23 | 2022-10-25 | 南京光智元科技有限公司 | Photonic semiconductor device and method for manufacturing the same |
US20220413222A1 (en) * | 2019-05-09 | 2022-12-29 | Universitat Politècnica De València | Photonic chip, field programmable photonic array and photonic integrated circuit |
CN116739063A (en) * | 2023-05-15 | 2023-09-12 | 浙江大学 | Neural network accelerator based on multimode interferometer and coherent detection |
WO2023174072A1 (en) * | 2022-03-15 | 2023-09-21 | 上海曦智科技有限公司 | Data processing method and system |
Patent Citations (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018200391A (en) * | 2017-05-26 | 2018-12-20 | 日本電信電話株式会社 | Optical signal processing circuit |
US20200363660A1 (en) * | 2018-03-02 | 2020-11-19 | Nippon Telegraph And Telephone Corporation | Optical Signal Processing Device |
US20190319634A1 (en) * | 2018-04-14 | 2019-10-17 | Shanghai Jiao Tong University | High-speed and high-precision photonic analog-to-digital conversion device and method for realizing intelligent signal processing using the same |
US20190370644A1 (en) * | 2018-06-04 | 2019-12-05 | Lightmatter, Inc. | Convolutional layers for neural networks using programmable nanophotonics |
CN113159305A (en) * | 2018-06-05 | 2021-07-23 | 光子智能股份有限公司 | Photoelectric computing system |
CN113159306A (en) * | 2018-06-05 | 2021-07-23 | 光子智能股份有限公司 | Photoelectric computing system |
US20210357737A1 (en) * | 2018-11-12 | 2021-11-18 | Ryan HAMERLY | Large-Scale Artificial Neural-Network Accelerators Based on Coherent Detection and Optical Data Fan-Out |
CN109784486A (en) * | 2018-12-26 | 2019-05-21 | 中国科学院计算技术研究所 | A kind of optical neural network processor and its training method |
CN109639359A (en) * | 2019-01-07 | 2019-04-16 | 上海交通大学 | Photon neural network convolutional layer chip based on micro-ring resonator |
CN112823359A (en) * | 2019-01-14 | 2021-05-18 | 光子智能股份有限公司 | Photoelectric computing system |
WO2020147282A1 (en) * | 2019-01-16 | 2020-07-23 | 南方科技大学 | Image recognition method based on optical neural network structure, apparatus and electronic device |
US20220413222A1 (en) * | 2019-05-09 | 2022-12-29 | Universitat Politècnica De València | Photonic chip, field programmable photonic array and photonic integrated circuit |
CN110503196A (en) * | 2019-08-26 | 2019-11-26 | 光子算数(北京)科技有限责任公司 | A kind of photon neural network chip and data processing system |
WO2021201773A1 (en) * | 2020-04-03 | 2021-10-07 | Nanyang Technological University | Apparatus and method for implementing a complex-valued neural network |
CN112232503A (en) * | 2020-06-09 | 2021-01-15 | 联合微电子中心有限责任公司 | Computing device, computing method, and computing system |
CN113890620A (en) * | 2020-07-01 | 2022-01-04 | 浙江大学 | Silicon substrate photonic neural network based on tunable filter and modulation method thereof |
US20220045757A1 (en) * | 2020-08-06 | 2022-02-10 | Celestial Ai Inc. | Coherent photonic computing architectures |
CN112232504A (en) * | 2020-09-11 | 2021-01-15 | 联合微电子中心有限责任公司 | Photon neural network |
US20220263582A1 (en) * | 2020-12-17 | 2022-08-18 | Celestial Ai Inc. | Balanced photonic architectures for matrix computations |
CN115241074A (en) * | 2021-04-23 | 2022-10-25 | 南京光智元科技有限公司 | Photonic semiconductor device and method for manufacturing the same |
CN114037070A (en) * | 2022-01-07 | 2022-02-11 | 苏州浪潮智能科技有限公司 | Optical signal processing method, photonic neural network chip and design method thereof |
WO2023174072A1 (en) * | 2022-03-15 | 2023-09-21 | 上海曦智科技有限公司 | Data processing method and system |
CN114520694A (en) * | 2022-04-21 | 2022-05-20 | 苏州浪潮智能科技有限公司 | Computing chip, system and data processing method |
WO2023201970A1 (en) * | 2022-04-21 | 2023-10-26 | 苏州浪潮智能科技有限公司 | Computing chip, system, and data processing method |
CN116739063A (en) * | 2023-05-15 | 2023-09-12 | 浙江大学 | Neural network accelerator based on multimode interferometer and coherent detection |
Non-Patent Citations (5)
Title |
---|
FENG CHENGHAO: "Integrated multi-operand optical neurons for scalable and hardware-efficient deep learning", NANOPHOTONICS, 20 January 2024 (2024-01-20) * |
TAIT, ALEXANDER N.: "Neuromorphic photonic networks using silicon photonic weight banks", JOURNAL INFORMATION SCIENTIFIC REPORTS, 7 August 2017 (2017-08-07) * |
JIANG JUN; LIAO YUN; SU JUN: "Research on Photoelectric Hybrid Digital-to-Analog Conversion Based on Optical Delay Interpolation", OPTICAL COMMUNICATION TECHNOLOGY, no. 06, 15 June 2018 (2018-06-15) *
ZHAO ZHENGPING: "New Progress in FinFET Nanoelectronics and Quantum Chips (Continued)", MICRO-NANO ELECTRONIC TECHNOLOGY, no. 02, 13 January 2020 (2020-01-13) *
CHEN BEI: "Research on Key Technologies of Silicon-Based Photonic Devices and Systems for Optical Neural Networks", CHINA DOCTORAL DISSERTATIONS FULL-TEXT DATABASE (BASIC SCIENCES), 15 February 2023 (2023-02-15) *
Also Published As
Publication number | Publication date |
---|---|
CN117891023B (en) | 2024-05-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN117891023B (en) | Photonic chip, heterogeneous computing system, precision adjusting method and product | |
CN109784486B (en) | Optical neural network processor and training method thereof | |
Garg et al. | Dynamic precision analog computing for neural networks | |
CN107636697B (en) | Method and apparatus for quantizing a floating point neural network to obtain a fixed point neural network | |
CN109784485B (en) | Optical neural network processor and calculation method thereof | |
KR20210011461A (en) | Method for determining quantization parameter of neural network, and related product | |
TW202103025A (en) | Hybrid analog-digital matrix processors | |
US20200242474A1 (en) | Neural network activation compression with non-uniform mantissas | |
KR20190034985A (en) | Method and apparatus of artificial neural network quantization | |
CN113112013A (en) | Optimized quantization for reduced resolution neural networks | |
WO2020176250A1 (en) | Neural network layer processing with normalization and transformation of data | |
WO2021086861A1 (en) | Quantized architecture search for machine learning models | |
KR20220009682A (en) | Method and system for distributed machine learning | |
de Bruin et al. | Quantization of deep neural networks for accumulator-constrained processors | |
CN111898316A (en) | Construction method and application of super-surface structure design model | |
Raha et al. | Design considerations for edge neural network accelerators: An industry perspective | |
CN116596056A (en) | Deep optical neural network training method and system based on mixed mutation strategy genetic algorithm | |
CN110874626A (en) | Quantization method and device | |
US20220043474A1 (en) | Path-number-balanced universal photonic network | |
CN115983324A (en) | Neural network quantization method and device and electronic equipment | |
US20220036185A1 (en) | Techniques for adapting neural networks to devices | |
CN113592069A (en) | Photon neural network for four-input logic operation | |
CN114742219A (en) | Neural network computing method and photonic neural network chip architecture | |
AU2020395435B2 (en) | Flexible precision neural inference processing units | |
CN114970831A (en) | Digital-analog hybrid storage and calculation integrated equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||