CN115542433B - Self-attention-based deep neural network coding photonic crystal method - Google Patents

Self-attention-based deep neural network coding photonic crystal method

Info

Publication number
CN115542433B
Authority
CN
China
Prior art keywords
image
coding
attention
self
photonic crystal
Prior art date
Legal status
Active
Application number
CN202211546437.6A
Other languages
Chinese (zh)
Other versions
CN115542433A (en)
Inventor
Zhang Zhaoyu
Li Wenye
Li Renjie
Yu Yueyao
Current Assignee
Chinese University of Hong Kong Shenzhen
Original Assignee
Chinese University of Hong Kong Shenzhen
Priority date
Filing date
Publication date
Application filed by Chinese University of Hong Kong Shenzhen filed Critical Chinese University of Hong Kong Shenzhen
Priority to CN202211546437.6A priority Critical patent/CN115542433B/en
Publication of CN115542433A publication Critical patent/CN115542433A/en
Application granted granted Critical
Publication of CN115542433B publication Critical patent/CN115542433B/en

Classifications

    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B 1/00 Optical elements characterised by the material of which they are made; Optical coatings for optical elements
    • G02B 1/002 Optical elements characterised by the material of which they are made; Optical coatings for optical elements made of materials engineered to provide properties not available in nature, e.g. metamaterials
    • G02B 1/005 Optical elements characterised by the material of which they are made; Optical coatings for optical elements made of materials engineered to provide properties not available in nature, e.g. metamaterials made of photonic crystals or photonic band gap materials
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Optics & Photonics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Chemical & Material Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Optical Integrated Circuits (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a self-attention-based deep neural network method for coding photonic crystals, proposing a POViT model and applying it to coding photonic crystals. The method comprises the following steps: acquiring a geometric structure parameter image of the photonic crystal, where the photonic crystal has a plurality of air holes and each pixel of the geometric structure parameter image comprises the location and radius of an air hole; performing dimension reshaping on the geometric structure parameter image to obtain a plurality of patch images; inputting the patch images into an embedding module and a position coding module to obtain a symbol sequence; inputting the symbol sequence into a Transformer coding module to obtain coding features; and inputting the coding features into a full connection layer module to obtain a quality factor Q and a mode volume V. POViT applies the self-attention Transformer model to the field of optoelectronic design and improves the speed and accuracy of predicting the Q-factor and mode volume V of photonic crystals.

Description

Self-attention-based deep neural network coding photonic crystal method
Technical Field
The invention relates to the technical field of photonic crystals, and in particular to a self-attention-based deep neural network method for coding photonic crystals.
Background
In recent years, Deep Learning (DL) has been widely used in fields such as medical imaging, Natural Language Processing (NLP), autonomous driving, face recognition, and object detection. Deep Neural Networks (DNNs) can effectively learn rich features from big data through their powerful high-dimensional data processing capabilities. Motivated by the broad application prospects of deep learning, many scholars have in recent years used DNNs, represented by Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs), to design and optimize various optoelectronic devices such as nano semiconductor lasers. The unique properties of nanoscale lasers can be characterized by calculating the material gain in the quantum wells/dots and the transverse/longitudinal modes in the defect microcavity. However, the conventional approach to designing nanoscale lasers is generally time-consuming and inefficient, because all physical parameters are manually adjusted in simulation tools such as COMSOL and Lumerical, and the underlying Finite-Difference Time-Domain (FDTD) or Finite-Element Analysis (FEA) methods are computationally intensive and complex. Furthermore, gradient-based optimization methods often converge poorly because of the many local minima in the high-dimensional parameter space associated with the physical system. Such complex designs depend to a large extent on computational tractability and on the expertise of the designer. Therefore, if deep learning can be successfully applied to the field of optoelectronic design, it will save substantial manpower and material resources in designing structurally excellent photonic devices. However, traditional DL models like CNNs and MLPs (as in fig. 1) face performance bottlenecks that are difficult to overcome when designing highly complex physical systems. For example, it remains quite difficult to increase the correlation coefficient of the prediction results merely by tuning the hyperparameters of the DNN or by using a gradient-based optimization algorithm.
InAs/GaAs quantum dot photonic crystal (PC) nanocavity lasers can be experimentally grown on silicon wafer substrates, but owing to the high complexity of the physical structure, how to efficiently calculate the quality factor (Q-factor) of such nanophotonic devices remains an unsolved problem. At the same time, FDTD-based simulation tools require a significant amount of time to simulate and calculate the optical properties of a target structure. A recently proposed CNN model can train and predict the Q-factor with a small training data set (about 1000 samples), but it does not take into account the effect of the PC hole radius on Q. Moreover, the prediction error of that model is as high as 16%, so it cannot be used reliably in practice. Building on recent advances by researchers in Japan, some work has demonstrated that the performance of CNN models can be significantly improved with larger data sets. In addition to the Q-factor, the mode volume V is also an important parameter for evaluating the performance and properties of nanolasers, since V is crucial both for reducing the device footprint and for achieving dense on-chip integration. Taking the mode volume V into account, some authors have in recent years achieved simultaneous training and prediction of Q and V while keeping the test error low, which constitutes the current state of the art. But the correlation coefficient of V is still relatively low (coefficient of V = 80.5% on the test set). Generally, the higher the correlation coefficient, the more accurate the model's predictions. Ideally, if the coefficient equals 100%, the DL model can produce extremely reliable and repeatable design outputs. Therefore, in the prior art, the speed and accuracy of DNN models in predicting the Q-factor and mode volume V of photonic crystals still need to be improved.
Accordingly, the prior art is yet to be improved and developed.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a self-attention-based deep neural network method for coding photonic crystals, aiming at the prior-art problem that the speed and accuracy of predicting the Q-factor and mode volume V of photonic crystals with DNN models still need to be improved.
The technical scheme adopted by the invention for solving the technical problem is as follows:
a method for coding photonic crystals based on a self-attention deep neural network is provided, a POViT model is proposed and applied to the coded photonic crystals, and the POViT model comprises the following steps: the system comprises an embedding module, a position coding module, a transform coding module and a full connection layer module; the method comprises the following steps:
acquiring a geometric structure parameter image of the photonic crystal; wherein the photonic crystal has a plurality of air holes arranged to form a periodic air hole array, and each pixel of the geometric parameter image comprises: the location and radius of the air holes;
performing dimension reshaping on the geometric structure parameter image to obtain a plurality of patch images;
inputting the patch image into the embedding module and the position coding module to obtain a symbol sequence; the symbol sequence is ordered according to the corresponding position of the patch image in the geometric structure parameter image;
inputting the symbol sequence into the Transformer coding module to obtain coding features;
and inputting the coding features into the full connection layer module to obtain a quality factor and a mode volume.
In the self-attention-based deep neural network coding photonic crystal method described above, the Transformer coding module comprises a number of coding blocks, each coding block comprising: a first normalization layer, a multi-head self-attention layer, a first Dropout layer, a second normalization layer, an MLP module, and a second Dropout layer; wherein the MLP module comprises: a first linear layer, an ABS activation layer, a third Dropout layer, a second linear layer, and a fourth Dropout layer.
In the self-attention-based deep neural network coding photonic crystal method described above, an Adam optimizer is adopted for training, with a learning rate of 0.0001 to 0.01.
In the self-attention-based deep neural network coding photonic crystal method described above, performing dimension reshaping on the geometric structure parameter image to obtain a plurality of patch images comprises:

determining an x-axis offset image, a y-axis offset image, and a radius offset image of the air holes from the positions and radii of the air holes and their initial positions and initial radii;

when the sizes of the x-axis offset image, y-axis offset image, and radius offset image of the air holes are not evenly divisible by the patch size, padding the dimensions of the x-axis offset image, y-axis offset image, and radius offset image;

when the sizes of the x-axis offset image, y-axis offset image, and radius offset image of the air holes are evenly divisible by the patch size, dividing the x-axis offset image, y-axis offset image, and radius offset image into a plurality of patch images according to a preset height and a preset width.
In the self-attention-based deep neural network coding photonic crystal method described above, the geometric structure parameter image contains 54 air holes arranged in 5 rows; the initial radius is 89.6 nm, and the initial positions are the hole positions at which the center distance between adjacent holes is 320 nm; the number of patch images is 18.

In the self-attention-based deep neural network coding photonic crystal method described above, when data are collected, the initial positions and initial radii of the air holes are randomly perturbed according to Gaussian distributions to obtain the training data.

The self-attention-based deep neural network coding photonic crystal method described above is evaluated using the MSE loss, the prediction error, and the correlation coefficient.
In the self-attention-based deep neural network coding photonic crystal method described above, the MSE loss is:

$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(t_{i}-p_{i}\right)^{2}$$

where MSE represents the MSE loss, $t_i$ denotes the $i$-th target output, $p_i$ denotes the $i$-th predicted output, $n$ represents the number of training samples, and $\Sigma$ represents the summation sign;

the prediction error is:

$$\varepsilon=\frac{1}{n}\sum_{i=1}^{n}\frac{\left|t_{i}-p_{i}\right|}{t_{i}}\times 100\%$$

where $\varepsilon$ represents the prediction error;

the correlation coefficient is:

$$\rho(t,p)=\frac{\mathrm{Cov}(t,p)}{\sigma_{t}\,\sigma_{p}}$$

$$\mathrm{Cov}(t,p)=E\left[(t-\bar{t})(p-\bar{p})\right]$$

where $\rho$ represents the correlation coefficient, $E$ represents the mathematical expectation, $\bar{t}$ represents the mean of the target outputs, and $\bar{p}$ represents the mean of the predicted outputs.
A computer device comprising a memory storing a computer program and a processor, wherein the processor implements the steps of the method as claimed in any one of the above when executing the computer program.
A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, realizes the steps of the method as set forth in any of the above.
Beneficial effects: the self-attention Transformer model is applied to the field of optoelectronic design, a self-attention mechanism is introduced, and the speed and accuracy of predicting the Q-factor and mode volume V of photonic crystals are greatly improved in this prediction task.
Drawings
FIG. 1 is a schematic representation of the simplified history of DNN development.
Fig. 2 is a schematic diagram of a Photonic Crystal (PC).
Fig. 3 is a gaussian distribution plot of samples of a training data set.
Fig. 4 is an architecture diagram of the POViT model.
Fig. 5 is a graph of learning curves and training results for POViT using ABS as the activation function.
Fig. 6 is a graph of learning curves and training results for POViT using GELU as the activation function.
FIG. 7 is a plot of the trend of the correlation coefficient of V (V_coeff) under the ABS and GELU activation functions.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
Referring to fig. 1-7, the present invention provides some embodiments of a method for self-attention based deep neural network coding photonic crystals.
From NLP and Computer Vision (CV) to the basic sciences, the Transformer model has demonstrated its power across a variety of tasks. The Transformer was first introduced to NLP around 2017; it was later extended to CV in 2021, where its test performance surpassed that of CNNs. Many successors have attempted to modify the architecture of the Vision Transformer (ViT) to obtain better performance or to apply the ViT model to cross-disciplinary research. For example, researchers have proposed a gated axial self-attention model to overcome the scarcity of data samples in medical image segmentation; it extends the structure of existing Transformers by adding gating mechanisms to the self-attention module. In bioinformatics, a BERT (Bidirectional Encoder Representations from Transformers)-based multilingual model treats DNA sequences as natural sentences and successfully identifies DNA enhancers using Transformers. In addition, an improved Transformer network has been applied to learning semantic relationships between objects in collider physics.
The Transformer (or the more advanced Vision Transformer), a recent DL model based on the self-attention mechanism, has become a disruptive alternative in the deep learning field. Given the Transformer's excellent performance in various engineering applications in recent years, its application to optoelectronic design is promising. According to our research, the present application is the first patent application worldwide to apply the self-attention Transformer model to the field of optoelectronic design. In this application, we show the application of the original Vision Transformer (ViT), the later Convolutional Vision Transformer (CvT), and our own uniquely proposed POViT to the design and characterization of photonic crystal (PC) nanocavities. The PC (fig. 2) is the core component of high-performance nanoscale semiconductor lasers used in next-generation silicon photonic integrated circuits (PICs) and lidars. In this application, we name the proposed self-attention-based deep neural network model POViT: Photonics Vision Transformer.
Our work applies ViT to the rapid multi-objective characterization and optimization of nanophotonic devices, paving the way toward design without extensive manual intervention or trial and error. Our approach is inspired by the combination of artificial intelligence and Electronic Design Automation (EDA), which is also an area of extensive research in academia and industry. Our main goal is to push forward the rise of fully automated photonic design, as Photonic Design Automation (PDA) is still at a very preliminary stage of development. In this application, multi-objective means that multiple photonic/electromagnetic properties, i.e., the Q-factor (quality factor) and the mode volume V in our example, can be predicted and encoded.
The nanoscale laser is implemented according to the PC nanocavity shown in fig. 2, which has a regular array of air holes in a multilayer semiconductor (i.e., Si and InP) structure. This particular structure is powerful and efficient: spontaneous lasing can be significantly enhanced by manipulating the propagation of electromagnetic waves through the photonic band gap, and owing to the periodic array of air holes, photons are concentrated to form a laser beam. Compared with the indium phosphide (InP) base, the holes have a periodically varying effective refractive index, which makes photons easier to trap and confine. Since the peripheral air holes are far from the center, they contribute little to the quality factor Q and the mode volume V; that is, adjusting them does not produce significant changes in the electromagnetic field. To simplify the computational complexity and use resources efficiently, the modeled region of this problem contains only the 54 air holes of the PC enclosed by the white rectangle in fig. 2. These holes span 5 rows: row 1 has 11 air holes; row 2 has 12; row 3 has 8, split into two groups of 4 separated by a gap (specifically, a gap of 3 hole positions); row 4 has 12; and row 5 has 11. Holes outside this rectangle are fixed to reduce the computational cost. The lattice constant of the PC, a = 320 nm, and the air-hole radius, r = 89.6 nm, are the normalized defaults; that is, before the positions and radii of the air holes are changed, the center distance of each pair of adjacent holes is 320 nm and the default hole radius equals 89.6 nm. The refractive index of the InP slab is n = 3.4. These constants may differ for other semiconductor materials. The data set for training the ViT was obtained from FDTD simulation and contains 12500 samples. In each sample, the position and radius changes of the 54 air holes of the PC structure are the input, and the corresponding simulated Q-factor and mode volume V are the target outputs. Before inputting a data sample into the model, we reshape its dimensions to N × 3 × 5 × 12, where N denotes the batch size, "3" denotes the three channels of a hole (dx, dy, dr), analogous to the RGB channels of an image, and "5" and "12" denote the height and width of our PC tensor, analogous to the height and width of an image. This dimensional transformation makes the sample approximate an actual image, forming the geometric structure parameter image of the photonic crystal; each pixel of the geometric structure parameter image comprises the position and radius of an air hole, with the position expressed in coordinates, so each pixel can be expressed as (x′, y′, r′).
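As a concrete illustration, the following minimal sketch shows how one sample might be packed into the 3 × 5 × 12 tensor described above (PyTorch is used throughout these sketches, although the patent does not name a framework; the helper name and the hole-list format are assumptions):

```python
import torch

def build_sample_tensor(holes, grid_h=5, grid_w=12):
    """Pack one PC sample into a 3 x 5 x 12 tensor with channels (dx, dy, dr).

    `holes` is assumed to be a list of (row, col, dx, dy, dr) tuples for the
    54 modeled air holes; grid cells without a hole remain (0, 0, 0).
    """
    x = torch.zeros(3, grid_h, grid_w)
    for row, col, dx, dy, dr in holes:
        x[0, row, col] = dx  # x-axis offset channel
        x[1, row, col] = dy  # y-axis offset channel
        x[2, row, col] = dr  # radius offset channel
    return x

# A batch of N such samples then has shape (N, 3, 5, 12), analogous to N RGB images.
```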
when data are collected, the initial positions and the initial radiuses of the air holes are randomly changed according to Gaussian distribution, and training data are obtained. The initial position of the air hole is represented by (x 0y 0 ) Its initial radius is expressed asr 0 . Then, we randomly move the hole positions horizontally and vertically according to the Gaussian distribution and the radius according to the heightThe distribution of the Si is randomly increased or decreased so that the position becomes (A), (B), (C)x′,y') and radius becomer'. Now defining air holesxOffset of shaftdx=x′-x 0yOffset of shaftdy=y′-y 0 Radial offsetdr=r′-r 0dxdydrThe gaussian distribution as input element is as follows:
dx∼N(μ=−8.7270×10 −13 ,σ 2 =5×10 −10
dy∼N(μ=3.3969×10 −13 ,σ 2 =5×10 −10
dr∼N(μ=−1.6978×10 −12 ,σ 2 =5×10 −10
in practice, the training data set is 10000 in size and the remaining 2500 samples are used as test data (i.e., the test data set is 2500 in size). In addition, 12500 data samples are randomly split, so that the characteristics of the data samples are diversified as much as possible, and the generalization capability of the ViT is maximized. The sample distribution of the data set is shown in the form of a histogram in fig. 3.
First, the geometric structure parameter image is dimension-reshaped and divided into a plurality of patch images. With 54 air holes arranged in 5 rows of at most 12 holes each, the image has 5 × 12 pixels (the square grid in fig. 4); if a grid cell contains an air hole, its pixel is expressed as (dx, dy, dr), and if not, its pixel is (0, 0, 0). The image is divided according to a preset height and width; as shown in fig. 4, it is divided into 2 × 2 patches, so the dimensions must be padded to make the division even: the 5 × 12 pixels are padded to 6 × 12, i.e., one row of pixels is appended, all set to (0, 0, 0). The image can then be divided into 18 patch images of 2 × 2. The division can be implemented with a convolutional layer, e.g., with a kernel size of 2 × 2, a stride of 2, and 12 convolution kernels.
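The padding and convolutional splitting just described might look as follows (a sketch; the module name and the use of zero padding on the bottom row are assumptions consistent with the text):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchEmbed(nn.Module):
    """Pad the 3 x 5 x 12 input to 3 x 6 x 12, then split it into 18
    non-overlapping 2 x 2 patches with a strided convolution (kernel 2 x 2,
    stride 2, 12 kernels), as described in the text."""
    def __init__(self, in_ch=3, embed_dim=12, patch=2):
        super().__init__()
        self.patch = patch
        self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=patch, stride=patch)

    def forward(self, x):                    # x: (N, 3, 5, 12)
        h, w = x.shape[-2:]
        pad_h = (-h) % self.patch            # rows needed for divisibility: 1
        pad_w = (-w) % self.patch            # columns needed: 0
        x = F.pad(x, (0, pad_w, 0, pad_h))   # appended row is all (0, 0, 0) pixels
        x = self.proj(x)                     # (N, 12, 3, 6)
        return x.flatten(2).transpose(1, 2)  # (N, 18, 12) patch tokens

tokens = PatchEmbed()(torch.randn(4, 3, 5, 12))
print(tokens.shape)  # torch.Size([4, 18, 12])
```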
The POViT model includes: an embedding module, a position coding module, a Transformer coding module, and a full connection layer module (specifically, an MLP layer may be used). The Transformer coding module comprises a number of coding blocks, each coding block comprising: a first normalization layer, a multi-head self-attention layer, a first Dropout layer, a second normalization layer, an MLP module, and a second Dropout layer; the MLP module includes: a first linear layer, an ABS activation layer, a third Dropout layer, a second linear layer, and a fourth Dropout layer. Compared with the prior-art ViT model, the Transformer module adopts an ABS activation layer instead of a GELU activation layer. We performed several experiments to verify the robustness of this particular activation layer, called the ABS activation layer, and demonstrated that it can significantly improve POViT performance relative to conventional activation layers such as GELU. The expressions of ABS and GELU are as follows:
Figure 34884DEST_PATH_IMAGE010
。/>
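A sketch of one coding block as enumerated above, with residual connections assumed as in the standard ViT (layer width, head count, and dropout rate are illustrative choices, not taken from the patent):

```python
import torch
import torch.nn as nn

class ABS(nn.Module):
    def forward(self, x):
        return torch.abs(x)  # ABS(x) = |x|

class CodingBlock(nn.Module):
    """First normalization layer -> multi-head self-attention -> first Dropout ->
    second normalization layer -> MLP (Linear, ABS, Dropout, Linear, Dropout) ->
    second Dropout, following the layer list given above."""
    def __init__(self, dim=12, heads=4, mlp_ratio=4, p=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, dropout=p, batch_first=True)
        self.drop1 = nn.Dropout(p)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),  # first linear layer
            ABS(),                            # ABS activation layer
            nn.Dropout(p),                    # third Dropout layer
            nn.Linear(dim * mlp_ratio, dim),  # second linear layer
            nn.Dropout(p),                    # fourth Dropout layer
        )
        self.drop2 = nn.Dropout(p)

    def forward(self, x):
        h = self.norm1(x)
        h, _ = self.attn(h, h, h, need_weights=False)
        x = x + self.drop1(h)
        x = x + self.drop2(self.mlp(self.norm2(x)))
        return x
```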
the multi-headed self-attention tier projects the output of the first normalization layer into the query Q, key K, and value V, respectively, by some linear projection, will search for existing key-value pairs and add these pairs by weight to give the prediction. The scaled dot product function of the multi-headed self-attention layer is as follows:
$$A(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d}}\right)V$$

where A(Q, K, V) represents the output of the multi-head self-attention layer, Q represents the queries, K the keys, V the values, and d the scaling factor (the dimension of the keys).
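A direct sketch of this scaled dot-product function:

```python
import torch

def scaled_dot_product_attention(Q, K, V):
    """A(Q, K, V) = softmax(Q K^T / sqrt(d)) V, applied per attention head."""
    d = Q.shape[-1]                               # query/key dimension
    scores = Q @ K.transpose(-2, -1) / d ** 0.5   # (..., seq, seq)
    return torch.softmax(scores, dim=-1) @ V      # (..., seq, d_v)
```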
Since the input to the coding module is a sequence of vectors, each patch image is input into the embedding module and mapped to a one-dimensional vector by linear projection; the position codes, ordered according to the position of each patch image in the geometric structure parameter image (e.g., 1, 2, 3, …, 17, 18), are then embedded to obtain the symbol sequence. The symbol sequence is input into the Transformer coding module to obtain the coding features, which are then input into the MLP layer to obtain the quality factor Q and the mode volume V.
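Putting the pieces together, an end-to-end sketch of the forward pass just described, reusing PatchEmbed and CodingBlock from the sketches above (the depth, the mean pooling, and the learnable position encoding are illustrative assumptions, not taken from the patent):

```python
import torch
import torch.nn as nn

class POViT(nn.Module):
    def __init__(self, dim=12, depth=6, n_patches=18):
        super().__init__()
        self.embed = PatchEmbed(embed_dim=dim)                    # linear projection of patches
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))   # position encoding
        self.blocks = nn.Sequential(*[CodingBlock(dim=dim) for _ in range(depth)])
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 2))  # MLP head

    def forward(self, x):                 # x: (N, 3, 5, 12)
        t = self.embed(x) + self.pos      # symbol sequence with position encoding
        t = self.blocks(t)                # coding features
        return self.head(t.mean(dim=1))   # (N, 2): predicted (Q, V)

q_v = POViT()(torch.randn(4, 3, 5, 12))  # -> shape (4, 2)
```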
During training, an Adam optimizer may be adopted, with a learning rate of 0.0001 to 0.01. To measure the performance of the POViT model, we calculate the MSE loss, the minimum and converged prediction errors ($\varepsilon$), and the correlation coefficient ($\rho$) during training. The minimum prediction error is measured and recorded by the program during the test phase, while the converged prediction error is collected and averaged over the last few epochs. Denote by $t_i$ the target outputs (i.e., labels) in the data set and by $p_i$ the corresponding predicted outputs. The performance of the POViT model can be evaluated by the following three formulas:
$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(t_{i}-p_{i}\right)^{2}$$

$$\varepsilon=\frac{1}{n}\sum_{i=1}^{n}\frac{\left|t_{i}-p_{i}\right|}{t_{i}}\times 100\%$$

$$\rho(t,p)=\frac{\mathrm{Cov}(t,p)}{\sigma_{t}\,\sigma_{p}}=\frac{E\left[(t-\bar{t})(p-\bar{p})\right]}{\sigma_{t}\,\sigma_{p}}$$

where MSE represents the MSE loss, $t_i$ denotes the $i$-th target output, $p_i$ denotes the $i$-th predicted output, $n$ represents the number of training samples, $\varepsilon$ denotes the prediction error, $\rho$ denotes the correlation coefficient, $E$ denotes the mathematical expectation, $\bar{t}$ denotes the mean of the target outputs, $\bar{p}$ denotes the mean of the predicted outputs, $\mathrm{Cov}(t,p)$ denotes the covariance of the target and predicted outputs, $\sigma_t$ denotes the standard deviation of the target outputs, and $\sigma_p$ denotes the standard deviation of the predicted outputs.
The Pearson correlation coefficient $\rho(t,p)\in[-1,1]$ in the above equation measures the linear relationship between the predicted and target outputs. If the coefficient is close to 1, the predictions are perfectly positively correlated with the targets, which also means that our POViT model fits this regression problem of coding photonic crystals well.
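The three metrics can be computed directly from these definitions; a sketch for 1-D tensors of targets t and predictions p (e.g., the Q or V values of the test set):

```python
import torch

def evaluate(t, p):
    """Return (MSE, prediction error, Pearson correlation) per the formulas above."""
    mse = torch.mean((t - p) ** 2)
    pred_err = torch.mean(torch.abs(t - p) / t)        # mean relative error
    cov = torch.mean((t - t.mean()) * (p - p.mean()))  # Cov(t, p)
    rho = cov / (t.std(unbiased=False) * p.std(unbiased=False))
    return mse.item(), pred_err.item(), rho.item()
```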
As shown in fig. 4, the method for coding a photonic crystal based on a self-attention deep neural network of an embodiment of the present invention includes the following steps:
s100, acquiring a geometric structure parameter image of the photonic crystal; wherein the photonic crystal has a plurality of air holes arranged to form a periodic air hole array, and each pixel of the geometric parameter image comprises: the location and radius of the air holes.
Specifically, the positions and radii of a plurality of air holes at the center of the photonic crystal are extracted to form the geometric structure parameter image; the number of air holes can be, for example, 54.
And S200, performing dimension reshaping on the geometric structure parameter image to obtain a plurality of patch images.
Specifically, the geometric structure parameter image is dimension-reshaped to obtain a plurality of patch images. A convolutional layer can be used to split the geometric structure parameter image into the patch images.
Step S200 specifically includes:
step S210, determining the air holes according to the positions and the radii of the air holes and the initial positions and the initial radii of the air holesIs/are as followsxAn axis shift amount image,yAn axis offset image and a radius offset image.
Step S220, when the air holexAn axis shift amount image,yFor the air holes when the sizes of the axial offset image and the radial offset image are not exactly divisible by the patch sizexAn axis shift amount image,yThe axis offset image and the radius offset image complement the dimensions.
Step S230, when the air holexAn axis shift amount image,yWhen the sizes of the axial deviation image and the radial deviation image are divided by the size of the patch, the size of the air hole is adjustedxAn axis shift amount image,yAnd the axis offset image and the radius offset image are divided into a plurality of patch images according to a preset height and a preset width.
Specifically, each pixel of the geometry parameter image isx',y'r') The initial position of the air hole is (x 0y 0 ) Initial radius ofr 0 Then each pixel of the offset image is (dxdydr). In the division, it is necessary to see whether the height and width of the offset image are divisible by the height and width of the patch, respectively, and if not, the dimensions need to be supplemented so that the dimensions can be divisible, and the patch images can be divided into a plurality of patch images. For example, the size of the offset image is 5 × 12, the size of the patch image is 2 × 2, the height 5 cannot be divided by the height 2, and the width 12 can be divided by the width 2, so that it is necessary to supplement one dimension in height to form an offset image of 6 × 12, and the offset image can be divided into 18 patch images.
S300, inputting the patch image into the embedding module and the position coding module to obtain a symbol sequence; and the symbol sequence is ordered according to the corresponding position of the patch image in the geometric structure parameter image.
Specifically, the patch images are input into the embedding module for linear projection, and the position codes are then embedded by the position coding module, obtaining the symbol sequence.

And S400, inputting the symbol sequence into the Transformer coding module to obtain coding features.

And S500, inputting the coding features into the full connection layer module to obtain a quality factor Q and a mode volume V.

Specifically, the symbol sequence is input into the Transformer coding module to obtain the coding features, which are then input into the full connection layer module to obtain the quality factor Q and the mode volume V.
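As a usage illustration, steps S100-S500 map onto the sketched modules as follows (the names come from the sketches above, not from the patent):

```python
import torch

# S100/S200: build the geometric structure parameter image for one sample.
x = build_sample_tensor(holes).unsqueeze(0)   # shape (1, 3, 5, 12); `holes` as before

# S300-S500: embed the patches, encode the symbol sequence, regress Q and V.
model = POViT()
model.eval()
with torch.no_grad():
    q_pred, v_pred = model(x)[0]
```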
Detailed description of the preferred embodiment
The POViT model provided by this patent aims to construct a reliable and efficient method to simplify the multi-objective coding design of nanophotonic devices. Initially, 10000 data samples are randomly selected and shuffled from the dataset and input to the model, which runs for 300 epochs per training run. After several rounds of experiments, the hyperparameters that yield the best performance are: initial learning rate lr = 0.01, optimizer Adam, and learning rate scheduler MultiStepLR (milestones = [100, 160, 200], gamma = 0.1). The complete list of POViT hyperparameters we used is given in Table 1. The results of POViT training with ABS and GELU, respectively, are shown in FIGS. 5-6. In each of FIGS. 5 and 6, the four graphs on the left (a: prediction error, b: MSE, e and f: correlation coefficient) are for Q, while the four graphs on the right are for V; QNN is the predicted correlation coefficient and QFDTD is the target correlation coefficient, and likewise VNN and VFDTD are the predicted and target correlation coefficients, respectively. In FIGS. 5 and 6, the correlation coefficients of Q in training and testing appear identical because we retain only three significant digits after the decimal point; this also indicates that no overfitting occurred while training the Q-factor. Since no correlation coefficients were provided for the CNN model in previous work, we obtained the CNN open-source code from the following website: https://github.com/arcadinie/Deep-Learning-Based-Modeling-of-PC-nanocavities, and extended it to include the ability to compute correlation coefficients. We found that this CNN gives test coefficients of Q_coeff = 98.7% and V_coeff = 80.5%. From FIG. 6 we can see that the best test coefficients of the POViT model are Q_coeff = 99.4% and V_coeff = 92.0%, where the V_coeff value is as much as 11.5% higher than the best result obtained with the CNN model.
The POViT proposed in this patent has advantages in prediction accuracy, convergence rate, and linearity of the correlation coefficients (see FIGS. 5-6). The introduction of the self-attention mechanism has proven to surpass the traditional CNN, once the most common structure in the field of computer vision. In addition, the convergence rate of POViT is very fast: the MSE loss drops to an extremely low level within only 100 epochs and then remains stable. The high correlation coefficients of POViT, as well as of the Transformer-based CvT, mean that our model is robust to noise interference.
As for the activation layer embedded in the feed-forward network (FFN) of the Transformer module, experiments found that when lr is relatively small (lr < 0.0005), the absolute value function (ABS) performs significantly better than GELU. In FIG. 7, each data point is the average of three separate experiments, and V_coeff is plotted against lr to directly compare the performance of the ABS and GELU activation layers. We see that ABS still holds a weak advantage over GELU when the learning rate becomes relatively large (lr ≈ 0.001), despite the narrowing gap. After lr ≥ 0.005, the curves for ABS and GELU almost overlap, although the former always remains slightly above the latter. Based on these observations, we conclude that the ABS activation function is significantly better than GELU for our application. We consider it worthwhile to investigate further why the activation function ABS is clearly superior to GELU when lr is relatively small. Here we offer a possible explanation: the dying ReLU phenomenon. As shown in FIG. 3, the input data dx, dy, and dr are concentrated in a small range after normalization, and a considerable portion of the data elements lie on the negative half of the axis. Thus, ReLU-like activation functions (e.g., GELU) may be negatively affected by the dying ReLU phenomenon, causing some neurons in POViT to become inactive (their weights decay to near zero, which hinders loss convergence). Conversely, ABS retains and treats positive and negative input data elements identically, which may mitigate this negative effect.
TABLE 1. Hyperparameter list of POViT
(The table appears as an image in the original publication; the key values stated above are lr = 0.01, the Adam optimizer, a MultiStepLR scheduler with milestones [100, 160, 200] and gamma = 0.1, and 300 epochs per run.)
Based on the method for coding photonic crystals by using the self-attention-based deep neural network described in any one of the above embodiments, the present invention further provides an embodiment of a computer device:
the computer device of the present invention comprises a memory storing a computer program and a processor implementing the steps of the method according to any one of the above embodiments when the processor executes the computer program.
Based on the method for coding photonic crystals based on the self-attention deep neural network described in any one of the above embodiments, the present invention further provides an embodiment of a computer-readable storage medium:
the computer-readable storage medium of the present invention has stored thereon a computer program which, when being executed by a processor, carries out the steps of the method according to any one of the above-mentioned embodiments.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (10)

1. A self-attention-based deep neural network coding photonic crystal method, wherein a POViT model is applied to coding photonic crystals, the POViT model comprising: an embedding module, a position coding module, a Transformer coding module, and a full connection layer module; the method comprising the following steps:
acquiring a geometric structure parameter image of the photonic crystal; wherein the photonic crystal has a plurality of air holes arranged to form a periodic air hole array, and each pixel of the geometric parameter image comprises: the location and radius of the air holes;
performing dimension reshaping on the geometric structure parameter image to obtain a plurality of patch images;
inputting the patch image into the embedding module and the position coding module to obtain a symbol sequence; the symbol sequence is ordered according to the corresponding position of the patch image in the geometric structure parameter image;
inputting the symbol sequence into the Transformer coding module to obtain coding features;
and inputting the coding features into the full connection layer module to obtain a quality factor Q and a mode volume V.
2. The self-attention-based deep neural network coding photonic crystal method of claim 1, wherein the Transformer coding module comprises: a number of coding blocks, each coding block comprising: a first normalization layer, a multi-head self-attention layer, a first Dropout layer, a second normalization layer, an MLP module, and a second Dropout layer; wherein the MLP module comprises: a first linear layer, an ABS activation layer, a third Dropout layer, a second linear layer, and a fourth Dropout layer.
3. The self-attention-based deep neural network coding photonic crystal method of claim 2, wherein an Adam optimizer is adopted for training, with a learning rate of 0.0001 to 0.01.
4. The self-attention-based deep neural network coding photonic crystal method of claim 1, wherein performing dimension reshaping on the geometric structure parameter image to obtain a plurality of patch images comprises:

determining an x-axis offset image, a y-axis offset image, and a radius offset image of the air holes from the positions and radii of the air holes and their initial positions and initial radii;

when the sizes of the x-axis offset image, y-axis offset image, and radius offset image of the air holes are not evenly divisible by the patch size, padding the dimensions of the x-axis offset image, y-axis offset image, and radius offset image;

when the sizes of the x-axis offset image, y-axis offset image, and radius offset image of the air holes are evenly divisible by the patch size, dividing the x-axis offset image, y-axis offset image, and radius offset image into a plurality of patch images according to a preset height and a preset width.
5. The self-attention-based deep neural network coding photonic crystal method of claim 4, wherein the geometric structure parameter image contains 54 air holes arranged in 5 rows; the initial radius is 89.6 nm, and the initial positions are the hole positions at which the center distance between adjacent holes is 320 nm; and the number of patch images is 18.
6. The self-attention-based deep neural network coding photonic crystal method of claim 1, wherein, when data are collected, the initial positions and initial radii of the air holes are randomly perturbed according to Gaussian distributions to obtain training data.
7. The self-attention-based deep neural network coding photonic crystal method of claim 1, wherein the POViT model is evaluated based on MSE loss, prediction error, and correlation coefficient.
8. The self-attention based deep neural network coding photonic crystal method of claim 7, wherein the MSE loss is:
$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(t_{i}-p_{i}\right)^{2}$$

wherein MSE represents the MSE loss, $t_i$ denotes the $i$-th target output, $p_i$ denotes the $i$-th predicted output, $n$ represents the number of training samples, and $\Sigma$ represents the summation sign;

the prediction error is:

$$\varepsilon=\frac{1}{n}\sum_{i=1}^{n}\frac{\left|t_{i}-p_{i}\right|}{t_{i}}\times 100\%$$

wherein $\varepsilon$ represents the prediction error;

the correlation coefficient is:

$$\rho(t,p)=\frac{\mathrm{Cov}(t,p)}{\sigma_{t}\,\sigma_{p}}$$

$$\mathrm{Cov}(t,p)=E\left[(t-\bar{t})(p-\bar{p})\right]$$

wherein $\rho$ represents the correlation coefficient, $E$ represents the mathematical expectation, $\bar{t}$ represents the mean of the target outputs, and $\bar{p}$ represents the mean of the predicted outputs.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 8 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN202211546437.6A 2022-12-05 2022-12-05 Self-attention-based deep neural network coding photonic crystal method Active CN115542433B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211546437.6A CN115542433B (en) 2022-12-05 2022-12-05 Self-attention-based deep neural network coding photonic crystal method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211546437.6A CN115542433B (en) 2022-12-05 2022-12-05 Self-attention-based deep neural network coding photonic crystal method

Publications (2)

Publication Number Publication Date
CN115542433A CN115542433A (en) 2022-12-30
CN115542433B true CN115542433B (en) 2023-03-24

Family

ID=84721714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211546437.6A Active CN115542433B (en) 2022-12-05 2022-12-05 Self-attention-based deep neural network coding photonic crystal method

Country Status (1)

Country Link
CN (1) CN115542433B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116300075B (en) * 2023-05-23 2023-08-11 South China Normal University Layered nano-photonics device design method based on multi-head series neural network


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11403791B2 (en) * 2019-07-11 2022-08-02 Canon Medical Systems Corporation Apparatus and method using deep learning (DL) to improve analytical tomographic image reconstruction
US20220121940A1 (en) * 2020-10-20 2022-04-21 The Regents Of The University Of California Device and method for neural-network based on-chip spectroscopy using a plasmonic encoder

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2002300151B2 (en) * 1997-05-16 2005-12-15 Luxtaltek Corporation Optical Devices And Method
CN101622562A (en) * 2007-01-03 2010-01-06 Hewlett-Packard Development Company Photonic systems and methods for encoding data in carrier electromagnetic waves
CN101499616A (en) * 2008-01-30 2009-08-05 Institute of Semiconductors, Chinese Academy of Sciences Method for implementing annular cavity whispering gallery mode by photonic crystal structure
AU2020103715A4 (en) * 2020-11-27 2021-02-11 Beijing University Of Posts And Telecommunications Method of monocular depth estimation based on joint self-attention mechanism
CN114663440A (en) * 2022-03-23 2022-06-24 Chongqing University of Posts and Telecommunications Fundus image focus segmentation method based on deep learning
CN115050045A (en) * 2022-04-06 2022-09-13 Shanghai University of Electric Power Vision MLP-based pedestrian re-identification method
CN115147731A (en) * 2022-07-28 2022-10-04 Beihang University SAR image target detection method based on full-space coding attention module

Also Published As

Publication number Publication date
CN115542433A (en) 2022-12-30

Similar Documents

Publication Publication Date Title
Ma et al. Deep learning for the design of photonic structures
Du et al. Learnability of quantum neural networks
Wang et al. A deep convolutional neural network for topology optimization with perceptible generalization ability
Reyad et al. A modified Adam algorithm for deep neural network optimization
Sun et al. Consistent sparse deep learning: Theory and computation
WO2020014490A1 (en) Systems and methods for generative models for design
CN115542433B (en) Self-attention-based deep neural network coding photonic crystal method
US11796794B2 (en) Multi-objective, robust constraints enforced global topology optimizer for optical devices
Barman et al. Transfer learning for small dataset
CN115661550B (en) Graph data category unbalanced classification method and device based on generation of countermeasure network
US20230044102A1 (en) Ensemble machine learning models incorporating a model trust factor
Chin et al. One weight bitwidth to rule them all
CN117437494B (en) Image classification method, system, electronic equipment and storage medium
Shi et al. Anti-noise diffractive neural network for constructing an intelligent imaging detector array
Liao et al. Deep learning for the design of 3D chiral plasmonic metasurfaces
CN114399642A (en) Convolutional neural network fluorescence spectrum feature extraction method
Li et al. Weight-dependent gates for differentiable neural network pruning
Kuo et al. Integration of growing self-organizing map and continuous genetic algorithm for grading lithium-ion battery cells
Li et al. Predicting the Q factor and modal volume of photonic crystal nanocavities via deep learning
Khaleel et al. Machine learning-based classification for predicting the power flow of surface plasmon polaritons in nanoplasmonic coupler
Li et al. LLM helps design and optimize photonic crystal surface emitting lasers
Chen et al. POViT: Vision Transformer for Multi-Objective Design and Characterization of Photonic Crystal Nanocavities. Nanomaterials 2022, 12, 4401
Fitra et al. Deep transformer model with pre-layer normalization for covid-19 growth prediction
Jiang Deep Learning for Inverse Design of Photonic Devices
Li et al. Improving robustness and efficiency of edge computing models

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Zhang Zhaoyu; Li Wenye; Li Renjie; Yu Yueyao; Shen Xiaolei

Inventor before: Zhang Zhaoyu; Li Wenye; Li Renjie; Yu Yueyao

CB03 Change of inventor or designer information