CN116300075B - Layered nano-photonics device design method based on multi-head series neural network - Google Patents

Layered nano-photonics device design method based on multi-head series neural network

Info

Publication number
CN116300075B
CN116300075B (application CN202310579500.4A)
Authority
CN
China
Prior art keywords
neural network
network
data set
reverse
optical response
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310579500.4A
Other languages
Chinese (zh)
Other versions
CN116300075A (en
Inventor
郭健平
袁小艮
魏正军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University
Priority to CN202310579500.4A
Publication of CN116300075A
Application granted
Publication of CN116300075B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/0012Optical design, e.g. procedures, algorithms, optimisation routines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/067Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using optical means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention discloses a layered nanophotonic device design method based on a multi-head series neural network, comprising the following steps: acquiring a second data set, the second data set consisting of second structural parameters of the layered nanophotonic device and the second optical responses corresponding to those parameters; training a reverse neural network on the second data set; acquiring the expected optical response of a layered nanophotonic device to be designed; and, based on the expected optical response, realizing the reverse design of the layered nanophotonic device with the trained reverse neural network. Embodiments of the invention solve the technical problem of non-uniqueness in nanophotonic device design, offer strong robustness, a high degree of freedom and high design speed, and provide guidance for the design of complex multifunctional nanophotonic devices.

Description

Layered nano-photonics device design method based on multi-head series neural network
Technical Field
The invention relates to a layered nano-photonics device design method based on a multi-head series neural network, and belongs to the field of nano-photonics device design.
Background
Nanophotonic devices have broad application prospects, and their performance and characteristics are influenced by their structural parameters. Conventional nanophotonic device design methods are generally based on trial and error, achieving design optimization by repeatedly trying different parameter combinations; however, such methods are inefficient and cannot explore all possible parameter combinations. Deep-learning-assisted nanophotonic device design is now widely adopted and greatly improves design efficiency, but during design there often exist multiple groups of different structures whose output optical responses are nearly identical, and this non-uniqueness problem prevents a conventional neural network from converging.
Disclosure of Invention
In view of the above, the invention provides a design method, system, computer device and storage medium for layered nanophotonic devices based on a multi-head series neural network. The multi-head series neural network is used to fit the mapping between the structural parameters and the optical response of the layered nanophotonic device, realizing assisted design while solving the technical problem that the neural network cannot be trained to convergence due to the non-unique sample mapping in the reverse design process. In addition, more possible combinations of structural parameters can be explored through prediction and design, improving design efficiency and device performance.
The first aim of the invention is to provide a layered nano-photonics device design method based on a multi-head series neural network.
The second object of the invention is to provide a layered nano-photonics device design system based on a multi-head series neural network.
A third object of the present invention is to provide a computer device.
A fourth object of the present invention is to provide a storage medium.
The first object of the present invention can be achieved by adopting the following technical scheme:
a layered nanophotonic device design method based on a multi-head series neural network, the multi-head series neural network comprising a reverse neural network and a forward neural network, the method comprising:
acquiring a first data set, wherein the first data set consists of a first structural parameter of the layered nano-photonic device and a first optical response corresponding to the first structural parameter;
training the forward neural network according to the first data set, wherein during training, input data is the first structural parameter, tag data is the first optical response, and output data is a first predicted optical response corresponding to the first structural parameter;
acquiring a second data set, wherein the second data set consists of a second structural parameter of the layered nano-photonic device and a second optical response corresponding to the second structural parameter;
Training the reverse neural network according to the second data set. The reverse neural network comprises, connected in sequence: an input layer, a self-attention mechanism, a second fully-connected neural network, a parallel integrated network, and an output layer producing a plurality of groups of predicted structural parameters corresponding to the second optical response. During training, the mean square errors between the groups of predicted structural parameters and the corresponding second structural parameters are compared, and the minimum is taken as the first loss value; the groups of predicted structural parameters are input into the trained forward neural network to obtain groups of second predicted optical responses, whose mean square errors against the second optical response input to the reverse neural network are compared, with the maximum taken as the second loss value; the sum of the first and second loss values serves as the training loss of the reverse neural network. The reverse neural network is trained continuously, with the parameters of the trained forward neural network held fixed, until the reverse neural network converges;
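The two-part loss described in this step can be sketched as follows. This is an illustrative numpy sketch, not the patent's implementation (which would run inside a deep-learning framework on differentiable tensors); the function name `tandem_loss` and the head count are assumptions, and the min/max reductions follow the text: the minimum parameter MSE over the heads plus the maximum spectral MSE from the frozen forward network.

```python
import numpy as np

def tandem_loss(pred_params, true_params, pred_spectra, true_spectrum):
    """Reverse-network loss: min parameter MSE plus max spectral MSE over the heads."""
    # pred_params: (n_heads, 4) predicted structural parameters, one row per branch
    # pred_spectra: (n_heads, 301) responses of the frozen forward network to those parameters
    mse_params = ((pred_params - true_params) ** 2).mean(axis=1)
    loss1 = mse_params.min()          # first loss: best-matching head
    mse_spec = ((pred_spectra - true_spectrum) ** 2).mean(axis=1)
    loss2 = mse_spec.max()            # second loss: worst spectral mismatch
    return loss1 + loss2
```

In training, gradients of this scalar would flow back through the reverse network only, since the forward network's parameters stay fixed.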
acquiring an expected optical response of a layered nano-photonic device to be designed;
based on the expected optical response, the inverse design of the layered nano-photonic device is realized according to the trained inverse neural network.
In one possible embodiment, the self-attention mechanism is configured to perform the following operations:
based on the data input by the input layer, performing linear transformations according to the three weight networks W_q, W_k and W_v to obtain the three vector types Q, K and V, wherein the three weights are adjustable, each weight network comprises a plurality of neurons, and Q, K and V are the query, key and value respectively;
multiplying the Q vector by the transpose of the K vector to obtain the attention matrix;
inputting the attention matrix into a Softmax layer for normalization;
multiplying the normalized attention matrix by the V vector to obtain the attention score;
and multiplying the attention score by the data input from the input layer to obtain the data output by the self-attention mechanism.
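The operations above can be sketched in numpy as below. This is a minimal illustration of the described steps, not the patent's code: the weight networks are reduced to single matrices, and note that the patent's variant omits the 1/sqrt(d) scaling of standard scaled dot-product attention and ends with an element-wise multiply with the input rather than a residual addition.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable row-wise softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, W_q, W_k, W_v):
    # x: (n, d) data from the input layer
    Q, K, V = x @ W_q, x @ W_k, x @ W_v   # three linear projections (query, key, value)
    attn = softmax(Q @ K.T, axis=-1)      # attention matrix, normalized by the Softmax layer
    score = attn @ V                      # attention score
    return score * x                      # multiply with the input, as the steps describe
```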
In one possible embodiment, the second fully-connected neural network includes a first fully-connected network module, a second fully-connected network module, and a third fully-connected network module that are sequentially connected;
the parallel integrated network comprises a plurality of branch networks, the parameters of each branch network are the same, and each branch network comprises a fourth fully-connected network module, a fifth fully-connected network module and a sixth fully-connected network module which are sequentially connected;
The outputs of the third fully-connected network module are connected with the inputs of the fourth fully-connected network modules in one-to-one correspondence.
In one possible embodiment, the forward neural network comprises a first fully connected neural network comprising seven sets of seventh fully connected network modules.
In one possible embodiment, the first structural parameter is subjected to a linear normalization process before being input into the forward neural network;
the output layer is preceded by a Sigmoid layer.
In one possible embodiment, acquiring the first data set includes:
and obtaining the first data set by the finite-difference time-domain (FDTD) method, based on the required layered nanophotonic device structure.
In one possible embodiment, obtaining the second data set includes:
and predicting and expanding with the trained forward neural network, based on the required layered nanophotonic device structure, to obtain the second data set.
The second object of the invention can be achieved by adopting the following technical scheme:
a layered nanophotonic device design system based on a multi-headed series neural network, the multi-headed series neural network comprising a reverse neural network and a forward neural network, the system comprising:
The first acquisition unit is used for acquiring a first data set, wherein the first data set consists of a first structural parameter of the layered nano-photonics device and a first optical response corresponding to the first structural parameter;
the first training unit is used for training the forward neural network according to the first data set, wherein during training, input data is the first structural parameter, tag data is the first optical response, and output data is a first predicted optical response corresponding to the first structural parameter;
the second acquisition unit is used for acquiring a second data set, and the second data set consists of a second structural parameter of the layered nano-photonics device and a second optical response corresponding to the second structural parameter;
the second training unit is used for training the reverse neural network according to the second data set. The reverse neural network comprises, connected in sequence: an input layer, a self-attention mechanism, a second fully-connected neural network, a parallel integrated network, and an output layer producing a plurality of groups of predicted structural parameters corresponding to the second optical response. During training, the mean square errors between the groups of predicted structural parameters and the corresponding second structural parameters are compared, and the minimum is taken as the first loss value; the groups of predicted structural parameters are input into the trained forward neural network to obtain groups of second predicted optical responses, whose mean square errors against the second optical response input to the reverse neural network are compared, with the maximum taken as the second loss value; the sum of the two serves as the training loss of the reverse neural network, which is trained continuously, with the parameters of the trained forward neural network held fixed, until the reverse neural network converges;
A third obtaining unit, configured to obtain an expected optical response of the layered nano-photonic device to be designed;
and the reverse design unit is used for realizing the reverse design of the layered nano-photonic device according to the trained reverse neural network based on the expected optical response.
The third object of the present invention can be achieved by adopting the following technical scheme:
a computer device comprising a processor and a memory storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the above layered nanophotonic device design method based on the multi-head series neural network.
The fourth object of the present invention can be achieved by adopting the following technical scheme:
a storage medium storing a program which, when executed by a processor, implements the layered nanophotonics device design method based on a multi-headed series neural network described above.
Compared with the prior art, the invention has the following beneficial effects:
compared with other reverse design schemes, the network of the invention is simpler and faster; the principle of the algorithm is concise, and its effect is not inferior to other algorithms that address the non-uniqueness problem. Compared with the traditional series neural network, the proposed network sacrifices some simplicity but gains the possibility of exploring other output results; even so, it retains clear advantages in structure and complexity over other networks for solving the multimodal problem. In addition, the proposed network introduces a self-attention mechanism, which substantially helps convergence for networks that easily fall into local optima when the sample data are highly similar. In the introduced multi-head output structure, the different results derive from the input of the same neural network, yet each head also has an independent network part, so independence is obtained while uniformity is maintained.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a design architecture and a flowchart of a multi-head series neural network according to embodiment 1 of the present invention.
Fig. 2 is a flow chart of a design method of a layered nano-photonic device based on a multi-headed series neural network in embodiment 1 of the present invention.
Fig. 3 is a unit structure diagram of a layered nanophotonic device of embodiment 1 of the present invention.
Fig. 4 is a schematic diagram of the formation and action of the first loss value in embodiment 1 of the present invention.
Fig. 5 to 6 show the effect of the trained forward neural network of embodiment 1 of the present invention.
Fig. 7 to 12 illustrate the effect of solving the sample non-uniqueness problem in the reverse design of the layered nanophotonic device of embodiment 1 of the present invention.
Fig. 13 is a block diagram of a layered nano-photonic device design system based on a multi-headed series neural network in accordance with embodiment 2 of the present invention.
Fig. 14 is a block diagram showing the structure of a computer device according to embodiment 3 of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments, and all other embodiments obtained by those skilled in the art without making any inventive effort based on the embodiments of the present application are within the scope of protection of the present application.
In the present application, the terms "first," "second," and the like are used for distinguishing between similar objects and not for describing a specified sequence or order. It is to be understood that the data so used may be interchanged where appropriate, such that embodiments of the application may be practiced otherwise than as specifically illustrated and described herein, and that the "first" and "second" distinguishing between objects generally being of the same type, and not necessarily limited to the number of objects, such as the first object may be one or more.
Some of the terms or terminology that appear in describing embodiments of the application are applicable to the following explanation:
A "multi-head series neural network" is a network form modified from the series (tandem) neural network. It is similar in spirit to ensemble learning (Ensemble Learning) and multi-modal learning. The core idea of ensemble learning is to reduce prediction error by combining the predictions of multiple models, improving robustness and generalization. Common ensemble learning methods include Bagging, Boosting and Stacking. In Bagging, multiple models are trained independently and their predictions are combined by voting; in Boosting, weak classifiers are trained step by step and weighted into a strong classifier to improve performance. For non-uniqueness problems, a conventional tandem network can only guarantee that the predicted structure converges to one of many possible outcomes. The multi-head series neural network expands the output possibilities and thus explores more of the possible outcomes.
Referring to fig. 1, the multi-head series neural network includes a reverse network N_i and a forward network N_f. The reverse network N_i comprises a self-attention mechanism, a second fully-connected neural network and a parallel integrated network; the forward network N_f includes a first fully-connected neural network. The parallel integrated network includes n parallel branch networks, which output n groups of structural parameters; in this embodiment, n=5. The first fully-connected neural network has a single input and a single output. A label is a tensor of 4 values (1 row, 4 columns), and an optical response is a tensor of 301 values (1 row, 301 columns). It should be noted that a network taking the structural parameters x_i of the layered nanophotonic device as input and outputting the corresponding optical response y_i of that structure is collectively referred to as a "forward network N_f" or "forward neural network", and such a design process is collectively referred to as "forward design". A network taking the desired optical response y_i as input and outputting the predicted structural parameters x_i of the photonic device is collectively referred to as a "reverse network N_i" or "reverse neural network", and such a design process is collectively referred to as "reverse design". For the forward and reverse neural networks, the input data and the label data are exactly swapped: in the forward neural network the input is the structural parameters and the label is the spectrum, while in the reverse neural network the input is the spectrum and the label is the structural parameters.
A "nanophotonic device" (also called a "photonic structure device", or simply a "device") is the target of reverse design: the transmittance of the device is fed into the trained neural network to obtain one or more groups of parameter values for each layer of the device, providing guidance for device design. A simple neural network, however, cannot be trained to convergence when facing samples with a one-to-many mapping relation; embodiments 1-4 can effectively explore and learn such situations and accurately predict as many structural parameters as possible.
Example 1:
as shown in fig. 2, the present embodiment provides a layered nano-photonic device design method based on a multi-head series neural network, which may include the following steps:
s101, acquiring a first data set, wherein the first data set consists of a first structural parameter of the layered nano-photonics device and a first optical response corresponding to the first structural parameter.
In one possible embodiment, referring to FIG. 3, taking the reverse design of a layered photonic structure device (target device) as an example, the cell structure of the device comprises two materials presented as alternating layers, 4 layers in total, and the thicknesses of the 4 layers form 4 structural parameters. The number of layers is not fixed: it may be 5 to 10, or fewer than 4 or more than 10.
Prior to S101, the simulation conditions of the target device are set. For example, referring to fig. 3, the light source is required to be a plane wave, incident vertically on the structural unit either from the bottom (normal incidence) or from the top (reverse incidence), with a source wavelength between 300nm and 1000nm. It should be noted that when the incident light meets the surface of the first film layer, part of the light is reflected back and part is transmitted; the transmitted light meets the next film, again with some reflection and some transmission, and this continues until the light passes through the last film into vacuum or another medium. Each time light passes through a film, its intensity decreases accordingly, since every pass incurs some absorption loss. Based on this, one skilled in the art can analyze and calculate the optical parameters of the material (e.g., refractive index, conductivity, etc.) from the absorption loss, thereby obtaining the optical response. The boundary conditions of the simulation region are: anti-symmetric in the x-direction, symmetric in the y-direction, and PML in the z-direction. After the light source passes through the structural unit, a transmittance monitor is placed at a preset distance, giving the transmittance distribution between 300nm and 1000nm, mapped to 301 output points. The thickness of each layer is 100nm-300nm, and the remaining regions are left as air.
After setting the simulation conditions of the target device, the initial data set is obtained either by arranging and combining the layer thicknesses according to a rule within the parameter range of each layer, or by randomly selecting the thickness of each layer within its parameter range to ensure data density.
Wherein, arranging and combining the layer thicknesses according to a rule within the parameter range of each layer to obtain the initial data set may be: the four layer thicknesses are arranged and combined taking a value every 40nm, giving 6^4 = 1296 groups of initial sample data label values (also referred to as "third structural parameters"). After generating the 1296 groups of third structural parameters, the layered nanophotonic device of each group is simulated with the finite-difference time-domain (FDTD) method to obtain the 1296 transmission spectra (also called "third optical responses") corresponding to the 1296 groups of third structural parameters; the 1296 groups of third optical responses and the 1296 groups of third structural parameters together constitute the initial data set.
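The parameter enumeration described above can be sketched as follows; the spectra themselves would then come from the FDTD simulation, which is outside this snippet. The variable names are illustrative, and the 100-300nm range and 40nm step are taken from the text.

```python
from itertools import product

# one thickness value every 40 nm across the 100-300 nm range: 6 values per layer
values = list(range(100, 301, 40))        # [100, 140, 180, 220, 260, 300]

# every combination of the four layer thicknesses: the third structural parameters
third_params = list(product(values, repeat=4))   # 6**4 = 1296 label groups
```

Each tuple in `third_params` would be handed to the FDTD solver to produce its 301-point transmission spectrum (the third optical response).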
In addition, an initial data set can be obtained by utilizing a method of solving a Fresnel equation by MATLAB.
Similarly, values taken every 50nm in each layer are arranged and combined, and data simulation with the finite-difference time-domain (FDTD) method yields a data set of 625 groups consisting of fourth optical responses and fourth structural parameters; this data set and the initial data set are combined into the first data set.
S102, training the forward neural network according to the first data set, wherein during training, input data is the first structural parameter, tag data is the first optical response, and output data is a first predicted optical response corresponding to the first structural parameter.
In one possible embodiment, the forward neural network comprises a first fully-connected neural network. The first fully-connected neural network comprises seven groups of seventh fully-connected network modules, each consisting of a linear layer, a batch-normalization layer, an activation-function layer and a dropout layer.
For example, the number of weight parameters and the dropout loss rate of the first fully connected neural network are (4-300,0.5), (300-600,0.5), (600-800,0.3), (800-1200,0.3), (1200-800,0.2), (800-600,0.2), (600-301), respectively.
It should be noted that the number of weight parameters refers to the number of all trainable parameters in the network, including the weights and bias terms between neurons. The more weight parameters, the greater the network's complexity, but also the easier it is to overfit. The dropout loss rate is the probability with which each neuron is randomly discarded during training; randomly discarding some neurons reduces overfitting and improves generalization.
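The layer widths listed above can be checked with a short calculation. This sketch counts only the linear-layer parameters (weights plus biases), ignoring batch-normalization parameters, which is an assumption on my part; the names `blocks` and `n_params` are illustrative.

```python
# (input width, output width) of the seven seventh fully-connected network modules
blocks = [(4, 300), (300, 600), (600, 800), (800, 1200),
          (1200, 800), (800, 600), (600, 301)]
dropouts = [0.5, 0.5, 0.3, 0.3, 0.2, 0.2, None]  # final block listed without a dropout rate

# trainable parameters of the linear layers: weights (in*out) plus biases (out)
n_params = sum(i * o + o for i, o in blocks)
```

The chain is consistent end to end: each block's output width matches the next block's input width, mapping the 4 structural parameters to the 301-point spectrum.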
In one possible embodiment, before S102 the structural parameters require a linear normalization mapping them to values between 0 and 1, solving the problem of poor distinction between data; the same applies to the label structural parameters in this embodiment.
Specifically, the structural parameters are subjected to linear normalization processing using the following formula:

data = (X - X_min) / (X_max - X_min)

wherein data represents the normalized value, X represents the current structural parameter, X_min represents the minimum structural parameter, and X_max represents the maximum structural parameter.
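The normalization and its inverse can be sketched as below. Using the layer-thickness bounds 100nm and 300nm as the defaults is an assumption drawn from the simulation setup; the function names are illustrative.

```python
def normalize(X, X_min=100.0, X_max=300.0):
    """Linear (min-max) normalization of a structural parameter to [0, 1]."""
    return (X - X_min) / (X_max - X_min)

def denormalize(d, X_min=100.0, X_max=300.0):
    """Inverse mapping, to read predicted parameters back in nm."""
    return d * (X_max - X_min) + X_min
```

The inverse is needed at design time, since the reverse network emits values in [0, 1] (its output layer is preceded by a Sigmoid).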
In one possible embodiment, training the forward neural network according to the first data set may comprise the steps of:
s1021, randomly scrambling the data in the first data set and extracting three times according to the ratio of 4:1 to obtain three groups of first training sets and three groups of first test sets.
And S1022, respectively training the forward neural network by using the three first training sets.
S1023, using the three first test sets to evaluate and refine the respective forward neural networks.
S1024, recording MAE evaluation index values of all networks, and selecting an optimal network.
A large number of experiments show that this training process ensures a network accuracy above 95% and a certain generalization capability; it is repeated until the forward neural network training is completed.
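Steps S1021-S1024 can be sketched as follows. This is an illustrative stdlib sketch of the shuffle-and-split procedure only; the sample count 1921 stands in for the 1296 + 625 groups of the first data set, and the seeds and names are assumptions.

```python
import random

def split_4_to_1(dataset, seed):
    """Shuffle a copy of the data set and split it 4:1 into training and test sets."""
    data = list(dataset)
    random.Random(seed).shuffle(data)
    cut = len(data) * 4 // 5
    return data[:cut], data[cut:]

# three independent shuffles give the three training/test set pairs of S1021
samples = list(range(1921))               # stand-in for the 1296 + 625 sample groups
splits = [split_4_to_1(samples, seed=s) for s in range(3)]
```

One forward network would then be trained per split (S1022-S1023), with the MAE on each test set deciding which of the three networks is kept (S1024).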
S103, acquiring a second data set, wherein the second data set consists of a second structural parameter of the layered nano-photonic device and a second optical response corresponding to the second structural parameter.
In one possible embodiment, the second data set is obtained as follows: taking a value every 20nm between 100nm and 300nm (the second structural parameters) and feeding them into the trained forward neural network for prediction yields 11^4 = 14641 second optical responses, i.e. the second data set, of which 80% is used as the second training set and the remaining 20% as the second test set.
S104, training the reverse neural network according to the second data set, wherein the reverse neural network comprises, connected in sequence, an input layer for the second optical response, a self-attention mechanism, a second fully-connected neural network, a parallel integrated network, and an output layer of a plurality of groups of predicted structural parameters corresponding to the second optical response. During training, the mean square errors between the plurality of groups of predicted structural parameters and their corresponding second structural parameters are compared and the minimum is taken as the first loss value; the plurality of groups of predicted structural parameters are input into the trained forward neural network to obtain a plurality of groups of second predicted optical responses, the mean square errors between these and the second optical response input for reverse-neural-network training are compared, and the maximum is taken as the second loss value; the sum of the first loss value and the second loss value serves as the loss value for reverse-neural-network training, and the reverse neural network is trained continuously, with the parameters of the trained forward neural network kept unchanged, until the reverse neural network converges.
In one possible embodiment, a self-attention mechanism is used to perform the following operations:
s11, based on the data input by the input layer, according to W q 、W k 、W v And the three types of weight network processing modes are changed linearly to obtain Q, K, V three types of vectors, wherein the three types of weights are adjustable, and the weight network comprises a plurality of neurons.
Illustratively, the data input by the input layer may be an optical response of size 301 (its corresponding structural parameters being of size 4); the tensor data of input length 301 are linearly transformed by the three weight matrices W_q, W_k, W_v to obtain three tensors Q, K, V, where Q, K, V are the query, key and value respectively.
Illustratively, each weight network is made up of a linear layer with 301 input neurons and 301 output neurons (301-301).
During the iterative training of the network, the three weight matrices W_q, W_k, W_v learn their optimal parameters.
S12, multiplying the Q vector by the transpose of the K vector to obtain the attention matrix Attention.
S13, inputting the attention matrix Attention into the Softmax layer for normalization.
In S13, a tensor of 301 weight values between 0 and 1 is obtained.
Alternatively, S12-S13 may be expressed as follows:

Attention = softmax(Q K^T / sqrt(d_k))

This formula computes the attention score of each point; once the calculation is complete, the obtained Attention is a weight vector in which the value of each point lies between 0 and 1 and the values of all points sum to 1. Here d_k represents a custom parameter, set in this embodiment to the number of structural parameters, i.e. 4; softmax represents the activation function, which converts the output of the neural network into a probability distribution; Q, K, V respectively represent the query, key and value vectors; and T represents the transpose.
S14, multiplying the normalized attention matrix Attention by the V vector to obtain the attention score.
Multiplying the normalized attention matrix Attention by the input optical response vector assigns a weight to each point: points that need attention receive large weights and points that do not receive small ones, so the input optical response is trained and learned purposefully and the performance of the network improves.
And S15, multiplying the attention score by the data input by the input layer to obtain the data output by the attention mechanism.
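One plausible NumPy reading of steps S11-S15 for a single 301-point optical response is sketched below; the vector and matrix shapes and the random weight initialisation are assumptions, since the embodiment only fixes the 301-301 weight-network size and d_k = 4:

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax, so each row of weights sums to 1."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv, d_k=4):
    Q, K, V = Wq @ x, Wk @ x, Wv @ x       # S11: three linear projections
    A = np.outer(Q, K) / np.sqrt(d_k)      # S12: Q times K-transpose
    A = softmax(A, axis=-1)                # S13: normalize to 0-1 weights
    score = A @ V                          # S14: weight the value vector
    return score * x                       # S15: reweight the input response

rng = np.random.default_rng(0)
n = 301                                    # length of the optical response
x = rng.random(n)
Wq, Wk, Wv = (0.01 * rng.standard_normal((n, n)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)        # same length as the input
```

In the embodiment the three weight matrices would be trainable parameters rather than random draws.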
In one possible embodiment, the second fully-connected neural network includes a first fully-connected network module, a second fully-connected network module, and a third fully-connected network module connected in sequence; the parallel integrated network comprises five branch networks with identical parameters, each branch network comprising a fourth, a fifth, and a sixth fully-connected network module connected in sequence; the outputs of the third fully-connected network module are connected in one-to-one correspondence with the inputs of the fourth fully-connected network modules. Every fully-connected network module consists of a linear layer, a batch normalization layer, an activation function layer, and a dropout layer.
For example, the number of weight parameters and Dropout loss rate of the second fully connected neural network are (301-800,0.3), (800-1200,0.3), (1200-1200,0.3), respectively; the number of weight parameters and the Dropout loss rate of the parallel integrated network are respectively (1200-800,0.18), (800-300,0.18) and (300-4).
It should be noted that the number of weight parameters refers to the number of all trainable parameters in the network, including the weights and bias terms among neurons. The greater the number of weight parameters, the greater the complexity of the network, but the easier it is to overfit. The Dropout loss rate is the probability with which each neuron is randomly discarded during training; by randomly discarding some neurons, overfitting is reduced and generalization ability is improved.
In one possible embodiment, a Sigmoid layer precedes the output layer. Since the structural parameters of all samples are linearly normalized before the forward neural network is trained, a Sigmoid layer must be added before the final output layer of the reverse neural network to fix the output between 0 and 1, preventing data larger than 1 from being fed into the forward neural network and causing forward-network prediction errors.
All of the above networks may be built in a deep learning framework Pytorch, and the development language may be Python.
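Since the embodiment names PyTorch, the module sizes and dropout rates above can be sketched as follows; the ReLU activation is an assumption (the activation function is not specified), and only one of the five identical branches is shown in full:

```python
import torch
import torch.nn as nn

def fc_block(n_in, n_out, p_drop):
    """Fully-connected module: linear + batch-norm + activation + dropout."""
    return nn.Sequential(
        nn.Linear(n_in, n_out),
        nn.BatchNorm1d(n_out),
        nn.ReLU(),            # assumed activation; the embodiment does not name one
        nn.Dropout(p_drop),
    )

class InverseBranch(nn.Module):
    """One of the five parallel branches (modules 4-6), ending in a Sigmoid
    so each predicted structural parameter stays in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            fc_block(1200, 800, 0.18),
            fc_block(800, 300, 0.18),
            nn.Linear(300, 4),   # (300-4): final layer, no dropout listed
            nn.Sigmoid(),
        )
    def forward(self, x):
        return self.net(x)

trunk = nn.Sequential(           # the second fully-connected neural network
    fc_block(301, 800, 0.3),
    fc_block(800, 1200, 0.3),
    fc_block(1200, 1200, 0.3),
)
branches = nn.ModuleList(InverseBranch() for _ in range(5))
```

The trunk output of size 1200 feeds all five branches, each of which emits one group of four predicted structural parameters.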
In one possible embodiment, the mean square errors between the plurality of groups of predicted structural parameters and their corresponding second structural parameters are compared and the minimum is taken as the first loss value, as follows:
Loss1 = min{ MSE(P_1, Y), MSE(P_2, Y), ..., MSE(P_n, Y) }

wherein Loss1 represents the first loss value, MSE represents the mean-square-error calculation, P_1, ..., P_n represent the n groups of predicted structural parameters, and Y represents the label value corresponding to them; in this embodiment n = 5.
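A minimal sketch of the first loss value, assuming the n = 5 predicted parameter sets arrive as plain arrays:

```python
import numpy as np

def mse(a, b):
    """Mean square error between two parameter vectors."""
    return float(np.mean((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2))

def first_loss(predicted_sets, label):
    """Minimum MSE between the n predicted structural-parameter sets and the label."""
    return min(mse(p, label) for p in predicted_sets)
```

If even one of the five heads matches the label exactly, the first loss is 0 regardless of what the other heads predict, which is what lets different heads track different labels.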
If the network outputs only one set of predicted structural parameters, it cannot fit two label structural parameters (also called "labels") at once; if the network outputs multiple groups of predicted structural parameters, computes the loss against the label values each time, and takes the minimum, then when multiple labels exist the groups of predictions can correspond to the labels respectively, keeping the network loss minimal. When the same input carries different labels, the multiple groups of predictions output by the network correspond to those different labels; and because the number of predicted groups may exceed the number of labels, one label may correspond to two groups of outputs while another label corresponds to the rest.
Referring to fig. 4, in the invalid case, nearly identical input data are given to the neural network, whose multiple groups of output structural parameters are themselves nearly identical and fit label A; but when another group of nearly identical input data carries label B instead, the loss value rises and the network cannot converge. In the valid case, the structural parameters of the multi-head output approach the multiple possible labels (e.g. label A and label B) respectively, so the computed loss value is minimal, network convergence is ensured, and the various possible prediction results can all be represented. It should be noted that the network tries various strategies to converge during training; in some attempt it hits on the "valid case", the loss value drops, i.e. the network has explored a better solution, and the lower the network's loss, the better its prediction. Therefore, as long as the network can explore this better solution, continued training keeps reducing the loss value and convergence is ensured.
In one possible embodiment, the mean square errors between the plurality of groups of second predicted optical responses and the second optical response input for reverse-neural-network training are compared and the maximum is taken as the second loss value, as follows:
Loss2 = max{ MSE(R_1, R), MSE(R_2, R), ..., MSE(R_5, R) }

wherein Loss2 represents the second loss value, MSE represents the mean-square-error calculation, R_1, ..., R_5 represent the groups of second predicted optical responses predicted by the trained forward neural network (five groups in this embodiment), and R represents the second optical response input for reverse-neural-network training.
It should be noted that the second loss value ensures that the multiple groups of structural parameters output by the reverse neural network, when input into the forward neural network, yield the same optical response, guaranteeing the "correctness" of those groups. When the label value is unique, controlling the second loss value also ensures the uniqueness and correctness of the reverse neural network's output. The combined action of the first and second loss values thus guarantees both the correctness of the reverse network's output and its fit to multiple labels, so the multi-head series neural network can map out the possible multi-label situations present in the samples.
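A minimal sketch of the second loss value, assuming the five re-predicted optical responses arrive as plain arrays; taking the maximum forces every head's re-predicted response to match the input response:

```python
import numpy as np

def mse(a, b):
    """Mean square error between two response vectors."""
    return float(np.mean((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2))

def second_loss(predicted_responses, target_response):
    """Maximum MSE between the five second predicted optical responses
    (from the frozen forward network) and the response fed to the inverse net."""
    return max(mse(r, target_response) for r in predicted_responses)
```

Because the worst head dominates this loss, no head can drift to "incorrect" structural parameters whose optical response differs from the input.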
In one possible embodiment, taking the sum of the first loss value and the second loss value as a loss value for training the reverse neural network, continuously training the reverse neural network and keeping parameters of the trained forward neural network unchanged until the reverse neural network converges, including:
The sum of the first and second loss values is back-propagated as the final loss value, and the weight parameters of the reverse neural network, including the adjustable weight matrices W_q, W_k, W_v, are updated by adaptive moment estimation (Adam); training iterates until the final loss value no longer decreases and the reverse neural network converges. It should be noted that during reverse-network training the momentum beta values of the optimization algorithm should start low, e.g. (0.7, 0.99), and the learning rate should not be set too low, e.g. lr = 0.01, so that the network explores fully and avoids being trapped in local optima as far as possible; as training proceeds the learning rate is reduced and the beta values increased, until finally the loss value of the network is below 0.01 and as small as possible.
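A PyTorch sketch of one training step under this double-loss mechanism; the two Linear layers are hypothetical stand-ins for the real trained forward network and the multi-head inverse network, while the shapes (301-point response, five heads of four parameters) and the Adam settings (lr = 0.01, betas = (0.7, 0.99)) follow the embodiment:

```python
import torch

# Hypothetical stand-ins: the real networks are the ones described above.
forward_net = torch.nn.Linear(4, 301)       # structural parameters -> response
inverse_net = torch.nn.Linear(301, 5 * 4)   # response -> five parameter sets

for p in forward_net.parameters():          # keep the trained forward net fixed
    p.requires_grad_(False)

optimizer = torch.optim.Adam(inverse_net.parameters(), lr=0.01, betas=(0.7, 0.99))

def train_step(response, label_params):
    optimizer.zero_grad()
    preds = inverse_net(response).view(-1, 5, 4)          # five predicted sets
    loss1 = torch.stack([((preds[:, i] - label_params) ** 2).mean()
                         for i in range(5)]).min()         # first loss: min MSE
    re_resp = forward_net(preds.reshape(-1, 4)).view(-1, 5, 301)
    loss2 = torch.stack([((re_resp[:, i] - response) ** 2).mean()
                         for i in range(5)]).max()         # second loss: max MSE
    loss = loss1 + loss2                                   # final loss value
    loss.backward()                                        # only inverse net updates
    optimizer.step()
    return float(loss)
```

Gradients flow through the frozen forward network into the inverse network, but the forward network's own parameters never change.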
In this embodiment, the two loss-value components control different aspects of the network, ensuring accurate predictions: the first loss component guarantees that among the multiple groups of structural parameters output by the reverse network there are parameters close to the training label value; the second loss component guarantees that every output group of structural parameters is correct, i.e. yields an optical response similar to the input one. More importantly, the expected prediction effect is achieved only when a multi-output reverse network and a forward network are connected in series under this double-loss mechanism; in other words, the structure and the training mechanism of the whole network are interdependent, and only their cooperation yields the best prediction results.
S105, acquiring the expected optical response of the layered nano-photonic device to be designed.
S106, based on the expected optical response, the inverse design of the layered nano-photonics device is realized according to the trained inverse neural network.
In S106, inverse linear normalization processing is required for the final predicted structural parameters.
Regarding experimental effects:
The prediction effect of the trained forward neural network is shown in figs. 5-6, exhibiting high accuracy and generalization capability. The forward neural network is then used to train the reverse neural network according to the algorithm block diagram, and the network is designed to the requirements of this embodiment. When the loss value of the network falls below 0.001, its training effect can be checked and parameters adjusted to reach the optimal reverse neural network; if the loss value remains larger (greater than 0.001), the training effect is not ideal and the number of network layers or the network parameters must be modified further to achieve the best effect. Fig. 7 shows the five groups of structural parameters output by the reverse neural network: some are the same while others differ, meaning the same input may correspond to multiple groups of structural parameters. Feeding the five groups separately into the forward neural network gives fig. 8, where two clearly distinct groups of structural parameters nevertheless yield very similar optical responses. The same result appears in figs. 9-10, showing that this embodiment has a certain capability of resolving the non-uniqueness present in the sample data during reverse design of the layered nano-photonic device. In addition, as shown in figs. 11-12, if a certain optical response maps to a unique structural parameter only, all five groups of structural parameters converge on that unique solution, ensuring the correctness of the output.
According to the results given by the network, it is clear that in the reverse design of layered nano-photonic devices the non-uniqueness of the samples can indeed prevent a traditional neural network from learning and converging on the mapping. A conventional neural network is a powerful "function fitter" that can fit essentially arbitrary functions, yet it cannot converge in the face of such nearly identical inputs with disparate label data. The design method of this embodiment solves the sample non-uniqueness problem in the forward and reverse design of some nano-photonic devices, can supply different design-parameter schemes, and has a certain heuristic significance. Against the background of rapidly developing nano-photonic devices and continuously growing demand for high-performance device design, the method of this embodiment offers strong support and reference value for future nano-photonic device design.
Those skilled in the art will appreciate that all or part of the steps in a method implementing the above embodiments may be implemented by a program to instruct related hardware, and the corresponding program may be stored in a computer readable storage medium.
It should be noted that although the method operations of the above embodiments are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order or that all illustrated operations be performed in order to achieve desirable results. Rather, the depicted steps may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
Example 2:
as shown in fig. 13, the present embodiment provides a layered nano-photonic device design system based on a multi-head series neural network, where the system may include a first acquisition unit 1301, a first training unit 1302, a second acquisition unit 1303, a second training unit 1304, a third acquisition unit 1305, and an inverse design unit 1306, and specific functions of the respective units are as follows:
a first obtaining unit 1301, configured to obtain a first data set, where the first data set is composed of a first structural parameter of the layered nano-photonic device and a first optical response corresponding to the first structural parameter;
a first training unit 1302, configured to train the forward neural network according to the first data set, wherein during training, input data is the first structural parameter, tag data is the first optical response, and output data is a first predicted optical response corresponding to the first structural parameter;
a second obtaining unit 1303, configured to obtain a second data set, where the second data set is composed of a second structural parameter of the layered nano-photonic device and a second optical response corresponding to the second structural parameter;
The second training unit 1304 is configured to train the inverse neural network according to the second data set, wherein the inverse neural network comprises, connected in sequence, an input layer for the second optical response, a self-attention mechanism, a second fully-connected neural network, a parallel integrated network, and an output layer of a plurality of groups of predicted structural parameters corresponding to the second optical response; during training, the mean square errors between the plurality of groups of predicted structural parameters and their corresponding second structural parameters are compared and the minimum is taken as the first loss value; the plurality of groups of predicted structural parameters are input into the trained forward neural network to obtain a plurality of groups of second predicted optical responses, the mean square errors between these and the second optical response input for inverse-neural-network training are compared and the maximum is taken as the second loss value; and the sum of the first loss value and the second loss value serves as the loss value for inverse-neural-network training, the inverse neural network being trained continuously until it converges;
A third acquiring unit 1305 for acquiring a desired optical response of the layered nanophotonic device to be designed;
and the reverse design unit 1306 is used for realizing the reverse design of the layered nano-photonic device according to the trained reverse neural network based on the expected optical response.
Example 3:
as shown in fig. 14, the present embodiment provides a computer apparatus, which may include a processor 1402, a memory, an input device 1403, a display device 1404, and a network interface 1405, which are connected through a system bus 1401. The processor 1402 is configured to provide computing and control capabilities, and the memory includes a nonvolatile storage medium 1406 and an internal memory 1407, where the nonvolatile storage medium 1406 stores an operating system, a computer program, and a database, and the internal memory 1407 provides an environment for the operating system and the computer program in the nonvolatile storage medium 1406 to run, and when the computer program is executed by the processor 1402, the method for designing a layered nano-photonic device based on a multi-head serial neural network of the above embodiment 1 is implemented as follows:
acquiring a first data set, wherein the first data set consists of a first structural parameter of the layered nano-photonic device and a first optical response corresponding to the first structural parameter;
Training the forward neural network according to the first data set, wherein during training, input data is the first structural parameter, tag data is the first optical response, and output data is a first predicted optical response corresponding to the first structural parameter;
acquiring a second data set, wherein the second data set consists of a second structural parameter of the layered nano-photonic device and a second optical response corresponding to the second structural parameter;
training the reverse neural network according to the second data set, wherein the reverse neural network comprises, connected in sequence, an input layer for the second optical response, a self-attention mechanism, a second fully-connected neural network, a parallel integrated network, and an output layer of a plurality of groups of predicted structural parameters corresponding to the second optical response; during training, the mean square errors between the plurality of groups of predicted structural parameters and their corresponding second structural parameters are compared and the minimum is taken as the first loss value; the plurality of groups of predicted structural parameters are input into the trained forward neural network to obtain a plurality of groups of second predicted optical responses, the mean square errors between these and the second optical response input for reverse-neural-network training are compared and the maximum is taken as the second loss value; and the sum of the first loss value and the second loss value serves as the loss value for reverse-neural-network training, the reverse neural network being trained continuously with the parameters of the trained forward neural network kept unchanged until the reverse neural network converges;
Acquiring an expected optical response of a layered nano-photonic device to be designed;
based on the expected optical response, the inverse design of the layered nano-photonic device is realized according to the trained inverse neural network.
Example 4:
the present embodiment provides a storage medium, which is a computer readable storage medium storing a computer program, where the computer program when executed by a processor implements the method for designing a layered nanophotonics device based on a multi-headed series neural network of the above embodiment 1, as follows:
acquiring a first data set, wherein the first data set consists of a first structural parameter of the layered nano-photonic device and a first optical response corresponding to the first structural parameter;
training the forward neural network according to the first data set, wherein during training, input data is the first structural parameter, tag data is the first optical response, and output data is a first predicted optical response corresponding to the first structural parameter;
acquiring a second data set, wherein the second data set consists of a second structural parameter of the layered nano-photonic device and a second optical response corresponding to the second structural parameter;
Training the reverse neural network according to the second data set, wherein the reverse neural network comprises, connected in sequence, an input layer for the second optical response, a self-attention mechanism, a second fully-connected neural network, a parallel integrated network, and an output layer of a plurality of groups of predicted structural parameters corresponding to the second optical response; during training, the mean square errors between the plurality of groups of predicted structural parameters and their corresponding second structural parameters are compared and the minimum is taken as the first loss value; the plurality of groups of predicted structural parameters are input into the trained forward neural network to obtain a plurality of groups of second predicted optical responses, the mean square errors between these and the second optical response input for reverse-neural-network training are compared and the maximum is taken as the second loss value; and the sum of the first loss value and the second loss value serves as the loss value for reverse-neural-network training, the reverse neural network being trained continuously with the parameters of the trained forward neural network kept unchanged until the reverse neural network converges;
acquiring an expected optical response of a layered nano-photonic device to be designed;
based on the expected optical response, the inverse design of the layered nano-photonic device is realized according to the trained inverse neural network.
The computer readable storage medium of the present embodiment may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In this embodiment, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present embodiment, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with a computer-readable program embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable storage medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. A computer program embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable storage medium may be written in one or more programming languages, including an object oriented programming language such as Java, python, C ++ and conventional procedural programming languages, such as the C-language or similar programming languages, or combinations thereof for performing the present embodiments. The program may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In summary, the invention is a scheme for reverse design by utilizing the multi-head series neural network for the first time, and can solve the non-unique problems of high degree of freedom and high sensitivity of structural parameters of the layered nano-photonics device. The method provided by the invention is not only suitable for layered nano-photonics devices, but also suitable for other nano-photonics devices with one-to-many non-uniqueness problems in the reverse design process. Compared with the empirical design paradigm of the structure of the nano-photonics device based on fewer parameters at present, the method provided by the invention has important guiding significance for the design of the complex multifunctional nano-photonics device. On the basis, the invention has wide application prospect.
The above-mentioned embodiments are only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any equivalent substitution or modification made by a person skilled in the art according to the technical solution and inventive concept of the present invention, within the scope disclosed by the present patent, falls within the protection scope of the present invention.

Claims (10)

1. A layered nano-photonic device design method based on a multi-headed series neural network, characterized in that the multi-headed series neural network comprises a reverse neural network and a forward neural network, the method comprising:
acquiring a first data set, wherein the first data set consists of a first structural parameter of the layered nano-photonic device and a first optical response corresponding to the first structural parameter;
training the forward neural network according to the first data set, wherein during training, input data is the first structural parameter, tag data is the first optical response, and output data is a first predicted optical response corresponding to the first structural parameter;
acquiring a second data set, wherein the second data set consists of a second structural parameter of the layered nano-photonic device and a second optical response corresponding to the second structural parameter;
Training the reverse neural network according to the second data set, wherein the reverse neural network comprises, connected in sequence, an input layer for the second optical response, a self-attention mechanism, a second fully-connected neural network, a parallel integrated network, and an output layer of a plurality of groups of predicted structural parameters corresponding to the second optical response; during training, the mean square errors between the plurality of groups of predicted structural parameters and their corresponding second structural parameters are compared and the minimum is taken as the first loss value; the plurality of groups of predicted structural parameters are input into the trained forward neural network to obtain a plurality of groups of second predicted optical responses, the mean square errors between these and the second optical response input for reverse-neural-network training are compared and the maximum is taken as the second loss value; and the sum of the first loss value and the second loss value serves as the loss value for reverse-neural-network training, the reverse neural network being trained continuously with the parameters of the trained forward neural network kept unchanged until the reverse neural network converges;
acquiring an expected optical response of a layered nano-photonic device to be designed;
based on the expected optical response, realizing the reverse design of the layered nano-photonic device with the trained reverse neural network.
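The two-term training loss described in claim 1 can be sketched as follows. This is a NumPy illustration only, not the patented implementation; the function name, array shapes, and group layout are assumptions made for clarity:

```python
import numpy as np

def reverse_training_loss(pred_params, true_params, pred_responses, target_response):
    """Tandem-network loss sketched from claim 1 (illustrative only).

    pred_params     : (G, P) G groups of predicted structural parameters
    true_params     : (P,)   ground-truth second structural parameters
    pred_responses  : (G, R) frozen forward-network responses, one per group
    target_response : (R,)   second optical response fed to the reverse network
    """
    # First loss: minimum MSE between any predicted group and the true parameters.
    param_mse = np.mean((pred_params - true_params) ** 2, axis=1)
    first_loss = param_mse.min()
    # Second loss: maximum MSE between forward-predicted responses and the target.
    resp_mse = np.mean((pred_responses - target_response) ** 2, axis=1)
    second_loss = resp_mse.max()
    # Total loss is the sum; the forward network's parameters stay frozen.
    return first_loss + second_loss
```

Taking the minimum over groups rewards the best candidate structure, while taking the maximum over forward-checked responses penalizes the worst spectral mismatch among the candidates.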
2. The method of claim 1, wherein the self-attention mechanism is configured to perform the following operations:
linearly transforming the data input by the input layer through three weight networks W_q, W_k and W_v to obtain three vectors Q, K and V, wherein the three weight networks are trainable and each comprises a plurality of neurons; W_q, W_k and W_v are the query weight matrix, the key weight matrix and the value weight matrix, respectively, and Q, K and V are the query, the key and the value, respectively;
multiplying the Q vector by the transpose of the K vector to obtain an attention matrix;
inputting the attention matrix into a Softmax layer for normalization;
multiplying the normalized attention matrix by the V vector to obtain an attention score;
and multiplying the attention score by the data input by the input layer to obtain the output of the self-attention mechanism.
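The self-attention steps of claim 2 can be traced in a few lines of NumPy. This is a sketch under assumptions: single-head, square weight matrices, and an element-wise final multiplication with the input (one plausible reading of "multiplying the attention score by the data input by the input layer"):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable Softmax normalization along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Self-attention block following claim 2.
    x : (n, d) data from the input layer; Wq/Wk/Wv : (d, d) weight matrices."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv   # linear transforms -> query / key / value
    attn = Q @ K.T                     # attention matrix
    attn = softmax(attn, axis=-1)      # Softmax normalization
    score = attn @ V                   # attention score
    return score * x                   # element-wise gating of the input
```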
3. The method of any of claims 1-2, wherein the second fully-connected neural network comprises a first fully-connected network module, a second fully-connected network module, and a third fully-connected network module connected in sequence;
the parallel integrated network comprises a plurality of branch networks, the parameters of each branch network are the same, and each branch network comprises a fourth fully-connected network module, a fifth fully-connected network module and a sixth fully-connected network module which are sequentially connected;
the outputs of the third fully-connected network module are connected to the inputs of the fourth fully-connected network modules of the branch networks in one-to-one correspondence.
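The trunk-plus-branches layout of claim 3 can be sketched as below. This NumPy illustration is not the patented network; layer widths, the branch count, and ReLU activations are assumptions, and "the parameters of each branch network are the same" is read here as identical architecture across branches:

```python
import numpy as np

rng = np.random.default_rng(0)

def fc_module(d_in, d_out):
    """One fully-connected module: random weights, zero bias, ReLU."""
    W = rng.standard_normal((d_in, d_out)) * 0.1
    b = np.zeros(d_out)
    return lambda x: np.maximum(x @ W + b, 0.0)

def make_trunk_and_branches(d, n_branches=4):
    # Second fully-connected neural network: three modules in sequence.
    trunk = [fc_module(d, d) for _ in range(3)]
    # Parallel integrated network: structurally identical branches,
    # each with fourth, fifth and sixth modules in sequence.
    branches = [[fc_module(d, d) for _ in range(3)] for _ in range(n_branches)]
    return trunk, branches

def run_branch(x, branch):
    for m in branch:
        x = m(x)
    return x

def forward(x, trunk, branches):
    for m in trunk:
        x = m(x)                       # shared trunk
    # The trunk output feeds every branch; each branch emits one group
    # of predicted structural parameters.
    return [run_branch(x, br) for br in branches]
```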
4. The method of claim 1, wherein the forward neural network comprises a first fully-connected neural network comprising seven sets of seventh fully-connected network modules.
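A forward spectral predictor built from seven fully-connected modules, as claim 4 specifies, might look like the following NumPy sketch. Hidden widths, activations, and initialization are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_forward_net(d_in, d_hidden, d_out, n_modules=7):
    """Forward network per claim 4: seven fully-connected modules in series."""
    dims = [d_in] + [d_hidden] * (n_modules - 1) + [d_out]
    return [(rng.standard_normal((dims[i], dims[i + 1])) * 0.1, np.zeros(dims[i + 1]))
            for i in range(n_modules)]

def predict_response(params, net):
    """Map structural parameters to a predicted optical response."""
    x = params
    for i, (W, b) in enumerate(net):
        x = x @ W + b
        if i < len(net) - 1:           # ReLU between modules, linear output
            x = np.maximum(x, 0.0)
    return x
```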
5. The method of claim 1, wherein a linear normalization is applied to the first structural parameter before the first structural parameter is input into the forward neural network;
the output layer is preceded by a Sigmoid layer.
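The pre-processing and output-squashing pair in claim 5 reduces to two small functions. Reading "linear normalization" as min-max scaling is an assumption; the Sigmoid before the output layer then keeps outputs in the same (0, 1) range:

```python
import numpy as np

def linear_normalize(x, x_min, x_max):
    """Min-max ("linear") normalization of structural parameters
    before they enter the forward network."""
    return (x - x_min) / (x_max - x_min)

def sigmoid(z):
    """Sigmoid placed before the output layer bounds outputs to (0, 1),
    matching the normalized parameter range."""
    return 1.0 / (1.0 + np.exp(-z))
```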
6. The method of claim 1, wherein acquiring the first data set comprises:
and obtaining the first data set by the finite-difference time-domain (FDTD) method based on the required layered nano-photonic device structure.
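The dataset-assembly loop of claim 6 is sketched below. A real FDTD run requires an external solver (e.g. Meep); here `fdtd_simulate` is an explicitly hypothetical stand-in that returns a mock spectrum so the loop is runnable, and the thickness bounds and spectrum length are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def fdtd_simulate(layer_thicknesses):
    """Placeholder for a real FDTD simulation; returns a mock reflectance
    spectrum in [0, 1] so the dataset loop below can execute."""
    wl = np.linspace(400e-9, 800e-9, 50)          # wavelengths, 400-800 nm
    return 0.5 + 0.5 * np.cos(2 * np.pi * layer_thicknesses.sum() / wl)

def build_first_dataset(n_samples, n_layers, t_min=20e-9, t_max=200e-9):
    """Sample layer thicknesses within bounds and record the simulated
    optical response for each sample (structural parameter, response) pair."""
    params = rng.uniform(t_min, t_max, size=(n_samples, n_layers))
    responses = np.array([fdtd_simulate(p) for p in params])
    return params, responses
```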
7. The method of claim 1, wherein obtaining the second data set comprises:
and expanding, by prediction with the trained forward neural network, the data for the required layered nano-photonic device structure to obtain the second data set.
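Claim 7's expansion step labels newly sampled structures with the trained forward network instead of running further full simulations. A hedged sketch, where `forward_net` stands for any trained predictor mapping parameters to a response:

```python
import numpy as np

rng = np.random.default_rng(3)

def expand_second_dataset(forward_net, n_samples, n_params, p_min, p_max):
    """Generate new structural parameters within bounds and label them with
    the trained forward network, yielding the second data set cheaply."""
    params = rng.uniform(p_min, p_max, size=(n_samples, n_params))
    responses = np.array([forward_net(p) for p in params])
    return params, responses
```

Because forward inference is orders of magnitude faster than a full electromagnetic simulation, this lets the reverse network train on a much larger data set than FDTD alone could provide.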
8. A layered nanophotonic device design system based on a multi-headed series neural network, the multi-headed series neural network comprising a reverse neural network and a forward neural network, the system comprising:
The first acquisition unit is used for acquiring a first data set, wherein the first data set consists of a first structural parameter of the layered nano-photonics device and a first optical response corresponding to the first structural parameter;
the first training unit is used for training the forward neural network according to the first data set, wherein during training, the input data is the first structural parameter, the label data is the first optical response, and the output data is a first predicted optical response corresponding to the first structural parameter;
the second acquisition unit is used for acquiring a second data set, and the second data set consists of a second structural parameter of the layered nano-photonics device and a second optical response corresponding to the second structural parameter;
the second training unit is used for training the reverse neural network according to the second data set, wherein the reverse neural network comprises, connected in sequence, an input layer, a self-attention mechanism, a second fully-connected neural network, a parallel integrated network, and an output layer that emits a plurality of groups of predicted structural parameters corresponding to the second optical response; during training, the mean square errors between the groups of predicted structural parameters and their corresponding second structural parameters are compared and the minimum is taken as a first loss value; the groups of predicted structural parameters are input into the trained forward neural network to obtain a plurality of groups of second predicted optical responses, the mean square errors between these responses and the second optical response input to the reverse neural network are compared and the maximum is taken as a second loss value; the sum of the first loss value and the second loss value is used as the training loss of the reverse neural network, and the reverse neural network is trained, with the parameters of the trained forward neural network held unchanged, until the reverse neural network converges;
the third acquisition unit is used for acquiring an expected optical response of the layered nano-photonic device to be designed;
and the reverse design unit is used for realizing the reverse design of the layered nano-photonic device according to the trained reverse neural network based on the expected optical response.
9. A computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor implements the method of any one of claims 1-7 when executing the program stored in the memory.
10. A storage medium storing a program, which when executed by a processor, implements the method of any one of claims 1-7.
CN202310579500.4A 2023-05-23 2023-05-23 Layered nano-photonics device design method based on multi-head series neural network Active CN116300075B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310579500.4A CN116300075B (en) 2023-05-23 2023-05-23 Layered nano-photonics device design method based on multi-head series neural network

Publications (2)

Publication Number Publication Date
CN116300075A CN116300075A (en) 2023-06-23
CN116300075B true CN116300075B (en) 2023-08-11

Family

ID=86832744

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116911349B (en) * 2023-09-13 2024-01-09 华南师范大学 Optical nano antenna structure prediction network training method, prediction method and device

Citations (8)

Publication number Priority date Publication date Assignee Title
CN112989508A (en) * 2021-02-01 2021-06-18 复旦大学 Filter optimization design method based on deep learning algorithm
CN113158445A (en) * 2021-04-06 2021-07-23 中国人民解放军战略支援部队航天工程大学 Prediction algorithm for residual service life of aero-engine with convolution memory residual self-attention mechanism
KR20220014798A (en) * 2020-07-29 2022-02-07 삼성전자주식회사 Apparatus and method for synthesizing target products using neural networks
CN114662315A (en) * 2022-03-25 2022-06-24 北京大学 Design and/or optimization method of nano-photonics device
CN115455817A (en) * 2022-09-06 2022-12-09 云南大学 Short-term runoff hydrological prediction method based on deep learning
CN115542433A (en) * 2022-12-05 2022-12-30 香港中文大学(深圳) Method for coding photonic crystal by deep neural network based on self-attention
CN115857067A (en) * 2022-12-30 2023-03-28 湖南大学 Reversely-designed super-structure surface device and preparation and design method thereof
CN115935810A (en) * 2022-11-25 2023-04-07 太原理工大学 Power medium-term load prediction method and system based on attention mechanism fusion characteristics

Non-Patent Citations (1)

Title
Zhao Zhenbing et al., "Ultra-short-term power prediction method for distribution networks based on an improved recurrent neural network," Journal of Electric Power Science and Technology, 2022, Vol. 37, No. 5, pp. 144-154. *


Similar Documents

Publication Publication Date Title
Long et al. Refraction-learning-based whale optimization algorithm for high-dimensional problems and parameter estimation of PV model
Huang et al. Decorrelated batch normalization
Zhao et al. Towards traffic matrix prediction with LSTM recurrent neural networks
Jejjala et al. Neural network approximations for Calabi-Yau metrics
Hegde Accelerating optics design optimizations with deep learning
Cai et al. Multi-scale deep neural networks for solving high dimensional pdes
CN116300075B (en) Layered nano-photonics device design method based on multi-head series neural network
Hocker et al. Characterization of control noise effects in optimal quantum unitary dynamics
Tanveer et al. Ensemble of classification models with weighted functional link network
Zhou et al. Quantum mp neural network
Deighan et al. Genetic-algorithm-optimized neural networks for gravitational wave classification
Meng et al. A novel few-shot learning approach for wind power prediction applying secondary evolutionary generative adversarial network
CN109272110A (en) Photoelectricity based on photon neural network chip merges intelligent signal processing system
Uehara et al. Quantum neural network parameter estimation for photovoltaic fault detection
Zhao et al. Probabilistic inference of Bayesian neural networks with generalized expectation propagation
CN111582468B (en) Photoelectric hybrid intelligent data generation and calculation system and method
CN111898316A (en) Construction method and application of super-surface structure design model
Zhang et al. A convolutional neural network-based surrogate model for multi-objective optimization evolutionary algorithm based on decomposition
Rakhshani et al. On the performance of deep learning for numerical optimization: an application to protein structure prediction
Zhao et al. A surrogate-assisted evolutionary algorithm based on multi-population clustering and prediction for solving computationally expensive dynamic optimization problems
Luo et al. Pruning method for dendritic neuron model based on dendrite layer significance constraints
Lodhi et al. Learning product graphs underlying smooth graph signals
Dong et al. Automatic design of arithmetic operation spiking neural P systems
Zhang et al. Musings on deep learning: Properties of sgd
Ma et al. Temporal pyramid recurrent neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant