CN117217150A - DTCO formula modeling method based on genetic algorithm symbolic regression - Google Patents

DTCO formula modeling method based on genetic algorithm symbolic regression

Info

Publication number
CN117217150A
CN117217150A (application CN202311177038.1A)
Authority
CN
China
Prior art keywords: ast, expression, fitness, formula, gen
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311177038.1A
Other languages: Chinese (zh)
Other versions: CN117217150B (en)
Inventors
Li Bin (李斌)
Huang Yiming (黄奕铭)
Wu Zhaohui (吴朝晖)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202311177038.1A priority Critical patent/CN117217150B/en
Priority claimed from CN202311177038.1A external-priority patent/CN117217150B/en
Publication of CN117217150A publication Critical patent/CN117217150A/en
Application granted granted Critical
Publication of CN117217150B publication Critical patent/CN117217150B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a DTCO formula modeling method based on genetic-algorithm symbolic regression, relating to design-technology co-optimization in chip research and development. A population of random-scale AST expressions is generated from the process parameters and electrical characteristic curve data of a semiconductor device; the fitness of each AST expression in the population is calculated and evaluated to separate a target individual group from a candidate individual group. Through genetic iteration, tournament selection, crossover and mutation operations are performed on the two individual groups to generate a generalized AST expression population, and the expression lengths of the new population are then simplified and reduced according to Occam's razor principle; this process progressively improves both the fitness of the newly generated population and the interpretability of its expressions. Finally, process information is shared, and a target AST expression is selected according to fitness and formula length to serve as the Spice Model. The invention accurately maps the relationship between process parameters and electrical performance, improves the accuracy and efficiency of chip design, and is effectively applied to device design and optimization.

Description

DTCO formula modeling method based on genetic algorithm symbolic regression
Technical Field
The invention relates to the technical field of process design collaborative optimization in chip research and development, in particular to a DTCO formula modeling method based on genetic algorithm symbolic regression.
Background
The traditional Spice Model modeling method in academic research is usually based on the physical structure of a device, derived and solved through algebraic calculation from first principles; it emphasizes the physical meaning of the model, but the solving process is complex. In contrast, Spice Model modeling in engineering mostly adopts template matching: a matched template formula is selected and its parameters are fine-tuned through data fitting, emphasizing practicality and convenience.
As the feature sizes of semiconductor devices enter deep-nanometer nodes, the design-technology co-optimization (DTCO) methodology is attracting attention because it can take chip design requirements into account during process development, improving chip design efficiency and performance. In this process, an efficient and accurate Spice Model becomes the key to improving feedback efficiency.
However, existing Spice Model surrogate approaches, such as neural networks, greatly accelerate the modeling process but are difficult to integrate fully into the DTCO flow because they are unintelligible, unexplainable black-box models.
Symbolic learning is a powerful machine learning technique: by analyzing relationships in the data, it can automatically extract mathematical expressions with physical meaning, without manual intervention. Compared with a black-box model, a modeling method based on symbolic regression is easier to understand and explain and has broader application prospects. Symbolic learning can also quickly explore different model structures and parameter combinations, accelerating the optimization process so that the model reaches optimal performance sooner.
In the DTCO process, an efficient and accurate Spice Model is crucial, and symbolic learning can meet exactly this requirement. It is not only efficient and precise but also remains easy to understand and interpret, making it well suited for integration into present-day DTCO flow systems. Through symbolic learning, a designer can better understand the physical mechanism behind the model, make more accurate decisions, and further improve the efficiency and performance of chip design. Symbolic learning therefore has broad development prospects for the Spice Model in DTCO, providing a powerful and reliable modeling method for chip design and process optimization.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a DTCO formula modeling method based on genetic-algorithm symbolic regression, which can characterize a semiconductor device more rapidly and features high model precision and high modeling efficiency.
The invention discloses a DTCO formula modeling method based on genetic algorithm symbolic regression, which comprises the following steps:
s1, acquiring the process parameters and electrical characteristic curve data of a semiconductor device, and taking them as training and verification materials;
s2, carrying out parallel computing processing on the training and verification materials to generate a plurality of processes that take the training and verification materials as genetic populations;
s3, randomly initializing the genetic population of each process to generate a random-scale AST expression population containing operators, constants and variables;
s4, carrying out fitness calculation on the AST expressions in the random-scale AST expression population, and evaluating the fitness to identify a target individual group and a candidate individual group;
s5, guiding the candidate individual group through the target individual group to carry out genetic iteration, so as to generate a target AST expression population formed from the target individual group;
s6, sharing information among all processes, and selecting a target AST expression from the target AST expression populations of all processes according to fitness and formula length, to be used as the Spice Model characterizing the relationship between the process parameters and the electrical characteristic curves of the semiconductor device.
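In outline, steps S3 to S6 amount to a standard evolutionary loop. The following is a minimal single-process Python sketch with hypothetical names: `init_population`, `fitness` and `evolve_step` are placeholders for the population initialization of S3, the fitness evaluation of S4, and the selection/crossover/mutation/simplification of S5; the toy demo fits a slope rather than a real AST population.

```python
def run_dtco_sr(data, init_population, fitness, evolve_step,
                generations=30, elite_frac=0.1):
    """Single-process sketch of steps S3-S6 (not the patent's
    multi-process, multi-machine implementation)."""
    population = init_population()                          # S3
    for _ in range(generations):
        population.sort(key=lambda e: fitness(e, data), reverse=True)
        n_elite = max(1, int(elite_frac * len(population)))
        target = population[:n_elite]                       # S4: target group
        candidates = population[n_elite:]                   # S4: candidate group
        population = target + evolve_step(target, candidates)   # S5
    # S6: highest fitness first, shortest formula as tie-break
    return max(population, key=lambda e: (fitness(e, data), -len(str(e))))

# Toy demo: "expressions" are just candidate slopes a in y = a*x, and
# evolve_step perturbs the elite; a real run manipulates AST trees instead.
import random

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]                 # y = 2x
def fitness(a, d):
    return -sum((y - a * x) ** 2 for x, y in d)             # higher is better

def evolve_step(target, candidates):
    rng = random.Random(0)                                  # deterministic demo
    kids = [t + rng.uniform(-0.5, 0.5) for t in target for _ in range(2)]
    return kids[:max(len(candidates), 1)]

best = run_dtco_sr(data, lambda: [0.0, 1.0, 3.0, 5.0], fitness, evolve_step)
```

The demo converges toward the true slope 2.0; the elite survives each generation, so the best-so-far individual is never lost.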
As a further improvement, in step S5 the genetic iteration is specifically:
acquiring the AST expression with optimal fitness in the target individual group, selecting and copying its tree-structure components, and performing tournament selection, mutation and crossover operations with the AST expressions in the candidate individual group to generate a generalized AST expression population; then performing adaptive reduction processing on the formula lengths of the AST expressions in the generalized AST expression population to obtain a new AST expression group, on which the fitness calculation and fitness evaluation of step S4 are performed.
Further, the adaptive reduction processing is specifically implemented by performing lazy evaluation and clipping on the screened AST expressions.
Still further, the lazy evaluation is specifically:
computing the value of a screened AST expression step by step downwards from the root node of the abstract syntax tree; during the computation, if the result can be obtained from the current data without computing a given branch of the screened AST expression, that branch is skipped;
the clipping is specifically:
setting an expression complexity threshold according to Occam's razor criterion; if the complexity of an AST expression produced by the lazy evaluation exceeds the expression complexity threshold, deleting the branches, nodes or subtrees of that expression that exceed the threshold.
Further, if, after the target individual group has guided the candidate individual group through several genetic iterations, the difference between the optimal fitness of the target AST expression population and the optimal fitness several iterations earlier is smaller than a set threshold, the hyperparameters of the genetic algorithm are dynamically adjusted according to the convergence rate of the fitness of the new AST expression population.
Still further, the hyperparameters include the number of AST expressions selected to participate in each tournament, the number of randomly substituted symbols in the mutation operation, the crossover-point position of the crossover operation, and the expression complexity threshold;
the dynamic adjustment of the hyperparameters is specifically:
dynamically adjusting the number of AST expressions selected to participate in each tournament according to the convergence rate: if the convergence rate is smaller than a first set value, increasing the number of AST expressions selected to participate in the tournament, otherwise reducing it;
adaptively adjusting the number of randomly substituted symbols in the mutation operation according to the convergence rate: if the convergence rate is smaller than a second set value, increasing the number of randomly substituted symbols, otherwise reducing it;
adjusting the crossover-point position of the crossover operation according to the convergence rate: if the convergence rate is smaller than a third set value, increasing the number of crossover-point positions, otherwise keeping the original crossover-point selection;
adaptively adjusting the expression complexity threshold according to the convergence rate: if the convergence rate is smaller than a fourth set value, increasing the expression complexity threshold, otherwise decreasing it.
As a further improvement, in step S4, the fitness of an AST expression is calculated according to the following formula:
R² = 1 − SSE / SST;
where SSE denotes the residual sum of squares of the AST expression on the training dataset; SST denotes the total sum of squares of the AST expression on the training dataset; R² denotes the fitness, whose effective judgment interval is [0,1], and the closer the fitness is to 1, the better the fitting ability of the AST expression.
As a further improvement, in step S3, the genetic population is randomly initialized by the following formula:
P_i = {G_1, G_2, ..., G_N};
where P_i denotes an individual process; G_i denotes each randomly initialized expression combination; and N denotes the number of individuals in each process.
As a further improvement, the tournament selection formula is:
P(G_i) = k / N;
where G_i denotes each randomly initialized AST expression combination; P(G_i) denotes the probability of G_i being selected as a tournament participant; k is the hyperparameter giving the number of AST expressions participating in the tournament; and N is the number of individuals in each process;
the mutation operation formula is as follows:
New_Gen=Δ(Original_Gen);
wherein New_Gen represents a New gene value obtained after mutation operation; original_gen represents the Original gene value of an AST expression individual at a certain gene position; delta represents the operation of a preset AST expression gene in a random substitution number of symbols of a certain segment.
The crossover operation is:
Children_Gen_1 = Parent_Gen_1 × α + Parent_Gen_2 × (1 − α);
Children_Gen_2 = Parent_Gen_1 × (1 − α) + Parent_Gen_2 × α;
where Children_Gen_1 and Children_Gen_2 are the genes of the two new expression individuals generated by the crossover operation; Parent_Gen_1 and Parent_Gen_2 are the genes of the two parent AST expression individuals; and α is a hyperparameter representing the crossover position, determining the formula node at which the two parent AST expressions are swapped.
In a further improvement, in step S6, the target AST expression is selected as the AST expression in the target population with the largest fitness value and the shortest formula length.
Advantageous effects
The invention has the advantages that:
1. Compared with traditional model derivation based on physical structure and template matching, the method realizes an efficient modeling process through a genetic algorithm and symbolic regression while maintaining model accuracy. This enables higher-quality Spice Models to be generated more quickly in the DTCO flow. Compared with a black-box model, the mathematical expressions generated by the invention have clear physical meaning and better explain the model's behavior.
In addition, through genetic iterative processing the invention can automatically learn and generate mathematical expressions matching chip characteristics from example data, reducing the need for manual intervention and improving modeling efficiency and accuracy. The symbolic-regression approach makes the generated model easier to understand and interpret and, compared with black-box models such as neural networks, better satisfies the interpretability requirements of models in the DTCO flow. The method adapts flexibly to different devices and process conditions, so the generated Spice Model generalizes well and suits a variety of chip design and process development scenarios.
2. By automatically tuning the hyperparameters and dynamically adjusting the genetic-algorithm parameters, the invention can quickly optimize the model and accelerate its convergence to optimal performance. By adopting a multi-process asynchronous parallelization technique, multi-core CPUs are fully utilized during execution of the genetic algorithm, improving its execution efficiency.
Drawings
FIG. 1 is a flow chart of a DTCO formula modeling method of the present invention;
fig. 2 is a schematic diagram of an abstract syntax tree of a preset AST expression of the present invention;
FIG. 3 is a flow chart of clipping an AST expression based on Occam's razor criterion according to the present invention;
FIG. 4 is a schematic view of the clipping of the present invention.
Detailed Description
The invention is further described below in connection with the embodiments, which are not to be construed as limiting the invention in any way; the scope of protection is defined by the claims.
As shown in fig. 1, the invention provides a DTCO formula modeling method based on genetic algorithm symbolic regression, which comprises the following steps:
s1, acquiring technological parameters and electrical characteristic curve data of a semiconductor device, and taking the technological parameters and the electrical characteristic curve data as training and verification materials.
The process parameters and electrical characteristic curve data of the semiconductor device can be obtained by one of the following three methods:
Method one: writing a circuit netlist and obtaining a semiconductor device model dataset using Spice simulation software.
Method two: modeling the device in TCAD software and extracting a semiconductor device model dataset through simulation.
Method three: measuring the semiconductor device with a measuring instrument to obtain a semiconductor device model dataset.
The specific operation of method one is as follows: write a circuit netlist, import it into the Spice software with a python script, and solve the circuit in batch in combination with the BSIM (Berkeley Short-channel IGFET Model) compact model commonly used in industry, obtaining model data for each device.
The process parameters and electrical characteristic curve data of the semiconductor device to be verified are the contents of the written circuit netlist, and the dataset is established based on the ideal expression describing the performance of an NMOS device in the linear region in the BSIM compact model. The compact model formula is:
I_DS = μ·C_ox·(W/L)·[(V_GS − V_T)·V_DS − V_DS²/2].
The specific operation of method two is as follows: use a python script to perform device simulation in TCAD based on model data released by the IRDS community, and extract the corresponding device model datasets in batch.
S2, a multi-machine multi-process asynchronous parallelization technique is introduced to perform parallel computing on the training and verification materials, generating a plurality of processes that take the training and verification materials as genetic populations; this improves the execution efficiency of the genetic algorithm.
Multi-machine multi-process asynchronous parallelization is a parallel computing method in which multiple tasks execute simultaneously on different processing units, improving computational efficiency. The concept can be expressed by the following symbolic formulas:
P = {P_1, P_2, ..., P_n};
Q = {1P, 2P, ..., kP};
where P denotes the set of parallel process groups; P_i denotes an individual process; n denotes the number of processes; Q denotes the group of machines running the process groups; iP denotes an individual machine; and k denotes the number of machines. This embodiment uses 10 machines, each with 5000 parallel process groups.
These processes can perform AST expression generation tasks asynchronously and simultaneously, without waiting for other processes to finish generating their populations, and can iterate the exploration of new AST expressions independently. This approach is very useful for computation-heavy and time-consuming tasks: it distributes the computation of each discrete expression population across multiple processing units, speeding up the overall computation and making full use of multi-core resources to traverse, without gradients, the correct AST expression form of the Spice Model.
S3, the genetic population of each process is randomly initialized according to a certain probability, generating a random-scale AST expression population containing operators, constants and variables.
In this embodiment, the hyperparameter presets for the randomly initialized genetic population are shown in the following table.
The formula for randomly initializing the genetic population of each process is:
P_i = {G_1, G_2, ..., G_N};
where P_i denotes an individual process; G_i denotes each randomly initialized expression combination; and N denotes the number of individuals in each process. This embodiment randomly initializes 10000 AST expressions per parallel process group.
To speed up the gradient-free traversal search efficiency, the symbol-set search space of the abstract syntax tree (AST) in this embodiment is reduced. The operators and constants include the following:
constant bin = {"e", "pi", "mu", "epsilon", "T_ox", "W", "L", ...}.
Operators include, but are not limited to, addition, subtraction, multiplication, division, powers, roots, trigonometric functions and exponential operations. Among the constants, μ denotes mobility; ε denotes the dielectric constant; T_ox denotes the oxide layer thickness; W denotes the gate width; L denotes the gate length; V_T denotes the threshold voltage. Using a vectorized string representation avoids storing identical subtrees repeatedly, reduces the memory footprint, and enables the algorithm to handle larger-scale data.
More specifically, the operators and constants are:
operator library = {"+", "−", "×", "÷", "^"};
constant bin = {"μ", "ε", "T_ox", "W", "L", "1/2", "V_T"}.
The variables of the abstract syntax tree are the bias variables and electrical characteristic curve data of the obtained device, including the V_GS and V_DS voltage bias conditions.
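A sketch of this random initialization, encoding an AST as nested tuples over a subset of the operator library, constant bin and bias variables; the growth probability and depth limit are illustrative assumptions:

```python
import random

OPS = ["+", "-", "*", "/"]                     # subset of the operator library
CONSTS = ["mu", "eps", "T_ox", "W", "L", "1/2", "V_T"]        # constant bin
VARS = ["V_GS", "V_DS"]                        # bias-variable terminals

def random_ast(rng, depth=3, p_leaf=0.3):
    """Grow one random expression tree: an internal node is (op, left, right),
    a leaf is a constant or variable symbol."""
    if depth == 0 or rng.random() < p_leaf:
        return rng.choice(CONSTS + VARS)
    op = rng.choice(OPS)
    return (op, random_ast(rng, depth - 1, p_leaf),
                random_ast(rng, depth - 1, p_leaf))

def init_population(n, seed=0):
    """P_i = {G_1, ..., G_N}: one process's randomly initialized population."""
    rng = random.Random(seed)
    return [random_ast(rng) for _ in range(n)]
```

Seeding the generator per process makes each island's initial population reproducible while still being distinct across seeds.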
S4, fitness calculation is performed on the AST expressions in the random-scale AST expression population, and fitness evaluation is performed to identify the target individual group and the candidate individual group. That is, individuals with high fitness are distinguished through the fitness evaluation, and the set of these individuals is the target individual group. Further, after the respective groups are selected, expression simplification must be applied to both the target individual group and the candidate individual group. For example, an AST expression whose target formula is x+1 may be generated as x+1+(x−x)+(1−1); generated expressions thus become redundant, so each newly generated expression requires simplification, ensuring that the AST expressions in the target and candidate individual groups are in their simplest form.
The algorithmic formula for calculating the fitness of each AST expression in the generalized AST expression population is:
R² = 1 − SSE / SST;
where SSE denotes the residual sum of squares of the AST expression on the training dataset, a statistic measuring the goodness of the model fit; SST denotes the total sum of squares of the AST expression on the training dataset, a statistic measuring the variation of the dependent variable; R² denotes the fitness, whose effective judgment interval is [0,1], and the closer the fitness is to 1, the better the fitting ability of the AST expression. In the actual implementation, if the fitness exceeds 0.9999999999, the data noise error is considered eliminated; all process iterations stop at this point, and the optimal AST expression is output.
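The R² fitness follows directly from its definition; a small sketch, where the hypothetical `predict` callable stands for an evaluated AST expression:

```python
def r2_fitness(predict, xs, ys):
    """R^2 = 1 - SSE/SST: 1.0 means a perfect fit on the training data."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))   # residual SS
    sst = sum((y - mean_y) ** 2 for y in ys)                   # total SS
    return 1.0 - sse / sst

FITNESS_STOP = 0.9999999999        # the embodiment's early-stopping threshold

def should_stop(fitness):
    return fitness > FITNESS_STOP
```

A model that only predicts the mean of the data scores 0, and a model worse than the mean goes negative, which is why the effective judgment interval is taken as [0,1].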
S5, the target individual group guides the candidate individual group through genetic iteration to generate a target AST expression population formed from the target individual group. Using the target individual group to guide the genetic iteration of the candidate individual group forms the target AST expression population and improves the fitness of the AST expressions in the population.
Specifically, the genetic iteration obtains the AST expression with optimal fitness in the target individual group, selects and copies its tree-structure composition, and performs tournament selection, mutation and crossover operations with the AST expressions in the candidate individual group to generate a generalized AST expression population, thereby raising the highest fitness of the AST expressions in the population. Adaptive reduction processing is then applied to the formula lengths of the AST expressions in the generalized AST expression population to form a new AST expression group; this prevents the expressions from growing ever longer during iteration. The new AST expression group then enters the fitness calculation and fitness evaluation described in step S4.
For the generalized AST expression population, the AST expressions of the random-scale AST expression population must also be retained in the candidate individual group, to promote inheritance of the preset traits during the population's genetic iteration. To accelerate convergence of the symbolic learning algorithm and match the derivation process of actual model derivation and template matching, a preset AST expression and its abstract syntax tree in the random-scale AST expression population are shown in fig. 2; the specific expression is:
(V_GS − V_T)·V_DS − V_DS².
the tournament selection formula of the AST expression in the genetic algorithm is as follows:
wherein P (G) i ) Represents G i Probability of being selected as a tournament participant; k represents the super parameter of the number of AST expressions participating in the tournament; n is the number of individuals in each process.
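Under the usual reading of this formula, k entrants are drawn uniformly from the N individuals, so each individual joins a given tournament with probability k/N; tournament selection might then be sketched as:

```python
import random

def tournament_select(population, fitness, k, rng):
    """Draw k uniform entrants from the population; the fittest entrant wins.
    k is the hyperparameter controlling selection pressure."""
    entrants = rng.sample(population, k)
    return max(entrants, key=fitness)
```

Larger k raises selection pressure toward the current best individuals, which is why the adaptive tuning described later increases k when convergence is slow.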
The mutation operation formula is:
New_Gen = Δ(Original_Gen);
where New_Gen denotes the new gene value obtained after the mutation operation; Original_Gen denotes the original gene value of an AST expression individual at a certain gene position; and Δ denotes the operation of randomly substituting a preset number of symbols in a certain segment of the AST expression's genes, the number being a preset hyperparameter.
The crossover operation is:
Children_Gen_1 = Parent_Gen_1 × α + Parent_Gen_2 × (1 − α);
Children_Gen_2 = Parent_Gen_1 × (1 − α) + Parent_Gen_2 × α;
where Children_Gen_1 and Children_Gen_2 are the genes of the two new expression individuals generated by the crossover operation; Parent_Gen_1 and Parent_Gen_2 are the genes of the two parent AST expression individuals; and α is a hyperparameter representing the crossover position, determining the formula node at which the two parent AST expressions are swapped.
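On tuple-encoded ASTs, the crossover and mutation operations correspond to swapping and replacing subtrees; a sketch, in which the randomly chosen swap positions play the role of the α hyperparameter:

```python
import random

def paths(tree, prefix=()):
    """Yield the path of every node in a tuple-encoded AST."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get_at(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def set_at(tree, path, subtree):
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (set_at(tree[i], path[1:], subtree),) + tree[i + 1:]

def crossover(p1, p2, rng):
    """Swap one randomly chosen subtree between the two parents."""
    a = rng.choice(list(paths(p1)))
    b = rng.choice(list(paths(p2)))
    return set_at(p1, a, get_at(p2, b)), set_at(p2, b, get_at(p1, a))

def mutate(tree, symbols, rng):
    """Replace one randomly chosen node with a random symbol (the Delta op)."""
    p = rng.choice(list(paths(tree)))
    return set_at(tree, p, rng.choice(symbols))
```

Because tuples are immutable, `set_at` rebuilds only the spine from the root to the edited node, so parents are never modified in place.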
The adaptive reduction processing is specifically implemented by performing lazy evaluation and clipping on the screened AST expressions.
The lazy evaluation is specifically: the values of the screened AST expressions are computed step by step downwards, starting from the root node of the AST. During the computation, if the current data can yield a result without computing a given branch, the computation of that branch can be skipped, saving computation time. Specific computations are performed on certain subtree branches only when they are necessary, i.e. when the current data must be computed through those branches.
As shown in fig. 3, the specific clipping method is as follows: after lazy evaluation, the AST expression of each individual is only partially computed, and some of its subtrees may never be accessed or computed. Complexity metrics, such as the number of tree nodes and the number of branches, are calculated for each individual AST expression; this helps quantify each individual's complexity. According to Occam's razor criterion, an expression complexity threshold hyperparameter is set to control formula complexity. If the complexity of an AST expression produced during lazy evaluation exceeds the threshold, a clipping operation is performed. Clipping is achieved by deleting some branches, nodes or subtrees, reducing the overall complexity. Based on the lazy evaluation, the necessary subtree branches closely related to the data fit are preserved, while less relevant or redundant expression branches are deleted.
As shown in fig. 4, taking a generated AST expression as an example, two large branch subtrees are identified after lazy evaluation: subtree one of the AST expression has 19 nodes, and subtree two has 14 nodes. Subtrees at or above the expression complexity threshold of 12 are clipped: the fitness values of the two branch subtrees are compared, and the expression subtree relevant to the data fit is finally retained and enters the next round of iterative search.
Lazy evaluation reduces unnecessary computational overhead, especially when dealing with complex expression trees, and significantly increases computational efficiency. By computing only the portion relevant to the current data, it accelerates fitness calculation and thereby streamlines the genetic-algorithm-based symbolic regression process, making the modeling method more efficient and feasible. Clipping the expressions according to Occam's razor criterion in each iteration effectively controls model complexity, avoids overfitting, and improves the model's generalization ability. This makes the modeling process more rational and yields a more interpretable model.
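A sketch of the two mechanisms on tuple-encoded ASTs: `lazy_eval` skips a branch whose value cannot affect the result (here, the right operand of a multiplication by zero), and `prune` applies the Occam's-razor threshold by descending into the fitter child subtree. The threshold value and the fitness callback are illustrative assumptions, not the patent's exact procedure.

```python
def lazy_eval(tree, env):
    """Evaluate (op, left, right) tuples; leaves are variable names in `env`
    or numeric literals. The right branch of '*' is skipped when the left
    operand is zero, which is the lazy-evaluation shortcut."""
    if not isinstance(tree, tuple):
        return env[tree] if tree in env else float(tree)
    op, left, right = tree
    lv = lazy_eval(left, env)
    if op == "*":
        if lv == 0:
            return 0.0                    # skip the right branch entirely
        return lv * lazy_eval(right, env)
    rv = lazy_eval(right, env)
    if op == "+":
        return lv + rv
    if op == "-":
        return lv - rv
    return lv / rv                        # op == "/"

def node_count(tree):
    """Complexity metric: total number of nodes in the tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(node_count(c) for c in tree[1:])

def prune(tree, max_nodes, fitness_of):
    """Occam's-razor clipping: while the tree exceeds the complexity
    threshold, keep only its fitter child subtree."""
    while isinstance(tree, tuple) and node_count(tree) > max_nodes:
        tree = max(tree[1:], key=fitness_of)
    return tree
```

Note that in the zero-multiplication case the skipped branch is never touched, so even an invalid subtree (e.g. a division by zero) incurs no cost and raises no error.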
However, during the genetic iteration of the individual groups, the optimal fitness of the population may fail to improve after several rounds of iteration. In response, the hyperparameters of the next round of the genetic algorithm can be dynamically adjusted according to the convergence rate of the fitness, ensuring the algorithm's robustness and adaptability. In subsequent iterations, the corresponding AST expressions are operated on and regenerated according to steps S4 and S5 with the adjusted hyperparameters. The regenerated AST expressions gradually evolve, and the optimal individual attains higher fitness, until a target AST expression population is finally formed whose target individuals satisfy the fitness threshold condition.
Based on a preset hyperparameter set, the hyperparameters are dynamically and adaptively adjusted according to the convergence rate of the AST expression fitness during genetic iteration; the specific adjustments are as follows:
(1) The number of AST expressions involved in each tournament selection can be dynamically adjusted according to the convergence rate of AST expression fitness. If the convergence rate is slower, the number of tournament participants can be increased to speed up the evolution process. Conversely, if the convergence rate is faster, the number of participation can be moderately reduced to avoid premature trapping into local optimum.
(2) The number of random substitution symbols for the abrupt operation may be adaptively adjusted based on the convergence rate of the degree of adaptation of the AST expression. If the convergence rate is slower, the number of alternatives can be moderately increased to increase exploratory. If the convergence rate is faster, the number of alternatives can be reduced to maintain convergence stability.
(3) The intersection point position of the intersection operation can be adjusted according to the convergence rate of the AST expression fitness. If the convergence rate is slower, more different junction locations may be tried to increase diversity. If the convergence rate is faster, the intersection location can be selected more conservatively to stabilize convergence.
(4) The expression complexity threshold may be adaptively adjusted according to the convergence rate of the AST expression fitness. If the convergence rate is slower, the threshold can be moderately increased to preserve more complex structures. If the convergence rate is faster, the threshold can be moderately lowered to speed up the model simplification.
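The four adjustments above can be sketched as a single update function. The parameter names, threshold, and step sizes below are illustrative assumptions, not values from the patent:

```python
# Hedged sketch of adjustments (1)-(4): each hyper-parameter is nudged
# according to whether the fitness convergence rate falls below a threshold.
# All names, thresholds, and step sizes here are illustrative assumptions.
def adapt_hyperparams(params, convergence_rate, threshold=0.01):
    p = dict(params)
    slow = convergence_rate < threshold
    # (1) tournament size: enlarge when convergence is slow, shrink (min 2) otherwise
    p["tournament_size"] = p["tournament_size"] + 1 if slow else max(2, p["tournament_size"] - 1)
    # (2) mutation symbols: more exploration when slow, fewer (min 1) when fast
    p["mutation_symbols"] = p["mutation_symbols"] + 1 if slow else max(1, p["mutation_symbols"] - 1)
    # (3) crossover points: try more positions when slow, keep the current choice when fast
    p["crossover_points"] = p["crossover_points"] + 1 if slow else p["crossover_points"]
    # (4) complexity threshold: allow larger trees when slow, prune harder when fast
    p["complexity_threshold"] = p["complexity_threshold"] + 5 if slow else max(5, p["complexity_threshold"] - 5)
    return p
```

In use, the returned dictionary would parameterize the next round of steps S4 and S5.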
The hyper-parameter set in this example is shown in the table below.
S6, all processes share information, and the AST expression with the highest fitness and the simplest formula is selected from all processes as the Spice Model characterizing the relationship between the device's process parameters and its electrical characteristic curves.
Each process shares information by the following methods:
(1) Shared memory: a shared object is created and synchronized access among multiple processes is controlled, ensuring that optimal-solution information is correctly shared and updated between processes.
(2) Vectorized string representation: in the genetic-algorithm-based symbolic regression process, a batch of AST expressions can be evaluated together, exploiting the SIMD (single instruction, multiple data) instruction sets of modern computers to process multiple operations simultaneously.
(3) Memory optimization: to reduce memory usage, the string representation avoids storing identical subtrees repeatedly. When an AST expression is constructed, identical subtrees can share the same string instead of being stored again, reducing memory consumption.
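The subtree-sharing idea in (3) can be sketched by interning canonical subtree strings in a pool, so that identical subtrees resolve to one stored string. The tuple-based AST encoding and the prefix-string form used here are assumptions for illustration, not the patent's exact data structures:

```python
# Hedged sketch of the memory optimization in (3): identical subtrees are
# stored once by interning a canonical prefix-string form. The tuple AST
# encoding and pool are illustrative assumptions.
_subtree_pool = {}

def intern_subtree(node):
    """Return a canonical string for an AST node; identical subtrees share one string object."""
    if isinstance(node, tuple):  # operator node: (op, child, child, ...)
        key = "(" + node[0] + " " + " ".join(intern_subtree(c) for c in node[1:]) + ")"
    else:                        # leaf: a variable name or constant rendered as text
        key = str(node)
    return _subtree_pool.setdefault(key, key)

# Identical subtrees built independently resolve to the same stored string:
a = intern_subtree(("add", "x", ("mul", "x", "y")))
b = intern_subtree(("add", "x", ("mul", "x", "y")))
assert a is b and a == "(add x (mul x y))"
```

Because the pool maps each canonical string to itself, repeated subtrees cost one dictionary entry regardless of how many expressions contain them.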
Combining these methods enables efficient multi-process parallel computation, inter-process communication, vectorized computation, and memory optimization, so that computing resources are fully utilized during modeling, efficiency is improved, and memory footprint is reduced. With this method, the optimal expression was found at genetic generation 329 of the 48th thread of the 2nd machine; the fitness of that thread's optimal expression is 0.9999999999034169, and the optimal expression is:
the DTCO formula modeling method based on genetic algorithm symbolic regression effectively balances high efficiency and accuracy and rapidly generates a high-quality Spice Model; the generated mathematical expression has clear physical explanation, which is helpful for engineers to understand the model behavior; the method has the advantages that manual intervention is not needed, a proper model is automatically learned and generated from example data, and modeling efficiency and accuracy are improved; compared with a black box model, the generation model is easy to understand and interpret; the method is suitable for different devices and process conditions, and has stronger generalization capability; automatically adjusting the super parameters and dynamic genetic algorithm parameters, and rapidly optimizing the performance of the model; the multi-core CPU is fully utilized, and the algorithm execution efficiency is improved; the vectorized character string is used for representing, so that repeated storage is avoided, memory occupation is reduced, and larger-scale data are processed.
While only the preferred embodiments of the present invention have been described above, it should be noted that those skilled in the art can make modifications and improvements without departing from the structure of the present invention, and these do not affect the effect of implementing the invention or the utility of the patent.

Claims (10)

1. A DTCO formula modeling method based on genetic algorithm symbolic regression is characterized by comprising the following steps:
s1, acquiring technological parameters and electrical characteristic curve data of a semiconductor device, and taking the technological parameters and the electrical characteristic curve data as training and verification materials;
s2, carrying out parallel computing processing on the training and verifying materials to generate a plurality of processes taking the training and verifying materials as genetic populations;
s3, randomly initializing genetic populations of all the processes to generate random-scale AST expression populations containing operators, constants and variables;
s4, carrying out fitness calculation on AST expressions in the random-scale AST expression population, and carrying out fitness evaluation on the fitness to identify a target individual group and an alternative individual group;
s5, guiding the candidate individual groups to carry out genetic iteration through the target individual groups so as to generate a target AST expression population formed by the target individual groups;
s6, sharing information of all the processes, and selecting a target AST expression from a target AST expression population of all the processes according to the fitness and the formula length to be used as a Spice Model for representing the process parameters and the electrical characteristic curve relationship of the semiconductor device.
2. The DTCO formula modeling method based on genetic algorithm symbolic regression according to claim 1, wherein in step S5, the genetic iteration is specifically:
obtaining the AST expression with the optimal fitness in the target individual group, selecting and copying its tree-structure components, and performing tournament selection, mutation, and crossover operations between it and the AST expressions in the candidate individual group to generate a generalized AST expression population; and performing adaptive reduction processing on the formula lengths of the AST expressions in the generalized AST expression population to obtain a new AST expression group, and subjecting the new AST expression group to the fitness calculation and fitness evaluation of step S4.
3. The DTCO formula modeling method based on genetic algorithm symbolic regression according to claim 2, wherein the adaptive reduction processing specifically comprises lazy evaluation and pruning of the screened AST expressions.
4. A DTCO formula modeling method based on genetic algorithm symbolic regression according to claim 3, characterized in that the lazy evaluation is specifically:
the value of the screened AST expression is calculated progressively downward from the root node of the random abstract syntax tree; during the calculation, if the required data can be obtained without evaluating the current branch of the screened AST expression, the current branch is skipped;
the pruning is specifically:
an expression complexity threshold is set according to the Occam's razor criterion; if the complexity of an AST expression generated during lazy evaluation exceeds the expression complexity threshold, the branches, nodes, or subtrees of that AST expression exceeding the threshold are deleted.
5. The DTCO formula modeling method based on genetic algorithm symbolic regression according to claim 4, wherein, if after the target individual group has guided the candidate individual group through a plurality of genetic iterations the difference between the optimal fitness of the target AST expression population and the optimal fitness several iterations earlier is smaller than a set threshold, the hyper-parameters of the genetic algorithm are dynamically adjusted according to the convergence rate of the fitness of the new AST expression group.
6. The DTCO formula modeling method based on genetic algorithm symbolic regression according to claim 5, wherein the hyper-parameters include the number of AST expressions participating in tournament selection, the number of randomly substituted symbols in the mutation operation, the crossover point position of the crossover operation, and the expression complexity threshold;
the dynamic adjustment of the hyper-parameters is specifically:
dynamically adjusting the number of AST expressions selected to participate in each tournament according to the convergence rate; if the convergence rate is smaller than a first set value, the number of AST expressions selected to participate in the tournament is increased, otherwise it is reduced;
adaptively adjusting the number of randomly substituted symbols in the mutation operation according to the convergence rate; if the convergence rate is smaller than a second set value, the number of randomly substituted symbols is increased, otherwise it is reduced;
adjusting the crossover point position of the crossover operation according to the convergence rate; if the convergence rate is smaller than a third set value, the number of crossover point positions is increased, otherwise the original crossover point positions are kept;
adaptively adjusting the expression complexity threshold according to the convergence rate; if the convergence rate is smaller than a fourth set value, the expression complexity threshold is increased, otherwise it is decreased.
7. The DTCO formula modeling method based on genetic algorithm symbolic regression according to claim 1, wherein in step S4, the fitness of an AST expression is calculated as:
R 2 = 1 - SSE/SST;
wherein SSE represents the residual sum of squares of the AST expression on the training dataset; SST represents the total sum of squares of the AST expression on the training dataset; R 2 denotes the fitness, whose effective judgment interval is [0,1], and the closer the fitness is to 1, the better the fitting ability of the AST expression.
8. The DTCO formula modeling method based on genetic algorithm symbolic regression according to claim 1, wherein in step S3 the genetic population is randomly initialized by the following formula:
P i = {G 1 , G 2 , ..., G N };
wherein P i represents an individual process; G 1 , G 2 , ..., G N represent the randomly initialized expression combinations; N represents the number of individuals in each process.
9. The DTCO formula modeling method based on genetic algorithm symbolic regression according to claim 2, wherein the tournament selection formula is:
P(G i ) = k/N;
wherein G i represents each randomly initialized AST expression combination; P(G i ) represents the probability of G i being selected as a tournament participant; k is the hyper-parameter giving the number of AST expressions participating in the tournament; N is the number of individuals in each process;
the mutation operation formula is as follows:
New_Gen=Δ(Original_Gen);
wherein New_Gen represents a New gene value obtained after mutation operation; original_gen represents the Original gene value of an AST expression individual at a certain gene position; delta represents the operation of random substitution number symbols of a preset AST expression gene in a certain segment;
the crossing operation is as follows:
Children_Gen 1 =Parent_Gen 1 ×α+Parent_Gen 2 ×(1-α);
Children_Gen 2 =Parent_Gen 1 ×(1-α)+Parent_Gen 2 ×α;
in Children_Gen 1 And children_gen 2 Is produced by cross operationGenes of two new expression individuals; parent_Gen 1 And Parent_Gen 2 Genes of two father AST expression individuals; alpha is a hyper-parameter representing the position of the intersection, determining the position of the formula node where the two parent AST expressions are swapped.
10. The DTCO formula modeling method based on genetic algorithm symbolic regression according to claim 1, wherein in step S6, the target AST expression is selected by choosing, from the AST expressions in the target population, the one with the largest fitness value and the shortest formula length.
CN202311177038.1A 2023-09-13 DTCO formula modeling method based on genetic algorithm symbolic regression Active CN117217150B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311177038.1A CN117217150B (en) 2023-09-13 DTCO formula modeling method based on genetic algorithm symbolic regression


Publications (2)

Publication Number Publication Date
CN117217150A true CN117217150A (en) 2023-12-12
CN117217150B CN117217150B (en) 2024-05-17


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210278827A1 (en) * 2020-03-09 2021-09-09 Board Of Trustees Of Michigan State University Systems And Method For Dimensionally Aware Rule Extraction
CN114840873A (en) * 2022-04-08 2022-08-02 华南理工大学 Symbolic regression method based on federal genetic programming
CN115329269A (en) * 2022-07-01 2022-11-11 四川大学 Differentiable genetic programming symbol regression method
CN115543556A (en) * 2022-09-01 2022-12-30 华南理工大学 Adaptive symbolic regression method based on multitask genetic programming algorithm
CN116434887A (en) * 2023-03-28 2023-07-14 中国科学院长春应用化学研究所 Material formula design and performance prediction method
CN116702678A (en) * 2023-08-02 2023-09-05 华南理工大学 DTCO optimization method integrating deep learning and swarm intelligence algorithm
CN117313620A (en) * 2023-10-17 2023-12-29 华南理工大学 DTCO formula modeling method based on multitask deep learning symbolic regression


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
RANDALL, DL ET AL.: "Bingo: A Customizable Framework for Symbolic Regression with Genetic Programming", PROCEEDINGS OF THE 2022 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION, GECCO 2022, 1 September 2023 (2023-09-01) *
YU Peng: "New application of a symbolic regression method based on genetic programming in power quality analysis", Electronic Design Engineering, no. 07, 5 April 2013 (2013-04-05) *
SHENG Wentao: "Research on the generalization performance of symbolic regression based on multi-objective optimization", China Master's Theses Full-text Database, Information Science and Technology, 15 January 2022 (2022-01-15) *

Similar Documents

Publication Publication Date Title
US20180260714A1 (en) Global optimization, search and machine learning method based on the lamarckian principle of inheritance of acquired characteristics
Bingham et al. Discovering parametric activation functions
CN106250461A (en) A kind of algorithm utilizing gradient lifting decision tree to carry out data mining based on Spark framework
KR20060044772A (en) Using tables to learn trees
Neumann Computational complexity analysis of multi-objective genetic programming
CN113222165A (en) Quantum line optimization method based on genetic algorithm
Bi et al. Make heterophily graphs better fit gnn: A graph rewiring approach
Liu et al. Bandit-based random mutation hill-climbing
CN114841106A (en) Integrated circuit optimization method and system based on rule-guided genetic algorithm
Kawamura et al. A hybrid approach for optimal feature subset selection with evolutionary algorithms
CN114548414A (en) Method, device, storage medium and compiling system for compiling quantum circuit
CN117217150B (en) DTCO formula modeling method based on genetic algorithm symbolic regression
Farooq Genetic algorithm technique in hybrid intelligent systems for pattern recognition
Bernard et al. Stochastic L-system inference from multiple string sequence inputs
CN112487110A (en) Overlapped community evolution analysis method and system based on network structure and node content
CN117217150A (en) DTCO formula modeling method based on genetic algorithm symbolic regression
CN111352650A (en) Software modularization multi-objective optimization method and system based on INSGA-II
CN116432125A (en) Code classification method based on hash algorithm
Xie et al. Scalenet: Searching for the model to scale
Chen et al. Clustering without prior knowledge based on gene expression programming
US10776548B1 (en) Parallel Monte Carlo sampling for predicting tail performance of integrated circuits
Wu et al. An improved genetic algorithm based on explosion mechanism
Zhang et al. Optimization of the NSGA-III Algorithm Using Adaptive Scheduling.
CN113191486B (en) Graph data and parameter data mixed dividing method based on parameter server architecture
Yang et al. Optimization of classification algorithm based on gene expression programming

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant