WO2007041966A1

WO2007041966A1 - Method and system for designing a composite sample library

Info

Publication number: WO2007041966A1
Application number: PCT/CN2006/002691
Authority: WO
Inventors: Xinlei Hua; Xichen Feng
Original assignee: Accelergy Shanghai R & D Center Co., Ltd
Priority date: 2005-10-13
Filing date: 2006-10-13
Publication date: 2007-04-19
Also published as: CN1948559A; CN100558948C

Abstract

A method for designing a composite sample library is provided, which makes the best use of existing knowledge or given hypotheses either to reduce the number of sample experiments or to increase the useful information obtained from a fixed number of sample experiments. The method includes the following steps: (1) providing the multiple components composing the sample; (2) providing a variable for each component, the variable taking values within a certain interval; (3) setting at least one constraint condition for at least one variable; (4) producing a pseudo sample; (5) checking the pseudo sample to determine whether it is a conforming sample; (6) repeating steps (4) and (5) until at least one conforming sample is obtained. The method of the invention avoids systematic bias in the design and is effective and accurate.

Description

Method and system for designing a combined sample library

Technical field

The present invention relates to an efficient test method, namely high-throughput experimentation, and more particularly to the design of combined sample libraries within that field.

Background art

Many material properties, such as thermal conductivity, luminosity, and catalytic activity, can be exploited using combinatorial material discovery methods and systems to identify new materials or to optimize existing ones. The current combinatorial research method searches through a grid in the sample space, comparing a large number of samples by brute force, and then screens these samples according to the desired characteristics. However, this method hardly takes into account the known empirical knowledge about the components of the sample. Even when this empirical knowledge is taken into account, there is no suitable way to design a sample library that is sufficiently randomized over the sample space.

Therefore, it is necessary to develop a new combined experimental design system and method that effectively integrates empirical knowledge into the design of the sample library, while the sampling outside the scope of this empirical knowledge remains completely random to avoid interference from human factors. In addition, we need to know which samples are to be synthesized and the total number of representative samples.

Summary of the invention

It is an object of the present invention to provide a new combined experimental design system and method for efficiently integrating empirical knowledge into the design of a sample library.

The present invention provides a method for integrating empirical knowledge into the design of a combined sample library. Specifically, this empirical knowledge can be embodied as the components of the sample, the variables associated with the components, and the constraints on these variables.

The invention also provides a method of designing a sample library comprising the following steps:

(1) providing the components of the target sample;

(2) setting a variable for each component;

(3) setting at least one constraint on the above variables;

(4) generating a pseudo sample library;

(5) selecting, from the pseudo samples, the qualified samples that meet the constraints.

In one embodiment, the constraint on the variables is a relationship between variables determined by experience or prior knowledge. In yet another embodiment, the empirical knowledge is a physical or chemical natural law expressed in terms of the variables.

In an embodiment, the pseudo samples can be generated by random sampling. In one embodiment, the random sampling is performed using a set of variables, each of which corresponds to a component and randomly takes values within a certain interval. An example of random sampling is Monte Carlo simulation. In yet another embodiment, the random values are generated by a random number generator. In yet another embodiment, the random values follow a probability distribution or probability density. The probability distribution is a uniform distribution or a non-uniform distribution. Non-uniform distributions include the Bernoulli distribution, beta distribution, chi-square distribution, exponential distribution, F distribution, gamma distribution, Gaussian distribution, normal distribution (e.g., lognormal, multivariate normal, and univariate normal distributions), non-central chi-square distribution, non-central F distribution, binomial distribution, negative binomial distribution, multinomial distribution, Pareto distribution, Poisson distribution, Student's t distribution, and Tsallis distribution. In particular, the probability distribution may be a uniform distribution, a normal distribution, or a Gaussian distribution.

Yet another aspect of the present invention provides a method of obtaining a desired number of samples in a sample library, the method comprising the steps of:

(1) providing the components constituting a sample;

(2) setting a variable for each component;

(3) setting at least one constraint on the variables;

(4) providing the required number of samples;

(5) generating a random pseudo sample;

(6) determining whether the pseudo sample is a qualified sample according to whether its variables satisfy the constraints;

(7) repeating steps (5) and (6) until the number of qualified samples reaches the required number.

In still another aspect, the present invention provides a method of estimating the optimal number of samples that need to be designed and/or synthesized, the method comprising the steps of:

(1) providing the components constituting a sample;

(2) setting a variable for each component;

(3) setting at least one constraint on the variables;

(4) providing the required segmentation of each variable over its given interval;

(5) generating pseudo samples;

(6) selecting the pseudo samples that satisfy the constraints as qualified samples;

(7) determining the qualified sample ratio by dividing the number of qualified samples by the number of pseudo samples;

(8) calculating the total number of samples defined by the segmentation;

(9) determining the optimal number of samples by multiplying the total number of samples by the qualified sample ratio.

The method further includes the step of obtaining the qualified sample ratio by dividing the number of qualified samples by the number of pseudo samples.

In another aspect, the invention also provides a computer product comprising computer software which, when run, performs the methods and calculations of the present invention. For example, the computer software can perform the random sampling.

Brief description of the drawings

Figure 1 is a schematic representation of two-component samples produced by the Monte Carlo simulation method, each sample consisting of cerium (Ce) and iron (Fe). All the dots in the figure (hollow, gray, and black) represent pseudo samples whose cerium and iron variables are generated independently and randomly from a uniform distribution, with no constraint between the cerium variable and the iron variable. The gray and black dots represent pseudo samples that satisfy the first constraint, and the black dots represent pseudo samples that satisfy both the first and the second constraints (see Example 1 for details).

Figure 2 is a three-dimensional view of four-component samples produced by the Monte Carlo simulation method, each sample consisting of cerium (Ce), iron (Fe), tungsten (W), and nickel (Ni). All points represent pseudo samples in which each of the four variables is distributed randomly and independently, without any constraint between them.

Figure 3 is a schematic illustration of the pseudo sample in Figure 2 that satisfies the first constraint.

Figure 4 is a schematic illustration of the pseudo samples of Figure 3 that further satisfy the second constraint.

Figure 5 shows a graphical user interface (GUI) that allows a user to design a multi-component sample library by providing, for each component, a variable, the range of the variable, and the desired segmentation.

Figure 6 shows the graphical user interface displayed after a component and its corresponding variable have been selected.

Figure 7 shows a graphical user interface that allows a user to specify one or more constraints on the variables.

The graphical user interface shown in Figure 8 allows the user to choose 1) whether to perform the Monte Carlo simulation method and 2) how to perform it.

The graphical user interface shown in Figure 9 allows the user to enter the number of samples to be obtained and the number of components of the sample.

The graphical user interface shown in Figure 10 allows the user to specify each component.

The graphical user interface shown in Figure 11 allows the user to define constraints using variables.

The graphical user interface shown in Figure 12 allows the user to specify constraint tolerances.

Detailed description

The present invention relates to a design strategy for combined sample libraries, covering the design, synthesis, screening, and measurement of the sample library.

One aspect of the invention provides a method of designing a sample library comprising providing a plurality of components of a sample. The term "combined sample library" as used herein refers to a collection comprising a plurality of samples; a "sample" refers to a material comprising a plurality of components; and a "component" refers to a substance, such as an element, a molecule, a compound, or another substance or mass, or a combination of these substances.

In one embodiment of the invention, a sample comprises n different components C_1, C_2, C_3, ..., C_i, ..., C_n, where n is an integer giving the number of different components in the sample. The mass of each component C_i is denoted MW_i, where i ∈ {1, 2, ..., n}; the composition number of C_i in the sample is denoted x_i, and the corresponding composition ratio is denoted R_i. The mass MW_i is the molecular weight or atomic weight of the component. The composition number x_i is the number of units of the i-th component in the sample, so the sample can be written as (C_1)_{x_1}(C_2)_{x_2}...(C_i)_{x_i}...(C_n)_{x_n}. The composition ratio can be characterized as the relative weight of a component in the sample, as given by Equation 1:

$$R_i = \frac{MW_i \, x_i}{\sum_{j=1}^{n} MW_j \, x_j} \qquad \text{(Equation 1)}$$

The composition number x_i can also refer to the molar ratio of the i-th component in the sample. In this case, the composition ratio can be expressed as the mole fraction of the component in the sample, a value between 0 and 1, defined by Equation 2:

$$R_i = \frac{x_i}{\sum_{j=1}^{n} x_j} \qquad \text{(Equation 2)}$$

The composition ratio may further be expressed as the percentage of a component in the sample, with a value between 0% and 100%.

In any sample of the library, the composition ratios of all components sum to 1, as shown in Equation 3:

$$\sum_{i=1}^{n} R_i = 1 \qquad \text{(Equation 3)}$$

For example, the glucose molecule C₆H₁₂O₆ can be regarded as a sample containing three components, carbon (C), hydrogen (H), and oxygen (O), each with a composition number: 6 for C, 12 for H, and 6 for O. The mass (MW) of each component is its atomic mass: 12 for C, 1 for H, and 16 for O. Therefore, the weight composition ratio of C is 0.4 or 40% (12×6/(12×6 + 1×12 + 16×6)), that of H is 0.067 or 6.7%, and that of O is 0.533 or 53.3%; the composition ratios of the components sum to 1. Another feature of the combined sample library is that every sample in it consists of the same types of components, but in different composition ratios.
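The glucose arithmetic can be reproduced with a short Python sketch of Equations 1 and 2. The function and variable names are illustrative assumptions, not part of the invention; the atomic masses are the rounded values used in the example above.

```python
# Rounded atomic masses used in the glucose example (C = 12, H = 1, O = 16).
ATOMIC_MASS = {"C": 12.0, "H": 1.0, "O": 16.0}


def weight_ratios(composition, masses):
    """Equation 1: R_i = MW_i * x_i / sum_j (MW_j * x_j)."""
    total = sum(masses[c] * x for c, x in composition.items())
    return {c: masses[c] * x / total for c, x in composition.items()}


def mole_fractions(composition):
    """Equation 2: R_i = x_i / sum_j x_j."""
    total = sum(composition.values())
    return {c: x / total for c, x in composition.items()}


glucose = {"C": 6, "H": 12, "O": 6}  # composition numbers x_i
r = weight_ratios(glucose, ATOMIC_MASS)
print({c: round(v, 3) for c, v in r.items()})  # {'C': 0.4, 'H': 0.067, 'O': 0.533}
print(round(sum(r.values()), 12))              # 1.0, as required by Equation 3
print(mole_fractions(glucose))                 # {'C': 0.25, 'H': 0.5, 'O': 0.25}
```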

A method of designing a combined sample library provided by another aspect of the invention includes providing a variable for each component of the multi-component sample; that is, each variable corresponds to one component of the sample. Suppose a variable V takes a random value in the interval [v_min, v_max], where v_min is not less than 0, v_max is not more than 1, and v_min < v_max. In one embodiment, the interval is [0, 1]. If V is restricted to a discrete set {V_1, V_2, ..., V_x} whose values fall within [v_min, v_max] (e.g., [0, 1]), then V is a discrete variable. If V can take any random value in [v_min, v_max], it is a continuous variable. When the variable of a component is not subject to any constraint, or is independent of the other variables, the random value assigned to one variable is not affected by the value assumed by another variable. If the variable is continuous, the random value drawn from its interval is governed by the probability density of its possible values; if the variable is discrete, the random value is governed by the probability assigned to each discrete value in the set.
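The difference between continuous and discrete variables, and the independence of the draws, can be illustrated with Python's standard `random` module. This is a minimal sketch under stated assumptions: the function names, the five-value discrete set, and the 3:1 weights are hypothetical choices for illustration only.

```python
import random
from typing import Optional, Sequence


def sample_continuous(rng: random.Random, v_min: float = 0.0, v_max: float = 1.0) -> float:
    """Continuous variable: a value in [v_min, v_max] drawn with uniform probability density."""
    return rng.uniform(v_min, v_max)


def sample_discrete(rng: random.Random, values: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Discrete variable: one of {V_1, ..., V_x}; equal probabilities unless weights are given."""
    return rng.choices(values, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
# Independent variables: the draw for V_j never looks at the value already drawn for V_i.
v_i = sample_continuous(rng)
v_j = sample_discrete(rng, [0.0, 0.25, 0.5, 0.75, 1.0])
v_k = sample_discrete(rng, [0.1, 0.9], weights=[3, 1])  # hypothetical: 0.1 three times as likely
print(v_i, v_j, v_k)
```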

For example, let V_i be the variable of component C_i and V_j the variable of component C_j. V_i can be set to a random value in the interval [V_{i,min}, V_{i,max}] and V_j to a random value in the interval [V_{j,min}, V_{j,max}], with V_i independent of V_j. When C_i and C_j are components of a sample consisting of C_1, C_2, ..., C_i, ..., C_j, ..., C_n, where i, j ∈ {1, 2, ..., n}, then in the synthesized sample V_i becomes the composition ratio R_i of component C_i and V_j becomes the composition ratio R_j, and the variables of all components in the sample satisfy Equation 4:

$$\sum_{i=1}^{n} V_i = 1 \qquad \text{(Equation 4)}$$

Another aspect of the invention consists in providing or setting at least one constraint on at least one variable of the sample in the method of designing the combined sample library. The term "constraint" refers to a condition on at least one variable or a relationship between variables; in particular, it is a condition that one or more variables of a sample must satisfy. In other words, the set of variables {V_i} of a valid, or qualified, sample must satisfy at least one constraint or a specific set of constraints. For example, if a sample includes components C_1, C_2, ..., C_n and each component C_i has a variable V_i, where i ∈ {1, 2, ..., n}, then in a valid sample the sum of the variables must satisfy the constraint of Equation 5:

$$1 - \Delta \le \sum_{i=1}^{n} V_i \le 1 + \Delta \qquad \text{(Equation 5)}$$

where Δ is the error (i.e., the constraint tolerance or constraint deviation), a value between 0 and 0.2. In a preferred embodiment, Δ is 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, or 0.10. As Δ approaches 0, the constraint approaches the equality of Equation 4.

In addition, in sample library design, empirical knowledge about the components of a sample can be embodied as relationships among several variables, which can also be understood as constraints. In other words, relationships among components known from experience can be imposed by setting constraints. For example, previous experience may indicate that, in a sample consisting of C_1, C_2, ..., C_n, the composition ratios of C_i and C_j should be in the ratio 2:1, where i, j ∈ {1, 2, ..., n} and i ≠ j. In this case, in addition to the inherent constraint of Equation 4, the variables of a valid sample must satisfy the second constraint V_i : V_j = 2 : 1. As another example, if the sum of the composition ratios of C_i and C_j is required to be x, where x is a value between 0 and 1, the variables of a valid sample must satisfy the second constraint V_i + V_j = x, where i, j ∈ {1, 2, ..., n} and i ≠ j.
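One way to encode the constraints above is as predicates over the vector of variables, so that a sample is qualified only when every predicate holds. This Python sketch is illustrative, not the author's implementation: indices are 0-based rather than 1-based, and the tolerance forms for the ratio and pair-sum constraints are assumptions (the text only names constraint tolerances, as in Figure 12, without fixing their form). With continuous random values an exact equality such as V_i : V_j = 2 : 1 would almost never be met, which is why a tolerance is used.

```python
from typing import Callable, Sequence

# A constraint is modeled as a predicate on the variable vector (an illustrative choice).
Constraint = Callable[[Sequence[float]], bool]


def sum_constraint(delta: float) -> Constraint:
    """Equation 5: 1 - delta <= V_1 + ... + V_n <= 1 + delta."""
    return lambda v: 1.0 - delta <= sum(v) <= 1.0 + delta


def ratio_constraint(i: int, j: int, ratio: float, rel_tol: float) -> Constraint:
    """V_i : V_j = ratio, accepted within a relative tolerance (assumed form)."""
    return lambda v: v[j] > 0 and abs(v[i] / v[j] - ratio) <= rel_tol * ratio


def pair_sum_constraint(i: int, j: int, x: float, tol: float) -> Constraint:
    """V_i + V_j = x, accepted within an absolute tolerance (assumed form)."""
    return lambda v: abs(v[i] + v[j] - x) <= tol


def is_qualified(v: Sequence[float], constraints: Sequence[Constraint]) -> bool:
    """A sample is qualified only if every constraint holds."""
    return all(check(v) for check in constraints)


rules = [sum_constraint(0.02), ratio_constraint(0, 1, 2.0, 0.05)]
print(is_qualified([0.50, 0.25, 0.25], rules))  # True: sum is 1.0 and V_0 / V_1 = 2.0
print(is_qualified([0.40, 0.40, 0.20], rules))  # False: V_0 / V_1 = 1.0
```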

The method of designing the combined sample library provided by the present invention involves generating pseudo samples. A "pseudo sample" is a hypothetical multi-component sample in which each component has an independent variable, so that the value assigned to one variable is independent of the value assigned to any other variable, and the variables of the pseudo sample are not subject to any restriction. In other words, its variables may or may not satisfy the constraints. For example, for a pseudo sample comprising C_1, C_2, ..., C_i, ..., C_j, ..., C_n, where i, j ∈ {1, 2, ..., n} and i ≠ j, with V_i and V_j random values in the interval [0, 1], V_i may take one value in [0, 1] and V_j another. The sum of the values of these variables need not satisfy Equation 4 or 5.

A pseudo sample may or may not correspond to a real physical sample; it is a sample point defined by independent, assumed variable values. A large number of pseudo samples therefore constitutes a sample space.

Pseudo samples can be generated by random sampling, a method that generates a sample point by randomly assigning a value to each of its components. Methods for generating random values in an interval [V_min, V_max], where V_min is not less than 0 and V_max is not greater than 1, are well established; see Carter, "Generation and Application of Random Numbers", Forth Dimensions, Vol. XVI, 1994. Algorithms and programs for random number generators are well known in computer science; see D. E. Knuth, "The Art of Computer Programming, Vol. 2: Seminumerical Algorithms", 2nd ed., Addison-Wesley, 1981; Press et al., "Numerical Recipes: The Art of Scientific Computing", Cambridge University Press, 1986, and "Numerical Recipes (FORTRAN)", 1988; and S. L. Anderson, "Random Number Generators on Vector Supercomputers and Other Advanced Architectures", SIAM Rev. 32:221-225, 1990. A randomly generated value is the occurrence of an event, i.e., a possible value assigned to the variable V in a particular interval [v_min, v_max], and the probability of the event depends on the probability density or probability distribution of the variable. The variable is therefore further defined by a probability function over the values in its interval: a discrete variable can be defined by assigning a probability to each discrete value in the interval, and a continuous variable by assigning a probability distribution over the interval containing all its possible values.

As used here, a probability distribution is an arrangement of the values of a variable that reflects their observed or theoretical frequency of occurrence. Probability distributions well known in the art include uniform and non-uniform distributions. Non-uniform distributions include the Bernoulli distribution, beta distribution, chi-square distribution, exponential distribution, F distribution, gamma distribution, Gaussian distribution, normal distribution (e.g., lognormal, multivariate normal, and univariate normal distributions), non-central chi-square distribution, non-central F distribution, binomial distribution, negative binomial distribution, multinomial distribution, Pareto distribution, Poisson distribution, Student's t distribution, Tsallis distribution, and any combination of these distributions.

In one embodiment, the random variables are assigned according to a non-uniform probability distribution, for example a normal, Poisson, or Gaussian distribution. In another embodiment, the randomly generated values of the variable follow a uniform distribution, so that a uniformly distributed random variable takes any value in the interval [V_min, V_max] (V_min ≥ 0, V_max ≤ 1) with equal probability.

One of ordinary skill in the art will recognize that uniformly distributed random values (or numbers) can be generated by a random number generator such as a linear congruential generator, whose general formula is V_i = (a·V_{i-1} + c) mod m, where a, c, and m are preset constants: a is the multiplier, c the increment, and m the modulus; see Park and Miller, "Random Number Generators: Good Ones Are Hard to Find", Comm. ACM 31:1192-1201, 1988. Random number generators also include the shift-register sequence generator described by Kirkpatrick and Stoll in "A Very Fast Shift-Register Sequence Random Number Generator" (Journal of Computational Physics 40:517-526, 1981), as well as quasi-random number generators; see Press and Teukolsky, "Quasi-Random Numbers" (Computers in Physics 3:76-79, 1989).
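The linear congruential recurrence above can be written as a Python generator. The default constants here are Park and Miller's "minimal standard" choice (a = 16807, c = 0, m = 2^31 - 1), used as an example rather than as the generator the invention prescribes; their published check is that a seed of 1 yields 1043618065 after 10,000 steps.

```python
from typing import Iterator


def lcg(seed: int, a: int = 16807, c: int = 0, m: int = 2**31 - 1) -> Iterator[int]:
    """Linear congruential generator: V_i = (a * V_{i-1} + c) mod m.

    Defaults are Park and Miller's minimal-standard constants; with c = 0
    the seed must be nonzero, otherwise the sequence stays at 0.
    """
    v = seed
    while True:
        v = (a * v + c) % m
        yield v


def uniform01(gen: Iterator[int], m: int = 2**31 - 1) -> float:
    """Scale the next integer from the generator to a uniform value in (0, 1)."""
    return next(gen) / m


gen = lcg(1)
for _ in range(10_000):
    z = next(gen)
print(z)  # 1043618065, Park and Miller's check value for seed 1
```

Dividing each integer by m, as `uniform01` does, gives the uniformly distributed values in (0, 1) that the random sampling step consumes.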

Non-uniformly distributed random values, such as normal or Gaussian random values, can also be generated by methods well known in the art; see Rubinstein, "Simulation and the Monte Carlo Method", John Wiley & Sons, 1981. One such method uses a transformation function, such as the well-known Box-Muller transform, to convert uniformly distributed random variables into a new set of non-uniformly distributed random variables (e.g., Gaussian or normal); see Box and Muller, "A Note on the Generation of Random Normal Deviates", Annals Math. Stat. 29:610-611, 1958.

In an embodiment, the random sampling method comprises a Monte Carlo method or simulation. The term "Monte Carlo method" or "Monte Carlo simulation" here refers to a random sampling method used to study a problem and obtain an approximate, probabilistic solution to it. In particular, as used herein it refers to the process of generating random events (such as randomly occurring values of a given variable). This process is usually carried out by a computer algorithm that is repeated many times, and the results of all trials are analyzed to provide an approximate solution. For Monte Carlo simulation, see Metropolis and Ulam, "The Monte Carlo Method", Journal of the American Statistical Association 44:335-341, 1949; Sobol, "The Monte Carlo Method", The University of Chicago Press, 1974; and Mooney, "Monte Carlo Simulation", Sage University Paper, 1997.
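The Box-Muller transform mentioned above turns two independent uniform values into two independent standard normal values. The following is a minimal Python sketch of the basic (trigonometric) form; the function name and seed are arbitrary choices for illustration.

```python
import math
import random
from typing import Tuple


def box_muller(rng: random.Random) -> Tuple[float, float]:
    """Map two independent uniform values to two independent standard normal values."""
    u1 = 1.0 - rng.random()  # rng.random() is in [0, 1), so u1 is in (0, 1] and log(u1) is finite
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)


rng = random.Random(7)
xs = [x for _ in range(50_000) for x in box_muller(rng)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(round(mean, 2), round(var, 2))  # close to 0 and 1
```

A normal variable with mean mu and standard deviation sigma is then obtained as mu + sigma * z for each returned value z.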

The Monte Carlo method has evolved continuously. For example, the method was initially applied to estimating the value of π by throwing darts at a standard target: a circle inscribed in a square. Over many throws, the numbers of darts landing in the circle and in the square turn out to be proportional, with considerable precision, to the area of the circle and the area of the square, respectively. Correspondingly, the ratio of the number of darts hitting the circle to the number hitting the square approximates π/4; see Ross, "A First Course in Probability", 2nd Edition, Macmillan, 1976.
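The dart-throwing estimate can be sketched in a few lines of Python. The geometry (a unit circle inscribed in the square [-1, 1] × [-1, 1]) and the dart counts are illustrative choices.

```python
import math
import random


def estimate_pi(n_darts: int, seed: int = 0) -> float:
    """Throw darts uniformly at the square [-1, 1] x [-1, 1], which encloses a unit circle."""
    rng = random.Random(seed)
    in_circle = 0
    for _ in range(n_darts):
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            in_circle += 1
    # hits in circle / hits in square ~ circle area / square area = pi / 4
    return 4.0 * in_circle / n_darts


for n in (100, 10_000, 200_000):
    est = estimate_pi(n)
    print(n, est, abs(est - math.pi))  # the error tends to shrink as n grows
```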

As another example, Monte Carlo simulation can be applied to estimate the integral of Equation 6:

$$I = \int V(x)\, dx \qquad \text{(Equation 6)}$$

In this example, a bounding box encloses the function V(x), and the integral of V(x) can be understood as the portion of the box lying under V(x). If points in the box are selected randomly and uniformly, the probability that a point falls under V(x) equals the fraction of the box's area occupied by the region under V(x). The Monte Carlo simulation therefore generates a large number of random points in the box and counts the points lying under V(x) to obtain this area. As a result, the integral of Equation 6 can be approximated by Equation 7:

$$\int V(x)\, dx \approx \frac{A}{B}\, C \qquad \text{(Equation 7)}$$

where A is the number of points under V(x), B is the total number of points generated in the box, and C is the area of the box. The ratio A/B thus approximates the fraction of the box's area occupied by the region under V(x).
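Equation 7 is the "hit-or-miss" Monte Carlo estimate. A minimal Python sketch follows; it assumes 0 ≤ V(x) ≤ height on the integration interval [a, b], and the integrand V(x) = x² on [0, 1] (exact integral 1/3) is a hypothetical test case, not an example from the source.

```python
import random
from typing import Callable


def hit_or_miss_integral(f: Callable[[float], float], a: float, b: float, height: float,
                         n_points: int, seed: int = 0) -> float:
    """Equation 7: integral ~ (A / B) * C, assuming 0 <= f(x) <= height on [a, b]."""
    rng = random.Random(seed)
    hits = 0  # A: points falling under the curve
    for _ in range(n_points):  # B = n_points uniformly placed points in the box
        x = rng.uniform(a, b)
        y = rng.uniform(0.0, height)
        if y <= f(x):
            hits += 1
    box_area = (b - a) * height  # C
    return hits / n_points * box_area


# Hypothetical integrand: V(x) = x**2 on [0, 1], whose exact integral is 1/3.
print(hit_or_miss_integral(lambda x: x * x, 0.0, 1.0, 1.0, 200_000))
```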

Another example of Monte Carlo simulation involves generating random variable values weighted by empirical knowledge. For example, empirical knowledge about the components (or composition ratios) may require assigning different probability densities to a continuous variable in different intervals, or different probabilities to different values of a discrete variable. A further example of Monte Carlo simulation is the Markov chain method. A Markov chain is a sequence of random values in which the probability of each event depends on the value generated at the previous step; see Frenkel and Smit, "Understanding Molecular Simulation: From Algorithms to Applications", Academic Press, 1996.
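Both ideas in the paragraph above, empirically weighted densities and Markov chains, can be combined in a Metropolis random walk. The Metropolis rule is a standard Markov chain Monte Carlo technique chosen here for illustration; the source does not specify it. In this Python sketch the density is a hypothetical prior that judges values in [0.2, 0.4] three times as likely as other values in [0, 1]. Each new value is proposed near, and accepted relative to, the previous one, so the sequence is a Markov chain whose long-run distribution follows that density.

```python
import random
from typing import List


def empirical_density(v: float) -> float:
    """Hypothetical prior: values in [0.2, 0.4] are three times as likely as others in [0, 1]."""
    if v < 0.0 or v > 1.0:
        return 0.0
    return 3.0 if 0.2 <= v <= 0.4 else 1.0


def metropolis_chain(n_steps: int, step: float = 0.2, start: float = 0.5,
                     seed: int = 0) -> List[float]:
    """Markov chain: each value depends only on the value generated at the previous step."""
    rng = random.Random(seed)
    v = start
    chain = []
    for _ in range(n_steps):
        proposal = v + rng.uniform(-step, step)  # symmetric random-walk proposal
        # Metropolis rule: accept with probability min(1, p(proposal) / p(current)).
        if rng.random() < empirical_density(proposal) / empirical_density(v):
            v = proposal
        chain.append(v)
    return chain


chain = metropolis_chain(200_000)
share = sum(0.2 <= v <= 0.4 for v in chain) / len(chain)
print(round(share, 3))  # approaches 0.6 / 1.4, about 0.43, in the long run
```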

Another aspect of the invention relates to a method of selecting qualified samples from pseudo samples. The term "qualified sample" as used herein refers to a pseudo sample, produced by the method of the present invention, whose variables satisfy one or more specific constraints; a pseudo sample that does not satisfy them is referred to as a non-qualified sample. In one embodiment of the present invention, pseudo samples are produced by a random sampling method (e.g., Monte Carlo simulation): over a large number of trials, values randomly generated from a uniform distribution on the interval [V_min, V_max] (V_min ≥ 0 and V_max ≤ 1) are assigned to the variables of the components, producing pseudo samples that are not subject to any constraint. Each pseudo sample is then checked, for example by a computer algorithm, to determine whether it satisfies the specified constraint or constraints. A pseudo sample that satisfies them is selected and stored as a qualified sample. The variable values of each qualified sample are recorded as a vector and used as its composition ratios when designing and synthesizing the sample library, since in a qualified sample the values of the variables are the composition ratios of the components.
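The generate-and-check procedure just described can be sketched in Python as follows. The function name, the two-component setup (in the spirit of the Ce/Fe library of Figure 1), the tolerance of 0.05, and the trial count are illustrative assumptions, not values from the source.

```python
import random
from typing import Callable, List, Sequence, Tuple


def screen_pseudo_samples(n_components: int,
                          constraints: Sequence[Callable[[List[float]], bool]],
                          n_trials: int, v_min: float = 0.0, v_max: float = 1.0,
                          seed: int = 0) -> Tuple[List[List[float]], int]:
    """Generate unconstrained pseudo samples and keep those that satisfy every constraint."""
    rng = random.Random(seed)
    qualified = []
    for _ in range(n_trials):
        # Each variable is drawn independently and uniformly: this is a pseudo sample.
        pseudo = [rng.uniform(v_min, v_max) for _ in range(n_components)]
        if all(check(pseudo) for check in constraints):
            qualified.append(pseudo)  # stored vector = composition ratios of a qualified sample
    return qualified, n_trials


# Two components under Equation 5 with an assumed tolerance of 0.05.
def near_one(v: List[float]) -> bool:
    return abs(sum(v) - 1.0) <= 0.05


qualified, n_ps = screen_pseudo_samples(2, [near_one], 100_000)
print(len(qualified), n_ps)
```

For this particular constraint, the expected fraction of qualified samples is the area of the band |V_1 + V_2 - 1| ≤ 0.05 inside the unit square, which is 0.0975.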

Another aspect of the invention provides a method of producing a given number of samples in a sample library. The method includes the following steps:

(1) providing the components that make up the sample;

(2) assigning a variable to each component;

(3) setting at least one constraint on the variables;

(4) providing the required number of samples;

(5) generating a pseudo sample;

(6) determining that the pseudo sample is a qualified sample if its variables satisfy the constraints;

(7) repeating steps (5) and (6) until the number of qualified samples reaches the required number.
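Steps (4) to (7) differ from a fixed-budget screen in their stopping rule: sampling continues until the required number of qualified samples is reached. In this Python sketch, the function name, the single sum constraint, the required count of 96, and the `max_trials` safety cap are all hypothetical; the cap is an added safeguard, since very tight constraints may be satisfied only rarely.

```python
import random
from typing import List, Tuple


def generate_required(n_required: int, n_components: int, delta: float,
                      max_trials: int = 10_000_000,
                      seed: int = 0) -> Tuple[List[List[float]], int]:
    """Steps (5)-(7): draw pseudo samples until n_required of them are qualified."""
    rng = random.Random(seed)
    qualified: List[List[float]] = []
    trials = 0
    while len(qualified) < n_required:
        if trials >= max_trials:  # hypothetical safety cap, not part of the described method
            raise RuntimeError(f"only {len(qualified)} qualified samples after {trials} trials")
        trials += 1
        pseudo = [rng.random() for _ in range(n_components)]  # step (5)
        if abs(sum(pseudo) - 1.0) <= delta:  # step (6): the constraint of Equation 5
            qualified.append(pseudo)
    return qualified, trials


samples, n_ps = generate_required(n_required=96, n_components=4, delta=0.02)
print(len(samples), n_ps, len(samples) / n_ps)
```

The returned trial count is the number of pseudo samples N_ps, so the last printed value is a by-product estimate of the qualified sample ratio.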

Another aspect of the present invention discloses a method of calculating the qualified sample ratio. The term "qualified sample ratio" (R_qs) means the proportion of pseudo samples, generated by a random sampling method, whose variables satisfy one or more constraints. In one embodiment, R_qs can be estimated by dividing the number of qualified samples (N_qs) obtained in the random sampling by the number of pseudo samples (N_ps) (Equation 8):

$$R_{qs} \approx \frac{N_{qs}}{N_{ps}} \qquad \text{(Equation 8)}$$

As N_ps increases, the error of this estimate becomes smaller, following Equation 9:

$$\text{Accuracy} \sim \pm \frac{1}{\sqrt{N}} \qquad \text{(Equation 9)}$$

where N is the number of random (e.g., Monte Carlo) trials. When a large number of Monte Carlo trials are performed, 1/√N keeps decreasing, so the variation of the qualified sample ratio decreases and its accuracy increases. In other words, when enough trials are performed, the qualified sample ratio can be estimated with relatively high accuracy and precision. For example, for the constraint 1 − Δ ≤ Σ V_i ≤ 1 + Δ, the more pseudo samples the Monte Carlo simulation generates, the more accurate the estimate.
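The shrinking error of Equation 8 can be observed numerically. This Python sketch assumes two uniform variables and the constraint |V_1 + V_2 - 1| ≤ 0.05, for which the exact ratio is 0.0975 (the area of that band inside the unit square). The reported error bar is the binomial relative standard error, an added statistical estimate rather than part of the source.

```python
import math
import random


def qualified_ratio(n_pseudo: int, delta: float = 0.05, seed: int = 0):
    """Equation 8 for two uniform variables under |V_1 + V_2 - 1| <= delta."""
    rng = random.Random(seed)
    n_qs = sum(1 for _ in range(n_pseudo)
               if abs(rng.random() + rng.random() - 1.0) <= delta)
    r_qs = n_qs / n_pseudo
    # Binomial relative standard error of the estimate: sqrt((1 - R_qs) / N_qs).
    rel_err = math.sqrt((1.0 - r_qs) / n_qs) if n_qs else float("inf")
    return r_qs, n_qs, rel_err


# For delta = 0.05 the exact ratio is 0.0975.
for n_ps in (10**2, 10**3, 10**4, 10**5):
    r_qs, n_qs, rel_err = qualified_ratio(n_ps)
    print(f"N_ps={n_ps:>6}  N_qs={n_qs:>5}  R_qs={r_qs:.4f}  +/-{100 * rel_err:.1f}%")
```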

In one embodiment, in Monte Carlo trials using Microsoft's random number generator (C++ compiler version 7.3.1091, 2003), the accuracy of the qualified sample ratio was observed to depend on the number of qualified samples obtained: when N_qs reaches 1, the accuracy is between -100% and 100%; when N_qs reaches 10, between -30% and 30%; when N_qs reaches 10², between -10% and 10%; when N_qs reaches 10³, between -3% and 3%; and when N_qs reaches 10⁴, between -1% and 1%.
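These observed ranges are consistent with a relative error of roughly 1/√N_qs. One way to see this, offered as an interpretation rather than as the source's own derivation, is the binomial standard error of the estimate in Equation 8, which reduces to 1/√N_qs when R_qs is small:

```latex
\frac{\sigma(R_{qs})}{R_{qs}}
  = \frac{\sqrt{R_{qs}(1-R_{qs})/N_{ps}}}{R_{qs}}
  = \sqrt{\frac{1-R_{qs}}{N_{qs}}}
  \approx \frac{1}{\sqrt{N_{qs}}},
\qquad
\frac{1}{\sqrt{10}} \approx 0.32,\;
\frac{1}{\sqrt{10^{2}}} = 0.10,\;
\frac{1}{\sqrt{10^{3}}} \approx 0.032,\;
\frac{1}{\sqrt{10^{4}}} = 0.010
```

These values match the reported ranges of about ±30%, ±10%, ±3%, and ±1%.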

In another aspect, the invention also discloses a method of estimating the optimal number of qualified samples. The term "optimal number of qualified samples" herein refers to the number of samples that satisfy one or more specific constraints and properly represent the sample space.

In one embodiment of the invention, the optimal number of qualified samples is obtained by checking all possible pseudo samples produced by discrete variables and identifying those that satisfy the specified constraints. For a discrete variable V_i, a set of discrete values is generated by dividing its interval [V_{i,min}, V_{i,max}], uniformly or non-uniformly, into M_i parts or cells, which yields a set of defined values for the interval. If the interval is evenly divided into M_i parts, the variable can take any value in {V_{i,0}, V_{i,1}, ..., V_{i,M_i}}, where V_{i,k} = V_{i,min} + k·(V_{i,max} − V_{i,min})/M_i and k ∈ {0, 1, 2, ..., M_i}. M_i is a positive integer and can be any number between 1 and 1,000,000. In one embodiment, M_i ∈ {1, 2, 3, 4, ..., 18, ..., 30, ..., 40, ..., 50, ..., 10², 10³, 10⁴, 10⁵, 10⁶}.

If each variable takes only discrete values in its interval, the total number of pseudo samples (Z) is determined by the number of grid points of each variable. For example, in a sample containing components C_1, C_2, ..., C_n, each component C_i has a variable V_i, where i ∈ {1, 2, ..., n}, and each V_i is divided into M_i parts, grid cells, or points, so that V_i takes a set of discrete values in the interval [V_min, V_max] (V_min ≥ 0 and V_max ≤ 1). The division is chosen based on experience with the variable corresponding to each component and may be uniform or non-uniform. If no constraints on the variables are considered, the total number of pseudo samples (Z) for a given set {M_i} represents the sample points of the n-dimensional sample space, and Z is given by Equation 10:

$$Z = \prod_{i=1}^{n} M_i \qquad \text{(Equation 10)}$$

When at least one constraint is given, all pseudo samples can be checked; those whose variables satisfy the constraint are selected and stored as vectors to form a set of qualified samples. When the whole sample space (all Z pseudo samples) has been checked, the number of qualified samples, i.e., the number of stored vectors, is the optimal number of qualified samples in the sample space for the given set of discrete values.
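The exhaustive search can be sketched with Python's `itertools.product`, which enumerates every grid point of the sample space. The three-component example with 10 cells per interval and a near-zero tolerance on Equation 5 is hypothetical. Note that a uniform grid with m cells has m + 1 values, so the M_i of Equation 10 here counts the discrete values of each variable (11 in this example).

```python
import itertools
from typing import Callable, List, Sequence, Tuple


def uniform_grid(v_min: float, v_max: float, m: int) -> List[float]:
    """V_k = v_min + k * (v_max - v_min) / m for k = 0, 1, ..., m (m + 1 values)."""
    return [v_min + k * (v_max - v_min) / m for k in range(m + 1)]


def exhaustive_search(grids: Sequence[Sequence[float]],
                      constraint: Callable[[Tuple[float, ...]], bool]):
    """Check every pseudo sample on the grid; return Z and the qualified vectors."""
    z = 0
    qualified = []
    for point in itertools.product(*grids):  # Cartesian product: Z = product of grid sizes
        z += 1
        if constraint(point):
            qualified.append(point)
    return z, qualified


# Three components, each interval [0, 1] split into 10 cells (11 values each).
grids = [uniform_grid(0.0, 1.0, 10)] * 3


def sums_to_one(v: Tuple[float, ...]) -> bool:
    return abs(sum(v) - 1.0) <= 1e-9  # Equation 5 with a near-zero tolerance


z, qualified = exhaustive_search(grids, sums_to_one)
print(z, len(qualified))  # 1331 66
```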

As the number of components (variables) and the number of segments per component increase, an exhaustive search of the sample space becomes very expensive. For example, for a sample library of five components in which the interval of each component's variable is divided into 100 cells, the sample space is five-dimensional and the total number of pseudo samples (Z) is 100⁵ = 10¹⁰ (10,000,000,000). Introducing one or more constraints makes the calculation more complex still. Although it is possible to check whether each pseudo sample satisfies the constraints, an alternative is to estimate the optimal number approximately by random sampling as described herein (e.g., Monte Carlo simulation).

Thus, in one embodiment, pseudo samples are generated by random simulation, each pseudo sample is checked against the constraints, and the qualified samples and the qualified sample ratio (R_qs) are obtained according to the method of the present invention. In this stochastic simulation, random numbers can be drawn from a set of discrete values, each located in the interval of the variable and having a certain probability; alternatively, they can take any value in an interval according to a specific probability distribution. The optimal number of qualified samples is then given by the product of Z and R_qs. This number is expected to vary with a set of parameters, for example the number of variable segments (M), the method of generating random numbers, the statistical distribution of the variables, the Monte Carlo simulation method, the number of Monte Carlo trials, the variable constraints, the tolerance limits, and the required accuracy or precision.
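A minimal sketch of the random-sampling estimate of R_qs and of the optimal number Z × R_qs follows. Each variable is represented by a sampler function, so the same code covers discrete values with given probabilities as well as continuous distributions. The names, the 100-value grid, and the sum-to-one constraint with tolerance 0.05 are assumptions for illustration only.

```python
import random
from typing import Callable, Sequence

Sample = tuple[float, ...]
Sampler = Callable[[random.Random], float]


def estimate_qualified_ratio(samplers: Sequence[Sampler],
                             constraints: Sequence[Callable[[Sample], bool]],
                             n_trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of R_qs: the fraction of randomly generated
    pseudo samples that satisfy every constraint."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        sample = tuple(draw(rng) for draw in samplers)
        if all(ok(sample) for ok in constraints):
            hits += 1
    return hits / n_trials


# Hypothetical setup matching the five-component case above: each variable takes
# one of 100 discrete values with equal probability, so Z = 100**5.
grid = [k / 99 for k in range(100)]
samplers = [lambda rng: rng.choice(grid)] * 5
# Assumed constraint: the five fractions sum to 1 within +/- 0.05.
constraints = [lambda s: abs(sum(s) - 1.0) <= 0.05]

z = 100 ** 5
r_qs = estimate_qualified_ratio(samplers, constraints, n_trials=100_000)
print(f"R_qs ~ {r_qs:.2e}; estimated qualified samples ~ {z * r_qs:.3g}")
```

A non-uniform distribution can be swapped in by changing a sampler, for example `lambda rng: rng.choices(grid, weights=w)[0]` for discrete values with assumed weights `w`, or `lambda rng: min(1.0, max(0.0, rng.gauss(0.2, 0.05)))` for a clipped normal distribution.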

It will be appreciated that the generation of random numbers according to probability distributions (e.g., in Monte Carlo simulations) and the selection and calculation of qualified samples according to the methods provided herein are typically performed by a computer system or server system.

A computer system (e.g., a server system) in the present invention refers to a computer, or a computer-readable medium, designed and configured to perform some or all of the methods described herein. The computer (e.g., server) employed herein can be any of a variety of general-purpose computers, such as personal computers, network servers, workstations, or other computer platforms available now or in the future. As is well known in the art, a computer includes some or all of such components as a processor, an operating system, computer memory, input devices, and output devices. The computer can further include, for example, a cache, a data backup unit, and other equipment. One of ordinary skill in the art will appreciate that these computer components can have many other possible configurations.

A processor as used herein may include one or more microprocessors, field programmable gate arrays, or one or more application-specific integrated circuits. For example, processors include, but are not limited to, Intel's Pentium series processors, S-Chip's microprocessors, Sun's workstation system processors, Motorola's desktop processors, MIPS Technologies' MIPS processors, Xilinx's high-end field programmable gate arrays, and other processors.

The operating system employed herein includes machine code that, when executed by the processor, coordinates and performs the functions of other parts of the computer and helps the processor execute various computer programs, which may be written in various programming languages. In addition to managing the data flow among other parts of the computer, the operating system provides scheduling, input/output control, file and data management, memory management, communication control, and related services, all of which are known in the art. Typical operating systems include the Windows operating systems from Microsoft Corporation, Unix or Linux operating systems from a variety of vendors, other or future operating systems, and combinations of these operating systems.

The computer memory used herein can be any of a variety of memory storage devices. Examples include random access memory, magnetic storage media such as permanent hard disks or tapes, optical storage media such as read/write optical discs, and other storage devices. The memory storage device can be any existing or future device, including an optical disc drive, a tape drive, a removable hard drive, or a disk drive. Such memory storage devices typically read from or write to a computer program storage medium, such as an optical disc, magnetic tape, removable hard disk, or floppy disk. All of these computer program storage media can be regarded as computer program products, which typically store computer software programs and/or data. Computer software programs are typically stored in system memory and/or memory storage devices.

It will be readily apparent to those skilled in the art that the computer software program of the present invention can be executed after being loaded into system memory and/or a memory storage device through some type of input device. Alternatively, all or part of the software program may reside in read-only memory or a similar memory storage device that does not require the program to be loaded through an input device first. One of ordinary skill in the relevant art will appreciate that the software program, or portions thereof, can be loaded by the processor into system memory, a cache, or both, in a known manner, to carry out the random sampling.

Further, the formulation and process design data obtained with the combined sample library design method of the present invention can be input directly into the computer control system of a material processing apparatus, which then prepares samples based on the data. The material processing apparatus may be the material processing apparatus disclosed in the applicant's international patent application PCT/CN2005/002177, "Material Treatment Apparatus and Application thereof"; the reaction system disclosed in the applicant's Chinese Patent Application No. 200510029727.3; the parallel reaction system disclosed in the applicant's Chinese Patent Application No. 200610085162.5; the reaction system disclosed in the Chinese patent application "Reaction System" filed by the applicant on September 30, 2006; or the reactor disclosed in the Chinese patent application "Reactor" filed by the applicant on September 30, 2006, among others.

In one embodiment of the invention, the software is stored on a computer server that is connected to user terminals, input devices, or output devices via a data line, wireless link, or network system. As is well known in the art, a network system includes hardware and software that electrically couple computers or devices together. For example, the network system may include the Internet, 10/1000 Ethernet, IEEE 802.11x, IEEE 1394, xDSL, Bluetooth, LAN, WLAN, GSP, CDMA, 3G, PACS, or any other equipment based on an ANSI-recognized standard medium.

Further, a researcher can access the computer server through such a network system from anywhere to design formulations and processes. For example, researchers at location A can access the computer server through the network system to carry out formulation and process design, while researchers at location B can access the same server to obtain the formulation and process design data. This enables collaborative research across locations, facilitates centralized management of experimental equipment, and allows R&D and implementation to take place in different locations. For details, refer to the applicant's Chinese Patent Application No. 200610100921.0, "Computer Aided Graphical Experimental Design System and Method".

Further, the formulation and process design data obtained with the combined sample library design method of the present invention can be stored on a server and shared among researchers, for example through the Portal system designed by the applicant. Researchers can also communicate through the Portal system and use the combined sample library design method of the present invention for formulation and process design. For details, refer to Chinese Patent Application No. 200610100921.0, "Computer Aided Graphical Experimental Design System and Method".

The invention has been described above in general terms and is further illustrated by the following specific examples.

Example 1

This example shows how qualified samples are selected from pseudo samples of two components (cerium and iron) generated by Monte Carlo simulation. The cerium variable V_Ce takes a value between 0 and 1, as does the iron variable V_Fe. The Monte Carlo simulation generated V_Ce and V_Fe values uniformly distributed between 0 and 1, with the values of V_Ce generated independently of those of V_Fe; in this simulation, no relationship or constraint was imposed between V_Ce and V_Fe. The result of the Monte Carlo simulation is a population of pseudo samples: all of the points shown in Figure 1 (hollow, gray, and dark) constitute the set of pseudo samples.

Empirical knowledge can be used to reduce the number of qualified samples by introducing constraints. The first constraint is defined as 0.2 < V_Ce < 0.8 and 0.2 < V_Fe < 0.8. When the selection process applies the first constraint, the selected pseudo samples are displayed as the black or gray dots shown in FIG. The second constraint is defined as 1 - Δ < V_Ce + V_Fe < 1 + Δ. When the second constraint is applied in addition to the first, the pseudo samples that satisfy both constraints are displayed as the black dots shown in Figure 1.
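Example 1 can be reproduced in outline with the following sketch. The example does not state a value for Δ or the number of Monte Carlo trials, so `DELTA = 0.05` and `N_TRIALS = 1000` are assumptions, and the resulting counts will not match Figure 1 or Table I.

```python
import random

DELTA = 0.05     # assumed tolerance; the example gives no value for Δ
N_TRIALS = 1000  # assumed number of Monte Carlo trials

rng = random.Random(42)
# Independent, uniformly distributed values in [0, 1): (V_Ce, V_Fe)
pseudo = [(rng.random(), rng.random()) for _ in range(N_TRIALS)]

# First constraint: 0.2 < V_Ce < 0.8 and 0.2 < V_Fe < 0.8
first = [(ce, fe) for ce, fe in pseudo if 0.2 < ce < 0.8 and 0.2 < fe < 0.8]
# Second constraint, applied on top of the first: 1 - Δ < V_Ce + V_Fe < 1 + Δ
both = [(ce, fe) for ce, fe in first if 1 - DELTA < ce + fe < 1 + DELTA]

print(len(pseudo), len(first), len(both))
print("qualified sample ratio:", len(both) / len(pseudo))
```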

Therefore, when designing a sample library consisting of Ce and Fe, the Monte Carlo simulation incorporates experience into the design through the two constraints and yields deterministic information about the sample library. For example, since the number of pseudo samples satisfying both constraints is known, the number of qualified samples is known. The composition ratio of each qualified sample is also known, as shown in Table I, which lists the pseudo sample values produced by the Monte Carlo simulation. Italicized numbers are pseudo sample values that satisfy the first constraint, and boxed numbers are values that satisfy both the first and second constraints.

If the variables are segmented at specific grid points, the optimal number of samples can be obtained according to the methods provided herein.

Table I  Values of V_Ce and V_Fe in the Monte Carlo simulation

V_Fe      V_Ce
0.040000  0.160000
0.120000  0.340000
0.440000  0.120000
0.620000  0.340000
0.160000  0.040000
0.460000  0.020000
0.460000  0.040000
0.460000  1.000000
0.560000  0.060000
0.860000  0.040000
0.960000  0.020000
0.000000  0.180000
0.000000  0.500000
...       ...
0.380000  0.100000
0.380000  0.700000
0.480000  0.800000
0.580000  0.600000
0.120000  0.080000
0.520000  1.000000
0.720000  1.000000
0.920000  0.020000
0.020000  0.180000
0.620000  0.160000

Example 2

This example shows how qualified samples are selected from pseudo samples of four components (cerium, iron, tungsten, and nickel) generated by Monte Carlo simulation. The variables of cerium, iron, tungsten, and nickel, V_Ce, V_Fe, V_W, and V_Ni, each take values between 0 and 1. In the Monte Carlo simulation, each variable is generated randomly from a uniform distribution between 0 and 1. The randomly generated values are independent of each other and are not subject to any constraint. The result of the Monte Carlo simulation is a set of sample points (pseudo samples) in four-dimensional space; their projection into three-dimensional space is shown in FIG.

A first constraint, which selects samples corresponding to physically realistic compositions, is defined as V_Ce + V_Fe + V_W + V_Ni = 1. When the first constraint is applied in the selection process, the selected set of pseudo samples that satisfy it (the first pseudo samples) is shown in FIG.

Suppose that experience leads us to conclude that the sum of the cerium and iron fractions is always equal to the sum of the tungsten and nickel fractions. We can then introduce a second constraint, defined as V_Ce + V_Fe = V_W + V_Ni. When the second constraint is applied in addition to the first, the selected set of pseudo samples satisfying both constraints (the second pseudo samples) is scattered on a two-dimensional plane in the four-dimensional space (as shown in Figure 4). The qualified sample points in Figure 4 are viewed from one side of this two-dimensional plane.

Thus, this Monte Carlo simulation, which takes both constraints into account, provides deterministic information for designing a four-component sample library subject to these two constraints. For example, the qualified sample ratio is calculated by dividing the number of pseudo samples satisfying both constraints by the total number of pseudo samples. Since the number of pseudo samples satisfying both constraints is known, the number of qualified samples is known; and since the component ratios of the variables in each qualified sample are recorded, the component ratios of the qualified samples are also known. If the variables are segmented at specific grid points, the optimal number of samples can likewise be obtained according to the method provided by the present invention.
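The following sketch reproduces the structure of Example 2. Because an equality constraint is almost never met exactly by continuous random values, both constraints are checked within an assumed tolerance `TOL`; the tolerance and trial count are illustrative assumptions, so the counts will differ from the 460 and 47 reported in the text. Adding the two constraints also shows why the qualified points lie on a plane: together they imply V_Ce + V_Fe ≈ V_W + V_Ni ≈ 0.5.

```python
import random

TOL = 0.02          # assumed tolerance for the two equality constraints
N_TRIALS = 200_000  # assumed number of Monte Carlo trials


def first_constraint(v):
    """V_Ce + V_Fe + V_W + V_Ni = 1 (within TOL)."""
    return abs(sum(v) - 1.0) <= TOL


def second_constraint(v):
    """V_Ce + V_Fe = V_W + V_Ni (within TOL)."""
    ce, fe, w, ni = v
    return abs((ce + fe) - (w + ni)) <= TOL


def simulate(n_trials=N_TRIALS, seed=1):
    """Return (count satisfying the first constraint, samples satisfying both)."""
    rng = random.Random(seed)
    n_first, qualified = 0, []
    for _ in range(n_trials):
        v = tuple(rng.random() for _ in range(4))  # (V_Ce, V_Fe, V_W, V_Ni)
        if first_constraint(v):
            n_first += 1
            if second_constraint(v):
                qualified.append(v)
    return n_first, qualified


n_first, qualified = simulate()
print(n_first, len(qualified), len(qualified) / N_TRIALS)  # last value: qualified sample ratio
```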

In the Monte Carlo simulation described above, a total of 28561 pseudo samples were obtained, of which 460 satisfied the first constraint and 47 satisfied both constraints simultaneously. The qualified sample ratio for the two constraints is therefore 47/28561 ≈ 0.0016456.

Example 3

This example illustrates a computer program that lets a user enter information through a graphical user interface and perform calculations and simulations (including Monte Carlo simulation) to design a library of qualified samples.

As shown in Figure 5, the graphical user interface allows the user to select the components needed to design the sample. For example, for a sample consisting of components A, B, and C, component A may be any one of vanadium (V), niobium (Nb), and molybdenum (Mo); the variable of component A (V_a) ranges from 0 to 1 (0.00 to 1.00 in Figure 5), and this range is divided into 10 portions (10 segments in Figure 5). As a result, component A is assigned a variable V_a that takes values between 0 and 1 (see Figure 6); likewise, components B and C are given corresponding variables V_b and V_c (see Figure 6).

As a way to incorporate empirical knowledge into the sampling design, the graphical user interface allows the user to specify constraints on one variable or among several variables. The default, or first implicit, constraint on the variables V_a, V_b, and V_c is V_a + V_b + V_c = 1 ± Δ, where Δ is the error (or constraint tolerance), set to 0.01 in this example (see Figure 6). The second, user-required constraint is V_a : V_b = 2 : 1, which is entered via the graphical user interface (see Figure 7).

The graphical user interface further allows the user to choose how the optimal number of qualified samples is estimated. As shown in Figure 8, it offers calculations at six different levels of accuracy. In the exact calculation, pseudo samples are generated according to Equation 10 without any constraint, and the computer then checks every pseudo sample and keeps only those that satisfy the constraints. In this example, 198 samples satisfy the constraints, and the component ratios of all 198 qualified samples were obtained.

For the other levels, the calculation accuracy is considered very low when it lies between -100% and 100%, low between -30% and 30%, medium between -10% and 10%, high between -3% and 3%, and very high between -1.0% and 1.0%.
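The text does not define how these percentage ranges are measured. One plausible reading, stated here as an assumption, is that they bound the relative error of the Monte Carlo estimate of the qualified sample ratio at roughly 95% confidence. Under that assumption, the number of trials needed for each level can be sketched with the normal approximation to the binomial distribution; the dictionary and function names are hypothetical.

```python
import math

# Relative tolerances of the five approximate accuracy levels listed above.
ACCURACY_LEVELS = {
    "very low": 1.00,
    "low": 0.30,
    "medium": 0.10,
    "high": 0.03,
    "very high": 0.01,
}


def trials_needed(p: float, rel_tol: float, z: float = 1.96) -> int:
    """Trials needed so that the ~95% confidence half-width of a Monte Carlo
    estimate of the qualified sample ratio p is at most rel_tol * p.

    Normal approximation to the binomial:
        z * sqrt(p * (1 - p) / n) <= rel_tol * p
        =>  n >= z**2 * (1 - p) / (p * rel_tol**2)
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.ceil(z * z * (1.0 - p) / (p * rel_tol * rel_tol))


# Example: a pilot estimate of p = 0.0016456 (the ratio reported in Example 2).
for level, tol in ACCURACY_LEVELS.items():
    print(f"{level:>9}: {trials_needed(0.0016456, tol):,} trials")
```

The exact level corresponds instead to full enumeration of all Z grid pseudo samples, for which no such estimate is needed.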

Example 4

This example shows a computer program that obtains a specified number of qualified sample points. The program lets the user enter information through a graphical user interface and perform calculations and simulations (including Monte Carlo simulations) to derive the component ratios for each of these sample points.

As shown in Figure 9, the graphical user interface allows the user to enter the specified total number of sample points, 125, each sample point having four components. The graphical user interface also allows the user to specify the desired components (see Figure 10) and to define the constraints on the variables (see Figure 11). After the constraint tolerances are defined (see Figure 12), the simulation is started, and each pseudo sample is checked against the set constraints. The simulation stops when the number of qualified samples reaches 125, yielding the component ratios of the four components (elements Pd, ...) at the 125 sample points:

...       ...       ...       ...
0.157895  0.039474  0.184211  0.618421
0.157895  0.052632  0.684211  0.105263
0         0.171053  0.789474  0.039474
0.013158  0.039474  0.144737  0.802632
0.052632  0.039474  0.473684  0.434211
0.052632  0.065789  0.723684  0.157895
0.065789  0.078947  0.789474  0.065789
0.092105  0.171053  0.736842  0
0.105263  0.065789  0.105263  0.723684
0.118421  0.013158  0.842105  0.026316
0.131579  0.078947  0.434211  0.355263
0.157895  0.026316  0.513158  0.302632
0.026316  0.078947  0.736842  0.157895
0.039474  0.052632  0.210526  0.697368
0.065789  0.013158  0.315789  0.605263
0.092105  0.078947  0.118421  0.710526
0.092105  0.078947  0.210526  0.618421
0.105263  0.039474  0.592105  0.263158
0.184211  0.013158  0.210526  0.592105
0         0.157895  0.223684  0.618421
0         0.184211  0.618421  0.197368
0.013158  0.078947  0.315789  0.592105
0.065789  0.078947  0.802632  0.052632
0.078947  0.026316  0.381579  0.513158
0.105263  0.065789  0.684211  0.144737
0.131579  0.013158  0.618421  0.236842
0.157895  0.092105  0.539474  0.210526
0.157895  0.144737  0.078947  0.618421
0.013158  0.118421  0.868421  0
0.105263  0.013158  0.223684  0.657895
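The stopping rule of Example 4 (and of claim 13), drawing pseudo samples until a specified number of qualified samples has been found, can be sketched as follows. The function name, grid, and constraint are assumptions for illustration only: the actual constraints of Example 4 are entered via Figure 11 and are not given in the text, and the 1/76 spacing is only inferred from the tabulated values, which appear to be multiples of 1/76 summing to 1.

```python
import random
from typing import Callable, Sequence

Sample = tuple[float, ...]


def design_library(target: int, n_vars: int, grid: Sequence[float],
                   constraints: Sequence[Callable[[Sample], bool]],
                   max_trials: int = 10_000_000, seed: int = 0):
    """Generate pseudo samples until `target` qualified samples are found.

    Returns the qualified samples and the number of pseudo samples drawn."""
    rng = random.Random(seed)
    qualified: list[Sample] = []
    trials = 0
    while len(qualified) < target:
        if trials >= max_trials:
            raise RuntimeError("target not reached; constraints may be too tight")
        trials += 1
        sample = tuple(rng.choice(grid) for _ in range(n_vars))
        if all(ok(sample) for ok in constraints):
            qualified.append(sample)
    return qualified, trials


def sums_to_one(s: Sample) -> bool:
    """Assumed constraint: the four component fractions sum to 1."""
    return abs(sum(s) - 1.0) < 1e-9


# Assumed grid: multiples of 1/76.
grid = [k / 76 for k in range(77)]
library, n_drawn = design_library(125, 4, grid, [sums_to_one])
print(len(library), n_drawn, len(library) / n_drawn)
```

The ratio printed last is an estimate of the qualified sample ratio as a by-product; the `max_trials` guard keeps the loop from running indefinitely when the constraints can rarely or never be met.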

The papers and patents cited in the present application are hereby incorporated by reference. The description, examples, and data in the embodiments above are for illustrative purposes only and are not intended to limit the scope of the invention. Any insubstantial modification made in accordance with the present invention falls within the scope of the present invention. Therefore, the spirit and scope of the appended claims are not limited to the illustrated versions of the invention.

Claims


1. A method of designing a combined sample library, comprising the following steps:

(1) providing components constituting the sample;

(2) providing a variable for each of said components, the variable taking a value within a certain interval;

(3) setting at least one constraint for at least one of the variables;

(4) generating a pseudo sample;

(5) inspecting the pseudo sample to determine whether it is a qualified sample;

(6) repeating steps (4) and (5) until at least one qualified sample is determined.

2. The method of claim 1 wherein the range of values of the component variables is determined by empirical knowledge.

3. The method of claim 1 wherein the constraint is a relationship between said variables determined by empirical knowledge.

4. The method of claim 1 further comprising the step of determining a qualified sample ratio, obtained by dividing the number of qualified samples by the number of pseudo samples.

5. The method of claim 1 wherein the pseudo sample is produced by a random sampling method.

6. The method of claim 5 wherein the random sampling is a Monte Carlo simulation.

7. The method of claim 5, wherein the random sampling is performed by using a variable value having a certain probability distribution.

8. The method of claim 7 wherein the probability distribution is a uniform distribution.

9. The method of claim 7 wherein the probability distribution is a non-uniform distribution.

10. The method of claim 9 wherein the non-uniform distribution is selected from one or more of the group consisting of: Bernoulli distribution, beta distribution, chi-square distribution, exponential distribution, F distribution, gamma distribution, Gaussian distribution, normal distribution (e.g., lognormal, multivariate normal, and univariate normal distributions), non-central chi-square distribution, non-central F distribution, binomial distribution, negative binomial distribution, multinomial distribution, Pareto distribution, Poisson distribution, Student's t distribution, and Salisbury distribution.

11. The method of claim 7 wherein the variable values are randomly generated by a random number generator.

12. The method of claim 11 wherein the random number generator is selected from the group consisting of a linear congruential generator, a shift register sequence generator, and a quasi-random number generator.

13. A method of providing a sample library of a specified number of eligible samples, comprising the steps of:

(1) providing the desired number of samples to be designed;

(2) providing components constituting the sample;

(3) providing a variable for each of the components, wherein the variable takes a value within a certain interval;

(4) setting at least one constraint for at least one of the variables;

(5) generating a pseudo sample;

(6) inspecting the pseudo sample to determine whether it is a qualified sample;

(7) repeating steps (5) and (6) until the number of qualified samples specified in step (1) is reached.

14. The method of claim 13 wherein the range of the component variables is determined by empirical knowledge.

15. The method of claim 13 wherein the constraint is a relationship between said variables determined by empirical knowledge.

16. A method of determining an optimal number of samples in a combined sample library, comprising the steps of:

(1) providing components constituting the sample;

(2) providing a variable for each of the components, wherein the variable takes a value within a certain interval;

(3) setting at least one constraint for at least one of the variables;

(4) generating a pseudo sample;

(5) determining qualified samples from the pseudo samples;

(6) determining the qualified sample ratio, obtained by dividing the number of qualified samples by the number of pseudo samples;

(7) deciding the number of samples;

(8) calculating the optimum number of experiments, the optimum number being the product of the qualified sample ratio and the number of samples.

17. The method of claim 16 wherein the interval of the variable values is determined by empirical knowledge.

18. The method of claim 16 wherein the constraint is a relationship between said variables determined by empirical knowledge.

19. A computer readable medium storing instructions which, when executed in one or more processors, enable the one or more processors to perform the method of any one of claims 1-18.

20. A system for designing a combined sample library, comprising:

(1) means for providing the components constituting the sample;

(2) means for providing a variable for each of said components, the variable taking a value within an interval;

(3) means for setting at least one constraint for at least one of said variables;

(4) means for generating a pseudo sample;

(5) means for testing the pseudo sample to determine whether it is a qualified sample;

(6) means for determining at least one qualified sample.