WO2007055927A2

WO2007055927A2 - Statistical optimizer

Info

Publication number: WO2007055927A2
Application number: PCT/US2006/041863
Authority: WO
Inventors: Bassam Baroudi
Original assignee: Bassam Baroudi
Priority date: 2005-11-03
Filing date: 2006-10-25
Publication date: 2007-05-18
Also published as: WO2007055927A3

Abstract

Methods and apparatus are provided for rapidly converging on a solution in a large ragged search space, such as NP-Complete space, where the solution is good but not necessarily optimal. First, Frequency and Neighborhood Tables are built (103), as seen in Figure 1. Next, vectors are instantiated (105) and a Fitness Engine evaluates the fitness of each vector (107). The N% of vectors with the lowest fitness are discarded (109). This is repeated until the criteria are met (111). After the criteria are met, the optional solver/optimizer may be used (113). If it is used (115), it is used until the criteria are met (117). Then the overall criteria are checked; if they are not met, the method returns to the beginning (119). If the criteria are met, the vector with the best fitness is selected (121), and the optional optimization is performed on that vector (123).

Description

Statistical Optimizer

BACKGROUND

[0001] This invention relates generally to methods and apparatus for converging on a solution to a problem in NP-Complete space, more specifically a non-genetic algorithm where the solution to be found is not necessarily the best, or most optimal, solution within the problem space.

[0002] Many real-world problems, for example, the Traveling Salesman Problem, resource scheduling, component placement, circuit routing, etc., involve finding an acceptable solution when given a multiplicity of objects and constraints. This class of problems is sometimes referred to as combinatorial optimization problems.

[0003] In the Traveling Salesman Problem ("TSP"), the goal is to find an optimal route to visit a set of cities or a subset thereof, based on certain criteria such as minimizing distance or time spent traversing the route. Determining the optimal solution with 100% certainty requires that every possible combination of routes be tried and evaluated. A complete search of the solution space involves testing N-Factorial ("N!") solutions where N is the number of cities. For a small number, for example five cities, 5! (5*4*3*2*1 = 120) solutions must be tested in order to determine the optimal solution. Such exhaustive analysis becomes literally impossible when the number of cities increases even modestly. If the number of cities were increased to only 20, the number of test solutions rises to approximately 2.4 x 10^18 (2.4 billion billion) tests.

[0004] The TSP is one of a class of problems called combinatorial optimization problems because the goal is to optimize some combination of elements (e.g., the ordering of the paths comprising a solution in order to discover the shortest route). As seen above, the number of solutions grows faster than exponentially with the number of cities. As the number of cities increases, it quickly becomes impossible to search exhaustively all possible solutions and consequently more selective search strategies must be used. More generally, TSP is a subclass of a more general set of problems known as "NP-Complete," characterized by a problem space so large that it cannot be exhaustively searched, whereas any single potential solution can be quickly and easily tested.

[0005] Another example of NP-Complete problems includes Shop Floor Scheduling. Shop floor scheduling typically involves situations where numerous machines, materials and processing steps (i.e., time, capacity and resource constraints) must be scheduled within non-negotiable time windows (i.e., unmovable earliest/latest possible start time windows). This problem, in many cases, turns out to be NP-Complete. Rescheduling occurs frequently in response to changing demands, unexpected downtime, unavailability of materials, changing customer demands, and so on. Efforts to avoid the computationally expensive search of NP-Complete space include techniques known as "backtracking" or "look-back" algorithms, where the objective is to minimize searching NP-Complete space by looking back at recent scheduling decisions with the hope of identifying conflicts that can be used to learn and then avoid "deadend" solutions, i.e., those where certain constraints remain unsolved. This technique in itself results in additional computational complexity however, and may not necessarily lead to any acceptable solutions. By providing a way to more efficiently search the NP-Complete problem space, the Invention could alleviate the need for specialized search techniques employed in scheduling algorithms.
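
The factorial growth cited in paragraph [0003] can be checked directly. The following is a minimal Python sketch; the function name exhaustive_search_size is hypothetical and simply counts orderings the way the text does (N! for N cities).

```python
import math

def exhaustive_search_size(num_cities: int) -> int:
    """Number of candidate orderings counted in the text: N! for N cities."""
    return math.factorial(num_cities)

# Show how quickly the exhaustive search grows with the number of cities.
for n in (5, 10, 20):
    print(n, exhaustive_search_size(n))
```

For 20 cities the count is 2,432,902,008,176,640,000, which matches the approximate figure of 2.4 x 10^18 given above.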

[0006] Among prior attempts to solve NP-Complete problems is work done in so-called "Steepest Descent" methods, which involve the iterative taking of derivatives along search contours in an attempt to locate a direction for moving from one trial solution to the next, leading ultimately to a maximum or a minimum, as the case may be. A recognized shortcoming of these prior methods is they rely on the problem space to be continuous. If the problem space happens to be ragged, as occurs frequently in practice, Steepest Descent methods tend to get stuck in local maxima and/or minima and therefore have a low probability of converging on optimal or near-optimal solutions encountered in everyday situations.

[0007] Other algorithms that have been developed to attempt to solve NP-Complete problems include Genetics and moments-based descent methods described in US Patents 5,222,192 and 5,255,345 and 4,935,877 (Genetics only). These attempts take an initial set of randomly generated test cases, then operate on and mutate intermediate results (generally, in unguided random fashion) in order to move the test points within the search space. The objective of at least some prior art is to determine the optimal solution as if the problem space had been searched exhaustively. Other solution attempts within the prior art involved techniques for "pruning" the search space by, among other techniques, discarding intermediate potential solutions deemed to have little promise. The present invention differs from these prior methods in that the present invention relies on statistical analyses to determine where and how to approach the problem space.

[0008] In many everyday problems, the cost of finding the most optimal solution cannot be justified. In fact, any "good-enough" solution is acceptable. Finding such a solution, though, remains difficult and intensive in terms of labor and computing power. If a search is started at some point in the search space and then is expanded, for example concentrically around that point, a "good-enough" solution may not be found if the starting point happens to be in a "bad area" of the search space.

[0009] In light of the foregoing, an improved technique for finding acceptable solutions to problems in the NP-Complete space is needed.

SUMMARY

[0010] Broadly speaking, the present invention fills these needs by providing a method and apparatus for finding acceptable solutions to problems in the NP-Complete space. It should be appreciated that the present invention can be implemented in numerous ways, including as a method, a system, or an apparatus. Several inventive embodiments of the present invention are described below.

[0011] In one embodiment, a method for solving NP-Complete or similar problems appearing in everyday situations in a short time within a specified degree of certainty is provided. The method initiates with a set of randomly generated trial solutions. The randomly generated solutions are statistically analyzed and a set of guidelines is produced. The initial trial solutions and their results are discarded, and a new set is instantiated using the newly produced guidelines. This process repeats iteratively until an acceptable solution is reached.

[0012] In another embodiment, a computer readable medium having program instructions for solving NP-Complete type problems is provided. The computer readable medium includes program instructions for defining a set of vectors within an NP-Complete space, and program instructions for performing statistical analysis on the vectors. In one embodiment the statistical analysis includes assigning fitness values to each vector. The fitness values represent the satisfaction of the particular vector with reference to certain factors defined for the particular problem. Program instructions for generating a new set of vectors based on the statistical analysis and program instructions for iterating the performing and the generating until defined criteria have been achieved are included.

[0013] In yet another embodiment, a computing device for solving NP-Complete type problems is provided. The computing device includes a processor, an input output module, and a memory storing statistical optimizer logic, all of which are in communication with each other through a bus. The statistical optimizer logic is configured to cause the processor to perform a method comprising method operations of defining a set of vectors within an NP-Complete space, performing statistical analysis on the vectors, generating a new set of vectors based on the statistical analysis, and iterating the performing and the generating until defined criteria have been achieved.

[0014] In still yet another embodiment, a computer implemented method for solving a combinatorial optimization problem is provided. The computer implemented method includes generating a first set of vectors providing a possible solution to the problem. Also included are generating a first table defining a frequency of occurrence of an element at various locations within a portion of the first set of vectors, and generating a second table defining inter-element adjacencies within the portion of the first set of vectors. The method then includes updating elements of the first table based on a corresponding value from the second table and generating a second set of vectors providing a next solution to the problem. In one embodiment, each of the method operations may be embodied on a computer readable medium as program instructions to be executed.

[0015] Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016] The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, and like reference numerals designating like structural elements.

[0017] Figure 1 illustrates a flowchart depicting an overview of the overall method in accordance with one embodiment of the invention.

[0018] Figure 2 displays a flowchart, which is a more detailed description of the operation where the Frequency and Neighborhood Tables are initialized based on a set of previously Instantiated seed vectors in accordance with one embodiment of the invention.

[0019] Figure 3 displays a flowchart, which is a more detailed description of the operation where the seed vectors are Instantiated in accordance with one embodiment of the invention.

[0020] Figures 4A, 4B and 4C illustrate a set of hypothetical test vectors and tables which depict a view of the operation where the Frequency Table is manipulated during Instantiation in accordance with one embodiment of the invention.

[0021] Figures 5A, 5B and 5C illustrate tables which depict a view of the operation where Frequency Table data are weighted by Neighborhood Table data during Instantiation in accordance with one embodiment of the invention.

[0022] Figure 6 is a simplified schematic diagram of a system storing the program instructions, which when executed, performs the functionality described with reference to the embodiments contained herein.

DETAILED DESCRIPTION

[0023] The present invention describes an apparatus and method for searching and converging on solutions in an NP-Complete problem space or such similar search spaces. One example of an NP-Complete problem is the Traveling Salesman Problem ("TSP"). The goal of the TSP is to find an optimal route, for example to visit a set of cities or a subset thereof, based on certain criteria such as minimizing distance or time spent traversing the route. It should be apparent that the TSP is but one example of a plethora of NP-Complete and similar problems. While the TSP is referred to throughout this application, this reference is only for exemplary purposes to provide a specific application of the embodiments described herein. It should be appreciated that the embodiments described herein may be extended to any problem that may be characterized as an NP-Complete problem or similar problem having a problem space too large to be exhaustively searched. It will also be apparent to one skilled in the art that the present invention may be practiced without some or all of the specific details described herein. In some instances, well known process operations have not been described in detail in order to not unnecessarily obscure the present invention.

[0024] The embodiments of the present invention provide a means to quickly converge on large, complex, multivariate solutions in NP-Complete or similar problem space. NP-Complete problems are characterized by a problem space so large, and at times ragged in character, that they cannot be exhaustively searched, whereas any single potential solution can be quickly and easily tested. It should be appreciated that NP-Complete type problems exist across a host of industries. For example, the TSP may affect any business requiring travel related expenses whether associated with sales issues or not. Shop floor scheduling impacts any industry having a manufacturing plant. Circuit designs are to the point where the routing and timing analysis falls into an NP-Complete problem space, and so on. A method and system for solving problems in NP-Complete and similar problem spaces is described herein. A set of seed vectors within the NP-Complete problem space is defined and evaluated and statistical analysis is performed on the vectors. A new set of vectors is then generated based on the statistical analysis. It should be appreciated that the statistical analysis described herein could use different adaptive techniques depending on the nature of the problem under consideration. For example, the statistical analysis described with reference to the embodiments described herein utilizes adaptive techniques customized to the particular problem being solved. That is, the TSP has characteristics concerned with variables such as the number of cities and the paths between the cities, while a problem concerned with scheduling for gamma sterilization of a product may be concerned with variables such as the weight of the product and the dose being received over the path of the product in the sterilization chamber, or a circuit design problem may be concerned with the electrical characteristics of a path within the circuit, and so on. As will be explained in more detail below, the above generation and statistical analysis operations are iterated until certain defined criteria have been achieved.

[0025] Figure 1 displays flowchart 101 depicting an overview of the overall method in accordance with one embodiment of the invention. The method initiates with operation 103 where the Frequency and Neighborhood tables are initialized. Here, the method builds data structures used in the evaluation of test vectors as will be discussed in more detail with reference to Figure 2. In one embodiment, the Frequency and Neighborhood tables are initially loaded with default values as described in further detail below. It should be appreciated that default values and other parameters used by the method can be provided by any number of commonly used means, for example acquiring them from a hard disk computer file, a network or directly from the user at program invocation. The method proceeds to the Instantiate operation at operation 105 where the initial set of test vectors is produced. The Instantiate operation consults the Frequency and Neighborhood tables during the test vector creation process, which will be described in more detail with reference to Figure 3.

[0026] The method of Figure 1 then advances to operation 107, where the test vectors are presented to a Fitness Engine, which determines a fitness value ranking a desirability of each test vector and assigns the fitness value to each test vector. It should be appreciated that determination of Fitness can be customized according to the requirements and constraints of a particular problem set, thus enabling the embodiments described herein to weigh an arbitrary number of constraints and/or factors and provide a meaningful and accurate ultimate Fitness criterion. In one embodiment, the Fitness Engine is a modular entity, customized to a particular problem set with a particularized set of initialization and operating parameters. By way of example and without limitation, in the Traveling Salesman Problem, factors such as road quality and penalties associated with a late arrival would be included by the Fitness Engine in its evaluation of the Fitness of a particular vector. It should be appreciated that numerous other parameters may be considered by the Fitness Engine in the evaluation depending on the specific application. For example, with reference to shop floor scheduling, certain scheduling constraints such as parts having to be manufactured prior to other parts, minimum doses of radiation required in a sterilization chamber, etc., may be included.
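
As an illustration of a modular Fitness Engine for the TSP, the following Python sketch builds a fitness function from a distance matrix and an optional lateness penalty. It is a sketch under stated assumptions, not the patent's implementation: the name make_tsp_fitness, the parameters late_penalty, deadline and circular, the 0-based city numbering, and the convention that lower values are more desirable are all hypothetical choices made for this example.

```python
from typing import Callable, Sequence

Vector = Sequence[int]

def make_tsp_fitness(distance: Sequence[Sequence[float]],
                     late_penalty: float = 0.0,
                     deadline: float = float("inf"),
                     circular: bool = True) -> Callable[[Vector], float]:
    """Build a fitness function; in this sketch a lower value is more desirable."""
    def fitness(vector: Vector) -> float:
        # Consecutive pairs of cities are the legs of the route.
        legs = list(zip(vector, vector[1:]))
        if circular and len(vector) > 1:
            legs.append((vector[-1], vector[0]))  # return to the starting city
        total = sum(distance[a][b] for a, b in legs)
        if total > deadline:
            # Assumed penalty model: proportional to the overrun.
            total += late_penalty * (total - deadline)
        return total
    return fitness

# Hypothetical 3-city distance matrix (cities numbered 0..2 in this sketch).
demo_distance = [[0, 2, 9],
                 [2, 0, 6],
                 [9, 6, 0]]
plain = make_tsp_fitness(demo_distance)
penalized = make_tsp_fitness(demo_distance, late_penalty=2.0, deadline=10.0)
open_route = make_tsp_fitness(demo_distance, circular=False)
```

Because the fitness function is produced by a factory, a different problem (for example a scheduling problem) could supply its own factory without changing the code that ranks vectors.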

[0027] The method of Figure 1 then proceeds to operation 109, where the vectors are sorted and ordered by their Fitness values, and a portion of the vectors deemed to be fit for analysis is selected. Vectors not selected for analysis are permanently discarded. The portion deemed fit for analysis may be selected according to any suitable pre-determined criteria, such as the top X%, some absolute number of the ordered vectors having the highest/lowest Fitness values, etc. Thus, in one embodiment, the Fitness value is used to cull the set of vectors. Continuing to decision operation 111, certain definable criteria are examined to determine whether to terminate the present set of operations or return to the beginning at operation 103 for an additional iteration. By way of example and without limitation, defined criteria could be adaptive and include parameters determined during Instantiation operation 105 and Fitness operation 107, elapsed execution time, vector improvements relative to previous iterations, and the number of iterations that have already taken place. It should be appreciated that any number of criteria may be employed at this stage depending on the nature of the problem under scrutiny. If the decision in operation 111 determines additional iterations are warranted, the method of flowchart 101 returns to operation 103 where the selected vectors are analyzed and the analysis is used to populate the Frequency and the Neighborhood tables. The method thus continues iterating until the defined criteria have been achieved. Otherwise, the method continues to decision operation 113, where it is determined whether an optional Solver/Optimizer is employed.

[0028] Operations 113, 115, and 117 provide a set of modular hooks, comprising an optional Solver/Optimizer. In one embodiment, further enhancement of results could be achieved by the employment of an optional and different solver/optimizer. It should be apparent to one skilled in the art that any other suitable Solver/Optimizer, including by way of example and without limitation, genetic algorithms, could be modularly added to the invention as an option, where indicated. Thus, genetic algorithms may be used in conjunction with the statistically based embodiments described herein to further define an acceptable solution. If the optional Solver/Optimizer is employed, the method proceeds to operation 115, where some or all of the seed vectors emerging from selection operation 109 are provided to the optional Solver/Optimizer. For the sake of clarity, the Fitness determination of operation 107 and selection operation 109 are implicitly included and executed in the Solver/Optimizer operation 115, working as described above. The method then moves to operation 117 where a set of criteria is applied to determine whether to end optional Solver/Optimizer processing or to continue. As with the criteria applied in operation 111, these criteria could include specific metrics provided by the optional Solver/Optimizer at operation 115, as well as more general criteria such as elapsed execution time, number of iterations and seed vector improvement. If decision operation 117 determines additional iterations are warranted, the method of flowchart 101 returns to operation 115. Otherwise, the method shifts to the decision operation 119 where criteria related to the overall process are examined to determine whether to continue processing or to prepare to exit. By way of example and without limitation, criteria could include total execution time, metrics from the main and optional Solver/Optimizers, vector fitness, the total number of iterations so far, etc. It should be appreciated that criteria may be given differing weights and precedence, the implementation of which would be apparent to one skilled in the art. If decision operation 119 determines additional iterations are warranted, the method of flowchart 101 returns to operation 103. Otherwise, processing shifts to operation 121 where the vector having the best Fitness is selected as the output vector.
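
The main loop of Figure 1 (operations 103 through 111, followed by the selection of operation 121) can be sketched in Python as follows. This is a simplified sketch under stated assumptions: the function run_optimizer and its parameters are hypothetical, a fixed iteration count stands in for the criteria of decision operation 111, lower fitness is treated as better, and the optional Solver/Optimizer (operations 113 to 117) and the overall check of operation 119 are omitted. The demo callables are deliberately trivial stand-ins and do not build real tables.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[int]

def run_optimizer(build_tables: Callable[[Sequence[Vector]], object],
                  instantiate: Callable[[object, int], List[Vector]],
                  fitness: Callable[[Vector], float],
                  population: int = 50,
                  keep_fraction: float = 0.2,
                  max_iterations: int = 30) -> Tuple[Vector, float]:
    """Iterate build -> instantiate -> rank -> cull; return the fittest vector seen."""
    survivors: List[Vector] = []
    best: Vector = []
    best_score = float("inf")
    for _ in range(max_iterations):                 # stand-in for decision operation 111
        tables = build_tables(survivors)            # operation 103
        vectors = instantiate(tables, population)   # operation 105
        ranked = sorted(vectors, key=fitness)       # operations 107 and 109
        keep = max(1, int(len(ranked) * keep_fraction))
        survivors = ranked[:keep]                   # unselected vectors are discarded
        score = fitness(survivors[0])
        if score < best_score:
            best, best_score = list(survivors[0]), score
    return best, best_score                         # operation 121

# Demo with hypothetical stand-ins for the table builder and the Instantiate step.
rng = random.Random(0)

def demo_build_tables(survivors: Sequence[Vector]) -> List[Vector]:
    return [list(v) for v in survivors]  # a real version would fill the two tables

def demo_instantiate(tables: object, count: int) -> List[Vector]:
    return [rng.sample(range(1, 6), 5) for _ in range(count)]  # random 5-city orders

def demo_fitness(vector: Vector) -> int:
    # Toy objective: number of cities that are not in their "home" position.
    return sum(1 for i, city in enumerate(vector, start=1) if city != i)

best, best_score = run_optimizer(demo_build_tables, demo_instantiate, demo_fitness)
```

Passing the three stages in as callables mirrors the modular structure described above, in which the Fitness Engine and the optional Solver/Optimizer are replaceable components.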

[0029] Generally, the Fitness Engine employed in operation 107 would be employed to determine fitness at operation 121. A final local optimization "polishing" of the output vector may be applied in operation 123 before presentation to the user. It should be noted that within operation 123, the invention could apply certain global criteria. By way of example and without limitation, the method could iteratively "polish" the final vector until the total amount of execution time available to the method has expired. It should be appreciated that the "polishing" operation could be achieved by an ordinary Local Annealing algorithm or another locally focused method. The objective of operation 123 is to use additional execution time, if any, to improve if possible the final vector by means of a fast and simple procedure without performing a global search in the search space.
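
One possible locally focused "polishing" procedure is sketched below in Python. Note that this sketch substitutes a plain pairwise-swap hill climb for the annealing algorithm named in the text, purely because it is shorter; the function name polish, the time_budget_s parameter, and the lower-is-better fitness convention are assumptions of this example.

```python
import time
from typing import Callable, List

def polish(vector: List[int],
           fitness: Callable[[List[int]], float],
           time_budget_s: float = 0.05) -> List[int]:
    """Improve a vector by pairwise swaps until no swap helps or time runs out."""
    best = list(vector)
    best_score = fitness(best)
    deadline = time.monotonic() + time_budget_s
    improved = True
    # The time budget is checked once per full pass over all position pairs.
    while improved and time.monotonic() < deadline:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                best[i], best[j] = best[j], best[i]      # try the swap
                score = fitness(best)
                if score < best_score:
                    best_score, improved = score, True   # keep the improvement
                else:
                    best[i], best[j] = best[j], best[i]  # undo the swap
    return best

# Toy objective for the demo: distance of each item from its own index.
def demo_displacement(v: List[int]) -> int:
    return sum(abs(item - index) for index, item in enumerate(v))

demo_start = [2, 0, 1]
demo_polished = polish(demo_start, demo_displacement, time_budget_s=1.0)
```

The procedure only ever accepts improving swaps, so it never performs a global search and cannot return a vector worse than its input.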

[0030] Figure 2 displays flowchart 103, showing a more detailed description of operation 103 of Figure 1 where the Frequency and Neighborhood tables are created. In operation 201 of flowchart 103, the empty Frequency Table is created as an array, based on the number of positions in a seed vector and the identity of each position. In one embodiment, the array is a square array. In another embodiment and as a performance enhancement, an additional row and column are added to the Frequency Table to store the count of non-zero items that appear in each row and column of the Frequency Table. The method proceeds to operation 203 where the empty Neighborhood Table is created as an array based on the number of positions in a seed vector. In one embodiment, the array is a square array. Although the embodiment is described here as utilizing only the immediate adjacency information in the Neighborhood table, it will be apparent to one skilled in the art that this can be easily extended to cover farther neighbors as well, using the same or multiple Neighborhood tables. In operation 205, the method reads the next seed vector and proceeds to fill the Frequency table at operation 207. In operation 207, the method examines the current seed vector beginning at position 1 of the vector and proceeds sequentially, examining each position, until the end of the vector is reached. At each position, the method increments the number in the cell in the Frequency Table at a location determined by the column corresponding to the current position within the seed vector and the row corresponding to the identity of the item at the current position within the seed vector. When so filled, and including the table counts calculated in operation 213 as will be described below, the Frequency Table defines a frequency of occurrence of an element within the selected vectors by providing a numeric summary of the number of times each item appears in each position, i.e., the "Frequency" of each item, thus providing one basis for production of new seed vectors during the Instantiate procedure as will be described in more detail with reference to Figure 3. An exemplary Frequency Table, shown initialized with data from a set of hypothetical test vectors shown in Figure 4A, is depicted in Figure 4B.
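
The filling of the Frequency Table (operation 207) and its row and column counts (operation 213) can be sketched in Python as follows. This is a minimal sketch assuming the square-array embodiment with items numbered from 1; the function name build_frequency_table is hypothetical, and the two demo vectors are made up for illustration rather than taken from the figures.

```python
from typing import List, Sequence, Tuple

def build_frequency_table(seed_vectors: Sequence[Sequence[int]],
                          num_items: int
                          ) -> Tuple[List[List[int]], List[int], List[int]]:
    """Return (table, row_counts, col_counts).

    table[item - 1][position] is the number of seed vectors in which
    `item` occupies `position` (positions are 0-based in this sketch).
    """
    length = num_items  # square-array embodiment: one position per item
    table = [[0] * length for _ in range(num_items)]
    for vector in seed_vectors:
        for position, item in enumerate(vector):
            table[item - 1][position] += 1          # row = item, column = position
    # Counts of non-zero cells per row and per column.
    row_counts = [sum(1 for cell in row if cell > 0) for row in table]
    col_counts = [sum(1 for row in table if row[p] > 0) for p in range(length)]
    return table, row_counts, col_counts

# Two made-up seed vectors over five items.
demo_vectors = [[2, 1, 3, 4, 5],
                [5, 1, 2, 3, 4]]
freq, freq_rows, freq_cols = build_frequency_table(demo_vectors, num_items=5)
```

In this demo, item 1 occupies the second position in both vectors, so its row holds a single non-zero cell with the value 2 and its row count is 1.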

[0031] Referring to Figure 4A, a hypothetical set of test vectors is shown. One skilled in the art will appreciate that the set of test vectors may represent an initial default set or a later set of vectors that has undergone previous iterations of the process described herein. In terms of the TSP, positions 1-5 represent the order in which the cities are visited. Thus, position 1 represents the first city to be visited, which happens to be city number 2 in test vector number 1, city number 5 in test vector number 2, etc. In other words, test vector number 1 represents the traveling salesman visiting city number 2, then city number 1, then city number 3, then city number 4 and then city number 5. In one embodiment, the traveling salesman returns to city number 2 after city number 5 if the vector is defined as circular.

[0032] Figure 4B represents an exemplary initialized frequency table in accordance with one embodiment of the invention. The test vectors of Figure 4A are used to populate the Frequency table of Figure 4B. In the frequency table, positions 1-5 represent the same positions as in the table of Figure 4A. The entry for an item ID at a given position represents the number of times, i.e., the frequency, that the city was in that corresponding position in the table of Figure 4A. For example, with regard to city 1 (item 1), the number of times city 1 was the first city visited, i.e., in position 1, was 1, while the number of times city 1 was the second city visited was 2. Moving along the row corresponding to item 1, the frequency table is populated with a 1 and then a 2 for positions 1 and 2. This process is repeated until the frequency table is populated. The row count and column count are generated according to the number of times the corresponding row or column has array values greater than 0. For example, the row associated with item ID 1 has a value of 3 since city 1 appeared in the first, second, and fifth positions. As described in more detail below, a hard hit may be represented by a value of 0 in the row or column count.

[0033] Figure 4C represents the marking of a row and column as inactive in accordance with one embodiment of the invention. When building the output vector, if item 3 is placed in position 2 of the output vector, item 3 is restricted from further processing. Therefore, item 3 and position 2 are marked as used. As will be described in more detail later in the description of operation 105 of Figure 1, as the output vector is built, the marking of inactive rows and columns continues.

[0034] Returning to Figure 2, the method then proceeds from operation 207 to operation 209, where the Neighborhood table is filled. In a manner similar to filling the Frequency Table in operation 207, the method at operation 209 examines the current seed vector beginning at position 1 of the vector and proceeds sequentially, examining each position, until it reaches the end of the vector. At each position, the method looks ahead to the next position in the seed vector. The method then increments the number in the cell in the Neighborhood table determined by the column corresponding to the identity of the item in the current position within the seed vector and the row corresponding to the identity of the item at the next position within the seed vector. When filled, the Neighborhood Table defines inter-element adjacencies within the selected vectors by providing a numeric summary of the number of times items appear adjacent to one another, thus providing another basis for production of new seed vectors during the Instantiate procedure as will be described in more detail with reference to Figure 3. Based on the specific problem being solved, it will also be apparent to one skilled in the art that the last element in a vector may or may not be considered adjacent to the first element in the same vector. An exemplary Neighborhood Table, shown initialized with sample data in five rows and five columns, is depicted in Figure 5A. It will be apparent to one skilled in the art that many of the processes described in operations 207 and 209 can be performed within the same routine, thereby gaining execution efficiency.

[0035] The method of Figure 2 next proceeds to decision operation 211, where it determines if the last seed vector has been examined. If the final seed vector has not been reached, the method returns to operation 205 for additional iteration, and repeats as described above. Otherwise, the method proceeds to operation 213 where the Frequency Table counts are updated. It will be apparent to one skilled in the art that Frequency Table counts could have been kept up to date during operation 207 to gain operational efficiency. Update operation 213 is described here for the sake of clarity. For each row, the method calculates the total number of non-zero cells appearing in the row and writes the total to the count cell corresponding to the row. Similarly, the method calculates the total number of non-zero cells appearing in each column and writes the total to the count cell corresponding to the column. At operation 215, the method modifies entries in the Neighborhood Table. In one embodiment of the invention, the method examines each entry in the Neighborhood table and performs adjustments based on the value in each cell and certain parameters. It should be appreciated that any number of parameters and criteria could be applied to the table to provide weighting to the Neighborhood representation, thus varying the character and effect of the Neighborhood Table on the production of new seed vectors in the Instantiate procedure.
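
The filling of the Neighborhood Table (operation 209) can be sketched in Python as follows. This is a minimal sketch assuming items numbered from 1 and immediate adjacency only; the function name build_neighborhood_table and the circular flag are hypothetical, the demo vectors are made up, and the adjustments of operation 215 are not modeled because the text leaves their parameters open.

```python
from typing import List, Sequence

def build_neighborhood_table(seed_vectors: Sequence[Sequence[int]],
                             num_items: int,
                             circular: bool = False) -> List[List[int]]:
    """table[next_item - 1][current_item - 1] counts how often next_item follows current_item."""
    table = [[0] * num_items for _ in range(num_items)]
    for vector in seed_vectors:
        pairs = list(zip(vector, vector[1:]))          # (current, next) look-ahead pairs
        if circular and len(vector) > 1:
            pairs.append((vector[-1], vector[0]))      # last element adjacent to first
        for current, nxt in pairs:
            table[nxt - 1][current - 1] += 1           # row = next item, column = current item
    return table

# Two made-up seed vectors over five items.
demo_seed_vectors = [[2, 1, 3, 4, 5],
                     [5, 1, 2, 3, 4]]
neigh_open = build_neighborhood_table(demo_seed_vectors, num_items=5)
neigh_circular = build_neighborhood_table(demo_seed_vectors, num_items=5, circular=True)
```

In the demo, item 4 follows item 3 in both vectors, so the cell in row 4, column 3 holds the value 2. Whether the wrap-around pair is counted is left to the circular flag, matching the remark that the last element may or may not be considered adjacent to the first.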

[0036] It should be appreciated that flowchart 103 depicted in Figure 2 could be invoked without any seed vectors available for analysis, e.g., when the method is first invoked. It will be apparent to one skilled in the art that suitable default values can be applied. By way of example and without limitation, default values could include initializing the Frequency Table to represent equal frequency for all elements and initializing the Neighborhood Table to values representing no known relationships among the elements. It should be further appreciated that any arbitrary arrangement of default values could be applied to either table depending on the nature of the problem under scrutiny.
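
The default initialization described above can be sketched in Python as follows. The specific default values are assumptions of this example (a uniform count of 1 for "equal frequency" and 0 for "no known relationships"), as is the function name default_tables.

```python
from typing import List, Tuple

def default_tables(num_items: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (frequency, neighborhood) tables for a first invocation with no seed vectors."""
    # Equal frequency: every item is equally likely in every position.
    frequency = [[1] * num_items for _ in range(num_items)]
    # No known relationships: no adjacency has been observed yet.
    neighborhood = [[0] * num_items for _ in range(num_items)]
    return frequency, neighborhood

demo_frequency, demo_neighborhood = default_tables(4)
```

Each row is built as a separate list so that later updates to one cell do not leak into other rows.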

[0037] Figure 3 displays flowchart 105, showing a detailed description of operation 105 of Figure 1, where a new set of vectors is derived from elements in the Frequency Table during Instantiation. It should be appreciated that in one embodiment, operation 105 employs adaptive statistical analysis based in part on optional modes that can be used to control the way in which seed vectors are Instantiated. This statistical analysis may use adaptive parameters that are modified during execution of the method. Examples of such parameters include the Static versus Dynamic modes as well as the Accurate versus Standard modes. Thus, in Accurate and Dynamic modes, the nature of the selection for the output vector is adaptive because the data is analyzed, for example, to minimize the possibility of hard hits, or to determine which value to select, whereas the added computation is not incurred in Static or Standard modes and the selection is random when multiple possibilities are available. Generally speaking, these modes enable the method to balance execution speed against convergence characteristics based on user-supplied criteria, criteria learned or derived during execution of the method, or a combination of both types of criteria. Moreover, the criteria may be statistical in nature or fixed. By way of example and without limitation, the number of seed vectors to be instantiated could be adjusted downward based on the percentage of non-zero entries found in the Frequency Table from the prior iteration. In operation 301 of flowchart 105, a master copy of the Frequency Table created in flowchart 103 is generated. The method next proceeds to decision operation 305 where it determines whether to handle Instantiate operations involving the Neighborhood Table in "Static" or "Dynamic" mode. It will be apparent to one skilled in the art that the "Static/Dynamic" setting tested in operation 305 need not be fixed, but could be varied on successive invocations of flowchart 105. 
By way of example and without limitation, an algorithm could be employed to set Static mode for the first 25% of the invocations of flowchart 105 and Dynamic for other invocations. If decision operation 305 determines that Static mode should be used, control transfers to operation 303 where non-zero values from the Neighborhood table are scaled and applied, cell-by-cell, to the master copy of the Frequency Table and its row and column count totals are adjusted accordingly. The Frequency Table, after said modification, becomes weighted in favor of Neighborhood groupings so that the Instantiate operation with reference to Figure 3 will create seed vectors tending to favor groupings represented in the Neighborhood table. The master Frequency Table, so modified, will remain constant while the Instantiate operation creates the complete set of seed vectors. A more detailed view of one embodiment of the procedure by which Frequency Table entries are modified according to adjustment values derived from the Neighborhood Table data will be described with reference to Figures 5A, 5B, and 5C. As illustrated with reference to Figures 5A-5C, under Static mode the data in the Neighborhood table is used to modify corresponding data in the Frequency table before generating new vectors. The data in the modified Frequency table is then used to generate the desired set of vectors.
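The example schedule mentioned above (Static mode for the first 25% of invocations, Dynamic thereafter) can be sketched as a simple rule. The function name `choose_mode` and its parameters are hypothetical; any other criterion, user-supplied or learned, could replace the threshold test.

```python
def choose_mode(invocation, total_invocations, static_fraction=0.25):
    """Return "static" for the first fraction of invocations, else "dynamic".

    `invocation` is a zero-based counter of calls to the Instantiate flowchart.
    All names here are illustrative assumptions.
    """
    if invocation < static_fraction * total_invocations:
        return "static"
    return "dynamic"


# With 8 planned invocations, the first two run in Static mode.
modes = [choose_mode(i, 8) for i in range(8)]
```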

[0038] Still referring to Figure 3, if decision operation 305 determines that Dynamic rather than Static mode should be used, the Frequency Table is not modified in operation 303. Instead, the Frequency Table will undergo modifications on a per position-assigned basis throughout Instantiation of the seed vector. It should be appreciated that Static mode, by virtue of its one-step modification of the Frequency Table (versus modifications on a per-position basis for each seed vector), results in greater execution efficiency but is less flexible, since Static mode adapts only once to the entire set rather than conforming dynamically (contemporaneously with generating the new set of vectors) to the characteristics of each potential seed vector, as under Dynamic mode. Under Dynamic mode, once an item is placed into a position within a vector, the item and position are marked inactive and the Neighborhood Table is then referenced to determine which item is most likely to come before or after the item just placed. The Frequency Table is then updated to increase the probability corresponding to that most probable item. Thus, operating under Dynamic mode achieves enhanced conformance to the data of the Neighborhood Table, as compared to Static mode.
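The per-position adjustment under Dynamic mode can be sketched as below. This is a minimal sketch under stated assumptions: only the "after" direction is shown, `neighborhood[a][b]` is assumed to count how often item a is followed by item b, and the `boost` multiplier and the function name `dynamic_adjust` are hypothetical.

```python
def dynamic_adjust(working, neighborhood, item, position,
                   active_items, active_positions, boost=1):
    """After `item` is placed at `position`, favor its most likely successor.

    Returns the favored successor item, or None when no adjustment is made.
    Names, the boost rule, and the table orientation are assumptions.
    """
    nxt = position + 1
    if nxt not in active_positions or not active_items:
        return None  # next position already taken, or nothing left to place
    # Most probable item to follow `item`, among items not yet placed.
    successor = max(sorted(active_items), key=lambda j: neighborhood[item][j])
    if neighborhood[item][successor] == 0:
        return None  # no known relationship to exploit
    # Raise the weight of (successor, next position) in the working copy only.
    working[successor][nxt] += boost * neighborhood[item][successor]
    return successor
```

Because only the working copy is modified, the master Frequency Table remains unchanged for the next seed vector.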

[0039] Under Dynamic mode, the method of Figure 3 then proceeds to operation 307 where a working copy of the Master Frequency Table is made. Because this copy is refreshed before the creation of each seed vector, the operation is free to make real-time modifications to the working copy without affecting the generation of successive seed vectors. Operations such as Dynamic mode, as discussed above in operation 303, are thus possible. The method then moves to operation 325 where a determination is made whether to use Accurate Mode ("A-Mode") during seed vector generation. Similar to the way criteria were used in operation 305 to determine whether the operation will use Dynamic or Static mode, the decision to use A-Mode is based on parameters that can include user-supplied values and can include the evaluation of certain results emerging from prior Instantiate invocations. By way of example and without limitation, A-Mode could be invoked after a certain number or percentage of "Hard Hits" is exceeded as a result of the processing at operation 329 as will be explained below. By delaying the use of A-Mode until later stages of Instantiation, unneeded operational steps are avoided, thus enhancing execution efficiency. If operation 325 determines that A-Mode is to be used, the method proceeds to operation 327 where it scans the working copy of the Master Frequency Table and determines a preferred vector selection. It should be appreciated that determining a preferred vector selection could be performed by adaptive techniques. In one embodiment of the invention, the method narrows the focus of Instantiate to only rows and columns with the smallest count as will be seen below. The effect of this narrowing is that Instantiate will first select items from the Frequency Table having the smallest number of potentially good output vector positions, thereby enhancing the probability that every item will end up with a beneficial/favored output position. 
The method then proceeds to operation 309. If the result of decision operation 325 is not to use A-Mode, the method moves directly to operation 309. At operation 309, if A-Mode is not set, the method randomly selects any row from the working copy of the Frequency Table. If A-Mode is enabled, the method instead randomly looks only at the limited subset of rows and columns as narrowed in operation 327 above and randomly selects one from this subset. The method then moves to operation 311 where the Frequency Table count corresponding to the row (or column) just selected is examined. If the count value is zero, it indicates every potential seed vector position for this item was taken in previous iterations. Such a condition means a favored choice for the current item is impossible. Accordingly, this condition is referred to as a "Hard Hit." If the decision in operation 311 encounters a Hard Hit, the method moves to operation 329 where the "Hard Hit" flag is set, then in operation 331, the row (row or column if A-Mode is enabled) is marked for deferred processing. It should be appreciated that the "Hard Hit" flag is one criterion that will be examined by operation 335 as described below, to assign positions (or rows) for rows (or positions) that were deferred. The number of times the Hard Hit flag was set may also be examined to determine whether to initiate A-Mode processing during a future iteration of this Instantiate flowchart 105. The method next returns to operation 325 where another row or column is selected for processing. However, if the decision in operation 311 does not encounter a Hard Hit, processing continues at operation 313 where the method will select an output position.
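The row selection of operations 327, 309 and 311 can be sketched as follows. For brevity the sketch narrows over rows only, and it assumes that a row's "count" is the number of still-active positions with a non-zero entry; the function name `select_row` and its parameters are hypothetical.

```python
import random


def select_row(working, active_rows, active_cols, accurate, rng=random):
    """Pick the next item (row) to place and report whether it is a Hard Hit.

    Returns (row, hard_hit). Names and the count definition are assumptions.
    """
    # Per active row, count the active positions that still carry weight.
    counts = {
        r: sum(1 for c in active_cols if working[r][c] > 0)
        for r in active_rows
    }
    if accurate:
        # A-Mode: consider only the rows with the fewest remaining options.
        smallest = min(counts.values())
        candidates = [r for r in sorted(counts) if counts[r] == smallest]
    else:
        # Standard mode: any active row may be chosen.
        candidates = sorted(counts)
    row = rng.choice(candidates)
    # A zero count means no favored position remains for this item.
    return row, counts[row] == 0
```

A caller would mark a row returned with `hard_hit` set for deferred processing rather than selecting a position for it.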

[0040] At operation 313, the method consults the working copy of the Frequency Table and examines the row (or column) selected at operation 309. In operation 313, the method randomly selects a position (or an item ID) based on the probabilities defined in the working copy of the master Frequency Table, where the numbers are interpreted as probabilities. Once this is done, the item ID (corresponding to the row) and its position in the seed vector (corresponding to the column) become known. The row and column corresponding to the cell just selected are then marked inactive and the row and column counts are updated based on the rows and columns remaining. A Frequency Table, shown by way of example with a single row and column marked inactive, is depicted in Figure 4C. The method next moves to operation 315 where the Item ID is written to the seed vector at the column position determined in operation 313.
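Operations 313 and 315 can be sketched as a weighted random draw followed by bookkeeping. The sketch assumes the selected row is not a Hard Hit (at least one active column has a non-zero entry); the helper names `select_position` and `place` are hypothetical.

```python
import random


def select_position(working, row, active_cols, rng=random):
    """Draw a position for `row`, treating its Frequency Table entries as weights."""
    cols = sorted(active_cols)
    weights = [working[row][c] for c in cols]
    # random.choices performs the weighted draw; weights need not sum to 1.
    return rng.choices(cols, weights=weights, k=1)[0]


def place(seed_vector, row, col, active_rows, active_cols):
    """Write the item ID into the seed vector and mark its row and column inactive."""
    seed_vector[col] = row
    active_rows.discard(row)
    active_cols.discard(col)
```

Row and column counts are recomputed here from the remaining active sets rather than stored, which is one of several possible design choices.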

[0041] The method of Figure 3 proceeds to decision operation 317 where it checks if Static mode is enabled. If Static mode is not enabled, i.e. Dynamic mode is active, the method moves to operation 333 where the working copy of the Frequency Table is adjusted using the Neighborhood Table to account for the cell selection made in operation 313. If decision operation 317 determined that Static mode is active, the method proceeds to operation 319 to determine whether all rows in the working copy of the Frequency Table have been processed. If all the rows have not been processed, the method returns to operation 325 where an additional iteration begins as described above. If decision operation 319 determines that all rows have been processed, the method proceeds to decision operation 321 to test the "Hit Flag" for the existence of deferred rows or columns that were marked, if at all, in operation 331 above. If deferred rows or columns are present, the method moves to operation 335 where the item ID corresponding to each deferred row is randomly assigned to an unassigned output column in the seed vector. Once all deferred rows have been assigned, the method proceeds to decision operation 323. If decision operation 321 does not detect deferred rows, the method simply drops to decision operation 323 to determine whether additional seed vectors should be created. It should be appreciated that the number of seed vectors to be generated, as determined in decision operation 323, could be governed by an arbitrary number of criteria. By way of example and without limitation, such criteria could include user-supplied values read from a configuration file or criteria developed dynamically during other processing steps based on independent criteria supplied thereto. If decision operation 323 determines additional vectors are required, the method returns to operation 307 to perform an additional iteration. Otherwise, the operation of flowchart 105 terminates.
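The deferred assignment of operation 335 can be sketched as follows, assuming unassigned seed vector positions are represented by `None`. The function name `assign_deferred` is hypothetical.

```python
import random


def assign_deferred(seed_vector, deferred_items, rng=random):
    """Randomly assign each deferred (Hard Hit) item to an unassigned position."""
    # Collect the positions that no item has claimed yet.
    free = [pos for pos, value in enumerate(seed_vector) if value is None]
    rng.shuffle(free)
    # Pair each deferred item with one of the shuffled free positions.
    for item, pos in zip(deferred_items, free):
        seed_vector[pos] = item
    return seed_vector
```

When every item was either placed or deferred, the number of free positions equals the number of deferred items, so the result is a complete permutation.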

[0042] Figure 5A depicts a Neighborhood Table initialized with sample data in five rows and five columns. Figure 5B depicts a Frequency Table with sample data in five rows and five columns before weighting by Neighborhood Table data. Figure 5C depicts one step of weighting a Frequency Table by Neighborhood Table data. As explained in relation to operation 209, the Neighborhood Table, shown here in Figure 5A, provides a numeric summary of the number of times items appear adjacent to one another. The numeral "1" in cell 609 is a count of the number of output vectors in which Item 2 was followed by Item 1. By comparison, cell 611 contains a value of "0" meaning that within the test vector set, Item 4 was never followed by Item 1. The data gathered from the analysis of the selected vectors, i.e., those having the best Fitness, is used to focus the search of the next iteration to areas of the search space having characteristics similar to the selected vectors. By way of example and without limitation, the method, upon determining Items 3 and 4 are adjacent in three of the five selected vectors, would focus the next search on areas of the search space where Items 3 and 4 are adjacent. In another example, the method would focus the next search on areas of the search space where Item 2 is placed in position 3 because it appeared in that position in three of the selected five vectors (see cell 613 of Figure 5B).

[0043] In one embodiment, the entire Frequency Table undergoes modification according to adjustment values derived from the Neighborhood Table; this is referred to as Static mode. To facilitate illustration of this process, calculations for a single value, here cell 613 of the Frequency Table depicted in Figure 5B, will be used as an example. It should be appreciated that the value from cell 613 is used in conjunction with the Neighborhood Table to update the frequency values for the neighboring positions (positions 2 and 4, i.e., the values of columns 2 and 4). 
Thus, the value of cell 613 is used to eventually update cells 619 and 605, as well as the remainder of the values in each column. Cell 613 represents that Item 2 appeared three times in the selected portion of output vectors at position 3. The method first calculates an intermediate scale factor, "Q," by dividing the value from cell 613 by the sum of the elements from Figure 5B, row Item 2 and multiplying by scale factor 607. It should be appreciated that the scale factor is an adaptive parameter; the greater the value of the scale factor, the more effect the neighborhood data has on the output and vice versa. In one embodiment, a value of 0 for the scale factor leads to completely ignoring all the neighborhood data. The scale factor may come from any of many possible inputs including but not limited to a configuration file or directly from the user. The value 5 in the example shown is purely an exemplary choice for ease of calculation to avoid introducing fractions or decimal points. As shown in box 617, the resulting value of Q is 3. The method next multiplies the value from cell 601 by Q, adds the result to the value from cell 615 and writes the result to cell 619 of Figure 5C as shown in box 621. The method repeats this operation in turn for each cell to the right of cell 601 and for each cell below cells 615 and 619 respectively, until complete. The method next multiplies the value from cell 609 by Q, adds the result to the value from cell 603 and writes the result to cell 605 of Figure 5C as shown in box 623. The method repeats this operation in turn for each cell below cells 609, 603 and 605 respectively, until complete. Finally, the method repeats until each cell in Figure 5B has been processed. In a different mode of one embodiment, the Frequency Table is modified contemporaneously with generation of each new vector and is referred to as Dynamic mode. 
It should be appreciated that the Dynamic and Static modes relate to when the changes to the Frequency Table (based on the Neighborhood Table data) are performed. As discussed above, during Dynamic mode the changes to the Frequency Table occur contemporaneously with the generation of each new vector, while during Static mode the changes to the Frequency Table are completed before the new vectors are generated. Rather than modifying the entire Frequency Table at once as in Static mode, under Dynamic mode the method incrementally modifies only that portion of the Frequency Table corresponding to the element to be generated, contemporaneously with vector generation. It should be appreciated that modification of the Frequency Table according to adjustment values derived from the Neighborhood Table, including without limitation scale factor 607 and the adjustment value itself, could be determined by adaptive techniques. An example of such adaptive techniques could be modifying the value of the scale factor 607 based on statistical analysis of prior invocations. The Accurate and Standard modes relate to the selection order of the items or positions to process within a vector, which is intended to reduce the probability of having too many Hard Hits. Thus, the Accurate and Standard modes determine where within the vector to start placement of the values, e.g., based on a favorable order or randomly within the vector. In Accurate mode, the options for positioning an item within the vector are considered when placing the item, while under Standard mode these options are not considered.
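The Static mode weighting can be sketched as below. The computation of Q follows the description (cell value divided by its row sum, multiplied by the scale factor). The mapping of Neighborhood Table cells onto the neighboring Frequency Table columns is an assumption of this sketch, since it depends on the layout of Figures 5A-5C: `neighborhood[a][b]` is taken to count how often item a is followed by item b. The function names are hypothetical.

```python
def scale_q(value, row_sum, scale):
    """Intermediate scale factor Q = (value / row_sum) * scale."""
    return value * scale / row_sum


def static_weight(frequency, neighborhood, scale):
    """Return a copy of the Frequency Table weighted by Neighborhood Table data."""
    n = len(frequency)
    weighted = [row[:] for row in frequency]
    for i in range(n):                      # item i
        row_sum = sum(frequency[i])
        if row_sum == 0:
            continue
        for p in range(n):                  # position p
            q = scale_q(frequency[i][p], row_sum, scale)
            if q == 0:
                continue
            for j in range(n):
                if j == i:
                    continue
                if p + 1 < n:               # items likely to follow item i
                    weighted[j][p + 1] += q * neighborhood[i][j]
                if p - 1 >= 0:              # items likely to precede item i
                    weighted[j][p - 1] += q * neighborhood[j][i]
    return weighted


# Cell 613 holds 3, its row sums to 5 (five selected vectors), and the scale
# factor is 5, so Q evaluates to 3 as in box 617.
q_613 = scale_q(3, 5, 5)
```

A scale factor of 0 leaves the table unchanged, which matches the statement that the neighborhood data is then ignored.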

[0044] Figure 6 is a simplified schematic diagram of a system storing program instructions which, when executed, perform the functionality described with reference to Figures 1-3. System 400 includes central processing unit (CPU) 402, memory 404, and I/O module 408, all in communication through bus 406. Memory 404 further includes statistical optimizer logic 410. It should be appreciated that statistical optimizer logic 410 may be program instructions which, when executed by CPU 402, cause the functionality described above with reference to Figures 1-3 to be performed. In another embodiment, statistical optimizer logic 410 may be embodied on a programmable logic device (PLD) synthesized through hardware description language (HDL). Of course, statistical optimizer logic 410 may be a combination of hardware and software in yet another embodiment.

[0045] With the above embodiments in mind, it should be understood that the invention may employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms such as producing, identifying, determining, or comparing.

[0046] Any of the operations described herein that form part of the invention are useful machine operations. The invention also relates to a device or an apparatus for performing these operations. The apparatus may be specially constructed for the required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

[0047] The invention can also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data, which can thereafter be read by a computer system and includes electromagnetic wave carriers. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, DVDs, magnetic tapes, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.

[0048] Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims. In the claims, elements and/or operations do not imply any particular order of operation, unless explicitly stated in the claims.

Claims

1. A method of solving NP-Complete type problems, comprising method operations of: defining a set of vectors within a NP-Complete space; performing statistical analysis on the vectors; generating a new set of vectors based on the statistical analysis; and iterating the performing and the generating until defined criteria have been achieved.

2. The method of claim 1, wherein the method operation of performing statistical analysis on the vectors includes, determining a fitness value ranking a desirability of each vector within the set.

3. The method of claim 1, wherein the method operation of performing statistical analysis on the vectors includes, sorting the vectors according to a fitness value; and selecting a portion of the sorted vectors.

4. The method of claim 3, further comprising; generating a first and second table, the first table defining a frequency of occurrence of an element at various locations within the selected portion of the sorted vectors, the second table defining inter-element adjacencies within the selected portion of the sorted vectors.

5. The method of claim 4, further comprising; modifying entries within the first table according to adjustment values derived from the second table.

6. The method of claim 1 wherein the method operation of statistical analysis is adaptive.

7. The method of claim 1, further comprising; generating a first and second table, the first table defining a frequency of occurrence of an element at various locations within the selected portion of the sorted vectors, the second table defining inter-element adjacencies within the selected portion of the sorted vectors, wherein the new set of vectors is derived from elements in the first table.

8. The method of claim 7, further comprising; modifying the first table contemporaneously with generating the new set of vectors.

9. The method of claim 8, wherein the method operation of modifying the first table contemporaneously with generating the new set of vectors is adaptive.

10. The method of claim 1, wherein the method operation of generating a new set of vectors based on the statistical analysis includes, determining a preferred order of one of assigning elements to locations or locations to elements within a vector.

11. The method of claim 10, wherein the method operation of determining a preferred order of one of assigning elements to locations or locations to elements within a vector is adaptive.

12. The method of claim 1 wherein the defined criteria are adaptive.

13. The method of claim 1, further comprising; an optional solver/optimizer; and iterating the optional solver/optimizer until defined criteria have been achieved.

14. The method of claim 1, further comprising; performing a final local optimization on the output vector.

15. The method of claim 14 wherein the method operation of performing a final local optimization on the output vector is adaptive.

16. The method of claim 1, further comprising; presenting an output vector to the user, and wherein non-genetic algorithms are used for solving the NP-complete type problems.

17. A computer readable medium having program instructions for solving NP-Complete type problems, comprising: program instructions for defining a set of vectors within a NP-Complete space; program instructions for performing statistical analysis on the vectors; program instructions for generating a new set of vectors based on the statistical analysis; and program instructions for iterating the performing and the generating until defined criteria have been achieved.

18. The computer readable medium of claim 17, wherein the program instructions for performing statistical analysis on the vectors includes, program instructions for determining a fitness value ranking a desirability of each vector within the set.

19. The computer readable medium of claim 17, wherein the program instructions for performing statistical analysis on the vectors includes, program instructions for sorting the vectors according to a fitness value; and program instructions for selecting a portion of the sorted vectors.

20. The computer readable medium of claim 19, further comprising; program instructions for generating a first and second table, the first table defining a frequency of occurrence of an element at various locations within the selected portion of the sorted vectors, the second table defining inter-element adjacencies within the selected portion of the sorted vectors.

21. The computer readable medium of claim 20, further comprising; program instructions for modifying entries within the first table according to adjustment values derived from the second table.

22. The computer readable medium of claim 17 wherein the program instructions for statistical analysis is adaptive.

23. The computer readable medium of claim 19, further comprising; program instructions for generating a first and second table, the first table defining a frequency of occurrence of an element at various locations within the selected portion of the sorted vectors, the second table defining inter-element adjacencies within the selected portion of the sorted vectors, wherein the new set of vectors is derived from elements in the first table.

24. The computer readable medium of claim 23, further comprising; program instructions for modifying the first table contemporaneously with generating the new set of vectors.

25. The computer readable medium of claim 24, wherein the program instructions for modifying the first table contemporaneously with generating the new set of vectors are adaptive.

26. The computer readable medium of claim 17, wherein the program instructions for generating a new set of vectors based on the statistical analysis includes, program instructions for determining a preferred order of one of assigning elements to locations or locations to elements within a vector.

27. The computer readable medium of claim 26, wherein the program instructions for determining a preferred order of one of assigning elements to locations or locations to elements within a vector are adaptive.

28. The computer readable medium of claim 17 wherein the defined criteria are adaptive.

29. The computer readable medium of claim 17, further comprising; program instructions for initiating an additional optimizer; and program instructions for iterating the additional optimizer until defined criteria have been achieved.

30. The computer readable medium of claim 17, further comprising; program instructions for performing a final local optimization on the output vector.

31. The computer readable medium of claim 30 wherein the program instructions for performing a final local optimization on the output vector are adaptive.

32. The computer readable medium of claim 17, further comprising; program instructions for presenting an output vector, and wherein the program instructions for generating a new set of vectors based on the statistical analysis utilize non-genetic algorithms.

33. A computing device for solving NP-Complete type problems, comprising: a processor; an input output module; a memory storing statistical optimizer logic, the statistical optimizer logic configured to cause the processor to perform a method comprising method operations of: defining a set of vectors within a NP-Complete space; performing statistical analysis on the vectors; generating a new set of vectors based on the statistical analysis; and iterating the performing and the generating until defined criteria have been achieved.

34. The computing device of claim 33, wherein the method operation of performing statistical analysis on the vectors includes, determining a fitness value ranking a desirability of each vector within the set.

35. The computing device of claim 33, wherein the method operation of performing statistical analysis on the vectors includes, sorting the vectors according to a fitness value; and selecting a portion of the sorted vectors.

36. The computing device of claim 35, further comprising; generating a first and second table, the first table defining a frequency of occurrence of an element at various locations within the selected portion of the sorted vectors, the second table defining inter-element adjacencies within the selected portion of the sorted vectors.

37. The computing device of claim 36, further comprising; modifying entries within the first table according to adjustment values derived from the second table.

38. The computing device of claim 33 wherein the method operation of statistical analysis is adaptive.

39. The computing device of claim 35, further comprising; generating a first and second table, the first table defining a frequency of occurrence of an element at various locations within the selected portion of the sorted vectors, the second table defining inter-element adjacencies within the selected portion of the sorted vectors, wherein the new set of vectors is derived from elements in the first table.

40. The computing device of claim 39, further comprising; modifying the first table contemporaneously with generating the new set of vectors.

41. The computing device of claim 40, wherein the method operation of modifying the first table contemporaneously with generating the new set of vectors is adaptive.

42. The computing device of claim 33, wherein the method operation of generating a new set of vectors based on the statistical analysis includes, determining a preferred order of assigning elements to locations or locations to elements within a vector.

43. The computing device of claim 40, wherein the method operation of determining a preferred order of assigning elements to locations or locations to elements within a vector is adaptive.

44. The computing device of claim 33 wherein the defined criteria are adaptive.

45. The computing device of claim 33, further comprising; an additional optimizer module in the memory configured to iterate until defined criteria have been achieved.

46. The computing device of claim 33, further comprising; performing a final local optimization on the output vector.

47. The computing device of claim 46 wherein the method operation of performing a final local optimization on the output vector is adaptive.

48. The apparatus of claim 33, further comprising; presenting an output vector.

49. A computer implemented method for solving a combinatorial optimization problem comprising method operations of: generating a first set of vectors providing a possible solution to the problem; generating a first table defining a frequency of occurrence of an element at various locations within a portion of the first set of vectors; generating a second table defining inter-element adjacencies within the portion of the first set of vectors; updating elements of the first table based on a corresponding value from the second table; and generating a second set of vectors providing a next solution to the problem.

50. The method of claim 49, wherein the method operation of updating elements of the first table based on a corresponding value from the second table includes, calculating a factor based on a first element of the second table; and applying the factor and the corresponding value to update a second element of the first table.

51. The method of claim 50, wherein the second element is an adjacent neighbor to the first element.

52. The method of claim 49, where the first and second tables are square arrays.

53. The method of claim 49, further comprising: updating the first table contemporaneously with generating the second set of vectors.

54. The method of claim 53, wherein the method operation of updating the first table contemporaneously with generating the second set of vectors is triggered after determining a first table count corresponding to one of a row or a column of the first table, is zero.

55. The method of claim 49, further comprising: repeating each method operation to converge on a vector set; and providing the vector set to another statistical optimizer to further converge towards an optimal solution.

56. The method of claim 55, wherein the another statistical optimizer employs a genetic algorithm.

57. The method of claim 50, wherein the factor controls a weight assigned to the second table for generating the next solution.

58. The method of claim 57, wherein as the factor is increased, the impact of the values from the second table on the next solution increases.

59. The method of claim 49 wherein the method operation of generating a second set of vectors providing a next solution to the problem includes, evaluating options to select an order of placement of a vector element within the vector.

60. A computer readable medium having program instructions for solving a combinatorial optimization problem comprising: program instructions for generating a first set of vectors providing a possible solution to the problem; program instructions for generating a first table defining a frequency of occurrence of an element at various locations within a portion of the first set of vectors; program instructions for generating a second table defining inter-element adjacencies within the portion of the first set of vectors; program instructions for updating elements of the first table based on a corresponding value from the second table; and program instructions for generating a second set of vectors providing a next solution to the problem.

61. The computer readable medium of claim 60, wherein the program instructions for updating elements of the first table based on a corresponding value from the second table includes, program instructions for calculating a factor based on a first element of the second table; and program instructions for applying the factor and the corresponding value to update a second element of the first table.

62. The computer readable medium of claim 61, wherein the second element is an adjacent neighbor to the first element.

63. The computer readable medium of claim 60, where the first and second tables are square arrays.

64. The computer readable medium of claim 60, further comprising: program instructions for updating the first table contemporaneously with generating the second set of vectors.

65. The computer readable medium of claim 64, wherein the program instructions for updating the first table contemporaneously with generating the second set of vectors is triggered after determining a first table count corresponding to one of a row or a column of the first table, is zero.

66. The computer readable medium of claim 60, further comprising: program instructions for repeating each method operation to converge on a vector set; and program instructions for providing the vector set to another statistical optimizer to further converge towards an optimal solution.

67. The computer readable medium of claim 66, wherein the another statistical optimizer employs a genetic algorithm.

68. The computer readable medium of claim 61, wherein the factor controls a weight assigned to the second table for generating the next solution.

69. The computer readable medium of claim 68, wherein as the factor is increased, the impact of the values from the second table on the next solution increases.

70. The computer readable medium of claim 60 wherein the program instructions for generating a second set of vectors providing a next solution to the problem includes, program instructions for evaluating options to select an order of placement of a vector element within the vector.