EP2656242A2 - Systems and methods for generating a cross-product matrix in a single pass through data using single pass levelization - Google Patents
- Publication number
- EP2656242A2
- Authority
- EP
- European Patent Office
- Prior art keywords
- data
- generating
- instructions
- cross
- matrices
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2237—Vectors, bitmaps or matrices
Definitions
- the technology described herein relates generally to data processing systems and more specifically to data processing systems that perform statistical analysis.
- Cross-product matrices are frequently generated by data processing systems that perform statistical analysis, such as data processing systems that use the method of least squares to fit general linear models to data.
- X'X: a dense cross-product matrix, $X'X = \sum_{i=1}^{n} x_i x_i'$
- n denotes the number of observations
- the matrix X'X is of order (p x p)
- the vector $x_i$ is of order (p x 1).
- Multi-pass algorithms to compute such matrices may be used in such non-limiting situations as when the elements of $x_i$ depend on elements in $x_j$ (where j is different from i). In these types of situations, it is customary to compute the X'X matrix in multiple passes through the data. For example, on a first pass one might compute the information necessary to subsequently construct the vector $x_i$ for any observation, and then compute the cross-product matrix on a second pass.
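When every row $x_i$ can be built from its own observation alone (for example, an intercept plus continuous variables, so p is known up front), the sum $X'X = \sum_{i=1}^{n} x_i x_i'$ can be accumulated in a single pass. The sketch below illustrates that baseline case; the function name `accumulate_xtx` is hypothetical and not taken from the patent.

```python
from typing import Iterable, List, Sequence


def accumulate_xtx(rows: Iterable[Sequence[float]], p: int) -> List[List[float]]:
    """Accumulate X'X = sum_i x_i x_i' in one pass over the rows of X.

    Only the current (p x 1) row is held in memory at any time.
    """
    xtx = [[0.0] * p for _ in range(p)]
    for x in rows:
        if len(x) != p:
            raise ValueError(f"expected {p} columns, got {len(x)}")
        # Add the outer product x_i x_i' to the running total.
        for r in range(p):
            xr = x[r]
            for c in range(p):
                xtx[r][c] += xr * x[c]
    return xtx


# Intercept column plus one continuous variable (e.g., height).
rows = [(1.0, 2.0), (1.0, 3.0), (1.0, 5.0)]
print(accumulate_xtx(rows, p=2))  # [[3.0, 10.0], [10.0, 38.0]]
```

This only works because p is fixed before the pass starts. With classification variables, the number of columns each effect occupies is not known until the levels have been seen, which is the problem the levelization steps below address.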
- Classification variables are variables whose raw values are mapped to an integer encoding.
- For example, a study of a species of fish might include a classification variable for gender with three categories: male, female, and undetermined. If a gender effect is in a statistical model regarding the study (i.e., occupies columns in the X matrix), knowledge of a number of factors would be required to construct the X matrix. Such factors might include: (i) the number of levels of the gender effect that are represented in the data; (ii) the proper order for these levels; and (iii) the position of the first column of the gender effect in the X matrix, that is, which other terms precede the gender effect in the model and how many columns they occupy.
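Factors (i) and (ii) can be illustrated with a minimal variable-levelization sketch: each distinct raw value gets the next integer level in the order it first appears in the data. A Python `dict` (which preserves insertion order) stands in for the patent's binary tree, and the name `levelize` is hypothetical.

```python
from typing import Dict, Hashable, Iterable


def levelize(values: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Map each distinct raw value to an integer level in data order.

    The first value seen gets level 0, the next new value level 1, and so on.
    """
    levels: Dict[Hashable, int] = {}
    for v in values:
        if v not in levels:
            levels[v] = len(levels)
    return levels


gender = ["female", "male", "female", "undetermined", "male"]
print(levelize(gender))  # {'female': 0, 'male': 1, 'undetermined': 2}
```

Assuming one X column per level, the gender effect here would occupy three columns, and that count is only known after every observation has been examined.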
- Systems, methods, and computer-readable storage media are provided for a data processing system having multiple executable threads that is configured to generate a cross-product matrix in a single pass through data to be analyzed.
- An example system comprises memory for receiving the data to be analyzed, one or more processors having a plurality of executable threads for executing code to analyze data, and software code for generating a cross-product matrix in a single pass through data to be analyzed.
- The software code includes threaded variable levelization code for generating a plurality of thread-specific binary trees for a plurality of classification variables
- variable tree merge code for combining a plurality of the thread-specific trees into a plurality of overall trees for the plurality of classification variables
- effect levelization code for generating a plurality of sub-matrices of the cross-product matrix using the plurality of the overall trees for the plurality of classification variables
- cross-product matrix generation code for generating the cross-product matrix by storing and ordering the elements of the sub-matrices in contiguous memory space.
- FIG. 1 is a block diagram depicting an example environment wherein users can interact with a computing environment that can perform statistical analysis.
- FIGs. 2-4 are block diagrams depicting example hardware and software components of data processing systems for generating a cross-product matrix.
- FIG. 5 is a block diagram depicting example hardware components of a data processing system that uses multiple computing threads to perform variable levelization.
- Fig. 6 is a process flow diagram depicting an example operational scenario involving a data processing system for performing variable levelization.
- Fig. 7 is a process flow diagram depicting an example operational scenario involving a data processing system for merging analysis performed by multiple computing threads.
- Fig. 8 is a process flow diagram depicting an example operational scenario involving a data processing system for performing effect levelization.
- FIG. 9 is a process flow diagram depicting an example operational scenario involving a data processing system for assembling a cross-product matrix.
- FIGs. 10-11 are block diagrams depicting example hardware and software components of data processing systems for continuously generating a cross-product matrix as data are streamed.
- Fig. 12 is a block diagram depicting example hardware components and example data flow in a data processing system that performs variable levelization.
- Fig. 13 is a process flow diagram depicting an example data flow in a process for merging analysis performed by two computing threads.
- FIGs. 14-15 are process flow diagrams depicting an example data flow in a process for performing effect levelization.
- Fig. 16 is a process flow diagram depicting an example data flow in a process for assembling a cross-product matrix.
- Fig. 1 depicts at 30 a computing environment for processing large amounts of data for many different types of applications, such as for scientific, technical or business applications.
- One or more user computers 32 can interact with the computing environment 30 through a number of ways, including a network 34.
- One or more data stores 36 may be coupled to the computing environment 30 to store data to be processed by the computing environment 30 as well as to store any intermediate or final data generated by the computing environment.
- An example application for the computing environment 30 involves the performance of statistical analysis. Frequently, in statistical analysis, models for sets of data are generated, and cross-product matrices ("X'X") are generated during the modeling process by the data processing systems in the computing environment 30 that perform statistical analysis. The models involve variables and the effects of those variables reflected in the data.
- Effects in the context of X'X formation are linear mathematical structures—that is, an effect is associated with certain columns of the X matrix. Except for specially defined tokens and keywords (like "Intercept"), effects depend on variables. An effect typically includes one or more variables that contribute to the effect.
- a continuous variable is a numeric variable and the raw values of the variable are used in constructing the effects.
- the heights and weights of subjects are continuous variables.
- a classification variable is a numeric or character variable whose raw values are used indirectly in forming the effect contribution.
- the values of a classification variable are called levels.
- the classification variable Sex has the levels "male” and "female.”
- the values of the classification variable are mapped to integer values that represent levels of the variable.
- the process of mapping the values of the classification variable to a level is referred to herein as variable levelization.
- These classification levels of the variables are then used to define the levels of the effect.
- the process of mapping the levels of the effect is referred to herein as effect levelization.
- The levels of the effect are typically the levels of the classification variable, unless none of the observations associated with a particular level of the variable is usable in the analysis.
- For effects that involve more than one classification variable, the levels of the effect depend on the levels of the classification variables that occur together in the data.
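The difference between a main effect and an effect built from several classification variables can be sketched as follows: effect levels are the combinations of variable values that actually co-occur in the data, numbered in data order. Keying on raw values rather than on variable levels is a simplifying assumption (the two map one-to-one), and `effect_levels` is a hypothetical name.

```python
from typing import Dict, Hashable, Sequence, Tuple


def effect_levels(
    observations: Sequence[Dict[str, Hashable]], variables: Sequence[str]
) -> Dict[Tuple[Hashable, ...], int]:
    """Assign effect levels to the combinations of variable values that
    occur together in the data, numbered in order of first appearance.

    A main effect passes one variable; an interaction passes several.
    """
    levels: Dict[Tuple[Hashable, ...], int] = {}
    for obs in observations:
        key = tuple(obs[v] for v in variables)
        if key not in levels:
            levels[key] = len(levels)
    return levels


data = [
    {"Gender": "F", "Drug": "A"},
    {"Gender": "M", "Drug": "B"},
    {"Gender": "F", "Drug": "A"},
    {"Gender": "M", "Drug": "A"},
]
print(effect_levels(data, ["Gender"]))          # {('F',): 0, ('M',): 1}
print(effect_levels(data, ["Gender", "Drug"]))  # {('F', 'A'): 0, ('M', 'B'): 1, ('M', 'A'): 2}
```

The combination ("F", "B") never occurs, so it does not become a level of the interaction effect.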
- the computing environment 30 includes a data processing system that can perform variable and effect levelization in a single pass through the data.
- Fig. 2 depicts an example data processing system for constructing an X'X Matrix 100 in a single pass through data that includes classification variable data.
- the example data processing system includes one or more data processors (not shown) having a number of execution threads that are capable of independently performing data analysis steps, a data buffer 102 for receiving data from a data store 36, and a single pass levelization engine 110.
- The single pass levelization engine 110 in this example includes a threaded variable levelization software component or code 112, a variable tree merge software component or code 114, an effect levelization software component or code 116, decision instructions 117, and an X'X matrix assembly software component or cross-product matrix generation code 118.
- the single pass levelization engine 110 can generate an X'X matrix 100 in a single pass through the data in the data buffer 102.
- one or more execution threads execute instructions from the threaded variable levelization software component 112.
- The processing time for the threaded variable levelization software component or code may be reduced on multi-core or hyper-threaded platforms because multiple threads may execute concurrently.
- The performance gain from multi-threaded execution should outweigh the computational expense of merging the thread-specific trees into an overall tree at the end of the variable levelization. The results generated by the threaded variable levelization software component 112 are provided as input to the variable tree merge software component 114.
- the results generated from executing the instructions from the variable tree merge software component 114 are in turn provided as input to the effect levelization software component 116.
- Decision instructions 117 are executed which determine whether additional data to be processed exists in the data buffer 102 before proceeding to assemble an X'X matrix. If additional data exists, data are read from the data buffer 102 and control of the process is returned to the threaded variable levelization software component 112. If no additional data exists, then the results generated from executing the instructions from the effect levelization software component 116 are provided to the X'X matrix assembly software component 118, which assembles an X'X matrix 100.
- FIG. 3 depicts, in more detail, an example data processing system for constructing an X'X Matrix 100 in a single pass through data that includes classification variable data.
- This example data processing system also includes a data processor (not shown) having a number of execution threads that are capable of independently performing data analysis steps, a data buffer 102 for receiving data from a data store 36, and a single pass levelization software component 110.
- the single pass levelization engine 110 in this example includes a plurality of threaded variable levelization software sub-components 112a - 112n, a variable tree merge software component 114, an effect levelization software component 116, decision instructions 117 and an X'X matrix assembly software component 118.
- Variable levelization components can build up the information needed for downstream steps, such as the formation of the X'X matrix, one buffer at a time. It is not necessary to hold the entire data set in memory.
- processor instructions 120 are provided for reading a new buffer of data from the data buffer 102 and inputting that data to the threaded variable levelization software sub-components 112a - 112n.
- Each threaded variable levelization software sub-component is executed by a separate processor execution thread, which results in the generation of thread-specific binary trees 122a - 122n that describe characteristics of classification variables found in the data. Forming separate thread-specific binary trees has the advantage that the tree in one thread can be formed independently of the trees in other threads. If a common tree were formed (in stage 122 of Fig. 3), then the tree would have to be locked every time a thread wanted to add a new value to it.
- This locking would essentially serialize the work and reduce the advantage gained by allowing the threads to operate on data independently.
- these thread-specific binary trees 122a - 122n are combined by the variable tree merge software component 114 to generate overall binary trees 124 for each classification variable.
- The binary trees 124 are processed by the effect levelization software component 116, which generates partial sub-matrices 126a - 126m of the overall cross-product matrix using the overall binary trees 124.
- Decision instructions 117 are executed which determine whether additional data to be processed exists in the data buffer 102 before proceeding to assemble an X'X matrix. If additional data exists, data are read from the data buffer 102 and control of the process is returned to processor instructions 120. If no additional data exists, then the partial sub-matrices 126a - 126m are provided to the X'X matrix assembly software component 118, which assembles an X'X matrix 100.
- Storing the components of the eventual X'X matrix in partial sub-matrices offers several advantages. When the matrices are stored in separate computer memory, it is easy to add rows and columns to the sub-matrices, whereas it is more complicated to insert rows or columns into a matrix. The algorithm builds the sub-matrices in the order in which the unique values of the variables appear in the data. If a new data buffer is fetched, the new values will lead to adding rows and columns to the sub-matrices, but will not lead to an insertion of rows or columns.
- Fig. 4 depicts another example data processing system for constructing an X'X matrix 100 in a single pass through data that includes classification variable data.
- This example data processing system contains elements similar to the example system depicted in Fig. 3.
- processor execution thread 0 executes the effect levelization software sub-component 116a
- processor execution thread 1 executes the effect levelization software sub-component 116b
- processor execution thread n executes the effect levelization software sub-component 116n.
- Each processor execution thread executes instructions that result in the generation of one or more sub-matrices of the overall X'X matrix.
- Fig. 5 depicts an example system and Fig. 6 depicts an example process for generating thread-specific trees 122a-122c using the threaded variable levelization software component 112 (shown in Figs. 2-4).
- The process commences (step 200 in Fig. 6) with a buffer 100 (Fig. 5) of raw data containing k observations being passed to the levelization code. If levelization is conducted in multiple threads, the buffer memory 100 is apportioned to the threads 130a-130c (Fig. 5) in such a way that each thread processes approximately the same number of observations. In this example, the levelization is conducted with three threads 130a-130c, and each thread processes approximately 1/3 of the observations. This apportioning includes setting each thread's read pointer to the correct position in the buffer 100.
- Each thread 130a -130c examines each row in the assigned buffer area 132a - 132c (step 202) and determines whether the observation is used for the analysis (step 204). If the observation is to be used, the unique raw values for each variable are treed in a binary tree 122a - 122c that also contains auxiliary information on each tree node (step 206). Whenever a new raw value is found (step 208), a formatted value is derived (step 210), the observation number in the overall application is derived (step 212), and the frequency with which the value has occurred is updated (step 214).
- In a variation of step 206, the formatted values are derived for each observation regardless of the raw value, and step 208 is bypassed.
- Each observation used in the analysis is mapped to a formatted value but a new formatted value is not derived for each unique raw value. This variation is useful when the number of raw values is far greater than the number of formatted values; for example, when a continuous variable is grouped into intervals.
- After the assigned row of data has been read and processed, a check is made to determine whether additional assigned rows of data exist that have not been processed (step 216). If yes, then the next row of data is read (step 218) and examined (step 202). If no, then the thread-specific binary trees for each classification variable are complete (step 220).
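The per-thread loop of Fig. 6 can be sketched as below, under several assumptions: a `dict` stands in for each thread's binary tree, a missing value (`None`) is taken to mean the observation is not used (step 204), and the names `Node`, `split_rows`, `levelize_slice`, and `threaded_levelization` are all hypothetical. Each worker writes only to its own dictionary, which is the lock-free property the thread-specific trees are meant to provide.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple


@dataclass
class Node:
    """Auxiliary information kept for each unique raw value (steps 210-214)."""
    formatted: str  # formatted value, derived once when the raw value is first seen
    first_obs: int  # observation number in the overall data set
    freq: int = 0   # how often the value has occurred so far


def split_rows(k: int, n_threads: int) -> List[Tuple[int, int]]:
    """Apportion k rows so that per-thread counts differ by at most one."""
    base, extra = divmod(k, n_threads)
    bounds: List[Tuple[int, int]] = []
    start = 0
    for t in range(n_threads):
        end = start + base + (1 if t < extra else 0)
        bounds.append((start, end))  # (read pointer, end) for thread t
        start = end
    return bounds


def levelize_slice(
    column: Sequence[Optional[Hashable]],
    start: int,
    end: int,
    fmt: Callable[[Hashable], str] = str,
) -> Dict[Hashable, Node]:
    """Build one thread-private tree for rows [start, end)."""
    tree: Dict[Hashable, Node] = {}
    for obs in range(start, end):
        raw = column[obs]
        if raw is None:  # assumption: a missing value makes the observation unusable
            continue
        node = tree.get(raw)
        if node is None:  # a new raw value: derive its formatted value
            node = Node(formatted=fmt(raw), first_obs=obs)
            tree[raw] = node
        node.freq += 1
    return tree


def threaded_levelization(
    column: Sequence[Optional[Hashable]], n_threads: int = 3
) -> List[Dict[Hashable, Node]]:
    """Each worker fills its own dict, so no lock is ever taken."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [
            pool.submit(levelize_slice, column, start, end)
            for start, end in split_rows(len(column), n_threads)
        ]
        return [f.result() for f in futures]


drug = ["A", "B", "A", None, "C", "B", "A"]
trees = threaded_levelization(drug, n_threads=3)
# Thread 0 saw rows 0-2, thread 1 rows 3-4, thread 2 rows 5-6.
```

In CPython the global interpreter lock limits the speedup for this CPU-bound loop, so the sketch shows the data layout rather than the performance gain; a native implementation would be needed to realize the concurrency benefit described above.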
- Fig. 7 depicts an example process for generating an overall tree 124 for each classification variable using the variable tree merge software component 114. After all of the threads 130a - 130c (Fig. 5) have completed treeing the observations in their buffer, the thread- specific trees 122a - 122c for each classification variable are combined into an overall tree 124.
- Multiple ways can be used to accomplish this, such as accumulating the trees into the tree constructed by the first thread.
- the overall trees for each classification variable retain information regarding the order in which the raw/formatted values were seen.
- the associated level of the variable corresponds to the data order, i.e., variable levels are organized by the order in which they appear in the data.
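A merge that keeps data order can be sketched by accumulating every thread's tree into the first thread's tree, keeping the smallest observation number and summing frequencies, and then numbering levels by first appearance. The representation (raw value mapped to `(first observation number, frequency)`) and the name `merge_trees` are assumptions; observation numbers must be global, not thread-local, for the ordering to be correct.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical representation: raw value -> (first observation number, frequency).
ThreadTree = Dict[Hashable, Tuple[int, int]]


def merge_trees(trees: List[ThreadTree]) -> Dict[Hashable, Tuple[int, int, int]]:
    """Combine thread-specific trees into one overall tree.

    Returns raw value -> (level, first observation number, total frequency),
    with levels assigned in order of first appearance in the whole data set.
    """
    # Accumulate into (a copy of) the first thread's tree.
    combined: Dict[Hashable, Tuple[int, int]] = dict(trees[0]) if trees else {}
    for tree in trees[1:]:
        for value, (first_obs, freq) in tree.items():
            if value in combined:
                old_first, old_freq = combined[value]
                combined[value] = (min(old_first, first_obs), old_freq + freq)
            else:
                combined[value] = (first_obs, freq)
    # Data order: sort by the earliest observation at which each value appeared.
    ordered = sorted(combined.items(), key=lambda item: item[1][0])
    return {value: (level, first, freq)
            for level, (value, (first, freq)) in enumerate(ordered)}


thread1 = {"A": (0, 2), "B": (1, 1)}              # rows 0-4
thread2 = {"B": (5, 1), "C": (6, 1), "A": (8, 1)}  # rows 5-8, saw "B" before "A"
print(merge_trees([thread1, thread2]))
# {'A': (0, 0, 3), 'B': (1, 1, 2), 'C': (2, 6, 1)}
```

Thread 2 met "B" before "A", but "A" appears earlier in the full data set, so it receives the lower overall level.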
- Fig. 8 depicts an example process for performing effect levelization.
- the overall trees for each classification variable (230) generated by the variable tree merge software component are used to determine the levels for each effect (step 232). Because the variable levels were organized by the order in which they appeared in the data, the effect levels will also be organized by the order in which they appear in the data (step 232).
- Partial sub-matrices of the overall X'X matrix are constructed (step 234).
- Each of the sub-matrices is stored separately in memory, and as additional levels are found in the data, new rows and columns can be added to the end of the used memory space allocated to the sub-matrices.
- For example, a sub-matrix C may be a 3x3 matrix after processing a certain number of observations and become a 4x4 matrix after processing the next observation.
- The information added to the 4th row and 4th column is stored in the memory space allocated to the sub-matrix C after the information that makes up the first three rows and columns of sub-matrix C.
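One layout with this append-only property, offered as an assumption rather than the patent's stated format, is to store a symmetric diagonal sub-matrix as its lower triangle packed row by row. Growing from n x n to (n+1) x (n+1) then appends exactly n+1 values and never moves existing elements. The class name `PackedSymmetric` is hypothetical, and a Python list stands in for the allocated memory space.

```python
from typing import List


class PackedSymmetric:
    """Symmetric matrix stored as its lower triangle, row by row, in one flat list.

    Row r occupies positions r*(r+1)//2 .. r*(r+1)//2 + r, so adding a new
    last row/column only appends r+1 values at the end.
    """

    def __init__(self) -> None:
        self.n = 0
        self.data: List[float] = []

    @staticmethod
    def _index(r: int, c: int) -> int:
        if c > r:  # symmetry: (r, c) and (c, r) share one slot
            r, c = c, r
        return r * (r + 1) // 2 + c

    def add_level(self) -> None:
        """Grow from n x n to (n+1) x (n+1) by appending the new row."""
        self.data.extend([0.0] * (self.n + 1))
        self.n += 1

    def add(self, r: int, c: int, value: float) -> None:
        self.data[self._index(r, c)] += value

    def get(self, r: int, c: int) -> float:
        return self.data[self._index(r, c)]


m = PackedSymmetric()
for _ in range(3):
    m.add_level()          # sub-matrix C is now 3x3 (6 stored values)
m.add(2, 0, 5.0)
before = list(m.data)
m.add_level()              # now 4x4: row 4 appended after the first three rows
m.add(3, 1, 7.0)
```

After the fourth level is added, the first six stored values are unchanged, which matches the behavior described for sub-matrix C. Off-diagonal (rectangular) sub-matrices would need a different growth scheme that this sketch does not cover.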
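- The partial sub-matrices can be assembled as illustrated in the following example. If, for example, there are three effects in a model, E1, E2, and E3, the X'X matrix can be constructed from six sub-matrices in accordance with the following table (X'X is symmetric, so each block below the diagonal is the transpose of the corresponding block shown):

Effect | E1 | E2 | E3 |
---|---|---|---|
E1 | X'E1XE1 | X'E1XE2 | X'E1XE3 |
E2 | | X'E2XE2 | X'E2XE3 |
E3 | | | X'E3XE3 |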
- The position of the diagonal sub-matrix X'E2XE2 cannot be determined without knowing the dimension of the X'E1XE2 sub-matrix (or at least without knowing the number of levels in effect E1).
- A new level of effect E2 will lead to the addition of a new row/column at the end of the X'E2XE2 sub-matrix.
- The effect levelization software component maintains the sub-matrices of the X'X matrix in non-contiguous memory and adds rows and columns to the end as new levels are encountered.
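The dependence on E1's level count can be made concrete: the first column of each effect is the running total of the level counts of the effects that precede it in the model. The helper name `block_offsets` is hypothetical.

```python
from itertools import accumulate
from typing import Dict


def block_offsets(levels_per_effect: Dict[str, int]) -> Dict[str, int]:
    """First column of each effect in X'X, given effects in model order."""
    names = list(levels_per_effect)
    # Prefix sums of the level counts, shifted so the first effect starts at 0.
    starts = [0, *accumulate(levels_per_effect[n] for n in names)][:-1]
    return dict(zip(names, starts))


print(block_offsets({"E1": 2, "E2": 3, "E3": 2}))  # {'E1': 0, 'E2': 2, 'E3': 5}
print(block_offsets({"E1": 3, "E2": 3, "E3": 2}))  # {'E1': 0, 'E2': 3, 'E3': 6}
```

A single new level of E1 shifts the start of every later effect, so writing directly into one dense X'X during the pass would require inserting rows and columns. Keeping separate sub-matrices defers this layout decision until assembly.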
- the sub-matrices are sparse and the effect levelization software component causes the sub-matrices to be stored sparsely in such a way that the memory can be easily grown, for example, by maintaining rows of symmetric or rectangular matrices in balanced binary sub-trees.
- After the partial sub-matrices have been constructed, a check is made to determine whether a new buffer of data is available for analysis (step 236). If a new buffer of data is received, the process begins again with the threaded levelization of the variables (step 238). If it is determined that all data have been received, the X'X matrix can be assembled in a multi-step process (step 236).
- Illustrated in Fig. 9 is an example process for assembling an X'X matrix from the partial sub-matrices.
- The elements of the partial sub-matrices (242) generated by the effect levelization process are reordered. This involves determining the effect level ordering based on a request by the client application that initiated the data analysis (step 244) and reordering the levels of the classification variables and the effects to comply with the requested ordering specified by the client application. If the client application requested that variables be arranged in data order, then reordering is not necessary. If, however, the client application specified a different ordering, the variable trees and the effect trees must be suitably re-arranged to match the specified order.
- The X'X matrix 248 is formed in dense form by copying elements of the sub-matrices into the correct level-order position of the overall X'X matrix.
- the X'X matrix can be formed in one pass through the data in the data buffer.
- the X'X matrix 248 could be assembled into a sparse matrix using any number of methods for sparse matrix representation.
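Dense assembly with reordering can be sketched as a copy with a permutation per effect. The assumptions here: each stored block `blocks[(Ei, Ej)]` holds X'EiXEj for Ei at or after Ej in model order, with rows and columns in data order, and `order[E][k]` names the data-order level that the client wants at output position k. The function name `assemble_xtx` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Block = List[List[float]]


def assemble_xtx(
    effects: Sequence[str],
    blocks: Dict[Tuple[str, str], Block],
    order: Dict[str, List[int]],
) -> List[List[float]]:
    """Copy lower-triangular sub-matrices into a dense, symmetric X'X,
    permuting each effect's levels into the requested order."""
    offsets: Dict[str, int] = {}
    size = 0
    for e in effects:
        offsets[e] = size
        size += len(order[e])
    xtx = [[0.0] * size for _ in range(size)]
    for i, ei in enumerate(effects):
        for ej in effects[: i + 1]:
            block = blocks[(ei, ej)]
            for r, src_r in enumerate(order[ei]):
                for c, src_c in enumerate(order[ej]):
                    v = block[src_r][src_c]
                    xtx[offsets[ei] + r][offsets[ej] + c] = v
                    xtx[offsets[ej] + c][offsets[ei] + r] = v  # mirror by symmetry
    return xtx


# Hypothetical example: intercept I plus Gender G, whose levels appeared as M, F.
blocks = {
    ("I", "I"): [[5.0]],
    ("G", "I"): [[3.0], [2.0]],           # rows: M, F (data order)
    ("G", "G"): [[3.0, 0.0], [0.0, 2.0]],
}
order = {"I": [0], "G": [1, 0]}           # client requests F before M
xtx = assemble_xtx(["I", "G"], blocks, order)
```

When the client asks for data order, every `order[E]` is the identity permutation and the copy places blocks unchanged; only the offsets computed from the final level counts matter.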
- Figs. 10 and 11 depict additional example systems for forming an X'X matrix in a single pass through data.
- data may be streamed to the system.
- the system can generate an X'X matrix in a manner similar to that described in the prior examples.
- the example systems of Figs. 10 and 11, additionally, can re-compute the X'X matrix if additional data are received after the X'X matrix is initially generated.
- These example systems have instructions 140 that cause these systems to periodically check for new data in the data buffer 102. If new data are found, the new data are read, the thread-specific trees are updated, the overall trees are updated, and the previously generated sub-matrices are updated. Because the sub-matrices formed in the effect levelization process are maintained in non-contiguous memory spaces and the levels are maintained in data order, as new data are processed the rows and columns of the sub-matrices can be updated and new rows and columns can be added to the end to reflect the new data. After the sub-matrices are updated, the elements of the partial sub-matrices are reordered, if necessary.
- the X'X matrix 100 is re-formed in dense or sparse form by copying elements of the sub-matrices into the correct level-order position in the overall X'X matrix.
- the X'X matrix 100 can continuously be re-formed in a single pass as new data are streamed to the data buffer 102.
- Fig. 12 depicts an example data flow in a threaded variable levelization process.
- an example table 300 containing nine observations is provided to the buffer memory 100.
- the nine observations include data relating to the classification variables Gender and Drug, and a response variable (Y).
- The nine observations in this example are allocated to two threads. The first five observations are allocated to the first thread, as shown at 302a, and the last four observations are allocated to the second thread, as shown at 302b.
- A model intercept, which has been added in this example, is represented by a column of 1s.
- When the threaded variable levelization process is applied, especially by variable levelization software sub-components 112 [and/or components 112a - 112n], each thread generates thread-specific binary trees, in the form of a Drug tree and a Gender tree, from the observations assigned to it, as illustrated at 304a-d.
- the level encoding is separate for each thread and the order of the levels for each thread is the order in which the levels were encountered by the particular thread.
- the tables shown at 304a-d represent the information that is stored and managed in binary trees by the code.
- The thread-specific trees 304a-d are merged, especially by the variable tree merge software component 114, into overall binary trees in the form of one overall tree 306a-b for each classification variable, as illustrated in Fig. 13.
- the order for the levels in the overall trees 306 a-b is in data order with respect to the entire data set.
- The two threads assigned different levels to the value "A" of the Drug variable. Because the "A" value had a lower observation number in thread 1 than the "B" value in thread 2, the value "A" was assigned to a lower level in the overall tree for the Drug variable.
- Figs. 14 and 15 continue the example data flow through the effect levelization stage, especially by effect levelization software component 116 and/or components 116a, 116b, 116n.
- the effect trees generated in the effect levelization process have the same number of levels as the variable trees as illustrated at 308.
- An X'X matrix can be generated, for example by matrix assembly software component 118 and/or in step 246, from 10 sub-matrices: X'IXI, X'GXI, X'DXI, X'YXI, X'GXG, X'DXG, X'YXG, X'DXD, X'YXD, and X'YXY, which occupy the locations in the X'X matrix specified at 310.
- each of the 10 sub-matrices is generated in non-contiguous memory to allow them to grow as needed.
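Per-observation accumulation into the 10 blocks of a main-effects model (intercept, two classification variables, response) can be sketched with one sparse dictionary per block, which grows naturally when a new level appears. The data below are hypothetical and are not the nine observations of table 300; the dictionary-of-dictionaries storage is an assumed stand-in for the non-contiguous memory or balanced binary sub-trees described above, and `accumulate_blocks` is a hypothetical name.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Sequence, Tuple

Entry = Tuple[int, float]  # (column within the effect, value)
SparseBlock = DefaultDict[Tuple[int, int], float]


def accumulate_blocks(
    observations: Sequence[Dict[str, object]],
    class_vars: Sequence[str],
    response: str,
) -> Tuple[Dict[Tuple[str, str], SparseBlock], Dict[str, Dict[object, int]]]:
    """Accumulate each observation's contribution to the lower-triangular
    blocks X'_{Ei} X_{Ej} for intercept, class variables, then response."""
    effects = ["Intercept", *class_vars, response]
    levels: Dict[str, Dict[object, int]] = {v: {} for v in class_vars}
    blocks: Dict[Tuple[str, str], SparseBlock] = defaultdict(lambda: defaultdict(float))
    for obs in observations:
        row: Dict[str, List[Entry]] = {
            "Intercept": [(0, 1.0)],
            response: [(0, float(obs[response]))],
        }
        for v in class_vars:
            # A new value gets the next level number, so levels stay in data order.
            level = levels[v].setdefault(obs[v], len(levels[v]))
            row[v] = [(level, 1.0)]  # one nonzero column per main effect
        for i, ei in enumerate(effects):
            for ej in effects[: i + 1]:
                for r, a in row[ei]:
                    for c, b in row[ej]:
                        blocks[(ei, ej)][(r, c)] += a * b
    return blocks, levels


data = [
    {"Gender": "F", "Drug": "A", "Y": 2.0},
    {"Gender": "M", "Drug": "B", "Y": 3.0},
    {"Gender": "F", "Drug": "B", "Y": 4.0},
]
blocks, levels = accumulate_blocks(data, ["Gender", "Drug"], "Y")
```

With four terms there are 4 x 5 / 2 = 10 lower-triangular blocks, matching the count given for the example. For instance, the (Y, Gender) block holds the sum of Y for each gender level.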
- For each of the 9 observations, its contributions to the various sub-matrices are accumulated. If additional levels are detected in later observations, the sub-matrices can be expanded, because they are stored in non-contiguous memory that allows more rows and columns to be added as needed when new levels of an effect are detected. If the sub-matrices were …
- the final X'X can be constructed in contiguous memory, for example in step 246.
- the effect levels must be determined based on the order specified by the client application (step 244). In this example the correct order of the class variable levels is provided in the last column of the table at 314.
- the model contains only main effects (no interactions) and the effect level ordering is the same as the variable ordering.
- the elements of the sub-matrices are permuted into the correct location within the X'X matrix as shown in the example at 316.
- an X'X matrix can be generated as illustrated by the aforementioned examples.
- the methods and systems described herein may be implemented on many different types of processing devices by program code comprising program instructions that are executable by the device processing subsystem.
- the software program instructions may include source code, object code, machine code, or any other stored data that is operable to cause a processing system to perform the methods and operations described herein.
- Other implementations may also be used, however, such as firmware or even appropriately designed hardware configured to carry out the methods and systems described herein.
- the systems' and methods' data may be stored and implemented in one or more different types of computer-implemented data stores, such as different types of storage devices and programming constructs (e.g., RAM, ROM, Flash memory, flat files, databases, programming data structures, programming variables, IF-THEN (or similar type) statement constructs, etc.).
- data structures describe formats for use in organizing and storing data in databases, programs, memory, or other computer-readable media for use by a computer program.
- a module or processor includes but is not limited to a unit of code that performs a software operation, and can be implemented for example as a subroutine unit of code, or as a software function unit of code, or as an object (as in an object-oriented paradigm), or as an applet, or in a computer script language, or as another type of computer code.
- the software components and/or functionality may be located on a single computer or distributed across multiple computers depending upon the situation at hand.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computing Systems (AREA)
- Algebra (AREA)
- Complex Calculations (AREA)
- Stored Programmes (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/972,840 US8996518B2 (en) | 2010-12-20 | 2010-12-20 | Systems and methods for generating a cross-product matrix in a single pass through data using single pass levelization |
PCT/US2011/064340 WO2012087629A2 (en) | 2010-12-20 | 2011-12-12 | Systems and methods for generating a cross-product matrix in a single pass through data using single pass levelization |
Publications (2)
Publication Number | Publication Date |
---|---|
EP2656242A2 (en) | 2013-10-30 |
EP2656242B1 (en) | 2018-08-22 |
Family
ID=45496260
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP11808992.9A Active EP2656242B1 (en) | 2010-12-20 | 2011-12-12 | Systems and methods for generating a cross-product matrix in a single pass through data using single pass levelization |
Country Status (6)
Country | Link |
---|---|
US (2) | US8996518B2 (en) |
EP (1) | EP2656242B1 (en) |
CN (1) | CN103262068B (en) |
CA (1) | CA2818905C (en) |
ES (1) | ES2691417T3 (en) |
WO (1) | WO2012087629A2 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8996518B2 (en) | 2010-12-20 | 2015-03-31 | Sas Institute Inc. | Systems and methods for generating a cross-product matrix in a single pass through data using single pass levelization |
US9805001B2 (en) * | 2016-02-05 | 2017-10-31 | Google Inc. | Matrix processing apparatus |
US9898441B2 (en) * | 2016-02-05 | 2018-02-20 | Google Llc | Matrix processing apparatus |
US20180298520A1 (en) * | 2017-04-17 | 2018-10-18 | Nanjing University | Self-limited organic molecular beam epitaxy for precisely growing ultrathin C8-BTBT, PTCDA and their heterojunctions on surface |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6708163B1 (en) * | 1999-02-24 | 2004-03-16 | Hillol Kargupta | Collective data mining from distributed, vertically partitioned feature space |
US6795815B2 (en) * | 2000-12-13 | 2004-09-21 | George Guonan Zhang | Computer based knowledge system |
ATE321422T1 (en) * | 2001-01-09 | 2006-04-15 | Metabyte Networks Inc | SYSTEM, METHOD AND SOFTWARE FOR PROVIDING TARGETED ADVERTISING THROUGH USER PROFILE DATA STRUCTURE BASED ON USER PREFERENCES |
EP1461719A4 (en) | 2001-12-04 | 2007-11-07 | Powerllel Corp | Parallel computing system, method and architecture |
US7657540B1 (en) * | 2003-02-04 | 2010-02-02 | Seisint, Inc. | Method and system for linking and delinking data records |
US8032635B2 (en) | 2005-07-29 | 2011-10-04 | Sap Ag | Grid processing in a trading network |
US20070118839A1 (en) | 2005-10-24 | 2007-05-24 | Viktors Berstis | Method and apparatus for grid project modeling language |
US7974221B2 (en) * | 2006-01-24 | 2011-07-05 | Brown University | Efficient content authentication in peer-to-peer networks |
US7680765B2 (en) * | 2006-12-27 | 2010-03-16 | Microsoft Corporation | Iterate-aggregate query parallelization |
US8250550B2 (en) * | 2007-02-14 | 2012-08-21 | The Mathworks, Inc. | Parallel processing of distributed arrays and optimum data distribution |
US8380778B1 (en) * | 2007-10-25 | 2013-02-19 | Nvidia Corporation | System, method, and computer program product for assigning elements of a matrix to processing threads with increased contiguousness |
EP2165260A1 (en) | 2008-05-19 | 2010-03-24 | The Mathworks, Inc. | Parallel processing of distributed arrays |
US8180177B1 (en) * | 2008-10-13 | 2012-05-15 | Adobe Systems Incorporated | Seam-based reduction and expansion of images using parallel processing of retargeting matrix strips |
US8996518B2 (en) | 2010-12-20 | 2015-03-31 | Sas Institute Inc. | Systems and methods for generating a cross-product matrix in a single pass through data using single pass levelization |
2010
- 2010-12-20 US: US12/972,840 (US8996518B2), active
2011
- 2011-12-12 CA: CA2818905A (CA2818905C), active
- 2011-12-12 WO: PCT/US2011/064340 (WO2012087629A2), application filing
- 2011-12-12 ES: ES11808992.9T (ES2691417T3), active
- 2011-12-12 CN: CN201180060589.9A (CN103262068B), active
- 2011-12-12 EP: EP11808992.9A (EP2656242B1), active
2015
- 2015-02-12 US: US14/620,892 (US9798755B2), active
Also Published As
Publication number | Publication date |
---|---|
WO2012087629A3 (en) | 2013-03-28 |
US20150154238A1 (en) | 2015-06-04 |
US9798755B2 (en) | 2017-10-24 |
WO2012087629A2 (en) | 2012-06-28 |
EP2656242B1 (en) | 2018-08-22 |
CA2818905C (en) | 2015-03-17 |
US8996518B2 (en) | 2015-03-31 |
CN103262068B (en) | 2016-10-12 |
CA2818905A1 (en) | 2012-06-28 |
ES2691417T3 (en) | 2018-11-27 |
CN103262068A (en) | 2013-08-21 |
US20120159489A1 (en) | 2012-06-21 |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Original code: 0009012 |
17P | Request for examination filed | Effective date: 2013-06-27 |
AK | Designated contracting states | Kind code of ref document: A2. Designated states: AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
DAX | Request for extension of the European patent (deleted) | |
REG | Reference to a national code | HK: legal event code DE, ref document number 1190807 |
17Q | First examination report despatched | Effective date: 2016-10-12 |
STAA | Information on the status of an EP patent application or granted EP patent | Status: examination is in progress |
GRAP | Despatch of communication of intention to grant a patent | Original code: EPIDOSNIGR1 |
STAA | Information on the status of an EP patent application or granted EP patent | Status: grant of patent is intended |
RIC1 | Information provided on IPC code assigned before grant | G06F 17/30 20060101ALI20180403BHEP; G06F 17/16 20060101AFI20180403BHEP; G06F 9/50 20060101ALI20180403BHEP |
INTG | Intention to grant announced | Effective date: 2018-04-23 |
GRAS | Grant fee paid | Original code: EPIDOSNIGR3 |
GRAA | (Expected) grant | Original code: 0009210 |
STAA | Information on the status of an EP patent application or granted EP patent | Status: the patent has been granted |
AK | Designated contracting states | Kind code of ref document: B1. Designated states: AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
REG | Reference to a national code | GB: legal event code FG4D |
REG | Reference to a national code | CH: legal event code EP |
REG | Reference to a national code | AT: legal event code REF, ref document number 1033290 (kind code T), effective date 2018-09-15 |
REG | Reference to a national code | IE: legal event code FG4D |
REG | Reference to a national code | DE: legal event code R096, ref document number 602011051368 |
REG | Reference to a national code | ES: legal event code FG2A, ref document number 2691417 (kind code T3), effective date 2018-11-27 |
REG | Reference to a national code | NL: legal event code FP |
REG | Reference to a national code | SE: legal event code TRGR |
REG | Reference to a national code | LT: legal event code MG4D |
PG25 | Lapsed in a contracting state (announced via post-grant information from national office to EPO) | Failure to submit a translation of the description or to pay the fee within the prescribed time-limit: LT, RS, FI (2018-08-22); IS (2018-12-22); GR (2018-11-23); BG, NO (2018-11-22) |
REG | Reference to a national code | AT: legal event code MK05, ref document number 1033290 (kind code T), effective date 2018-08-22 |
PG25 | Lapsed in a contracting state (announced via post-grant information from national office to EPO) | Failure to submit a translation of the description or to pay the fee within the prescribed time-limit: HR, LV, AL (2018-08-22) |
PG25 | Lapsed in a contracting state (announced via post-grant information from national office to EPO) | Failure to submit a translation of the description or to pay the fee within the prescribed time-limit: RO, IT, CZ, AT, PL, EE (2018-08-22) |
REG | Reference to a national code | DE: legal event code R097, ref document number 602011051368 |
PG25 | Lapsed in a contracting state (announced via post-grant information from national office to EPO) | Failure to submit a translation of the description or to pay the fee within the prescribed time-limit: DK, SM, SK (2018-08-22) |
PLBE | No opposition filed within time limit | Original code: 0009261 |
STAA | Information on the status of an EP patent application or granted EP patent | Status: no opposition filed within time limit |
26N | No opposition filed | Effective date: 2019-05-23 |
REG | Reference to a national code | CH: legal event code PL |
PG25 | Lapsed in a contracting state (announced via post-grant information from national office to EPO) | Failure to submit a translation of the description or to pay the fee within the prescribed time-limit: MC, SI (2018-08-22). Non-payment of due fees: LU (2018-12-12) |
REG | Reference to a national code | IE: legal event code MM4A |
REG | Reference to a national code | BE: legal event code MM, effective date 2018-12-31 |
PG25 | Lapsed in a contracting state (announced via post-grant information from national office to EPO) | Non-payment of due fees: IE (2018-12-12) |
PG25 | Lapsed in a contracting state (announced via post-grant information from national office to EPO) | Non-payment of due fees: BE (2018-12-31) |
PG25 | Lapsed in a contracting state (announced via post-grant information from national office to EPO) | Non-payment of due fees: LI, CH (2018-12-31) |
PG25 | Lapsed in a contracting state (announced via post-grant information from national office to EPO) | Non-payment of due fees: MT (2018-12-12) |
PG25 | Lapsed in a contracting state (announced via post-grant information from national office to EPO) | Failure to submit a translation of the description or to pay the fee within the prescribed time-limit: TR (2018-08-22) |
PG25 | Lapsed in a contracting state (announced via post-grant information from national office to EPO) | Failure to submit a translation of the description or to pay the fee within the prescribed time-limit: PT (2018-08-22) |
PG25 | Lapsed in a contracting state (announced via post-grant information from national office to EPO) | Failure to submit a translation of the description or to pay the fee within the prescribed time-limit: CY (2018-08-22); HU, also invalid ab initio (2011-12-12). Non-payment of due fees: MK (2018-08-22) |
PGFP | Annual fee paid to national office (announced via post-grant information from national office to EPO) | GB: paid 2023-12-29, year of fee payment 13 |
PGFP | Annual fee paid to national office (announced via post-grant information from national office to EPO) | SE: paid 2023-12-28; NL: paid 2023-12-22; FR: paid 2023-12-19 (year of fee payment 13 for each) |
PGFP | Annual fee paid to national office (announced via post-grant information from national office to EPO) | ES: paid 2024-01-10, year of fee payment 13 |
PGFP | Annual fee paid to national office (announced via post-grant information from national office to EPO) | DE: paid 2023-12-22, year of fee payment 13 |