WO2006066556A2 - Relational compressed database images (for accelerated querying of databases)
- Publication number
- WO2006066556A2 (PCT/DE2005/002287)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- database
- records
- customer
- database table
- data
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2428—Query predicate definition using graphical user interfaces, including menus and forms
- G06F16/2423—Interactive query statement specification based on a database schema
Definitions
- Relational compressed database images (for accelerated query of databases)
- the invention relates to a database query system and a method for computer-aided database query.
- information about customers shopping in a hardware store is collected, and the collected data, such as the customers' age and place of residence, is analyzed in order to adjust the product range offered by the hardware store or to better estimate which advertising strategies could be successful.
- a hardware store might have a customer database table in which information about the customers of the hardware store is stored in the form of customer records.
- a customer record contains, for example, the customer number of the customer, the gender of the customer and the year of birth of the customer.
- the hardware store could also have a transaction database table in which
- a transaction record could contain, for example, a transaction number, a specification of the product sold in the transaction, the revenue generated by the transaction, the date on which the transaction was made, the customer number of the customer involved in the transaction, and the payment method used by the customer (cash payment, card payment).
- the sales manager cannot answer this question by querying the first database table or the second database table alone.
- the first database table is not sufficient because it contains no information about the products purchased by a customer.
- the second database table is not sufficient because it contains no information about the age of the customers who made transactions.
- Contain selection criteria that span multiple database tables. Queries that concern only a single database table can be handled by a so-called "full table scan", i.e., the entire database table is read once from the hard disk (or another memory) into the working memory and each data record is processed individually. The runtime of such queries thus has a natural upper bound. Linking multiple database tables means that this simple approach no longer works, and potentially very long query times can arise.
- a possible way out, which is partly taken in the context of data warehousing, is to change the structuring of the information in the different database tables so that all are needed for a query
- the question could be answered by querying the first database table if every customer record contained the information as to whether the customer corresponding to this customer record purchased bedding and balcony plants in January.
- a customer record could have a field that contains a first value if the customer purchased bedding and balcony plants in January and a second value if the customer did not purchase bedding and balcony plants in January.
- the structure of the database table must already be selected before the request.
- the customer database table must be designed so that each customer record contains the information as to whether the corresponding customer purchased bedding and balcony plants in January. This is not readily possible, since when designing the database table it is typically not evident which queries will be made to the database table in the future.
- each customer record could include information as to whether the customer purchased bedding and balcony plants in January, whether the customer purchased bedding and balcony plants in February, and so on for all months, as well as whether the customer purchased screws in January, whether the customer purchased screws in February, and so on for all products and months.
- the customer database table also grows significantly when a list of the products purchased by each customer is included in each customer record. In particular, to be able to answer the above question, the month of sale would also have to be stored in such a list for each product purchased. Furthermore, if queries concerning the payment method used by the customer when purchasing the product are to be expected, the corresponding information must also be included in the customer database table. Depending on the queries expected, a customer database table of unacceptable size may therefore be required in this case as well if a so-called flat data structure is used for the customer database table. In particular, storing a list of products and additional information is problematic because the length of this product list may vary greatly from customer to customer, whereas database tables usually provide a fixed number of fields for all records.
- An acceptable size of the customer database table can be achieved if information (from the transaction database table) is aggregated into the customer database table, for example if each customer record contains information as to whether the customer made any transaction in January, made any transaction in February, and so on. In this way, however, the above query cannot be answered, because the information is not contained in the customer database table in sufficient detail.
- [2] discloses methods for learning dependency structures underlying a data set using Bayesian networks and Markov networks.
- [4] discloses a method for arithmetic coding of data.
- [6] discloses the generation of a statistical clustering model for a database by means of which requests to the
- a method is described in which a first statistical image for a database is formed, which represents the statistical relationships of the data elements contained in the first database. Then, the first statistical image is stored in a server computer and from there via a
- the received first statistical image is further processed by the client computer.
- Reference [12] discloses a method for managing data by means of a multi-dimensional database.
- a data aggregation server is set up to deliver requested aggregated data to client devices.
- the invention is based on the problem of making it possible to determine the results of queries that require data from several database tables more efficiently, with less computational effort, and with less memory than in the prior art.
- a database query system is provided with a first database image of a first database table having a first plurality of data records and a second database image of a second database table having a second plurality of data records.
- Each record of the first plurality of records and each record of the second plurality of records is associated with a value of a database key.
- the database query system has an input device configured to receive an analysis request to the second database image; a selection device configured to select a part of the first plurality of data records according to a first selection; a determination device configured to determine a second selection of a part of the second plurality of data records, wherein, according to the second selection, those data records are selected whose assigned values of the database key are each also assigned to at least one data record selected according to the first selection; and a processing device that is configured, the
- illustratively, the data records of the first database table and the data records of the second database table that contain related information are linked by means of a database key and stored in compressed form as database images; the database images store the database key values for the records.
- Related information is information relating to the same person or thing. For example, the second database table contains records with information about customers of a hardware store, and the first database table contains information about transactions performed in the hardware store.
- a record of the second database table and a record of the first database table contain related information if the record of the first database table contains information about a transaction that was performed by the customer about whom the record of the second database table contains information.
- the database key that links the two records could, in this example, be a customer number of the customer contained in both records.
- a database key may consist of a single data field of a database table (e.g., a customer number that uniquely identifies a customer in a customer table) or of a combination of multiple data fields (e.g., the combination of a store number and a customer number within the store).
- Database table are required is answered by selecting, in the first database image, the records that match the required information, that is, the records for which a particular condition is met. Subsequently, the corresponding data records of the second database image are selected, that is, those data records in the second database image that correspond to the selected data records of the first database image according to the association by means of the database key. The query can then be answered on the basis of the selected data records, because the required information from the first database image was used to generate the selection of the data records of the second database image.
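The two-step selection described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the record types `Transaction` and `Customer`, the function `select_by_key`, and the sample data are hypothetical, and plain in-memory lists stand in for the compressed database images.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:          # record of the first (transaction) image; hypothetical layout
    customer_no: int        # database key shared with the customer image
    product: str
    month: int


@dataclass(frozen=True)
class Customer:             # record of the second (customer) image; hypothetical layout
    customer_no: int
    birth_year: int


def select_by_key(transactions, customers, condition):
    # First selection: keys of all transaction records that satisfy the condition.
    keys = {t.customer_no for t in transactions if condition(t)}
    # Second selection: customer records whose key occurs in the first selection.
    return [c for c in customers if c.customer_no in keys]


transactions = [
    Transaction(1, "bedding and balcony plants", 1),
    Transaction(2, "screws", 1),
    Transaction(3, "bedding and balcony plants", 2),
]
customers = [Customer(1, 1940), Customer(2, 1980), Customer(3, 1950)]

selected = select_by_key(
    transactions,
    customers,
    lambda t: t.product == "bedding and balcony plants" and t.month == 1,
)
```

Any analysis of the customers (for example, their age distribution) can then be computed on `selected` alone, without copying transaction fields into the customer records.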
- An idea underlying the invention can be seen in the fact that for each database table involved, a database image is created which contains in compressed form certain information from the database table. This database image is typically much smaller than the original database table, and is more suitable for certain operations because of its structure.
- the first database image and the second database image, which are linked by means of the database key as explained, form a compressed relational structure.
- Memory (main memory) of a computer can be stored. Along with the described methods of speeding up queries in relational structures, a method is described that enables relational queries to be triggered efficiently in a graphical user interface, making use of the accelerated query times.
- the first database table and the second database table can be two database tables that, in terms of database architecture, were created from two different perspectives. As in the example above, the first database table contains, for example, one record for each customer of the hardware store, which contains information about the respective customer, and the second database table contains one record for each transaction executed in the hardware store, which contains information about the transaction.
- the second database table contains records with information about customers of a hardware store, including the age of the respective customer, but not when the customer made a transaction in the hardware store
- the first database table contains information about transactions made in the hardware store, including the date of each transaction, but not the age of the customer who made the transaction.
- Database table that contains information about customers who made a transaction in May. Subsequently, the query can be answered on the basis of the selected data records of the second database table. In this way, it is possible to answer queries to the second database table whose answers require information from the first database table without copying that information into the second database table, for example in the form of a list or additional entries in the records of the second database table.
- the first database table and the second database table may be stored in a storage device of the database query system.
- they can be stored in a distributed manner, for example by means of a plurality of data server computers that are coupled by means of a communication network.
- the use of the invention is of particular advantage here since, as explained above, when evaluating the second database table it is not necessary to continually access additional information in the first database table, which, particularly in the case of distributed database tables, would require considerable effort, in particular communication costs.
- evaluations and / or selections may be made simultaneously in the first database table and the second database table.
- a query is based on the data records corresponding to the selections.
- all transactions (or the corresponding transaction records) in which bedding and balcony plants were sold could be selected in the first database table.
- all customers (or the corresponding customer data records) older than 59 years could be selected in the second database table.
- a query to the first database table and/or to the second database table is then answered on the basis of the transaction records corresponding to transactions in which a customer older than 59 years bought (at least) one bedding and balcony plant, or on the basis of the customer records corresponding to customers who are older than 59 years and have purchased at least one bedding and balcony plant.
- illustratively, each database table exports a list of database keys corresponding to its existing ("own") selection and imports the list of the respective other database table, which is combined with its "own" selection.
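The mutual export and import of key lists can be sketched as follows. This is an assumption-laden illustration: the function `combine_selections`, the tuple layout `(transaction_no, customer_no)`, and the sample values are hypothetical, and "combining" is interpreted here as intersecting each side's own selection with the keys exported by the other side.

```python
def combine_selections(selected_transactions, selected_customers):
    """Combine two independent selections via their shared database key.

    selected_transactions: list of (transaction_no, customer_no) pairs
    selected_customers:    list of customer numbers
    """
    # Each side exports the database keys of its own selection.
    exported_by_transactions = {cust for _, cust in selected_transactions}
    exported_by_customers = set(selected_customers)

    # Each side imports the other list and keeps only the matching records.
    transactions = [
        (t, cust) for t, cust in selected_transactions
        if cust in exported_by_customers
    ]
    customers = [
        cust for cust in selected_customers
        if cust in exported_by_transactions
    ]
    return transactions, customers


# Transactions in which bedding and balcony plants were sold (sample data).
plant_transactions = [(100, 1), (101, 2), (102, 1)]
# Customers older than 59 years (sample data).
customers_over_59 = [1, 3]

transactions, customers = combine_selections(plant_transactions, customers_over_59)
```

After the exchange, both sides describe the same situation: plant purchases made by customers older than 59 years.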
- more than two database tables can be linked in an analogous manner. This can be done using one database key common to all database tables or by means of several database keys, each shared by a pair of database tables. For example, a customer table and a checkout table could be linked by means of a customer number, and the checkout table could be linked with a transaction table by means of a checkout code number.
- a common database key must exist for each link of two database tables, and all database tables must be linked directly (by means of a common database key) or indirectly (via the "detour" of another database table).
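The requirement that all tables be linked directly or indirectly amounts to a connectivity check on a graph whose nodes are tables and whose edges are shared keys. The sketch below uses a breadth-first search; the function name `all_tables_linked` and the table and key names are assumptions made for this example.

```python
from collections import deque


def all_tables_linked(tables, links):
    """Return True if every table is reachable from every other table.

    tables: non-empty list of table names
    links:  iterable of (table_a, table_b, shared_key) triples
    """
    neighbours = {table: set() for table in tables}
    for table_a, table_b, _shared_key in links:
        neighbours[table_a].add(table_b)
        neighbours[table_b].add(table_a)

    # Breadth-first search starting from an arbitrary table.
    start = tables[0]
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in neighbours[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == set(tables)


tables = ["customer", "checkout", "transaction"]
links = [
    ("customer", "checkout", "customer_no"),
    ("checkout", "transaction", "checkout_no"),
]
linked = all_tables_linked(tables, links)
```

Here the customer table and the transaction table are linked only indirectly, via the checkout table.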
- a relational database is typically understood to mean a software system that manages one or more database tables in a database.
- Each database table may contain many records (for example, a customer table contains one record per customer, and a transaction table contains one record per transaction).
- Each record in a database table contains values for the same fields (for example, customer number, age, gender).
- the invention clearly relates to the combination of several such database tables.
- the database tables can come from the same database, but also from different databases.
- the first compressed database image and the second compressed database image are independently created database images.
- the statistical model is a graphical probability model.
- a Bayesian network is used as the probabilistic model.
- the input device is further adapted to receive a selection instruction and the selection means is arranged to select the part of the first plurality of data records according to the selection instruction.
- the database query system further comprises a display device configured to display a screen display showing possible values of at least one random variable for which values are included in the first plurality of data records. The selection instruction is the selection of the display of at least one of the possible values of the random variable, and the first selection is the selection of all records of the first plurality of records for which the random variable takes one of the selected possible values.
- the display device is further configured to display a further screen display having an indication of the result of the analysis request, and the display device is further configured to switch between the screen display and the further screen display.
- a user can thus use the screen display to select data records and then switch to the further screen display so that the analysis results corresponding to the selection are displayed.
- the database query system has an access device which is set up to access the second database table and to determine data contained in the data records of the second database table selected according to the second selection, and the processing device is set up to determine the result of the analysis request using these data.
- if the second database image does not contain sufficient information to answer the analysis request, the underlying second database table is used. However, it is not necessary to access the entire second database table, but only the records selected according to the second selection.
- the second database image is used as a multidimensional index of the second database table. This will be explained in more detail below.
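The use of the image as an index can be illustrated with a small sketch in which only the records named by the second selection are read from the underlying table. The class `CountingTable`, the function `answer_from_table`, the field names, and the sample rows are hypothetical; a dictionary stands in for the database table.

```python
class CountingTable:
    """Stand-in for a database table that counts how many records are read."""

    def __init__(self, rows):
        self._rows = rows      # natural key -> record
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self._rows[key]


def answer_from_table(table, selected_keys, field):
    # Only the records named by the second selection are read from the table.
    return [table.read(key)[field] for key in selected_keys]


customer_table = CountingTable({
    1: {"birth_year": 1940, "city": "A"},
    2: {"birth_year": 1980, "city": "B"},
    3: {"birth_year": 1950, "city": "A"},
})
second_selection = [1, 3]   # keys delivered by the database image (assumed)
cities = answer_from_table(customer_table, second_selection, "city")
```

The read counter shows that the table is not scanned in full: two of the three records are accessed.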
- in the first database image, the first plurality of data records is grouped into a first plurality of segments (clusters), and/or in the second database image, the second plurality of data records is grouped into a second plurality of segments.
- the first database image and / or the second database image are generated according to a statistical clustering model.
- the value of the database key of a record of the first database image (that is, a record of the first plurality of records) consists of the number of the segment in which the record is contained and the number of the record according to a numbering of the records of the segment.
- the value of the database key of a record of the second database image (that is, a record of the second plurality of records) consists of the number of the segment in which the record is contained and the number of the record according to a numbering of the records of the segment.
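A key composed of a segment number and a record number within that segment can be derived as sketched below. The function `assign_image_keys`, zero-based numbering, and numbering in table order are assumptions of this example; the source does not specify them.

```python
def assign_image_keys(segment_of_record):
    """Assign a (segment_no, record_no) key to every record.

    segment_of_record: list giving the segment number of each record,
                       in the order in which the records appear.
    Records are numbered from 0 within their segment (an assumption).
    """
    next_record_no = {}
    keys = []
    for segment_no in segment_of_record:
        record_no = next_record_no.get(segment_no, 0)
        keys.append((segment_no, record_no))
        next_record_no[segment_no] = record_no + 1
    return keys


# Five records that a clustering model has assigned to segments 0 and 1.
image_keys = assign_image_keys([0, 1, 0, 0, 1])
```

Such a key locates a record inside the segment-structured image directly, without a separate lookup structure.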
- Database key used in the first database table or in the second database table (for example, a customer number) is used to link the first database image and the second database image.
- for each record of the first plurality of data records, the value of the database key in the first database table is stored, and/or for each record of the second plurality of data records, the value of the database key in the second database table is stored.
- the "natural key" is used to link the first database image and the second database image.
- if the first database table or the second database table is accessed, for example in the context of the above-mentioned use as a multidimensional index, it is necessary to map the value of the database key to the value of the "natural key" used in the first database table (for example, the transaction number) or in the second database table (for example, the customer number). This is made possible by the fact that, for each record, the value of the "natural key" from the first database table or the second database table is stored.
- a method for generating a compressed image of a database table containing a plurality of records, each record including a value of at least one statistical variable the steps
- a statistical probabilistic model for describing the relative frequencies of the values of the at least one statistical variable in the records of the database table and for grouping each data record into one segment of a plurality of segments;
- the assignment of the first coding value to the representative value and the assignment of the second coding value to the value of the statistical variable contained in the data record can illustratively be regarded as a compression of the representative value or of the value contained in the record
- the second encoding value is preferably stored.
- a database table is divided into a large number of segments. For each segment and for each statistical variable for which the records contained in the segment contain a value, a representative value, that is, a default value, of the statistical variable is determined.
- the representative value is a value of the statistical variable that occurs with high relative frequency within the segment, that is, in the data records contained in the segment. For each record contained in the segment, it is now assumed that the value contained in the data record corresponds to the representative value, and accordingly the value contained in the data record is coded only if it deviates from the representative value.
- the value of a random variable is explicitly stored / encoded only if that value deviates from the value that one would expect based on statistical modeling (i.e., on the representative value).
- the expected value is the most common value in a database table or in the segment of a database table. For higher compression, one can also choose the value that is the most likely value based on the prediction of a statistical model as the default value.
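Storing only deviations from a per-segment default value can be sketched as follows. This is a simplified illustration that uses the most common value as the default; the functions `compress_segment` and `decompress_segment` and the sample column are hypothetical, and the subsequent coding of the exceptions is left out.

```python
from collections import Counter


def compress_segment(values):
    """Store one default value plus only the entries that deviate from it."""
    # Default (representative) value: the most common value in the segment.
    default = Counter(values).most_common(1)[0][0]
    exceptions = {i: v for i, v in enumerate(values) if v != default}
    return default, len(values), exceptions


def decompress_segment(default, length, exceptions):
    """Rebuild the original column from the default value and the exceptions."""
    return [exceptions.get(i, default) for i in range(length)]


# Payment method of the records in one segment (sample data).
payment = ["card", "card", "cash", "card", "card"]
default, length, exceptions = compress_segment(payment)
restored = decompress_segment(default, length, exceptions)
```

The more homogeneous a segment is, the fewer exceptions have to be stored, which is why grouping similar records into segments helps compression.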
- the representative value is determined based on the description given by the statistical probability model of the relative frequencies of the values of the at least one statistical variable in the data records of the segment.
- Probability model is used to determine which value qualifies as a representative value for the statistical variable in the segment.
- the value is chosen as the representative value for which the statistical probability model indicates a high relative frequency within the segment.
- the representative value corresponds to a value of the statistical variable that occurs in the data records contained in the segment with a relative frequency that is above a predetermined threshold.
- the value of the statistical variable that occurs with the highest relative frequency within the segment is chosen as the representative value.
- the statistical probability model is a graphical probability model.
- a Bayesian network is used as the probabilistic model.
- the values of the statistical variables that are contained in data records of the same segment and that differ from the representative value of the segment are encoded by means of an arithmetic coding method and/or a run-length encoding method.
- the data records are efficiently encoded by grouping them into segments of similar records and storing them in a data structure constructed in accordance with those segments; the similarity of the records within a segment enables more efficient coding by statistical methods (e.g., run-length coding, arithmetic coding).
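Run-length coding, one of the statistical methods named above, can be sketched in a few lines. The functions `rle_encode` and `rle_decode` and the sample column are illustrative assumptions; arithmetic coding is not shown.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal neighbouring values into (value, run_length)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in runs for _ in range(length)]


# One field of the records in a segment (sample data).
gender = ["m", "m", "m", "f", "m"]
runs = rle_encode(gender)
```

Because records within a segment are similar, long runs of identical values are more likely than in the unsegmented table.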
- the data of each segment can be stored row by row (that is, all values of the same data record are stored next to each other, i.e., at adjacent memory locations, in the memory).
- the data can be stored column by column (that is, illustratively field by field: the values of the first field of all data records are stored immediately next to each other in memory).
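The difference between the two layouts can be made concrete by flattening the same records in both orders. The helper names `to_row_major` and `to_column_major` and the sample records are assumptions; a Python list stands in for consecutive memory locations.

```python
def to_row_major(records, fields):
    """All values of one record are adjacent, record after record."""
    return [record[field] for record in records for field in fields]


def to_column_major(records, fields):
    """All values of one field are adjacent, field after field."""
    return [record[field] for field in fields for record in records]


records = [
    {"age": 30, "gender": "f"},
    {"age": 61, "gender": "m"},
]
fields = ["age", "gender"]

row_layout = to_row_major(records, fields)
column_layout = to_column_major(records, fields)
```

In the column layout, values of the same field sit next to each other, which tends to produce the runs of equal values that run-length coding exploits.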
- a computing arrangement for analyzing data is provided.
- a display device that is set up to display at least a first window, which has a first display element showing a designation of a first analysis result relating to a first statistical variable and/or the first analysis result itself, and a second window, which has a second display element showing a designation of a second analysis result relating to a second statistical variable and/or the second analysis result itself;
- a detection device configured to detect whether the first display element has been moved to the location of the second display element
- a calculating device that is set up to calculate, in the case that the first display element has been moved to the location of the second display element, a third analysis result relating to the first statistical variable and the second statistical variable;
- a user can move the first display element onto the second display element by drag and drop in a graphical user interface and thereby control the computer arrangement so that the third analysis result is determined.
- A display element that shows a designation of an analysis result relating to a statistical variable and/or the analysis result itself is, for example
- a label field of a window of the screen interface, the window containing the relative frequencies of the values of a statistical variable occurring in a database table; the display, for example the displayed value, of a relative frequency of a value of a statistical variable occurring in a database table, or the display of another analysis result;
- an improved usability concept is provided, in particular for the operation of computer programs which allow the query of databases and the statistical analysis of data stored in a database. It is preferred that the first analysis result is based on data contained in a first database table and that the second analysis result is based on data contained in a second database table.
- the first window thus serves to analyze the first database table and the second window to analyze the second database table.
- the user can therefore generate analysis results across the windows, based in particular on data contained in the first database table and on data contained in the second database table.
- the first database table is a transactional database table that has data in one
- the second database table is a customer database table containing data about the customers of the hardware store. In a first window, a user can view, as the first analysis result, the distribution of the random variable "total turnover of the customers" (relative
- the first window shows in a table that in 2004, 30% of the customers of the hardware store made a total turnover between 100 and 150 euros through transactions (and correspondingly other values for other value ranges of the total turnover).
- the first table has the title "Total customer revenue".
- a second analysis result relating to the transaction database is displayed, for example in a second table titled "Products", the relative frequency of the products purchased.
- the second table contains the entries that balcony plants were bought in 3% of all transactions, garden furniture was bought in 7% of all transactions, etc.
- the user can now, for example, have the total customer turnover broken down by product, i.e., generate and display an analysis result that contains, for example, the information that 25% of the customers who purchased bedding and balcony plants made a total turnover between 100 euros and 150 euros (and other values for other value ranges of total turnover and other products).
- the user achieves this by, for example, selecting the title bar of the first window, for example a field with the string "total sales of the customers", and moving it into the second window by drag and drop.
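The breakdown of total customer turnover by product can be sketched as a small cross-table computation. This is an illustration under assumptions: the function `turnover_by_product`, the tuple layout `(customer_no, product, revenue)`, the 50-euro bucket width, and the sample transactions are hypothetical, and the computation runs on plain tuples rather than on compressed database images.

```python
from collections import Counter, defaultdict


def turnover_by_product(transactions, bucket_width=50):
    """For each product, the share of its buyers per total-turnover bucket.

    transactions: iterable of (customer_no, product, revenue)
    Returns {product: {bucket_lower_bound: share_of_buyers}}.
    """
    total = defaultdict(float)    # customer_no -> total turnover
    buyers = defaultdict(set)     # product -> customers who bought it
    for customer_no, product, revenue in transactions:
        total[customer_no] += revenue
        buyers[product].add(customer_no)

    result = {}
    for product, customers in buyers.items():
        buckets = Counter(
            int(total[c] // bucket_width) * bucket_width for c in customers
        )
        result[product] = {b: n / len(customers) for b, n in buckets.items()}
    return result


transactions = [
    (1, "plants", 60.0),
    (1, "screws", 50.0),
    (2, "plants", 30.0),
    (3, "screws", 120.0),
]
breakdown = turnover_by_product(transactions)
```

In this sample, half of the plant buyers fall into the bucket starting at 100 euros, which corresponds to the kind of statement quoted in the text ("25% of the customers ... between 100 euros and 150 euros").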
- the display device is preferably a computer screen.
- the selection device is preferably a computer mouse.
- the selector is an element of the touch screen.
- FIG. 1 shows a computer arrangement according to an embodiment of the invention.
- FIG. 2 shows a first screen display of an Explorer computer program according to an exemplary embodiment of the invention.
- FIG. 3 shows a second screen display of an Explorer computer program according to an exemplary embodiment of the invention.
- FIG. 4 shows a third screen display of an Explorer computer program according to an exemplary embodiment of the invention.
- FIG. 5 shows a fourth screen display of an Explorer computer program according to an exemplary embodiment of the invention.
- FIG. 6 shows a fifth screen display of an Explorer computer program according to an embodiment of the invention.
- FIG. 7 shows a sixth screen display of an Explorer computer program according to an embodiment of the invention.
- FIG. 8 illustrates a cluster hierarchy corresponding to a database image according to an embodiment of the invention.
- FIG. 9 illustrates a cluster according to an exemplary embodiment of the invention.
- FIG. 1 shows a computer arrangement 100 according to an embodiment of the invention.
- a computer system 101 is coupled to a database system 102.
- the computer system 101 is a personal computer (PC) but may also be another computer, for example a workstation.
- the computer system 101 includes a screen 110, a microprocessor 103, a memory 104, and various input devices 111, such as a keyboard and a computer mouse.
- Database system 102 is a computer system for storing database tables.
- The database system 102 may accordingly be a high-performance computer that is equipped with a large memory capacity and is coupled to the computer system 101, for example via an Ethernet interface, or wirelessly, for example via Bluetooth.
- the database system may function as an Oracle database, a Microsoft Access database, a Lotus 1-2-3 database, or a dBase database.
- A customer database table 105 and a transaction database table 106, which are described in more detail below, are stored in the database system 102.
- Also stored are a customer database table image 107, which is a compressed image of the customer database table 105, and a transaction database table image 108, which is a compressed image of the transaction database table 106.
- The customer database table image 107 and the transaction database table image 108 are, illustratively, data structures that contain the data from the customer database table 105 and the transaction database table 106, respectively, in compressed form.
- Transaction database table image 108 will be described in detail below.
- The database system 102 may also be part of the computer system 101.
- For example, the computer system 101 has a hard disk on which the customer database table 105 and the transaction database table 106 are stored, and a working memory in which the customer database table image 107 and the transaction database table image 108 are stored, so that the two images in particular can be accessed quickly.
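- An Explorer computer program 109 displays results of analyses of the customer database table image 107 (and thus of the customer database table 105) and of the transaction database table image 108 (and thus of the transaction database table 106) graphically on the screen 110.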
- FIG. 2 shows a first screen display 200 of an Explorer computer program according to an embodiment of the invention.
- the first screen display 200 shows results of a statistical analysis of the customer database table image 107 and thus results of a statistical analysis of the customer database table 105.
- the customer database table 105 contains information about the customers of a hardware store.
- The customer database table 105 contains, for each customer of the DIY store (or for each registered customer of the DIY store), a customer record that contains the customer number, the gender, the income class and the year of birth of the customer.
- The customer database table 105 may contain a variety of further information about the respective customer; in this example, however, it is assumed that it contains only the above information.
- The customer database table image 107 accordingly contains this information about the customers of the DIY store in compressed form, as explained below.
- the Explorer computer program 109 allows the analysis of the data contained in the customer database table image 107 and the graphical display of results of such analysis.
- Using the Explorer computer program 109, the gender distribution of the customers of the DIY store was examined, and the result is shown in a first window 201 of the first screen display 200. It shows that 68.65% of the DIY store customers are male and 31.33% are female.
- The Explorer computer program 109 performs this analysis by counting all customer records that indicate a male customer and all customer records that indicate a female customer, and dividing each count by the total number of customer records.
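The counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the field names (`customer_no`, `gender`) and the function `relative_frequencies` are hypothetical and do not come from the patent, and the sketch works on uncompressed records rather than on the database table image.

```python
from collections import Counter


def relative_frequencies(records, attribute):
    """Return the share of records for each value of `attribute`."""
    counts = Counter(record[attribute] for record in records)
    total = len(records)
    return {value: count / total for value, count in counts.items()}


# Hypothetical customer records; the field names are illustrative only.
customers = [
    {"customer_no": 1, "gender": "male"},
    {"customer_no": 2, "gender": "female"},
    {"customer_no": 3, "gender": "male"},
    {"customer_no": 4, "gender": "male"},
]

gender_shares = relative_frequencies(customers, "gender")
print(gender_shares)
```

The same function could be applied to any other attribute, for example a birth-year range, to obtain the kind of distribution shown in the other windows.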
- the age distribution of the customers of the building market was analyzed by means of the Explorer computer program 109 by counting customer data records which contain the information that the birth year of the corresponding customer is within a certain range.
- the result of this age distribution analysis is displayed in a second window 202 of the first screen 200 on the screen 110.
- The analyses whose results are displayed in the first window 201, the second window 202 and the third window 203 are based on all customer records. For example, all customer records indicating that the corresponding customer is male were counted and divided by the total number of customer records to determine the corresponding analysis result (68.65%).
- In another embodiment, the selection information field 204 further includes the total number of customer records on which the analyses are based.
- the first screen display 200 has a first selection window 205 and a second selection window 206.
- The first selection window 205 and the second selection window 206 allow the user to specify additional windows to be displayed in the area adjacent to the first selection window 205 and the second selection window 206, for example windows with analysis results analogous to the first window 201, the second window 202 and the third window 203 that relate to other statistical variables, such as the turnover of the customers of the DIY store.
- The transaction database table image 108, and thus the transaction database table 106, can also be analyzed by means of the Explorer computer program 109.
- The analysis results can likewise be displayed on the screen 110; FIG. 3 shows a corresponding display.
- FIG. 3 shows a second screen display 300 of an Explorer computer program according to an embodiment of the invention.
- switching between the first screen display 200 and the second screen display 300 can be accomplished by operating (clicking) an icon in a toolbar.
- the transaction database table 106 includes a plurality of transaction records.
- Each transaction record corresponds to a transaction, i.e. a sale in the DIY store, and contains a transaction number that uniquely identifies the transaction, a specification of the product sold in the transaction, the gross sales value of the transaction, the date of the transaction, and the customer number of the customer involved in the transaction, that is, the customer to whom the product was sold.
- This information is correspondingly included in the transaction database table image 108 in compressed form.
- The second screen display 300 shows, in a first window 301, the results of an analysis of how often certain products were purchased in DIY store transactions, relative to all DIY store transactions.
- For example, the transaction records are counted that indicate that bedding and balcony plants were sold in the corresponding transaction.
- The count is divided by the total number of transaction records, giving the percentage value (6.68%).
- a second window 302 displays the result of an analysis of how the number of transactions is distributed over the year.
- A third window 303 displays the result of an analysis of the distribution of the gross sales value over the transactions. For example, it can be seen that for 13.72% of all transactions, the gross sales value was between 10 and 25 euros.
- The analyses whose results are displayed in the first window 301, the second window 302 and the third window 303 are based on all transaction records, which is why, analogously to FIG. 2, the value 100% is displayed in a selection information field 304.
- In the following, an example is explained in which an analysis is based on only a part of the transaction records.
- FIG. 4 shows a third screen 400 of an explorer computer program according to an embodiment of the invention.
- The third screen display 400 emerges from the second screen display 300 when a user, by means of one of the input devices 111, selects bedding and balcony plants in the first window 301 of the second screen display 300 (which corresponds to a first window 401) and selects March 2003 in the second window 302 of the second screen display 300 (which corresponds to a second window 402).
- For example, the user clicks the value 6.68 in the first window 301 of the second screen display 300, whereupon it is replaced by a first bar 404 and the value 100, as shown in the first window 401.
- Likewise, the user has clicked on the value 9.01 in the second window 302 of the second screen display 300, for example by means of a computer mouse, whereupon this value is replaced by a second bar 405 and the value 100, as shown in the second window 402.
- The first bar 404 indicates that now only transaction records are selected that indicate that bedding and balcony plants were sold in the corresponding transaction.
- The analyses whose results are displayed in the first window 401, the second window 402 and the third window 403 are based on the selected records.
- For example, a DIY store sales manager wants to analyze the age distribution of those customers who bought at least one bedding and balcony plant in March 2003.
- The sales manager may want to conduct this analysis to determine whether it is worth launching a "geraniums for retirees" discount campaign next March.
- The sales manager starts the Explorer computer program 109 on the basis of the customer database table image 107, so that the first screen display 200 is displayed on the screen 110.
- As described above with reference to FIG. 4, the sales manager selects bedding and balcony plants in the first window 301 of the second screen display 300 and March 2003 in the second window 302 of the second screen display 300, so that the second screen display 300 changes into the third screen display 400.
- The sales manager then switches, for example by clicking on a corresponding icon, to the first screen display 200, which, as a result of the selection, has changed into the fourth screen display 500 shown in FIG. 5.
- FIG. 5 shows a fourth screen display 500 of an Explorer computer program according to an embodiment of the invention.
- The analyses whose results are shown in a first window 501 (corresponding to the first window 201 of the first screen display 200), a second window 502 (corresponding to the second window 202 of the first screen display 200) and a third window 503 (corresponding to the third window 203 of the first screen display 200) are based on exactly those customer records that correspond to customers who bought bedding and balcony plants in March 2003.
- For this purpose, all customer numbers are determined from the transaction database table image 108 that belong to a transaction record corresponding to a transaction which was completed in March 2003 and in which a customer (namely the customer specified by the customer number) bought bedding and balcony plants.
- The analyses whose results are displayed in the first window 501, the second window 502 and the third window 503 are now based on exactly those customer records that contain one of the customer numbers thus determined. These customer records are referred to below as the selected customer records.
- the customer number is used as a database key that links related customer records and transaction records together.
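The cross-table selection via the customer number can be illustrated as follows. This is a minimal sketch under stated assumptions: the field names, the ISO date strings and the helper `select_customer_numbers` are hypothetical, and the sketch operates on plain uncompressed records rather than on the compressed table images.

```python
# Hypothetical records; field names and date format are illustrative only.
customers = [
    {"customer_no": 1, "birth_year": 1940},
    {"customer_no": 2, "birth_year": 1975},
    {"customer_no": 3, "birth_year": 1938},
]
transactions = [
    {"transaction_no": 10, "customer_no": 1,
     "product": "bedding and balcony plants", "date": "2003-03-05"},
    {"transaction_no": 11, "customer_no": 2,
     "product": "garden furniture", "date": "2003-03-07"},
    {"transaction_no": 12, "customer_no": 3,
     "product": "bedding and balcony plants", "date": "2003-04-01"},
    {"transaction_no": 13, "customer_no": 1,
     "product": "bedding and balcony plants", "date": "2003-03-20"},
]


def select_customer_numbers(transactions, product, month_prefix):
    """Collect the customer numbers of all transactions matching the selection."""
    return {
        t["customer_no"]
        for t in transactions
        if t["product"] == product and t["date"].startswith(month_prefix)
    }


# Step 1: determine the customer numbers from the transaction side.
selected_numbers = select_customer_numbers(
    transactions, "bedding and balcony plants", "2003-03"
)
# Step 2: use the customer number as the key into the customer table.
selected_customers = [c for c in customers if c["customer_no"] in selected_numbers]
# Share of selected customer records, as shown in the selection information field.
selected_share = len(selected_customers) / len(customers)
print(selected_numbers, selected_share)
```

Using a set for the customer numbers means a customer with several matching transactions is counted only once on the customer side.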
- The proportion of the selected customer records in the total number of customer records is displayed in a selection information field 504 corresponding to the selection information field 204 of the first screen display 200, in this example 1.02%. This means that 1.02% of the (registered) customers bought bedding and balcony plants in March 2003.
- The selected customer records are thus used as the basis for the analyses whose results are displayed in the first window 501, the second window 502 and the third window 503.
- The sales manager is interested in the analysis whose result is displayed in the second window 502.
- In the embodiment described above, the data are not present in the form of a so-called flat data structure, that is, in a single database table, but are distributed over multiple database tables, in this example the customer database table 105 and the transaction database table 106.
- The customer database table 105 and the transaction database table 106 are in a 1:n relationship via the customer number, because in this example a customer may be involved in multiple transactions.
- m:n relationships are also conceivable, for example if a customer may be involved in multiple transactions and multiple customers can perform a transaction together.
- A further window is displayed in the first screen display 200, by means of which the user can select whether the selection according to FIG. 4 is to be used as the basis for the analyses whose results are displayed in the first window 201, the second window 202 and the third window 203.
- The further window can be set to the state "yes", which means that the selection according to FIG. 4 is used as the basis for the analyses. The further window can also have a state "no" (or a correspondingly designated state).
- The user, in this example the sales manager, can put the further window into one of the two states, for example by using a computer mouse, i.e. select one of the two states and thereby determine whether the selections currently made in the other database table should be taken into account when evaluating this database table.
- The further window may optionally retain its designation and the effect of the selections made therein when the selection in the second screen display is changed, or adjust automatically. Depending on this, the further window will either continue to refer to bedding plants (for example, if the "keep" mode is activated) or switch to drills if the selection in the second screen display is changed from bedding plants to drills.
- In the screen display 400, a renewed selection, in this case of customers, can be performed.
- Through this selection, by means of the common key (customer number) of the transaction database table image 108 and the customer database table image 107, it is possible to select the transactions on which the analyses are based whose results are shown in the third screen display.
- For example, the user could select the customers who purchased at least one bedding and balcony plant in March 2003 and who belong to income class six, for example by clicking on the value 2.87 in the third window 503.
- The mode of the further window is set to "keep".
- The selection of customers defined in the last paragraph, which results from the interplay between the transaction table and the customer table, can be transferred back to the transaction view, so that more can be learned about the other transactions of this customer group, beyond the previously selected bedding and balcony plants in March.
- The selections in the third screen display are first removed (which, owing to the "keep" mode, has no effect on the fourth screen display 400), and the state "yes" is selected in the further window displayed there, whereby the customer list currently active in the fourth screen display 400 is transferred to the third screen display 300. Accordingly, the third screen display 300 would change, and the third window 403 would show the distribution of the gross sales values of the transactions made by customers who are in income class six and bought at least one bedding and balcony plant in March 2003.
- The customer database table image 107 corresponds to a view of the customers of the DIY store, and the transaction database table image 108 to a view of the transactions made in the DIY store.
- FIG. 6 shows a fifth screen display 600 of an Explorer computer program according to an embodiment of the invention.
- The fifth screen display 600 is derived from the third screen display 400.
- The fifth screen display 600 includes (partially) a first window 601 corresponding to the first window 301 of the second screen display 300.
- The fifth screen display 600 further includes (partially) a second window 602 that corresponds to the third window 303 of the second screen display 300.
- A third window 603 shows the result of an analysis in which it was determined, for each product group, how high the proportion of transactions in which a product from the respective product group was sold and in which the gross sales value was less than 5 euros is among all transactions in which a product of the respective product group was sold.
- For example, a first bar 604 shows that in about 60% of all transactions in which a product from the product group "Technology" was sold, the gross sales value was below 5 euros.
- Corresponding bars are shown for the product groups "Ambience", "Garden", "Building materials / Sanitary", etc.
- Illustratively, the expression "< 5" of the random variable "gross sales value" is broken down by product group. The user of the Explorer computer program 109 can generate the fifth screen display 600 from the third screen display 400 by clicking, with a computer mouse, on the value (65.84) for the expression "< 5" in the third window 403 of the third screen display 400, keeping the mouse button pressed and dragging the value into the first window 401 of the third screen display 400 (drag and drop).
- An expression of a first random variable can be broken down by a second random variable by dragging the value for the relative frequency of the expression of the first random variable into a window in which the relative frequencies of the expressions of the second random variable are shown.
- For example, the user may click the value (65.84) for the expression "< 5" in the third window 403 of the third screen display 400 with a computer mouse, switch to the fourth screen display 500 by a corresponding command, and drag the value into the first window 501.
- In this case, the expression "below 5 euros" of the random variable "gross sales value" would be broken down by gender and, for example, a bar would appear stating that 40% of all transactions made by a male customer had a gross sales value below 5 euros (and, correspondingly, another bar for the female customers).
- the first random variable is the gross sales value and the second random variable is the product.
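Breaking down one expression of a first random variable by a second random variable amounts to computing a conditional relative frequency per group. The following sketch shows this on a handful of made-up transaction records; the function `break_down`, the field names and the sample data are assumptions for illustration and are not taken from the patent.

```python
from collections import defaultdict


def break_down(records, first_var, expression, second_var):
    """For each value of `second_var`, return the share of records whose
    `first_var` equals `expression`, relative to all records with that value."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        group = record[second_var]
        totals[group] += 1
        if record[first_var] == expression:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical transaction records with a pre-binned gross sales value.
transactions = [
    {"product_group": "Technology", "value_range": "< 5"},
    {"product_group": "Technology", "value_range": "< 5"},
    {"product_group": "Technology", "value_range": "5-10"},
    {"product_group": "Garden", "value_range": "10-25"},
    {"product_group": "Garden", "value_range": "< 5"},
]

shares = break_down(transactions, "value_range", "< 5", "product_group")
print(shares)
```

Each resulting value corresponds to one bar of the kind shown in the third window 603.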
- A three-dimensional diagrammatic representation can also be generated, for example one in which all product groups (expressions of a first random variable) are plotted along a first coordinate axis, as in the third window 603, and ranges of gross sales values, for example "< 5", "5-10", etc. (expressions of a second random variable), along a second coordinate axis.
- At each location of the grid formed by the first coordinate axis and the second coordinate axis, which corresponds to a certain product group and a certain gross sales value range, a bar along a third coordinate axis could show the share of transactions in which a product of the product group was sold and the gross sales value lies in that range, relative to all transactions in which a product from the product group was sold.
- This corresponds to showing the analysis result of the third window 603 for all gross sales value ranges (and not just the range "< 5") by extending the representation shown in the third window 603 by a further coordinate axis (the second coordinate axis mentioned above), so that a two-dimensional arrangement of bars is created.
- FIG. 7 shows a sixth screen 700 of an explorer computer program according to an embodiment of the invention.
- the sixth screen 700 has (partially) a first window 701 corresponding to the first window 301 of the second screen display 300.
- the sixth screen 700 further includes (partially) a second window 702 corresponding to the third window 303 of the second screen display 300.
- In a third window 703, the result of a further analysis is shown.
- In this analysis, the average gross sales value was determined over all transaction records that correspond to a transaction in which a product from a particular product group was sold, and this was done for several product groups.
- For example, a flag 704 shows that the average gross sales value of the transactions in which a product from the product group "Technology" was sold is about 8 euros.
- Illustratively, the average gross sales value (over the gross sales values from all transaction records) is broken down by product group.
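The breakdown of an average by product group can be sketched as a simple grouped mean. The function `mean_by_group`, the field names and the sample values are hypothetical and chosen only so that the "Technology" group averages 8 euros, as in the example above.

```python
from collections import defaultdict


def mean_by_group(records, group_field, value_field):
    """Return the mean of `value_field` for each value of `group_field`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        sums[record[group_field]] += record[value_field]
        counts[record[group_field]] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical transaction records; values are in euros.
transactions = [
    {"product_group": "Technology", "gross_sales_value": 4.0},
    {"product_group": "Technology", "gross_sales_value": 12.0},
    {"product_group": "Garden", "gross_sales_value": 30.0},
]

average_value = mean_by_group(transactions, "product_group", "gross_sales_value")
print(average_value)
```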
- the user may generate the sixth screen 700 from the second screen 300 by, for example, dragging and dropping the field with the string "percentage values" from the third window 303 into the first window 301.
- the user could be presented with a selection menu by means of which the user can select from several options.
- For example, instead of the third window 703, the user may choose to display a window that indicates, for each product group, not the average gross sales value but the sum of all gross sales values contained in transaction records corresponding to transactions in which a product from the respective product group was sold.
- In that case a further flag (analogous to flag 704) might be displayed indicating the sum of all gross sales values from transaction records corresponding to transactions in which a product from the product group "Technology" was sold.
- If records are weighted by turnover and the customers in a first age range generated more turnover than the customers in a second age range, the first age range would be shown with a higher customer share than the second age range in the second window 202 of the first screen display, even if the number of customers in the first age range is not higher than the number of customers in the second age range (since the weighting is taken into account when counting the corresponding customer records). This presupposes that each customer record contains information about the turnover of the respective customer.
- Analogously, transactions may be weighted according to their share of revenue.
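The effect of weighting can be illustrated by comparing unweighted and turnover-weighted shares. This is a sketch under stated assumptions: the function `shares`, the field names `age_range` and `turnover`, and the sample figures are hypothetical.

```python
from collections import defaultdict


def shares(records, attribute, weight_field=None):
    """Relative frequency of each value of `attribute`; if `weight_field`
    is given, each record counts with that weight instead of with 1."""
    totals = defaultdict(float)
    for record in records:
        weight = 1.0 if weight_field is None else record[weight_field]
        totals[record[attribute]] += weight
    total_weight = sum(totals.values())
    return {value: weight / total_weight for value, weight in totals.items()}


# Hypothetical customer records with a turnover field (in euros).
customers = [
    {"age_range": "20-40", "turnover": 300.0},
    {"age_range": "40-60", "turnover": 50.0},
    {"age_range": "40-60", "turnover": 50.0},
]

unweighted = shares(customers, "age_range")
weighted = shares(customers, "age_range", weight_field="turnover")
print(unweighted, weighted)
```

In this made-up data the "20-40" range contains fewer customers but receives the larger weighted share, which is the situation described above.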
- a window in which the selected customers are broken down according to the occurrence of a random variable can be displayed in the screen display relating to the customer database table 105.
- For example, a further window could be displayed in the fourth screen display 500 that shows (for example by means of bars), for different turnover ranges, how high the proportion of customers who made the respective turnover and bought bedding and balcony plants in March is among all customers who bought bedding and balcony plants in March.
- The database table has several data records which, written one below the other, illustratively form the database table. For example, as in the example described above, there is one record for each (registered) customer of a DIY store. Each record has, for example, a database table entry that contains the age of the respective customer. Illustratively, the data records form rows in which the age of the customer corresponding to the respective row is indicated in an "age" column.
- The attribute age (like other existing attributes such as income, gender, etc.) of the customer is interpreted as a random variable. Depending on the customer, this random variable assumes a certain value (state, expression), for example the value 23 if the corresponding customer is 23 years old.
- The possible values of the random variables occur with a relative frequency in the database table. For example, if one quarter of all (registered) customers of the DIY store are 23 years old, the relative frequency of the value (state) 23 of the random variable age is 0.25, or 25%.
- A statistical model of the data in the database table is generated.
- the statistical model is illustratively an approximation of the common probability distribution of the random variables of the database table.
- the statistical model is "learned" by a learning process from the database table entries, that is, using the database table entries, preferably using a maximum likelihood approach.
- the probabilities present within the framework of the statistical model of the database table describe, as mentioned, the relative frequencies of the states of the database table entries, depending on the procedure exactly or approximately.
- the database table entries may assume a variety of states, which states may occur with different relative frequencies.
- For example, the states of some random variables (or their relative frequencies) are specified according to a predetermined condition, and the relative frequencies of the states of further random variables that depend on them (and are thus correlated with them) are determined accordingly.
- The statistical model is, for example, a graphical probabilistic model (graphical model).
- Graphical probabilistic models include in particular Bayesian networks (also called belief networks) and Markov networks.
- a statistical model can be generated, for example, by structural learning in Bayesian networks, as described, for example, in [2].
- Another possibility is to learn, that is, to determine, the parameters of the statistical model for a fixed structure, as described for example in [3].
- a likelihood function is used as an optimization criterion for the parameters of the model.
- a particular implementation here is the expectation-maximization (EM) learning method, which is described in more detail below with reference to a specific model.
- The statistical model is preferably a statistical clustering model, in particular a Bayesian clustering model, which divides the data into a plurality of clusters (also called segments).
- the database table is divided into several smaller parts (clusters, segments), which in turn can be considered as separate database tables and, because of their smaller size, can be handled more efficiently.
- A more efficient statistical evaluation of the database table using a clustering model can be achieved, for example, by checking, during the statistical evaluation, whether the statistical model indicates that all the data satisfying a given selection condition lie in a single cluster or in a subset of the clusters. If so, the evaluation can be restricted to these clusters. It is likewise possible to restrict the evaluation to those clusters in which the data satisfying the given condition are contained with at least a certain relative frequency. The remaining clusters, in which data satisfying the given condition are contained only in a smaller proportion, can be neglected if only approximate statements are desired.
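The restriction to relevant clusters can be sketched as a simple filter. The sketch assumes that, for a given selection condition, the share of the matching data located in each cluster is already known (for example from the statistical model); the function `clusters_to_evaluate`, the cluster names and the threshold parameter `min_share` are hypothetical.

```python
def clusters_to_evaluate(condition_share, min_share=0.0):
    """Return the clusters that contain matching data at all and whose
    share of the matching data is at least `min_share`."""
    return [
        cluster
        for cluster, share in condition_share.items()
        if share > 0.0 and share >= min_share
    ]


# Hypothetical shares of the data satisfying the condition, per cluster.
condition_share = {"cluster_1": 0.9, "cluster_2": 0.1, "cluster_3": 0.0}

# Exact evaluation: skip only clusters without any matching data.
exact = clusters_to_evaluate(condition_share)
# Approximate evaluation: also neglect clusters holding a small proportion.
approximate = clusters_to_evaluate(condition_share, min_share=0.2)
print(exact, approximate)
```

With `min_share=0.0` the result is still exact, since only clusters without matching data are skipped; a positive threshold trades accuracy for speed.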
- A Bayesian clustering model is a model with a discrete latent variable.
- Random variable Xj_ can, for example, the
- A record in the database table contains a value (expression) for each of the random variables X_1, ..., X_K.
- The π-th record of the database table can accordingly be written in the form
- Written one below the other, the records illustratively form a database table (or panel) that has a column for each random variable.
- The panel has M entries.
- The entire database table can be regarded as a matrix
- When using a clustering model, a so-called hidden variable (cluster variable), denoted by Ω, is additionally used.
- The value of the variable Ω for a record indicates which cluster (segment) the record is assigned to within the clustering model. In this example, there are therefore R different clusters.
- The a priori distribution describes what proportion of the data is assigned to each of the clusters.
- The random variable Θ can take on the possible parameter vectors θ of the statistical model.
- The a priori distribution P(Ω | Θ) and the conditional probability distributions P(X | Ω = ω_i, Θ) (for each cluster) together form a probability model P(X, Ω | Θ) for (X_1, ..., X_K, Ω).
- The probability model is given by the product of the a priori distribution and the conditional probability distribution.
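Written out, the product described above would take the following form. The equation is a reconstruction from the verbal description, not a quotation; in particular the symbol K for the number of random variables is an assumption, since the original formula is not legible in this text.

```latex
\[
P(X, \Omega \mid \Theta) \;=\; P(\Omega \mid \Theta)\, P(X \mid \Omega, \Theta),
\qquad X = (X_1, \ldots, X_K)
\]
```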
- Each iteration step of the EM learning method consists of an E step and an M step.
- The E step corresponds to the right part of the above equation.
- Each record x^π is assigned to a cluster (segment).
- The record x^π is assigned to the i-th cluster if the weight of that cluster is the highest for this record.
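The interplay of E step, M step and final hard assignment can be sketched as follows. This is not the patent's exact procedure: the sketch assumes categorical variables that are conditionally independent given the cluster (a naive Bayes mixture), random initial weights and a fixed number of iterations, and the function name `em_cluster` and all parameters are hypothetical.

```python
import random


def em_cluster(records, n_clusters, n_iter=50, seed=0):
    """Fit a mixture of independent categorical variables with EM and
    return, for each record, the index of the cluster with the highest weight."""
    rng = random.Random(seed)
    n_vars = len(records[0])
    values = [sorted({rec[k] for rec in records}) for k in range(n_vars)]
    # Random soft initial weights (one weight per cluster for each record).
    resp = []
    for _ in records:
        w = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(w)
        resp.append([x / total for x in w])
    for _ in range(n_iter):
        # M step: re-estimate P(cluster) and P(value | cluster) from the weights.
        mass = [sum(w[i] for w in resp) for i in range(n_clusters)]
        prior = [m / len(records) for m in mass]
        cond = []
        for i in range(n_clusters):
            tables = []
            for k in range(n_vars):
                table = {v: 0.0 for v in values[k]}
                for rec, w in zip(records, resp):
                    table[rec[k]] += w[i]
                norm = sum(table.values()) or 1.0  # guard against an empty cluster
                tables.append({v: c / norm for v, c in table.items()})
            cond.append(tables)
        # E step: weight of cluster i for a record is P(cluster) * prod P(value | cluster).
        for n, rec in enumerate(records):
            w = []
            for i in range(n_clusters):
                p = prior[i]
                for k in range(n_vars):
                    p *= cond[i][k][rec[k]]
                w.append(p)
            total = sum(w)
            resp[n] = [x / total for x in w]
    # Hard assignment: each record goes to the cluster with the highest weight.
    return [max(range(n_clusters), key=lambda i: w[i]) for w in resp]


# Hypothetical records (gender, income class) with two clearly separated groups.
records = [("male", "low")] * 5 + [("female", "high")] * 5
assignments = em_cluster(records, n_clusters=2)
print(assignments)
```

Which of the two cluster indices each group receives depends on the random initialisation; only the grouping itself is meaningful.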
- the cluster membership of each record can be stored in an additional field of the record in the database table and appropriate indexes can be prepared to quickly access the data belonging to a particular cluster.
- This distribution shows (possibly only approximately) what proportion of the data satisfying the specified condition is to be found in which clusters of the database table. It is therefore possible to restrict the evaluation to those parts (clusters) of the database table in which, according to this distribution, such data are to be found.
- This exploits the property of the cluster models described here that the a posteriori probability of a cluster for a selection condition is 0 only if the cluster contains not a single data record satisfying the condition. In this respect, the models are exact.
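The a posteriori distribution over clusters for a simple selection condition can be sketched with Bayes' rule. The sketch assumes a condition on a single variable and a model given as plain dictionaries; the function `cluster_posterior`, the cluster names and all probabilities are made up for illustration.

```python
def cluster_posterior(prior, conditional, variable, value):
    """P(cluster | variable = value), proportional to
    P(cluster) * P(variable = value | cluster)."""
    joint = {
        cluster: prior[cluster] * conditional[cluster][variable].get(value, 0.0)
        for cluster in prior
    }
    evidence = sum(joint.values())
    if evidence == 0.0:
        # No cluster contains data satisfying the condition.
        return {cluster: 0.0 for cluster in prior}
    return {cluster: p / evidence for cluster, p in joint.items()}


# Hypothetical model: cluster_2 contains no "plants" transactions at all.
prior = {"cluster_1": 0.5, "cluster_2": 0.5}
conditional = {
    "cluster_1": {"product": {"plants": 0.2, "drills": 0.8}},
    "cluster_2": {"product": {"drills": 1.0}},
}

posterior = cluster_posterior(prior, conditional, "product", "plants")
relevant = [cluster for cluster, p in posterior.items() if p > 0.0]
print(posterior, relevant)
```

Here the a posteriori probability of `cluster_2` is 0 because it holds no matching records, so an evaluation of the condition could skip it without loss of accuracy.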
- The statistical model can also be used to calculate certain desired probabilities directly (possibly approximately).
- OLAP on-line analytical processing
- Database image compressed as explained below.
- The entire database image, that is, all clusters, need not be decompressed for a query.
- clusters are excluded from the evaluation below a certain minimum
- the data belonging to a cluster is advantageously stored in a manner corresponding to the cluster membership.
- the data associated with a cluster may be stored in a portion of the memory 104 so that the associated data may be read in blocks quickly.
- random variables that take on continuous values can be discretized.
- For example, the values of an "income" random variable, that is, a random variable that corresponds to the indication in the customer records of the income of the respective customer, are divided into income classes.
- The division into income classes can be finer or coarser, depending on the analytical requirements.
- The variable may first be discretized into intervals.
- The mean of each interval may additionally be stored and, for each discrete value, the deviation from the mean. Since only small differences then have to be stored, this can be done very memory-efficiently.
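Discretization with stored interval means and deviations can be sketched as follows. The interval edges, the sample incomes and the function names `compress_continuous` and `restore` are hypothetical, and the sketch assumes that every value lies between the first and the last edge.

```python
from bisect import bisect_right
from statistics import mean


def compress_continuous(values, edges):
    """Map each value to an interval index and to its deviation from the
    mean of that interval. Interval i covers [edges[i], edges[i + 1])."""
    bins = [bisect_right(edges, v) - 1 for v in values]
    means = {
        b: mean([v for v, vb in zip(values, bins) if vb == b])
        for b in set(bins)
    }
    deviations = [v - means[b] for v, b in zip(values, bins)]
    return bins, means, deviations


def restore(bins, means, deviations):
    """Reconstruct the original values from interval means and deviations."""
    return [means[b] + d for b, d in zip(bins, deviations)]


# Hypothetical monthly incomes and interval edges (in euros).
incomes = [1200, 1300, 2900, 3100]
edges = [0, 2000, 4000]

bins, means, deviations = compress_continuous(incomes, edges)
restored = restore(bins, means, deviations)
print(bins, means, deviations)
```

The interval indices alone suffice for coarse analyses; the deviations are small numbers and are only needed when exact values have to be reconstructed.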
- Expressions of categorical variables are coded accordingly; for example, for a "gender" random variable, the expression "male" is coded as zero and the expression "female" as one.
- These expressions can be grouped into classes when the database image is created, provided the requirements for the database image allow this.
- For example, the product index of the above-mentioned DIY store could be organized hierarchically: the product titled "M4 screw" could be part of the "machine screws" product group.
- "Machine screws" could in turn be assigned to the product group "screws", which in turn is assigned to the product group "tool accessories", where "tool accessories" is itself a product subgroup of the product group "tools". Depending on the requirements for the database image, it might now be sufficient not to differentiate between different machine screws but to combine them into a class "machine screws". Accordingly, for example, each transaction record in the transaction database table image 108 has the entry "machine screws" (or a value assigned to this expression) in the field corresponding to the product specification if a machine screw was sold in the corresponding transaction.
- a query to the database image can now be processed based on this categorical variable's categorization. If a more precise classification of the values of the categorical variable (for example a differentiation between different machine screws) is required to answer the request, the database table is used. In this case, however, typically only a few details have to be queried from the database table.
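The grouping of a hierarchical categorical variable into coarser classes can be sketched as follows. The `PARENT` mapping mirrors the product groups named in the example; the function names and the idea of passing a set of classes kept in the image are assumptions for illustration.

```python
# Hypothetical product hierarchy, child -> parent.
PARENT = {
    "M4 screw": "Machine screws",
    "Machine screws": "Screws",
    "Screws": "Tool accessories",
    "Tool accessories": "Tools",
}


def ancestors(product):
    """Return the path from a product up to the root of the hierarchy."""
    path = [product]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def coarsen(product, image_classes):
    """Return the first node on the path that the image keeps as a class.

    If no node on the path is kept, the product itself is returned.
    """
    for node in ancestors(product):
        if node in image_classes:
            return node
    return product
```

A query that needs to distinguish individual machine screws would still go to the underlying database table, since the image only stores the class.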
- the database image can be used to provide approximate answers to statistical queries.
- the database image is constructed hierarchically.
- the clusters generated as described above are themselves treated as database tables and subdivided into segments in the same way as the entire database table; that is, each data record in the i-th cluster is assigned to a j-th subcluster of a plurality of subclusters of the i-th cluster. Continuing analogously yields, illustratively, a tree of clusters.
- the resulting cluster hierarchy is shown in FIG. 8.
- FIG. 8 illustrates a cluster hierarchy 800 corresponding to a database image according to an embodiment of the invention.
- the cluster hierarchy 800 is in the form of a tree.
- a statistical clustering model is determined.
- (In the following, the parameter vector and the corresponding random variable are omitted from the notation. It is assumed that the statistical clustering model is specified by a corresponding set of parameters.)
- the database table 801 is divided into a first plurality of $R_1$ clusters 802.
- the probability distribution for the data sets in the i-th cluster of the first plurality of clusters 802 is given by $P(X \mid \omega_i)$.
- the i-th cluster of the first plurality of clusters 802 contains $N_i$ data sets.
- the probability that a data set belongs to the i-th cluster of the first plurality of clusters 802 is $P(\omega_i)$, where $\omega_i$ is the value of the cluster variable $\Omega$ corresponding to the i-th cluster of the first plurality of clusters 802.
- the clusters of the first plurality of clusters 802 are in turn clustered to form a second plurality of clusters 803.
- the i-th cluster of the first plurality of clusters 802 is thereby divided into $R_{2,i}$ (sub-)clusters.
- the j-th subcluster (which is one of the clusters of the second plurality of clusters 803) of the i-th cluster of the first plurality of clusters 802 is assigned the value $\omega_{i,j}$ of the cluster variable $\Omega$.
- the probability distribution for the records in the j-th subcluster of the i-th cluster of the first plurality of clusters 802 is given by $P(X \mid \omega_{i,j})$.
- the j-th subcluster of the i-th cluster of the first plurality of clusters 802 contains $N_{i,j}$ records. The probability that a record belongs to the j-th subcluster of the i-th cluster of the first plurality of clusters 802 is $P(\omega_{i,j})$.
- the clusters of the second plurality of clusters 803 are further subdivided into clusters in the same way as the first plurality of clusters 802, so that a third plurality of clusters 804 is created, for which the quantities $P(X \mid \omega_{i,j,k})$, $N_{i,j,k}$ and $P(\omega_{i,j,k})$ are defined analogously.
- the records in the lowest level of the cluster hierarchy 800 are stored in compressed form and stored, for example, in the memory 104 as a database image.
- the database image has additional data in addition to the stored records, such as the parameter set of the statistical (clustering) model that was determined.
- FIG. 9 illustrates a cluster 900 according to an embodiment of the invention.
- the cluster 900 is shown in the form of a table. Each row of a plurality of N rows 901, 902 corresponds to a record contained in the cluster 900.
- Each column of a plurality of K columns 903, 904 corresponds to a random variable.
- the cluster 900 corresponds to the value $\omega$ of the cluster variable $\Omega$.
- a data set thus corresponds to a K-tuple of possible occurrences, wherein the K-tuple has, at the i-th position, one of the possible values of the i-th random variable $X_i$.
- let the probability distribution of the random variables for the records in cluster 900, that is, the relative frequencies of the K-tuples of occurrences in cluster 900, be given by $P(X \mid \omega)$.
- (The possible values $x_{i,1}, x_{i,2}, \ldots$ of the i-th random variable, for all i with $1 \le i \le K$, are discrete values.)
- A value $x_{i,j}$ may correspond to a discretization interval.
- the cluster hierarchy 800 is formed so that the data within the clusters of the cluster hierarchy 800 is more homogeneous than the entire data in the underlying database table.
- for every random variable, a value (characteristic) is designated that occurs most frequently (or relatively frequently) in the data records of cluster 900 and thus in the plurality of rows 901, 902.
- the designated value for the i-th random variable $X_i$ (also called the default value or the representative value of the i-th random variable) is denoted $\bar{x}_i$.
- the default value can be calculated using the statistical model, so the occurrences contained in the data sets do not have to be counted in each case in order to determine their relative frequency.
- the probability $P(X_i = \bar{x}_i \mid \omega)$ is relatively high, that is, for the cluster 900 it can be assumed that the i-th random variable has the value $\bar{x}_i$.
- 90% of all (registered) male customers between the age of 30 and 40 years of the above-mentioned hardware store may have a call money account (to recognize this, the customer database table 105 must contain the information as to whether the customers have a call money account). For this class of customers, it can therefore be assumed with a high degree of certainty that they (each) own a call money account.
- if the generation of the clustering model now also shows that a cluster consists predominantly of customers of this type (for example, the customers in this cluster are 85% male, 95% are between 30 and 40, and 92% have a call money account),
- then the default value "yes" is used for the call money account random variable, that is, for the entry indicating whether the corresponding customer has a call money account ("yes" being coded, for example, by the value 1).
- illustratively, the value of the cluster variable $\Omega$ for a cluster can be used to predict the data sets in the cluster, in this example the value of the random variable that indicates whether the corresponding customer has a call money account.
- the data sets in the cluster 900 are compressed based on the principle that only the deviation of an occurrence of a random variable from the corresponding default value is stored. This is done, for example, by means of run-length coding.
- the i-th column is run-length encoded.
- the i-th column contains the values
- the default value of the i-th column is not encoded itself; only the number of consecutive rows in which it occurs is encoded. Accordingly, the i-th column becomes $2,\ x_{i,5},\ 0,\ x_{i,2},\ 4,\ x_{i,1},\ 3,\ x_{i,4}$
- one is added to the number of consecutive lines in which the default value is contained, so that the coded column the
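Run-length coding relative to a default value can be sketched as follows. This is a minimal illustration under assumptions: the encoded layout (alternating run length and deviating value, followed by the length of the trailing run) and the function names are chosen for the example and are not taken from the patent; the variant that adds one to each run length is not shown.

```python
def rle_encode(column, default):
    """Encode a column as [run, value, run, value, ..., trailing_run].

    Each run is the number of consecutive default values that precede
    the next deviating value; the default itself is never stored.
    """
    encoded = []
    run = 0
    for value in column:
        if value == default:
            run += 1
        else:
            encoded.extend([run, value])
            run = 0
    encoded.append(run)  # defaults after the last deviating value
    return encoded


def rle_decode(encoded, default):
    """Invert rle_encode."""
    column = []
    for pos in range(0, len(encoded) - 1, 2):
        run, value = encoded[pos], encoded[pos + 1]
        column.extend([default] * run)
        column.append(value)
    column.extend([default] * encoded[-1])
    return column


# Hypothetical "call money account" column of a cluster with default "yes".
column = ["yes", "yes", "no", "yes", "yes", "yes", "unknown", "yes"]
encoded = rle_encode(column, "yes")
```

The more homogeneous the cluster, the longer the runs of the default value and the shorter the encoded column.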
- the cluster 900 is arithmetically coded column by column.
- Arithmetic coding (see, for example, [4]) is a compression method in which a data stream is converted into a bit representation of a real interval. In doing so, a given probability distribution is used.
- the probability distribution is used to calculate the probability distribution
- the data stream is represented by the ith column 904 (or by all of them written one after the other)
- the compression is then performed according to an arithmetic compressor.
- the i-th column is given, for example, by
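The principle of arithmetic coding, narrowing a real interval symbol by symbol according to a given probability distribution, can be sketched with exact fractions. This is a didactic sketch, not a production coder: the probabilities are assumed values rather than ones derived from a statistical model, and the conversion of the final interval into bits is omitted.

```python
from fractions import Fraction


def cumulative(probs):
    """Assign each symbol a sub-interval [low, high) of [0, 1)."""
    ranges, low = {}, Fraction(0)
    for symbol, p in probs.items():
        ranges[symbol] = (low, low + p)
        low += p
    return ranges


def encode(stream, probs):
    """Return the interval [low, high) that represents the whole stream."""
    ranges = cumulative(probs)
    low, width = Fraction(0), Fraction(1)
    for symbol in stream:
        s_low, s_high = ranges[symbol]
        # Narrow the current interval to the symbol's sub-interval.
        low, width = low + width * s_low, width * (s_high - s_low)
    return low, low + width


def decode(point, length, probs):
    """Recover `length` symbols from any point inside the encoded interval."""
    ranges = cumulative(probs)
    out = []
    for _ in range(length):
        for symbol, (s_low, s_high) in ranges.items():
            if s_low <= point < s_high:
                out.append(symbol)
                # Rescale the point to the unit interval for the next symbol.
                point = (point - s_low) / (s_high - s_low)
                break
    return out


probs = {"yes": Fraction(9, 10), "no": Fraction(1, 10)}  # assumed distribution
stream = ["yes", "yes", "no", "yes"]
low, high = encode(stream, probs)
```

The width of the final interval equals the product of the symbol probabilities, so likely streams end in wide intervals that need few bits to identify.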
- alternatively, the procedure is carried out row by row rather than column by column. As with the column-wise procedure, the above options are available (run-length coding, arithmetic coding, or a combination of run-length coding and arithmetic coding).
- the cluster hierarchy 800 is preferably built up until further segmentation (that is, subdivision into clusters) of the lowest level of clusters (in FIG. 8, the third plurality of clusters 804) saves no further storage space, because the space required to store the statistical model offsets the additional compression achieved.
- the cluster 900 can then be compressed in a second step by means of a further compression method, for example by means of a Lempel-Ziv compression method, in order to eliminate possibly existing redundancies. Since compression of the cluster has already been achieved by means of one of the abovementioned compression methods, complex compression methods can be used in the second step without requiring unacceptable computational overhead in compression and / or decompression.
- the statistical methods of compression and the data structures built up thereby not only have a positive effect on the size of a database image.
- the data structures can also easily be used to accelerate analytical queries. If, for example, a value of a variable is coded only when it deviates from the default value, then, when determining statistics over the different values for all the data records currently selected, only corrections to a default statistic have to be made, one for each coded deviation from the default value.
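Counting on the compressed representation can be sketched as follows: start from a default statistic in which every record holds the default value, then apply one correction per coded deviation. The data layout (a mapping from row number to deviating value) and the function name are assumptions for illustration.

```python
from collections import Counter


def value_counts(n_records, default, deviations):
    """Count values of one column without decompressing it.

    deviations maps a row number to the non-default value stored there.
    """
    # Default statistic: assume every record holds the default value.
    counts = Counter({default: n_records})
    # One correction per coded deviation.
    for value in deviations.values():
        counts[default] -= 1
        counts[value] += 1
    return counts


counts = value_counts(1000, "yes", {3: "no", 17: "no", 512: "unknown"})
```

The work is proportional to the number of deviations rather than to the number of records, which is where the speed advantage for homogeneous clusters comes from.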
- the coding of the cluster 900, or of the data sets contained in the cluster, makes it possible to store in the database image, for each data record contained in the cluster 900, a key by means of which the corresponding data record in the underlying database table can be found.
- Each record in the underlying database table has a key associated with it.
- the database image of the database table contains this key for each compressed record stored as explained above.
- a "natural key" of the segmentation may be used; that is, the key of a record in the cluster 900 is a combination of a first key, which specifies the cluster number of the cluster 900, and a second key, which is the number of the record according to a numbering of the records contained in the cluster 900.
- the second key is thus illustratively the number of the record within the cluster 900.
- the cluster number of the cluster 900 may be a hierarchical cluster number configured according to the cluster hierarchy 800.
- the subclusters of a cluster can be numbered consecutively, and the subclusters of such a subcluster can be numbered consecutively again, so that, for example, a hierarchical cluster number of the form 1/3/2 results for the cluster 900 if the cluster 900 is the second subcluster (in the third plurality of clusters 804) of the third subcluster (in the second plurality of clusters 803) of the first cluster of the first plurality of clusters 802.
- the second key which corresponds to a number of the record corresponding to a numbering of the records contained in the cluster 900, can typically be chosen to be very short (one byte or few bytes in length) because only a few records are contained in the cluster 900 due to the segmentation.
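A natural key built from a hierarchical cluster number and a record number can be sketched as follows. The textual format with "/" and "#" separators, the function names, and the sample assignment table are assumptions for illustration; the source only specifies that the key combines the cluster number with the record's number inside the cluster.

```python
def natural_key(cluster_path, record_number):
    """Combine a hierarchical cluster number and a record number.

    cluster_path is a tuple such as (1, 3, 2) for cluster 1/3/2.
    """
    cluster_number = "/".join(str(part) for part in cluster_path)
    return f"{cluster_number}#{record_number}"


def parse_natural_key(key):
    """Split a natural key back into cluster path and record number."""
    cluster_number, record_number = key.split("#")
    return tuple(int(part) for part in cluster_number.split("/")), int(record_number)


# Hypothetical assignment table: natural key -> key in the underlying table.
assignment = {
    natural_key((1, 3, 2), 0): 100417,
    natural_key((1, 3, 2), 1): 100932,
}
```

Because a cluster holds few records, the record number stays small, so most of the key is shared by all records of the same cluster.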
- the assignment of the "natural keys" to the keys used in the underlying database table (which is required to find, in the database table, the record corresponding to a record in the database image) can itself be stored in the form of a database table in the database that contains the database table, and can be used accordingly when accessing the database table or the database.
- If a plurality of database tables and corresponding database images exist, for example, according to FIG. 1, a transaction database table image 108 for a transaction database table 106 and a customer database table image 107 for a customer database table 105,
- corresponding customer records in the customer database table image 107 are selected. This is done by means of a common key of the customer database table 105 and the transaction database table 106, for example by the customer number of a customer corresponding to a customer record or a customer involved in a transaction corresponding to a transaction record.
- the corresponding transaction records in the transaction database table 106 may be identified (e.g., by means of a transaction record key stored in the transaction database table image 108 and an appropriate allocation table).
- by means of the customer numbers, the correspondingly selected customer records in the customer database table 105 can now be determined. By means of an allocation table, which assigns the keys of the customer records of the customer database table image 107 to the keys of the customer records of the customer database table 105, the correspondingly selected customer records in the customer database table image 107 are then determined and can be used for the corresponding selection (for example according to FIG. 5).
- if the transaction database table image 108 and the customer database table image 107 themselves have a common key (for example, customer numbers), the customer records in the customer database table image 107 that correspond to selected transaction records in the transaction database table image 108 can be selected analogously to the procedure described above.
- the proposed method has the following advantages, in particular in the context of relational queries (that is, queries involving multiple database tables).
- the compression allows the database images to be kept in a small but fast memory (in main memory).
- the database images are designed so that keys can be stored in the compressed images while (almost) random access remains possible. This allows different database images (like the originally different database tables in the relational database) to be linked via keys and thus relational queries to be answered. This gives a considerable speed gain for the following reasons:
- the database images are constructed in such a way that segmentation allows fast access to the data and fast counting.
- the transaction database table image 108 contains references to the data records in the other database image (eg, the customer database table image 107).
- an increase in efficiency is achieved in that the two database images are not generated independently of each other, but that the grouping of data sets to clusters for generating one of the two database images takes place with regard to the other database image.
- the transaction database table image 108 is generated with respect to the customer database table image 107 by mapping all transaction records that correspond to the same customer record, that is, correspond to the transactions in which the same customer was involved, to the same cluster. This makes it possible, for example when selecting customer records in the customer database table image 107, to quickly access the corresponding transaction records in the transaction database table image 108, since they are all assigned to the same cluster of the transaction database table image 108. This is particularly advantageous when the clusters of the transaction database table image 108 are compressed and must be decompressed on access. In a grouping as above, therefore, only a few clusters need to be decompressed on a request.
- a coordinated cluster structure can be achieved, for example, by first generating a clustering for one table (i.e., database table) as usual by means of a learning process. All the data from the second table that correspond, via the keys, to a cluster of the first table are then combined into a cluster for the second table without a learning procedure.
- for example, the customers are first grouped into typical customer classes (i.e., a clustering of the customer database table records is performed). The transaction records for all the transactions that belong to the customers of a customer class are then combined into a cluster for the transaction data. Accordingly, learning takes place only on the first table.
- the clustering of the second table thus depends on the clusters of the first table.
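Deriving the clusters of the second table from the clusters of the first table can be sketched as follows. The record layout (dictionaries with a `customer_no` field), the cluster labels, and the function name are assumptions for illustration; the learning process that produces the customer clusters is taken as given.

```python
def cluster_transactions(customer_cluster, transactions):
    """Group transaction records by the cluster of the customer involved.

    customer_cluster maps a customer number to a cluster label that was
    learned on the customer table; no learning happens here.
    """
    clusters = {}
    for record in transactions:
        label = customer_cluster[record["customer_no"]]
        clusters.setdefault(label, []).append(record)
    return clusters


customer_cluster = {1: "A", 2: "B", 3: "A"}  # hypothetical learned clustering
transactions = [
    {"customer_no": 1, "product": "Machine screws"},
    {"customer_no": 2, "product": "Flower bulbs"},
    {"customer_no": 3, "product": "Machine screws"},
    {"customer_no": 1, "product": "Tools"},
]
by_cluster = cluster_transactions(customer_cluster, transactions)
```

All transactions of one customer land in the same transaction cluster, so selecting a customer requires decompressing only that one cluster.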
- a common clustering can also be achieved through joint learning.
- a common clustering can be achieved, for example, through joint EM steps in an EM learning process, using a common cluster variable.
- the cluster memberships are first estimated (E-step).
- the assignment of, for example, a customer from a customer table to a cluster is made not only on the basis of the customer's characteristics but also on the basis of the customer's transactions (stored in the transaction table).
- for the transactions belonging to a customer there are not different a-posteriori estimates for the customer
- the common clustering can be done as follows. To obtain the a-posteriori estimate of the latent variable (the cluster variable) for a customer, first, as in known inference techniques (see, e.g., the inference methods using message-passing algorithms described in [10]), a message is sent to the cluster variable from each of the customer's known variables (or variable groups or cliques) from the customer table. As usual, the
- a message is now also sent to the cluster variable from each entry in the transaction table belonging to the customer under consideration, in order to take the information from the transaction table into account in the a-posteriori estimate of the customer's cluster membership. For each transaction that belongs to a customer, the probability tables of a selected "transaction model" (a joint probabilistic model for the variables from the transaction table and the latent variable) can be used. The resulting a-posteriori estimate for the cluster variable can form the basis for the M-step.
- this is the usual M-step using the jointly calculated posterior for each customer and calculation of the "sufficient statistics" (see [1] and [3]) as the sum across all customers.
- the calculation of sufficient statistics for the M step can be done as the sum of all transactions of a customer with the associated posterior and as an additional sum across all customers.
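The joint E-step can be sketched as follows, under the simplifying assumption that customer attributes and transactions are conditionally independent given the cluster (a naive-Bayes-style factorization; the source does not commit to this model). Each customer attribute and each transaction contributes one factor, which plays the role of a message to the cluster variable. All table contents and names are hypothetical.

```python
def joint_posterior(prior, customer_tables, transaction_table, customer, products):
    """A-posteriori cluster probabilities for one customer.

    prior[c]                          : P(cluster c)
    customer_tables[attr][c][value]   : P(attr = value | c)
    transaction_table[c][product]     : P(product | c), the "transaction model"
    """
    scores = {}
    for cluster, p in prior.items():
        score = p
        # Messages from the customer's own attributes.
        for attr, value in customer.items():
            score *= customer_tables[attr][cluster][value]
        # One message per transaction that belongs to the customer.
        for product in products:
            score *= transaction_table[cluster][product]
        scores[cluster] = score
    total = sum(scores.values())
    return {cluster: score / total for cluster, score in scores.items()}


prior = {"A": 0.5, "B": 0.5}
customer_tables = {"gender": {"A": {"m": 0.8, "f": 0.2}, "B": {"m": 0.4, "f": 0.6}}}
transaction_table = {
    "A": {"screws": 0.7, "bulbs": 0.3},
    "B": {"screws": 0.2, "bulbs": 0.8},
}
post = joint_posterior(prior, customer_tables, transaction_table,
                       {"gender": "m"}, ["screws", "screws"])
```

The resulting posterior would then weight this customer's contributions, and those of all the customer's transactions, when the sufficient statistics for the M-step are summed.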
- a database image contains keys as described above, the database image can be used as a multidimensional index for a database. This will be explained below.
- multiple database-associated database images allow for multidimensional access to a database in conditions
- an index can be created for a column of the database table that allows to quickly find records of the database table for which the size stored in the column assumes a certain value.
- the customer database table 105 could have a column indicating the nationality of the customers, that is, each customer record has a field that contains a specification of the nationality of the corresponding customer.
- an index, that is, a list, can be created for this column. In this way, the customer records that correspond to customers of a given nationality can be found quickly in the database table.
- In this way, an index can be created for each column of the database table.
- if the database table has a large number of columns, however, a considerable outlay arises, which in particular leads to performance difficulties. In extreme cases, it is, for example
- a database image can be used as a "multidimensional" index for the database table if, as explained above, keys are stored for the records in the database image that allow the corresponding records in the underlying database table to be found.
- the corresponding data records can be found in the underlying database table without having to check the specified conditions for all data records of the database table.
- for example, the customer database table contains, for each (registered) customer of the hardware store, a customer record that contains the customer's address in addition to the age of the customer, the customer number, the gender of the customer, and so on.
- in the customer database table image 107, there is a customer record for each customer which contains only a portion of this information, for example the gender and the age of the corresponding customer, but in particular not the address of the corresponding customer.
- a target group could have been determined, for example, all customers between the ages of 30 and 40 with a certain income who are unmarried.
- the customer database table image 107 can now be used as a multidimensional index for the customer database table 105 in the sense that the customer data records of the customer database table 105 that correspond to the target group can be determined quickly by means of the keys stored in the customer database table image 107.
- the customer database table image outputs the corresponding keys, and the keys are passed to the database.
- the database can directly retrieve the addresses of the customers of the target group from the customer database table 105, without having to examine the condition defining the target group on all customer data records in a complex process.
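Using a database image as a multidimensional index can be sketched with an in-memory SQLite database standing in for the underlying database. The table layout, the sample records, and the function names are assumptions for illustration; the image is modeled as a plain list of small records that carry the key of the corresponding row.

```python
import sqlite3

# Underlying database table (hypothetical layout and content).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (customer_no INTEGER PRIMARY KEY, address TEXT)")
db.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "1 Oak St"), (2, "2 Elm St"), (3, "3 Ash St")],
)

# Database image: only the attributes needed for analysis, plus the key.
image = [
    {"key": 1, "age": 35, "income_class": 2, "married": False},
    {"key": 2, "age": 52, "income_class": 2, "married": False},
    {"key": 3, "age": 31, "income_class": 2, "married": True},
]


def target_keys(image, min_age, max_age, income_class):
    """Evaluate all conditions on the image and return the matching keys."""
    return [
        r["key"]
        for r in image
        if min_age <= r["age"] <= max_age
        and r["income_class"] == income_class
        and not r["married"]
    ]


def fetch_addresses(db, keys):
    """Fetch addresses by key only; no condition is evaluated in the database."""
    if not keys:
        return []
    placeholders = ",".join("?" * len(keys))
    rows = db.execute(
        f"SELECT address FROM customers WHERE customer_no IN ({placeholders}) "
        "ORDER BY customer_no",
        keys,
    )
    return [address for (address,) in rows]
```

The database only performs primary-key lookups, while the combined condition on age, income, and marital status is checked on the compact image.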
- the occurrences present in the database can be grouped in the database image, thus requiring less memory, in particular for the database image, since fewer different occurrences have to be encoded.
- the database image may contain discretizations of occurrences existing in the database, or different values may be combined in value ranges in the database image.
- for example, the customer database table 105 contains in each customer record the month in which the corresponding customer was born, so that the age of the corresponding customer is known to the month.
- the customer records of the customer database table image 107, in contrast, specify the age of the corresponding customer only to the year.
- by accessing the database table, the request is then answered, whereby only the records of the database table corresponding to the preselection have to be taken into account, which yields a speed advantage.
- for example, a request relating to all customers under 17.5 years of age is made to the customer database table image 107.
- in the customer database table image 107, the age of the customers is known only to the year.
- the request can be answered directly for all customers under the age of 17, since the corresponding data records can be determined unambiguously.
- in addition, the keys of the customer records whose customers are between 17 and 18 years old are determined. Using these keys, it can then be determined, by accessing the customer database table 105, which of these customer records actually correspond to customers who are under 17.5 years old. Once these have been determined, the request can be answered completely.
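The preselection on coarse values followed by a refinement on the exact values can be sketched as follows. The image is assumed to store the age in completed years and the table the exact age; both are modeled as dictionaries keyed by customer key, and all names and figures are hypothetical.

```python
def customers_under(image_age, table_age, limit):
    """Return the keys of customers younger than `limit` years.

    image_age maps key -> age in completed years (coarse, from the image).
    table_age maps key -> exact age in years (from the database table).
    """
    # An image age y means the exact age lies in [y, y + 1).
    certain = [k for k, y in image_age.items() if y + 1 <= limit]
    uncertain = [k for k, y in image_age.items() if y < limit < y + 1]
    # Only the uncertain keys are looked up in the database table.
    refined = [k for k in uncertain if table_age[k] < limit]
    return sorted(certain + refined), sorted(uncertain)


image_age = {1: 16, 2: 17, 3: 17, 4: 18}
table_age = {1: 16.2, 2: 17.3, 3: 17.8, 4: 18.1}
result, looked_up = customers_under(image_age, table_age, 17.5)
```

In this sketch only the two customers aged 17 in the image require access to the database table; all others are decided on the image alone.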
- the function as a multidimensional index is particularly advantageous if several database tables are involved in the query, for example, if the addresses of all customers who are under 18 years old and bought flower bulbs in January are to be queried.
- in the SQL database query language, such queries are called "JOIN" queries.
- Such queries, which require linking multiple database tables, are often slow in databases.
- a list of the IDs (identifiers, for example customer numbers) of such customers can, as described in detail in the preceding embodiments, be determined very efficiently by combining two suitable database images which, for example through statistical modeling, achieve a compression that makes it possible to compute the list entirely in main memory.
- illustratively, a database image can be used as a transparent accelerator for a database.
- a program sends a request to the database.
- the query is quickly answered using the database image, as explained above, by accessing the database only when necessary because the data in the database image is insufficient.
- for example, the address of a customer is not stored in the database image, but only in the database table underlying the database image. This is transparent in that, for the program transmitting the request, it makes no difference whether the request is answered directly by accessing the underlying database table or by using the database image of the database table.
- illustratively, requests from other software are received by the database image instead of the database and evaluated. They are then either answered independently on the basis of the information stored in the database image (or in multiple database images), or, if certain required information is not in the database image, a possibly optimized request is forwarded to the database, the results are retrieved and, if necessary, processed further, and the result is transmitted to the requesting software. Optimizations may consist, for example, in removing selection criteria from the query and instead accessing individual records directly using a list of keys that the database image generates in accordance with the selection.
- the invention can accept and answer queries in the query language SQL (structured query language).
- SQL structured query language
- JDBC java database connectivity
- ODBC open database connectivity
- the invention can be used transparently as an accelerator, that is, in such a way that application software designed for direct access to the database can be accelerated by the invention without modification.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/793,802 US20080133573A1 (en) | 2004-12-24 | 2005-12-19 | Relational Compressed Database Images (for Accelerated Querying of Databases) |
EP05850178A EP1831804A1 (de) | 2004-12-24 | 2005-12-19 | Relationale komprimierte datenbank-abbilder (zur beschleunigten abfrage von datenbanken) |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE102004062532.8 | 2004-12-24 | ||
DE102004062532 | 2004-12-24 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2006066556A2 true WO2006066556A2 (de) | 2006-06-29 |
WO2006066556A8 WO2006066556A8 (de) | 2006-10-05 |
Family
ID=36097216
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/DE2005/002287 WO2006066556A2 (de) | 2004-12-24 | 2005-12-19 | Relationale komprimierte datenbank-abbilder (zur beschleunigten abfrage von datenbanken) |
Country Status (3)
Country | Link |
---|---|
US (1) | US20080133573A1 (de) |
EP (1) | EP1831804A1 (de) |
WO (1) | WO2006066556A2 (de) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007029530A1 (en) * | 2005-09-02 | 2007-03-15 | Semiconductor Energy Laboratory Co., Ltd. | Anthracene derivative |
US8099674B2 (en) | 2005-09-09 | 2012-01-17 | Tableau Software Llc | Computer systems and methods for automatically viewing multidimensional databases |
US7999809B2 (en) * | 2006-04-19 | 2011-08-16 | Tableau Software, Inc. | Computer systems and methods for automatic generation of models for a dataset |
JPWO2010041377A1 (ja) * | 2008-10-06 | 2012-03-01 | パナソニック株式会社 | 代表画像表示装置及び代表画像選択方法 |
US20110191141A1 (en) * | 2010-02-04 | 2011-08-04 | Thompson Michael L | Method for Conducting Consumer Research |
US8423522B2 (en) | 2011-01-04 | 2013-04-16 | International Business Machines Corporation | Query-aware compression of join results |
US8799240B2 (en) * | 2011-06-23 | 2014-08-05 | Palantir Technologies, Inc. | System and method for investigating large amounts of data |
US10621206B2 (en) | 2012-04-19 | 2020-04-14 | Full Circle Insights, Inc. | Method and system for recording responses in a CRM system |
US10599620B2 (en) * | 2011-09-01 | 2020-03-24 | Full Circle Insights, Inc. | Method and system for object synchronization in CRM systems |
US9305045B1 (en) * | 2012-10-02 | 2016-04-05 | Teradata Us, Inc. | Data-temperature-based compression in a database system |
WO2015034905A1 (en) * | 2013-09-03 | 2015-03-12 | String Enterprises, Inc. | Computer-implemented methods and systems for generating visual representations of complex and voluminous marketing and sales and other data |
US20150278214A1 (en) | 2014-04-01 | 2015-10-01 | Tableau Software, Inc. | Systems and Methods for Ranking Data Visualizations Using Different Data Fields |
US9424318B2 (en) | 2014-04-01 | 2016-08-23 | Tableau Software, Inc. | Systems and methods for ranking data visualizations |
US9613102B2 (en) | 2014-04-01 | 2017-04-04 | Tableau Software, Inc. | Systems and methods for ranking data visualizations |
US11474978B2 (en) * | 2018-07-06 | 2022-10-18 | Capital One Services, Llc | Systems and methods for a data search engine based on data profiles |
US11520695B2 (en) * | 2021-03-02 | 2022-12-06 | Western Digital Technologies, Inc. | Storage system and method for automatic defragmentation of memory |
Family Cites Families (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4620286A (en) * | 1984-01-16 | 1986-10-28 | Itt Corporation | Probabilistic learning element |
US5241648A (en) * | 1990-02-13 | 1993-08-31 | International Business Machines Corporation | Hybrid technique for joining tables |
EP0444358B1 (de) * | 1990-02-27 | 1998-08-19 | Oracle Corporation | Dynamische Optimierung eines einzelnen relationalen Zugriffs |
US5583500A (en) * | 1993-02-10 | 1996-12-10 | Ricoh Corporation | Method and apparatus for parallel encoding and decoding of data |
US5765146A (en) * | 1993-11-04 | 1998-06-09 | International Business Machines Corporation | Method of performing a parallel relational database query in a multiprocessor environment |
US5574906A (en) * | 1994-10-24 | 1996-11-12 | International Business Machines Corporation | System and method for reducing storage requirement in backup subsystems utilizing segmented compression and differencing |
US5758257A (en) * | 1994-11-29 | 1998-05-26 | Herz; Frederick | System and method for scheduling broadcast of and access to video programs and other data using customer profiles |
US5687361A (en) * | 1995-02-13 | 1997-11-11 | Unisys Corporation | System for managing and accessing a dynamically expanding computer database |
US6134564A (en) * | 1995-11-20 | 2000-10-17 | Execware, Inc. | Computer program for rapidly creating and altering presentation of parametric text data objects and associated graphic images |
US5704017A (en) * | 1996-02-16 | 1997-12-30 | Microsoft Corporation | Collaborative filtering utilizing a belief network |
US6026397A (en) * | 1996-05-22 | 2000-02-15 | Electronic Data Systems Corporation | Data analysis system and method |
US5870559A (en) * | 1996-10-15 | 1999-02-09 | Mercury Interactive | Software system and associated methods for facilitating the analysis and management of web sites |
US6226629B1 (en) * | 1997-02-28 | 2001-05-01 | Compaq Computer Corporation | Method and apparatus determining and using hash functions and hash values |
US6205447B1 (en) * | 1997-06-30 | 2001-03-20 | International Business Machines Corporation | Relational database management of multi-dimensional data |
US5960428A (en) * | 1997-08-28 | 1999-09-28 | International Business Machines Corporation | Star/join query optimization |
US6807537B1 (en) * | 1997-12-04 | 2004-10-19 | Microsoft Corporation | Mixtures of Bayesian networks |
US6449612B1 (en) * | 1998-03-17 | 2002-09-10 | Microsoft Corporation | Varying cluster number in a scalable clustering system for use with large databases |
US6263337B1 (en) * | 1998-03-17 | 2001-07-17 | Microsoft Corporation | Scalable system for expectation maximization clustering of large databases |
US6012058A (en) * | 1998-03-17 | 2000-01-04 | Microsoft Corporation | Scalable system for K-means clustering of large databases |
US20020039990A1 (en) * | 1998-07-20 | 2002-04-04 | Stanton Vincent P. | Gene sequence variances in genes related to folate metabolism having utility in determining the treatment of disease |
US6263334B1 (en) * | 1998-11-11 | 2001-07-17 | Microsoft Corporation | Density-based indexing method for efficient execution of high dimensional nearest-neighbor queries on large databases |
US6385172B1 (en) * | 1999-03-19 | 2002-05-07 | Lucent Technologies Inc. | Administrative weight assignment for enhanced network operation |
US6728713B1 (en) * | 1999-03-30 | 2004-04-27 | Tivo, Inc. | Distributed database management system |
US6549907B1 (en) * | 1999-04-22 | 2003-04-15 | Microsoft Corporation | Multi-dimensional database and data cube compression for aggregate query support on numeric dimensions |
US6564197B2 (en) * | 1999-05-03 | 2003-05-13 | E.Piphany, Inc. | Method and apparatus for scalable probabilistic clustering using decision trees |
WO2001001260A2 (en) * | 1999-06-30 | 2001-01-04 | Raf Technology, Inc. | Secure, limited-access database system and method |
US6842758B1 (en) * | 1999-07-30 | 2005-01-11 | Computer Associates Think, Inc. | Modular method and system for performing database queries |
US6898603B1 (en) * | 1999-10-15 | 2005-05-24 | Microsoft Corporation | Multi-dimensional data structure caching |
US6981040B1 (en) * | 1999-12-28 | 2005-12-27 | Utopy, Inc. | Automatic, personalized online information and product services |
US6611834B1 (en) * | 2000-01-12 | 2003-08-26 | International Business Machines Corporation | Customization of information retrieval through user-supplied code |
US20020029207A1 (en) * | 2000-02-28 | 2002-03-07 | Hyperroll, Inc. | Data aggregation server for managing a multi-dimensional database and database management system having data aggregation server integrated therein |
EP1264253B1 (de) * | 2000-02-28 | 2006-09-20 | Panoratio Database Images GmbH | Method and arrangement for modeling a system |
US6694301B1 (en) * | 2000-03-31 | 2004-02-17 | Microsoft Corporation | Goal-oriented clustering |
US20020103793A1 (en) * | 2000-08-02 | 2002-08-01 | Daphne Koller | Method and apparatus for learning probabilistic relational models having attribute and link uncertainty and for performing selectivity estimation using probabilistic relational models |
US6795825B2 (en) * | 2000-09-12 | 2004-09-21 | Naphtali David Rishe | Database querying system and method |
WO2002025588A2 (en) * | 2000-09-21 | 2002-03-28 | Md Online Inc. | Medical image processing systems |
US6922660B2 (en) * | 2000-12-01 | 2005-07-26 | Microsoft Corporation | Determining near-optimal block size for incremental-type expectation maximization (EM) algorithms |
US20020129038A1 (en) * | 2000-12-18 | 2002-09-12 | Cunningham Scott Woodroofe | Gaussian mixture models in a data mining system |
US20030028564A1 (en) * | 2000-12-19 | 2003-02-06 | Lingomotors, Inc. | Natural language method and system for matching and ranking documents in terms of semantic relatedness |
EP1395924A2 (de) * | 2001-06-08 | 2004-03-10 | Siemens Aktiengesellschaft | Statistical models for improving the performance of database operations |
US7113936B1 (en) * | 2001-12-06 | 2006-09-26 | Emc Corporation | Optimizer improved statistics collection |
WO2003057011A2 (en) * | 2002-01-04 | 2003-07-17 | Canswers Llc | Systems and methods for predicting disease behavior |
US7003158B1 (en) * | 2002-02-14 | 2006-02-21 | Microsoft Corporation | Handwriting recognition with mixtures of Bayesian networks |
US7266541B2 (en) * | 2002-04-12 | 2007-09-04 | International Business Machines Corporation | Adaptive edge processing of application data |
US6988107B2 (en) * | 2002-06-28 | 2006-01-17 | Microsoft Corporation | Reducing and controlling sizes of model-based recognizers |
US7133811B2 (en) * | 2002-10-15 | 2006-11-07 | Microsoft Corporation | Staged mixture modeling |
DE10252445A1 (de) * | 2002-11-12 | 2004-05-27 | Siemens Ag | Method and computer arrangement for providing database information of a first database, and method for computer-aided formation of a statistical image of a database |
US7136850B2 (en) * | 2002-12-20 | 2006-11-14 | International Business Machines Corporation | Self tuning database retrieval optimization using regression functions |
US7110997B1 (en) * | 2003-05-15 | 2006-09-19 | Oracle International Corporation | Enhanced ad-hoc query aggregation |
US7184591B2 (en) * | 2003-05-21 | 2007-02-27 | Microsoft Corporation | Systems and methods for adaptive handwriting recognition |
US7089266B2 (en) * | 2003-06-02 | 2006-08-08 | The Board Of Trustees Of The Leland Stanford Jr. University | Computer systems and methods for the query and visualization of multidimensional databases |
US7225200B2 (en) * | 2004-04-14 | 2007-05-29 | Microsoft Corporation | Automatic data perspective generation for a target variable |
2005
- 2005-12-19 WO PCT/DE2005/002287 (WO2006066556A2, de): active, Application Filing
- 2005-12-19 EP EP05850178A (EP1831804A1, de): not active, Withdrawn
- 2005-12-19 US US11/793,802 (US20080133573A1, en): not active, Abandoned
Non-Patent Citations (2)
Title |
---|
No search * |
See also references of EP1831804A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO2006066556A8 (de) | 2006-10-05 |
EP1831804A1 (de) | 2007-09-12 |
US20080133573A1 (en) | 2008-06-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2006066556A2 (de) | Relational compressed database images (for accelerated querying of databases) | |
DE69938339T2 (de) | A scalable system for clustering large databases | |
DE60221153T2 (de) | Method and device for similarity search and clustering | |
DE60121231T2 (de) | Data processing method | |
DE112016005350T5 (de) | Storing and retrieving data of a data cube | |
DE10028688B4 (de) | Method, system, and program for a join operation on a multi-column table and on satellite tables with duplicate values | |
He et al. | Mining a web citation database for author co-citation analysis | |
DE69910219T2 (de) | Transformation of the perspective on tables of relational databases | |
Sağlam et al. | A mixed-integer programming approach to the clustering problem with an application in customer segmentation | |
DE10120870A1 (de) | Navigating an index for accessing a multidimensional subject database | |
DE202017007212U1 (de) | System for incremental cluster maintenance of a table | |
DE10120869A1 (de) | Using an index for accessing a multidimensional subject database | |
DE602004006485T2 (de) | Method for automated annotation of reports of multidimensional databases with information objects of a data store | |
DE102014204827A1 (de) | Resolving similar entities from a transaction database | |
DE10311311A1 (de) | Calculation of price elasticity | |
DE102020126569A1 (de) | Systems and methods for dynamic demand sensing | |
DE60030735T2 (de) | Predicting the feasibility of a connection path | |
DE10239292A1 (de) | Conflict detection and resolution in connection with a data allocation | |
DE102012214196A1 (de) | Detecting ambiguous names in a group of names | |
DE60032258T2 (de) | Determining whether a variable is numeric or non-numeric | |
DE60013138T2 (de) | A method and an apparatus for processing database queries | |
Benítez et al. | Consistent clustering of entries in large pairwise comparison matrices | |
DE112021001743T5 (de) | Vector embedding models for relational tables with null or equivalent values | |
DE10320419A1 (de) | Database query system and method for computer-aided querying of a database | |
DE102012025349B4 (de) | Determination of a similarity measure and processing of documents |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A2. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KN KP KR KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2. Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
WWE | WIPO information: entry into national phase | Ref document number: 2005850178. Country of ref document: EP |
WWP | WIPO information: published in national office | Ref document number: 2005850178. Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 11793802. Country of ref document: US |
WWP | WIPO information: published in national office | Ref document number: 11793802. Country of ref document: US |