WO2004051514A1

WO2004051514A1 - Statistical system and statistical method based on categorization definition for a plurality of records

Info

Publication number: WO2004051514A1
Application number: PCT/JP2002/012789
Authority: WO
Inventors: Naoki Akaboshi
Original assignee: Fujitsu Limited
Priority date: 2002-12-05
Filing date: 2002-12-05
Publication date: 2004-06-17
Also published as: JPWO2004051514A1

Abstract

A statistical system and a statistical method capable of categorizing records according to a categorization rule that spans a plurality of records and performing statistical processing. To achieve this object, the information statistical system totals, according to a predetermined rule, a plurality of records composed of a plurality of data items stored in a table. The system includes means for defining how to specify the records that fall under a categorization spanning a plurality of records stored in the table, means for specifying those records, and categorization-statistical means. The categorization-statistical means references the categorization result produced by the specifying means and totals the records stored in the table according to the definition made by the defining means.

Description

Aggregation system and aggregation method based on classification definitions that span multiple records

The present invention relates to an information processing system, and more particularly to a tallying system and a tallying method for utilizing stored information.

Background art

Information intended for use by computers is typically held as tabular data (table data), in which information with a fixed structure corresponding to one event (a record), such as transaction information recording daily transactions, is recorded once per event. Within each record, the information describing the event is stored in several sections called fields.

Figure 1 shows an example of such tabular data. In Figure 1, the vertical columns are the fields: receipt number (reference number 101), sales date (102), the field with reference number 103, product (104), and sales amount (105). Record 106, the first row of FIG. 1, consists of the values of each field, for example receipt number 0001, sales date 2002/06/30, product A, and sales amount 300000.

To utilize the information stored as table data, each record is classified by a classification rule based on the value stored in a specific field of the record, and aggregation is performed for each set of classified records (category) to determine differences and trends between categories. Fig. 2 shows the result of taking the tabular data of Fig. 1, classifying each record by whether its sales date field 102 falls in June or July, and totaling the sales amount field of the corresponding records for each month. The result consists of a monthly classification 201 and a total amount 202.

The following kinds of classification rule can be specified in conventional technology. All of them can be applied to each record on its own, so records are easy to classify.

The classification rule of the first example is a category type, which is a rule for performing classification based on a classification of, for example, daily necessities, perishables, and the like.

The classification rule in the second example is a time type, for example a rule that classifies a record according to whether its time field satisfies a specified condition. Examples include monthly classification (January, February, and so on) as well as weekly and daily classification rules.

The classification rule in the third example is a range type, for example a rule that classifies records by the range into which the sales amount falls, such as 100,000 yen or less or more than 100,000 yen.

The classification rule in the fourth example is a full-value type, a rule that classifies records by the values recorded in them. For example, in a large store with cash register numbers from 1 to 10, records are classified by every recorded cash register number.
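
The four conventional rule types share one property: each can be evaluated on a single record. The following is a minimal Python sketch of that idea, not the system described in the text. The field names (`sales_date`, `product`, `amount`, `register_no`), the product-to-category mapping, and the 2000 threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records; field names and values are illustrative assumptions.
records = [
    {"sales_date": date(2002, 6, 30), "product": "A", "amount": 3000, "register_no": 1},
    {"sales_date": date(2002, 6, 30), "product": "B", "amount": 1500, "register_no": 2},
    {"sales_date": date(2002, 7, 1), "product": "A", "amount": 3000, "register_no": 1},
]

# Hypothetical mapping used by the category-type rule.
PRODUCT_CATEGORY = {"A": "daily necessities", "B": "perishables"}


def by_category(rec):
    """Category type: class is looked up from the product."""
    return PRODUCT_CATEGORY[rec["product"]]


def by_month(rec):
    """Time type: class is the month of the sales date."""
    return rec["sales_date"].strftime("%Y-%m")


def by_amount_range(rec):
    """Range type: the threshold is chosen only for illustration."""
    return "2000 or less" if rec["amount"] <= 2000 else "over 2000"


def by_register(rec):
    """Full-value type: the recorded value itself is the class."""
    return rec["register_no"]


def aggregate(records, rule, field="amount"):
    """Sum `field` per class, classifying each record on its own."""
    totals = defaultdict(int)
    for rec in records:
        totals[rule(rec)] += rec[field]
    return dict(totals)


monthly_totals = aggregate(records, by_month)
```

Because every rule takes exactly one record as its argument, `aggregate` never has to look at any other record to decide a class.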

FIG. 3 is a diagram showing the configuration of a conventional system for utilizing stored information. The conventional system shown in FIG. 3 mainly comprises an information processing unit 301, an input device 302, an output device 303, a database 304, and classification definition storage means 305. The information processing unit 301 has means for performing various kinds of processing, and mainly comprises data registration means 311, classification definition means 312, classification instruction means 313, and classification and aggregation means 314. The output device 303 is a display device that displays screens and the like. The input device 302, such as a mouse or a keyboard, performs the various inputs.

The data registration means 311 composes data input from the input device 302 into records and registers (accumulates) them in the database 304. The classification definition means 312 stores the classification definition input from the input device 302 in the classification definition storage means 305. The classification instruction means 313 selects a classification definition stored in the classification definition storage means 305 in accordance with the instruction input from the input device 302 and passes it to the classification and aggregation means 314. The classification and aggregation means 314 classifies the records stored in the database 304 and performs aggregation processing. The database 304 stores data as records, and the classification definitions are defined and registered. For example, the classification and aggregation means 314 takes the records out of a sales database one by one and, referring to the classification definition, adds the field to be aggregated of each record to the class to which the record belongs. The classified and aggregated results are then displayed on the display device 303 by the classification and aggregation means 314.

In the past, rules that could be defined as a classification definition were limited to those that could classify a record immediately when applied to that single record. However, there is a need to use the results of various analytical tools, called business intelligence, as classification definitions for classifying records. When records are classified based on such analysis results, the classification has to be defined across multiple records, so the records cannot be aggregated using conventional simple classification rules. Data mining is a typical example of such a business intelligence analysis tool.

Data mining is an analytical technique that finds regularities or rules in a large amount of data. It is the task of analyzing vast amounts of data, converting it into valuable information, and linking it to business actions. Techniques such as correlation analysis and clustering are generally used for data mining.

Correlation analysis, one of the analysis methods of data mining, picks out combinations of products that are purchased together. For the analysis, the contents of each customer's purchase are stored in advance as POS (point-of-sale) data. One receipt is called a transaction, and one product is called an item; a single transaction typically includes multiple items. For example, suppose that out of 100 customer receipts collected, 20 customers purchased product A and 12 customers purchased both product A and product B.

At this time, from the following definition,

$$\text{support}(\text{item}) = \frac{\text{number of transactions including the item}}{\text{total number of transactions}} \qquad (1)$$

the support for product A is 20% and the support for products A and B together is 12%. A simple conditional probability calculation then gives "60% (= 12% / 20%) of customers who purchase A also purchase B". This is expressed as "A → B: confidence 60%, support 12%" and is defined as an association rule. In other words, the confidence of the association rule "A → B" is

$$\text{confidence}(A \rightarrow B) = \frac{\text{support}(A \wedge B)}{\text{support}(A)}$$

Here, the symbol "∧" indicates that both A and B were purchased.

For example, a rule such as "bread → butter: confidence 70%" means "70% of customers who bought bread also bought butter".
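
The support and confidence definitions can be checked with a short Python sketch. The transaction list below is synthetic, constructed only to match the counts quoted in the text (100 receipts, 20 containing A, 12 containing both A and B); product C is a placeholder, and the function names are assumptions rather than part of any particular tool.

```python
from fractions import Fraction


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return Fraction(hits, len(transactions))


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Synthetic data matching the counts in the text.
transactions = (
    [{"A", "B"}] * 12   # A and B together
    + [{"A"}] * 8       # A without B
    + [{"C"}] * 80      # neither A nor B (C is a placeholder product)
)

support_a = support(transactions, {"A"})          # 20 of 100 receipts
support_ab = support(transactions, {"A", "B"})    # 12 of 100 receipts
conf_a_to_b = confidence(transactions, {"A"}, {"B"})
```

Exact fractions are used so that 12% / 20% comes out as exactly 3/5, that is, the 60% confidence stated above.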

In this way, correlation analysis yields rules such as "a customer who purchases product A purchases product B together". Suppose, for example, that correlation analysis of the data shown in Fig. 1 yields the two rules "customers who purchased products A and B together" and "customers who purchased products A and D together". Current analysis tools cannot take these two rules and aggregate the data by a classification based on them, because whether a record satisfies such a rule cannot be decided by looking at that one record and applying a simple rule. Correlation analysis extracts relationships that span multiple records, not results obtained from a single record.

On the other hand, clustering, another analysis method of data mining, puts similar data into the same group. For example, applying clustering to sales data may reveal two classes of customer: youth-oriented customers and mass-oriented customers. Figure 4 shows an example of data mining by clustering. In this example, the records in the tabular data 401 are grouped by similarity on four attributes (salary, gender, age, and product) into two classes, 402 (class 1) and 403 (class 2).

When classification and aggregation are performed using the results of such clustering, the class to which a record belongs cannot be determined by looking at that record alone, because clustering builds its classes from each record's similarity to other records, not from the record by itself. Consequently, conventional systems cannot take part or all of the results obtained by applying data mining to the tabular data (table data) accumulated in the database, use them as new classification definitions, and aggregate the records of the original tabular data. This requires a new mechanism that classifies records and performs aggregation processing according to classification rules that span multiple records.

Disclosure of the invention

The present invention has been made in view of the above points, and an object of the present invention is to provide a totaling system and a totaling method capable of classifying records and performing totaling processing in accordance with a classification rule that spans a plurality of records.

To achieve this object, the information aggregation system of the present invention is an aggregation system that aggregates, based on a predetermined rule, a plurality of records composed of a plurality of data items stored in table format. It comprises means for defining how to specify classification applicable records that span a plurality of records stored in table format, means for identifying the classification applicable records that span a plurality of records, and classification and aggregation means. The classification and aggregation means refers to the classification results of the means for identifying the classification applicable records and aggregates the plurality of records stored in table format according to the definition, made by the defining means, that specifies the classification applicable records.

As a result, unlike conventional systems, using the means for identifying the classification applicable records makes it possible to aggregate by classifications that cannot be made by simply applying a simple rule to each record.

In addition, the information aggregation system of the present invention is characterized in that the definition for specifying the classification applicable records includes all or part of the classification result obtained by applying data mining to the plurality of records composed of a plurality of data items stored in table format.

As a result, even for data mining results, where looking at a single record cannot show which classification definition the record matches (for example, a classification definition such as "customers who bought A also bought B"), whether a record is applicable can be determined by using the means for identifying the classification applicable records.

Further, in the information aggregation system of the present invention, the means for identifying the classification applicable records generates the classification result of the applicable records before aggregation by the aggregation means, and the aggregation means refers to that classification result and aggregates in accordance with the definition, made by the means for defining classification applicable records that span multiple records, that specifies the classification applicable records.

As a result, the means for identifying the applicable records can not only find the corresponding records at the time of aggregation but also classify them in advance, so that the intermediate classification result can be used at the time of aggregation. For example, for the records of customers who purchased products A and B, the keys (fields that uniquely identify a record) may be stored in advance as an intermediate classification result. When aggregation is performed for the classification definition "customers who purchased product A and product B", the corresponding records are extracted by referring to the intermediate classification result. Existing structures such as lists and hashes can be used to hold the intermediate classification results, and they can be stored in either main storage or secondary storage.
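
As a rough illustration of identifying applicable records in advance, the sketch below stores record keys in a dict (a hash held in main storage). It assumes that "purchased A and B" means both products appear in the same transaction and that every record of such a transaction is applicable; the text does not spell this out. The field names (`key`, `txn`, `product`, `amount`) and the data are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: one row per product sold. "key" uniquely
# identifies a record; "txn" groups the rows of one receipt.
records = [
    {"key": 1, "txn": "00001", "product": "A", "amount": 3000},
    {"key": 2, "txn": "00001", "product": "B", "amount": 1000},
    {"key": 3, "txn": "00002", "product": "A", "amount": 3000},
    {"key": 4, "txn": "00003", "product": "A", "amount": 3000},
    {"key": 5, "txn": "00003", "product": "B", "amount": 1000},
    {"key": 6, "txn": "00003", "product": "C", "amount": 500},
]


def identify_applicable_keys(records, required_products):
    """Return keys of records whose transaction contains all required products.

    No single record can answer this question; the products of every
    record in the same transaction have to be examined together.
    """
    products_by_txn = defaultdict(set)
    for rec in records:
        products_by_txn[rec["txn"]].add(rec["product"])
    required = set(required_products)
    return [rec["key"] for rec in records
            if required <= products_by_txn[rec["txn"]]]


# Intermediate classification result: classification name -> record keys.
intermediate = {
    "purchased A and B": identify_applicable_keys(records, {"A", "B"}),
}


def total_for(records, keys, field="amount"):
    """Aggregate only the records listed in the intermediate result."""
    wanted = set(keys)
    return sum(rec[field] for rec in records if rec["key"] in wanted)


total_ab = total_for(records, intermediate["purchased A and B"])
```

Record 3 lists product A but is excluded, because its transaction contains no product B; this is the sense in which the rule spans multiple records.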

Further, the means for specifying a category applicable record of the information totaling system of the present invention is characterized in that the classification result of the category applicable record is updated in accordance with a definition for specifying the category applicable record at every predetermined period.

By its nature, the target data is not fixed; records are constantly added to it. Because records are added in this way, if a classification result once generated by the means for identifying the classification applicable records is simply reused, records added after it was generated are left out of the aggregation. To solve this problem, the classification result is updated at regular intervals by the means for identifying the classification applicable records, so that even the latest data can be aggregated at high speed.

Further, in the information totaling system of the present invention, a definition for specifying a record corresponding to a classification is automatically registered as a classification definition.

Clustering, a data mining technique, groups users with similar tendencies into a specified number of groups. If the resulting cluster numbers (numbered from 1) can be automatically registered and used as a classification definition, the user does not need to instruct registration of the classification definition. In other words, when data mining is applied to tabular data, the result can be automatically registered as a classification definition, and the original tabular data can be aggregated according to that classification definition.

Further, the information totaling system of the present invention is characterized in that, when the classification results of data mining change over time, the classification result of each data mining run is retained.

As a result, when the result of data mining changes over time, the result of each data mining run is held for its period and used as a classification definition. For example, if a customer had rank 5 in June 2000 but rank 4 in July 2000, the customer's data is classified and tallied under rank 5 in the totals for June 2000 and under rank 4 in the totals for July 2000.
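
One way to picture per-period results is a lookup keyed by customer and month, as in the Python sketch below. The rank values follow the example in the text; the customer number, amounts, and field names are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical data mining results kept per period: (customer, month) -> rank.
rank_history = {
    ("10001", "2000-06"): 5,
    ("10001", "2000-07"): 4,
}

# Hypothetical sales records for that customer.
records = [
    {"customer": "10001", "month": "2000-06", "amount": 3000},
    {"customer": "10001", "month": "2000-07", "amount": 2000},
    {"customer": "10001", "month": "2000-07", "amount": 1000},
]


def totals_by_month_and_rank(records, rank_history, field="amount"):
    """Classify each record with the rank that was valid in its own month."""
    totals = defaultdict(int)
    for rec in records:
        rank = rank_history[(rec["customer"], rec["month"])]
        totals[(rec["month"], rank)] += rec[field]
    return dict(totals)


totals = totals_by_month_and_rank(records, rank_history)
```

Keeping both entries means the June total stays under rank 5 even after the July run assigns rank 4.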

Furthermore, the information totaling system of the present invention is characterized in that the data mining is executed at predetermined time intervals.

As a result, in addition to the means for identifying the classification applicable records, data mining is also executed periodically, so that each classification definition itself and its corresponding records can be updated and used. The time required for aggregation itself does not change, but results based on the latest classification can be obtained.

In addition, the present invention is a computer-readable recording medium storing a program for causing a computer to function.

BRIEF DESCRIPTION OF THE FIGURES

Other objects, features and advantages of the present invention will become more apparent from the following detailed description when read in conjunction with the accompanying drawings.

FIG. 1 is a diagram illustrating an example of tabular data.

FIG. 2 is a diagram showing the results of tabulation of the tabular data in which the records shown in FIG. 1 are recorded.

FIG. 3 is a diagram showing the configuration of a conventional system for utilizing stored information.

FIG. 4 is a diagram showing an example of data mining by clustering.

FIG. 5 is a diagram showing a configuration of a system for utilizing stored information according to the embodiment of the present invention.

FIG. 6 is a diagram showing a classification result of the classification applicable record specifying means 502.

FIG. 7 is a diagram showing an example of a classification definition by data mining.

FIG. 8 is a diagram illustrating an example of the classification result of the classification applicable record specifying means corresponding to the classification definition by data mining.

FIG. 9 is a diagram illustrating an example of a classification definition by clustering.

FIG. 10 is a diagram showing a flowchart of the operation of the embodiment of the present invention.

FIG. 11 is a diagram showing an example of tabular data which is data to be analyzed according to the present invention.

FIG. 12 is a diagram showing a flowchart of the definition information creation and registration processing of the present invention.

FIG. 13 is a diagram showing an example of the classification definition based on the correlation analysis.

FIG. 14 is a diagram showing a flowchart of the classification and aggregation processing of the present invention.

FIG. 15 is a diagram showing an example of specifying the tallying process according to the present invention.

Fig. 16 is a diagram showing the classification result of the classification applicable record identification means corresponding to the classification definition by data mining.

FIG. 17 is a diagram illustrating an example of the counting result of the counting process corresponding to the classification definition by data mining.

BEST MODE FOR CARRYING OUT THE INVENTION

An embodiment for carrying out the present invention will be described below with reference to the drawings. FIG. 5 is a diagram showing a configuration of a system for utilizing stored information according to the embodiment of the present invention. In FIG. 5, components denoted by the same reference numerals as in FIG. 3 are the same components. The system according to the embodiment shown in FIG. 5 mainly comprises an information processing device 301, an input device 302, an output device 303, a database 304, and classification definition storage means 305. The information processing device 301 has means for performing various processes, and mainly comprises data registration means 311, classification definition means 312, classification instruction means 313, classification and aggregation means 314, definition means 501 for specifying classification applicable records, and classification applicable record specifying means 502. The output device 303 is a display device that displays screens and the like. The input device 302, such as a mouse or a keyboard, performs the various inputs.

The data registration means 311 composes the input data into records and registers (accumulates) them in the database 304. The classification definition means 312 stores the classification definition input from the input device 302 in the classification definition storage means 305. The classification instruction means 313 selects a classification definition stored in the classification definition storage means 305 in accordance with the instruction input from the input device 302 and passes it to the classification applicable record specifying means 502.

The database 304 stores data as records. Classification definitions are defined and registered.

The definition means 501 for specifying classification applicable records receives a definition for specifying the classification applicable records in accordance with the instruction input from the input device 302. The classification applicable record specifying means 502 identifies the records that fall under each classification, according to the definition input to the definition means 501, the classification definition stored in the classification definition storage means 305, and the records stored in the database 304. FIG. 6 is a diagram showing the classification result of the classification applicable record specifying means 502, which consists of a classification column 601 and an applicable record group column 602. For example, if there are two classes, 1 and 2, and the keys of the records belonging to them are 1, 2, 4, 5 and 3, 6, 7 respectively, a table mapping class 1 and class 2 to their records can be created as an intermediate classification result, as shown in Figure 6, and the group of applicable records can be found immediately from this correspondence table. The classification and aggregation means 314 classifies the records stored in the database 304 and performs aggregation processing.

The classification and aggregation means 314 identifies the records belonging to each class from the classification definition stored in the classification definition storage means 305 and designated by the classification instruction means 313, and from the classification result of the classification applicable record specifying means 502, such as the correspondence table for class 1 and class 2 shown in FIG. 6. It then retrieves the records belonging to each class from the database 304 one by one and totals them under the corresponding class. It is also possible, for example, to define and register classifications for each application period according to a purpose such as the product lineup. The intermediate classification results of the classification applicable record specifying means 502 need to be generated for each classification definition. The classified and aggregated results are then displayed on the display device 303 by the classification and aggregation means 314. In this way, by using the classification applicable record specifying means 502, the classification and aggregation means 314 can easily perform aggregation processing on the group of records belonging to each class.
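
Aggregation driven by a correspondence table can be sketched as follows. The class-to-key mapping is the one given for FIG. 6; the dict standing in for the database 304, the `amount` field, and its values are assumptions for illustration.

```python
# Correspondence table from FIG. 6: class -> keys of applicable records.
correspondence = {1: [1, 2, 4, 5], 2: [3, 6, 7]}

# Stand-in for the database 304: key -> record. Amounts are hypothetical.
database = {
    1: {"amount": 100},
    2: {"amount": 200},
    3: {"amount": 300},
    4: {"amount": 400},
    5: {"amount": 500},
    6: {"amount": 600},
    7: {"amount": 700},
}


def aggregate_by_table(database, correspondence, field="amount"):
    """Fetch each class's records by key and total the chosen field."""
    return {
        cls: sum(database[key][field] for key in keys)
        for cls, keys in correspondence.items()
    }


class_totals = aggregate_by_table(database, correspondence)
```

At aggregation time no rule is evaluated at all; the table alone decides which records go into which class.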

In this way, unlike the conventional method, using the classification result of the classification applicable record specifying means 502 makes possible aggregation based on complicated classifications that span a plurality of records, which cannot be made by simply applying a simple rule to each record.

In addition, part or all of the results of data mining can be used as the definition, made by the definition means 501, for specifying the classification applicable records. Figure 7 shows an example of a classification definition using the results of data mining. For data mining results, looking at each record alone cannot determine which classification definition the record matches. For example, as shown in Figure 7, when deciding whether a single record falls under "customer who purchased product A and then purchased product B" (701), the fact that the record lists product A is only a necessary condition. For this reason, the classification applicable record specifying means 502 is used to determine which classification definition each record corresponds to.

Further, the classification applicable record specifying means 502 can not only find the applicable records at the time of aggregation but also find them in advance, so that the intermediate classification results can be used at the time of aggregation. FIG. 8 shows an example of the classification result of the classification applicable record specifying means for a classification definition that uses the results of data mining; it consists of a classification column 801 and an applicable record group column 802. For example, for "records of customers who purchased product A and product B" (803), the keys (fields that uniquely identify a record) can be stored as an intermediate result, as shown in Fig. 8.

Then, when performing the aggregation process for the classification definition “customer who purchased product A and product B”, the corresponding record is extracted by referring to the intermediate classification result. Existing methods such as lists and hashes can be used to achieve intermediate classification results. Also, intermediate classification results can be stored in main storage or secondary storage.

Also, by its nature, the database 304 constantly has records added to the data to be classified. Because records are added in this way, if the classification result once generated by the classification applicable record specifying means 502 is simply reused, records added after the correspondence table identifying the records of each class was generated are not included in the aggregation. To solve this problem, the classification applicable record specifying means 502 is executed again at regular time intervals to update the correspondence table, so that aggregation can be performed quickly even on the latest data.
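
The staleness problem and its fix can be shown with a small Python sketch. The rule used here ("repeat" if a customer appears in more than one record, otherwise "single") is a hypothetical stand-in for a multi-record classification, and the re-run is written as a plain function call; scheduling it at regular intervals is left to whatever timer or job scheduler is available.

```python
def classify_all(records):
    """Rebuild the correspondence table (class -> record keys) from scratch.

    Hypothetical multi-record rule: a record is "repeat" if its customer
    appears in more than one record, otherwise "single".
    """
    counts = {}
    for rec in records:
        counts[rec["customer"]] = counts.get(rec["customer"], 0) + 1
    table = {"repeat": [], "single": []}
    for rec in records:
        cls = "repeat" if counts[rec["customer"]] > 1 else "single"
        table[cls].append(rec["key"])
    return table


records = [
    {"key": 1, "customer": "10001"},
    {"key": 2, "customer": "10002"},
]

stale_table = classify_all(records)                 # generated once
records.append({"key": 3, "customer": "10002"})     # added afterwards

# Keys that an aggregation based on the old table would silently skip.
covered = {key for keys in stale_table.values() for key in keys}
missing = {rec["key"] for rec in records} - covered

fresh_table = classify_all(records)                 # periodic re-run
```

Note that the re-run does more than append record 3: record 2 moves from "single" to "repeat", because a rule that spans records can change the class of existing records when new ones arrive.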

Furthermore, clustering, one of the methods of data mining, groups records with similar tendencies into a specified number of groups. If the resulting cluster numbers (numbered from 1) can be automatically registered and used as a classification definition, the user does not need to instruct registration via the input device 302 to the definition means 501 for specifying classification applicable records. In other words, when data mining is performed on tabular data, the results are automatically registered in the classification definition storage means 305 as a definition for specifying the classification applicable records, and the original tabular data can be aggregated according to that classification definition.

Figure 9 shows an example of a classification definition by clustering. For example, in the example shown in FIG. 4, the classification definition of FIG. 9 is automatically formed and stored in the classification definition storage means 305.
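
Automatic registration of cluster numbers can be sketched in Python as below. The text does not name a clustering algorithm, so a very small k-means with a naive deterministic initialization is swapped in purely for illustration. The two numeric attributes (age, salary), the data, and the dict standing in for the classification definition storage means 305 are all assumptions.

```python
def kmeans(points, k, iterations=10):
    """Very small k-means: returns a cluster number (from 1) per point."""
    centroids = list(points[:k])   # naive, deterministic initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return [lab + 1 for lab in labels]


def register_clusters(store, name, keys, labels):
    """Register cluster number -> record keys without any user instruction."""
    table = {}
    for key, label in zip(keys, labels):
        table.setdefault(label, []).append(key)
    store[name] = table
    return table


# Hypothetical records: key -> (age, salary in units of 10,000 yen).
data = {1: (22, 25), 2: (45, 60), 3: (24, 27), 4: (47, 62), 5: (23, 24), 6: (50, 65)}

classification_definitions = {}   # stand-in for storage means 305
keys = list(data)
labels = kmeans([data[key] for key in keys], k=2)
register_clusters(classification_definitions, "clustering", keys, labels)
```

The registered table has the same shape as the correspondence table used for aggregation, so the clustering result can be used directly as a classification definition.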

Furthermore, when the result of data mining changes over time, the result of each data mining run can be held for its period and used as a classification definition. For example, a customer whose rank was 5 in June 2000 and 4 in July 2000 can be classified and tallied under rank 5 for June 2000 and under rank 4 for July 2000.

Furthermore, in addition to the classification applicable record specifying means 502, data mining is also executed periodically, so that each classification definition and the intermediate classification results of its applicable records can be updated and used. The time required for aggregation itself does not change, but results based on the latest classification can be obtained.

The present invention can be implemented as a computer-readable recording medium storing a program for causing a computer to function.

Next, the operation of the present invention will be described in detail using a flowchart.

FIG. 10 is a diagram showing a flowchart of the operation of the embodiment of the present invention. First, the overall operation will be described with reference to the flowchart of FIG. In FIG. 10, the whole operation starts in step S1.

Next, in step S2, data input to the data registration means 311 via the input device 302 of FIG. 5 is registered in the database 304. In this method, as shown in Fig. 1, customers, products, sales, etc. are stored in records and registered for each transaction.

Next, in step S3, the classification definition is specified to the classification definition means 312 and the definition means 501 for specifying classification applicable records via the input device 302 of FIG. 5, and is stored in the classification definition storage means 305. In step S3, a classification is defined based on a rule, such as a data mining result, under which records cannot be classified one at a time as in conventional systems.

Next, in step S4, classification and aggregation are performed based on the data and the classification definition input in steps S2 and S3. That is, in accordance with the specified purpose, the relevant classification definition is extracted from the classification definition storage means 305 in FIG. 5, and the classification applicable record specifying means 502 takes out the records that match the classification rule of that definition, which are then totaled.

The operation then ends at step S5. As described above, data is registered and accumulated in the database 304, a classification definition based on a complicated rule is created and registered in the classification definition storage means 305, and the records matching the classification are totaled using the classification applicable record specifying means 502 shown in FIG. 5, so that aggregation according to a classification definition based on complicated rules becomes possible.

Next, steps S2, S3 and S4 will be described in detail below. FIG. 11 is a diagram showing an example of tabular data, which is the data to be analyzed according to the present invention, registered in step S2.

The vertical columns, including the record number, the sales date (reference numeral 1103), the customer number (reference numeral 1104), the product (reference numeral 1105), the quantity (reference numeral 1106), and the sales amount (reference numeral 1107), constitute the fields. The data in the first row of this tabular data constitutes record 111. When a product is sold, a transaction number is assigned, and the following values

Transaction number: 00001,

Record number: 1,

Sales date: 2002/06/30,

Customer number: 10001,

Product: A,

Quantity: 1,

Sales: ¥3,000

are stored in record 111 and registered (accumulated). In step S2 of FIG. 10, data is organized into such records and registered in the database 304.

As described above, the transaction number, the sales date, the customer number, the quantity, the sales, and so on are registered and stored in a record in association with the sales date, so that classification and aggregation matching the analysis purpose can be performed using the classification definition described later.
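The record layout described above can be sketched as a small data structure. This is only an illustrative sketch: the class name `SalesRecord`, the field names, the list `database`, and the function `register` are assumptions made for this example, with the list standing in for the database 304.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SalesRecord:
    """One row of the tabular data (hypothetical field names)."""
    transaction_number: str
    record_number: int
    sales_date: date
    customer_number: str
    product: str
    quantity: int
    sales_yen: int


# A plain list standing in for the database 304 (assumption).
database = []


def register(record):
    """Step S2: accumulate one record in the stand-in database."""
    database.append(record)


# The example values given for record 111.
register(SalesRecord("00001", 1, date(2002, 6, 30), "10001", "A", 1, 3000))
```

Keeping the customer number in every record is what later allows records to be grouped across transactions.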

Next, the specification of the classification definition in step S3 will be described with reference to FIGS. 12 and 13. FIG. 12 is a flowchart of the definition information creation and registration processing of the present invention. The processing starts in step S1201. Next, in step S1202, the classification and aggregation means 314 in FIG. 5 classifies the target data using a complicated rule, such as data mining.

Next, in step S1203, the result of the data mining is displayed by the classification and aggregation means 314 of FIG. 5. For example, suppose that the following two rules were obtained by applying correlation analysis to the data in FIG. 11.

Rule 1) A -> B (A and B are purchased together)

Rule 2) A -> C (A and C are purchased together)
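As a rough illustration of how rules of this form might be derived, the sketch below computes a simple co-purchase confidence per customer. The text does not specify the correlation analysis actually used, so the confidence measure, the threshold `min_confidence`, the function name `co_purchase_rules`, and the sample data are all assumptions; the data is not the content of FIG. 11.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical (customer_number, product) pairs, not the data of FIG. 11.
purchases = [
    ("10001", "A"), ("10001", "B"),
    ("10002", "A"), ("10002", "C"),
    ("10003", "A"), ("10003", "B"), ("10003", "C"),
    ("10004", "D"),
]


def co_purchase_rules(purchases, min_confidence=0.6):
    """Return {(lhs, rhs): confidence} for rules lhs -> rhs."""
    # Collect the set of products bought by each customer.
    baskets = defaultdict(set)
    for customer, product in purchases:
        baskets[customer].add(product)

    # Count customers per product and per ordered product pair.
    single = defaultdict(int)
    pair = defaultdict(int)
    for products in baskets.values():
        for product in products:
            single[product] += 1
        for lhs, rhs in permutations(sorted(products), 2):
            pair[(lhs, rhs)] += 1

    # Keep rules whose confidence reaches the threshold.
    rules = {}
    for (lhs, rhs), count in pair.items():
        confidence = count / single[lhs]
        if confidence >= min_confidence:
            rules[(lhs, rhs)] = confidence
    return rules


rules = co_purchase_rules(purchases)
```

With this sample data, the rules A -> B and A -> C both pass the assumed threshold, which mirrors the two rules in the example.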

Next, in step S1204, the user uses the input device 302 to define which of the data mining results are to be used as classifications. If the above rules are obtained, for example, both rules 1 and 2 are defined as classifications. As a result, a classification definition as shown in FIG. 13 can be created based on the data in FIG. 11. FIG. 13 shows an embodiment of the classification definition obtained by correlation analysis, and has two classifications, 1301 and 1302.

Next, in step S1205, the classification definition thus obtained is stored in the classification definition storage means 305 of FIG. 5.

Then, in step S1206, the definition information creation and registration process ends. Next, the classification and aggregation in step S4 in FIG. 10 will be described.
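One possible way to hold such classification definitions is sketched below. The class `ClassificationDefinition`, the dictionary `classification_store` (standing in for the classification definition storage means 305), and the classification names are hypothetical. Each co-purchase rule is represented simply as the set of products a customer must have bought.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationDefinition:
    """A classification that spans several records (hypothetical shape)."""
    name: str
    required_products: frozenset


# A dictionary standing in for the storage means 305 (assumption).
classification_store = {}


def register_definition(definition):
    """Store one definition under its name, as in the storing step."""
    classification_store[definition.name] = definition


# Rules 1 and 2 from the example, both adopted as classifications.
register_definition(
    ClassificationDefinition("Rule 1: A and B", frozenset({"A", "B"})))
register_definition(
    ClassificationDefinition("Rule 2: A and C", frozenset({"A", "C"})))
```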

FIG. 14 is a diagram showing a flowchart of the classification and aggregation processing of the present invention. FIG. 15 is a diagram showing an example of the designation of the tallying process according to the present invention.

In FIG. 14, the classification and aggregation processing of the present invention is started in step S1401.

Next, in step S1402, the classification and aggregation means 314 in FIG. 5 displays a selection screen based on the classification definitions and the data, showing the classifications and data as in FIG. 15. FIG. 15 shows the classifications (reference numeral 1501), the data (reference numeral 1502), and the OK button 1503 that confirms the selection. The user selects a classification and data through the input device 302 in accordance with the analysis. Multiple classifications can be selected.

Next, in step S1403, the classification-applicable record specifying means 502 of FIG. 5 obtains the records corresponding to the specified classification. The records can be obtained in two ways: they can be identified at aggregation time according to the specification, or generated in advance. FIG. 16 shows an example of finding the corresponding record groups for the classification definition in FIG. 13. FIG. 16 shows the result produced by the classification-applicable record specifying means 502 for the classification definition obtained by data mining, and is composed of a classification column 1601 and a corresponding record column 1602. When the classification-applicable record specifying means 502 of FIG. 5 creates the table of FIG. 16 as an intermediate classification result, the corresponding record group for each classification can be held in that table. If the number of classifications is large and searching for the corresponding records takes time, the search time can be reduced by registering the classifications in a hash table.
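The identification step might be sketched as follows, using a Python dictionary as the hash table that maps each classification to its group of record numbers. The sample records, the names `definitions` and `identify_records`, and the choice to list only the records whose product appears in the rule are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records: (record_number, customer_number, product).
records = [
    (1, "10001", "A"),
    (2, "10001", "B"),
    (3, "10002", "A"),
    (4, "10002", "C"),
    (5, "10003", "B"),
]

# Each classification requires a customer to have bought all listed products.
definitions = {"Rule 1": {"A", "B"}, "Rule 2": {"A", "C"}}


def identify_records(records, definitions):
    """Build a classification -> record numbers table (FIG. 16 style)."""
    # First pass: gather every product bought by each customer.
    products_by_customer = defaultdict(set)
    for _, customer, product in records:
        products_by_customer[customer].add(product)

    # Second pass: a record applies when its customer satisfies the rule
    # and the record's own product is part of the rule.
    table = {name: [] for name in definitions}
    for number, customer, product in records:
        for name, required in definitions.items():
            if product in required and required <= products_by_customer[customer]:
                table[name].append(number)
    return table


applicable = identify_records(records, definitions)
```

Because the table is keyed by classification, looking up the record group for a selected classification is a single dictionary access, whether the table is built in advance or at aggregation time.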

Next, in step S1404, the records corresponding to the selected classification are examined, and classification and aggregation are performed for that classification according to the result. FIG. 17 shows an example of the result of the aggregation processing for the classification definition obtained by data mining, and is composed of a classification column 1701 and an average sales column 1702. In the example of FIG. 17, the average sales are aggregated for the customers that fall under the classification definition in FIG. 13.
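A minimal sketch of this aggregation, assuming the record groups per classification are already available, could look like the following. The sales figures, the table `applicable`, and the function `average_sales` are hypothetical, and the average is taken over records, which is one possible reading of the average sales column.

```python
from statistics import mean

# Hypothetical sales amount (yen) per record number.
sales_by_record = {1: 3000, 2: 5000, 3: 3000, 4: 1000, 5: 5000}

# Hypothetical intermediate result: classification -> record numbers.
applicable = {"Rule 1": [1, 2], "Rule 2": [3, 4]}


def average_sales(applicable, sales_by_record, selected):
    """Average the sales of the records in each selected classification."""
    result = {}
    for name in selected:
        numbers = applicable.get(name, [])
        if numbers:  # skip classifications with no matching records
            result[name] = mean(sales_by_record[n] for n in numbers)
    return result


averages = average_sales(applicable, sales_by_record, ["Rule 1", "Rule 2"])
```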

As explained above, whether a record corresponds to a classification such as “customer who purchased product B after purchasing product A” cannot be accurately determined by applying a rule to that record alone. According to the present invention, aggregation can be performed using such classifications based on rules that span a plurality of records.
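To illustrate why such a classification needs more than one record, the sketch below finds customers who purchased product B after purchasing product A by comparing sales dates across each customer's records. The sample data and the function name `bought_after` are assumptions for this example.

```python
from datetime import date

# Hypothetical records: (customer_number, product, sales_date).
records = [
    ("10001", "A", date(2002, 6, 30)),
    ("10001", "B", date(2002, 7, 5)),
    ("10002", "B", date(2002, 6, 1)),
    ("10002", "A", date(2002, 6, 20)),
]


def bought_after(records, first="A", then="B"):
    """Return customers who bought `then` later than their first `first`."""
    # Earliest purchase date of the first product, per customer.
    earliest_first = {}
    for customer, product, day in records:
        if product == first:
            if customer not in earliest_first or day < earliest_first[customer]:
                earliest_first[customer] = day

    # A customer matches if any purchase of the second product is later.
    matching = set()
    for customer, product, day in records:
        if (product == then and customer in earliest_first
                and day > earliest_first[customer]):
            matching.add(customer)
    return matching


customers = bought_after(records)
```

Customer 10002 bought both products but in the opposite order, so no single record of that customer can decide the classification; only the comparison across records does.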

Claims


1. An information aggregation system that aggregates a plurality of records composed of a plurality of data items stored in a table format based on a predetermined rule, comprising:

means for defining records corresponding to a classification that spans a plurality of records stored in the table format;

means for identifying the records that fall into the classification that spans the plurality of records; and

classification and aggregation means,

wherein said classification and aggregation means refers to the classification result of said means for identifying the records corresponding to the classification, and aggregates said plurality of records composed of the plurality of data items stored in said table format in accordance with the definition for specifying records corresponding to the classification, which is defined by said means for defining records corresponding to the classification.

2. The information aggregation system according to claim 1, wherein said definition for specifying records corresponding to the classification includes all or part of the classification results obtained by applying data mining to the plurality of records composed of the plurality of data items stored in said table format.

3. The information utilization system according to claim 1 or 2, wherein said means for identifying the classification-applicable records generates the classification result of the classification-applicable records, before aggregation by said classification and aggregation means, in accordance with the definition for specifying the classification-applicable records defined by said means for defining the classification-applicable records spanning a plurality of records.

4. The information utilization system according to claim 1 or claim 2, wherein said definition for specifying records corresponding to the classification is updated at predetermined intervals.

5. The information utilization system according to claim 3, wherein said means for identifying the records corresponding to the classification updates, at each predetermined period, the classification result of the records corresponding to said classification in accordance with the definition for specifying the records corresponding to the classification.

6. The information utilization system according to claim 2, wherein the definition for specifying the record corresponding to said classification is automatically registered as the classification definition.

7. The information utilization system according to claim 2, wherein, when the classification result of said data mining changes with time, the classification result of each said data mining is retained.

8. The information utilization system according to claim 2, wherein said data mining is executed at predetermined time intervals.

9. A method for summarizing information in which a plurality of records composed of a plurality of data items stored in a table format are aggregated in accordance with a predetermined rule, comprising:

a step of defining records corresponding to a classification that spans a plurality of records stored in the table format;

a step of identifying the records that fall into the classification that spans the plurality of records; and

a classification and aggregation step,

wherein said classification and aggregation step refers to the classification result of said step of identifying the records corresponding to the classification, and aggregates said plurality of records composed of the plurality of data items stored in said table format in accordance with the definition for identifying records corresponding to the classification, which is defined in said step of defining records corresponding to the classification.

10. The method for summarizing information according to claim 9, wherein said definition for specifying records corresponding to the classification includes all or part of the classification results obtained by applying data mining to the plurality of records composed of the plurality of data items stored in said table format.

11. The method for summarizing information according to claim 9 or 10, wherein said step of identifying the records corresponding to the classification generates the classification result of the records corresponding to the classification, before the aggregation in said classification and aggregation step, in accordance with the definition for specifying the records corresponding to the classification defined in said step of defining records corresponding to the classification spanning the plurality of records.

12. The method for summarizing information according to claim 9 or 10, wherein said definition for specifying records corresponding to the classification is updated at predetermined intervals.

13. The method for summarizing information according to claim 11, wherein, in said step of identifying the classification-applicable records, the classification result of the classification-applicable records is updated at each predetermined period in accordance with the definition for specifying the classification-applicable records.

14. The method for summarizing information according to claim 10, wherein the definition for specifying records corresponding to said classification is automatically registered as a classification definition.

15. The method for summarizing information according to claim 10, wherein, when the classification result of said data mining changes with time, the classification result of each said data mining is retained.

16. The method for summarizing information according to claim 10, wherein said data mining is performed at predetermined time intervals.

17. A computer-readable recording medium storing a program for causing a computer to execute the method according to claim 9.