WO2020250597A1 - Information processing device, information processing method, and program - Google Patents
- Publication number
- WO2020250597A1 (international application PCT/JP2020/018400)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data set
- prediction model
- data
- information processing
- unit
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Definitions
- This disclosure relates to information processing devices, information processing methods and programs.
- Patent Document 1 describes a device that predicts the probability that a property will be contracted within the transaction period, based on the feature values of the property being traded.
- The present disclosure has been made in view of the above points, and one of its objects is to provide an information processing device, an information processing method, and a program that perform prediction efficiently.
- The present disclosure is, for example, an information processing device having: an input unit to which a first data set containing a plurality of data is input; a discrimination unit that determines the processing applied when a prediction model was generated based on a second data set similar to the first data set; and a prediction model generation unit that generates a prediction model based on the first data set by applying the processing determined by the discrimination unit to the first data set.
- The present disclosure is also, for example, an information processing method in which a discrimination unit determines the processing applied when a prediction model was generated based on a second data set similar to a first data set containing a plurality of data input to an input unit, and a prediction model generation unit generates a prediction model based on the first data set by applying the processing determined by the discrimination unit to the first data set.
- The present disclosure is also, for example, a program that causes a computer to execute an information processing method in which a discrimination unit determines the processing applied when a prediction model was generated based on a second data set similar to a first data set containing a plurality of data input to an input unit, and a prediction model generation unit generates a prediction model based on the first data set by applying the processing determined by the discrimination unit to the first data set.
- FIG. 1 is a block diagram showing a configuration example of the information processing apparatus according to the embodiment.
- FIG. 2 is a diagram showing an example of tabular data according to the embodiment.
- FIG. 3 is a diagram showing an example of information stored in the database according to the embodiment.
- FIG. 4 is a diagram showing an example of parameters and their values applied to a predetermined algorithm.
- FIG. 5 is a diagram showing a display example for setting a new project for creating a prediction model.
- FIG. 6 is a diagram showing a display example for selecting tabular data and reading it into the information processing apparatus.
- FIG. 7 is a diagram showing a display example for setting which features of the selected tabular data are used in generating the prediction model.
- FIG. 8 is a diagram showing a display example displayed during tuning of algorithm parameters and the like.
- FIG. 9 is a diagram for explaining a display example of the generated prediction model.
- FIG. 10 is a diagram for explaining an example of characteristics for each algorithm.
- FIG. 11 is a diagram for explaining an example of a screen on which priority processing items can be set.
- FIG. 12 is a diagram showing an example of the result of searching an algorithm or the like based on a data set similar to the first data set.
- FIG. 13 is a diagram showing a display example of asking the user for auxiliary information.
- FIG. 14 is a diagram showing another display example of asking the user for auxiliary information.
- FIG. 15 is a diagram showing another display example of asking the user for auxiliary information.
- FIG. 16 is a diagram showing another display example of asking the user for auxiliary information.
- FIG. 17 is a diagram showing a display example of usefulness for each feature.
- <Problems to be considered in one embodiment> As described above, predictive analysis techniques for predicting various items (sales, population, traffic congestion, etc.) have been proposed. As these techniques have spread, many people who are not experts in statistics or predictive analysis now have data to which they want to apply predictive analysis. To obtain higher predictive performance, the various preprocessing steps, prediction algorithms, and their associated hyperparameters must be selected appropriately. Selecting algorithms and hyperparameters requires actually generating and verifying prediction models, and repeating this many times requires a large amount of computation.
- Users who actually want to perform predictive analysis include, for example, salespeople who want to forecast sales. Such users rarely have large computational resources, so it is difficult for them to generate prediction models many times in search of one with high predictive performance.
- Large computational resources can be obtained through cloud services, but predictive analysis using cloud services requires specialized knowledge.
- FIG. 1 is a block diagram showing a configuration example of an information processing device (information processing device 1) according to an embodiment.
- The information processing device 1 is configured as a personal computer, a tablet computer, a smartphone, a server device on the cloud, or the like.
- The information processing device 1 has, for example, a control unit 11, an input unit 12, a display unit 13, a database (DB) 14, and an operation unit 15.
- The control unit 11 has a discrimination unit 11A and a prediction model generation unit 11B as functional blocks.
- The control unit 11 controls the information processing device 1 as a whole.
- The control unit 11 is composed of a CPU (Central Processing Unit) and the like.
- The control unit 11 has a ROM (Read Only Memory) that stores programs, a RAM (Random Access Memory) used as work memory when the programs are executed, and the like (these components are not illustrated).
- The discrimination unit 11A determines the processing that was applied when a prediction model was generated based on a second data set similar to the first data set. Such processing is, for example, the algorithm applied when generating the prediction model based on the second data set, together with the parameter values of that algorithm (hereinafter collectively referred to as “the algorithm or the like” where appropriate).
- The prediction model generation unit 11B generates a prediction model based on the first data set by applying the processing determined by the discrimination unit 11A to the first data set. Auxiliary information is also input to the prediction model generation unit 11B. The operations of the discrimination unit 11A and the prediction model generation unit 11B, and the details of the auxiliary information, are described later.
- The input unit 12 is an interface to which a first data set containing a plurality of data is input.
- Second data sets are also input to the input unit 12.
- The first data set is the data set input to the input unit 12 in the current operation.
- A second data set is a data set that was input to the input unit 12 in the past.
- The data sets input to the input unit 12 are supplied to the discrimination unit 11A.
- The display unit 13 is a display (including a driver for driving the display), such as an LCD (Liquid Crystal Display) or an OLED (Organic Light Emitting Diode) display, that displays the prediction model generated by the prediction model generation unit 11B.
- The display unit 13 may also display information by means of a projector.
- The database 14 stores various types of data. Examples of the database 14 include magnetic storage devices such as HDDs (Hard Disk Drives), semiconductor storage devices, optical storage devices, and magneto-optical storage devices. The database 14 may be detachable from the information processing device 1.
- The operation unit 15 is a general term for components that accept the user's operation input. Examples of the operation unit 15 include a mouse, a touch panel, and physical keys such as buttons. An operation signal is generated in response to an operation input made on the operation unit 15, and processing is performed according to that operation signal.
- FIG. 2 is a diagram showing an example of tabular data.
- The tabular data may have any content.
- The example shown in FIG. 2 is tabular data on a product sales history. The items describing the contents of the data (those given in the first row of FIG. 2) serve as the features of the data constituting the tabular data.
- The tabular data is specified by the user, for example.
- The tabular data may be data stored in the information processing device 1, or data imported into the information processing device 1 from an external device.
- The first data set is data in which all or some of the features of the tabular data described above are specified. That is, the first data set in the present embodiment is a data set whose contents are set, according to user input, from tabular data, which is an example of predetermined data.
- The first data set corresponding to the specified features is used when the prediction model generation unit 11B generates the prediction model. That is, in some cases all of the tabular data becomes the first data set, and in other cases only part of it does.
- A second data set is, among the data sets used when the prediction model generation unit 11B generated prediction models in the past, a data set similar to the first data set. Although the details are described later, the first data set and the second data sets are each assigned indices that characterize them, and a second data set similar to the first data set can be determined by comparing these indices.
- FIG. 3 is a diagram showing an example of the information stored in the database 14 (hereinafter referred to as database information where appropriate).
- The items of the database information include, for example, the model name, the tabular data file name, the data set information, information on each feature constituting the data set, the prediction model generation time, the memory used for the prediction model, the experimental results of each parameter used in the algorithm, and the prediction model generation conditions.
- The model name is the name set when the prediction model was generated.
- The model name can be set as appropriate according to the content of the prediction model.
- FIG. 3 shows an example in which “A loan bad debt prediction model” is set as the model name of one prediction model and “B store waste amount prediction model” is set as the model name of another.
- The tabular data file name is the file name of the tabular data on which the second data set used to generate the prediction model is based.
- The data set information is various information about the second data set corresponding to a prediction model generated in the past.
- The data set information includes, for example, the number of data items in the data set, the number of features, the percentage of missing data, the file size, the domain (information indicating what the data is about, such as weather data or sales data), and the problem setting (classification, regression, time-series prediction, etc.).
- The information on each feature includes the algorithm applied to the data set when generating the prediction model, the name of each feature, the number of unique values, the data type of each feature (text, number, date, categorical variable, etc.), and other statistics describing the feature (mean, variance, kurtosis, etc.).
- This information can be quantified by known methods. For example, when the data type of a feature is text data, an identifier indicating “text data” is assigned as the data type, and statistics such as “number of spaces and delimiters”, “average sentence length”, and “language type” are associated with it.
- Similarly, when the data type of a feature is time stamp data such as a date, an identifier indicating “time stamp data” is assigned as the data type, and statistics such as “average time of day”, “period covered by the data”, and “format of the time stamp data” are associated with it.
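The per-feature quantification described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the delimiter set, the sentence splitter, the timestamp format string, and the helper names `describe_text_feature` and `describe_timestamp_feature` are hypothetical, and the “language type” statistic is omitted because it would need a language-identification library.

```python
import re
import statistics
from datetime import datetime


def describe_text_feature(values: list[str]) -> dict:
    """Quantify a text feature with a type identifier and simple statistics."""
    # Number of spaces and delimiters (assumed set: whitespace and , . ; : ! ?).
    delimiter_count = sum(len(re.findall(r"[\s,.;:!?]", v)) for v in values)
    # Split entries into sentences on . ! ? and average their character lengths.
    sentences = [s.strip() for v in values for s in re.split(r"[.!?]", v) if s.strip()]
    avg_len = statistics.mean(len(s) for s in sentences) if sentences else 0.0
    return {
        "type": "text",
        "delimiter_count": delimiter_count,
        "avg_sentence_length": avg_len,
    }


def describe_timestamp_feature(values: list[str], fmt: str = "%Y-%m-%d %H:%M") -> dict:
    """Quantify a time stamp feature (the format string is an assumption)."""
    stamps = [datetime.strptime(v, fmt) for v in values]
    # Average time of day, in hours since midnight.
    avg_hour = statistics.mean(t.hour + t.minute / 60 for t in stamps)
    return {
        "type": "timestamp",
        "avg_hour_of_day": avg_hour,
        "period_days": (max(stamps) - min(stamps)).days,
        "format": fmt,
    }


print(describe_text_feature(["Good product. Fast delivery!", "Bad"]))
print(describe_timestamp_feature(["2020-01-01 09:00", "2020-01-31 15:00"]))
```

Each descriptor is a plain dictionary, so descriptors of different data types can be stored side by side in the per-feature information.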
- The prediction model generation time is the time required to generate the prediction model.
- The memory used for the prediction model is the amount of memory required to generate the prediction model.
- The experimental results of each parameter used in the algorithm are a history of the parameters of the applied algorithm together with information indicating the results obtained when the prediction model was generated with those parameters.
- A parameter setting name is recorded in this item. As shown in FIG. 4, the parameter setting name is associated with the name of the algorithm used when generating the prediction model and with the values of specific parameters.
- In some cases the prediction model is generated while changing the algorithm, and in other cases the same algorithm is used with different parameter values. In such cases, all of these attempts are recorded as history.
- In FIG. 3, for example, when the prediction model with the model name “A loan bad debt prediction model” was generated, a “decision tree for classification” was used as the algorithm, with the parameters and values corresponding to “decision tree model parameter A”. FIG. 3 also shows that the prediction model generated with these parameters and values had an accuracy of 0.82, a recall of 0.6, and an F value of 0.2.
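One way to hold a FIG. 3 style database entry in memory is a pair of dataclasses, sketched below. The class and field names are hypothetical; only the model name, algorithm, parameter setting name, and the accuracy, recall, and F value figures in the example come from the description of FIG. 3, and fields for which the description gives no values are left empty.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Experiment:
    """One entry in the history of experimental results for a parameter setting."""
    algorithm: str
    parameter_setting: str  # name that maps to an algorithm and parameter values (FIG. 4)
    accuracy: float
    recall: float
    f_value: float


@dataclass
class ModelRecord:
    """One entry of the database information (field names are illustrative)."""
    model_name: str
    tabular_file: Optional[str] = None
    dataset_info: dict = field(default_factory=dict)  # size, missing ratio, domain, problem setting
    feature_info: list = field(default_factory=list)  # per-feature type identifiers and statistics
    generation_time_s: Optional[float] = None
    memory_mb: Optional[float] = None
    experiments: list[Experiment] = field(default_factory=list)
    generation_condition: dict = field(default_factory=dict)  # priority item, auxiliary information


record = ModelRecord(
    model_name="A loan bad debt prediction model",
    experiments=[
        Experiment("decision tree for classification",
                   "decision tree model parameter A", 0.82, 0.6, 0.2),
    ],
)
```

Keeping every attempt in `experiments` mirrors the description's rule that all algorithm and parameter changes are recorded as history.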
- The prediction model generation condition indicates the processing items that should be prioritized when generating the prediction model. Such processing items are set by the user's operation input.
- The processing item is, for example, one of “performance priority”, “speed priority”, and “memory priority”.
- Performance priority is a setting that prioritizes the accuracy of the prediction model.
- Speed priority is a setting that prioritizes the speed at which the prediction model is generated.
- Memory priority is a setting that prioritizes keeping the memory used to generate the prediction model as small as possible.
- The prediction model generation condition also includes the content of the auxiliary information provided by the user's answers.
- The auxiliary information is information for efficiently generating a prediction model based on the first data set. Specifically, the auxiliary information is at least one of: information on the period of data, among the time-series data included in the first data set, to be used for generating the prediction model; information designating the text data, among the text data included in the first data set, to be used for generating the prediction model; and information on the accuracy of predetermined data included in the first data set.
- The information processing device 1 acquires the auxiliary information from the answers the user inputs to questions that the information processing device 1 asks the user.
- The above is an example of the database information.
- The division of the database information into the items above is for convenience and can be changed as appropriate.
- Procedure B1: First, the user uses the operation unit 15 of the information processing device 1 to start a project for generating a prediction model, selects the tabular data to be used for generating the prediction model, and loads the tabular data into the information processing device 1. The user then specifies which features of the tabular data are used in generating the prediction model. This designation produces a first data set based on the loaded tabular data. This processing is referred to as “procedure B1” in the following description.
- FIG. 5 is a diagram showing a display example for setting up a new project for generating a prediction model.
- The display example shown in FIG. 5 is displayed, for example, on the display unit 13 of the information processing device 1.
- In this display example, a rectangular display frame 101 for entering a project name, a rectangular display frame 102 for entering an explanation or memo, a cancel button 103, and a decision button 104 are displayed.
- The user uses the operation unit 15 to enter information at each display location.
- The user enters an appropriate project name (“sales forecast from customer data” in the illustrated example) in the display frame 101.
- Using the operation unit 15, the user enters an appropriate explanation (in the illustrated example, “verify the sales forecast for the next fiscal year using the data from November 2000 to December 2013”) in the display frame 102.
- FIG. 6 is a diagram showing a display example for selecting tabular data and loading it into the information processing device 1.
- The user selects tabular data using the operation unit 15.
- The address information 105 of the storage location of the selected tabular data is displayed on the display unit 13.
- FIG. 7 is a diagram showing an example screen for setting which features (in this example, items of the tabular data) of the selected tabular data are used in generating the prediction model.
- The display unit 13 displays the item names 107, which are the names of the items in the tabular data.
- A check box 108 is displayed to the left of each item. The user, for example, checks the check boxes of the features used to generate the prediction model and unchecks those of the features not used. At least one check box must be checked, and all of the check boxes may be checked.
- The data format 109 of each feature can also be set. In addition, the screen shown in FIG. 7 can be used to set the prediction type 110, that is, the output format of the prediction model (for example, binary classification, multi-class classification, or numerical classification).
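Procedure B1 can be sketched as filtering the loaded tabular data down to the checked features and converting each value according to its data format 109. This is an illustrative sketch only: the CSV input, the format names in `CASTERS`, the function name `build_first_dataset`, and the column names in the usage example are assumptions, not contents of FIG. 2 or FIG. 7.

```python
import csv
import io

# Hypothetical data formats that can be chosen per feature (data format 109).
CASTERS = {"number": float, "text": str, "category": str}


def build_first_dataset(csv_text: str, checked: set[str], formats: dict[str, str]) -> list[dict]:
    """Keep only the checked features and convert values by their data format."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Preserve the column order of the tabular data.
    selected = [name for name in (reader.fieldnames or []) if name in checked]
    if not selected:
        raise ValueError("at least one check box must be checked")
    dataset = []
    for row in reader:
        dataset.append({
            # Empty cells are kept as missing values (None).
            name: None if row[name] == "" else CASTERS[formats.get(name, "text")](row[name])
            for name in selected
        })
    return dataset


table = "date,store,sales\n2020-01-01,A,120\n2020-01-02,A,\n"
print(build_first_dataset(table, {"store", "sales"}, {"sales": "number", "store": "category"}))
```

Unchecked columns (here `date`) simply never enter the first data set, which matches the rule that all or only part of the tabular data may become the first data set.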
- Procedure B2: The discrimination unit 11A searches the plurality of second data sets stored in the database 14 for a second data set similar to the first data set.
- For example, the discrimination unit 11A determines that a data set whose data set information is the same as that of the first data set, or a data set for which the sum of the differences between its values and those of the first data set is equal to or less than a certain value, is a second data set similar to the first data set.
- The discrimination unit 11A may instead refer to the information on each feature and determine that a data set having many similar features is a second data set similar to the first data set, or may combine these methods to determine a second data set similar to the first data set.
- In this example, the discrimination unit 11A determines one second data set as the data set similar to the first data set.
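The similarity test in procedure B2 can be sketched as follows, assuming the data set information has been reduced to a few numeric values plus a problem setting. Normalizing each difference by the larger of the two values, the default threshold of 0.5, the requirement that problem settings match, and the names `distance` and `find_similar` are all our own assumptions; the description only says that the summed differences are compared with a certain value.

```python
# Assumed numeric subset of the data set information in FIG. 3.
NUMERIC_KEYS = ("n_rows", "n_features", "missing_ratio")


def distance(a: dict, b: dict) -> float:
    """Sum of per-item differences, each scaled to [0, 1] by the larger magnitude."""
    total = 0.0
    for key in NUMERIC_KEYS:
        x, y = a[key], b[key]
        scale = max(abs(x), abs(y)) or 1.0  # avoid division by zero when both are 0
        total += abs(x - y) / scale
    return total


def find_similar(first: dict, candidates: dict[str, dict], threshold: float = 0.5) -> list[str]:
    """Return names of past data sets judged similar to the first data set, closest first."""
    similar = []
    for name, info in candidates.items():
        if info == first:
            similar.append(name)  # identical data set information
        elif info["problem"] == first["problem"] and distance(first, info) <= threshold:
            similar.append(name)  # summed difference at or below the threshold
    return sorted(similar, key=lambda n: distance(first, candidates[n]))


first = {"n_rows": 1000, "n_features": 10, "missing_ratio": 0.05, "problem": "regression"}
past = {
    "past_1": {"n_rows": 1200, "n_features": 12, "missing_ratio": 0.05, "problem": "regression"},
    "past_2": {"n_rows": 50000, "n_features": 40, "missing_ratio": 0.2, "problem": "classification"},
}
print(find_similar(first, past))  # ['past_1']
```

Scaling each term keeps a large count such as `n_rows` from dominating a small ratio such as `missing_ratio` in the sum.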
- Procedure B3: The discrimination unit 11A determines the algorithm or the like applied to the second data set determined in procedure B2.
- The discrimination unit 11A acquires the algorithm or the like applied to the second data set by referring to the database information. Various settings are then tuned for applying that algorithm or the like. FIG. 8 shows an example of a screen displayed during tuning.
- Procedure B4: When the tuning of the various settings is completed in procedure B3, the prediction model generation unit 11B generates a prediction model by applying the tuned algorithm or the like to the first data set. The generated prediction model is then displayed on the display unit 13.
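Procedures B3 and B4 amount to looking up what was applied to the matched second data set and re-applying it to the first data set. The sketch below assumes, purely for illustration, that the history holds (algorithm, parameters, accuracy) tuples and that the best-scoring entry is reused; the two toy trainers in `TRAINERS` are hypothetical stand-ins for real algorithms, and the tuning shown in FIG. 8 is not modeled.

```python
from statistics import mean


def fit_mean(y, **params):
    """Toy algorithm: predict the overall mean."""
    level = mean(y)
    return lambda n_ahead: [level] * n_ahead


def fit_moving_average(y, window=3):
    """Toy algorithm: predict the mean of the last `window` observations."""
    level = mean(y[-window:])
    return lambda n_ahead: [level] * n_ahead


# Hypothetical registry mapping algorithm names in the database to trainers.
TRAINERS = {"mean": fit_mean, "moving average": fit_moving_average}


def reuse_processing(history, y):
    """Apply the best recorded algorithm and parameters to the first data set `y`."""
    algorithm, params, _ = max(history, key=lambda entry: entry[2])
    model = TRAINERS[algorithm](y, **params)
    return algorithm, params, model


history = [("mean", {}, 0.70), ("moving average", {"window": 2}, 0.81)]
algorithm, params, model = reuse_processing(history, [10, 12, 14, 16])
print(algorithm, params, model(2))  # moving average {'window': 2} [15, 15]
```

No search over algorithms happens here: the only choice made is which recorded entry to reuse, which is the efficiency gain the description attributes to this approach.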
- FIG. 9 is a diagram showing a display example of the generated prediction model.
- A graph 113 showing the sales forecast is displayed on the display unit 13.
- The prediction type information 111 (numerical classification in the illustrated example) and information 112 on the accuracy of the prediction model are also displayed.
- The contents of the processing used to generate the prediction model (the algorithm or the like, the accuracy of the prediction model, etc.) are stored in the database 14 as new database information.
- As described above, when a prediction model is generated, a second data set similar to the first data set is searched for, and the algorithm or the like applied to the found second data set is applied to the first data set. This eliminates the need to search for an effective algorithm or the like from scratch when generating a prediction model based on the first data set, so the prediction model can be generated efficiently. Furthermore, since the user only needs to set up the first data set from the tabular data, even a user without specialized knowledge or skills can generate the desired prediction model.
- A plurality of second data sets similar to the first data set may also be determined.
- For example, the discrimination unit 11A may determine a plurality of second data sets having at least a certain degree of similarity to the first data set.
- In this case, the algorithm or the like applied to the largest number of the second data sets may be applied in procedure B4.
- The generated prediction models (10 prediction models) may also be displayed sequentially on the display unit 13.
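Selecting the algorithm or the like applied to the largest number of similar second data sets is a simple majority vote, sketched below with `collections.Counter`. Representing each processing as an (algorithm, sorted parameter tuple) pair so that it is hashable, and the algorithm names in the example, are assumptions; on a tie, `Counter.most_common` keeps the order in which entries were first encountered.

```python
from collections import Counter


def most_common_processing(applied: list[tuple[str, dict]]) -> tuple[tuple, int]:
    """Return the (algorithm, params) pair applied most often, with its count."""
    # Dicts are not hashable, so freeze parameters as a sorted tuple of items.
    frozen = [(algorithm, tuple(sorted(params.items()))) for algorithm, params in applied]
    choice, count = Counter(frozen).most_common(1)[0]
    return choice, count


applied = [
    ("gradient boosting", {"depth": 3}),  # hypothetical entries from similar data sets
    ("linear regression", {}),
    ("gradient boosting", {"depth": 3}),
]
print(most_common_processing(applied))  # (('gradient boosting', (('depth', 3),)), 2)
```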
- A plurality of algorithms or the like may also be applied to the first data set and verified in an order determined by a predetermined criterion.
- To support such a criterion, the characteristics of each algorithm may be recorded in the database 14, as shown for example in FIG. 10: its average effect on performance, the variance of its performance, and the number of records in the database (the number of times the algorithm has been applied).
- For example, when the criterion is set to give priority to verifying algorithms whose effect is positive on average, the performance in the portion enclosed by reference sign C1 is the largest in the positive direction, so verification is performed with priority given to the corresponding algorithm (cutting off missing values). When the criterion is set to give priority to verifying algorithms with a large variance, verification is performed with priority given to the algorithm corresponding to the portion enclosed by reference sign C2 (conversion by a trigonometric function). When an Upper Confidence Bound criterion (favoring algorithms that have been tried few times and whose performance is positive) is set, verification is performed with priority given to the algorithm corresponding to reference sign C3 (division into 20 sections), because the portion enclosed by reference sign C3 has performance in the positive direction while its number of records in the database, that is, its number of applications, is the smallest. The content of the criterion may be predetermined or may be set by the user.
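The three example criteria can be sketched as different sort keys over per-algorithm statistics of the kind recorded in FIG. 10. For the Upper Confidence Bound criterion we substitute the standard UCB1 score, mean effect plus c * sqrt(ln N / n), as one concrete reading of “tried few times and positive on average”; the statistics in the usage example are made-up placeholders, not the values in FIG. 10, and `rank_algorithms` is a hypothetical name.

```python
import math


def rank_algorithms(stats: dict[str, tuple[float, float, int]],
                    criterion: str, c: float = 1.0) -> list[str]:
    """Order algorithms for verification.

    stats maps an algorithm name to (mean effect on performance,
    variance of performance, number of records in the database).
    """
    total = sum(n for _, _, n in stats.values())

    def score(name: str) -> float:
        mean_effect, variance, n = stats[name]
        if criterion == "mean":       # positive on average first
            return mean_effect
        if criterion == "variance":   # large variance first
            return variance
        if criterion == "ucb":        # UCB1: rarely tried but promising first
            if n == 0:
                return math.inf       # untried algorithms are verified first
            return mean_effect + c * math.sqrt(math.log(total) / n)
        raise ValueError(f"unknown criterion: {criterion}")

    return sorted(stats, key=score, reverse=True)


stats = {  # placeholder numbers for illustration only
    "alg_a": (0.05, 0.001, 100),
    "alg_b": (0.01, 0.020, 50),
    "alg_c": (0.03, 0.005, 2),
}
for criterion in ("mean", "variance", "ucb"):
    print(criterion, rank_algorithms(stats, criterion))
```

With these placeholders, `alg_c` has the fewest records, so the UCB1 exploration term lifts it to the top even though `alg_a` has the best mean effect.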
- FIG. 11 is a diagram showing an example of a screen on which priority processing items can be set.
- On this screen, a processing item setting display 121 for setting the priority processing item is displayed.
- The processing item setting display 121 is shown, for example, as a semicircular indicator.
- The left end of the indicator corresponds to speed priority, and the right end corresponds to performance priority.
- By setting the needle of the indicator at an appropriate position, the user can set how much priority is given to speed or to performance.
- Depending on the needle position, a processing item of “completely speed priority”, “slightly speed priority”, “completely performance priority”, or “slightly performance priority” is set.
- Procedure B22: In procedure B22, basically the same processing as in procedures B2 and B3 is performed. First, data sets similar to the first data set are selected. Then, from the selected data sets, those that match the priority processing item set by the user are further selected and set as the second data sets.
- For example, when “completely speed priority” is set on the processing item setting display 121, the data sets in the top 1% in speed (that is, with the shortest prediction model generation time in FIG. 3) are selected from among the data sets similar to the first data set and set as the second data sets. Then, for example, the algorithm or the like used most frequently in these second data sets is set as the algorithm or the like applied to the first data set. Alternatively, all of the algorithms and the like applied to the respective second data sets may be applied to the first data set and verified.
- When “slightly speed priority” or “slightly performance priority” is set on the processing item setting display 121, for example, from among the data sets similar to the first data set, those whose speed is in the top 10% and whose performance (accuracy in FIG. 3) …
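The “completely speed priority” rule can be sketched as taking the fastest 1% of the similar data sets by prediction model generation time and then picking the most frequent algorithm among them. Rounding the 1% up so that at least one data set is always kept, the record layout, and the function name `select_by_speed` are assumptions; the “slightly” variants are not sketched because the description of their selection rule is incomplete in this text.

```python
import math
from collections import Counter


def select_by_speed(similar: list[dict], fraction: float = 0.01) -> tuple[list[dict], str]:
    """Keep the fastest `fraction` of similar data sets and pick their most frequent algorithm."""
    k = max(1, math.ceil(len(similar) * fraction))  # keep at least one data set
    fastest = sorted(similar, key=lambda r: r["generation_time_s"])[:k]
    algorithm, _ = Counter(r["algorithm"] for r in fastest).most_common(1)[0]
    return fastest, algorithm


# Hypothetical records: 150 similar data sets, the two fastest used "alg_fast".
similar = [
    {"generation_time_s": float(i), "algorithm": "alg_fast" if i < 2 else "alg_slow"}
    for i in range(150)
]
second_data_sets, algorithm = select_by_speed(similar)
print(len(second_data_sets), algorithm)  # 2 alg_fast
```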
- FIG. 12 is a diagram showing an example of the result of searching an algorithm or the like based on a data set similar to the first data set.
- Procedure B23: The same processing as in procedure B3 is performed.
- The prediction model generation unit 11B generates a prediction model by applying the tuned algorithm or the like to the first data set. The generated prediction model is then displayed on the display unit 13.
- In this way, a prediction model can be generated based on the priority processing item set by the user.
- Settings related to memory priority and the like may also be made, and the display mode of the processing item setting display 121 can be changed as appropriate according to the content and number of processing items to be prioritized.
- In another example, the information processing device 1 is used to generate a prediction model that predicts next week's sales from the hourly user data of a certain store.
- For such a prediction model, information such as “cumulative sales in the last x weeks” and “sales in the same period last year” may be useful as features.
- However, it is inefficient to examine every period, such as “one week ago”, “two weeks ago”, ..., “one year ago”, to determine which is effective. Therefore, in this example, a dialog is displayed asking the user for information that cannot be narrowed down from past database information (here, the period whose cumulative data should be added to the features to be effective for prediction), and auxiliary information is received from the user as a hint needed for processing.
- A prediction model is then generated by applying processing based on the auxiliary information to the first data set.
- Procedure B31: In procedure B31, the same processing as in procedures B1 and B2 is performed.
- FIG. 13 is a diagram showing a display example of asking the user for auxiliary information.
- In this display example, the question 131 “Which period do you think is effective for the sales forecast?” is displayed.
- The display unit 13 also displays answer candidates 132 for the question.
- A cancel button 133 for canceling the answer is also displayed on the display unit 13.
- In this example, three answer candidates 132 are displayed. In the background, while the user is answering the question, the sales period is changed as appropriate and the tuning of the prediction model's parameters continues.
- For example, the prediction model generation unit 11B obtains as auxiliary information the user's answer that the “cumulative sales for the one month immediately before the timing to be predicted” are effective for the sales prediction.
- The prediction model generation unit 11B then applies processing based on the auxiliary information. For example, the feature “last month” is added to the features (for example, sales) of the first data set, so that the sales data are narrowed down to the data for the previous month.
- A data set similar to the first data set may also be searched for again based on the added feature, and the second data set may be reset based on the search result.
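The “cumulative sales for the one month immediately before the timing to be predicted” feature can be computed with prefix sums, as in the sketch below. It assumes the hourly user data are first aggregated into daily totals and that one month is approximated by 30 days; both choices, and the helper names `daily_totals` and `trailing_sum`, are illustrative assumptions. The window ends just before each day, so the feature never includes the value being predicted.

```python
from itertools import accumulate


def daily_totals(hourly: list[float]) -> list[float]:
    """Aggregate hourly values into daily totals (24 values per day)."""
    return [sum(hourly[i:i + 24]) for i in range(0, len(hourly), 24)]


def trailing_sum(values: list[float], window: int) -> list:
    """For each day t, sum values[t - window:t]; None until enough history exists."""
    prefix = [0, *accumulate(values)]  # prefix[t] == sum(values[:t])
    return [prefix[t] - prefix[t - window] if t >= window else None
            for t in range(len(values))]


# Toy check with a 2-day window; a "last month" feature would use window=30.
print(trailing_sum([1, 2, 3, 4, 5], window=2))  # [None, None, 3, 5, 7]
```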
- Procedure B34: The same processing as in procedure B4 is performed.
- That is, the prediction model generation unit 11B generates a prediction model by applying a predetermined algorithm or the like to the first data set to which the feature has been added.
- The generated prediction model is displayed.
- In another example, the first data set contains at least text data.
- In this case, preprocessing that excludes words unnecessary for prediction (for example, “desu” and “masu”, Japanese polite sentence endings) from the data can be considered.
- Such processing could also be performed automatically by observing each word's contribution to prediction while repeatedly generating prediction models, but this is inefficient because it takes a very long time.
- The information processing device 1 can shorten the time needed for such verification by receiving auxiliary information from the user as a hint.
- In procedure B41, the same processing as in procedures B1 and B2 is performed.
- In step B42, a screen asking for auxiliary information is displayed on the display unit 13.
- A word group 141, consisting of words that are included in the first data set and occur at least a certain number of times, is displayed on the display unit 13.
- a check box is displayed for each word in the word group 141.
- When a check box is checked, the corresponding word is set as a word unnecessary for predictive analysis.
- For example, the words "desu" and "masu" are set as unnecessary for prediction.
- the display unit 13 displays a cancel button 141A for canceling the set contents.
- In step B43, the same process as in step B4 is performed. In addition, when the prediction model generation unit 11B generates the prediction model, processing based on the auxiliary information is applied. Specifically, a prediction model is generated by applying a predetermined algorithm or the like to the first data set from whose text data "desu" and "masu" have been excluded. The generated prediction model is displayed.
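Steps B42 and B43 can be sketched as two small operations: listing words that occur at least a given number of times (the candidates shown as word group 141) and dropping the words the user checked. The function names and the `min_count` parameter are illustrative assumptions, and the sketch tokenizes on whitespace; real Japanese text has no spaces between words and would first need a morphological analyzer, which is not shown.

```python
from collections import Counter


def frequent_words(texts, min_count):
    """Words occurring at least min_count times across all texts (candidate list)."""
    counts = Counter(word for text in texts for word in text.split())
    return [word for word, n in counts.most_common() if n >= min_count]


def remove_words(texts, unnecessary):
    """Drop every word the user marked as unnecessary for prediction."""
    drop = set(unnecessary)
    return [" ".join(w for w in text.split() if w not in drop) for text in texts]


if __name__ == "__main__":
    reviews = ["good item desu", "fast delivery masu", "good price desu masu"]
    print(frequent_words(reviews, min_count=2))      # words seen at least twice
    print(remove_words(reviews, ["desu", "masu"]))   # ['good item', 'fast delivery', 'good price']
```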
- The auxiliary information is not limited to the data period and the words unnecessary for prediction described above.
- For example, the auxiliary information may identify words that refer to the same thing but are treated as different words because of notational variations.
- FIG. 15 is a diagram showing a display example of asking the user for such auxiliary information.
- In the example shown in FIG. 15, the question 142 "Are any of the following words the same as 'Tokyo'?" is displayed to obtain the auxiliary information.
- Below the question 142, a word group 143 including, for example, four words ("Tokyo", "Tokyo", "TOKIO", "TOKYOU") is displayed.
- a check box is displayed next to each word in the word group 143.
- a cancel button 143A for canceling the set contents is displayed on the display unit 13.
- For example, the user checks the words that are the same as "Tokyo". Then, when the prediction model generation unit 11B generates the prediction model, it does so such that the checked words are treated as the same word as "Tokyo".
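Treating notational variants as one word amounts to mapping each checked variant to a canonical spelling before the model is generated. A minimal sketch, in which the function names and the particular variants the user checks are hypothetical:

```python
def build_canonical_map(canonical, checked_variants):
    """Map each variant the user checked to the canonical spelling."""
    return {variant: canonical for variant in checked_variants}


def normalize(values, canonical_map):
    """Replace checked variants; leave all other values unchanged."""
    return [canonical_map.get(value, value) for value in values]


if __name__ == "__main__":
    # Hypothetical user selection: "TOKIO" and "TOKYOU" were checked as "Tokyo".
    mapping = build_canonical_map("Tokyo", ["TOKIO", "TOKYOU"])
    cities = ["Tokyo", "TOKIO", "Osaka", "TOKYOU"]
    print(normalize(cities, mapping))  # ['Tokyo', 'Tokyo', 'Osaka', 'Tokyo']
```

Unchecked words pass through untouched, so a variant the user did not confirm is never merged silently.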
- The auxiliary information may also indicate whether a value is an outlier; in other words, it may be information with which the user confirms the correctness of data included in the first data set. For example, sales and inventory quantities are usually positive values. If a feature of the first data set, specifically the data corresponding to sales or inventory quantity, contains a negative value, that data is likely to be abnormal. On the other hand, running a separate process to verify whether such data is abnormal makes the predictive analysis inefficient. Therefore, the user is asked to confirm whether data that differs from the other data is abnormal.
- FIG. 16 is a diagram showing a display example of asking the user for such auxiliary information.
- In the example shown in FIG. 16, the question 144 "Is the following data normal data?" is displayed, followed by the content 145 of specific data that seems to be abnormal (in the illustrated example, "store name: Shibuya store, sales: -1, inventory quantity: -1"). FIG. 16 also displays the content of other data 146 considered to be normal (in the illustrated example, "store name: Tokyo store, sales: 12 million yen, inventory quantity: 200"), so that the user can compare the data that seems abnormal with data that seems normal. When the displayed data is abnormal, the user inputs auxiliary information by clicking the button 147A labeled "remove".
- In this case, the data on the sales and inventory quantity of the Shibuya store is excluded from the first data set used to generate the forecast model.
- When the displayed data is normal, the user instead inputs auxiliary information by clicking the button 147B labeled "Use".
- In this case, the data on the sales and inventory quantity of the Shibuya store is used without being excluded from the first data set used to generate the forecast model.
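The outlier confirmation can be sketched as two steps: flag rows with a negative value in a field that should be positive, then apply the user's "remove" or "Use" answer per flagged row. The dict-per-row layout, the field names, and the function names are assumptions made for illustration.

```python
def suspect_rows(rows, fields=("sales", "inventory")):
    """Indices of rows with a negative value in any field that should be positive."""
    return [i for i, row in enumerate(rows) if any(row[f] < 0 for f in fields)]


def apply_decisions(rows, decisions):
    """decisions maps a row index to "remove" or "use"; unlisted rows are kept."""
    return [row for i, row in enumerate(rows) if decisions.get(i) != "remove"]


if __name__ == "__main__":
    data = [
        {"store": "Shibuya store", "sales": -1, "inventory": -1},
        {"store": "Tokyo store", "sales": 12_000_000, "inventory": 200},
    ]
    flagged = suspect_rows(data)                 # rows shown to the user for confirmation
    kept = apply_decisions(data, {0: "remove"})  # user clicked "remove" for row 0
    print(flagged, [row["store"] for row in kept])  # [0] ['Tokyo store']
```

Only flagged rows need a user answer; everything else is kept by default, which keeps the number of questions small.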
- This example asks for a hint from a user who has checked the result of generating a prediction model.
- It is assumed that the information processing device 1 generates a forecast model for demand forecasting based on manually input sales data, and that feedback is received from the user when the performance of the forecast model is not very good. The algorithm and the like are then reset based on the feedback.
- In step B51, steps B1 to B4 are performed to generate a prediction model.
- When the prediction model based on the first data set is generated, the information processing apparatus 1 determines, for each feature set to be used in the predictive analysis, a usefulness indicating how useful that feature was in generating the prediction model.
- the control unit 11 of the information processing device 1 determines the usefulness of each feature based on how much data corresponding to the feature is used in the calculation for generating the prediction model.
- the usefulness of each feature may be determined by another known method.
- FIG. 17 is a diagram showing a display example of usefulness for each feature.
- In FIG. 17, the item name 151 of each feature is displayed, and the usefulness 152 is displayed to the right of each item name.
- The usefulness 152 is displayed, for example, as a rectangular frame in which a larger black portion indicates a higher usefulness 152.
- the display mode of the usefulness 152 can be changed as appropriate.
- Alternatively, the usefulness 152 may be displayed as a specific score.
- The display unit 13 also displays a comment 153 regarding a feature whose usefulness is equal to or less than a predetermined value.
- In the example shown in FIG. 17, the usefulness of "purchase amount (yen)", one of the features, is extremely low. Therefore, a comment 153 such as "'Purchase amount (yen)' was hardly used in the prediction" is displayed. In addition, the display unit 13 displays the current recognition result 154 for "purchase amount (yen)", the feature with low usefulness.
- In step B52, the user checks the displayed usefulness 152. From the usefulness 152, the user recognizes that the "purchase amount (yen)" data, which was supposed to be related to sales, was not useful (had low usefulness) in generating the forecast model. In addition, from the recognition result 154, the user recognizes that because symbols such as commas and yen signs are mixed into the "purchase amount (yen)" values, it is being processed as character-string data rather than numerical data. Based on this, the user sets the data format of "purchase amount (yen)" to numerical data (see FIG. 7) and then clicks the button 155.
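Setting "purchase amount (yen)" to numerical data implies stripping the separators and yen notations before conversion. A minimal sketch, assuming the values contain only commas, "円", "¥" or "￥" around an integer amount; the function names and symbol list are illustrative assumptions.

```python
YEN_SYMBOLS = (",", "円", "¥", "￥")  # thousands separator plus yen notations (assumed set)


def to_yen(value):
    """Convert '¥1,200', '1,200円' or 1200 to an int; return None if it cannot be parsed."""
    if isinstance(value, int):
        return value
    cleaned = str(value)
    for symbol in YEN_SYMBOLS:
        cleaned = cleaned.replace(symbol, "")
    try:
        return int(cleaned.strip())
    except ValueError:
        return None


def is_numeric_column(values):
    """True if every non-empty value parses as a yen amount."""
    return all(to_yen(v) is not None for v in values if v not in ("", None))


if __name__ == "__main__":
    column = ["¥1,200", "3,500円", "980", 4500]
    print([to_yen(v) for v in column])  # [1200, 3500, 980, 4500]
    print(is_numeric_column(column))    # True
```

Returning `None` instead of raising lets the caller report which cells still block numeric treatment.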
- In step B52, it may not be necessary to modify the prediction model even when the usefulness 152 is low. In such a case, the user may click the "correct" button 156 displayed on the display unit 13.
- A high-performance prediction model can be generated in a short time on a tool that repeatedly generates prediction models, or in an environment where the performance of prediction models is repeatedly verified on similar data sets.
- In addition, the user can answer questions while the algorithm and the like are being searched for, so a prediction model can be generated in a shorter time.
- In the first method, the user may specify specific values or ranges for the generation time of the prediction model, the limit on the memory capacity used when generating the prediction model, and the like.
- the contents of the dataset may be set.
- In the embodiment, the various settings and the generated prediction model are presented on a display, but they may also be presented by voice or the like.
- the tabular data may be data input by the user.
- a part of the processing performed by the information processing device 1 may be performed by a device on the cloud or an external device such as a smartphone. Further, the contents of the operation examples in the above-described embodiment can be combined as appropriate.
- the configuration of the information processing device 1 according to the embodiment can be changed as appropriate.
- the information processing device 1 may have a communication unit for communicating with a server device or the like, a speaker for reproducing voice, or the like.
- This disclosure can also be realized by a device, a method, a program, a system, and the like. For example, a program that performs the functions described in the above embodiment may be made downloadable; a device that lacks those functions can then download and install the program and thereby perform the control described in the embodiment.
- the present disclosure can also be realized by a server that distributes such a program.
- the items described in each embodiment and modification can be combined as appropriate.
- the present disclosure may also adopt the following configuration.
- (1) An information processing device including: an input unit to which a first data set containing a plurality of data is input; a discriminating unit that discriminates the processing applied when generating a prediction model based on a second data set similar to the first data set; and a prediction model generation unit that generates a prediction model based on the first data set by applying the processing discriminated by the discriminating unit to the first data set.
- The information processing apparatus according to (1), wherein the discriminating unit discriminates the algorithm applied when generating the prediction model based on the second data set and a parameter value in the algorithm.
- The information processing apparatus according to (7) or (8), wherein the prediction model generation unit generates a prediction model based on the first data set by applying the processing discriminated by the discriminating unit and processing based on auxiliary information obtained from the user's response.
- The information processing device according to any one of (1) to (10), wherein the first data set is a data set currently input to the input unit, and the second data set is a data set previously input to the input unit.
- An information processing method in which a discriminating unit discriminates the processing applied when generating a prediction model based on a second data set similar to a first data set containing a plurality of data input to an input unit, and a prediction model generation unit generates a prediction model based on the first data set by applying the processing discriminated by the discriminating unit to the first data set.
- (13) A program that causes a computer to execute an information processing method in which a discriminating unit discriminates the processing applied when generating a prediction model based on a second data set similar to a first data set containing a plurality of data input to an input unit, and a prediction model generation unit generates a prediction model based on the first data set by applying the processing discriminated by the discriminating unit to the first data set.
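The core configuration, reusing the processing of a similar, previously input data set, can be sketched as a nearest-neighbor lookup over past data sets. The disclosure does not fix a similarity measure; this sketch assumes Jaccard similarity over column names, and the `PastDataset` record, algorithm labels, and parameter values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PastDataset:
    """A previously input data set and the processing used for it (hypothetical record)."""
    name: str
    columns: frozenset
    algorithm: str
    params: dict = field(default_factory=dict)


def jaccard(a, b):
    """Jaccard similarity of two column-name sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def discriminate_processing(first_columns, history):
    """Return (algorithm, params, name) of the past data set most similar to the first one."""
    if not history:
        return None
    cols = frozenset(first_columns)
    second = max(history, key=lambda past: jaccard(cols, past.columns))
    return second.algorithm, dict(second.params), second.name


if __name__ == "__main__":
    history = [
        PastDataset("store_sales_2018", frozenset({"date", "store", "sales", "weather"}),
                    "gradient_boosting", {"max_depth": 4}),
        PastDataset("web_clicks", frozenset({"date", "page", "clicks"}), "linear_regression"),
    ]
    print(discriminate_processing(["date", "store", "sales", "inventory"], history))
```

The parameters are copied so that later tuning on the first data set does not mutate the stored record of the second data set.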
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Game Theory and Decision Science (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Public Health (AREA)
- Water Supply & Treatment (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Primary Health Care (AREA)
- Tourism & Hospitality (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/611,917 US20220215412A1 (en) | 2019-06-12 | 2020-05-01 | Information processing device, information processing method, and program |
| JP2021525943A JPWO2020250597A1 | 2019-06-12 | 2020-05-01 | |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2019-109461 | 2019-06-12 | | |
| JP2019109461 | 2019-06-12 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020250597A1 (ja) | 2020-12-17 |
Family
ID=73780949
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2020/018400 Ceased WO2020250597A1 (ja) | Information processing device, information processing method, and program | 2019-06-12 | 2020-05-01 |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20220215412A1 |
| JP (1) | JPWO2020250597A1 |
| WO (1) | WO2020250597A1 |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2022168216A1 (ja) * | 2021-02-04 | 2022-08-11 | オリンパス株式会社 | Estimation device, microscope system, processing method, and storage medium |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP7437531B2 (ja) * | 2020-11-06 | 2024-02-22 | 株式会社日立製作所 | Air conditioning system |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
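| JP2019016361A (ja) * | 2015-06-15 | 2019-01-31 | ナントミクス,エルエルシー | Systems and methods for patient-specific prediction of drug responses from cell line genomics |
| JP2019075159A (ja) * | 2014-03-28 | 2019-05-16 | 日本電気株式会社 | Learning model selection system, learning model selection method, and program |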
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7409371B1 (en) * | 2001-06-04 | 2008-08-05 | Microsoft Corporation | Efficient determination of sample size to facilitate building a statistical model |
| US7933762B2 (en) * | 2004-04-16 | 2011-04-26 | Fortelligent, Inc. | Predictive model generation |
| US11443015B2 (en) * | 2015-10-21 | 2022-09-13 | Adobe Inc. | Generating prediction models in accordance with any specific data sets |
| US11030484B2 (en) * | 2019-03-22 | 2021-06-08 | Capital One Services, Llc | System and method for efficient generation of machine-learning models |
2020
- 2020-05-01 JP JP2021525943A patent/JPWO2020250597A1/ja not_active Abandoned
- 2020-05-01 US US17/611,917 patent/US20220215412A1/en not_active Abandoned
- 2020-05-01 WO PCT/JP2020/018400 patent/WO2020250597A1/ja not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| US20220215412A1 (en) | 2022-07-07 |
| JPWO2020250597A1 | 2020-12-17 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11663409B2 (en) | | Systems and methods for training machine learning models using active learning |
| US11392647B2 (en) | | Intent-based question suggestion engine to advance a transaction conducted via a chatbot |
| US20230376857A1 (en) | | Artificial inelligence system with intuitive interactive interfaces for guided labeling of training data for machine learning models |
| US20190095507A1 (en) | | Systems and methods for autonomous data analysis |
| US7705847B2 (en) | | Graph selection method |
| US11868436B1 (en) | | Artificial intelligence system for efficient interactive training of machine learning models |
| US11144582B2 (en) | | Method and system for parsing and aggregating unstructured data objects |
| US20190114711A1 (en) | | Financial analysis system and method for unstructured text data |
| US20250238433A1 (en) | | System and Methods for Enabling Conversational Model Building to Extract, Classify, Infer, or Calculate Data from Large Corpuses of Documents |
| US11163783B2 (en) | | Auto-selection of hierarchically-related near-term forecasting models |
| US10592472B1 (en) | | Database system for dynamic and automated access and storage of data items from multiple data sources |
| WO2024040817A1 (zh) | | Big-data-based bond risk information processing method and related device |
| US20250328525A1 (en) | | Divide-and-conquer prompt for LLM-based text-to-SQL conversion |
| WO2020250597A1 (ja) | | Information processing device, information processing method, and program |
| US11381677B2 (en) | | Systems and methods to manage models for call data |
| JP5140509B2 (ja) | | Design case search device and design case search program |
| US20250371427A1 (en) | | Methods and systems for improved automated machine learning and data analysis |
| JP6978582B2 (ja) | | Forecasting work support device and forecasting work support method |
| US20250265523A1 (en) | | Workflow creation support device and method |
| US20210312362A1 (en) | | Providing action items for an activity based on similar past activities |
| US20240054509A1 (en) | | Intelligent shelfware prediction and system adoption assistant |
| KR20240033757A (ko) | | Method for providing investment information and computer program recorded on a recording medium to execute the same |
| US20260105250A1 (en) | | Grounding machine-learning models for segmenting datasets |
| US12443967B2 (en) | | Apparatus and methods for high-order system growth modeling |
| US11941076B1 (en) | | Intelligent product sequencing for category trees |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20822261; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2021525943; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 20822261; Country of ref document: EP; Kind code of ref document: A1 |