WO2023012933A1 - Feature extraction device, feature extraction method, and feature extraction program - Google Patents

Feature extraction device, feature extraction method, and feature extraction program

Info

Publication number
WO2023012933A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
analysis
unit
time
feature extraction
Prior art date
Application number
PCT/JP2021/028957
Other languages
French (fr)
Japanese (ja)
Inventor
太三 山本
愛 角田
高明 森谷
学 西尾
優 三好
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to PCT/JP2021/028957
Priority to JP2023539450A
Publication of WO2023012933A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling

Definitions

  • the present invention relates to a feature extraction device, a feature extraction method, and a feature extraction program.
  • As a data analysis device for analyzing time-series data, the one disclosed in Patent Document 1 is known. Patent Document 1 describes calculating an index value indicating the amount of change over time in each piece of data to be analyzed, and displaying a plurality of graphs of time-series data arranged in an order based on the index value. With Patent Document 1, for example, the user can focus on data that changes significantly at a specific time, which supports data analysis.
  • However, Patent Document 1 does not mention how to determine, when there are multiple analysis methods for analyzing time-series data, which analysis method is suitable for analyzing a given piece of time-series data.
  • The present invention has been made in view of the above circumstances, and its object is to provide a feature extraction device, a feature extraction method, and a feature extraction program capable of extracting features of an analysis method for analyzing time-series data.
  • A feature extraction device includes:
  • a combining unit that generates a plurality of data pairs, each by combining two pieces of time-series data;
  • an analysis unit that uses a plurality of analysis methods to analyze the degree of similarity of the two pieces of time-series data included in each data pair;
  • an occurrence probability calculation unit that calculates the occurrence probability of the similarity of each data pair for each analysis method, based on the analysis result of the analysis unit;
  • a divergence calculation unit that calculates, for each data pair, the degree of divergence of the occurrence probability calculated for each analysis method;
  • a visualization unit that visualizes each piece of time-series data included in the data pair together with the degree of divergence and presents them to the user;
  • an input unit that receives a determination input of similar or dissimilar from the user; and
  • a feature extraction unit that extracts features of the analysis method based on the determination input and the degree of divergence.
  • A feature extraction method includes: a step of generating a plurality of data pairs, each by combining two pieces of time-series data; a step of analyzing, using a plurality of analysis methods, the similarity of the two pieces of time-series data contained in each data pair; a step of calculating the occurrence probability of the similarity of each data pair for each analysis method based on the analyzed similarity; a step of calculating, for each data pair, the degree of divergence of the occurrence probability calculated for each analysis method; a step of visualizing each piece of time-series data included in a data pair together with the degree of divergence and presenting them to a user; a step of receiving a determination input of similar or dissimilar from the user; and a step of extracting features of the analysis method based on the determination input and the degree of divergence.
  • One aspect of the present invention is a feature extraction program for causing a computer to function as the feature extraction device.
  • FIG. 1 is a block diagram showing the configuration of the feature extraction device according to the first embodiment.
  • FIG. 2 is an explanatory diagram showing an example of time-series data and a data pair obtained by combining two time-series data.
  • FIG. 3 is an explanatory diagram showing first to fourth analysis methods.
  • FIG. 4A is an explanatory diagram showing analysis values calculated by the first to fourth analysis methods for a plurality of data pairs.
  • FIG. 4B is an explanatory diagram showing the distribution curves of the analysis values shown in FIG. 4A: (a) is the distribution curve by the first analysis method, (b) is the distribution curve by the second analysis method, (c) is the distribution curve by the third analysis method, and (d) is the distribution curve by the fourth analysis method.
  • FIG. 5 is an explanatory diagram showing a normalized curve obtained by normalizing the distribution curve s1 shown in FIG. 4B.
  • FIG. 6 is a diagram showing a plurality of data pairs, appearance probabilities of similarities obtained by analyzing each data pair by four analysis methods, and deviations of the appearance probabilities.
  • FIG. 7A is a diagram showing two pieces of time-series data forming a data pair and appearance probabilities calculated by four analysis methods.
  • FIG. 7B is a diagram showing two pieces of time-series data forming a data pair and appearance probabilities calculated by four analysis methods.
  • FIG. 8 is a flow chart showing the processing procedure of the feature extraction device according to the first embodiment.
  • FIG. 9 is an explanatory diagram showing characteristic patterns of each piece of time-series data acquired by the recording unit 18.
  • FIG. 10 is a block diagram showing the configuration of the feature extraction device according to the second embodiment.
  • FIG. 11A is an explanatory diagram showing analysis results of multiple data pairs and a normalization curve of the analysis results.
  • FIG. 11B is an explanatory diagram showing the degree of divergence between time-series data items forming data pairs and the appearance probability of each data pair.
  • FIG. 12 is a block diagram showing the hardware configuration of this embodiment.
  • FIG. 1 is a block diagram showing the configuration of the feature extraction device according to the first embodiment.
  • As shown in FIG. 1, the feature extraction device 1 according to the first embodiment is connected to a database 2 (denoted as "DB" in the figure).
  • The feature extraction device 1 includes a combination unit 11, a data analysis unit 12 (analysis unit), an occurrence probability calculation unit 13, a divergence calculation unit 14, a visualization unit 15, an input unit 16, a feature extraction unit 17, and a recording unit 18.
  • the time-series data qi is, for example, the consumer price index provided by the Statistics Bureau, Ministry of Internal Affairs and Communications.
  • FIG. 2 is an explanatory diagram showing an example of setting a data pair aj by combining two pieces of time-series data. As shown in FIG. 2, time-series data q1 and q2 are combined to generate data pair a1.
  • a data pair a2 is generated by combining the time-series data q2 and q3.
  • a data pair a3 is generated by combining the time-series data q1 and q4.
  • a data pair a4 is generated by combining the time-series data q3 and q4.
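The pairing step can be illustrated with a minimal sketch. It assumes that every unordered combination of two series is a candidate data pair (the figure shows only some of them), and the function name `make_data_pairs` is hypothetical, not taken from the source.

```python
from itertools import combinations

def make_data_pairs(names):
    """Return every unordered pair of time-series names (assumed pairing rule)."""
    return list(combinations(names, 2))

# Four series q1..q4 yield 4 * 3 / 2 = 6 candidate data pairs.
pairs = make_data_pairs(["q1", "q2", "q3", "q4"])
```

With four series this produces six pairs, among them the pairs (q1, q2), (q2, q3), (q1, q4), and (q3, q4) named above.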
  • The data analysis unit 12 analyzes the degree of similarity between the two pieces of time-series data included in each data pair using a plurality of analysis methods. Specifically, the data analysis unit 12 is provided with computation programs for a plurality of analysis methods for analyzing the data pairs aj set by the combination unit 11.
  • The data analysis unit 12 includes a first analysis unit 21 that analyzes data pairs according to a first analysis method, a second analysis unit 22 that analyzes data pairs according to a second analysis method, a third analysis unit 23 that analyzes data pairs according to a third analysis method, and a fourth analysis unit 24 that analyzes data pairs according to a fourth analysis method.
  • the data analysis unit 12 analyzes the degree of similarity between the two pieces of time-series data forming the data pair aj by the first to fourth analysis methods, and outputs the analysis result as an analysis value.
  • In this embodiment, an example using four analysis methods is shown, but the number of analysis methods is not limited to four. Specific processing of the first to fourth analysis methods will be described below with reference to FIG. 3.
  • In the first analysis method, the absolute value of the difference between the two pieces of time-series data is integrated. Specifically, the absolute value of the difference between one piece of time-series data and the other, obtained at predetermined time intervals, is calculated, and the absolute values of the differences are integrated over a certain period. The integrated value is output as the analysis value. The higher the degree of similarity between the two pieces of time-series data, the smaller the analysis value.
  • In the second analysis method, the amount of change over time in each piece of time-series data is calculated, and the differences between the calculated amounts of change are integrated. Specifically, the difference in the amount of change between one piece of time-series data and the other, obtained at predetermined time intervals, is calculated, and the differences are integrated over a certain period. For example, if one change amount is "+1" and the other is "-1", the difference is "2". If one change amount is "+1" and the other is also "+1", the difference is "0". If one change amount is "-2" and the other is "+1", the difference is "3".
  • The second analysis method integrates these differences and outputs the integrated numerical value as the analysis value. The higher the degree of similarity between the two pieces of time-series data, the smaller the analysis value.
  • In the third analysis method, the rate of change over time is calculated for each piece of time-series data, and the differences between the calculated rates of change are integrated. Specifically, the difference in the rate of change between one piece of time-series data and the other, obtained at predetermined time intervals, is calculated, and the differences are integrated over a certain period. For example, if the rate of change of one piece of time-series data is "+3%" and that of the other is "-1%", the difference is "4". If one is "+1%" and the other is also "+1%", the difference is "0".
  • The third analysis method integrates these differences over a certain period and outputs the integrated numerical value as the analysis value. The higher the degree of similarity between the two pieces of time-series data, the smaller the analysis value.
  • In the fourth analysis method, average values are calculated for each piece of time-series data at predetermined time intervals. Then, as in the third analysis method described above, the differences between the average values are integrated over a certain period, and the integrated numerical value is output as the analysis value. The higher the degree of similarity between the two pieces of time-series data, the smaller the analysis value.
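The four analysis methods can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: "integration" is read as a plain sum over equally spaced samples, the window width for the fourth method is an arbitrary choice, and all function names are hypothetical.

```python
def summed_abs_diff(x, y):
    """Sum of absolute differences between two equally long sequences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def changes(x):
    """Amount of change between consecutive samples."""
    return [b - a for a, b in zip(x, x[1:])]

def rates(x):
    """Percent change between consecutive samples (assumes no zero samples)."""
    return [100.0 * (b - a) / a for a, b in zip(x, x[1:])]

def window_means(x, width):
    """Mean of each consecutive window of `width` samples."""
    chunks = [x[i:i + width] for i in range(0, len(x), width)]
    return [sum(c) / len(c) for c in chunks]

def method1(x, y):
    # First method: sum the absolute differences of the raw values.
    return summed_abs_diff(x, y)

def method2(x, y):
    # Second method: sum the absolute differences of the change amounts.
    return summed_abs_diff(changes(x), changes(y))

def method3(x, y):
    # Third method: sum the absolute differences of the rates of change.
    return summed_abs_diff(rates(x), rates(y))

def method4(x, y, width=2):
    # Fourth method: sum the absolute differences of the windowed averages.
    return summed_abs_diff(window_means(x, width), window_means(y, width))
```

In every case a smaller result means the two series are more similar under that method, which matches the description above. The fourth method ignores fluctuations inside a window: `method4([1, 3, 5, 7], [2, 2, 6, 6])` is zero because both series have window means of 2 and 6.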
  • the first analysis unit 21 to fourth analysis unit 24 create a distribution curve plotting the analysis values of each data pair aj based on the analysis values calculated by the first to fourth analysis methods.
  • the processing of the data analysis unit 12 will be described in detail below with reference to FIGS. 4A and 4B.
  • FIG. 4A is an explanatory diagram showing analysis values calculated by the first to fourth analysis methods for a plurality of data pairs.
  • Here, "k" indicates the analysis method number and "j" indicates the data pair number. That is, "k" is an integer from 1 to 4, and "j" is an integer from 1 to n.
  • The analysis value calculated using the first analysis method for the data pair a1 of the two time-series data "vegetables/seaweed" and "sushi (eating out)" is denoted as the analysis value "b1-1".
  • The analysis value "b1-1" is "0.05".
  • The analysis value calculated using the second analysis method for the data pair a1 is denoted as the analysis value "b2-1".
  • The analysis value "b2-1" is "0.21".
  • The analysis value calculated using the third analysis method for the data pair a2 is denoted as the analysis value "b3-2".
  • The analysis value "b3-2" is "0.33".
  • The analysis value calculated using the fourth analysis method for the data pair a3 is denoted as the analysis value "b4-3".
  • The analysis value "b4-3" is "0.64".
  • In this manner, each analysis value "bk-j" is calculated.
  • FIGS. 4B(a) to (d) are graphs plotting the analytical values obtained by the first to fourth analytical methods, where the horizontal axis indicates the analytical value and the vertical axis indicates the frequency.
  • The occurrence probability calculation unit 13 normalizes the distribution curves s1 to s4 created by the plurality of analysis methods. The distribution curves s1 to s4 shown in FIGS. 4B(a) to (d) cannot be directly compared with each other as they are, so each distribution curve s1 to s4 is normalized. For example, normalizing the distribution curve s1 yields the normalized curve s11 shown in FIG. 5. That is, the occurrence probability calculation unit 13 normalizes the analysis result of the data analysis unit 12 to calculate the occurrence probability (also referred to as the appearance probability).
  • the appearance probability is an index indicating the rank of the analysis value in the range of "0 to 1". The closer the appearance probability is to "0", the higher the similarity between the two pieces of time-series data. The closer the appearance probability is to "1", the lower the similarity between the two pieces of time-series data.
  • the appearance probability of the analytical value "bk-j" of the data pair aj by the kth analysis method is indicated by "pk-j".
  • the appearance probability of the analysis value "b1-1" of the data pair a1 by the first analysis method is "p1-1". For example, when the analysis value "b1-1" of the data pair a1 by the first analysis method belongs to the top 30%, the occurrence probability "p1-1" is "0.3".
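One way to read the "top 30% gives 0.3" rule is as an empirical rank: the fraction of data pairs whose analysis value is no larger than the value in question. The sketch below assumes that reading; the source describes normalizing a distribution curve and does not spell out the computation, and the function name is hypothetical.

```python
def occurrence_probabilities(values):
    """Rank each analysis value within its own method as a fraction in (0, 1].

    Small analysis values (high similarity) map to probabilities near 0,
    large analysis values (low similarity) map to probabilities near 1.
    """
    n = len(values)
    return [sum(1 for w in values if w <= v) / n for v in values]

# Illustrative input: four analysis values produced by a single method.
probs = occurrence_probabilities([0.05, 0.21, 0.33, 0.64])
```

Because the result depends only on rank within one method, values from methods with very different scales become comparable, which is the purpose of the normalization step.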
  • the divergence calculation unit 14 calculates the divergence of the appearance probability calculated for each analysis method for each data pair aj. Specifically, the divergence calculator 14 calculates the divergence of the occurrence probability for the target data pair aj by four analysis methods. The degree of divergence of the data pair aj by the k-th analysis method is indicated by "dk-j". For example, the deviation of the data pair a1 by the first analysis method is "d1-1".
  • the degree of divergence dk-j is defined as the following formulas (1) to (4) using the appearance probabilities "p1-j" to "p4-j".
  • The degree of divergence of the occurrence probability pk-j by the k-th analysis method is a numerical value indicating the difference between the occurrence probability obtained by the k-th analysis method and the average of the occurrence probabilities obtained by the three analysis methods other than the k-th analysis method. Therefore, the greater the difference between the occurrence probability calculated by one analysis method and the average of the occurrence probabilities calculated by the other three analysis methods, the larger the absolute value of the degree of divergence.
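Formulas (1) to (4) are not reproduced in this text. One form consistent with the verbal description (the probability of method k minus the mean of the other three) would be the following; the sign convention is an assumption, since the text only states that the value indicates a difference and that data pairs are later ordered by absolute value.

```latex
d_{k\text{-}j} \;=\; p_{k\text{-}j} \;-\; \frac{1}{3}\sum_{\substack{l=1 \\ l \neq k}}^{4} p_{l\text{-}j},
\qquad k = 1, 2, 3, 4 .
```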
  • the divergence calculation unit 14 performs a process of rearranging the data pairs aj in descending order of the absolute values of the divergence dk-j calculated by the above equations (1) to (4) for each analysis method.
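The divergence calculation and the reordering can be sketched as below. The sketch assumes the degree of divergence is the probability of one method minus the mean of the probabilities of the other methods, as the description suggests; the exact formulas are not reproduced in this text, and the function names and sample probabilities are illustrative only.

```python
def divergences(probs):
    """Degree of divergence for each method of one data pair.

    probs holds one occurrence probability per analysis method. Each result is
    that method's probability minus the mean of the other methods (assumed form).
    """
    total = sum(probs)
    n = len(probs)
    return [p - (total - p) / (n - 1) for p in probs]

def rank_by_divergence(table, k):
    """Order data pair names by |divergence| of method k (zero-based), largest first."""
    return sorted(table, key=lambda name: abs(divergences(table[name])[k]), reverse=True)

# Illustrative probabilities only; not values from the source.
table = {"x": [0.5, 0.5, 0.5, 0.5], "y": [0.0, 1.0, 1.0, 1.0]}
order = rank_by_divergence(table, 0)
```

Pair "y" ranks first for the first method because its probability of 0.0 differs by 1.0 from the mean of the other three, whereas pair "x" shows no divergence at all.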
  • FIG. 6 is an explanatory diagram showing data obtained by rearranging the degrees of divergence d1-j of the appearance probabilities p1-j calculated using the first analysis method in descending order. For example, a pair of time-series data for “vegetables/seaweed” and “sushi (eating out),” a pair of time-series data for “cup noodles” and “fresh food,” and so on (item 1 and item 2) are rearranged.
  • the visualization unit 15 visualizes each time-series data and the degree of divergence included in the data pair aj and presents them to the user.
  • The visualization unit 15 has a display unit (not shown) such as a display, and displays on the display unit the data pairs whose degree of divergence dk-j calculated by the divergence calculation unit 14 is larger than a certain value (for example, 0.6).
  • the graphs and appearance probability data shown in FIGS. 7A and 7B are displayed on the display unit. That is, the visualization unit 15 visualizes only a predetermined number of analysis results with a large degree of divergence calculated by the degree of divergence calculation unit 14 .
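The filtering performed before display can be sketched as a simple threshold on the absolute degree of divergence. The threshold of 0.6 comes from the text; treating it as inclusive, the function name, and the sample values are assumptions.

```python
def pairs_to_show(divergence_by_pair, threshold=0.6):
    """Names of data pairs whose |divergence| reaches the threshold, largest first."""
    ranked = sorted(divergence_by_pair.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [name for name, d in ranked if abs(d) >= threshold]

# Illustrative divergence values only.
shown = pairs_to_show({"a11": -0.7, "a12": 0.93, "a13": 0.1})
```

Only the pairs on which the analysis methods disagree strongly reach the user, which keeps the number of manual judgments small.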
  • FIG. 7A(a) shows a graph of the data pair a11 of time-series data q11 (for example, household durable goods) and time-series data q12 (for example, furniture/household goods), and FIG. 7A(b) shows the occurrence probabilities of the data pair a11 calculated using the first to fourth analysis methods.
  • FIG. 7B(a) shows a graph of the data pair a12 of time-series data q13 (for example, vegetables/seaweed) and time-series data q14 (for example, sushi (eating out)), and FIG. 7B(b) shows the occurrence probabilities of the data pair a12 calculated using the first to fourth analysis methods.
  • the visualization unit 15 displays the data shown in FIGS. 7A and 7B on the display unit. A user can recognize the displayed information by looking at the display unit.
  • the input unit 16 accepts similarity or dissimilarity judgment input by the user.
  • the input unit 16 is equipped with an operating device such as a keyboard, and receives input for determination of similarity or dissimilarity to the information displayed on the visualization unit 15 .
  • For example, as shown in FIG. 7A, the time-series data q11 and q12 are far apart from each other, so the user inputs a determination result of "dissimilar".
  • On the other hand, as shown in FIG. 7B, the time-series data q13 and q14 are close to each other, so the user inputs a determination result of "similar".
  • the feature extraction unit 17 extracts features of the analysis method based on the judgment input and the degree of divergence. Specifically, the feature extraction unit 17 extracts features of each analysis method based on the determination input input by the input unit 16 .
  • the graphs of the time-series data q11 and q12 of the data pair a11 shown in FIG. 7A(a) are divergent and have a low degree of similarity. Therefore, the probability of appearance of the data pair a11 should be a large number.
  • the appearance probability calculated by the third analysis method is a small numerical value.
  • the feature extraction unit 17 extracts the feature that the third analysis method is not suitable for the analysis of the time-series data q11 and q12. That is, the feature extraction unit 17 extracts time-series data unsuitable for analysis by the analysis method as the feature of the analysis method.
  • the graphs of the time-series data q13 and q14 of the data pair a12 shown in FIG. 7B(a) are close to each other and have a high degree of similarity. Therefore, the probability of occurrence of the data pair a12 should be a small numerical value. As shown in FIG. 7B(b), the appearance probabilities calculated by the second, third, and fourth analysis methods are large numerical values.
  • The feature extraction unit 17 extracts the feature that the second, third, and fourth analysis methods are not suitable for the analysis of the time-series data q13 and q14.
  • the feature extraction unit 17 includes a storage device (not shown), and stores the extracted features in the storage device.
  • the recording unit 18 records characteristic data of time-series data. For example, for “vegetables/seaweed”, it is recognized in advance that the characteristics are affected by the change of seasons, so this characteristic data is recorded. As for the “driver's license fee”, since it is recognized in advance that the amount varies stepwise, this characteristic data is recorded. Further, the visualization unit 15 described above may visualize characteristics of time-series data in addition to each time-series data and the degree of divergence that constitute a data pair.
  • When m pieces of time-series data are combined two at a time, "m*(m-1)/2" data pairs are generated.
  • In step S12, the data analysis unit 12 analyzes each data pair aj by a plurality of analysis methods to calculate analysis values.
  • the first analysis unit 21 calculates the analysis value of each data pair aj using the first analysis method.
  • the second analysis unit 22 calculates the analysis value of each data pair aj using the second analysis method.
  • the third analysis unit 23 calculates the analysis value of each data pair aj using the third analysis method.
  • the fourth analysis unit 24 calculates the analysis value of each data pair aj using the fourth analysis method.
  • the data analysis unit 12 generates a distribution curve of analysis values calculated by each analysis method. Specifically, as shown in FIGS. 4B (a) to (d), the distribution curve s1 of the analysis values calculated by the first analysis method, the distribution curve s2 of the analysis values calculated by the second analysis method, A distribution curve s3 of the analysis values calculated by the third analysis method and a distribution curve s4 of the analysis values calculated by the fourth analysis method are generated.
  • In step S13, the occurrence probability calculation unit 13 generates normalized curves obtained by normalizing the distribution curves s1 to s4. For example, the normalized curve s11 shown in FIG. 5 is generated.
  • In step S14, the occurrence probability calculation unit 13 calculates the occurrence probability of each data pair aj based on the normalized curve s11. For example, as shown in FIG. 5, when the target data pair belongs to the top 30% of all data pairs, the occurrence probability is set to "0.3". Likewise, when it belongs to the top 70%, the occurrence probability is set to "0.7".
  • In step S15, the divergence calculation unit 14 calculates the degree of divergence of each occurrence probability. Specifically, the degree of divergence of the data pair aj for each analysis method is calculated using the formulas (1) to (4) described above. Furthermore, the divergence calculation unit 14 rearranges the data pairs aj in descending order of the degree of divergence. As a result, for example, as shown in FIG. 6, data is obtained in which the degrees of divergence d1-j of the occurrence probabilities p1-j calculated using the first analysis method are arranged in descending order.
  • For example, the occurrence probability calculated using the first analysis method is "0.0473", while the occurrence probabilities calculated using the second to fourth analysis methods are approximately "1.0000". The occurrence probability calculated by the first analysis method therefore differs greatly from those calculated by the other three analysis methods, and the degree of divergence is a high value of 0.926428.
  • The visualization unit 15 displays, on a display unit (not shown), graphs of the data pairs determined to have a large degree of divergence d1-j (for example, data pairs with a degree of divergence of 0.6 or more) together with their occurrence probability data. That is, it visualizes the graphs of the data pairs and the occurrence probability data. For example, the information shown in FIGS. 7A and 7B is displayed on the screen.
  • the user determines the validity of the analysis results obtained by each analysis method.
  • In the example shown in FIG. 7A, the two time-series data q11 and q12 are not similar. Therefore, the occurrence probability is expected to be a large numerical value (close to "1").
  • The occurrence probabilities calculated by the first, second, and fourth analysis methods are close to "1", whereas the occurrence probability calculated by the third analysis method is "0.16", which diverges from those of the other three analysis methods. In this case, the analysis value obtained by the third analysis method is presumed to be inappropriate, and the analysis values obtained by the first, second, and fourth analysis methods are presumed to be appropriate.
  • In the example shown in FIG. 7B, the two time-series data q13 and q14 are similar. Therefore, the occurrence probability is expected to be a small numerical value (close to "0").
  • The occurrence probabilities calculated by the second, third, and fourth analysis methods are close to "1", whereas the occurrence probability calculated by the first analysis method is "0.05", which diverges from those of the other three analysis methods. In this case, the analysis values obtained by the second, third, and fourth analysis methods are presumed to be inappropriate, and the analysis value obtained by the first analysis method is presumed to be appropriate.
  • the visualization unit 15 reads the characteristic data of each time-series data recorded in the recording unit 18 and displays it on the display unit. For example, if the data pair to be analyzed includes time-series data of "vegetables/seaweed", the characteristic data "affected by seasonal change” is displayed on the display unit. If the data pair to be analyzed includes the time-series data of "driver's license fee”, the characteristic data "the amount changes stepwise” is displayed on the display unit. By visually recognizing this characteristic data, the user can refer to the determination of the analysis result.
  • In step S17, the input unit 16 receives a similarity/dissimilarity determination input from the user.
  • The user refers to the visualized information and inputs a determination result as to whether or not the analysis value obtained by each analysis method is appropriate. For example, in the example shown in FIG. 7A described above, the user inputs to the input unit 16 the determination result that the analysis value obtained by the third analysis method is inappropriate and the analysis values obtained by the first, second, and fourth analysis methods are appropriate. In the example shown in FIG. 7B described above, the user inputs the determination result that the analysis values obtained by the second, third, and fourth analysis methods are inappropriate and the analysis value obtained by the first analysis method is appropriate.
  • When the degree of divergence between the occurrence probability calculated by analyzing time-series data using one analysis method and the occurrence probability calculated by analyzing the same time-series data using another analysis method is high, either the one analysis method or the other analysis method is highly likely to be inappropriate for analyzing this time-series data.
  • In step S18, the feature extraction unit 17 calculates a score according to the appropriate/inappropriate determination result, based on the determination input received by the input unit 16. Specifically, a score of "+1" is assigned to an analysis method determined to be appropriate, and a score of "-1" is assigned to an analysis method determined to be inappropriate.
  • In the example of FIG. 7A, the score for the third analysis method is "-1" and the scores for the first, second, and fourth analysis methods are "+1".
  • In the example of FIG. 7B, the scores for the second, third, and fourth analysis methods are "-1" and the score for the first analysis method is "+1".
  • the feature extraction unit 17 integrates scores for each of the first to fourth analysis methods.
  • The score values are not limited to "+1" and "-1"; they may be numerical values such as "+2", "+1", "-1", and "-2" according to the degree of "appropriate" and "inappropriate".
  • the feature extraction unit 17 extracts features of each analysis method based on the above-described integrated score value. For example, the feature is extracted that the analysis method with the highest score among the four analysis methods is suitable for the analysis of target time-series data.
  • the feature extraction unit 17 records the extracted features in a storage device (not shown). Alternatively, the features already recorded in storage are modified based on the extracted features.
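The scoring in step S18 can be sketched as a running tally per analysis method. The +1/-1 weights and the two judgments mirror the FIG. 7A and FIG. 7B examples in the text; the data layout and function name are assumptions made for illustration.

```python
from collections import Counter

def accumulate_scores(judgments):
    """Sum +1 for each 'appropriate' verdict and -1 for each 'inappropriate' one.

    judgments is a list of dicts mapping analysis method number to True
    (appropriate) or False (inappropriate), one dict per judged data pair.
    """
    totals = Counter()
    for verdicts in judgments:
        for method, appropriate in verdicts.items():
            totals[method] += 1 if appropriate else -1
    return totals

fig7a = {1: True, 2: True, 3: False, 4: True}    # third method judged inappropriate
fig7b = {1: True, 2: False, 3: False, 4: False}  # only the first method judged appropriate
totals = accumulate_scores([fig7a, fig7b])
best = max(totals, key=totals.get)
```

After these two judgments the first method has the highest total and the third the lowest, so the first method would be recorded as the one suited to this kind of data.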
  • In step S19, the data analysis unit 12 determines whether or not the first to fourth analysis methods require correction. For example, as shown in FIG. 7A, when the third analysis method is determined to be unsuitable for the analysis of the data pair a11, it is determined that the third analysis method requires correction. If it is determined that correction is necessary (S19; YES), the process proceeds to step S20; otherwise (S19; NO), this process ends.
  • In step S20, the data analysis unit 12 corrects the target analysis method or treats it as unsuitable. After that, this process ends. In this way, it is possible to extract the features of an analysis method for analyzing the similarity of time-series data.
  • As described above, the feature extraction device 1 according to the first embodiment includes: the combination unit 11 that generates a plurality of data pairs, each by combining two pieces of time-series data; the analysis unit (data analysis unit 12) that uses a plurality of analysis methods to analyze the similarity of the two pieces of time-series data included in each data pair; the occurrence probability calculation unit 13 that calculates the occurrence probability of the similarity of each data pair for each analysis method based on the analysis result of the analysis unit; the divergence calculation unit 14 that calculates, for each data pair, the degree of divergence of the occurrence probability calculated for each analysis method; the visualization unit 15 that visualizes each piece of time-series data included in the data pair together with the degree of divergence and presents them to the user; the input unit 16 that receives a determination input of similar or dissimilar from the user; and the feature extraction unit 17 that extracts features of the analysis method based on the determination input and the degree of divergence.
  • With the feature extraction device 1 configured as described above, it is possible to extract features indicating which types of time-series data an analysis method is suitable or unsuitable for. Therefore, when a user such as a data scientist analyzes time-series data using a data analysis device, the user can be supported in selecting an appropriate analysis method from among the multiple analysis methods available to them.
  • The visualization unit 15 visualizes only a predetermined number of analysis results with a large degree of divergence calculated by the divergence calculation unit 14.
  • For example, only analysis results with a degree of divergence of 0.6 or more are visualized, so the visualization of analysis results with a small degree of divergence can be omitted. A small degree of divergence for all four analysis methods means that the four analysis methods produce almost the same result, so the need for user intervention is considered to be low.
  • By visualizing only a predetermined number of analysis results with a large degree of divergence, the user's effort can be reduced.
  • Feature data of each piece of time-series data, recognized in advance, is recorded in the recording unit 18. By displaying this feature data on the display unit of the visualization unit 15, the user can refer to it when judging the appropriateness of each analysis method.
  • The feature data "affected by seasonal variation" is recorded in the recording unit 18 for the data pair a1, which includes the time-series data of "vegetables/seaweed".
  • The recording unit 18 records the feature data "commodity prices change stepwise".
  • FIG. 10 is a block diagram showing the configuration of the feature extraction device 1a and its peripherals according to the second embodiment.
  • The second embodiment differs from the above-described first embodiment in that a selection unit 19 is provided. The components other than the selection unit 19 are therefore denoted by the same reference numerals, and description of their configuration is omitted.
  • When, among the plurality of data pairs, the appearance probability of the similarity of one data pair is close to that of another data pair, and the time-series data included in the one data pair is the same as or similar to the time-series data included in the other data pair, the selection unit 19 selects the other data pair.
  • In other words, the selection unit 19 selects data pairs having similar time-series data from among the data pairs generated by the combination unit 11.
  • The visualization unit 15 visualizes the appearance probabilities while excluding those of the data pairs selected by the selection unit 19.
  • FIG. 11A is a diagram showing a normalized distribution curve of analysis results obtained by analyzing a plurality of data pairs with the first analysis method.
  • FIG. 11B is a diagram showing two pieces of time-series data forming a data pair and the degree of divergence d1-j.
  • The data pairs x1, x2, and x3 shown in FIG. 11B all include the time-series data of "university tuition". Also, in FIG. 11A, the locations where the data pairs x1, x2, and x3 are plotted are close to one another. Therefore, two of these three data pairs are considered redundant and unnecessary.
  • The selection unit 19 excludes the data pairs x2 and x3 from the analysis target.
  • Data pair x4 shown in FIG. 11B includes the time-series data of "Chinese noodles", and data pair x5 includes the time-series data of "soba". Also, in FIG. 11A, the locations where the data pairs x4 and x5 are plotted are close to each other. Therefore, one of these two data pairs is considered redundant and unnecessary.
  • the selection unit 19 excludes the data pair x5 from the analysis target.
  • By excluding, from the plurality of data pairs, other data pairs that are similar to one data pair, the feature extraction device 1a can reduce the load required for data analysis.
  • When, among the plurality of data pairs, the appearance probability of the similarity of one data pair is close to that of another data pair, and the time-series data constituting the one data pair is the same as or similar to the time-series data constituting the other data pair, the selection unit 19 selects the other data pair. The visualization unit 15 then displays the appearance probabilities on the display unit while excluding those of the data pairs selected by the selection unit 19. Therefore, display of unnecessary data can be avoided, and the computational load can be reduced.
  • For the feature extraction device 1 of the present embodiment described above, a general-purpose computer system can be used that includes, for example, a CPU (Central Processing Unit, processor) 901, a memory 902, a storage 903 (HDD: Hard Disk Drive, SSD: Solid State Drive), a communication device 904, an input device 905, and an output device 906.
  • Memory 902 and storage 903 are storage devices. In this computer system, each function of the feature extraction device 1 is realized by the CPU 901 executing a predetermined program loaded on the memory 902 .
  • the feature extraction device 1 may be implemented by one computer, or may be implemented by a plurality of computers. Also, the feature extraction device 1 may be a virtual machine implemented on a computer.
  • the program for the feature extraction device 1 can be stored in computer-readable recording media such as HDD, SSD, USB (Universal Serial Bus) memory, CD (Compact Disc), DVD (Digital Versatile Disc), etc. It can also be delivered via
  • Reference Signs List: 1, 1a feature extraction device; 2 database; 11 combination unit; 12 data analysis unit (analysis unit); 13 appearance probability calculation unit; 14 degree of divergence calculation unit; 15 visualization unit; 16 input unit; 17 feature extraction unit; 18 recording unit; 19 selection unit; 21 first analysis unit; 22 second analysis unit; 23 third analysis unit; 24 fourth analysis unit

Abstract

This feature extraction device comprises: a combining unit (11) that generates a plurality of data pairs that are combinations of two sets of time-series data; a data analysis unit (12) that analyzes the degree of similarity between the two sets of time-series data included in each data pair using a plurality of analysis methods; and an occurrence probability calculation unit (13) that calculates, for each analysis method, the probability of occurrence of the degree of similarity for each data pair. The feature extraction device further comprises: a deviation level calculation unit (14) that calculates a deviation level of the probability of occurrence for each data pair calculated for each analysis method; a visualization unit (15) that visualizes and presents each set of time-series data included in a data pair and the deviation level to a user; an input unit (16) that receives similarity or dissimilarity determination input from the user; and a feature extraction unit (17) that extracts a feature of each analysis method on the basis of the determination input.

Description

Feature extraction device, feature extraction method, and feature extraction program
The present invention relates to a feature extraction device, a feature extraction method, and a feature extraction program.
The data analysis device disclosed in Patent Document 1 is known as a device for analyzing time-series data. Patent Document 1 describes calculating an index value indicating the amount of change over time in each piece of data to be analyzed, and displaying graphs of a plurality of pieces of time-series data side by side in an order based on the index value. With Patent Document 1, it is possible, for example, to focus on data that changes significantly at a specific time, which supports data analysis.
Patent Document 1: Japanese Patent No. 6592411
However, Patent Document 1 does not mention determining, when a plurality of analysis methods for analyzing time-series data exist, which of those analysis methods is suitable for analyzing the time-series data at hand.
For this reason, there has been a problem that it is difficult to select an appropriate analysis method when analyzing the target time-series data.
The present invention has been made in view of the above circumstances, and its object is to provide a feature extraction device, a feature extraction method, and a feature extraction program capable of extracting features of an analysis method for analyzing time-series data.
A feature extraction device according to one aspect of the present invention includes: a combination unit that generates a plurality of data pairs, each combining two pieces of time-series data; an analysis unit that uses a plurality of analysis methods to analyze the similarity of the two pieces of time-series data included in each data pair; an appearance probability calculation unit that calculates, for each analysis method, the appearance probability of the similarity of each data pair based on the analysis results of the analysis unit; a degree of divergence calculation unit that calculates, for each data pair, the degree of divergence of the appearance probability calculated for each analysis method; a visualization unit that visualizes each piece of time-series data included in a data pair together with the degree of divergence and presents them to a user; an input unit that receives a similar or dissimilar determination input from the user; and a feature extraction unit that extracts features of the analysis method based on the determination input and the degree of divergence.
A feature extraction method according to one aspect of the present invention includes the steps of: generating a plurality of data pairs, each combining two pieces of time-series data; analyzing, using a plurality of analysis methods, the similarity of the two pieces of time-series data included in each data pair; calculating, for each analysis method, the appearance probability of the similarity of each data pair based on the analyzed similarity; calculating, for each data pair, the degree of divergence of the appearance probability calculated for each analysis method; visualizing each piece of time-series data included in a data pair together with the degree of divergence and presenting them to a user; receiving a similar or dissimilar determination input from the user; and extracting features of the analysis method based on the determination input and the degree of divergence.
One aspect of the present invention is a feature extraction program for causing a computer to function as the feature extraction device described above.
According to the present invention, it is possible to extract features of an analysis method for analyzing time-series data.
FIG. 1 is a block diagram showing the configuration of the feature extraction device according to the first embodiment.
FIG. 2 is an explanatory diagram showing an example of time-series data and of data pairs obtained by combining two pieces of time-series data.
FIG. 3 is an explanatory diagram showing the first to fourth analysis methods.
FIG. 4A is an explanatory diagram showing analysis values calculated by the first to fourth analysis methods for a plurality of data pairs.
FIG. 4B is an explanatory diagram showing the distribution curves of the analysis values shown in FIG. 4A, where (a) is the distribution curve for the first analysis method, (b) is the distribution curve for the second analysis method, (c) is the distribution curve for the third analysis method, and (d) is the distribution curve for the fourth analysis method.
FIG. 5 is an explanatory diagram showing a normalized curve obtained by normalizing the distribution curve s1 shown in FIG. 4B.
FIG. 6 is a diagram showing a plurality of data pairs, the appearance probabilities of the similarities obtained by analyzing each data pair with the four analysis methods, and the degrees of divergence of the appearance probabilities.
FIG. 7A is a diagram showing two pieces of time-series data forming a data pair and the appearance probabilities calculated by the four analysis methods.
FIG. 7B is a diagram showing two pieces of time-series data forming a data pair and the appearance probabilities calculated by the four analysis methods.
FIG. 8 is a flowchart showing the processing procedure of the feature extraction device according to the first embodiment.
FIG. 9 is an explanatory diagram showing the feature patterns of each piece of time-series data acquired by the recording unit 18.
FIG. 10 is a block diagram showing the configuration of the feature extraction device according to the second embodiment.
FIG. 11A is an explanatory diagram showing the analysis results of a plurality of data pairs and a normalized curve of the analysis results.
FIG. 11B is an explanatory diagram showing the items of the time-series data forming the data pairs and the degree of divergence of the appearance probability of each data pair.
FIG. 12 is a block diagram showing the hardware configuration of the present embodiment.
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[First Embodiment]
FIG. 1 is a block diagram showing the configuration of the feature extraction device according to the first embodiment.
As shown in FIG. 1, the feature extraction device 1 according to the first embodiment is connected to a database 2 (denoted as "DB" in the figure). The feature extraction device 1 includes a combination unit 11, a data analysis unit 12 (analysis unit), an appearance probability calculation unit 13, a degree of divergence calculation unit 14, a visualization unit 15, an input unit 16, a feature extraction unit 17, and a recording unit 18.
The database 2 stores a plurality (m) of pieces of time-series data qi (i = 1 to m). The time-series data qi is, for example, the consumer price index provided by the Statistics Bureau of the Ministry of Internal Affairs and Communications.
The combination unit 11 generates data pairs, each combining two pieces of time-series data. Specifically, the combination unit 11 selects two pieces from the time-series data qi stored in the database 2 and combines them to set a data pair aj (j = 1 to n). FIG. 2 is an explanatory diagram showing an example of setting data pairs aj by combining two pieces of time-series data. As shown in FIG. 2, the time-series data q1 and q2 are combined to generate data pair a1, q2 and q3 to generate data pair a2, q1 and q4 to generate data pair a3, and q3 and q4 to generate data pair a4.
When there are m pieces of time-series data, m*(m-1)/2 data pairs are set; that is, n = m*(m-1)/2. For example, when there are 380 pieces of time-series data, (380*379)/2 = 72010 data pairs are set.
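The pair count above can be checked with a short sketch. This is only an illustration, not the patent's implementation: the function name `make_pairs` and the labels `q1` to `q380` are hypothetical, and it assumes that every unordered pair of distinct series is formed.

```python
from itertools import combinations


def make_pairs(series_ids):
    """Return every unordered pair of distinct series identifiers."""
    return list(combinations(series_ids, 2))


# Hypothetical labels q1 .. q380 standing in for the stored time-series data.
series_ids = [f"q{i}" for i in range(1, 381)]
pairs = make_pairs(series_ids)

# m * (m - 1) / 2 pairs are expected for m series.
m = len(series_ids)
print(len(pairs), m * (m - 1) // 2)
```

`itertools.combinations` yields each pair once, so (q1, q2) and (q2, q1) are not both generated, which matches the m*(m-1)/2 count.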
The data analysis unit 12 analyzes the similarity of the two pieces of time-series data included in each data pair using a plurality of analysis methods. Specifically, the data analysis unit 12 has computation programs for a plurality of analysis methods for analyzing the data pairs aj set by the combination unit 11. The data analysis unit 12 includes a first analysis unit 21 that analyzes data pairs by a first analysis method, a second analysis unit 22 that analyzes data pairs by a second analysis method, a third analysis unit 23 that analyzes data pairs by a third analysis method, and a fourth analysis unit 24 that analyzes data pairs by a fourth analysis method.
The data analysis unit 12 analyzes the similarity of the two pieces of time-series data forming the data pair aj by the first to fourth analysis methods and outputs the result as an analysis value. Although this embodiment shows an example using four analysis methods, the number of analysis methods is not limited to four. The specific processing of the first to fourth analysis methods is described below with reference to FIG. 3.
In the first analysis method, as shown in FIG. 3(a), the absolute differences between the two pieces of time-series data are accumulated. Specifically, the absolute value of the difference between one piece of time-series data and the other is calculated at each predetermined time interval, and these absolute values are summed over a fixed period. The sum is output as the analysis value. The higher the similarity between the two pieces of time-series data, the smaller the analysis value.
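A minimal sketch of the first analysis method follows, assuming both series are sampled at the same time points and have equal length. The function name `abs_difference_score` and the sample values are hypothetical.

```python
def abs_difference_score(x, y):
    """Sum of absolute point-by-point differences between two series."""
    if len(x) != len(y):
        raise ValueError("both series must have the same number of samples")
    return sum(abs(a - b) for a, b in zip(x, y))


# Identical series give 0; the score grows as the series move apart.
same = abs_difference_score([1, 2, 3], [1, 2, 3])
apart = abs_difference_score([1, 2, 3], [2, 2, 1])
print(same, apart)
```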
In the second analysis method, as shown in FIG. 3(b), the amount of change over time is calculated for each piece of time-series data, and the differences between the calculated amounts of change are accumulated. Specifically, the difference between the amount of change of one piece of time-series data and that of the other is calculated at each predetermined time interval, and the differences are summed over a fixed period. For example, if one amount of change is "+1" and the other is "-1", the difference is "2". If one amount of change is "+1" and the other is also "+1", the difference is "0". If one amount of change is "-2" and the other is "+1", the difference is "3". The second analysis method sums these differences and outputs the sum as the analysis value. The higher the similarity between the two pieces of time-series data, the smaller the analysis value.
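The three worked differences above (2, 0, and 3) can be reproduced with a small sketch. It assumes the "amount of change" is the difference between consecutive samples and that the per-step differences are taken as absolute values, which is what the examples imply. The function name and the sample series are hypothetical.

```python
def change_difference_score(x, y):
    """Sum of absolute differences between the step-to-step changes of two series."""
    if len(x) != len(y):
        raise ValueError("both series must have the same number of samples")
    dx = [b - a for a, b in zip(x, x[1:])]  # changes of the first series
    dy = [b - a for a, b in zip(y, y[1:])]  # changes of the second series
    return sum(abs(u - v) for u, v in zip(dx, dy))


# Changes of x: +1, +1, -2; changes of y: -1, +1, +1.
# Per-step differences: 2, 0, 3, which sum to 5.
score = change_difference_score([10, 11, 12, 10], [10, 9, 10, 11])
print(score)
```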
In the third analysis method, as shown in FIG. 3(c), the rate of change over time is calculated for each piece of time-series data, and the differences between the calculated rates of change are accumulated. Specifically, the difference between the rate of change of one piece of time-series data and that of the other is calculated at each predetermined time interval, and the differences are summed over a fixed period. For example, if one piece of time-series data changes by "+3%" and the other by "-1%", the difference is "4". If one changes by "+1%" and the other also by "+1%", the difference is "0". The third analysis method sums these differences over a fixed period and outputs the sum as the analysis value. The higher the similarity between the two pieces of time-series data, the smaller the analysis value.
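A sketch of the third analysis method is shown below. The text does not give a formula for the rate of change, so this assumes it is the percentage change relative to the previous sample, and that per-step differences are taken as absolute values. The function names and data are hypothetical, and zero-valued samples are not handled.

```python
def percent_changes(x):
    """Percentage change of each sample relative to the previous one (assumed definition)."""
    return [100.0 * (b - a) / a for a, b in zip(x, x[1:])]


def rate_difference_score(x, y):
    """Sum of absolute differences between the percentage changes of two series."""
    if len(x) != len(y):
        raise ValueError("both series must have the same number of samples")
    return sum(abs(u - v) for u, v in zip(percent_changes(x), percent_changes(y)))


# One series rises by 3 %, the other falls by 1 %: the difference is 4 points.
score = rate_difference_score([100, 103], [100, 99])
print(score)
```

Because rates are relative, two series at very different price levels can still score as similar if they move proportionally, which is the main contrast with the second method.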
In the fourth analysis method, as shown in FIG. 3(d), an average value is calculated for each piece of time-series data at each predetermined time interval. Then, as in the third analysis method described above, the differences between the average values are summed over a fixed period, and the sum is output as the analysis value. The higher the similarity between the two pieces of time-series data, the smaller the analysis value.
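The fourth analysis method can be sketched as follows. The text does not state how the averaging intervals are laid out, so this assumes non-overlapping windows of a fixed length and absolute differences between window averages. The names `block_means`, `mean_difference_score`, and `window` are hypothetical.

```python
def block_means(x, window):
    """Average of each consecutive, non-overlapping window of the series."""
    if window <= 0:
        raise ValueError("window must be positive")
    blocks = [x[i:i + window] for i in range(0, len(x), window)]
    return [sum(block) / len(block) for block in blocks]


def mean_difference_score(x, y, window):
    """Sum of absolute differences between the window averages of two series."""
    if len(x) != len(y):
        raise ValueError("both series must have the same number of samples")
    mx, my = block_means(x, window), block_means(y, window)
    return sum(abs(a - b) for a, b in zip(mx, my))


# Window averages: x -> [2.0, 6.0], y -> [2.0, 7.0]; differences 0 and 1.
score = mean_difference_score([1, 3, 5, 7], [2, 2, 6, 8], window=2)
print(score)
```

Averaging first smooths out short-lived fluctuations, so this score is less sensitive to point-by-point noise than the first method.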
Based on the analysis values calculated by the first to fourth analysis methods, the first analysis unit 21 to the fourth analysis unit 24 create distribution curves in which the analysis values of the data pairs aj are plotted. The processing of the data analysis unit 12 is described in detail below with reference to FIGS. 4A and 4B.
FIG. 4A is an explanatory diagram showing analysis values calculated by the first to fourth analysis methods for a plurality of data pairs. For each data pair aj (j = 1 to n) stored in the database 2 shown in FIG. 1, the data analysis unit 12 calculates an analysis value "bk-j" by each of the first to fourth analysis methods. Here, "k" indicates the number of the analysis method and "j" indicates the number of the data pair. That is, "k" is an integer from 1 to 4, and "j" is an integer from 1 to n.
For example, the analysis value calculated using the first analysis method for the data pair a1, which consists of the two pieces of time-series data "vegetables/seaweed" and "sushi (eating out)", is denoted "b1-1". In FIG. 4A, the analysis value "b1-1" is "0.05". The analysis value calculated using the second analysis method for the data pair a1 is denoted "b2-1". In FIG. 4A, the analysis value "b2-1" is "0.21".
Similarly, the analysis value calculated using the third analysis method for the data pair a2 is denoted "b3-2". In FIG. 4A, the analysis value "b3-2" is "0.33". The analysis value calculated using the fourth analysis method for the data pair a3 is denoted "b4-3". In FIG. 4A, the analysis value "b4-3" is "0.64". Each analysis value "bk-j" is calculated in the same way.
The data analysis unit 12 generates distribution curves of the analysis values of the data pairs aj calculated by the first to fourth analysis methods. Specifically, it generates distribution curves s1 to s4 of the analysis values "bk-1 to bk-n" (k = 1 to 4) calculated for the data pairs aj (j = 1 to n) using the first to fourth analysis methods. For example, as shown in FIGS. 4B(a) to (d), it generates the distribution curves s1 to s4 of "b1-j", "b2-j", "b3-j", and "b4-j" (where j = 1 to n). FIGS. 4B(a) to (d) are graphs plotting the analysis values obtained by the first to fourth analysis methods, where the horizontal axis indicates the analysis value and the vertical axis indicates the frequency.
FIG. 4B(a) plots the analysis values b1-j obtained by analyzing the n data pairs a1 to an by the first analysis method, and the curve along these values is the distribution curve s1.
FIG. 4B(b) plots the analysis values b2-j obtained by analyzing the n data pairs a1 to an by the second analysis method, and the curve along these values is the distribution curve s2.
FIG. 4B(c) plots the analysis values b3-j obtained by analyzing the n data pairs a1 to an by the third analysis method, and the curve along these values is the distribution curve s3.
FIG. 4B(d) plots the analysis values b4-j obtained by analyzing the n data pairs a1 to an by the fourth analysis method, and the curve along these values is the distribution curve s4.
Returning to FIG. 1, the appearance probability calculation unit 13 normalizes the distribution curves s1 to s4 created by the plurality of analysis methods. The distribution curves s1 to s4 shown in FIGS. 4B(a) to (d) cannot be compared directly with one another as they are, so each of them is normalized. For example, normalizing the distribution curve s1 yields the normalized curve s11 shown in FIG. 5. In other words, the appearance probability calculation unit 13 normalizes the analysis results of the data analysis unit 12 to calculate the appearance probabilities.
Based on the analysis results of the data analysis unit 12, the appearance probability calculation unit 13 calculates the appearance probability of the similarity of each data pair for each analysis method. Specifically, for each data pair aj (j = 1 to n), it calculates the appearance probability of the analysis value (that is, the similarity) obtained by each of the first to fourth analysis methods. The appearance probability is an index that expresses the rank of an analysis value in the range "0 to 1". The closer the appearance probability is to "0", the higher the similarity between the two pieces of time-series data; the closer it is to "1", the lower the similarity. The appearance probability of the analysis value "bk-j" of the data pair aj by the k-th analysis method is denoted "pk-j". The appearance probability of the analysis value "b1-1" of the data pair a1 by the first analysis method is "p1-1". For example, when the analysis value "b1-1" of the data pair a1 by the first analysis method falls within the top 30%, the appearance probability "p1-1" is "0.3".
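One way to turn analysis values into a rank-based index between 0 and 1 is sketched below. The text does not give the exact normalization, so this assumes the appearance probability is simply the ascending rank of the analysis value divided by the number of data pairs. This choice is consistent with the "top 30% gives 0.3" example but is not confirmed as the method actually used. The function name is hypothetical, and ties are broken by input order.

```python
def appearance_probabilities(values):
    """Map each analysis value to rank / n, where rank 1 is the smallest value.

    Smaller analysis values mean higher similarity, so they map closer to 0.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices by ascending value
    probs = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        probs[idx] = rank / n
    return probs


# Four hypothetical analysis values from one analysis method.
probs = appearance_probabilities([0.05, 0.64, 0.33, 0.21])
print(probs)
```

Because only ranks are used, the result no longer depends on the scale of each analysis method, which is what makes the four methods comparable.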
In the normalized curve s11 shown in FIG. 5, the higher the analysis value of a data pair aj ranks in appearance probability (that is, the closer its appearance probability is to "0"), the higher the similarity between the two pieces of time-series data included in the data pair aj.
The degree of divergence calculation unit 14 calculates, for each data pair aj, the degree of divergence of the appearance probability calculated for each analysis method. Specifically, for the target data pair aj, it calculates the degree of divergence of the appearance probability for each of the four analysis methods. The degree of divergence of the data pair aj for the k-th analysis method is denoted "dk-j". For example, the degree of divergence of the data pair a1 for the first analysis method is "d1-1".
The degree of divergence dk-j is defined by the following formulas (1) to (4) using the appearance probabilities "p1-j" to "p4-j".
d1-j = {(p2-j) + (p3-j) + (p4-j)}/3 - (p1-j) ... (1)
d2-j = {(p1-j) + (p3-j) + (p4-j)}/3 - (p2-j) ... (2)
d3-j = {(p1-j) + (p2-j) + (p4-j)}/3 - (p3-j) ... (3)
d4-j = {(p1-j) + (p2-j) + (p3-j)}/3 - (p4-j) ... (4)
As can be understood from formulas (1) to (4), the degree of divergence of the appearance probability pk-j for the k-th analysis method is the difference between the average of the appearance probabilities obtained by the three analysis methods other than the k-th and the appearance probability obtained by the k-th analysis method. Therefore, the greater the difference between the appearance probability calculated by one analysis method and the average of the appearance probabilities calculated by the other three, the larger the magnitude of the degree of divergence for that analysis method.
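Formulas (1) to (4) share one pattern, which the following sketch writes as a single function. The function name `divergence`, the zero-based index `k`, and the sample probabilities are hypothetical.

```python
def divergence(probs, k):
    """Mean of the other methods' appearance probabilities minus method k's own.

    probs holds p1-j .. p4-j for one data pair; k is a zero-based method index.
    """
    others = [p for i, p in enumerate(probs) if i != k]
    return sum(others) / len(others) - probs[k]


# Hypothetical appearance probabilities of one data pair under the four methods.
probs = [0.1, 0.8, 0.9, 0.7]
d = [divergence(probs, k) for k in range(len(probs))]
print(d)
```

Here the first method rates the pair as far more similar (0.1) than the other three do (about 0.8 on average), so its degree of divergence is about 0.7. By construction the four values for one data pair sum to zero.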
For each analysis method, the degree of divergence calculation unit 14 sorts the data pairs aj in descending order of the absolute value of the degree of divergence dk-j calculated by formulas (1) to (4). FIG. 6 is an explanatory diagram showing data in which the degrees of divergence d1-j of the appearance probabilities p1-j calculated using the first analysis method are sorted in descending order. For example, the data pairs (combinations of item 1 and item 2) are sorted in the order of the data pair of the time-series data "vegetables/seaweed" and "sushi (eating out)", the data pair of the time-series data "cup noodles" and "fresh food", and so on.
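The sorting step can be sketched as below. The pair labels and divergence values are made up for illustration, and the function name is hypothetical.

```python
def rank_pairs_by_divergence(divergences):
    """Sort (pair_label, divergence) tuples by |divergence|, largest first."""
    return sorted(divergences, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical degrees of divergence for one analysis method.
ranked = rank_pairs_by_divergence([("a1", 0.7), ("a2", -0.9), ("a3", 0.1)])
print(ranked)
```

Sorting on the absolute value keeps strongly negative divergences near the top as well, since they also indicate that one method disagrees with the others.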
The visualization unit 15 visualizes each time-series data included in the data pair aj, together with the degree of divergence, and presents them to the user. Specifically, the visualization unit 15 has a display unit (not shown) such as a display, and displays on the display unit the graphs of the data pairs whose degree of divergence dk-j calculated by the divergence calculation unit 14 is larger than a certain value (for example, exceeds 0.6). For example, the graphs and appearance probability data shown in FIGS. 7A and 7B are displayed on the display unit. That is, the visualization unit 15 visualizes only a predetermined number of analysis results having a large degree of divergence calculated by the divergence calculation unit 14.
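A minimal sketch of this filtering step is given below. It assumes that the comparison uses the absolute value of the degree of divergence and a strict "greater than" test, and it offers an optional `top_n` cap to model the "predetermined number"; the names `select_for_display`, `threshold`, and `top_n`, and the sample values, are all hypothetical.

```python
from typing import Dict, List, Optional


def select_for_display(
    div: Dict[str, float],
    threshold: float = 0.6,
    top_n: Optional[int] = None,
) -> List[str]:
    """Return labels of data pairs worth showing to the user, largest first."""
    ordered = sorted(div.items(), key=lambda kv: abs(kv[1]), reverse=True)
    picked = [label for label, d in ordered if abs(d) > threshold]
    # Optionally cap the list at a predetermined number of results.
    return picked if top_n is None else picked[:top_n]


# Hypothetical divergences keyed by data-pair label.
sample = {"a11": 0.82, "a12": 0.93, "a3": 0.20}
to_show = select_for_display(sample)
```

Pairs on which all analysis methods roughly agree, such as `a3` here, never reach the display, which is what reduces the user's workload.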
FIG. 7A(a) shows a graph of the data pair a11 consisting of time-series data q11 (for example, household durable goods) and time-series data q12 (for example, furniture and household goods), and FIG. 7A(b) shows the appearance probabilities of the data pair a11 calculated using the first to fourth analysis methods. FIG. 7B(a) shows a graph of the data pair a12 consisting of time-series data q13 (for example, vegetables/seaweed) and time-series data q14 (for example, sushi (eating out)), and FIG. 7B(b) shows the appearance probabilities of the data pair a12 calculated using the first to fourth analysis methods.
The visualization unit 15 displays the data shown in FIGS. 7A and 7B on the display unit. The user can recognize the displayed information by looking at the display unit.
The input unit 16 accepts a similar/dissimilar determination input from the user. Specifically, the input unit 16 includes an operating device such as a keyboard, and accepts a determination input of similar or dissimilar for the information displayed by the visualization unit 15. For example, as shown in FIG. 7A, the time-series data q11 and q12 diverge from each other, so the user inputs a determination result of dissimilar. On the other hand, as shown in FIG. 7B, the time-series data q13 and q14 are close to each other, so the user inputs a determination result of similar.
The feature extraction unit 17 extracts features of the analysis methods based on the determination input and the degree of divergence. Specifically, the feature extraction unit 17 extracts the features of each analysis method based on the determination input received by the input unit 16. For example, the graphs of the time-series data q11 and q12 of the data pair a11 shown in FIG. 7A(a) diverge from each other, and their similarity is low. Therefore, the appearance probability of the data pair a11 should be a large value. As shown in FIG. 7A(b), however, the appearance probability calculated by the third analysis method is a small value. The feature extraction unit 17 therefore extracts the feature that the third analysis method is not suitable for analyzing the time-series data q11 and q12. That is, the feature extraction unit 17 extracts, as a feature of an analysis method, the time-series data that is not suitable for analysis by that analysis method.
In addition, the graphs of the time-series data q13 and q14 of the data pair a12 shown in FIG. 7B(a) are close to each other, and their similarity is high. Therefore, the appearance probability of the data pair a12 should be a small value. As shown in FIG. 7B(b), however, the appearance probabilities calculated by the second, third, and fourth analysis methods are large values. The feature extraction unit 17 therefore extracts the feature that the second, third, and fourth analysis methods are not suitable for analyzing the time-series data q13 and q14. The feature extraction unit 17 includes a storage device (not shown) and stores the extracted features in the storage device.
The recording unit 18 records characteristic data of the time-series data. For example, "vegetables/seaweed" is known in advance to have the characteristic of being affected by seasonal changes, so this characteristic data is recorded. Likewise, the "driver's license fee" is known in advance to have the characteristic that its amount changes stepwise, so this characteristic data is recorded. The visualization unit 15 described above may visualize the characteristics of the time-series data in addition to each time-series data constituting the data pair and the degree of divergence.
Next, the operation of the feature extraction device 1 according to the first embodiment will be described with reference to the flowchart shown in FIG. 8. First, in step S11 of FIG. 8, the combination unit 11 generates data pairs aj by combining the plurality of time-series data qi (i = 1 to m) stored in the database 2. When there are m pieces of time-series data qi, m*(m-1)/2 data pairs are generated.
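Step S11 amounts to enumerating all unordered pairs of time-series data. The sketch below uses the standard-library function `itertools.combinations`; the function name `make_pairs` and the series names are hypothetical placeholders.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def make_pairs(series_names: Sequence[str]) -> List[Tuple[str, str]]:
    """Enumerate every unordered pair of distinct time series."""
    return list(combinations(series_names, 2))


# Hypothetical series names q1..q5 (m = 5).
names = ["q1", "q2", "q3", "q4", "q5"]
pairs = make_pairs(names)
# The pair count matches m*(m-1)/2.
assert len(pairs) == len(names) * (len(names) - 1) // 2
```

Since the number of pairs grows quadratically with m, even a few hundred time series produce tens of thousands of data pairs for the analysis methods to process.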
In step S12, the data analysis unit 12 analyzes each data pair aj by a plurality of analysis methods and calculates analysis values. Specifically, the first analysis unit 21 calculates the analysis value of each data pair aj using the first analysis method. The second analysis unit 22 calculates the analysis value of each data pair aj using the second analysis method. The third analysis unit 23 calculates the analysis value of each data pair aj using the third analysis method. The fourth analysis unit 24 calculates the analysis value of each data pair aj using the fourth analysis method.
Furthermore, the data analysis unit 12 generates a distribution curve of the analysis values calculated by each analysis method. Specifically, as shown in FIGS. 4B(a) to 4B(d), it generates a distribution curve s1 of the analysis values calculated by the first analysis method, a distribution curve s2 of the analysis values calculated by the second analysis method, a distribution curve s3 of the analysis values calculated by the third analysis method, and a distribution curve s4 of the analysis values calculated by the fourth analysis method.
In step S13, the appearance probability calculation unit 13 generates normalized curves by normalizing the distribution curves s1 to s4. For example, the normalized curve s11 shown in FIG. 5 is generated.
In step S14, the appearance probability calculation unit 13 calculates the appearance probability of each data pair aj based on the normalized curve s11. For example, as shown in FIG. 5, when the target data pair belongs to the top 30% of all data pairs, its appearance probability is set to "0.3". When it belongs to the top 70%, its appearance probability is set to "0.7".
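The "top 30% gives 0.3" rule can be approximated with an empirical rank, as sketched below. This is a simplification, not the normalized-curve procedure of FIG. 5: it assumes that "top" means the largest analysis values and that ties count toward the pair's own rank. The function name `appearance_probabilities` and the sample values are hypothetical.

```python
from typing import List, Sequence


def appearance_probabilities(values: Sequence[float]) -> List[float]:
    """For each analysis value, return the fraction of all values that are
    greater than or equal to it (its position from the top, as a fraction)."""
    n = len(values)
    if n == 0:
        return []
    return [sum(1 for w in values if w >= v) / n for v in values]


# Ten hypothetical analysis values for one analysis method.
analysis_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
probs = appearance_probabilities(analysis_values)
```

In this sample the value 8 is among the top three of ten values and so receives 0.3, while the value 4 is among the top seven and receives 0.7. Because every analysis method is mapped onto the same 0-to-1 scale, results from methods with different raw value ranges become comparable.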
In step S15, the divergence calculation unit 14 calculates the degree of divergence of each appearance probability. Specifically, the degree of divergence of the appearance probability of each data pair aj is calculated for each analysis method using equations (1) to (4) described above. Furthermore, the divergence calculation unit 14 sorts the data pairs aj in descending order of the degree of divergence. As a result, as shown in FIG. 6 for example, data is obtained in which the degrees of divergence d1-j of the appearance probabilities p1-j calculated using the first analysis method are arranged in descending order.
For example, for the data pair of "vegetables/seaweed" and "sushi (eating out)", the appearance probability calculated using the first analysis method is "0.0473", whereas the appearance probabilities calculated using the second to fourth analysis methods are approximately "1.0000". The appearance probability calculated by the first analysis method therefore differs greatly from those calculated by the other three analysis methods, and the degree of divergence is a high value of "0.926428".
In step S16, the visualization unit 15 displays on the display unit (not shown) the graphs and the appearance probability data of the data pairs determined to have a large degree of divergence d1-j (for example, data pairs with a degree of divergence of 0.6 or more). That is, the graphs of the data pairs and the appearance probability data are visualized. For example, the information shown in FIGS. 7A and 7B is displayed on the screen.
By viewing this screen, the user judges the validity of the analysis result of each analysis method. For example, in the graph shown in FIG. 7A(a), the two time-series data q11 and q12 are not similar. The appearance probability is therefore expected to be a large value (a value close to "1"). In the data shown in FIG. 7A(b), the appearance probabilities calculated by the first, second, and fourth analysis methods are close to "1", whereas the appearance probability calculated by the third analysis method is "0.16", which diverges from the values of the other three analysis methods. In this case, the analysis value of the third analysis method is presumed to be inappropriate, and the analysis values of the first, second, and fourth analysis methods are presumed to be appropriate.
On the other hand, in the graph shown in FIG. 7B(a), the two time-series data q13 and q14 are similar. The appearance probability is therefore expected to be a small value (a value close to "0"). In the data shown in FIG. 7B(b), the appearance probabilities calculated by the second, third, and fourth analysis methods are close to "1", whereas the appearance probability calculated by the first analysis method is "0.05", which diverges from the values of the other three analysis methods. In this case, the analysis values of the second, third, and fourth analysis methods are presumed to be inappropriate, and the analysis value of the first analysis method is presumed to be appropriate.
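The reasoning applied to FIGS. 7A and 7B can be sketched as a small rule: a pair judged dissimilar should have an appearance probability near 1, and a pair judged similar should have one near 0. In the text this judgment is made by the user; the function `judge_methods`, the 0.5 cutoff, and all probabilities other than 0.16 and 0.05 are assumptions made for illustration.

```python
from typing import Dict


def judge_methods(
    user_label: str, probs: Dict[str, float], cutoff: float = 0.5
) -> Dict[str, str]:
    """Mark each analysis method as appropriate or inappropriate for one pair."""
    if user_label not in ("similar", "dissimilar"):
        raise ValueError("user_label must be 'similar' or 'dissimilar'")
    # Dissimilar pairs are expected to have a high appearance probability.
    expect_high = user_label == "dissimilar"
    return {
        method: "appropriate" if (p >= cutoff) == expect_high else "inappropriate"
        for method, p in probs.items()
    }


# FIG. 7A-like case: the user judged the pair dissimilar.
fig_7a = judge_methods("dissimilar", {"m1": 0.98, "m2": 0.97, "m3": 0.16, "m4": 0.99})
# FIG. 7B-like case: the user judged the pair similar.
fig_7b = judge_methods("similar", {"m1": 0.05, "m2": 0.97, "m3": 0.98, "m4": 0.99})
```

In the first case only the third method is flagged as inappropriate; in the second case the second, third, and fourth methods are flagged, even though they form the majority. This shows why a user judgment, rather than a majority vote among the methods, is needed.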
Furthermore, the visualization unit 15 reads the characteristic data of each time-series data recorded in the recording unit 18 and displays it on the display unit. For example, when the data pair to be analyzed includes the time-series data for "vegetables/seaweed", the characteristic data "affected by seasonal changes" is displayed on the display unit. When the data pair to be analyzed includes the time-series data for "driver's license fee", the characteristic data "the amount changes stepwise" is displayed on the display unit. By viewing this characteristic data, the user can use it as a reference when judging the analysis results.
In step S17, the input unit 16 accepts a similar/dissimilar determination input from the user. The user refers to the visualized information and inputs a determination result as to whether the analysis value of each analysis method is appropriate. For example, in the example shown in FIG. 7A, the user inputs via the input unit 16 a determination result that the analysis value of the third analysis method is inappropriate and the analysis values of the first, second, and fourth analysis methods are appropriate. In the example shown in FIG. 7B, the user inputs via the input unit 16 a determination result that the analysis values of the second, third, and fourth analysis methods are inappropriate and the analysis value of the first analysis method is appropriate.
That is, a high degree of divergence between the appearance probability calculated by analyzing time-series data with one analysis method and the appearance probabilities calculated by analyzing the same time-series data with the other analysis methods means that either the one analysis method or the other analysis methods are likely to be inappropriate for analyzing this time-series data. By acquiring the determination input from the user, the features of each analysis method (for example, that the first analysis method is not suitable for analyzing the data pair a1) can be recognized with high accuracy.
In step S18, the feature extraction unit 17 calculates scores according to the appropriate/inappropriate determination results, based on the determination input received by the input unit 16. Specifically, an analysis method determined to be appropriate is given a score of "+1", and an analysis method determined to be inappropriate is given a score of "-1". In the example shown in FIG. 7A, the third analysis method is given a score of "-1", and the first, second, and fourth analysis methods are given a score of "+1". In the example shown in FIG. 7B, the second, third, and fourth analysis methods are given a score of "-1", and the first analysis method is given a score of "+1". The feature extraction unit 17 accumulates the scores for each of the first to fourth analysis methods. The score values are not limited to "+1" and "-1"; values such as "+2", "+1", "-1", and "-2" may be used according to the degree of appropriateness or inappropriateness.
The feature extraction unit 17 extracts the features of each analysis method based on the accumulated scores. For example, it extracts the feature that, of the four analysis methods, the one with the highest score is suitable for analyzing the target time-series data. The feature extraction unit 17 records the extracted features in a storage device (not shown). Alternatively, it modifies the features already recorded in the storage device based on the extracted features.
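The score accumulation of step S18 can be sketched as follows, using the +1/-1 scheme from the text and the judgments described for FIGS. 7A and 7B. The function name `accumulate_scores`, the method labels `m1` to `m4`, and the verdict strings are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def accumulate_scores(judgments: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Add +1 per 'appropriate' and -1 per 'inappropriate' verdict, per method."""
    totals: Counter = Counter()
    for judgment in judgments:
        for method, verdict in judgment.items():
            totals[method] += 1 if verdict == "appropriate" else -1
    return dict(totals)


# Verdicts as described for FIG. 7A and FIG. 7B.
fig_7a = {"m1": "appropriate", "m2": "appropriate", "m3": "inappropriate", "m4": "appropriate"}
fig_7b = {"m1": "appropriate", "m2": "inappropriate", "m3": "inappropriate", "m4": "inappropriate"}

scores = accumulate_scores([fig_7a, fig_7b])
best_method = max(scores, key=scores.get)
```

Over these two judgments the first method accumulates +2, the second and fourth 0, and the third -2, so the first method would be recorded as the best suited. A graded scheme such as +2/+1/-1/-2 would only change the increment inside the loop.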
In step S19, the data analysis unit 12 determines whether any of the first to fourth analysis methods needs to be modified. For example, as shown in FIG. 7A, the third analysis method has been determined to be unsuitable for analyzing the data pair a11; in this case, it is determined whether the third analysis method needs to be modified. If it is determined that modification is necessary (S19; YES), the process proceeds to step S20; otherwise (S19; NO), the process ends.
In step S20, the data analysis unit 12 modifies the target analysis method or marks it as unsuitable. The process then ends. In this way, the features of the analysis methods for analyzing the similarity of time-series data can be extracted.
As described above, the feature extraction device 1 according to the first embodiment includes: the combination unit 11 that generates a plurality of data pairs, each combining two pieces of time-series data; an analysis unit (data analysis unit 12) that analyzes, using a plurality of analysis methods, the similarity of the two pieces of time-series data included in each data pair; the appearance probability calculation unit 13 that calculates, for each analysis method, the appearance probability of the similarity of each data pair based on the analysis results of the analysis unit; the divergence calculation unit 14 that calculates, for each data pair, the degree of divergence of the appearance probability calculated for each analysis method; the visualization unit 15 that visualizes each time-series data included in the data pair and the degree of divergence and presents them to the user; the input unit 16 that accepts a similar/dissimilar determination input from the user; and the feature extraction unit 17 that extracts features of the analysis methods based on the determination input and the degree of divergence.
The feature extraction device 1 configured as described above can extract features indicating which types of time-series data an analysis method is suitable or unsuitable for. Therefore, when a user such as a data scientist analyzes time-series data using a data analysis device, the user can be supported in selecting an appropriate analysis method from among the plurality of analysis methods the user has in stock.
In addition, the visualization unit 15 visualizes only a predetermined number of analysis results having a large degree of divergence calculated by the divergence calculation unit 14. For example, only analysis results with a degree of divergence of 0.6 or more are visualized. Visualization of analysis results with a small degree of divergence can therefore be omitted. That is, when the degrees of divergence of all four analysis methods are small, the analysis values of the four analysis methods are almost the same, and the need for user intervention is considered low. By visualizing only a predetermined number of analysis results with a large degree of divergence, the user's workload can be reduced.
In addition, the characteristic data of each time-series data, known in advance, is recorded in the recording unit 18. By displaying this characteristic data on the display unit of the visualization unit 15, the user can use it as a reference when judging the suitability of each analysis method.
That is, as shown in FIG. 9, for the data pair a1 that includes the time-series data for "vegetables/seaweed", the characteristic data "affected by seasonal variation" is recorded in the recording unit 18. For the data pair a10 that includes the time-series data for "driver's license fee", the characteristic data "the price changes stepwise" is recorded in the recording unit 18. When analyzing the data pairs a1 and a10, the user can refer to these characteristic data to determine the features of each analysis method.
[Second embodiment]
Next, a second embodiment will be described. FIG. 10 is a block diagram showing the configuration of a feature extraction device 1a according to the second embodiment and its peripheral devices. The second embodiment differs from the first embodiment described above in that a selection unit 19 is provided. Accordingly, the components other than the selection unit 19 are denoted by the same reference signs, and their description is omitted.
When, among the plurality of data pairs, the appearance probability of the similarity of one data pair is close to the appearance probability of the similarity of another data pair, and the time-series data included in the one data pair is the same as or similar to the time-series data included in the other data pair, the selection unit 19 selects the other data pair.
That is, the selection unit 19 selects, from among the data pairs generated by the combination unit 11, data pairs whose time-series data are similar. The visualization unit 15 performs visualization while excluding the appearance probabilities of the data pairs selected by the selection unit 19.
FIG. 11A is a diagram showing a normalized distribution curve of the analysis results obtained by analyzing a plurality of data pairs with the first analysis method. FIG. 11B is a diagram showing the two pieces of time-series data constituting each data pair and the degree of divergence d1-j.
The data pairs x1, x2, and x3 shown in FIG. 11B all include the time-series data for "university tuition". In addition, the positions at which the data pairs x1, x2, and x3 are plotted in FIG. 11A are close to one another. Two of these three data pairs are therefore considered redundant and unnecessary. The selection unit 19 excludes the data pairs x2 and x3 from the analysis targets.
The data pair x4 shown in FIG. 11B includes the time-series data for "Chinese noodles", and the data pair x5 includes the time-series data for "soba". In addition, the positions at which the data pairs x4 and x5 are plotted in FIG. 11A are close to each other. One of these two data pairs is therefore considered redundant and unnecessary. The selection unit 19 excludes the data pair x5 from the analysis targets.
As described above, the feature extraction device 1a according to the second embodiment performs data analysis after excluding, from the plurality of data pairs, other data pairs that are similar to a given data pair, so that the load required for analyzing the data pairs can be reduced.
That is, when, among the plurality of data pairs, the appearance probability of the similarity of one data pair is close to the appearance probability of the similarity of another data pair, and the time-series data constituting the one data pair is the same as or similar to the time-series data constituting the other data pair, the selection unit 19 selects the other data pair. The visualization unit 15 then displays the results on the display unit while excluding the appearance probabilities of the data pairs selected by the selection unit 19. Display of unnecessary data can therefore be avoided, and the computational load can be reduced.
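The selection rule of the second embodiment can be sketched as a greedy pass over the data pairs. Everything below is an illustrative assumption: the function name `select_redundant`, the tolerance `tol`, the `groups` mapping used to treat "soba" and "Chinese noodles" as similar, the partner series (`item A` to `item E`), and the probabilities are not taken from the source.

```python
from typing import Dict, List, Optional, Tuple


def select_redundant(
    pairs: Dict[str, Tuple[str, str]],
    probs: Dict[str, float],
    tol: float = 0.05,
    groups: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], List[str]]:
    """Split pair labels into (kept, redundant).

    A pair is redundant when a kept pair has a close appearance probability
    and shares the same, or a similar, time series.
    """
    groups = groups or {}

    def keys(label: str) -> set:
        # Map each series name to its similarity group, if one is defined.
        return {groups.get(name, name) for name in pairs[label]}

    kept: List[str] = []
    redundant: List[str] = []
    for label in pairs:
        is_dup = any(
            abs(probs[label] - probs[k]) <= tol and keys(label) & keys(k)
            for k in kept
        )
        (redundant if is_dup else kept).append(label)
    return kept, redundant


pairs = {
    "x1": ("university tuition", "item A"),
    "x2": ("university tuition", "item B"),
    "x3": ("university tuition", "item C"),
    "x4": ("Chinese noodles", "item D"),
    "x5": ("soba", "item E"),
}
probs = {"x1": 0.10, "x2": 0.11, "x3": 0.12, "x4": 0.50, "x5": 0.51}
groups = {"Chinese noodles": "noodles", "soba": "noodles"}
kept, redundant = select_redundant(pairs, probs, groups=groups)
```

With these sample inputs, `x1` and `x4` are kept and `x2`, `x3`, and `x5` are marked redundant, which mirrors the exclusions described for FIGS. 11A and 11B.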
As shown in FIG. 12, the feature extraction device 1 of the present embodiment described above can be implemented using, for example, a general-purpose computer system including a CPU (Central Processing Unit, processor) 901, a memory 902, a storage 903 (HDD: Hard Disk Drive, SSD: Solid State Drive), a communication device 904, an input device 905, and an output device 906. The memory 902 and the storage 903 are storage devices. In this computer system, each function of the feature extraction device 1 is realized by the CPU 901 executing a predetermined program loaded on the memory 902.
The feature extraction device 1 may be implemented by one computer or by a plurality of computers. The feature extraction device 1 may also be a virtual machine implemented on a computer.
The program for the feature extraction device 1 can be stored in a computer-readable recording medium such as an HDD, SSD, USB (Universal Serial Bus) memory, CD (Compact Disc), or DVD (Digital Versatile Disc), or can be distributed via a network.
The present invention is not limited to the above embodiments, and various modifications are possible within the scope of its gist.
Reference Signs List
1, 1a Feature extraction device
2 Database
11 Combination unit
12 Data analysis unit (analysis unit)
13 Appearance probability calculation unit
14 Divergence calculation unit
15 Visualization unit
16 Input unit
17 Feature extraction unit
18 Recording unit
19 Selection unit
21 First analysis unit
22 Second analysis unit
23 Third analysis unit
24 Fourth analysis unit

Claims (8)

  1.  A feature extraction device comprising:
     a combination unit that generates a plurality of data pairs, each combining two pieces of time-series data;
     an analysis unit that analyzes, using a plurality of analysis methods, the similarity of the two pieces of time-series data included in each data pair;
     an appearance probability calculation unit that calculates, for each analysis method, the appearance probability of the similarity of each data pair based on the analysis results of the analysis unit;
     a divergence calculation unit that calculates, for each data pair, the degree of divergence of the appearance probability calculated for each analysis method;
     a visualization unit that visualizes each time-series data included in the data pair and the degree of divergence and presents them to a user;
     an input unit that accepts a similar/dissimilar determination input from the user; and
     a feature extraction unit that extracts features of the analysis methods based on the determination input and the degree of divergence.
  2.  The feature extraction device according to claim 1, wherein the appearance probability calculation unit calculates the appearance probability by normalizing the analysis results of the analysis unit.
  3.  The feature extraction device according to claim 1 or 2, wherein the visualization unit visualizes only a predetermined number of analysis results having a large degree of divergence calculated by the divergence calculation unit.
  4.  The feature extraction device according to any one of claims 1 to 3, further comprising a selection unit that, when the appearance probability of the similarity of one data pair among the plurality of data pairs is close to the appearance probability of the similarity of another data pair, and the time-series data included in the one data pair is the same as or similar to the time-series data included in the other data pair, selects the other data pair,
     wherein the visualization unit performs visualization while excluding the appearance probability of the other data pair selected by the selection unit.
  5. The feature extraction device according to any one of claims 1 to 4, further comprising a recording unit that records characteristic data of the time-series data,
     wherein the visualization unit visualizes the characteristic data of the time-series data in addition to each piece of time-series data included in the data pair and the degree of divergence.
  6. The feature extraction device according to any one of claims 1 to 5, wherein the feature extraction unit extracts, as the feature of the analysis method, time-series data that is not suitable for analysis by that analysis method.
  7. A feature extraction method comprising:
     a step of generating a plurality of data pairs, each combining two pieces of time-series data;
     a step of analyzing, using a plurality of analysis methods, the similarity between the two pieces of time-series data included in each data pair;
     a step of calculating, for each analysis method, the appearance probability of the similarity of each data pair based on the analyzed similarity;
     a step of calculating, for each data pair, the degree of divergence among the appearance probabilities calculated for the respective analysis methods;
     a step of visualizing each piece of time-series data included in the data pair together with the degree of divergence and presenting them to a user;
     a step of receiving, from the user, a determination input of similar or dissimilar; and
     a step of extracting a feature of the analysis method based on the determination input and the degree of divergence.
  8. A feature extraction program that causes a computer to function as the feature extraction device according to any one of claims 1 to 6.
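
For illustration only, the flow recited in the claims (pair generation, similarity analysis by several methods, appearance probability, degree of divergence, selection of the most divergent pairs, and extraction of time-series data unsuited to a method) can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the two similarity measures, the rank-based normalization used as the appearance probability, the max-minus-min definition of divergence, the 0.5 threshold, and every function and variable name are hypothetical choices that the claims do not specify.

```python
from itertools import combinations
from math import sqrt


def euclidean_similarity(a, b):
    """Assumed method 1: Euclidean distance mapped into (0, 1]."""
    dist = sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)


def correlation_similarity(a, b):
    """Assumed method 2: Pearson correlation (0.0 if a series is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


METHODS = {"euclidean": euclidean_similarity,
           "correlation": correlation_similarity}


def appearance_probabilities(scores):
    """Assumed normalization: fraction of pairs scoring at or below each pair."""
    n = len(scores)
    return {pair: sum(1 for s in scores.values() if s <= score) / n
            for pair, score in scores.items()}


def analyze(series):
    """Return ({pair: {method: probability}}, {pair: divergence})."""
    pairs = list(combinations(sorted(series), 2))
    probs = {pair: {} for pair in pairs}
    for name, method in METHODS.items():
        scores = {(i, j): method(series[i], series[j]) for i, j in pairs}
        for pair, p in appearance_probabilities(scores).items():
            probs[pair][name] = p
    # Assumed divergence: spread of the probabilities across methods.
    divergence = {pair: max(p.values()) - min(p.values())
                  for pair, p in probs.items()}
    return probs, divergence


def top_divergent(divergence, n):
    """The n pairs with the largest divergence (cf. claim 3)."""
    return sorted(divergence, key=divergence.get, reverse=True)[:n]


def unsuitable_series(probs, judgments, threshold=0.5):
    """Per method, series in pairs where the method contradicts the user.

    judgments maps a pair to True (user says similar) or False (dissimilar).
    """
    result = {name: set() for name in METHODS}
    for pair, user_says_similar in judgments.items():
        for name, p in probs[pair].items():
            if (p >= threshold) != user_says_similar:
                result[name].update(pair)
    return result


# Hypothetical data: "b" has the same shape as "a" but a large offset.
series = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [11.0, 12.0, 13.0, 14.0],
    "c": [1.0, 2.0, 3.0, 5.0],
    "d": [4.0, 3.0, 2.0, 1.0],
}
probs, divergence = analyze(series)
shown = top_divergent(divergence, 2)
judgments = {("a", "b"): True, ("a", "d"): False}  # simulated user input
unsuited = unsuitable_series(probs, judgments)
```

In this toy data the pair ("a", "b") is the one on which the two assumed methods disagree most, because the offset lowers the distance-based score while leaving the correlation untouched. Presenting only such pairs keeps the number of judgments requested from the user small, which is the apparent motivation for limiting visualization to a predetermined number of results.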
PCT/JP2021/028957 2021-08-04 2021-08-04 Feature extraction device, feature extraction method, and feature extraction program WO2023012933A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2021/028957 WO2023012933A1 (en) 2021-08-04 2021-08-04 Feature extraction device, feature extraction method, and feature extraction program
JP2023539450A JPWO2023012933A1 (en) 2021-08-04 2021-08-04

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/028957 WO2023012933A1 (en) 2021-08-04 2021-08-04 Feature extraction device, feature extraction method, and feature extraction program

Publications (1)

Publication Number Publication Date
WO2023012933A1 (en) 2023-02-09

Family

ID=85155420

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/028957 WO2023012933A1 (en) 2021-08-04 2021-08-04 Feature extraction device, feature extraction method, and feature extraction program

Country Status (2)

Country Link
JP (1) JPWO2023012933A1 (en)
WO (1) WO2023012933A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20210067588A (en) * 2019-11-29 2021-06-08 숙명여자대학교산학협력단 Electronic device for determining similarity between sequences considering item classification scheme and control method thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MAKOTO NAGATOMO, KENTARO YUTA, NAONOBU OKAZAKI, MISO PARK: "Investigation of Secure Device Pairing Method using Marker by Camera and Accelerometer", IPSJ SIG TECHNICAL REPORT: INTERNET AND OPERATION TECHNOLOGY, INFORMATION PROCESSING SOCIETY OF JAPAN, JP, vol. 2019-IOT-045, no. 7, 16 May 2019 (2019-05-16), JP, pages 1 - 8, XP009543199 *
UENO, TOMOHIRO: "C1-2 About the acquisition of transmission light spectrum data of beverages and the method of judging drinks using the same", THE 9TH FORUM ON DATA ENGINEERING AND INFORMATION MANAGEMENT - THE 15TH ANNUAL CONFERENCE OF THE DATABASE SOCIETY OF JAPAN (DEIM FORUM 2017), 27 February 2017 (2017-02-27), pages 1 - 7, XP009543394 *

Also Published As

Publication number Publication date
JPWO2023012933A1 (en) 2023-02-09

Similar Documents

Publication Publication Date Title
US9916541B2 (en) Predicting a consumer selection preference based on estimated preference and environmental dependence
US20200043022A1 (en) Artificial intelligence system and method for generating a hierarchical data structure
US20140095184A1 (en) Identifying group and individual-level risk factors via risk-driven patient stratification
US10885059B2 (en) Time series trends
JP4847916B2 (en) RECOMMENDATION DEVICE, RECOMMENDATION METHOD, RECOMMENDATION PROGRAM, AND RECORDING MEDIUM CONTAINING THE PROGRAM
EP3462386A2 (en) Learning data selection program, learning data selection method, and learning data selection device
US20170329880A1 (en) System & method for computationally efficient and statistically robust design of multi-arm multi-stage experiments
Kennedy Making useful conflict predictions: Methods for addressing skewed classes and implementing cost-sensitive learning in the study of state failure
WO2023012933A1 (en) Feature extraction device, feature extraction method, and feature extraction program
JP6676993B2 (en) Information providing apparatus, information providing method, and program
US10867249B1 (en) Method for deriving variable importance on case level for predictive modeling techniques
JP6682585B2 (en) Information processing apparatus and information processing method
US20200202368A1 (en) Product assortment optimization
JP2020154890A (en) Correlation extraction method and correlation extraction program
JP5424989B2 (en) POS data analysis apparatus, method and program
Siddiqui et al. Assessing market integration between MINT and developed economies: evidence from dynamic cointegration
Mondal et al. Assessing growth impact of public debt in Sri Lanka
JP2017207878A (en) Missing data estimation method, missing data estimation device, and missing data estimation program
JP2021179668A (en) Data analysis system, data analysis method, and data analysis program
JP6547436B2 (en) INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM
JP6548284B2 (en) CASE SEARCH DEVICE, CASE SEARCH METHOD, AND PROGRAM
US11010039B2 (en) Display control apparatus and non-transitory computer readable medium
KR102430471B1 (en) Method for providing review information and apparatus for the same
US20180047035A1 (en) Analysis device, analysis method, and computer-readable recording medium
JP6621432B2 (en) Computer and analysis data classification method

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 21952760
    Country of ref document: EP
    Kind code of ref document: A1
WWE WIPO information: entry into national phase
    Ref document number: 2023539450
    Country of ref document: JP
NENP Non-entry into the national phase
    Ref country code: DE