WO2025027711A1 - Data Processing Method Distribution Device, Data Processing Method Distribution Method, and Data Processing Method Distribution Program - Google Patents

Data Processing Method Distribution Device, Data Processing Method Distribution Method, and Data Processing Method Distribution Program

Info

Publication number
WO2025027711A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
processing method
attribute information
target data
affinity score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/JP2023/027849
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
進 古跡
覚 田中
仁志 楓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corp
Priority to PCT/JP2023/027849
Priority to JP2024555971A (JP7657382B1)
Publication of WO2025027711A1
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Definitions

  • This disclosure relates to technology for recommending data processing methods.
  • Patent Document 1 discloses a system for distributing processed data. This system independently manages source data and processing methods for processing the source data, and transmits a combination of processed data and processing methods in response to a request for data processing. Patent Document 1 describes a method of recommending processing methods by advertising samples of processed data, etc.
  • The system of Patent Document 1 aims to distribute a processing method dedicated to specific data. Therefore, the system of Patent Document 1 cannot recommend a processing method for processing unspecified data.
  • The purpose of this disclosure is to make it possible to recommend processing methods for processing unspecified data.
  • The data processing method distribution device of the present disclosure includes: an attribute information acquisition unit that acquires attribute information of target data to be processed; an affinity score calculation unit that refers to processing method information including attribute information of each of a plurality of processing methods and calculates, for each processing method, an affinity score of the processing method for the target data based on the attribute information of the processing method and the attribute information of the target data; and a recommendation information output unit that selects, from the plurality of processing methods, a processing method to be a recommended method for data processing of the target data based on the affinity score of each processing method, and outputs recommendation information indicating the recommended method.
  • FIG. 13 is a flowchart of a data processing method distribution method in the third embodiment.
  • FIG. 13 is a diagram showing an example of a flowchart of step S310 in the third embodiment.
  • FIG. 13 is a diagram showing an example of analysis of a data body in the third embodiment.
  • FIG. 2 is a hardware configuration diagram of a data processing method distribution device 100 according to an embodiment.
  • Embodiment 1. The data processing method distribution device 100 will be described with reference to FIGS.
  • The data processing method distribution device 100 is a computer comprising hardware such as a processor 101, a memory 102, an auxiliary storage device 103, and an input/output interface 104. These pieces of hardware are connected to each other via signal lines.
  • The processor 101 is an IC that performs arithmetic processing and controls the other hardware.
  • For example, the processor 101 is a CPU.
  • IC is an abbreviation for Integrated Circuit.
  • CPU is an abbreviation for Central Processing Unit.
  • The memory 102 is a volatile or non-volatile storage device.
  • The memory 102 is also called a primary storage device or main memory.
  • For example, the memory 102 is a RAM.
  • Data stored in the memory 102 is saved to the auxiliary storage device 103 as necessary.
  • RAM is an abbreviation for Random Access Memory.
  • The auxiliary storage device 103 is a non-volatile storage device.
  • For example, the auxiliary storage device 103 is a ROM, an HDD, a flash memory, or a combination of these. Data stored in the auxiliary storage device 103 is loaded into the memory 102 as needed.
  • ROM is an abbreviation for Read Only Memory.
  • HDD is an abbreviation for Hard Disk Drive.
  • The auxiliary storage device 103 stores a data processing method distribution program that causes a computer to function as an attribute information acquisition unit 110, an affinity score calculation unit 120, and a recommendation information output unit 130.
  • The data processing method distribution program is loaded into the memory 102 and executed by the processor 101.
  • The auxiliary storage device 103 further stores an OS. At least part of the OS is loaded into the memory 102 and executed by the processor 101.
  • The processor 101 executes the data processing method distribution program while executing the OS.
  • OS is an abbreviation for Operating System.
  • Input and output data of the data processing method distribution program are stored in the storage unit 190.
  • The auxiliary storage device 103 functions as the storage unit 190.
  • However, a storage device such as the memory 102, a register in the processor 101, or a cache memory in the processor 101 may function as the storage unit 190 instead of, or together with, the auxiliary storage device 103.
  • The data processing method distribution method will be described with reference to FIG.
  • The data processing method distribution method is a method for recommending to a user a processing method suitable for processing target data.
  • The user inputs attribute information 199 of the target data into the data processing method distribution device 100. The data information receiving unit 111 then receives the input attribute information 199.
  • The target data is data that is the target of data processing.
  • That is, the target data is the data that the user wants to process.
  • The attribute information 199 of the target data indicates various attributes of the target data.
  • FIG. 4 shows an example of the attribute information 199 of the target data.
  • An example of the target data is time-series data of power measured by a terminal device.
  • The attribute information 199 indicates the data name, data content, data format, and data structure of the target data.
  • The data name column indicates the name of the target data.
  • The data content column indicates the content of the target data.
  • The data format column indicates the data format of the target data.
  • The data format of the target data is given by its major classification and minor classification. The major and minor classifications are described later.
  • The data structure column shows the data structure of the target data, specifically the multiple data elements that make up the target data. If the data format of the target data is a table, the number, column name, and data type of each column in the table are shown.
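As a rough illustration, the attribute information 199 can be held as a small record. The following Python sketch is an assumption, not the disclosed data layout: the classes `Column` and `TargetDataAttributes` and the example column names and types are hypothetical and are not taken from FIG. 4.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Column:
    """One data element of a table: position, column name, and data type."""
    number: int
    name: str
    dtype: str


@dataclass
class TargetDataAttributes:
    """Hypothetical in-memory form of attribute information 199."""
    data_name: str
    data_content: str
    major: str  # major classification of the data format, e.g. "table"
    minor: str  # minor classification, e.g. the file extension "csv"
    columns: List[Column] = field(default_factory=list)  # data structure


# Illustrative values for power time-series data (assumed, not from FIG. 4).
power_series = TargetDataAttributes(
    data_name="power log",
    data_content="time-series power measured by a terminal device",
    major="table",
    minor="csv",
    columns=[
        Column(1, "timestamp", "datetime"),
        Column(2, "power value", "real number"),
    ],
)
```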
  • In step S120, the affinity score calculation unit 120 refers to the processing method information 191.
  • The processing method information 191 includes attribute information 192 for each of a plurality of processing methods.
  • The affinity score calculation unit 120 reads the attribute information 192 of each processing method from the processing method information 191. All of the attribute information 192 may be read at once or, when there are many processing methods, a few entries at a time.
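Reading the attribute information 192 a few entries at a time can be expressed as a generator. This is a minimal sketch assuming the entries are iterable; `read_in_batches` and `batch_size` are hypothetical names.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def read_in_batches(entries: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield attribute-information entries in lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(entries)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:  # the source is exhausted
            return
        yield batch
```

For example, `list(read_in_batches(range(5), 2))` gives `[[0, 1], [2, 3], [4]]`, so at most two entries are held per batch.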
  • A processing method is a method of processing data.
  • Specifically, a processing method is a program or algorithm for processing data.
  • The program or algorithm can be used for general purposes without being limited to specific data. Examples of data processing are data analysis and data shaping.
  • The attribute information 192 of a processing method indicates various attributes of the processing method.
  • The attribute information 192 may be input by a user and registered in the processing method information 191, or it may be generated by analyzing annotations in the program (processing method), the program's settings files, and so on, and then registered in the processing method information 191.
  • FIG. 5 shows an example of the attribute information 192 of a processing method.
  • An example of a processing method is a program that detects state transitions in time-series data.
  • The attribute information 192 indicates, for the processing method, the processing name, processing content, input data format, input data structure, output data format, output data structure, and form of provision.
  • The processing name column indicates the name of the processing method.
  • The processing content column indicates the content of the data processing performed by the processing method.
  • The input data format column indicates the data format of the input data for the processing method.
  • The input data for the processing method is the data that undergoes the data processing of the processing method.
  • The input data structure column shows the data structure of the input data for the processing method.
  • The output data format column indicates the data format of the output data of the processing method.
  • The output data of the processing method is obtained by applying the data processing of the processing method to the input data.
  • The output data structure column indicates the data structure of the output data of the processing method.
  • The "Form of provision" column indicates the form in which the processing method provides the data processing: specifically, a program or an algorithm.
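The attribute information 192 of a processing method can be sketched the same way. In this hypothetical Python record, `input_minors` is a set because the input data format may list several minor classifications; all field names and example values are illustrative assumptions, not the contents of FIG. 5.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple

Columns = List[Tuple[str, str]]  # (column name, data type) for each column


@dataclass
class ProcessingMethodAttributes:
    """Hypothetical in-memory form of attribute information 192."""
    processing_name: str
    processing_content: str
    input_major: str
    input_minors: FrozenSet[str]  # several minor classifications may be listed
    input_columns: Columns = field(default_factory=list)
    output_major: str = ""
    output_minors: FrozenSet[str] = frozenset()
    output_columns: Columns = field(default_factory=list)
    provision_form: str = "program"  # "program" or "algorithm"

    def __post_init__(self) -> None:
        # Reject forms of provision other than the two named in the text.
        if self.provision_form not in ("program", "algorithm"):
            raise ValueError(f"unknown form of provision: {self.provision_form}")


# Illustrative values only; FIG. 5 may differ.
detect_transitions = ProcessingMethodAttributes(
    processing_name="state transition detection",
    processing_content="detects state transitions in time-series data",
    input_major="table",
    input_minors=frozenset({"csv", "tsv"}),
    input_columns=[("time", "datetime"), ("measurement value", "numeric")],
    output_major="table",
    output_minors=frozenset({"csv"}),
    output_columns=[("time", "datetime"), ("state", "string")],
)
```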
  • The data formats of the input data and the output data are expressed, for example, by major and minor classifications.
  • Examples of major classifications are table, semi-structured, image, audio, video, and text.
  • A minor classification is expressed at a finer granularity than a major classification.
  • For example, a minor classification corresponds to a file extension.
  • For example, extensions representing minor classifications are CSV and TSV.
  • CSV is an abbreviation for comma-separated values.
  • TSV is an abbreviation for tab-separated values.
  • The data structure of each of the input and output data is described based on the data format.
  • The data structure is represented by the multiple data elements that make up the data. If the data format is a table, the data structure is represented by the number, column name, and data type of each column in the table.
  • The input data format field can indicate multiple data formats.
  • The input data structure field can indicate multiple data structures.
  • The data format column in the attribute information 199 of the target data holds the same kind of information as the input data format column in the attribute information 192 of the processing method.
  • Likewise, the data structure column in the attribute information 199 of the target data holds the same kind of information as the input data structure column in the attribute information 192 of the processing method.
  • The affinity score calculation unit 120 calculates, for each processing method, an affinity score of the processing method for the target data based on the attribute information 192 of the processing method and the attribute information 199 of the target data.
  • The affinity score numerically represents the probability that the processing method can be effectively applied to the target data.
  • The affinity score may instead numerically represent an index or degree similar to that probability.
  • The affinity score is calculated from the attribute information 192 of the processing method and the attribute information 199 of the target data as follows.
  • The affinity score calculation unit 120 calculates the affinity score for each processing method based on the data format of the processing method's input data and the data format of the target data. In particular, it uses the result of comparing the major classifications of the input data and the target data and the result of comparing their minor classifications.
  • The affinity score calculation unit 120 calculates the affinity score for each processing method based on the data structure of the processing method's input data and the data structure of the target data. In particular, it uses the multiple data elements of the input data and the multiple data elements of the target data.
  • The affinity score calculation unit 120 may calculate the affinity score based on both the data format and the data structure, or based on only one of them.
  • The affinity score calculation unit 120 may also take other attribute elements of the attribute information into account.
  • The calculated affinity score may be stored as historical data.
  • In the historical data, the affinity score is associated with the target data and the processing method.
  • The affinity score calculation unit 120 may obtain the affinity score from the historical data instead of calculating it.
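The historical data can be treated as a cache keyed by the pair of target data and processing method. A minimal sketch, assuming both are identified by string IDs; `AffinityHistory` and `get_or_compute` are hypothetical names.

```python
from typing import Callable, Dict, Tuple


class AffinityHistory:
    """Hypothetical store of affinity scores keyed by (target data, processing method)."""

    def __init__(self) -> None:
        self._scores: Dict[Tuple[str, str], float] = {}

    def get_or_compute(self, target_id: str, method_id: str,
                       compute: Callable[[], float]) -> float:
        """Return the stored score if present; otherwise compute it and store it."""
        key = (target_id, method_id)
        if key not in self._scores:
            self._scores[key] = compute()
        return self._scores[key]
```

The `compute` callback runs only on a cache miss, so a repeated request for the same pair reuses the stored score instead of recalculating it.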
  • Step S120 will be described in detail with reference to FIG. Steps S121 to S125 are executed for each processing method.
  • In step S121, the affinity score calculation unit 120 calculates an affinity score for the data format based on the data format of the input data for the processing method and the data format of the target data.
  • The affinity score for the data format is calculated as follows.
  • The affinity score calculation unit 120 compares the data format of the input data of the processing method with the data format of the target data, and calculates the affinity score for the data format according to their degree of agreement.
  • For this comparison, the data format classification table 193 is used.
  • The data format classification table 193 shows the relationship between major classifications and minor classifications and is stored in advance in the storage unit 190.
  • FIG. 7 shows an example of the data format classification table 193.
  • Data formats are classified into major and minor classifications.
  • The major classifications are based on the nature of the data.
  • The major classification "table data" may be read as "structured data."
  • The minor classifications are based on the file extension.
  • The extensions shown in the minor classification column are examples; extensions not shown there may be added. If information other than the extension can specify the data format, that information may be used instead of, or together with, the extension.
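A lookup in the data format classification table 193 could look like the following sketch. The table contents are hypothetical (FIG. 7 may differ), and using the lower-cased file extension as the minor classification is an assumption based on the description above.

```python
from pathlib import PurePath
from typing import Dict, Optional, Set, Tuple

# Hypothetical contents of the data format classification table 193;
# FIG. 7 may list different classifications and extensions.
FORMAT_TABLE: Dict[str, Set[str]] = {
    "table": {"csv", "tsv"},
    "semi-structured": {"json", "xml"},
    "image": {"png", "jpg"},
    "audio": {"wav", "mp3"},
    "video": {"mp4"},
    "text": {"txt"},
}


def classify(filename: str) -> Optional[Tuple[str, str]]:
    """Return (major, minor) for a file name, using its extension as the minor classification."""
    minor = PurePath(filename).suffix.lstrip(".").lower()
    for major, minors in FORMAT_TABLE.items():
        if minor in minors:
            return major, minor
    return None  # extension not registered in the table
```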
  • In step S1211, the affinity score calculation unit 120 compares the major classification of the input data of the processing method with the major classification of the target data.
  • In step S1212, based on the comparison result of step S1211, the affinity score calculation unit 120 determines whether the major classification of the input data of the processing method matches the major classification of the target data. If they match, the process proceeds to step S1213; otherwise, it proceeds to step S1217.
  • In step S1213, the affinity score calculation unit 120 compares the minor classifications of the input data of the processing method with the minor classification of the target data.
  • In step S1214, based on the comparison result of step S1213, the affinity score calculation unit 120 determines whether the minor classifications of the input data of the processing method include the minor classification of the target data.
  • The data format of the input data of a processing method may include multiple minor classifications. Therefore, inclusion of the minor classification is determined rather than an exact match.
  • If the minor classifications of the input data of the processing method include the minor classification of the target data, the process proceeds to step S1215; otherwise, it proceeds to step S1216.
  • In step S1215, the affinity score calculation unit 120 sets the affinity score for the data format to a high score.
  • A high score is a predetermined score that is higher than a low score. For example, a high score is 10 points.
  • A high score is the highest of the three score levels.
  • In step S1216, the affinity score calculation unit 120 sets the affinity score for the data format to a low score.
  • A low score is a predetermined score that is higher than zero and lower than a high score.
  • For example, a low score is 5 points.
  • A low score is the middle of the three score levels. In this case, the processing method may still be applicable if the data format of the target data is converted. Therefore, the affinity score for the data format is set to a low score rather than zero.
  • In step S1217, the affinity score calculation unit 120 sets the affinity score for the data format to zero.
  • A zero score is 0 points.
  • A zero score is the lowest of the three score levels.
  • Alternatively, in step S1217, the affinity score calculation unit 120 may set the affinity score for the data format to a score according to the similarity of the major classifications rather than to zero.
  • The similarity relationships between major classifications are determined in advance and managed, for example, in a table. For example, a data format whose major classification is table is partially similar to a data format whose major classification is semi-structured.
  • FIG. 9 shows examples of affinity scores for the data format.
  • In one example, the major classification of the input data of the processing method matches that of the target data, and the minor classifications of the input data include the minor classification of the target data. Therefore, the affinity score for the data format is high.
  • In another example, the major classifications match, but the minor classifications of the input data do not include the minor classification of the target data. Therefore, the affinity score for the data format is low.
  • In a further example, the major classifications do not match, so the affinity score for the data format is zero.
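Steps S1211 to S1217 can be written as one function. This sketch uses the example point values from the text (10, 5, 0); the optional `major_similarity` table and any values in it are hypothetical, and keying it by an unordered pair assumes that similarity between major classifications is symmetric.

```python
from typing import Collection, FrozenSet, Mapping, Optional

HIGH, LOW, ZERO = 10.0, 5.0, 0.0  # example point values given in the text


def format_affinity(method_major: str, method_minors: Collection[str],
                    target_major: str, target_minor: str,
                    major_similarity: Optional[Mapping[FrozenSet[str], float]] = None) -> float:
    """Affinity score for the data format (steps S1211 to S1217)."""
    if method_major == target_major:        # S1211-S1212: major classifications match
        if target_minor in method_minors:   # S1213-S1214: inclusion, not equality
            return HIGH                     # S1215
        return LOW                          # S1216: conversion may still make it usable
    # S1217: zero, or optionally a score for similar major classifications
    if major_similarity is not None:
        return major_similarity.get(frozenset((method_major, target_major)), ZERO)
    return ZERO
```

The three cases in FIG. 9 correspond to the `HIGH`, `LOW`, and `ZERO` returns, respectively.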
  • In step S122, the affinity score calculation unit 120 determines whether the affinity score for the data format is equal to or greater than a threshold.
  • This determination establishes whether the data structure of the processing method's input data can be compared with the data structure of the target data.
  • For example, the threshold is the low score.
  • In that case, if the affinity score for the data format is a high score or a low score, it is equal to or greater than the threshold.
  • If the affinity score for the data format is equal to or greater than the threshold, the process proceeds to step S123. If it is less than the threshold, the process proceeds to step S124.
  • In step S123, the affinity score calculation unit 120 calculates an affinity score for the data structure based on the data structure of the input data for the processing method and the data structure of the target data.
  • The affinity score for the data structure is calculated as follows.
  • The affinity score calculation unit 120 compares the data structure of the input data of the processing method with the data structure of the target data, and calculates the affinity score for the data structure according to their degree of agreement.
  • Because the data structure differs for each data format, the affinity score for the data structure is calculated by a procedure defined for each data format.
  • For example, a procedure for calculating the affinity score for the data structure is prepared for each major classification of data formats. If the data format is table data, the column name and data type of each column contribute to the affinity score for the data structure. If other column information is present, it also contributes.
  • If the data format is semi-structured data, the name (key) of each element delimited by tags or symbols and the data type of its value contribute to the affinity score for the data structure. Using these key/value relationships, semi-structured data can be treated like table data. If the data format is unstructured data, no element contributes to the affinity score for the data structure, so step S123 may be skipped. Examples of unstructured data are text data, image data, audio data, and video data.
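One way to treat semi-structured data like table data is to flatten each key into a column name and record the type of its value. The following sketch does this for JSON; joining nested keys with dots, treating arrays and null as `"other"`, and the type names are all assumptions for illustration.

```python
import json
from typing import Any, List, Tuple


def _type_name(value: Any) -> str:
    # bool must be checked before int because bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "real number"
    if isinstance(value, str):
        return "string"
    return "other"  # arrays, null, and anything else


def json_columns(document: str) -> List[Tuple[str, str]]:
    """Flatten a JSON object into (key path, value type) pairs, like table columns."""
    out: List[Tuple[str, str]] = []

    def walk(node: Any, prefix: str) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{prefix}.{key}" if prefix else key)
        else:
            out.append((prefix, _type_name(node)))

    walk(json.loads(document), "")
    return out
```

The resulting (name, type) pairs have the same shape as table columns, so the column comparison of step S123 could be reused on them.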
  • Below, step S123 is described in detail for the case where both the data format of the input data for the processing method and the data format of the target data are table data.
  • Steps S1231 to S1233 are executed for each column of the input data for the processing method.
  • In step S1231, the affinity score calculation unit 120 compares the column of the input data of the processing method with each column of the target data.
  • Specifically, the affinity score calculation unit 120 compares the column names and the data types.
  • A thesaurus mechanism can be used to compare column names.
  • The column name thesaurus lists, for each word used in a column name, words similar to that word and words in an inclusion relationship with it.
  • The column name thesaurus is stored in advance in the storage unit 190.
  • The affinity score calculation unit 120 extracts from the column name thesaurus a group of column names for the column of the input data of the processing method and for each column of the target data.
  • A group of column names consists of the column name itself, column names similar to it (similar column names), and column names in an inclusion relationship with it (inclusion-related column names). The affinity score calculation unit 120 then compares the column names of the input data of the processing method with the column names of the target data.
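A column-name comparison using a thesaurus might look like this. The thesaurus entries, the score constants, and the ranking exact > inclusion > similarity are hypothetical; for brevity the sketch looks up only the group of the processing method's column name, whereas the text extracts groups for both sides.

```python
from typing import Dict, Set

# Hypothetical thesaurus entries; the real column name thesaurus is kept in storage unit 190.
SIMILAR: Dict[str, Set[str]] = {"power value": {"current value", "voltage value"}}
INCLUDES: Dict[str, Set[str]] = {
    "measurement value": {"power value", "current value", "voltage value"},
}

# Hypothetical scores: exact > inclusion > similarity > unrelated.
EXACT, INCLUSION, SIMILARITY = 1.0, 0.8, 0.6


def name_group(name: str) -> Dict[str, Set[str]]:
    """Column-name group: the name itself, similar names, and names it includes."""
    similar = set(SIMILAR.get(name, set()))
    # Treat similarity as symmetric by also scanning the other entries.
    similar |= {word for word, sims in SIMILAR.items() if name in sims}
    return {"self": {name}, "similar": similar, "included": set(INCLUDES.get(name, set()))}


def name_score(method_name: str, target_name: str) -> float:
    """Score a target column name against a processing-method input column name."""
    group = name_group(method_name)
    if target_name in group["self"]:
        return EXACT
    if target_name in group["included"]:
        return INCLUSION
    if target_name in group["similar"]:
        return SIMILARITY
    return 0.0
```

With these entries, "power value" against "current value" scores as similar, "measurement value" against "current value" scores as inclusion, and "age" scores 0, mirroring the examples given for step S1232.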
  • A data type relationship table can be used to compare data types.
  • The data type relationship table lists, for each data type, the data types it includes. For example, the data type "numeric" includes data types such as "integer" and "real number." However, neither "integer" nor "real number" includes "numeric."
  • The data type relationship table is stored in advance in the storage unit 190.
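The data type relationship table reduces to a mapping from each type to the types it includes. This sketch, with hypothetical table contents, checks direct inclusion only; whether inclusion should be transitive is not stated in the text.

```python
from typing import Dict, Set

# Hypothetical data type relationship table: each type -> the types it includes.
TYPE_INCLUDES: Dict[str, Set[str]] = {
    "numeric": {"integer", "real number"},
    "integer": set(),
    "real number": set(),
}


def type_includes(method_type: str, target_type: str) -> bool:
    """True if the method's input type is the target type or directly includes it."""
    return method_type == target_type or target_type in TYPE_INCLUDES.get(method_type, set())
```

The relation is deliberately one-directional: "numeric" includes "integer", but "integer" does not include "numeric".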
  • In step S1232, based on the comparison result of step S1231, the affinity score calculation unit 120 calculates, for each column of the target data, the affinity score of the column of the input data of the processing method for that target data column.
  • The affinity score is based on both the column name comparison and the data type comparison. For example, the affinity score calculation unit 120 calculates one affinity score from the column name comparison and another from the data type comparison, then combines them by an operation such as a sum or product to obtain the affinity score of the input data column for the target data column.
  • The affinity score based on the column name comparison is as follows.
  • For each column of the target data, the affinity score calculation unit 120 calculates the similarity and the degree of inclusion of the processing method's column name with respect to the target data's column name, and calculates the affinity score from them. The higher the similarity, the higher the affinity score; likewise, the higher the degree of inclusion, the higher the affinity score.
  • As an example of similarity, assume that the column name of the input data for the processing method is "power value." If the column name of the target data is "current value," the similarity is high; if it is "age," the similarity is low. As an example of inclusion, assume that the column name of the input data of the processing method is "measurement value." "Measurement value" semantically includes "current value," so if the column name of the target data is "current value," the degree of inclusion is high.
  • The affinity score based on the data type comparison is as follows. If the data type of the input data of the processing method includes the data type of the target data, the affinity score is high.
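Step S1232 combines a column-name score with a data-type score by a sum or a product. In this sketch the data-type score is a hypothetical fixed value (`type_high`) when the method's type includes the target's type and 0 otherwise; the function name and defaults are assumptions.

```python
def column_affinity(name_score: float, type_included: bool,
                    combine: str = "sum", type_high: float = 1.0) -> float:
    """Affinity of one input data column for one target data column (step S1232)."""
    # Data-type score: high when the method's type includes the target's type.
    type_score = type_high if type_included else 0.0
    if combine == "sum":
        return name_score + type_score
    if combine == "product":
        return name_score * type_score
    raise ValueError(f"unsupported combination: {combine}")
```

With `combine="product"`, a data-type mismatch forces the column score to 0 regardless of how well the names match; with `combine="sum"`, a strong name match can partly compensate.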
  • In step S1233, the affinity score calculation unit 120 selects the maximum among the affinity scores of the input data column of the processing method for the columns of the target data. The target data column corresponding to that maximum is paired with the input data column of the processing method.
  • In other words, the input data column of the processing method is best matched with the target data column that has the highest affinity score; the target data column that most closely matches the input data column is determined.
  • The affinity score calculation unit 120 may prevent two or more columns of the input data of the processing method from being paired with the same column of the target data.
  • That is, the affinity score calculation unit 120 may be designed so that each column of the target data is paired with only one column of the input data of the processing method.
  • In that case, the affinity score calculation unit 120 selects the maximum affinity score while excluding target data columns that are already paired with some column of the input data of the processing method.
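Step S1233, including the optional rule that each target column is paired only once, can be sketched as a greedy matching. `match_columns` and `pair_score` are hypothetical names; `pair_score` stands in for the step S1232 column score.

```python
from typing import Callable, List, Sequence, Set, Tuple


def match_columns(method_cols: Sequence[str], target_cols: Sequence[str],
                  pair_score: Callable[[str, str], float],
                  one_to_one: bool = True) -> List[Tuple[str, str, float]]:
    """Pair each method input column with its best-scoring target column (S1231-S1233)."""
    used: Set[int] = set()
    pairs: List[Tuple[str, str, float]] = []
    for m in method_cols:
        # Skip target columns already paired when one-to-one pairing is required.
        candidates = [(pair_score(m, t), i) for i, t in enumerate(target_cols)
                      if not (one_to_one and i in used)]
        if not candidates:  # every target column is already paired
            break
        score, best = max(candidates, key=lambda c: c[0])  # first maximum wins ties
        used.add(best)
        pairs.append((m, target_cols[best], score))
    return pairs
```

The result depends on the order of the input data columns, because earlier columns claim their best target columns first; this follows the per-column selection described in the text. If a globally optimal one-to-one pairing were wanted instead, an assignment solver such as SciPy's `linear_sum_assignment` is a standard alternative.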
  • Through steps S1231 to S1233, an affinity score is obtained for each pair of an input data column of the processing method and a target data column.
  • FIG. 11 shows examples of pairings of input data columns of a processing method with target data columns.
  • Column 1 of the input data for the processing method best matches column 2 of the target data, and an affinity score is calculated for this pair.
  • Column 2 of the input data for the processing method best matches column 1 of the target data, and an affinity score is calculated for this pair.
  • Column 3 of the input data for the processing method best matches column 3 of the target data, and an affinity score is calculated for this pair.
  • In step S1234, the affinity score calculation unit 120 calculates the affinity score for the data structure from the affinity scores of the pairs of input data columns of the processing method and target data columns.
  • For example, the affinity score for the data structure is obtained by applying an operation such as a sum or product to the affinity scores of these pairs.
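Step S1234 then folds the pair scores into one data structure score. A minimal sketch; the name `structure_affinity` and the `op` parameter are hypothetical.

```python
import math
from typing import Iterable


def structure_affinity(pair_scores: Iterable[float], op: str = "sum") -> float:
    """Data structure affinity score from the column-pair scores (step S1234)."""
    scores = list(pair_scores)
    if op == "sum":
        return float(sum(scores))
    if op == "product":
        return float(math.prod(scores))
    raise ValueError(f"unsupported operation: {op}")
```

With a product, a single pair scoring 0 makes the whole data structure score 0, which is stricter than a sum.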
  • In step S124, the affinity score calculation unit 120 calculates an affinity score for other attribute elements based on the attribute information 192 concerning the input data of the processing method and the attribute information 199 of the target data.
  • The other attribute elements are the attribute elements other than the data format and the data structure, such as the data acquisition status, the purpose of data acquisition, and the data quality.
  • The affinity score calculation unit 120 compares the other attribute elements of the input data of the processing method with those of the target data, and calculates the affinity score for the other attribute elements from the comparison result.
  • The affinity score calculation unit 120 may use, among the other attribute elements, those characteristic of the data format to calculate this score.
  • For example, the affinity score for the other attribute elements is calculated as follows. When the input data of the processing method and the target data are text data, the affinity score calculation unit 120 uses attribute elements such as the language and the character encoding. When they are image data, it uses attribute elements such as the number of pixels, the image type, and the subject type; examples of image types are illustrations and photographs, and examples of subject types are people and objects. When they are audio data, it uses attribute elements such as the sampling rate and the audio type; examples of audio types are human voice and music.
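The format-dependent comparison of other attribute elements could be sketched as counting agreeing elements. The element keys, the one-point-per-match rule, and exact equality as the comparison are hypothetical simplifications; for example, a real comparison of pixel counts or sampling rates might allow tolerances.

```python
from typing import Any, Dict, Tuple

# Hypothetical: which attribute elements are characteristic of each major classification.
OTHER_ELEMENTS: Dict[str, Tuple[str, ...]] = {
    "text": ("language", "character_code"),
    "image": ("pixels", "image_kind", "subject_kind"),
    "audio": ("sampling_rate", "audio_kind"),
}


def other_affinity(major: str, method_attrs: Dict[str, Any],
                   target_attrs: Dict[str, Any], points_per_match: float = 1.0) -> float:
    """Award points for each format-specific element whose values agree (step S124)."""
    score = 0.0
    for element in OTHER_ELEMENTS.get(major, ()):
        if element in method_attrs and method_attrs[element] == target_attrs.get(element):
            score += points_per_match
    return score
```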
  • In step S125, the affinity score calculation unit 120 calculates the final affinity score from the affinity score for the data format, the affinity score for the data structure, and the affinity score for the other attribute elements.
  • Specifically, the affinity score calculation unit 120 applies a specific operation to these three affinity scores to obtain a single affinity score.
  • This single affinity score is the final affinity score. Examples of the operation are a sum, a product, an average, a weighted average, or a combination of these.
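The combination in step S125 can be sketched as a small function that supports the four operations the text names. The default weights (0.4, 0.4, 0.2) and the operation names are hypothetical choices for illustration, not values from the source.

```python
import math
from typing import Sequence


def final_affinity_score(format_score: float,
                         structure_score: float,
                         other_score: float,
                         operation: str = "weighted_average",
                         weights: Sequence[float] = (0.4, 0.4, 0.2)) -> float:
    """Combine the three partial affinity scores into one final score."""
    scores = (format_score, structure_score, other_score)
    if operation == "sum":
        return sum(scores)
    if operation == "product":
        return math.prod(scores)
    if operation == "average":
        return sum(scores) / len(scores)
    if operation == "weighted_average":
        if len(weights) != len(scores) or sum(weights) <= 0:
            raise ValueError("weights must be three values with a positive sum")
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    raise ValueError(f"unknown operation: {operation!r}")
```

The choice of operation changes the behavior noticeably: a product drives the final score to zero whenever any partial score is zero (for example, incompatible data formats), whereas a weighted average lets a strong match on one aspect partly compensate for a weak match on another.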
  • As a result of step S120, an affinity score for the target data is calculated for each processing method.
  • In step S130, the recommendation information output unit 130 selects, from the multiple processing methods, a processing method to be the recommended method based on the affinity score of each processing method.
  • The recommended method is the processing method recommended for processing the target data.
  • The recommendation method output unit 131 selects the processing method with the highest affinity score for the target data as the recommended method.
  • The recommendation information output unit 130 then outputs recommendation information indicating the recommended method.
  • The recommendation method output unit 131 displays the recommendation information on a display.
  • The recommendation information output unit 130 may output, as the recommendation information, a program that processes data according to the recommended method.
  • The recommendation method output unit 131 may also store the recommendation information in a storage medium. After step S130, the process ends.
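Selecting the recommended method in step S130 amounts to taking the method with the highest affinity score. A minimal sketch, assuming the scores are held in a dictionary keyed by a hypothetical method name and that the recommendation information is a short text line:

```python
from typing import Dict, List, Optional, Tuple


def rank_methods(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Processing methods sorted by affinity score, highest first (stable on ties)."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_recommended_method(scores: Dict[str, float]) -> Optional[str]:
    """Return the method with the highest affinity score, or None if there is none."""
    ranked = rank_methods(scores)
    return ranked[0][0] if ranked else None


def recommendation_text(scores: Dict[str, float]) -> str:
    """Hypothetical text form of the recommendation information."""
    method = select_recommended_method(scores)
    if method is None:
        return "No processing method available."
    return f"Recommended method: {method} (affinity score {scores[method]:.2f})"
```

Keeping the full ranking rather than only the top entry makes it easy to display runner-up methods alongside the recommendation, although the source only describes selecting the single highest-scoring method.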
  • The data processing method distribution device 100 compares the attribute information of the target data that the user wishes to process with the attribute information of each processing method it distributes, and calculates an affinity score.
  • The affinity score represents the probability or degree to which the processing method can be effectively applied to the target data.
  • The data processing method distribution device 100 recommends a processing method that has a high affinity score for the target data even if that method has no track record of use with the target data.
  • The first embodiment therefore has an advantage over the prior art: it can recommend a processing method that has never been used on the data to be processed but is likely to be effectively applicable to it. This lets users make use of many data processing methods through the data processing method distribution device 100 and, as a result, make effective use of their data.
  • The attributes of each processing method are compared with those of the data to be processed, which makes it possible to recommend processing methods that have not been used in the past.
  • The data format and the data structure are compared for each combination of data to be processed and processing method, which improves the accuracy of the recommendations.
  • Embodiment 2. An embodiment that recommends two or more processing methods that can be applied in a chain is described below with reference to Figs. 12 to 14, focusing mainly on the differences from the first embodiment.
  • The recommendation information output unit 130 further includes an element called a chaining method output unit 132.
  • FIG. 13 shows the functional configuration of the data processing method distribution device 100.
  • In step S210, the attribute information acquisition unit 110 acquires the attribute information 199 of the target data.
  • Step S210 is the same as step S110 in the first embodiment.
  • In step S220, the affinity score calculation unit 120 calculates an affinity score for the target data for each processing method.
  • Step S220 is the same as step S120 in the first embodiment.
  • In step S230, the recommendation information output unit 130 outputs recommendation information on processing methods for the target data based on the affinity score of each processing method.
  • Step S230 is the same as step S130 in the first embodiment.
  • In step S240, the affinity score calculation unit 120 determines whether to recommend processing methods in a chain.
  • Chained recommendation of processing methods means recommending a processing method (the next recommended method) that is suited to processing the output data of the recommended method (the next target data).
  • That is, the affinity score calculation unit 120 determines whether to recommend a secondary processing method (the next recommended method) for further processing the processed data (output data) generated by the method recommended in the preceding step S230.
  • When a secondary processing method, a tertiary processing method, and so on are recommended in succession, steps S210 to S240 are executed repeatedly.
  • The affinity score calculation unit 120 makes the determination in one of the following ways, for example: according to an input from the user; by deciding that chained recommendation should continue if the number of repetitions of steps S210 to S240 has not reached an upper limit; or based on the affinity score of the method recommended in the immediately preceding step S230.
  • In the repeated step S210, the attribute information acquisition unit 110 acquires the attribute information 192 on the output data of the recommended method from the processing method information 191, and uses it as the attribute information 199 of the next target data.
  • In step S220, the affinity score calculation unit 120 calculates, for each processing method, the affinity score of the processing method for the next target data as a new affinity score, based on the attribute information 192 of the processing method and the attribute information 199 of the next target data.
  • In step S230, the recommendation information output unit 130 selects, from the multiple processing methods, a processing method to be the next recommended method based on the new affinity score of each processing method, and outputs recommendation information indicating the next recommended method.
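The loop of steps S210 to S240 can be sketched as follows, assuming each processing method carries a hypothetical dictionary of input attributes and output attributes, and using a toy affinity score (the fraction of the method's input attributes that the target satisfies). Two of the stopping rules from step S240 are modeled, an upper limit on repetitions and a minimum affinity score for the last recommendation; the user-input rule is omitted. All names and thresholds are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Tuple

Attributes = Mapping[str, str]


@dataclass(frozen=True)
class ProcessingMethod:
    name: str
    input_attributes: Attributes   # stand-in for attribute information 192 on input data
    output_attributes: Attributes  # stand-in for attribute information 192 on output data


def match_ratio(method_input: Attributes, target: Attributes) -> float:
    """Toy affinity score: fraction of the method's input attributes the target satisfies."""
    if not method_input:
        return 0.0
    hits = sum(1 for key, value in method_input.items() if target.get(key) == value)
    return hits / len(method_input)


def recommend_chain(target: Attributes,
                    methods: List[ProcessingMethod],
                    score: Callable[[Attributes, Attributes], float] = match_ratio,
                    max_steps: int = 3,
                    min_score: float = 0.5) -> List[Tuple[str, float]]:
    chain: List[Tuple[str, float]] = []
    current = target
    while methods:
        # S220: affinity of every method for the current target data.
        best = max(methods, key=lambda m: score(m.input_attributes, current))
        best_score = score(best.input_attributes, current)
        # S230: output the recommended method.
        chain.append((best.name, best_score))
        # S240: stop at the repetition limit or when the last recommendation scored low.
        if len(chain) >= max_steps or best_score < min_score:
            break
        # S210 (repeated): the output attributes become the next target data's attributes.
        current = best.output_attributes
    return chain
```

With a shaping method (CSV in, table out), an analysis method (table in, report out), and a viewer (report in), a CSV target would yield the chain shaping, analysis, viewer, matching the "data shaping, then data analysis" combination described in the text.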
  • In this way, the data processing method distribution device 100 can recommend a combination of processing methods, for example a primary processing method that shapes the target data followed by a secondary processing method that analyzes the shaped data. This widens the range of processing methods available to users and enables them to carry out advanced data processing.
  • Embodiment 3. An embodiment that uses attribute information 199 obtained by analyzing the data body of the target data is described below with reference to Figs. 15 to 19, focusing mainly on the differences from the first embodiment.
  • The attribute information acquisition unit 110 further includes an element called a data information analysis unit 112.
  • FIG. 16 shows the functional configuration of the data processing method distribution device 100.
  • In step S310, the user inputs at least one of the data body of the target data and the attribute information of the target data to the data processing method distribution device 100.
  • If the data body is input, the data information receiving unit 111 receives the input data body.
  • If the attribute information of the target data is input, the data information receiving unit 111 receives the input attribute information.
  • In step S312, the data information receiving unit 111 determines whether the data body has been received. If it has, the process proceeds to step S313; otherwise, the process ends.
  • In step S313, the data information analysis unit 112 analyzes the data body and generates attribute information.
  • The data body is analyzed as follows.
  • The data information analysis unit 112 determines the data format of the data body. For example, when an electronic file is received as the data body, the data information analysis unit 112 identifies the data format from the file's extension.
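Identifying the data format from a file extension can be sketched with a lookup table. The mapping below is a purely illustrative assumption (it is not the data format classification table 193 named in the reference signs, whose contents this section does not describe); it also tags each format as structured, semi-structured, or unstructured, since the next step only analyzes structure for the first two.

```python
from pathlib import PurePath
from typing import Dict, Optional, Tuple

# Hypothetical extension table: extension -> (data format, structure category).
EXTENSION_TABLE: Dict[str, Tuple[str, str]] = {
    ".csv": ("CSV", "structured"),
    ".tsv": ("TSV", "structured"),
    ".json": ("JSON", "semi-structured"),
    ".xml": ("XML", "semi-structured"),
    ".txt": ("plain text", "unstructured"),
    ".png": ("PNG image", "unstructured"),
    ".jpg": ("JPEG image", "unstructured"),
    ".wav": ("WAV audio", "unstructured"),
}


def identify_data_format(filename: str) -> Optional[Tuple[str, str]]:
    """Look up the data format from the file extension (case-insensitive)."""
    suffix = PurePath(filename).suffix.lower()  # e.g. "sales.CSV" -> ".csv"
    return EXTENSION_TABLE.get(suffix)          # None for unknown or missing extensions
```

An extension is only a hint about the content, so a fuller implementation might confirm the guess by attempting to parse the file, which is what the structure analysis that follows effectively does.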
  • The data information analysis unit 112 determines the data structure of the data body as follows. If the data format of the data body is a structured (or semi-structured) data format, the data information analysis unit 112 identifies the data structure of the data body.
  • An example of structured data is table data. Specifically, the data information analysis unit 112 parses the data body according to its data format, thereby identifying its data structure.
  • The data type of each column can be determined by checking the types of the values in that column of the data body file. If multiple data types are possible, the data information analysis unit 112 selects the most restrictive one. For example, if a column contains values such as 1, 2, ..., its data type is identified as numeric rather than string. Numeric data can always be converted to string data, but string data cannot always be converted to numeric data, so numeric is the more specific data type.
  • After step S313, the process ends.
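The "most restrictive data type" rule for table columns can be sketched for CSV input. This sketch assumes an ordering of integer, then float, then string (splitting "numeric" into integer and float is an illustrative refinement not in the source) and ignores empty cells when checking a column.

```python
import csv
import io
from typing import Dict, List


def _is_integer(value: str) -> bool:
    try:
        int(value)
        return True
    except ValueError:
        return False


def _is_float(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


# Candidate types from most to least restrictive (an assumed ordering).
TYPE_CHECKS = (("integer", _is_integer), ("float", _is_float))


def infer_column_type(values: List[str]) -> str:
    """Pick the most restrictive type that every non-empty value satisfies."""
    non_empty = [v.strip() for v in values if v.strip()]
    for type_name, check in TYPE_CHECKS:
        if non_empty and all(check(v) for v in non_empty):
            return type_name
    return "string"  # any value can be represented as a string


def infer_table_structure(csv_text: str) -> Dict[str, str]:
    """Map each header column of CSV text to an inferred data type."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {}
    header, body = rows[0], rows[1:]
    return {
        name: infer_column_type([row[i] if i < len(row) else "" for row in body])
        for i, name in enumerate(header)
    }
```

Checking candidates from most to least restrictive means the first type that fits every value is the most specific one, and string serves as the fallback because, as the text notes, any value can be converted to a string.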
  • The data processing method distribution device 100 includes a processing circuit 109.
  • The processing circuit 109 is hardware that realizes the attribute information acquisition unit 110, the affinity score calculation unit 120, and the recommendation information output unit 130.
  • The processing circuit 109 may be dedicated hardware, or may be the processor 101 that executes a program stored in the memory 102.
  • When the processing circuit 109 is dedicated hardware, it may be, for example, a single circuit, multiple circuits, a programmed processor, parallel programmed processors, an ASIC, an FPGA, or a combination of these.
  • ASIC is an abbreviation for Application Specific Integrated Circuit.
  • FPGA is an abbreviation for Field Programmable Gate Array.
  • The data processing method distribution device 100 may be realized by multiple devices (computers). In other words, the functions of the data processing method distribution device 100 may be distributed among multiple computers that cooperate with each other through a network or the like to realize those functions.
  • A cloud environment may also be used.
  • The word "unit" in the name of each element of the data processing method distribution device 100 may be read as "process," "step," "circuit," or "circuitry."
  • 100 Data processing method distribution device, 101 Processor, 102 Memory, 103 Auxiliary storage device, 104 Input/output interface, 109 Processing circuit, 110 Attribute information acquisition unit, 111 Data information reception unit, 112 Data information analysis unit, 120 Affinity score calculation unit, 130 Recommendation information output unit, 131 Recommendation method output unit, 132 Chaining method output unit, 190 Memory unit, 191 Processing method information, 192 Attribute information, 193 Data format classification table, 199 Attribute information.

Priority Applications (2)

Application Number Priority Date Filing Date Title
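PCT/JP2023/027849 WO2025027711A1 (ja) 2023-07-28 2023-07-28 Data processing method distribution device, data processing method distribution method, and data processing method distribution program
JP2024555971A JP7657382B1 (ja) 2023-07-28 2023-07-28 Data processing method distribution device, data processing method distribution method, and data processing method distribution program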

Publications (1)

Publication Number Publication Date
WO2025027711A1 (ja) 2025-02-06

Family

ID=94394638

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/027849 Pending WO2025027711A1 (ja) 2023-07-28 2023-07-28 Data processing method distribution device, data processing method distribution method, and data processing method distribution program

Country Status (2)

Country Link
JP (1) JP7657382B1
WO (1) WO2025027711A1

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017168967A1 (ja) * 2016-03-28 2017-10-05 Mitsubishi Electric Corp Data analysis method candidate determination device
WO2018011895A1 (ja) * 2016-07-12 2018-01-18 Hitachi, Ltd. Data processing flow management system and method
JP2019133610A (ja) * 2018-02-03 2019-08-08 アレグロスマート株式会社 Data orchestration platform management

Also Published As

Publication number Publication date
JPWO2025027711A1 2025-02-06
JP7657382B1 (ja) 2025-04-04

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 2024555971

Country of ref document: JP

121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23947510

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE