WO2023181318A1 - Information Processing Device and Information Processing Method - Google Patents

Information Processing Device and Information Processing Method

Info

Publication number
WO2023181318A1
Authority
WO
WIPO (PCT)
Prior art keywords
accuracy
unit
input data
classification
information processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2022/014203
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
佑介 山梶
邦彦 福島
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corp
Priority to PCT/JP2022/014203 (WO2023181318A1, ja)
Priority to DE112022006518.4T (DE112022006518T5, de)
Priority to CN202280093861.1A (CN118891641A, zh)
Priority to JP2024503517A (JP7483172B2, ja)
Publication of WO2023181318A1 (ja)
Priority to US18/822,999 (US20240428140A1, en)
Anticipated expiration
Ceased (current legal status)

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Definitions

  • the present disclosure relates to an information processing device and an information processing method.
  • neural networks used to classify input data, for example in image recognition, output inference results based on the accuracy of each classification result (see Patent Document 1).
  • the present disclosure solves the above problems. Its purpose is to provide an information processing device and an information processing method that can determine an appropriate accuracy from the inference results of machine learning, according to the machine learning model and the input data being used.
  • An information processing device according to the present disclosure includes: a first feature extraction unit that extracts feature quantities of input data; a first accuracy calculation unit that infers the input data based on the feature quantities extracted by the first feature extraction unit and calculates the accuracy of classification into each of a first number of classes; and a first classification unit that classifies the input data into at least one of the first number of classes based on the accuracy calculated by the first accuracy calculation unit.
  • the first classification unit performs a first process of sorting the input data so that the accuracies calculated by the first accuracy calculation unit are in ascending or descending order; a second process of extracting the label with the maximum accuracy from the sorted input data; and a third process of comparing the label with the maximum accuracy with the correct label associated with the input data.
  • the device further performs a first storage process that stores the classes obtained in the first process for which the comparison result of the third process matches, and a second storage process that stores the classes obtained in the first process for which the comparison result does not match.
  • the device also performs a first statistical process that statistically processes the classes stored by the first storage process, and a second statistical process that statistically processes the classes stored by the second storage process.
  • FIG. 1 is a configuration diagram showing an example of the hardware configuration of an information processing device according to the first embodiment.
  • FIG. 2 is a block diagram showing the configuration of the information processing device according to the first embodiment.
  • FIG. 3 is a flow diagram showing processing performed by the information processing device according to the first embodiment.
  • FIG. 4 is a flow diagram showing a threshold-setting process performed by the information processing device according to the first embodiment.
  • FIG. 7 is a flowchart showing a modified example of the processing performed by the information processing device according to the first embodiment.
  • FIG. 3 is a diagram illustrating an example of an image data set input to the information processing device according to the first embodiment.
  • FIG. 3 is a diagram illustrating an example of a graph data set input to the information processing apparatus according to the first embodiment.
  • FIG. 2 is a diagram illustrating an example of a natural language data set input to the information processing device according to the first embodiment.
  • FIG. 3 is a diagram illustrating an example of a data set of time waveforms of signals input to the information processing device according to the first embodiment.
  • FIG. 2 is a flow diagram illustrating an example of a neural network for multi-value classification and binary classification of the information processing apparatus according to the first embodiment.
  • FIG. 3 is a diagram illustrating an example of a second data set generated by the information processing device according to the first embodiment.
  • FIG. 2 is a diagram showing, for each threshold value, how many of the 10,000 CIFAR10 test data were inferred by binary classification by the information processing apparatus according to the first embodiment.
  • FIG. 6 is a diagram showing experimental data on inference results for CIFAR10 with and without binary classification by the information processing apparatus according to the first embodiment.
  • FIG. 3 is a diagram showing experimental data on the time the information processing apparatus according to the first embodiment requires to infer 10,000 pieces of data, for each threshold value on CIFAR10.
  • FIG. 7 is a diagram illustrating an example of a second data set generated by the information processing device according to Embodiment 3.
  • a table shows the accuracy of inference by the second learning unit of the information processing device according to Embodiment 3.
  • a graph shows the average values of inference accuracy by the information processing apparatuses according to Embodiments 1 and 5.
  • FIG. 7 is a graph showing the median value of inference accuracy by the information processing apparatuses according to Embodiments 1 and 5.
  • FIG. 1 is a configuration diagram showing an example of the hardware configuration of an information processing apparatus 100 according to the first embodiment.
  • the information processing device 100 may be a standalone computer that is not connected to an information network, or a server or client in a server-client system connected to a cloud or the like via an information network. It may also be a smartphone or a microcomputer, or a computer used in a closed network environment such as a factory, which is referred to as edge computing.
  • the information processing device 100 includes a CPU (Central Processing Unit) 1, a ROM (Read Only Memory) 2a, a RAM (Random Access Memory) 2b, a hard disk (HDD) 2c, and an input/output interface 4, which are interconnected via bus wiring 3. For example, the information processing device 100 further includes an output unit 5, an input unit 6, a communication unit 7, and a drive 8, which are connected to the input/output interface 4.
  • the input unit 6 includes, for example, a keyboard, a mouse, a microphone, a camera, and the like.
  • the output unit 5 includes, for example, an LCD (Liquid Crystal Display), a speaker, and the like.
  • the CPU 1 executes a program stored in the ROM 2a. It also loads a program stored on the hard disk 2c or an SSD (Solid State Drive, not shown) into the RAM 2b and executes it, reading and writing data as necessary. In this way the CPU 1 performs various processes and causes the information processing device 100 to function as a device having predetermined functions.
  • the CPU 1 outputs the results of various processes via the input/output interface 4. For example, the CPU 1 outputs the results through the output unit 5, which is an output device, or outputs (transmits) them to an external device through the communication unit 7, which is a communication device. The CPU 1 also outputs the results to the storage unit 20 (see FIG. 2), such as the hard disk 2c, for recording. Various information input from the input unit 6 and the communication unit 7 via the input/output interface 4 is likewise recorded on the hard disk 2c, and the CPU 1 reads it from the hard disk 2c and uses it as necessary.
  • the program executed by the CPU 1 is recorded in advance on the hard disk 2c or the ROM 2a, which are recording media built into the information processing device 100. Alternatively, the program may be stored (recorded) on a removable recording medium 9 connected via the drive 8. Such a removable recording medium 9 may be provided as so-called packaged software. Examples of the removable recording medium 9 include a flexible disk, a CD-ROM (Compact Disc Read Only Memory), a DVD (Digital Versatile Disc), a magnetic disk, and a semiconductor memory.
  • the program executed by the CPU 1 may also be sent and received via the communication unit 7 over a system such as the WWW (World Wide Web) that connects multiple pieces of hardware by wired communication, wireless communication, or both.
  • parameters obtained by learning are likewise transmitted and received by the above method.
  • the CPU 1 functions as a machine learning device that performs machine learning calculation processing.
  • the machine learning device may be configured with general-purpose hardware suited to parallel computation, such as a GPU (Graphics Processing Unit), or with an FPGA (Field-Programmable Gate Array) or other dedicated hardware.
  • the information processing device 100 may consist of a plurality of computers connected via communication ports, and the learning and inference described later may be implemented on separate, mutually independent hardware configurations. The information processing device 100 may also receive one or more sensor signals from external sensors connected via a communication port. Furthermore, it may provide a plurality of virtual hardware environments within one piece of hardware and treat each virtual environment as an individual piece of hardware.
  • FIG. 2 is a block diagram showing the configuration of information processing device 100 according to the first embodiment.
  • the information processing device 100 is configured to include a control section 10, an input section 6, an output section 5, a communication section 7, and a storage section 20 using the hardware configuration described above.
  • the storage unit 20 includes, for example, the ROM 2a, the RAM 2b, the hard disk 2c, and the drive 8. It stores various data and information used by the information processing device 100, as well as the results of its calculations.
  • the control unit 10 includes a first learning unit 11, a second learning unit 12, a first feature extraction unit 13A, a second feature extraction unit 13B, a learning data generation unit 14, a threshold setting unit 15, an accuracy determination unit 16, and a classification result selection unit 17. These units perform various processing based on data input from the input unit 6 and the communication unit 7 and on data and information acquired from the storage unit 20. For example, the control unit 10 outputs the results of the processing to the outside via the output unit 5 and the communication unit 7.
  • the control unit 10 also causes the storage unit 20 to store the results of the various processes.
  • the input section 6, the communication section 7, and the storage section 20 constitute the input section in the first embodiment.
  • the output section 5, the communication section 7, and the storage section 20 constitute the output section in the first embodiment.
  • the first learning unit 11 and the second learning unit 12 perform learning based on input data from the input unit 6, the communication unit 7, and the storage unit 20. They also perform inference on input data from those same sources and classify the input data into one of a plurality of classes.
  • the first feature extraction unit 13A and the second feature extraction unit 13B extract feature quantities of input data from the input unit 6, the communication unit 7, and the storage unit 20. In other words, they quantify the features of that input data. The two units extract feature quantities that differ from each other.
  • the learning data generation unit 14 generates learning data for the second learning unit 12. It does so from the learning data that is input from the input unit 6, the communication unit 7, and the storage unit 20 and is used by the first learning unit 11 for learning.
  • the threshold setting unit 15 sets a threshold that the control unit 10 refers to when performing a predetermined process.
  • the accuracy determination unit 16 determines whether the accuracy of the inference performed by the first learning unit 11 is less than or equal to the threshold set by the threshold setting unit 15, or exceeds it.
  • the classification result selection unit 17 selects and outputs either the classification result by the first learning unit 11 or the classification result by the second learning unit 12 based on the determination result by the accuracy determination unit 16. Details of the learning data generation section 14, threshold setting section 15, accuracy determination section 16, and classification result selection section 17 will be described later.
  • the first learning section 11 includes a first model generation section 11A, a first accuracy calculation section 11B, and a first classification section 11C.
  • the first model generation unit 11A performs learning based on input data from the input unit 6, the communication unit 7, and the storage unit 20, and generates a first learned model.
  • the first accuracy calculation unit 11B performs inference (identification) on input data from the input unit 6, the communication unit 7, and the storage unit 20, based on the feature quantities extracted by the first feature extraction unit 13A and on the first learned model. It then calculates the probability that the input data is classified into each of the plurality of classes preset by the first learned model.
  • the accuracy with which input data is classified into each of a plurality of classes preset by the learned model is also referred to as inference accuracy.
  • for example, in a three-class problem, inputting the input data into a trained model yields three numbers, such as 0.3, 0.6, and 0.1. In this embodiment, these numbers are called the inference accuracy.
  • the first classification unit 11C classifies the input data from the input unit 6, the communication unit 7, and the storage unit 20 into at least one of the plurality of classes preset by the first learned model. It does so based on the inference accuracy calculated by the first accuracy calculation unit 11B.
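The source does not say how the trained model turns its outputs into accuracies. The sketch below assumes the model produces raw scores (logits) that are normalized with a softmax, and then picks the class with the highest accuracy. The function names `inference_accuracy` and `classify` are hypothetical.

```python
import math
from typing import List, Sequence


def inference_accuracy(logits: Sequence[float]) -> List[float]:
    """Turn raw model scores into per-class accuracies that sum to 1.

    Assumption: a softmax output layer; the source does not specify one.
    """
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(accuracies: Sequence[float]) -> int:
    """Return the index of the class with the highest inference accuracy."""
    return max(range(len(accuracies)), key=lambda i: accuracies[i])


# The three accuracy values from the example in the text
example = [0.3, 0.6, 0.1]
print(classify(example))  # → 1
```

With the example values, the class whose accuracy is 0.6 (index 1) is selected.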
  • the second learning section 12 includes a second model generation section 12A, a second accuracy calculation section 12B, and a second classification section 12C.
  • the second model generation unit 12A performs learning based on input data from the input unit 6, the communication unit 7, and the storage unit 20, and generates a second trained model.
  • the second accuracy calculation unit 12B performs inference (identification) on input data from the input unit 6, the communication unit 7, and the storage unit 20, based on the feature quantities extracted by the second feature extraction unit 13B and on the second trained model.
  • it then calculates the probability (inference accuracy) that the input data is classified into each of the plurality of classes preset by the second learned model.
  • the second classification unit 12C classifies the input data from the input unit 6, the communication unit 7, and the storage unit 20 into one of the plurality of classes preset by the second learned model. It does so based on the inference accuracy calculated by the second accuracy calculation unit 12B.
  • the first learning unit 11 and the second learning unit 12 generate trained models by learning from the learning data input from the input unit 6, the communication unit 7, and the storage unit 20. By using those trained models to infer input data from the same sources, they function as learning devices that classify the input data.
  • FIG. 3 is a flow diagram showing processing performed by the information processing apparatus 100 according to the first embodiment.
  • the processing performed by the information processing apparatus 100 can be divided into learning processing and inference processing.
  • first, the information processing device 100 acquires a first data set (step ST1). The first data set contains learning data, which is a plurality of first input data, and the correct labels for an N-value classification (first-number classification) problem associated with each piece of learning data.
  • in other words, the first data set contains a plurality of correct labels corresponding to a plurality of classes and, as learning data, a plurality of input data each associated with one of those correct labels.
  • the first number N is a predetermined natural number satisfying 3 ≤ N.
  • the information processing device 100 may acquire the first data set via the input unit 6 or the communication unit 7 each time. Alternatively, it may acquire the data set in advance, store it in the storage unit 20, and read it out for use.
  • the information processing device 100 learns the N-value classification problem using the first model generation unit 11A and generates a first learned model. After step ST1, the information processing device 100 also uses the learning data generation unit 14 to re-attach the correct labels of the first data set, creating a second data set (step ST3). The new labels define an M-value classification (second-number classification) whose number of classes differs from that of the N-value classification. In the first embodiment, the correct labels are re-attached so that the problem becomes a binary classification. The second number M may be a predetermined natural number satisfying M < N.
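The relabeling in step ST3 can be sketched as a mapping from each of the N original classes to one of M new classes. This is a hypothetical illustration. `relabel` is an invented helper, and the vehicle-versus-animal grouping of the CIFAR10 classes is an assumption chosen for the example. It is not the grouping the patent uses for its second data set.

```python
from typing import Dict, List, Sequence, Tuple


def relabel(dataset: Sequence[Tuple[str, int]],
            mapping: Dict[int, int]) -> List[Tuple[str, int]]:
    """Re-attach correct labels to build the second (M-class) data set.

    `mapping` assigns each of the N original class indices to one of M new
    class indices; the input data themselves are left unchanged.
    """
    return [(x, mapping[y]) for x, y in dataset]


# Hypothetical binary grouping of the 10 CIFAR10 classes:
# 0 airplane, 1 automobile, 8 ship, 9 truck -> 0 (vehicle); the rest -> 1 (animal)
VEHICLE_CLASSES = {0, 1, 8, 9}
binary_mapping = {c: 0 if c in VEHICLE_CLASSES else 1 for c in range(10)}

# Placeholder strings stand in for the actual input data (e.g. images)
first_data_set = [("img_a", 3), ("img_b", 8), ("img_c", 0)]
second_data_set = relabel(first_data_set, binary_mapping)
print(second_data_set)  # [('img_a', 1), ('img_b', 0), ('img_c', 0)]
```

Only the labels change, so the same input data can train both the N-class and the M-class model.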
  • the information processing device 100 uses the generated second data set to learn binary classification using the second model generation unit 12A, and generates a second learned model (step ST4).
  • the second trained model may be a single trained model that outputs one result for one input, a single trained model that outputs multiple results for one input, or a combination of multiple trained models.
  • the information processing device 100 causes the first learning unit 11 to perform inference on unknown input data (for example, test data) that is not included in the first data set (step ST5).
  • the information processing device 100 performs inference using the first accuracy calculation unit 11B, and calculates the inference accuracy of the input test data for each of the N values (classes).
  • the first classification unit 11C of the information processing device 100 classifies the input data into the class with the highest inference accuracy (the first class). This class is chosen from the N (first-number) classes that are inference candidates (classification candidates) for the input data.
  • the class with the highest inference accuracy is also referred to as the first inference candidate.
  • the class with the second-highest inference accuracy (the second class) is also referred to as the second inference candidate.
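The first and second inference candidates are simply the two classes ranked highest by inference accuracy. A minimal sketch follows; `inference_candidates` is a hypothetical name. For a data set such as MultiMNIST with two known correct labels, both returned classes would be reported.

```python
from typing import List, Sequence, Tuple


def inference_candidates(accuracies: Sequence[float],
                         k: int = 2) -> List[Tuple[int, float]]:
    """Return the k (class index, accuracy) pairs with the highest accuracy.

    The list is ordered best first: element 0 is the first inference
    candidate and element 1 the second. Python's sort stays stable with
    reverse=True, so tied classes keep the lower index first.
    """
    ranked = sorted(enumerate(accuracies), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


first_candidate, second_candidate = inference_candidates([0.3, 0.6, 0.1])
print(first_candidate, second_candidate)  # (1, 0.6) (0, 0.3)
```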
  • this embodiment can also be applied to data sets in which one input has two or more correct labels, such as MultiMNIST. If it is known that there are two correct labels, the first and second inference candidates are taken as the inference values, and the labels corresponding to them are taken as the inference labels. The processing with multiple correct labels is the same as with one correct label, so this embodiment describes the single-label case.
  • after step ST5, the information processing device 100 uses the accuracy determination unit 16 to determine whether the inference accuracy of the first inference candidate is less than or equal to the threshold preset by the threshold setting unit 15 (step ST6).
  • in step ST6, if the inference accuracy of the first inference candidate exceeds the threshold (NO in step ST6), the classification result selection unit 17 chooses to output the result of the first classification unit 11C rather than that of the second classification unit 12C. That result is the class that is the first inference candidate of the first classification unit 11C.
  • in step ST6, if the inference accuracy of the first inference candidate is less than or equal to the threshold (YES in step ST6), the information processing device 100 chooses to output the classification result of the second classification unit 12C. The second accuracy calculation unit 12B performs binary-classification inference on the input data and calculates the inference accuracy for each of the two classes. The second classification unit 12C then classifies the input data into whichever of the two candidate classes has the higher inference accuracy, and the value of that class is output as the classification result and inference result.
  • based on the selection by the classification result selection unit 17, the information processing device 100 outputs one of the two classification results, from either the first classification unit 11C or the second classification unit 12C. The control unit 10 sends that result to the output unit 5, the communication unit 7, or the storage unit 20.
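The selection in steps ST5 and ST6 can be sketched as a two-stage cascade. Models are represented as hypothetical callables that return per-class accuracies, and `select_inference` is an invented name. This section does not describe how the binary class index is mapped back to an output value, so the sketch returns only the index, tagged with the unit that produced it.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical model interface: input data -> per-class accuracies
Model = Callable[[object], Sequence[float]]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def select_inference(x: object, first_model: Model, second_model: Model,
                     threshold: float) -> Tuple[str, int]:
    """Use the N-class result unless its top accuracy is <= threshold."""
    first_acc = first_model(x)            # N-class accuracies (step ST5)
    best = argmax(first_acc)              # first inference candidate
    if first_acc[best] > threshold:       # NO in step ST6
        return "first", best
    second_acc = second_model(x)          # binary accuracies (YES in step ST6)
    return "second", argmax(second_acc)


def n_class_model(x: object) -> Sequence[float]:
    return [0.3, 0.6, 0.1]                # stand-in for the first learned model


def binary_model(x: object) -> Sequence[float]:
    return [0.2, 0.8]                     # stand-in for the second learned model


print(select_inference("sample", n_class_model, binary_model, 0.5))  # ('first', 1)
print(select_inference("sample", n_class_model, binary_model, 0.6))  # ('second', 1)
```

The second call shows that an accuracy exactly equal to the threshold falls through to the binary model. This matches the "less than or equal to" condition in step ST6.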
  • although the information processing device 100 uses the accuracy determination unit 16 to determine whether the inference accuracy of the first learning unit 11 is less than or equal to the threshold, the invention is not limited to this.
  • it suffices that the accuracy determination unit can tell whether the inference accuracy of the first learning unit is above or below the threshold. It may determine whether the accuracy is less than the threshold, whether it is equal to or greater than the threshold, or whether it exceeds the threshold.
  • although the information processing apparatus 100 of Embodiment 1 processes inference accuracies and thresholds that are both positive values, the invention is not limited to this.
  • in short, the result depends on the check made by the accuracy determination unit. When the accuracy of the first learning unit's inference exceeds the threshold, the information processing device outputs an inference result based on that inference. When the accuracy is less than or equal to the threshold, it outputs an inference result based on the inference by the second learning unit.
  • the method by which the threshold setting unit 15 sets the threshold is explained later. For example, the information processing device 100 statistically processes the correctly inferred results and the incorrectly inferred results separately, and sets a value between the two statistics as the threshold.
  • FIG. 4 is a flow diagram illustrating a threshold setting process performed by the information processing apparatus 100.
  • the information processing device 100 performs the following processes:
    • a first process in which the first classification unit 11C sorts the input data so that the accuracies calculated by the first accuracy calculation unit 11B are in ascending or descending order;
    • a second process of extracting the label with the maximum accuracy from the sorted input data;
    • a third process of comparing that label with the correct label associated with the input data;
    • a first storage process that stores the classes obtained in the first process for which the comparison in the third process matches;
    • a second storage process that stores the classes obtained in the first process for which the comparison does not match.
  • the threshold setting unit 15 sets a threshold between the first statistical value calculated by the first statistical process and the second statistical value calculated by the second statistical process.
  • the first classification unit 11C classifies the input data based on the comparison result between the accuracy calculated by the first accuracy calculation unit 11B and the threshold value.
  • the first statistical process and the second statistical process are, for example, processes for calculating any one of an average value, a median value, a standard deviation, or information entropy.
  • the first statistical process and the second statistical process may be a process of calculating a combination of two or more of the average value, median value, standard deviation, or information entropy.
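The four statistics named above can be computed with Python's standard `statistics` and `math` modules. This is a sketch under assumptions. The source does not say which quantity the information entropy is taken over; here it is applied to a single accuracy vector. The population standard deviation (`pstdev`) is also a choice made for illustration. `summarize` and `information_entropy` are hypothetical names.

```python
import math
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Average, median and (population) standard deviation of a sample."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.pstdev(values),
    }


def information_entropy(accuracies: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one accuracy vector.

    Zero-probability terms contribute nothing and are skipped to avoid log(0).
    """
    return -sum(p * math.log(p) for p in accuracies if p > 0)


print(summarize([0.2, 0.4, 0.6]))
print(information_entropy([0.5, 0.5]))  # ln 2 ≈ 0.693
```

A confident prediction has low entropy, and a prediction split evenly across classes has the maximum entropy. That makes entropy a natural alternative to the raw maximum accuracy.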
  • the second process may instead, for example, extract the label with the minimum value. The third process then compares that label with the correct label associated with the input data.
  • the information processing device 100 first acquires a first data set including a plurality of first input data and the correct labels for an N-value classification problem associated with each of the first input data (step ST1).
  • after step ST1, the information processing device 100 refers to the information stored in the storage unit 20 and calls up the first trained model used for inference by the first learning unit 11 (step ST8). The first learning unit 11 then infers the N-value classification problem for the first input data and calculates the inference accuracy for each of them (step ST5). For example, in step ST5 the information processing device 100 calculates the inference accuracy for a plurality of input data that were not used to generate the first trained model. After step ST5, the information processing device 100 sorts the inference data so that the calculated accuracies are in ascending or descending order (first process, step ST19).
  • for each piece of sorted inference data, the information processing device 100 extracts the label (inference label) with the maximum accuracy (second process). It then determines whether the extracted inference label matches the correct label (third process, step ST20).
  • in step ST20, if the inference label and the correct label match (YES in step ST20), the corresponding sorted inference data is stored in the first storage unit of the storage unit 20 (first storage process, step ST21).
  • after step ST21, the information processing device 100 statistically processes the sorted inference data stored in the first storage unit, using a first statistics unit included in the threshold setting unit 15 (first statistical process, step ST22). In step ST20, if the inference label and the correct label do not match (NO in step ST20), the corresponding sorted inference data is stored in the second storage unit of the storage unit 20 (second storage process, step ST23). After step ST23, the information processing device 100 statistically processes the sorted inference data stored in the second storage unit, using a second statistics unit included in the threshold setting unit 15 (second statistical process, step ST24). After steps ST22 and ST24, the information processing device 100 sets the threshold based on the results of these statistical processes (step ST25).
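Steps ST19 to ST25 can be condensed into one function, as a minimal sketch under stated assumptions. What each storage process keeps is taken to be the maximum accuracy of each sorted vector; the source speaks of stored "classes" and "sorted inference data". The mean is used as both statistics, and the threshold is the midpoint between them. The function name and sample data are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def set_threshold(accuracy_vectors: Sequence[Sequence[float]],
                  correct_labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (first statistical value, second statistical value, threshold).

    Raises statistics.StatisticsError if either group ends up empty.
    """
    matched: List[float] = []     # first storage process (step ST21)
    mismatched: List[float] = []  # second storage process (step ST23)
    for acc, correct in zip(accuracy_vectors, correct_labels):
        # First process (ST19): sort classes by accuracy, descending.
        ranked = sorted(enumerate(acc), key=lambda pair: pair[1], reverse=True)
        # Second process: the label with the maximum accuracy.
        inferred, max_acc = ranked[0]
        # Third process (ST20): compare it with the correct label.
        if inferred == correct:
            matched.append(max_acc)
        else:
            mismatched.append(max_acc)
    first_stat = statistics.mean(matched)       # first statistical process (ST22)
    second_stat = statistics.mean(mismatched)   # second statistical process (ST24)
    threshold = (first_stat + second_stat) / 2  # ST25: a value between the two
    return first_stat, second_stat, threshold


# Hypothetical held-out accuracies and correct labels
held_out = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.5, 0.4, 0.1], [0.3, 0.3, 0.4]]
labels = [0, 1, 1, 0]
print(set_threshold(held_out, labels))  # values close to 0.85, 0.45 and 0.65
```

The threshold should be computed on data not used to train the first model, as step ST5 notes. Otherwise the "correct" group would be inflated by memorized samples.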
  • The threshold setting unit 15 sets the threshold so that it is equal to or less than the first statistical value calculated by the first statistical process. Values equal to or greater than the first statistical value serving as the threshold can then be judged to have sufficiently high accuracy and need not be analyzed, which narrows the range in which the threshold must be searched. Further, the threshold setting unit 15 sets the threshold between the first statistical value calculated by the first statistical process and the second statistical value calculated by the second statistical process; in other words, the threshold is equal to or less than the first statistical value and equal to or greater than the second statistical value.
  • For example, the threshold setting unit 15 sets the threshold value to the average of the first statistical value and the second statistical value. Alternatively, for example, the threshold setting unit 15 sets the threshold value to an average of the first and second statistical values weighted by the number of input data sorted into each of the two groups.
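The sorting, matching, and threshold-setting steps (ST19 to ST25) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the per-class accuracies are already available as Python lists of probabilities, uses the mean as the statistic for both groups, and the function name `set_threshold` is hypothetical.

```python
from statistics import mean


def set_threshold(probs, labels, weighted=False):
    """Hypothetical sketch of steps ST19-ST25.

    probs  -- list of per-class accuracy (probability) lists, one per input
    labels -- list of correct-label indices
    """
    # First process (ST19): sort the inference data by maximum accuracy.
    # Sorting does not change the means below; it only mirrors the flow.
    rows = sorted(zip(probs, labels), key=lambda row: max(row[0]))
    correct, wrong = [], []
    for p, y in rows:
        top = max(p)
        predicted = p.index(top)  # second process: inference label
        # Third process (ST20): compare the inference label with the correct label.
        if predicted == y:
            correct.append(top)   # first storage process (ST21)
        else:
            wrong.append(top)     # second storage process (ST23)
    if not correct or not wrong:
        raise ValueError("need both matching and non-matching results")
    s1 = mean(correct)  # first statistical value (ST22)
    s2 = mean(wrong)    # second statistical value (ST24)
    if weighted:
        # Weighted by the number of inputs in each group (an assumption).
        n1, n2 = len(correct), len(wrong)
        return (n1 * s1 + n2 * s2) / (n1 + n2)
    return (s1 + s2) / 2  # ST25: plain average of the two statistics


probs = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.4, 0.35, 0.25], [0.5, 0.3, 0.2]]
labels = [0, 0, 1, 0]
print(set_threshold(probs, labels))                 # about 0.567
print(set_threshold(probs, labels, weighted=True))  # about 0.65
```

The plain average places the threshold midway between the two groups; the weighted variant pulls it toward whichever group holds more inputs.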
  • The threshold setting unit 15 may also combine the average and the weighted average of the first statistical values with statistics other than the average, such as the standard deviation and the median, and set the threshold value under a condition that is not satisfied by all of these values.
  • A value between the first statistical value and the second statistical value may be determined as the threshold value. For example, let the fifth accuracy be the highest of the per-class classification accuracies for the first number of classes calculated by the first accuracy calculation unit. The threshold setting unit 15 may then set the threshold value to a value between the average or median of the fifth accuracy when the first classification unit, classifying the plurality of input data of the first data set, obtains a result matching the class corresponding to the correct label, and the average or median of the fifth accuracy when it obtains a result that does not match the class corresponding to the correct label.
  • The threshold setting unit 15 may also set the threshold value with reference to the fifth accuracy when a result matching the class corresponding to the correct label is obtained among the results of the first classification unit classifying the plurality of input data of the first data set. Further, let the sixth accuracy be the next highest accuracy (or the accuracy of any class ranked second or lower) among the per-class accuracies for the first number of classes calculated by the first accuracy calculation unit. The threshold setting unit 15 may then set the threshold value to a value between the average or median of the fifth accuracy and the average or median of the sixth accuracy, both taken over the results in which the first classification unit, classifying the plurality of input data of the first data set, obtains a result matching the class corresponding to the correct label.
  • The threshold setting unit 15 may also set the threshold value to a value between the following statistics: the average or median of the fifth accuracy and of the sixth accuracy when the first classification unit, classifying the plurality of input data of the first data set, obtains a result matching the class corresponding to the correct label, and the average or median of the fifth accuracy and of the sixth accuracy when it obtains a result that does not match that class. Further, the threshold setting unit 15 may set a threshold for each subset of input data included in the first data set, or for each of the plurality of classes classified by the first classification unit.
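Reading the fifth accuracy as the top-1 per-class accuracy and the sixth as the second-highest, the statistics above and a per-class threshold might be computed as in this hypothetical sketch. It uses the median (the text allows the average or the median), places the threshold midway between the matching and non-matching medians of the fifth accuracy, and groups samples by the class predicted by the first classification unit for the per-class variant; all function names are illustrative only.

```python
from collections import defaultdict
from statistics import median


def fifth_sixth(p):
    """Return (fifth, sixth) accuracy: the highest and second-highest values."""
    first, second = sorted(p, reverse=True)[:2]
    return first, second


def accuracy_stats(probs, labels):
    """Median fifth and sixth accuracy, split into matching / non-matching results."""
    groups = {True: ([], []), False: ([], [])}
    for p, y in zip(probs, labels):
        fifth, sixth = fifth_sixth(p)
        match = p.index(fifth) == y
        groups[match][0].append(fifth)
        groups[match][1].append(sixth)
    return {k: (median(f), median(s)) for k, (f, s) in groups.items() if f}


def threshold_from_fifth(probs, labels):
    """Midpoint between the matching and non-matching medians of the fifth accuracy."""
    stats = accuracy_stats(probs, labels)
    return (stats[True][0] + stats[False][0]) / 2


def per_class_thresholds(probs, labels):
    """One threshold per predicted class; classes lacking either group are skipped."""
    by_class = defaultdict(lambda: ([], []))
    for p, y in zip(probs, labels):
        predicted = p.index(max(p))
        by_class[predicted][0].append(p)
        by_class[predicted][1].append(y)
    result = {}
    for cls, (ps, ys) in by_class.items():
        stats = accuracy_stats(ps, ys)
        if True in stats and False in stats:
            result[cls] = (stats[True][0] + stats[False][0]) / 2
    return result
```

The sixth-accuracy medians are returned alongside the fifth so that the other variants described above (thresholds between fifth and sixth statistics) can be built from the same dictionary.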
  • The information processing device 100 compares the threshold value set by the threshold setting unit 15 with the accuracy value of the label extracted in the second process and may determine whether that value is equal to or less than the threshold value.
  • The second classification unit 12C performs inference using the second feature extraction unit 13B.
  • When the maximum accuracy obtained in the second process is equal to or less than the threshold set by the threshold setting unit 15, the information processing device 100 classifies the input data with the second classification unit 12C, which performs inference using the second feature extraction unit.
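The routing decision, handing low-confidence inputs to the second classification unit, can be pictured with the hypothetical sketch below. The two models are stand-in callables (one returning per-class accuracies, the other a class index); the second feature extraction unit used by the second classification unit is not modeled here.

```python
from typing import Callable, Sequence


def classify(x,
             first_model: Callable[[object], Sequence[float]],
             second_model: Callable[[object], int],
             threshold: float) -> int:
    """Return a class index, re-inferring low-confidence inputs with second_model."""
    probs = list(first_model(x))
    top = max(probs)
    if top <= threshold:
        # Maximum accuracy is at or below the threshold: hand the input over
        # to the second classification unit (here a hypothetical callable).
        return second_model(x)
    return probs.index(top)
```

For instance, with a threshold of 0.6, an input scored `[0.9, 0.05, 0.05]` keeps the first unit's label 0, while one scored `[0.5, 0.3, 0.2]` is passed to `second_model`.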
  • Because the range in which the threshold is searched becomes narrower, the optimum value can be reached with fewer trials.
  • Because this method does not depend on the machine learning algorithm or the input data used, an appropriate accuracy threshold can be determined whatever method is used.
  • The present invention has revealed that, regardless of the size of the data set, data whose maximum accuracy is small tend to be misclassified. By setting a threshold value for accuracy, data with low accuracy can be excluded even when learning was performed on a small data set, which increases inference accuracy. Moreover, rather than merely removing such data, inferring them again with an information processing device that provides higher accuracy allows high-accuracy inference, which further increases inference accuracy.
  • The data input to the information processing device 100 is, for example, an image, a graph, text, or a time waveform.
  • the information processing device 100 processes input data as a multi-value classification problem, that is, an N-value classification problem, and outputs a classification result.
  • Multi-value classification is, for example, classification using machine learning in which a trained model infers (identifies) which of the ten values 0 to 9 the input data corresponds to and outputs the inference result (classification result, identification result).
  • the learning data used by the information processing device 100 in machine learning is supervised data.
  • Supervised data has one or more classification values for each of a plurality of input data.
  • the classification value for the supervised data is called a correct label.
  • the correct label for "handwritten character 5" in MNIST (Modified National Institute of Standards and Technology database) is "5".
  • the set of the above learning data and correct label is called a data set.
  • The correct label is generally an integer from 0 to 9, but it is not limited to consecutive integers or to labels starting from 0.
  • For example, for the labels 1, 2, and 3, it is also effective to encode each label with 1 only at the position of the correct label: 1 as (1, 0, 0), 2 as (0, 1, 0), 3 as (0, 0, 1), and so on.
  • For ten labels, the correct labels may be defined as a 10 × 10 matrix.
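Encoding each label with 1 only at the position of the correct label (one-hot encoding) is short enough to show directly. A sketch; reading the 10 × 10 matrix as the stacked one-hot vectors of the ten labels is an assumption.

```python
def one_hot(label, labels):
    """Encode `label` as a tuple with 1 only at the position of the correct label."""
    if label not in labels:
        raise ValueError(f"unknown label: {label!r}")
    return tuple(1 if candidate == label else 0 for candidate in labels)


labels = [1, 2, 3]
print([one_hot(l, labels) for l in labels])  # [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

# Stacking the ten encodings of labels 0-9 gives a 10 x 10 (identity) matrix.
matrix = [one_hot(i, range(10)) for i in range(10)]
```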
  • For ease of understanding, the explanation uses 10-value classification, but the classification performed by the information processing device may be any N-value classification with 3 ≤ N. For example, in image recognition, the classification may use a data set that has 20,000 correct labels for 14 million pieces of input data, such as the well-known ImageNet data set.
  • Regression can also be handled: for example, if the correct value for regression is a real number from 0 to 100, the correct labels can be set as 0-1, 1-2, ..., 99-100.
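Turning a regression target in 0 to 100 into the labels 0-1, 1-2, ..., 99-100 is a binning step. A minimal sketch, assuming half-open bins of width 1 with the value 100 placed in the last bin; the document does not say how boundary values are assigned, and `value_to_label` is a hypothetical name.

```python
import math


def value_to_label(value, low=0.0, high=100.0, width=1.0):
    """Map a real regression target to a class index for bins [k, k + width).

    The top edge `high` is folded into the last bin (99-100 here).
    """
    if not low <= value <= high:
        raise ValueError(f"value {value} outside [{low}, {high}]")
    n_bins = round((high - low) / width)
    index = math.floor((value - low) / width)
    return min(index, n_bins - 1)


print(value_to_label(0.3), value_to_label(42.7), value_to_label(100.0))  # 0 42 99
```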
  • the information processing apparatus 100 has a configuration that classifies input data into N values.
  • The information processing device 100 may use any algorithm configured to classify input data into N values, such as deep learning, gradient boosting, support vector machines, logistic regression, the k-nearest neighbor method, decision trees, or naive Bayes, or a combination of these algorithms.
  • Deep learning is an example of desirable learning because of its high inference accuracy. Various deep learning algorithms are known depending on the input data, such as CNN (convolutional neural network), MLP (Multi-Layer Perceptron), and Transformer.
  • For CNN, algorithms such as VGG, ResNet, DenseNet, MobileNet, and EfficientNet, which share the common feature of convolution, are known.
  • For MLP, pure fully connected combinations and algorithms such as MLP-Mixer are known; for Transformer, algorithms combined with CNN feature extraction and algorithms such as Vision Transformer are known.
  • the information processing device may use one of these methods alone or a combination of a plurality of these methods. Further, in the first embodiment, the first learning section 11 and the second learning section 12 will be explained, but the first learning section and the second learning section may have different algorithms from each other, and the second learning section may be configured by two or more devices, and each device may use a plurality of algorithms of two or more different types.
  • the information processing device 100 performs learning and inference using the learning data set.
  • learning refers to the process of optimizing internal parameters of the information processing device 100
  • inference refers to performing calculations on input data based on the optimized parameters.
  • FIG. 5 is a flow diagram showing a modification of the processing performed by the information processing device 100 according to the first embodiment.
  • the information processing device 100 refers to the information stored in the storage unit 20 and calls a trained model for performing inference in the first learning unit 11 (step ST8).
  • the first learning unit 11 may infer an N-value classification problem for the input data (step ST5).
  • The information processing device 100 may also refer to the information stored in the storage unit 20 and call up a trained model for inference in the second learning unit 12 (step ST9), and the second learning unit 12 may infer a binary classification problem for the input data (step ST7).
  • the information processing device 100 may store the trained model in the storage unit 20 in advance, and call the trained model to perform inference as needed.
  • FIG. 6 is a diagram illustrating an example of an image data set input to the information processing apparatus 100.
  • The image shown on the left side of FIG. 6 may be a still image or a moving image; since a moving image can be regarded as a continuous sequence of still images, the first embodiment describes the case where still image data is input to the information processing apparatus 100.
  • the still image data input to the information processing device 100 may be a color image composed of a combination of two or more channels such as RGB, or a monochrome image composed of one channel.
  • When there are a plurality of channels, various processes are known depending on the algorithm of the information processing device 100; a common process is to combine the channels into one channel using a weight matrix that couples the channels.
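Combining several channels into one with a weight matrix amounts to a weighted sum per pixel, as in this sketch. The ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are used only as a familiar example; the document does not specify the weights, and in a learned model they would typically be trained (much like a 1 × 1 convolution).

```python
def combine_channels(image, weights):
    """Collapse an H x W x C image (nested lists) into H x W using per-channel weights."""
    for row in image:
        for pixel in row:
            if len(pixel) != len(weights):
                raise ValueError("each pixel must have one value per weight")
    return [[sum(w * c for w, c in zip(weights, pixel)) for pixel in row]
            for row in image]


rgb = [[[255, 255, 255], [0, 0, 0]],
       [[255, 0, 0], [0, 0, 255]]]
gray = combine_channels(rgb, (0.299, 0.587, 0.114))
```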
  • The image data input to the information processing device 100 may be 28 × 28 pixels as in MNIST, 32 × 32 pixels as in CIFAR10 (Canadian Institute For Advanced Research 10), or 96 × 96 pixels as in STL10; it may also be image data of another size or of a non-square shape. Note that the smaller the image data input to the information processing apparatus 100, the shorter the computation time.
  • The input image data may be a sensor signal converted from physical data into numerical data by equipment that captures electromagnetic waves or the like, such as a CCD (Charge Coupled Device) camera, a CMOS (Complementary MOS) camera, an infrared camera, an ultrasonic measuring device, or an antenna, or it may be a graphic created on a computer using CAD (Computer Aided Design) or the like.
  • FIG. 7 is a diagram showing an example of a graph data set input to the information processing device 100.
  • a graph is composed of nodes, which are points, and edges, which are lines connecting the points, and the nodes and edges have arbitrary graph information.
  • Major classification problems for such graphs include classifying nodes from edge and graph information, classifying edges from node and graph information, and classifying graphs by learning multiple graphs.
  • an electrical circuit can be represented as a graph.
  • An example of a node classification problem is one in which the data input to the information processing device is a circuit diagram and the data output by the information processing device is the output voltage between arbitrary terminals of the circuit.
  • The problem of optimizing the wires that connect the components can also be treated as a classification problem.
  • For the information processing apparatus 100 of the first embodiment to perform classification, two or more nodes are required; if there are two or more components, the problem can be handled as a multi-value classification problem.
  • A problem of classifying whether a circuit is a power supply circuit, a sensor circuit, a communication circuit, or a control circuit can be treated as a graph classification problem.
  • FIG. 8 is a diagram showing an example of a natural language data set input to the information processing device 100.
  • The input data may be a block of text, such as one sentence, one paragraph, one section, or the full text. For example, given a news article, classifying it into economics, politics, sports, or science, or making inferences about it, is a classification problem.
  • Such a classification problem may be evaluated on one sentence or one paragraph; it may be, for example, a problem in which a novel is given and its author and genre are inferred; it may be a problem of classifying source code of a programming language, G code for an NC milling machine, or the like by function; or it may be a problem of classifying a given sentence by emotion, such as happiness, anger, or sadness.
  • FIG. 9 is a diagram illustrating an example of a data set of time waveforms of signals input to the information processing device 100.
  • A time waveform is a set of continuously changing numerical values, including the time series data shown on the left side of FIG. 9, in which the horizontal axis is time and the vertical axis is arbitrary physical information such as voltage or peak value; classifying such waveforms is a classification problem.
  • When the time waveform of a signal is used as input data, this time waveform is classified.
  • For example, when the input data is the time waveform of a signal in an electric circuit, the problem of classifying the circuit as a power supply circuit, sensor circuit, communication circuit, or control circuit can be treated as a time waveform classification problem.
  • The horizontal axis of the data input to the information processing device 100 is not limited to time and may be any feature quantity with a physical extent, such as frequency or coordinates.
  • The data input to the information processing device 100 may also be a numerical data set, such as the Iris data set (iris Dataset), in which samples are classified into three types based on four numerical features; any data may be used as long as it can be input to AI (artificial intelligence) and converted into a form in which the output is obtained as a classification result.
  • The information processing device 100 performs processing on input data immediately before the output layer of deep learning.
  • In deep learning, information processing is performed on input data such as the images and graphs described above.
  • In the processing immediately before output, the information processing apparatus 100 performs processing using a fully connected layer or a nonlinear function.
  • The fully connected processing aggregates the features extracted from the input data by convolution or the like into the desired number of classes.
  • Then, the result of processing with a nonlinear activation function such as the softmax function is output.
  • The fully connected processing is not strictly necessary: the information processing device may aggregate the features into the desired number of classes at the feature extraction stage described below, although inference accuracy often decreases to some extent. For example, the information processing device may compare the correct label with the output of these fully connected layers or with the inferred values obtained by feature extraction. In general, processing with the softmax function creates clear differences between inference candidates and is expected to improve inference accuracy, so it is desirable to perform processing using the softmax function. Instead of the softmax function, the information processing device may process the input data using a nonlinear function that is a modification of the softmax function, such as log-softmax.
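The softmax function and its log-softmax variant can be written in a numerically stable form by subtracting the maximum input before exponentiating. A standard-library sketch for illustration; deep learning frameworks provide their own implementations.

```python
import math


def softmax(logits):
    """Map raw scores to accuracies that are positive and sum to 1."""
    m = max(logits)                       # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits):
    """log(softmax(z)) computed without forming the softmax explicitly."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The shift by `max(logits)` leaves the result unchanged mathematically but prevents overflow in `math.exp` for large scores.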
  • CNN (convolutional neural network), MLP (Multi-Layer Perceptron), Transformer, or the like may be used to extract feature amounts as described above, as may GNNs (Graph Neural Networks) and RNNs (Recurrent Neural Networks).
  • The information processing device 100 may also use logistic regression, support vector machines, gradient boosting, or the like, and various algorithms of each of these kinds can be considered.
  • Various algorithms are known in deep learning, and the information processing device may use algorithms such as VGG, ResNet, AlexNet, MobileNet, and EfficientNet.
  • For MLP, the information processing device may process images using pure fully connected combinations or methods that utilize MLP, such as MLP-Mixer. For Transformer, Vision Transformer and methods that combine it with CNN feature extraction are known, and the information processing device may use these methods alone or in combination.
  • For graph data, the information processing device 100 uses GNN (Graph Neural Network), GCN (Graph Convolutional Network), which convolves neighboring nodes, or the like.
  • The graph data is input after being transformed into an adjacency matrix or a degree matrix, which is a reversible transformation.
  • The adjacency matrix expresses whether the nodes of the graph are connected; for N nodes it is an N × N matrix.
  • The adjacency matrix is symmetric when the graph is undirected (its edges have no direction) and asymmetric when the graph is directed.
  • The degree matrix expresses the number of edges at each node; for N nodes it is an N × N diagonal matrix.
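The adjacency and degree matrices described above can be built from an edge list as in this sketch. The three-node path is a made-up example, and using row sums (out-degree) for directed graphs is an assumption the document does not address.

```python
def adjacency_matrix(n, edges, directed=False):
    """N x N matrix with 1 where an edge connects node i to node j."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = 1
        if not directed:
            adj[j][i] = 1  # undirected edges make the matrix symmetric
    return adj


def degree_matrix(adj):
    """Diagonal N x N matrix holding the number of edges at each node (row sums)."""
    n = len(adj)
    return [[sum(adj[i]) if i == j else 0 for j in range(n)] for i in range(n)]


# Hypothetical three-node path 0 - 1 - 2
A = adjacency_matrix(3, [(0, 1), (1, 2)])
D = degree_matrix(A)
print(A)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(D)  # [[1, 0, 0], [0, 2, 0], [0, 0, 1]]
```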
  • the information processing device converts the input graph data into matrix data, inputs the matrix data to GNN, GCN, etc., performs learning through hidden layers multiple times, and applies a fully connected or softmax function before the output layer.
  • the method is the same as the deep learning for images described above, so the explanation will be omitted.
  • In deep learning for time waveform data, RNNs are often used, and GRU (gated recurrent unit) and LSTM (long short-term memory), which extend RNNs, are the main technologies.
  • the information processing device 100 extracts the feature amount of the input data using the method described above, and then performs processing using full connection, softmax function, etc. before the output layer and outputs the data.
  • the method is the same as the deep learning for images described above, so the explanation will be omitted.
  • For natural language data, LSTM, which handles the time waveforms described above, Seq2Seq (sequence to sequence), Attention, which is an extension of Seq2Seq, and Transformer, a further advanced technology, are known, and the information processing device 100 can classify natural language data by using these technologies.
  • LSTM was able to predict the language from the context of a sentence, but because it could only handle fixed-length signals, the accuracy of inference varied depending on the length of the sentence.
  • the above-mentioned problem is solved by using the concept of encoder-decoder in Seq2Seq for LSTM.
  • The number of data items such as images, graphs, time waveforms, and texts input to the information processing device 100 is preferably 100 or more per correct label, and more preferably 1,000 or more. Furthermore, the training data set input to the information processing device 100 should not be one in which similar data within a correct label have small variance; it is preferably a data set with a distribution that covers the results expected at inference time.
  • For images, data augmentation, for example by affine transformation, can be performed to increase the learning data.
  • However, augmentation cannot be used for all kinds of data: for example, when the data input to the information processing device 100 is graph, text, or time waveform data, it is generally difficult to augment it in this way.
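For images, an affine transformation as simple as a one-pixel shift already yields new labeled samples. A minimal sketch using pure translation on nested lists with zero fill; rotation, scaling, and shearing are further affine options not shown, and the helper names are hypothetical.

```python
def translate(image, dx, dy, fill=0):
    """Shift a 2-D image right by dx and down by dy; vacated pixels get `fill`."""
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_x, src_y = x - dx, y - dy
            if 0 <= src_x < width and 0 <= src_y < height:
                out[y][x] = image[src_y][src_x]
    return out


def augment(samples, shifts=((1, 0), (-1, 0), (0, 1), (0, -1))):
    """Return the original (image, label) pairs plus one shifted copy per shift."""
    augmented = list(samples)
    for image, label in samples:
        for dx, dy in shifts:
            augmented.append((translate(image, dx, dy), label))
    return augmented


print(translate([[1, 2], [3, 4]], 1, 0))  # [[0, 1], [0, 3]]
```

Each shifted copy keeps the original correct label, which is what makes the augmentation valid for image classification.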
  • When the amount of data used for learning is small, the information processing device 100 can improve inference accuracy by learning with a similar data set from which more data can be obtained, or with a time waveform data set obtained in larger quantity by similar sensors. Further, the information processing device 100 may perform learning by transfer learning or fine tuning on a small amount of acquired data, using the variables and weight matrices obtained through learning as initial values. When learning is performed in this manner, the number of data input to the information processing device 100 may be 100 or less.
  • Transfer learning is a method of changing the initial values of the variables and weight matrix elements and reducing the learning rate.
  • Fine tuning is a method of fixing the variables and weight matrices and learning only the fully connected layers.
  • Transfer learning and fine tuning are often used in combination; during the repeated calculations, the information processing device 100 may be configured to first perform fine tuning several times to optimize the parameters and then perform transfer learning. In such a case, not all variables and weight matrices need to be used as initial values; only some variables, weight matrices, and parameters may be shared.
  • the information processing device 100 may also perform semi-supervised learning.
  • The information processing device 100 may also learn by unsupervised learning, such as the self-supervised learning called contrastive learning, with correct labels provided later.
  • it is desirable that the number of learning data without correct labels is 1,000 or more for each correct label, and the number of data with correct labels is 100 or more.
  • the information processing apparatus 100 processes an N-value classification problem when N is an integer of 3 or more. Although there is no particular upper limit to N, the larger N becomes, the larger the data set is required for learning by the information processing device 100, and the amount of calculation required for learning also becomes larger, so it is desirable that N be as small as possible.
  • the data set is divided into training data, verification data, and test data, or simply into training data and test data, for each correct label.
  • For example, MNIST (Modified National Institute of Standards and Technology database) includes 60,000 pieces of learning data and 10,000 pieces of test data. Instead of using all 60,000 pieces as learning data, 50,000 pieces may be used as learning data and 10,000 pieces as verification data.
  • The data used for learning are preferably chosen randomly so that the training data, verification data, and test data each contain approximately the same number of samples for each of the N correct labels, avoiding bias due to the correct labels.
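Choosing training, verification, and test data randomly while keeping roughly the same number per correct label is a stratified split. A standard-library sketch; the 80/10/10 fractions and the fixed seed are illustrative assumptions, not values from the document.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into (train, verification, test) lists per correct label.

    Each label's samples are shuffled independently, so every split holds
    roughly the same share of every label; the test split takes the remainder.
    """
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * fractions[0])
        n_val = int(len(indices) * fractions[1])
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])
    return train, val, test
```

For the MNIST case above, the same function applied to the 60,000 learning labels with fractions `(5/6, 1/6, 0)` would give roughly the 50,000 / 10,000 split while keeping every digit represented in both parts.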
  • When part of the data is used as verification data, the information processing device 100 first performs learning using the learning data, then uses the data not used for learning as verification data and checks the accuracy of inference on it. This prevents the learning performed by the information processing device 100 from overfitting to the test data. However, using part of the data as verification data reduces the amount of data available for learning, so the accuracy of inference on the test data is likely to decrease.
  • FIG. 10 is a flow diagram illustrating an example of a neural network in deep learning for multi-value classification and binary classification.
  • Input data is first input to the input layer (step ST11). Feature extraction is performed in a hidden layer (step ST12), followed by processing with an activation function (step ST13).
  • After hidden-layer feature extraction (step ST14) and processing with an activation function (step ST15) are repeated multiple times, a fully connected operation is performed (step ST16), processing with an activation function is performed again (step ST17), and the result is output (step ST18).
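The flow of FIG. 10 (steps ST11 to ST18) can be mirrored by a tiny forward pass. This is a sketch only: dense layers with ReLU stand in for the hidden-layer feature extraction (which the document performs by convolution or similar), and the weights are made-up numbers.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, hidden_layers, output_layer):
    h = x                                   # ST11: input layer
    for weights, bias in hidden_layers:     # ST12/ST14: feature extraction (repeated)
        h = relu(dense(h, weights, bias))   # ST13/ST15: activation function
    logits = dense(h, *output_layer)        # ST16: full connection to N classes
    return softmax(logits)                  # ST17: activation; ST18: output


hidden = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
          ([[0.5, 0.5], [1.0, -1.0]], [0.0, 0.0])]
fc = ([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], [0.0, 0.0, 0.0])
probs = forward([1.0, -1.0], hidden, fc)
```

A learner that reports only the top label would return `probs.index(max(probs))`; returning the whole list keeps the accuracy for every label.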
  • the information processing device 100 that performs deep learning and other learning devices that perform general learning that is not deep learning output the same information. Further, the use of a loss function, optimization function, and error backpropagation is the same for the information processing device 100 that performs deep learning and other learning devices that perform general learning.
  • the learning device that performs general learning outputs the label with the maximum value (accuracy) after processing the input data using a softmax function as the inference result (classification result).
  • the first learning unit 11 differs in that a neural network is defined so that it can output classification results based on inference for all labels.
  • the information processing device 100 learns the N-value classification dataset in this way, that is, updates the variables, weight matrices, parameters, etc., and stores the updated learning results in the storage unit 20 of the information processing device 100.
  • The information processing device 100 causes the learning data generating unit 14 to generate second learning data by using part of the input data as first learning data and changing the correct labels of the first learning data.
  • the first data set has N types of correct labels as described above.
  • The case of N = 10 will be explained as an example, but N may be any other integer as long as it is 3 or more.
  • the information processing device 100 first selects one correct label (second correct label) from among ten types of correct labels.
  • The information processing device 100 converts the input data other than those with the selected correct label into data with one label (third correct label). For example, when generating the second learning data, the information processing device 100 first selects one of the ten integers 0 to 9 used as correct labels, for example 1, then groups the learning data whose correct labels are 0 and 2 to 9 (that is, other than 1) and assigns one correct label to the data corresponding to 0 and 2 to 9. For example, the information processing device 100 assigns a new correct label of 0 to the input data labeled 1 and a new correct label of 1 to the data labeled 0 and 2 to 9.
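The relabeling that turns the ten-class first data set into the binary second data set, and the count imbalance it produces, can be sketched as follows. A minimal illustration; the balanced toy labels (100 per class) are an assumption chosen to make the bias visible.

```python
from collections import Counter


def one_vs_rest(labels, selected):
    """New correct label 0 for the selected class and 1 for every other class."""
    return [0 if y == selected else 1 for y in labels]


first_labels = list(range(10)) * 100          # hypothetical: 100 samples per class
second_labels = one_vs_rest(first_labels, selected=1)
print(Counter(second_labels))                 # Counter({1: 900, 0: 100})
```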
  • FIG. 11 is a diagram illustrating an example of the second data set generated by the information processing apparatus 100.
  • The second data set (second learning data) is a data set used for learning by the second learning unit 12 and consists, for example, of data classified into two types with the correct labels 0 and 1 generated as described above.
  • The second data set is data classified with binary correct labels. Let $M_i$ denote the number of input data in the first data set whose correct label is $i$ ($M_0, M_1, \ldots, M_{N-1}$). When label $i_0$ is selected, the number of data classified into $i_0$ is $M_{i_0}$, and the number of data classified into the other category is expressed by equation (1):
    $$\sum_{i=0,\; i \neq i_0}^{N-1} M_i \qquad (1)$$
  • the second data set generated in this way becomes binary classification data in which the number is biased depending on the correct label.
  • Although the second data set is described as a binary classification data set and the first data set as an N-value classification data set, the second data set may be any M-value classification data set satisfying M ≤ N − 1.
  • When M is 3 or more, the number of data combinations is greater than when M is 2, and the amount of calculation when the information processing device 100 performs learning and inference increases. Unless there is a particular reason, it is therefore desirable to set M to 2.
  • the second learning unit 12 may use a combination of M-value classification and multi-value classification other than M-value classification.
  • The second learning unit 12 performs learning of M-value (M ≤ N − 1) classification.
  • A loss function such as hinge loss (Hinge Loss) may be used.
  • The hinge loss outputs 0 when $1 - t \cdot y$ is less than 0 and outputs $1 - t \cdot y$ when it is 0 or more, that is, $\max(0,\, 1 - t \cdot y)$. Note that t is the output result of the second learning unit 12 and y is the correct label.
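The hinge loss as defined above reduces to max(0, 1 − t·y). A sketch; hinge loss is conventionally paired with labels in {−1, +1}, so the mapping from the 0/1 labels of the second data set shown here (`to_signed`) is a hypothetical helper, not something the document specifies.

```python
def hinge_loss(t, y):
    """0 when 1 - t*y < 0, otherwise 1 - t*y (t: model output, y: correct label)."""
    return max(0.0, 1.0 - t * y)


def to_signed(label):
    """Hypothetical mapping of a 0/1 correct label to -1/+1."""
    return 1 if label == 1 else -1


def mean_hinge(outputs, labels):
    """Average hinge loss over a batch of outputs and 0/1 correct labels."""
    return sum(hinge_loss(t, to_signed(y)) for t, y in zip(outputs, labels)) / len(labels)


print(hinge_loss(2.0, 1), hinge_loss(0.5, 1), hinge_loss(0.5, -1))  # 0.0 0.5 1.5
```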
  • a sigmoid function, a log sigmoid function, or the like may be used as the nonlinear activation function immediately before the output layer.
  • the second learning unit 12 uses a softmax function similarly to the first learning unit 11.
  • In this case, the information processing device for binary classification outputs two values, applies a softmax function, and outputs the result by applying cross entropy (information entropy).
  • the sum of the two values before being input to the cross entropy becomes 1 due to the effect of the softmax function.
  • For example, the value becomes [0.63, 0.37].
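For the two-output case, the softmax makes the pair sum to 1 and the cross entropy then scores the probability given to the correct label. In this sketch the logits are hypothetical values chosen so that the softmax reproduces the [0.63, 0.37] example.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, correct):
    """Negative log of the probability assigned to the correct label."""
    return -math.log(probs[correct])


logits = [math.log(0.63), math.log(0.37)]   # hypothetical raw outputs
probs = softmax(logits)
print([round(p, 2) for p in probs])          # [0.63, 0.37]
print(round(cross_entropy(probs, 0), 3))     # 0.462
```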
  • With hinge loss, by contrast, the binary classification information processing device outputs a single value. Because of the hinge function, the result is a single value between 0 and 1, and the inferred label is determined by whether the value is close to 0 or close to 1.
  • the average binary classification accuracy on the test data set was 98.375% when the hinge function was used.
  • when the softmax function and cross entropy were used, the average was 98.694%, which is not much different.
  • the second learning unit 12 may perform deep learning or may perform learning using an algorithm other than deep learning.
  • the information processing device 100 is not limited to one in which both the first learning section 11 and the second learning section 12 perform deep learning.
  • the neural network used by the second learning unit 12 may be a deep learning neural network that is smaller than that of the first learning unit 11.
  • a small neural network is one with relatively few hidden layers and adjustable parameters. For example, MobileNet (about 3 million parameters) is a smaller neural network than ResNet18 (about 12 million parameters).
  • for CIFAR10 input, for example, the first learning unit 11 performs deep learning using ResNet50 as the neural network, and the second learning unit 12 performs deep learning using ResNet18. This shortens the calculation time required for learning and reduces the size of the trained model stored in the hardware. In this way, the information processing apparatus 100 exploits the fact that binary classification reaches high inference accuracy more easily than 10-value classification, even with a small network.
  • the second learning unit 12 may be composed of a plurality of binary classification learners. In that case, the different binary classification learners need not use the same machine learning algorithm; a different algorithm may be used for a learner whose inference accuracy is low.
  • for example, the second learning unit 12 performs learning using ResNet18, and if sufficient inference accuracy cannot be obtained, it switches the algorithm to ResNet32.
  • conversely, when a larger network already gives sufficient inference accuracy, the algorithm used may be switched to ResNet18, which is a smaller network.
  • it is desirable that the second learning unit 12 evaluate different networks with the same metrics, for example by using the same softmax function immediately before the output layer, or the same loss function, for all of them.
  • alternatively, the second learning unit 12 may define evaluation indicators and correction coefficients according to the function used, for example by performing calibration using the difference or variance between the first inference value and the second inference value in binary classification, or using their maximum and minimum values. In this way, the second learning unit 12 learns the binary classification problem and stores the learning results in the storage unit 20, such as the ROM, RAM, hard disk, or external storage medium of the information processing device. Furthermore, since the second learning unit 12 is lighter than the first learning unit 11 and performs multiple mutually similar operations, learning need not be performed on a single large computer as in conventional machine learning; it may instead be distributed across multiple small computers.
  • the first learning unit 11 performs a forward computation on the input data, which is a matrix, using the variables, weight matrices, and parameters obtained through learning.
  • the result of this calculation is the output of the softmax function used in the learning of the first learning unit 11, and this output represents the accuracy, that is, the probability, for each of the N classes.
  • the information processing device 100 selects the candidate with the highest accuracy among the N candidates as the classification result (inference result) of the first learning unit 11.
  • the information processing device 100 only needs to be able to calculate the likelihood for each of the N-value classifications, and may perform learning using an algorithm other than deep learning.
  • the candidate with the highest probability will be defined as the first inference candidate
  • the candidate with the second highest probability will be defined as the second inference candidate.
  • the value (accuracy) of the first inference candidate is smaller than a separately defined threshold (first threshold)
  • the value of the second inference candidate is less than the threshold (second threshold).
  • a feature of the information processing device 100 is that, in such cases, it outputs a classification result using the second learning unit 12, including when the value of the second inference candidate is large.
  • the first threshold value and the second threshold value may be the same value, or may be different values such that second threshold value ≤ first threshold value.
  • that is, the information processing device 100 presets a threshold value for judging the accuracy of inference, and when it determines that the accuracy of inference by the first learning unit 11 is low, it performs inference with the second learning unit 12, which improves inference accuracy.
  • the information processing device 100 performs inference using the second learning unit 12 when the accuracy of the first inference result is lower than the threshold value.
  • when the data input to the information processing device 100 is image data, the input data for which the accuracy of the first inference result is lower than the threshold value is referred to as first input image data.
  • the second learning unit 12 processes the first input image data.
  • the second learning unit 12 sequentially calls the trained models. For example, it calls all the trained models in turn: the binary classification of 0 vs. (1 to 9), the binary classification of 1 vs. (0, 2 to 9), the binary classification of 2 vs. (0 to 1, 3 to 9), and so on.
  • the information processing device 100 uses the second learning unit 12 to perform inference on the first input image data with all trained models. For each trained model, when the input is classified as that model's correct label (for example, classified as 0 in the binary classification of 0 vs. (1 to 9)), the information processing device 100 outputs the inference result and stores the content of the output in the storage unit 20.
  • when two or more inference results of the second learning unit 12 are classified as correct labels, the information processing device 100 outputs the inference result with the highest accuracy (when the softmax function is used, the one with the largest calculated value) as the inference result of the second learning unit 12 and stores it in the storage unit 20. When no inference result is classified as a correct label, the information processing device 100 outputs the label corresponding to the first inference result of the first learning unit 11. Note that this process takes a long time because the binary classification models are called one by one for the first input image data. For this reason, the information processing device 100 may use a parallel processing device such as a GPU to process, in subsets or batches, the input data whose accuracy is below the threshold value and which must be inferred by the second learning unit 12.
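The decision flow of Embodiment 1 can be sketched as below. This is a hedged illustration, not the device itself: it assumes each one-vs-rest model is a callable returning the probability that the input belongs to its own class, and it treats a probability above 0.5 as "classified as the correct label" (an assumption). `cascade_predict` and the model dictionary are hypothetical names.

```python
from typing import Callable, Dict, Sequence

# Hypothetical interface: a one-vs-rest model returns the probability that x
# belongs to its own class (the "correct label" of that binary model).
BinaryModel = Callable[[object], float]


def cascade_predict(x: object, first_probs: Sequence[float], threshold: float,
                    binary_models: Dict[int, BinaryModel]) -> int:
    """First learner if confident; otherwise consult every binary model."""
    first = max(range(len(first_probs)), key=lambda k: first_probs[k])
    if first_probs[first] > threshold:
        return first  # accuracy exceeds the threshold: keep the N-value result
    hits = {}
    for label, model in binary_models.items():
        p = model(x)
        if p > 0.5:  # assumed rule for "classified as the correct label"
            hits[label] = p
    if not hits:
        return first  # no binary model claimed x: fall back to the first result
    return max(hits, key=hits.get)  # highest-accuracy binary result
```

Calling every model one by one is what makes this step slow on a CPU; batching the below-threshold inputs is the parallelization the text refers to.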
  • the above-mentioned threshold is set by calculating the values of the first inference candidate and the second inference candidate for a plurality of inference results and statistically processing them, in accordance with the algorithm, loss function, etc. used in the first learning unit 11. For example, using the average value of the first inference candidates as the threshold value gives high inference accuracy in a simple way.
  • after the first learning unit 11 has been trained with the learning data, the information processing device 100 stores the accuracy of the first inference candidate in the storage unit 20 each time the first learning unit 11 performs inference. The information processing device 100 then uses the accuracy determination unit 16 to calculate the average accuracy of the past first inference candidates stored in the storage unit 20, and stores the result in the storage unit 20 as the threshold value.
  • the information processing device 100 may update the threshold value stored in the storage unit 20 with a newly calculated value each time the first learning unit 11 performs inference, or may calculate the threshold value from the results of inference by the first learning unit 11 on test data.
  • the information processing device 100 first performs inference on a plurality of input data using the first learning unit 11, and outputs an inference result (classification result). Based on the inference results output by the information processing device 100, the user determines whether or not each of the first inference candidates matches the correct label, and inputs the respective determination results to the information processing device 100.
  • the information processing device 100 uses the accuracy determination unit 16 to calculate, based on the determination results input by the user, the average accuracy when the first inference candidate matches the correct label, and the storage unit 20 stores the calculation result as the threshold value. In this way, the information processing apparatus 100 can easily obtain high inference accuracy by using the average accuracy of the first inference candidates.
  • the threshold value may also be, for example, the median, a percentile such as the 25th or 75th percentile, or a statistical value obtained by applying an exponential or logarithmic calculation to these values.
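The statistical threshold candidates above (average, median, 25th and 75th percentiles of the first-candidate accuracy) can be sketched with Python's standard `statistics` module. This is a minimal sketch under assumptions: inputs are per-sample softmax vectors, and when `labels` is given only correctly classified samples are used, mirroring the user-judged variant. `threshold_candidates` is a hypothetical name.

```python
import statistics
from typing import Dict, List, Optional, Sequence


def threshold_candidates(prob_rows: Sequence[Sequence[float]],
                         labels: Optional[Sequence[int]] = None) -> Dict[str, float]:
    """Candidate thresholds from the accuracy of the first inference candidate."""
    acc: List[float] = []
    for i, row in enumerate(prob_rows):
        top = max(range(len(row)), key=lambda k: row[k])
        if labels is None or labels[i] == top:
            acc.append(row[top])
    q1, _, q3 = statistics.quantiles(acc, n=4)  # needs at least two values
    return {"mean": statistics.fmean(acc), "median": statistics.median(acc),
            "p25": q1, "p75": q3}
```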
  • the threshold value may also be set to a statistical value lying between the average accuracy of the first inference candidate when the inference result of the first learning unit 11 equals the correct label and the average accuracy of the first inference candidate when the inference result of the first learning unit 11 differs from the correct label.
  • in this case, the information processing device 100 performs inference on a plurality of input data using the first learning unit 11 and outputs the inference results (classification results). Based on these results, the user determines whether each first inference candidate matches the correct label and inputs the determination results to the information processing device 100. Based on the user's determination results, the information processing device 100 uses the accuracy determination unit 16 to calculate the average accuracy when the first inference candidate matches the correct label and the average accuracy when it does not match.
  • the accuracy determination unit 16 sets a predetermined value between these two averages, and the storage unit 20 stores it as the threshold value. More specifically, the accuracy determination unit 16 calculates the midpoint (average) of the average accuracy when the first inference candidate matches the correct label and the average accuracy when it does not match, and the storage unit 20 stores the calculation result as the threshold value.
  • alternatively, the information processing device 100 first performs inference on a plurality of pieces of verification data using the first learning unit 11, and the accuracy determination unit 16 determines, based on the inference results, whether each of the first inference candidates matches the correct label. The accuracy determination unit 16 then calculates the average accuracy when the first inference candidate matches the correct label and the average accuracy when it does not match, sets a predetermined value between these two averages, and stores the value in the storage unit 20 as the threshold value.
  • more specifically, the accuracy determination unit 16 of the information processing device 100 calculates the midpoint (average) of the average accuracy when the first inference candidate matches the correct label and the average accuracy when it does not match, and the storage unit 20 stores the calculation result as the threshold value.
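The midpoint rule above can be sketched as follows, assuming per-sample softmax vectors and integer correct labels. `midpoint_threshold` is a hypothetical helper, not the accuracy determination unit 16 itself.

```python
from statistics import fmean
from typing import List, Sequence


def midpoint_threshold(prob_rows: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> float:
    """Midpoint between mean first-candidate accuracy on hits and on misses."""
    correct: List[float] = []
    wrong: List[float] = []
    for row, y in zip(prob_rows, labels):
        top = max(range(len(row)), key=lambda k: row[k])
        (correct if top == y else wrong).append(row[top])
    if not correct or not wrong:
        raise ValueError("need both correct and incorrect first candidates")
    return (fmean(correct) + fmean(wrong)) / 2
```

For example, if correct first candidates average 0.8 and incorrect ones average 0.6, the threshold becomes 0.7.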
  • the threshold value may be set so that the inference accuracy is maximized by a parameter sweep that continuously changes the threshold value.
  • the threshold value may be calculated using a parallel processing device such as a GPU. If the input data has a spatial or temporal bias, a statistically set threshold is likely to differ from one set by a parameter sweep, so calculating the optimal threshold value by the sweep can improve inference accuracy.
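A parameter sweep over the threshold can be sketched as below. It assumes, hypothetically, that the second-stage prediction for every sample has already been computed (`second_preds`), so each candidate threshold only changes which prediction is kept; the routed count corresponds to the number of data sent to binary classification.

```python
from typing import Optional, Sequence, Tuple


def sweep_threshold(prob_rows: Sequence[Sequence[float]],
                    second_preds: Sequence[int],
                    labels: Sequence[int],
                    thresholds: Sequence[float]) -> Tuple[float, float, int]:
    """Return (best threshold, combined accuracy, samples routed to stage 2)."""
    best: Optional[Tuple[float, float, int]] = None
    for t in thresholds:
        correct = routed = 0
        for row, p2, y in zip(prob_rows, second_preds, labels):
            top = max(range(len(row)), key=lambda k: row[k])
            if row[top] > t:  # confident: keep the first learner's answer
                pred = top
            else:  # at or below the threshold: use the second stage
                pred = p2
                routed += 1
            correct += int(pred == y)
        acc = correct / len(labels)
        if best is None or acc > best[1]:
            best = (t, acc, routed)
    if best is None:
        raise ValueError("thresholds must not be empty")
    return best
```

Each threshold is independent, so the loop parallelizes naturally, which is why a GPU helps here.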
  • in the examples above, the threshold value is constant regardless of the value of the first inference candidate. In the case of 10-value classification, a threshold value may instead be calculated from statistical information for each possible first inference candidate, i.e., 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. However, if few data are classified as errors because the inference accuracy is high or the amount of inference data is small (specifically, fewer than 100 data), the statistical information becomes less meaningful and it is not desirable to change the threshold for each inference candidate; in that case, it is preferable to use a constant threshold regardless of the value of the first inference candidate.
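Per-class thresholds with the fallback to a constant threshold can be sketched as follows. The per-class statistic here is the midpoint of the correct and incorrect means, which is an assumption (the text only says "statistical information"); `min_errors=100` follows the stated limit of 100 data. Function names are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def _midpoint(correct: List[float], wrong: List[float]) -> float:
    return (fmean(correct) + fmean(wrong)) / 2


def per_class_thresholds(prob_rows: Sequence[Sequence[float]], labels: Sequence[int],
                         n_classes: int, min_errors: int = 100) -> Dict[int, float]:
    """Threshold per first inference candidate; constant fallback if errors are few."""
    correct: Dict[int, List[float]] = {k: [] for k in range(n_classes)}
    wrong: Dict[int, List[float]] = {k: [] for k in range(n_classes)}
    for row, y in zip(prob_rows, labels):
        top = max(range(n_classes), key=lambda k: row[k])
        (correct if top == y else wrong)[top].append(row[top])
    all_c = [a for v in correct.values() for a in v]
    all_w = [a for v in wrong.values() for a in v]
    global_t = _midpoint(all_c, all_w)  # requires at least one error overall
    return {k: (_midpoint(correct[k], wrong[k])
                if len(wrong[k]) >= min_errors and correct[k] else global_t)
            for k in range(n_classes)}
```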
  • when a threshold is also applied to the second inference candidate, statistical methods such as the average value or the median may be used; if the inference time and computational resources available for inference allow, determining the threshold for the second inference candidate by a parameter sweep is also effective.
  • when a parallel processing device such as a GPU cannot be used, the second learning unit 12 need not perform inference on all first input data that has fallen below the threshold, in order to reduce calculation time. It is also desirable to use the second learning unit 12 only when the first learning unit 11 has output a label identified in advance as one that is likely to be mistaken.
  • FIG. 12 is a diagram showing, for each threshold value, the number of test data, out of the 10,000 CIFAR10 test data, for which the information processing apparatus 100 performed binary classification.
  • CIFAR10 was used as the data set input to the information processing device 100.
  • CIFAR10 is a data set of 50,000 training images and 10,000 test images classified into 10 classes: airplane, car, bird, cat, deer, dog, frog, horse, ship, and truck.
  • no validation data was created; the 50,000 training data were input to the information processing device 100, and the first learning unit 11 was trained using ResNet50, a CNN method.
  • ResNet50 is composed of 48 convolution layers, 1 maximum value pooling layer, and 1 average value pooling layer.
  • other loss functions, such as the Poisson negative log-likelihood loss (Poisson regression), MSE (mean squared error), or MAE (mean absolute error), may also be used.
  • although Adam with a learning rate of 0.01 was used as the optimization function, any other function such as momentum, RMSprop, or SGD (stochastic gradient descent) may be used, or a custom error function may be defined.
  • the Step LR function was used as the scheduler that changes the learning rate. Many other schedulers, such as the Cosine Annealing LR and Cyclic LR functions, are known, and as with the loss function and optimization function, any of them may be used as long as the inference accuracy for test data can be ensured. Xavier initial values were used for the convolution weight matrices, that is, the initial values of the filters.
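The two training settings named above can be illustrated numerically. This is a hedged sketch: the text does not give the Step LR step size or decay factor, so `step_size` and `gamma` are hypothetical parameters, and the Xavier initialization is assumed to be the uniform variant for a square convolution kernel.

```python
import math


def step_lr(base_lr: float, epoch: int, step_size: int, gamma: float = 0.1) -> float:
    """Step-decay schedule: multiply the rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def xavier_uniform_bound(in_channels: int, out_channels: int, kernel_size: int) -> float:
    """Bound b of the Xavier (Glorot) uniform initializer U(-b, b) for a conv filter."""
    receptive_field = kernel_size * kernel_size
    fan_in = in_channels * receptive_field
    fan_out = out_channels * receptive_field
    return math.sqrt(6.0 / (fan_in + fan_out))
```

With a base rate of 0.01 and a hypothetical step size of 30 epochs, the rate stays at 0.01 through epoch 29 and drops to 0.001 at epoch 30.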
  • the inference accuracy of the first learning unit 11 on the test data set was confirmed to be 86.28%.
  • the inference value takes a real number between 0 and 1
  • the number of data whose first inference candidate falls below the threshold, counted for threshold values between 0.30 and 0.99, is shown in FIG. For example, at a threshold of 0.9, 2,617 of the 10,000 test data are inferred by binary classification.
  • starting from the first data set, ten binary classification data sets were created: airplane vs. others, car vs. others, bird vs. others, cat vs. others, deer vs. others, dog vs. others, frog vs. others, horse vs. others, ship vs. others, and truck vs. others. For example, in the airplane vs. others data set, the correct label for airplanes was defined as 0 and the correct label for all other classes as 1. The airplane class therefore has 5,000 images and the other class has 45,000 images.
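The relabeling above can be sketched on label lists alone. This is a minimal sketch that assumes integer class indices (0 = airplane) and uses a synthetic list with 5,000 labels per class to stand in for the CIFAR10 training labels; `one_vs_rest` and `build_all` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Sequence


def one_vs_rest(labels: Sequence[int], positive: int) -> List[int]:
    """Relabel: the positive class becomes 0, every other class becomes 1."""
    return [0 if y == positive else 1 for y in labels]


def build_all(labels: Sequence[int], n_classes: int) -> Dict[int, List[int]]:
    """One relabeled copy per class, i.e. N binary data sets."""
    return {k: one_vs_rest(labels, k) for k in range(n_classes)}


# Synthetic stand-in for CIFAR10 training labels: 5,000 per class.
train_labels = [k for k in range(10) for _ in range(5000)]
counts = Counter(one_vs_rest(train_labels, positive=0))  # 0 = airplane
```

The resulting counts are 5,000 for label 0 and 45,000 for label 1, which is the imbalance noted earlier for the second data set.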
  • the second learning unit 12 used ResNet18, a CNN method. A hinge loss was used as the loss function, but any type of loss function may be used, including a custom error function. Similarly, although Adam with a learning rate of 0.01 was used as the optimization function, any other function may be used. The Cosine Annealing Warm Restarts function was used as the scheduler that changes the learning rate, but as with the loss function and optimization function, any function may be used as long as the inference accuracy for the test data can be ensured. As with the first learning unit 11, Xavier initial values were used for the convolution weight matrices, that is, the initial values of the filters.
  • the binary classification accuracy on the test data set was 97.01% for airplane, 98.90% for car, 96.02% for bird, 94.85% for cat, 96.96% for deer, 96.31% for dog, 98.36% for frog, 98.35% for horse, 98.71% for ship, and 98.30% for truck.
  • FIG. 13 is a diagram showing experimental data of inference results when the information processing device uses binary classification for CIFAR10 and when it does not.
  • the inference method is the same as the method explained using FIG.
  • the standard for comparison is the inference accuracy of 86.28% when only the first learning unit 11 is used.
  • FIG. 13 shows the inference results using the first learning section 11 and the second learning section 12 when the threshold value for the first inference candidate was changed from 0.3 to 0.99.
  • it can be seen that as the threshold value increases and more data are classified by binary classification, the inference accuracy improves, reaching a maximum of 88.70% at a threshold value of 0.85.
  • FIG. 14 shows the inference time for the threshold.
  • FIG. 14 is a diagram showing experimental data on the time required for the information processing apparatus 100 to infer the 10,000 CIFAR10 test data, for each threshold value. The inference was not parallelized on a GPU but computed sequentially on the CPU. The results show that inference completes in 6 seconds when binary classification is not used, whereas at a threshold of 0.86 the inference takes 570 seconds, about 100 times longer. Most of this time is spent loading the trained models from the ROM, so if parallelization is not possible, it is desirable to load the trained binary classification models into the RAM. FIG. 14 also shows the results of collecting the data below the threshold and processing them on the GPU. At the most time-consuming threshold of 0.99, the CPU takes 1119 seconds while the GPU takes 16.6 seconds, a reduction of 98.5%. This result is also not much different from the 3 seconds required when no threshold is used.
  • the size of the trained models in this case is 103 MB for the 10-value classification and 47 MB × 10 for the binary classifications, which is sufficiently small given the memory of recent GPUs.
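The figures quoted above can be checked with simple arithmetic. The inputs are taken directly from the text; the variable names are illustrative only.

```python
cpu_seconds, gpu_seconds = 1119.0, 16.6  # threshold 0.99, sequential CPU vs. GPU
reduction = (cpu_seconds - gpu_seconds) / cpu_seconds
slowdown = 570 / 6  # threshold 0.86 vs. no binary classification (CPU)
total_model_mb = 103 + 47 * 10  # 10-value model plus ten binary models

print(f"time reduction: {reduction:.1%}")  # time reduction: 98.5%
print(f"slowdown: {slowdown:.0f}x")  # slowdown: 95x
print(f"total model size: {total_model_mb} MB")  # total model size: 573 MB
```

The 95× slowdown is what the text rounds to "about 100 times longer".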
  • N parallel ASICs may be prepared and each calculation unit may perform binary classification inference in parallel.
  • ResNet50 and ResNet18 have larger file sizes, that is, more weight-matrix parameters, than networks such as EfficientNet or MobileNet at the same inference accuracy, so if file size becomes a problem, it can be solved simply by changing the model.
  • the information processing device 100 outputs the classification result of the first classification unit 11C when the accuracy of inference by the first classification unit 11C exceeds a preset threshold, and outputs the classification result of the second classification unit 12C, which classifies into fewer classes than the first classification unit 11C, when that accuracy is less than or equal to the threshold. This makes it possible to improve the accuracy of inference from input data regardless of the amount of input data used to generate the model.
  • in addition, the amount of calculation required to achieve the same inference accuracy as conventional methods can be reduced, which reduces computational resources, shortens training time, and lowers costs.
  • the amount of data required to obtain the same inference accuracy as conventional methods can also be reduced, which not only allows machine learning devices to learn with a low-cost, simple device configuration but also lowers the hurdles to using machine learning. The difference is especially noticeable for neural networks, which require a lot of data.
  • a conventional large-scale machine learning device for a single N-value classification had to be trained on one large-scale computer. Here, the N-value classification learner is made smaller, and the learning of the multiple M-value classification devices can instead be distributed across different small computers, for example computers without dedicated hardware such as a GPU, which makes machine learning devices easier to use.
  • Embodiment 2 <Inference by the second learning unit>
  • this embodiment is characterized in that, when the accuracy of inference by the first learning unit 11 is less than or equal to a threshold value, the first learning unit 11 passes the first inference candidate, which it inferred and selected as having the highest accuracy, to the second learning unit 12.
  • the second learning unit 12 is a device trained using the data sets composed of binary combinations described in Embodiment 1. It first makes a determination using the trained model learned on the first inference candidate vs. other data. If the determination gives a result different from the first inference candidate, the second learning unit 12 performs inference using all combinations and selects the most accurate result as its inference result.
  • for example, when the first inference candidate is airplane, the second learning unit 12 performs inference using the binary classifier learned on the airplane vs. others data set. If the inference result is airplane, that is, if the accuracy of the first inference candidate class calculated by the second accuracy calculation unit 12B (first accuracy) is higher than the accuracy of the other class (second accuracy), the second learning unit 12 outputs airplane, i.e., the first inference candidate. If the inference result is "others", inference is performed using all the learners (airplane vs. others, car vs. others, bird vs. others, cat vs. others, deer vs. others, dog vs. others, frog vs. others, horse vs. others, ship vs. others, and truck vs. others); the inference candidates whose result is not "others" are compared, and the inference result is determined from the comparison.
  • depending on the output function, the inference result is the candidate with the smallest value or the one with the largest value.
  • if no learner yields a result other than "others", the first inference candidate is output as the inference result of the second learning unit 12.
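The Embodiment 2 flow can be sketched as below. As in the Embodiment 1 sketch, it assumes each one-vs-rest model is a callable returning the probability of its own class and that a value above 0.5 means "not others"; these, and the name `embodiment2_predict`, are assumptions. Unlike Embodiment 1, only one model is called when the first candidate is confirmed.

```python
from typing import Callable, Dict

BinaryModel = Callable[[object], float]  # hypothetical one-vs-rest interface


def embodiment2_predict(x: object, first_candidate: int,
                        binary_models: Dict[int, BinaryModel]) -> int:
    """Check the first candidate's model; run all models only if it is rejected."""
    if binary_models[first_candidate](x) > 0.5:  # first accuracy > second accuracy
        return first_candidate
    scores = {k: m(x) for k, m in binary_models.items()}
    hits = {k: p for k, p in scores.items() if p > 0.5}  # results other than "others"
    return max(hits, key=hits.get) if hits else first_candidate
```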
  • Embodiment 3 <Data used by the second learning unit>
  • in Embodiment 1, the number of data sets used in the second learning unit 12 is N for N-value classification.
  • in this embodiment, from a data set of N-value classification, any L (third number) correct labels (first correct labels) are selected, where L is a natural number less than or equal to N, and a second data set is constructed using the input data having those L correct labels.
  • FIG. 15 shows an example of the structure of some data sets. As shown in FIG. 15, L correct labels are selected from among the N-value classifications, and a data set for L-value classification is created. Therefore, the following A data sets are created.
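  • the case where N = 10 and L = 2 is explained here, but other integers may be used.
  • the number of data sets is $A = \binom{N}{L} = \frac{N!}{L!\,(N-L)!}$, which gives A = 45 for N = 10 and L = 2.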
  • the second learning unit 12 then requires 45 learners, the same number as the data sets, and for some of them the inference accuracy on the test data sets, which are not used as learning data, may deteriorate. In that case, the algorithm may be changed to one that gives higher accuracy. Conversely, the accuracy for a test data set may reach 100%, in which case the calculation time and amount can be reduced by changing to a simpler algorithm, as in Embodiment 1. Therefore, the second learning unit 12 may use an algorithm that differs not only from that of the first learning unit 11 but also from one data set to another. In that case, as described in Embodiment 1, it is desirable to use the same loss function and the same activation function immediately before the output layer.
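Building one subset per combination of L correct labels can be sketched with `itertools.combinations`. The `(input, label)` sample format and the name `pairwise_datasets` are assumptions for illustration; `math.comb` confirms the counts A = 45 for L = 2 and 120 for L = 3 used in this embodiment.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[object, int]  # hypothetical (input, correct label) pair


def pairwise_datasets(samples: Sequence[Sample], n_classes: int,
                      n_labels: int = 2) -> Dict[Tuple[int, ...], List[Sample]]:
    """One subset per combination of n_labels correct labels (A = C(N, L) subsets)."""
    return {combo: [(x, y) for x, y in samples if y in combo]
            for combo in combinations(range(n_classes), n_labels)}


print(math.comb(10, 2), math.comb(10, 3))  # 45 120
```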
  • FIG. 16 shows the results of learning binary classification using the method based on this embodiment using CIFAR10 and performing inference using the test data set for each binary classification.
  • the class numbers are 0: airplane, 1: car, 2: bird, 3: cat, 4: deer, 5: dog, 6: frog, 7: horse, 8: ship, and 9: truck.
  • although the inference accuracy is generally above 90%, the accuracy for classifying cats (3) vs. dogs (5) is low at 84.5%. For such problems, it is desirable to use a larger network or, in the case of images, to increase the inference accuracy by data augmentation.
  • the learned parameters of the second learning section 12 are saved, and when the certainty of the output result of the first learning section 11 becomes less than the threshold, the second learning section 12 performs inference. This is what we do.
  • to reduce the amount of calculation, as in Embodiment 1, the second learning unit 12 need not be used for all data that has fallen below the threshold.
  • binary classification may be used only when the first inference result is a class value that is likely to be mistaken, to reduce the calculation time.
  • for example, the second learning unit 12 may be used only when cat, dog, ship, or airplane is the first inference candidate. It is desirable to evaluate susceptibility to mistakes by first performing inference and quantifying the combinations that appear in the incorrect data.
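Quantifying mistaken combinations can be sketched by counting unordered (predicted, correct) pairs over the incorrect data and taking the labels in the most frequent pairs. This is a hedged sketch; the choice of the top-k pairs as the "likely to be mistaken" set is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence, Set


def confusion_pairs(predictions: Sequence[int], labels: Sequence[int]) -> Counter:
    """Count unordered (predicted, correct) pairs over the incorrect data."""
    return Counter(tuple(sorted((p, y)))
                   for p, y in zip(predictions, labels) if p != y)


def error_prone_labels(pairs: Counter, top: int) -> Set[int]:
    """Labels appearing in the `top` most frequent confusion pairs."""
    return {c for pair, _ in pairs.most_common(top) for c in pair}
```

With CIFAR10, one would expect pairs such as cat/dog (3, 5) to dominate, matching the low 84.5% accuracy for that pair.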
  • ternary or higher classification may also be used, because inference accuracy improves as the number of classes decreases. However, with ternary or higher classification the number of combinations increases; dividing 10-value classification into 3-value classifications requires 120 second learning units 12. Therefore, as described above, it is necessary to reduce the amount of calculation required for inference by using the second learning unit 12 only when the inference is for a label that is likely to be mistaken.
  • Embodiment 4 <Inference by the second learning unit>
  • this embodiment is characterized in that, when the inference result of the first learning unit 11 is below the threshold, the first learning unit 11 passes the top two candidates by accuracy, namely the first inference candidate and the second inference candidate, to the second learning unit 12.
  • the second learning unit 12 performs inference using the N trained models for binary classification described in Embodiment 1 or the A trained models for binary classification described in Embodiment 3.
  • for example, when the first inference candidate is 5, the second inference candidate is 6, and the N one-vs-rest models are used, inference is performed with the trained model learned on the second data set consisting of 5 and others. If the inference result is 5, 5 is output; otherwise, inference is performed with the trained model learned on the second data set consisting of 6 and others, and 6 is output when the probability of being classified as 6 (third probability) is higher than the probability of being classified as other than 6 (fourth probability). When the N binary classifications are used and computational resources are sufficient, inference may also be performed with both the 5 and 6 trained models and the certainties of the two results compared, outputting the more probable one, for example 5.
  • when the first inference candidate is 5 and the second inference candidate is 6 and the pairwise models are used, inference is performed with the trained model learned on the second data set composed of 5 and 6. Since either 5 or 6 is then the most accurate result, that inference result, for example 5, is output. In this embodiment, the top two inference candidates of the first learning unit 11 are passed on, but the top P candidates may be passed to the second learning unit 12 instead. As above, when N trained models for binary classification are used, the most probable of the top P inference results is output.
  • when the order of the inference candidates of the first learning unit 11 (the inference values sorted by certainty, such as the third and fourth inference candidates) is available, inference is performed in that order: if the second inference candidate results in "others", the third inference candidate is used, and if the third results in "others", the fourth is used. When a result other than "others" is obtained, that inference value can be sent as the inference result of the second learning unit 12. However, if all the second-stage inference results are "others", the first inference candidate is output as the inference value.
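The sequential top-P check with one-vs-rest models can be sketched as follows. It assumes, hypothetically, that each binary model returns the probability of its own class and that a value above 0.5 confirms the candidate (i.e., the result is not "others"); `embodiment4_predict` is an illustrative name.

```python
from typing import Callable, Dict, Sequence

BinaryModel = Callable[[object], float]  # hypothetical one-vs-rest interface


def embodiment4_predict(x: object, probs: Sequence[float],
                        binary_models: Dict[int, BinaryModel], p: int = 2) -> int:
    """Try the top-P candidates in order of certainty; fall back to the first."""
    ranked = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)[:p]
    for cand in ranked:
        if binary_models[cand](x) > 0.5:  # candidate confirmed, not "others"
            return cand
    return ranked[0]  # every check returned "others"
```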
  • In Embodiment 5, a method of determining the threshold value is explained.
  • The threshold value is characterized in that it is obtained by statistically processing the N-valued outputs produced during inference by the first learning unit 11. For example, if inference is performed on 10,000 test data and the first learning unit 11 answers 9,000 of them correctly, collecting only the correct answers yields a 9,000 × N matrix, which is called the correct matrix. Likewise, collecting only the incorrect answers yields a 1,000 × N matrix, which is called the error matrix. Each matrix is then rearranged so that, for example, lower column numbers hold higher probabilities, giving a 9,000 × N correct matrix and a 1,000 × N error matrix whose 1st column holds the maximum value and whose Nth column holds the minimum value.
  • In other words, each matrix is created by arranging the softmax outputs for each data item in order of magnitude. For simplicity, the following explanation assumes that the 1st column holds the first inference candidate.
  • Alternatively, the ordering may be reversed, with the minimum value in the 1st column and the maximum value, i.e. the first inference candidate, in the Nth column.
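A minimal NumPy sketch of how the correct matrix and error matrix could be built from the softmax outputs of the first learning unit 11, using the maximum-first ordering. The function name and the assumption that ground-truth labels are available as integer class indices are illustrative, not from the source.

```python
import numpy as np


def build_correct_error_matrices(probs: np.ndarray, labels: np.ndarray):
    """Split softmax outputs into the correct matrix and the error matrix.

    probs:  (num_samples, N) softmax outputs of the first learning unit.
    labels: (num_samples,) ground-truth class indices.
    Each row is sorted in descending order, so column index 0 (the document's
    "1st column") holds the first inference candidate's probability and
    column index N-1 holds the minimum.
    """
    is_correct = probs.argmax(axis=1) == labels
    sorted_rows = -np.sort(-probs, axis=1)  # descending sort within each row
    return sorted_rows[is_correct], sorted_rows[~is_correct]
```

With 10,000 test samples of which 9,000 are classified correctly, the two returned arrays would have shapes (9000, N) and (1000, N).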
  • FIG. 16 shows the average values of the inference results of the first learning unit 11 described in Embodiment 1, which achieves an inference accuracy of 86.28% on CIFAR10.
  • The solid line in the figure shows the average values of the correct matrix, and the broken line shows those of the error matrix.
  • When the average value is used, it is desirable to set the threshold to a value between the average of the 1st column of the correct matrix and that of the 1st column of the error matrix. For example, in FIG. 16 the 1st-column value of the correct matrix is 0.93 and that of the error matrix is 0.70, so it is desirable to set the threshold between 0.70 and 0.93.
  • Within this range, the threshold value may be chosen according to the available computational resources, computation time, and required accuracy.
  • Comparing this range with the calculation accuracy versus threshold shown in FIG. 12, the maximum in FIG. 12 occurs at a threshold of 0.85, which lies between 0.70 and 0.93.
  • FIG. 17 shows the results of calculating the median value for the above correct matrix and error matrix.
  • When the median value is used, it is likewise desirable to set the threshold to a value between the medians of the 1st column of the correct matrix and the 1st column of the error matrix, that is, between 0.56 and 0.96. Here too, the threshold of 0.85 that gives the maximum in FIG. 12 falls within this range.
  • As with the average value, a large threshold value is desirable, but the threshold may be determined according to the available computational resources, computation time, and required accuracy.
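The two candidate ranges (between the 1st-column averages and between the 1st-column medians) could be computed as below. This is a sketch assuming the matrices are NumPy arrays sorted in descending order per row, with the document's 1st column at index 0; the function name is hypothetical.

```python
import numpy as np


def first_column_intervals(correct_matrix: np.ndarray, error_matrix: np.ndarray):
    """Candidate threshold ranges from the 1st column (top-1 probability).

    Returns two (low, high) pairs: one between the column averages of the
    error and correct matrices, one between their column medians.
    """
    c = correct_matrix[:, 0]
    e = error_matrix[:, 0]
    means = sorted((float(e.mean()), float(c.mean())))
    medians = sorted((float(np.median(e)), float(np.median(c))))
    return tuple(means), tuple(medians)
```

According to the text, the statistics behind FIG. 16 and FIG. 17 give ranges of 0.70 to 0.93 (averages) and 0.56 to 0.96 (medians).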
  • These values were obtained by training ResNet50 on CIFAR10. The values will differ for data other than images, for images whose features are extracted by other algorithms, or for a different definition of the loss function, but even then it is desirable to determine the threshold by the method described above.
  • For example, suppose that the average value of the 1st column of the correct matrix is 0.8, the average value of the 1st column of the error matrix is 0.6, the median value of the 1st column of the correct matrix is 0.9, and the median value of the 1st column of the error matrix is 0.5.
  • In that case, the upper limit of the threshold may be set to 0.8, the average value of the 1st column of the correct matrix, and the lower limit to 0.5, the median value of the 1st column of the error matrix. Setting the threshold in the range 0.5 to 0.8 is thus also desirable.
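The combined rule in this example can be written compactly. Here $C$ and $E$ denote the correct and error matrices, $C_{:,1}$ and $E_{:,1}$ their 1st columns, and $\theta$ the threshold; this notation is introduced for illustration and does not appear in the source:

$$
\operatorname{median}(E_{:,1}) \;\le\; \theta \;\le\; \operatorname{mean}(C_{:,1})
\quad\Longrightarrow\quad 0.5 \le \theta \le 0.8
$$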
  • Embodiment 6 <Threshold of the first learning unit>
  • Embodiment 5 derived the threshold from the 1st column of the correct matrix and error matrix.
  • Embodiment 6 describes a method of deriving the threshold from statistics of the 2nd column, which holds the second-largest value, of the same correct matrix and error matrix.
  • The calculation is based on the average value and median value of the 2nd column.
  • Based on the average value, the 2nd-column value is 0.047 for the correct matrix and 0.207 for the error matrix, so it is desirable to set the threshold between 0.047 and about 0.21.
  • Based on the median value, the 2nd-column value is 0.00025 for the correct matrix and 0.0953 for the error matrix, so it is desirable to set the threshold between 0.00025 and 0.0953.
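The same "value between the two statistics" rule can be applied to any column; a small generalization is sketched below for the 2nd column. It assumes descending-sorted NumPy matrices (0-based indexing, so the document's 2nd column is `column=1`); the function name and the `stat` parameter are hypothetical. Note that for the 2nd column the correct matrix gives the smaller value, which is why the endpoints are ordered with `min`/`max`.

```python
import numpy as np


def column_interval(correct_matrix, error_matrix, column, stat=np.mean):
    """(low, high) range between a statistic of one column of the correct
    matrix and the same statistic of the error matrix. `column` is 0-based,
    so the document's "2nd column" is column=1."""
    a = float(stat(correct_matrix[:, column]))
    b = float(stat(error_matrix[:, column]))
    return min(a, b), max(a, b)
```

Passing `stat=np.median` gives the median-based range in place of the average-based one.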
  • Alternatively, the difference between the first inference candidate and the second inference candidate may be used.
  • The average of this difference over the correct matrix is called the correct average,
  • and the average of this difference over the error matrix is called the error average.
  • The correct average is always larger than the error average, so the threshold can also be defined as a value greater than or equal to the error average and less than or equal to the correct average.
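The correct average and error average of the difference between the first and second inference candidates could be computed as follows. This minimal sketch assumes descending-sorted NumPy matrices, so the two candidates sit at column indices 0 and 1; the function name is hypothetical.

```python
import numpy as np


def margin_interval(correct_matrix, error_matrix):
    """Threshold range from the difference between the first and second
    inference candidates. Returns (error_average, correct_average); the
    threshold may be chosen with error_average <= threshold <= correct_average."""
    correct_avg = float(np.mean(correct_matrix[:, 0] - correct_matrix[:, 1]))
    error_avg = float(np.mean(error_matrix[:, 0] - error_matrix[:, 1]))
    return error_avg, correct_avg
```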
  • The average and median values of the first inference candidate may also be combined with those of the second inference candidate: a value between the average of the first inference candidate and the average of the second inference candidate, or a value between the median of the first inference candidate and the median of the second inference candidate, may be used as the threshold value.
  • Embodiment 7 <Threshold of the first learning unit>
  • The correct matrix and error matrix shown in Embodiments 5 and 6 are created from the results of inference performed by the first learning unit 11 on all the test data. However, when the test data are large or computational resources are limited, the computation time and amount of computation required for inference become large. Furthermore, when a device capable of parallel processing, such as a GPU, is used, test data are commonly fed to the first learning unit 11 in batches, even during inference, rather than one by one. The batch size depends on the amount of memory available on the GPU or similar device.
  • In Embodiment 7, instead of performing statistical processing after inference on all the test data is complete, the correct matrix and error matrix are calculated from part of the test data or from the result of a single batch. For example, with 10,000 pieces of test data, the correct matrix and error matrix are created once 1,000 pieces of data have been collected, or, when 1,000 pieces of data are batched and fed into a device capable of parallel processing, from the results of that one batch.
  • Inference may be performed using the binary classification device shown in Embodiments 1 to 4.
  • In the above process, the correct matrix and error matrix are recalculated each time one set or one batch has been processed.
  • This method is effective when the correct labels of the test data are unevenly distributed, for example, for CIFAR10, when a set or batch contains many photos of airplanes.
  • When the test data are ordered sufficiently randomly, the following method can also be used: the threshold derived from the correct matrix and error matrix calculated from one set or one or more batches is applied to the remaining test data as well. This holds when that set or those batches form a subset whose distribution is close to that of the entire test data, and it reduces the amount of computation required for inference and shortens the inference time.
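Both batch-based variants (recomputing the threshold for every batch, or computing it once from an early batch and reusing it) can be sketched as below. The specific threshold rule here, the midpoint between the average top-1 probability of misclassified and correctly classified samples, is an illustrative assumption: the source only asks for a value between the two statistics. Function names are hypothetical, and labels are assumed to be known for the test data.

```python
import numpy as np


def threshold_from_batch(probs, labels):
    """Estimate a 1st-column threshold from one batch of softmax outputs.

    Assumed rule (illustrative only): midpoint between the average top-1
    probability of misclassified samples and that of correctly classified
    samples. Returns None if the batch lacks either group.
    """
    top1 = probs.max(axis=1)
    correct = probs.argmax(axis=1) == labels
    if correct.all() or not correct.any():
        return None  # cannot form both a correct and an error matrix
    return 0.5 * (float(top1[~correct].mean()) + float(top1[correct].mean()))


def thresholds_per_batch(probs, labels, batch_size):
    """Per-batch variant: recompute the threshold after every batch."""
    return [threshold_from_batch(probs[i:i + batch_size], labels[i:i + batch_size])
            for i in range(0, len(probs), batch_size)]
```

For the reuse variant on well-shuffled data, one would call `threshold_from_batch` once on the first batch (for example the first 1,000 samples) and apply the result to the remaining data.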
  • The information processing device can be used to classify input data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)
PCT/JP2022/014203 2022-03-25 2022-03-25 Information processing device and information processing method Ceased WO2023181318A1 (ja)

Priority Applications (5)

Application Number Priority Date Filing Date Title
PCT/JP2022/014203 WO2023181318A1 (ja) 2022-03-25 2022-03-25 Information processing device and information processing method
DE112022006518.4T DE112022006518T5 (de) 2022-03-25 2022-03-25 Information processing device and information processing method
CN202280093861.1A CN118891641A (zh) 2022-03-25 2022-03-25 Information processing device and information processing method
JP2024503517A JP7483172B2 (ja) 2022-03-25 2022-03-25 Information processing device and information processing method
US18/822,999 US20240428140A1 (en) 2022-03-25 2024-09-03 Information processing device and information processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/014203 WO2023181318A1 (ja) 2022-03-25 2022-03-25 Information processing device and information processing method

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/822,999 Continuation US20240428140A1 (en) 2022-03-25 2024-09-03 Information processing device and information processing method

Publications (1)

Publication Number Publication Date
WO2023181318A1 true WO2023181318A1 (ja) 2023-09-28

Family

ID=88100846

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/014203 Ceased WO2023181318A1 (ja) 2022-03-25 2022-03-25 Information processing device and information processing method

Country Status (5)

Country Link
US (1) US20240428140A1
JP (1) JP7483172B2
CN (1) CN118891641A
DE (1) DE112022006518T5
WO (1) WO2023181318A1

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
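JP7348103B2 (ja) * 2020-02-27 2023-09-20 Hitachi, Ltd. Operating state classification system and operating state classification method
CN119924848B (zh) * 2025-03-27 2026-04-17 Hubei University of Technology Electrocardiogram signal classification method and device, and electrocardiogram monitor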

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11202888A (ja) * 1997-12-19 1999-07-30 Mitsubishi Electric Inf Technol Center America Inc Markov model discriminator using negative samples
US20110251989A1 (en) * 2008-10-29 2011-10-13 Wessel Kraaij Electronic document classification apparatus
JP2018528521A (ja) * 2015-07-31 2018-09-27 Qualcomm Incorporated Media classification

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013117861A (ja) 2011-12-02 2013-06-13 Canon Inc Learning device, learning method, and program

Also Published As

Publication number Publication date
JP7483172B2 (ja) 2024-05-14
US20240428140A1 (en) 2024-12-26
DE112022006518T5 (de) 2024-11-28
JPWO2023181318A1 2023-09-28
CN118891641A (zh) 2024-11-01

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22933458

Country of ref document: EP

Kind code of ref document: A1

WWE WIPO information: entry into national phase

Ref document number: 2024503517

Country of ref document: JP

WWE WIPO information: entry into national phase

Ref document number: 202280093861.1

Country of ref document: CN

WWE WIPO information: entry into national phase

Ref document number: 112022006518

Country of ref document: DE

122 EP: PCT application non-entry in European phase

Ref document number: 22933458

Country of ref document: EP

Kind code of ref document: A1