WO2018122931A1

WO2018122931A1 - Information processing device, method, and program

Info

Publication number: WO2018122931A1
Application number: PCT/JP2016/088752
Authority: WO
Inventors: 真吏佳金子
Original assignee: 株式会社Pfu
Priority date: 2016-12-26
Filing date: 2016-12-26
Publication date: 2018-07-05

Abstract

The present invention addresses the problem of more accurately determining whether or not new target data is unknown classification target data. An information processing device 1 is provided with: a feature extraction unit 23 that generates feature data by extracting features from new target data; a clustering unit 25 that clusters a set of target data comprising classified target data and the new target data into clusters the number of which is a number obtained by adding one to the number of classifications to which the classified target data have been classified, on the basis of feature data of the classified target data and the feature data of the new target data generated by the feature extraction unit 23; and an inquiry output unit 28 that outputs an inquiry for inquiring about the classification of the new target data in the case where a cluster including only the new target data has appeared as the result of the clustering.

Description

Information processing apparatus, method, and program

This disclosure relates to a technique for classifying data.

Conventionally, an image classification apparatus has been proposed that derives a feature amount of each document image from a plurality of document images assigned a first classification label, and executes clustering processing using the feature amounts until the plurality of document images converge to one cluster. The apparatus then divides the cluster into a plurality of sub-clusters at a predetermined threshold of the joining distance, assigns a second classification label to the document images included in each sub-cluster, and performs machine learning using the feature amounts and the second classification labels to generate a classification rule for classifying a document image whose feature amount corresponds to the second classification label into the classification destination designated by the first classification label. Techniques that use clustering results as learning input in this way have been proposed (see Patent Documents 1, 2, and 3).

In addition, an information processing apparatus has been proposed that includes: a feature amount calculation unit that calculates a feature amount of each of a plurality of pieces of document information to which common attribute information is assigned; a distance calculation unit that calculates, based on the calculated feature amounts, the distance in the feature amount space between the pieces of document information; and a distribution map creation unit that creates distribution map information in which each piece of document information is plotted in the feature amount space based on the calculated distances. This technique creates information that allows a user to determine whether the classification of document information is appropriate (see Patent Document 4).

Patent Document 1: JP 2016-071412 A
Patent Document 2: JP 2014-123286 A
Patent Document 3: JP 2010-282383 A
Patent Document 4: JP 2015-026355 A

Conventionally, as a technique for automatically classifying data, various automatic classification systems such as classification by machine learning and classification by clustering have been proposed and adopted.

However, a conventional automatic classification system tries to classify data into one of the learned classifications. Consequently, when the target data does not belong to any of the learned classifications, the system still outputs one of those classifications as the result, and that result is incorrect.

In view of the above-described problems, the present disclosure has an object to more accurately determine whether new target data is target data of an unknown classification.

An example of the present disclosure is an information processing apparatus including: storage means for storing feature data of classified target data in association with the classification of that target data; target data reception means for receiving input of new target data; feature extraction means for generating feature data by extracting features from the new target data; clustering means for clustering a set of target data consisting of the classified target data and the new target data into a number of clusters equal to the number of classifications into which the classified target data is classified plus one, based on the feature data of the classified target data stored by the storage means and the feature data of the new target data generated by the feature extraction means; and inquiry output means for outputting an inquiry about the classification of the new target data when, as a result of the clustering, a cluster containing only the new target data appears.

The present disclosure can be understood as an information processing apparatus, a system, a method executed by a computer, or a program executed by a computer. The present disclosure can also be understood as such a program recorded on a recording medium readable by a computer or by other devices, machines, or the like. Here, a recording medium readable by a computer or the like refers to a recording medium that stores information such as data and programs by electrical, magnetic, optical, mechanical, or chemical action and that can be read by a computer or the like.

According to the present disclosure, it is possible to more accurately determine whether or not new target data is target data of an unknown classification.

FIG. 1 is a schematic diagram showing the configuration of a system according to an embodiment.
FIG. 2 is a diagram showing an outline of the configuration of a scanner according to the embodiment.
FIG. 3 is a diagram showing an outline of the functional configuration of an information processing apparatus according to the embodiment.
FIG. 4 is a flowchart showing an outline of the flow of data classification processing according to the embodiment.
FIG. 5 is a flowchart showing an outline of the flow of classification determination processing using a classification model according to the embodiment.
FIG. 6 is a flowchart showing an outline of the flow of classification determination processing by user inquiry according to the embodiment.

Hereinafter, embodiments of an information processing apparatus, a method, and a program according to the present disclosure will be described with reference to the drawings. However, the embodiments described below are examples, and the information processing apparatus, method, and program according to the present disclosure are not limited to the specific configurations described below. In implementation, a specific configuration suited to the mode of implementation may be adopted as appropriate, and various improvements and modifications may be made.

This embodiment describes the case where the information processing apparatus, method, and program according to the present disclosure are implemented in a system that classifies image data, obtained by imaging a medium such as paper or a card with a scanner, according to the type of medium or the type of information recorded on the medium. However, the information processing apparatus, method, and program according to the present disclosure can be widely used in techniques for classifying data, and the application target of the present disclosure is not limited to the example shown in the present embodiment.

<System configuration>
FIG. 1 is a schematic diagram showing a configuration of a system according to the present embodiment. The system according to the present embodiment includes an information processing apparatus 1 and a scanner 3. The information processing apparatus 1 is a computer including a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, a storage device 14 such as an EEPROM (Electrically Erasable and Programmable Read Only Memory), a communication unit 15 such as a NIC (Network Interface Card), an input device 16 such as a keyboard and a touch panel, and an output device 17 such as a display and a speaker.

FIG. 2 is a diagram showing an outline of the configuration of the scanner 3 according to this embodiment. The scanner 3 according to the present embodiment is a device that acquires image data by imaging a document, business card, receipt, photograph, illustration, or the like set by the user. It includes a sheet feeder 36 that feeds the document to the imaging unit 37, the imaging unit 37, a scan button 38, a CPU 31, a ROM 32, a RAM 33, a storage device 34, a communication unit 35, and the like. In the present embodiment, the scanner 3 is exemplified as a scanner that images the document while automatically feeding the document set on the sheet feeder 36. However, the imaging method of the scanner is not limited to this. For example, the scanner may be of a type that images a document set at a reading position by the user. The communication means, hardware configuration, and the like of a scanner that can employ the method according to the present embodiment are likewise not limited to the examples in the present embodiment. In this embodiment, the scanner 3 is used as the imaging device of the present system. However, the imaging device used in the present system is not limited to a scanner. For example, a camera may be employed as the imaging device.

FIG. 1 illustrates an example in which the scanner 3 and the information processing apparatus 1 are connected via a network or a peripheral device connector, but the system configuration is not limited to that illustrated in FIG. 1. The information processing apparatus may be implemented in a distributed manner using cloud or distributed computing technology, and the scanner may be built into the information processing apparatus.

The system shown in this embodiment classifies image data obtained using the scanner 3 according to the type of medium and the type of information recorded on the medium. If the data belongs to a classification that has already been learned, the system classifies it automatically using the learning results. If the data belongs to a classification that has never been learned, the system asks the user, receives the user's feedback, and learns the content of that feedback. The system thus provides a user interface through which the accuracy of automatic classification gradually improves.

For this reason, the system shown in this embodiment uses unsupervised learning and supervised learning in two stages. Specifically, the system uses unsupervised learning (clustering) in the first stage, and when a cluster containing only the new target data is created, it determines that the classification is unknown and makes an inquiry to the user. On the other hand, if it determines that the classification is not unknown, the system estimates the classification using a supervised classification model in the second stage.

FIG. 3 is a diagram illustrating an outline of a functional configuration of the information processing apparatus 1 according to the present embodiment. The information processing apparatus 1 reads a program recorded in the storage device 14 into the RAM 13 and executes it on the CPU 11, and thereby functions as an information processing apparatus including a storage unit 21, a target data reception unit 22, a feature extraction unit 23, a determination unit 24, a clustering unit 25, an estimation unit 26, a confirmation output unit 27, an inquiry output unit 28, a response reception unit 29, a classification determination unit 30, and a classification model management unit 20. In the present embodiment, each function of the information processing apparatus 1 is executed by the CPU 11, which is a general-purpose processor. However, some or all of these functions may be executed by one or more dedicated processors. Some or all of these functions may also be executed by a device installed at a remote location, or by a plurality of devices installed in a distributed manner, using cloud technology or the like.

The storage unit 21 stores the feature data of the classified target data in association with the classification of the target data. Further, when the new target data classification is determined, the storage unit 21 stores the feature data of the new target data as classified target data in association with the determined classification. As a result, the accumulated number of classified target data increases, and the accuracy of the estimation process by the classification model described later improves.

The target data receiving unit 22 receives input of new target data.

The feature extraction unit 23 generates feature data by extracting features from new target data. In this embodiment, a feature vector is used as the feature data. However, the method for converting the feature of the target data into data is not limited to a vector.

Before clustering, the determination unit 24 determines whether the feature data of the classified target data stored by the storage unit 21 includes feature data identical to the feature data of the new target data generated by the feature extraction unit 23. If identical feature data exists, the clustering process is skipped.

The clustering unit 25 clusters the set of target data consisting of the classified target data and the new target data into “the number of classifications into which the classified target data is classified + 1” clusters, based on the feature data of the classified target data stored in the storage unit 21 and the feature data of the new target data generated by the feature extraction unit 23.

If, as a result of clustering, the cluster including the new target data also includes classified target data, the estimation unit 26 determines that the new target data is likely to belong to an existing (known) classification and estimates the classification of the new target data. The estimation unit 26 makes this estimate based on the feature data of the classified target data accumulated by the storage unit 21, the classifications of the classified target data, and the feature data of the new target data.

The confirmation output unit 27 performs an output for confirming the result of the estimation by the estimation unit 26 via the output device 17.

When a cluster that includes only the new target data appears as a result of clustering, the inquiry output unit 28 determines that the new target data is likely to belong to an unknown classification, and performs an output for inquiring about the classification of the new target data via the output device 17.

The response reception unit 29 receives an input of a response by the user with respect to an inquiry output or a confirmation output to the user via the input device 16.

The classification determination unit 30 determines the classification of new target data according to the response to the query output or confirmation output for the user.

The classification model management unit 20 holds a classification model used in the estimation process by the estimation unit 26. Further, the classification model management unit 20 generates or updates a classification model based on the feature data of the new target data and the classification determined by the classification determination unit 30. As a result, the estimation unit 26 can estimate the classification of target data using the most recently updated model.

For the classification model, a general classification algorithm may be used, such as a pattern recognition model (learning model) based on supervised learning, for example an SVM (support vector machine). In this case, the classification model is generated by supplying sets of feature data (feature vectors) and the corresponding classification labels as teacher data and learning from them. In addition, when updating the classification model, its accuracy may be verified using a method such as cross-validation, and the updated model may be adopted only if its accuracy has improved.
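As a rough illustration of the accuracy verification mentioned above, the following sketch runs leave-one-out cross-validation over labeled feature vectors. To stay within the Python standard library it uses a nearest-centroid classifier as a stand-in for the SVM named in the text. The function names, the Euclidean distance, and the sample data are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, classification label)


def train_centroids(samples: List[Sample]) -> Dict[str, List[float]]:
    """Stand-in classification model: one mean vector per classification label."""
    grouped = defaultdict(list)
    for vec, label in samples:
        grouped[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(model: Dict[str, List[float]], vec: Sequence[float]) -> str:
    """Return the label whose centroid is closest to `vec`."""
    return min(model, key=lambda label: math.dist(model[label], vec))


def leave_one_out_accuracy(samples: List[Sample]) -> float:
    """Cross-validation: hold out each sample once and train on the rest."""
    hits = 0
    for i, (vec, label) in enumerate(samples):
        model = train_centroids(samples[:i] + samples[i + 1:])
        hits += predict(model, vec) == label
    return hits / len(samples)


# Hypothetical teacher data: two classifications, two samples each.
teacher_data = [
    ([0.0, 0.0], "receipt"),
    ([0.0, 1.0], "receipt"),
    ([10.0, 10.0], "business card"),
    ([10.0, 11.0], "business card"),
]
accuracy = leave_one_out_accuracy(teacher_data)
```

A caller could compare this accuracy before and after adding newly determined data and keep whichever model scores higher.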

<Process flow>
Next, the flow of processing executed by the system according to the present embodiment will be described using a flowchart. Note that the specific contents and processing order of the processing shown in the flowcharts described below are examples for carrying out the present disclosure. Specific processing contents and processing order may be appropriately selected according to the embodiment of the present disclosure.

FIG. 4 is a flowchart showing an outline of the flow of data classification processing according to the present embodiment. The data classification process according to the present embodiment is executed when the information processing apparatus 1 receives the image data transmitted by the scanner 3.

In step S101, input of new target data to be classified is accepted. When the user sets a paper medium on which a document, a photograph, or the like is recorded on the sheet feeder or reading table of the scanner 3 and performs a scan start operation, the scanner 3 captures the paper medium and generates image data. Further, the scanner 3 transmits the generated image data to the information processing apparatus 1. The target data receiving unit 22 receives the image data transmitted from the scanner 3 and input to the information processing apparatus 1 as new target data, and records it in the RAM 13. Thereafter, the process proceeds to step S102.

In step S102, features of the new target data are extracted and feature data is generated. The feature extraction unit 23 extracts features from the new target data received in step S101 and generates feature data. In the present embodiment, since the target data is image data, the feature extraction unit 23 extracts as features, for example: paper size (width, height, card size flag, receipt size flag, photo size flag, etc.); number of colors; blank page ratio; line direction; ruled lines (length, width, center coordinates, number, etc.); characters (recognition language, character rectangle, character position, character size, word appearance frequency (Bag of Words / TF-IDF), etc.); image (color information used, density information, figure outlines, local feature quantities such as SIFT / SURF (Bag of Features), etc.); and features by business form (business card tags, receipt matching results, receipt tags, etc.). Then, the feature extraction unit 23 generates feature data (in this embodiment, a feature vector) by digitizing the extracted features. Thereafter, the process proceeds to step S103.
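The digitizing step at the end of S102 can be pictured with a minimal sketch such as the following. The `ExtractedFeatures` container, its field names, and the choice of six components are hypothetical. The text lists many more features and does not prescribe any particular encoding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractedFeatures:
    """Hypothetical container for a few of the features named above."""
    width_mm: float
    height_mm: float
    color_count: int
    blank_ratio: float        # fraction of the page that is blank, 0.0 to 1.0
    ruled_line_count: int
    is_card_size: bool        # corresponds to a "card size flag"


def to_feature_vector(f: ExtractedFeatures) -> List[float]:
    """Digitize the extracted features into a fixed-length numeric vector."""
    return [
        f.width_mm,
        f.height_mm,
        float(f.color_count),
        f.blank_ratio,
        float(f.ruled_line_count),
        1.0 if f.is_card_size else 0.0,   # flags become 0/1 components
    ]


# Illustrative values only; not taken from the source.
business_card = ExtractedFeatures(91.0, 55.0, 3, 0.6, 0, True)
vector = to_feature_vector(business_card)
```

In practice the components would probably need scaling so that no single feature dominates a distance-based comparison, but the text does not go into that.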

In step S103, it is determined whether any classified target data has been accumulated. The information processing apparatus 1 determines whether the number of classified target data stored in the storage unit 21 (or the number of classifications into which the classified target data is classified) is greater than 0. This determines whether the new target data received in step S101 is the first target data. If the number is greater than 0, the new target data received in step S101 is not the first target data, and the process proceeds to step S104. On the other hand, if the number is 0, the new target data received in step S101 is the first target data and necessarily belongs to an unknown classification, so the process proceeds to the “classification determination processing by user inquiry” in step S108.

In step S104, it is determined whether there is classified target data having the same feature data. Before clustering, the determination unit 24 searches the feature data of the classified target data stored by the storage unit 21 and determines whether the stored classified target data includes data whose feature data is identical to the feature data of the new target data generated by the feature extraction unit 23. If such classified target data exists, the clustering process shown in steps S105 and S106 is skipped, the process proceeds to step S107, and the estimation unit 26 performs estimation without clustering. On the other hand, if no such classified target data exists, the process proceeds to step S105.

In this embodiment, when it is determined that there is classified target data having “identical” feature data, the clustering process is skipped to reduce the overall processing load. However, the determination condition is not necessarily limited to “identical”. For example, the determination unit 24 may use a predetermined threshold as the determination condition and determine whether the feature data of the classified target data stored by the storage unit 21 includes feature data that is identical or similar to the feature data of the new target data generated by the feature extraction unit 23. However, if the determination condition is broadened beyond “identical”, the processing load for searching the feature data of the classified target data increases, so the condition is preferably set with this load in mind.

In this embodiment, when it is determined that there is classified target data having identical (or similar) feature data, the classification is determined by the classification determination process using the classification model. Alternatively, the classification associated with the classified target data having identical (or similar) feature data may be immediately determined as the classification of the new target data without performing that process.
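The pre-clustering check of step S104, including the threshold variant and the option of directly adopting the matched classification, might be sketched as follows. The function name, the use of Euclidean distance as the similarity measure, and the default threshold are assumptions. The text only says that a predetermined threshold may be used.

```python
import math
from typing import List, Optional, Sequence, Tuple


def find_matching_classified(
    new_vec: Sequence[float],
    classified: List[Tuple[Sequence[float], str]],
    threshold: float = 0.0,
) -> Optional[str]:
    """Return the classification of the first stored vector within `threshold`.

    threshold == 0.0 reproduces the "identical" condition; a larger value
    also accepts similar vectors (Euclidean distance is an assumption).
    Returns None when no stored vector matches, i.e. clustering is needed.
    """
    for vec, label in classified:
        if math.dist(new_vec, vec) <= threshold:
            return label
    return None


# Hypothetical stored data: (feature vector, classification).
stored = [([1.0, 2.0], "receipt"), ([5.0, 5.0], "business card")]
exact = find_matching_classified([1.0, 2.0], stored)
near_miss = find_matching_classified([1.0, 2.1], stored)
near_hit = find_matching_classified([1.0, 2.1], stored, threshold=0.5)
```

This linear scan touches every stored vector, which reflects the processing-load concern raised above. An index or hash lookup would be a natural refinement for the exact-match case.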

In step S105, clustering processing is performed. The clustering unit 25 clusters the set of target data consisting of all the classified target data stored in the storage unit 21 (or only part of it, depending on the amount of data) and the new target data received in step S101, so that every item of target data is an element of exactly one cluster. The clustering uses the feature data of the classified target data stored by the storage unit 21 and the feature data of the new target data generated in step S102. In the present embodiment, a general clustering algorithm based on the distance between the feature vectors of the target data is used. However, the algorithm used for clustering is not limited to this.

Further, the clustering unit 25 clusters the set of target data including the new target data into clusters of “the number of classifications in which the classified target data is classified + 1”. For example, when the classified target data is classified into three classifications, the clustering unit 25 performs clustering into “3 + 1 = 4” clusters. By setting the number of clusters in this way, it can be determined whether or not there is a high possibility that the new target data belongs to the unknown classification. Thereafter, the process proceeds to step S106.
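The “number of classifications + 1” idea can be tried out with a small sketch. The text does not name a specific algorithm, so the single-linkage agglomerative clustering below, the Euclidean distance, and the function names are all assumptions chosen to keep the example dependency-free.

```python
import math
from typing import List, Sequence


def cluster_into(vectors: List[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Agglomerative clustering (single linkage) down to `n_clusters` clusters.

    Returns the clusters as lists of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None  # (distance, a, b) for the closest pair of clusters so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    math.dist(vectors[i], vectors[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])   # merge the closest pair
        del clusters[b]
    return clusters


def new_data_is_isolated(
    classified_vecs: List[Sequence[float]],
    n_classifications: int,
    new_vec: Sequence[float],
) -> bool:
    """True if the new vector is alone in its cluster after clustering into k + 1."""
    vectors = list(classified_vecs) + [new_vec]
    new_index = len(vectors) - 1
    clusters = cluster_into(vectors, n_classifications + 1)
    return [new_index] in clusters


# Hypothetical classified data forming three classifications of two items each.
known = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [20.0, 0.0], [21.0, 0.0]]
far_away = new_data_is_isolated(known, 3, [50.0, 50.0])
close_by = new_data_is_isolated(known, 3, [0.0, 0.5])
```

With three known classifications, four clusters are requested. A new vector far from everything is left in a cluster of its own, while one near existing data is absorbed into a cluster that also contains classified data.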

In step S106, it is determined whether the cluster to which the new target data belongs also includes classified target data. Based on the result of the clustering in step S105, the information processing apparatus 1 estimates whether the new target data belongs to an existing classification by determining whether the cluster to which the new target data belongs includes classified target data. If the cluster to which the new target data belongs includes classified target data (that is, the new target data is estimated to belong to an existing classification), the process proceeds to the “classification determination processing by classification model” in step S107. On the other hand, if the cluster to which the new target data belongs includes only the new target data and no classified target data (that is, the new target data is estimated to belong to an unknown classification), the process proceeds to the “classification determination processing by user inquiry” in step S108.

In step S107, classification determination processing using a classification model is executed. In this processing, the classification of the target data is estimated using the classification model, and the classification of the target data is determined after the user confirms the estimation result. Details of the processing will be described later with reference to FIG. 5. Thereafter, the process proceeds to step S109.

In step S108, classification determination processing by user inquiry is executed. In this processing, the classification input by the user is determined as the classification of the target data. Details of the processing will be described later with reference to FIG. 6. Thereafter, the process proceeds to step S109.

In step S109, the accumulation processing of classified target data is executed. The storage unit 21 stores the feature data of the new target data whose classification is determined by the classification determination unit 30 in association with the determined classification as the classified target data. That is, the feature data and its classification accumulated here are used as the classified target data and its classification in the data classification process executed when other new target data is received. Thereafter, the processing shown in this flowchart ends.
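The branching of steps S103 through S109 can be summarized as a control-flow sketch. The individual steps are passed in as callables so that the routing itself is visible. All names here are hypothetical, and the in-memory list standing in for the storage unit 21 is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Store = List[Tuple[Vector, str]]  # classified target data: (features, classification)


def classify_new_data(
    new_vec: Vector,
    store: Store,
    has_same_features: Callable[[Vector, Store], bool],
    is_isolated: Callable[[Vector, Store], bool],
    decide_by_model: Callable[[Vector, Store], str],
    decide_by_inquiry: Callable[[Vector], str],
) -> str:
    """Route new target data through the flow of FIG. 4 and accumulate the result."""
    if not store:                                  # S103: first target data
        label = decide_by_inquiry(new_vec)         # S108
    elif has_same_features(new_vec, store):        # S104: clustering is skipped
        label = decide_by_model(new_vec, store)    # S107
    elif is_isolated(new_vec, store):              # S105 and S106
        label = decide_by_inquiry(new_vec)         # S108
    else:
        label = decide_by_model(new_vec, store)    # S107
    store.append((new_vec, label))                 # S109: accumulate as classified
    return label


# Stub callables for demonstration only.
store: Store = []
first = classify_new_data(
    [1.0], store,
    lambda v, s: False, lambda v, s: True,
    lambda v, s: "from model", lambda v: "from user",
)
second = classify_new_data(
    [1.0], store,
    lambda v, s: True, lambda v, s: True,
    lambda v, s: "from model", lambda v: "from user",
)
```

The first call goes straight to the user inquiry because nothing has been accumulated yet. The second call finds matching feature data and therefore skips clustering.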

FIG. 5 is a flowchart showing an overview of the flow of classification determination processing using the classification model according to the present embodiment. This flowchart explains in detail the processing shown in step S107 of FIG. 4.

In steps S201 and S202, when no classification model has been generated, the single existing classification is adopted as the estimation result. The information processing apparatus 1 determines whether a classification model has been generated (step S201). If no classification model has been generated, only one classification exists, so the estimation unit 26 estimates that one existing classification as the classification of the new target data (step S202). On the other hand, if a classification model has been generated, the process proceeds to step S203.

In step S203, the classification of the new target data is estimated using the classification model. The estimation unit 26 reads from the classification model management unit 20 the classification model that was generated or updated based on the feature data of the classified target data and the classifications of the classified target data. Then, the estimation unit 26 estimates the classification of the new target data by inputting the feature data of the new target data into that classification model. Thereafter, the process proceeds to step S204.

In this embodiment, the estimation of data classification using a classification model has been described. However, other methods may be used for estimation of data classification. For example, the estimation unit 26 compares the feature data of the classified target data with the feature data of the new target data, and specifies the classification of the classified target data that approximates the new target data, thereby determining the new target data. The classification of the target data may be estimated.
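The alternative described above, estimating by comparison with the classified target data instead of a classification model, could be as simple as a nearest-neighbor lookup. The function name and the Euclidean distance are assumptions. The text only requires finding classified data that approximates the new data.

```python
import math
from typing import List, Sequence, Tuple


def estimate_by_nearest(
    new_vec: Sequence[float],
    classified: List[Tuple[Sequence[float], str]],
) -> str:
    """Return the classification of the stored vector closest to `new_vec`."""
    _, label = min(classified, key=lambda item: math.dist(item[0], new_vec))
    return label


# Hypothetical classified target data.
classified = [
    ([0.0, 0.0], "receipt"),
    ([10.0, 10.0], "business card"),
    ([20.0, 0.0], "photo"),
]
guess = estimate_by_nearest([9.0, 8.0], classified)
```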

Also, the clusters generated by the (unsupervised) clustering in step S105 do not necessarily match the classifications estimated by the (supervised) processing in step S203. This is because, in the present embodiment, the clustering process, which is performed to estimate whether the new target data belongs to an existing classification, is independent of the classification estimation process using the classification model, which is generated or updated from the determined classifications.

In step S204, the user is asked to confirm the estimation result. The confirmation output unit 27 performs an output for confirming the result of the estimation made in step S203 using the classification model. Specifically, the confirmation output unit 27 outputs a message that includes the estimated classification, such as “Is this a ‘business card’?”. Thereafter, the process proceeds to step S205.

In step S205, a response from the user is accepted. The response receiving unit 29 receives an input of a response to the confirmation output in step S204. Specifically, the user checks the output message including the estimated classification (for example, “Is this a ‘business card’?”) and inputs a response to it. For example, if the estimation result of step S203 is correct, the user makes an input indicating that the estimation is correct (for example, “Yes”), and if the estimation result is incorrect, the user inputs the correct classification. When inputting the correct classification, the user may add a new classification by freely entering text (for example, “receipt”), or may select one of the existing classifications output by the inquiry output unit 28. The response receiving unit 29 receives these inputs from the user. Thereafter, the process proceeds to step S206.

In step S206, the classification of the new target data is determined according to the response. The classification determination unit 30 refers to the response received in step S205, and if an input indicating that the estimation result of step S203 is correct has been received, it determines the estimation result of step S203 as the classification of the new target data. On the other hand, if the estimation result of step S203 was incorrect and an input indicating the correct classification has been received from the user, the classification determination unit 30 determines the classification input in the user's response as the classification of the new target data. Thereafter, the process proceeds to step S207.

In steps S207 to S209, the classification model is updated when the estimation result is incorrect and the number of existing classifications is two or more. If the correct classification input by the user has been accepted because the estimation result was incorrect (“NO” in step S207), and the classified target data has two or more classifications (“YES” in step S208), the classification model management unit 20 updates the held classification model based on the feature data of the new target data and the classification input by the user (step S209). On the other hand, when the estimation result is correct (“YES” in step S207), the held classification model is not updated. Also, when the number of classifications of the classified target data is less than two (“NO” in step S208), the classification model is not updated because a classification model cannot be generated. Thereafter, the processing shown in this flowchart ends.
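The decisions of steps S206 through S208 reduce to a small function, sketched here under stated assumptions. Encoding a confirming response as `None` and a correction as the corrected label is a hypothetical convention, as are the function and parameter names.

```python
from typing import Optional, Tuple


def resolve_confirmation(
    estimate: str,
    corrected_label: Optional[str],
    n_classifications: int,
) -> Tuple[str, bool]:
    """Return (determined classification, whether to update the model).

    `corrected_label` is None when the user confirmed the estimate and holds
    the user-supplied classification otherwise (a hypothetical encoding).
    """
    if corrected_label is None:                  # "YES" in step S207
        return estimate, False                   # the held model is kept as is
    update = n_classifications >= 2              # step S208
    return corrected_label, update               # step S209 runs only if update


confirmed = resolve_confirmation("business card", None, 3)
corrected = resolve_confirmation("business card", "receipt", 3)
too_few = resolve_confirmation("business card", "receipt", 1)
```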

In this embodiment, the classification model is updated when the estimation result is incorrect (“NO” in step S207). However, the classification model may also be updated when the estimation result is correct, so that the correct estimation result is reflected in the classification model. In other words, the determination in step S207 may be omitted. Note that the update timing of the classification model may be determined in consideration of the processing load on the system.

FIG. 6 is a flowchart showing an overview of the flow of classification determination processing by user inquiry according to the present embodiment. This flowchart explains in detail the processing shown in step S108 of FIG.

In step S301, the user is asked for the classification. The inquiry output unit 28 performs an output inquiring about the classification of the new target data. Specifically, the inquiry output unit 28 outputs a message such as “What kind of document is this?”. Thereafter, the process proceeds to step S302.

In step S302, a response from the user is accepted. The response receiving unit 29 receives an input of a response to the output in step S301. Specifically, the user checks the output message and, in response, makes an input indicating the classification to which the new target data should belong. Here, the user may input the classification either by freely entering text, thereby adding a new classification, or by selecting from the existing classifications output by the inquiry output unit 28. The response receiving unit 29 receives the input from the user. Thereafter, the process proceeds to step S303.

In step S303, the classification of the new target data is determined according to the response. The classification determination unit 30 refers to the response accepted in step S302 and determines the classification entered in the user's response as the classification of the new target data. Thereafter, the process proceeds to step S304.

In steps S304 and S305, a classification model is generated or updated when there are two or more existing classifications. The information processing apparatus 1 determines whether the number of classifications of the classified target data is two or more (step S304). If the number of classifications is less than two, a classification model cannot be generated, and the processing shown in this flowchart ends. On the other hand, if the number of classifications is two or more, the classification model management unit 20 generates or updates a classification model (step S305).

Here, when the new target data received in step S101 is the first target data (“NO” in step S103), the classification model management unit 20 generates a new classification model based on the feature data of the new target data and the classification input by the user. In other cases (“NO” in step S106), the classification model management unit 20 updates the held classification model based on the feature data of the new target data and the classification input by the user. Thereafter, the processing shown in this flowchart ends.
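The flow of steps S301 to S305 can be sketched as follows, under several assumptions that are not part of the embodiment: the user interaction is abstracted as a hypothetical `ask_user` callback, the accumulated data is a simple list of (feature data, classification) pairs, the number of classifications is counted after the new target data has been recorded, and "generate" is chosen whenever no classification model is held yet. The function only reports which model operation would be performed; it does not train a model.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature data, classification)


def classify_by_inquiry(features: Sequence[float],
                        store: List[Sample],
                        ask_user: Callable[[str, List[str]], str],
                        model_exists: bool) -> Tuple[str, str]:
    """Steps S301 to S305; returns (classification, model action)."""
    existing = sorted({label for _, label in store})
    # S301/S302: ask the user. Existing classifications are offered as
    # choices, but the callback may also return a new classification name.
    label = ask_user("What kind of document is this?", existing)
    # S303: the user's answer becomes the classification of the new data,
    # which is then accumulated as classified target data.
    store.append((features, label))
    # S304: a model needs at least two classifications.
    num_classifications = len({lbl for _, lbl in store})
    if num_classifications < 2:
        return label, "none"
    # S305: generate when no model is held yet, otherwise update.
    return label, ("update" if model_exists else "generate")
```

The callback receives the list of existing classifications so that either free text entry or selection from that list can be supported.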

<Example>
Hereinafter, a general flow when the user actually uses the system according to the present embodiment described above will be described.

First, when the user causes the scanner 3 to capture an image of some document while no classified target data has been accumulated, the information processing apparatus 1 that has received the image data determines that the accumulated target data is zero (“NO” in step S103) and outputs the inquiry “What kind of document is this?”. In response, the user inputs a document type (for example, “business card”), and the information processing apparatus 1 generates a classification model using the input result.

Next, when the user causes the scanner 3 to capture an image of a document, the information processing apparatus 1 that has received the image data determines by clustering whether or not the classification is unknown. While little target data has been accumulated, the new target data is likely to become an element of a cluster that contains no other data (“NO” in step S106), so the inquiry output “What kind of document is this?” and the user's input of the document type are repeated several times.

After that, once a certain amount of target data has been accumulated, the new target data becomes more likely to be an element of a cluster that contains other data. The information processing apparatus 1 therefore estimates, based on the classification model, the classification of target data determined to belong to an existing classification (“YES” in step S106) and outputs the result to the user for confirmation, while for target data determined as a result of clustering to be of an unknown classification (“NO” in step S106) it outputs the inquiry “What kind of document is this?”. By repeating such processing, the accuracy of data classification by the information processing apparatus 1 according to the present embodiment is improved.

Conventionally, in machine learning, when a model has been trained on only a small amount of training data, its generalization ability is insufficient (overfitting), and the estimation accuracy for unknown data is low. In this state, whether the target data belongs to an already defined classification (that is, whether it is of an unknown classification) can be determined by referring to the score (the probability that the estimate is correct) derived from the classification model; however, since the reliability of the model that derives the score is itself low, the accuracy of that determination is also low. In contrast, the system shown in the present embodiment makes this determination by clustering, which uses the relative relationships among the data and does not depend on the classification model, so that whether or not the classification is unknown can be determined with high accuracy.
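The clustering-based determination described above can be illustrated with the following minimal sketch. It assumes average-linkage agglomerative clustering with Euclidean distance on numeric feature vectors; this passage does not specify the clustering algorithm or distance measure, so both are assumptions made for illustration, and all function names are hypothetical. The set consisting of the classified target data and the new target data is clustered into (number of classifications + 1) clusters, and the classification is treated as unknown when the new target data is the only element of its cluster.

```python
import math
from typing import List, Sequence


def agglomerative_clusters(points: Sequence[Sequence[float]],
                           k: int) -> List[List[int]]:
    """Merge the closest clusters (average linkage) until k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Average Euclidean distance over all cross-cluster pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Find the pair of clusters with the smallest average distance.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def is_unknown_classification(classified: Sequence[Sequence[float]],
                              num_classifications: int,
                              new_point: Sequence[float]) -> bool:
    """True when the new data is alone in one of (classifications + 1) clusters."""
    points = list(classified) + [new_point]
    new_index = len(points) - 1
    clusters = agglomerative_clusters(points, num_classifications + 1)
    return any(cluster == [new_index] for cluster in clusters)
```

Because the determination relies only on distances among the data, it does not consult a classification model or its score. When few items have been accumulated, the new data tends to remain alone in its cluster, which matches the repeated inquiries described for the early stage of use.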

<Effect>
According to the information processing apparatus, method, and program of the present embodiment, a system for classifying data can determine more accurately whether newly input target data belongs to an already defined classification (in other words, whether it is of an unknown classification). In addition, because whether the classification is unknown is determined accurately before the user is queried, erroneous user operations (for example, the user approving an incorrect classification presented by the information processing apparatus) can be suppressed, and highly accurate user feedback can be obtained.

1 Information processing device
3 Scanner

Claims

An information processing apparatus comprising:
storage means for storing feature data of classified target data in association with the classification of that target data;
target data receiving means for receiving input of new target data;
feature extraction means for extracting features from the new target data to generate feature data;
clustering means for clustering a set of target data, consisting of the classified target data and the new target data, into a number of clusters equal to the number of classifications into which the classified target data is classified plus one, based on the feature data of the classified target data stored by the storage means and the feature data of the new target data generated by the feature extraction means; and
inquiry output means for performing an output inquiring about the classification of the new target data when, as a result of the clustering, a cluster containing only the new target data appears.
The information processing apparatus according to claim 1, further comprising:
response accepting means for accepting an input of a response to the output; and
classification determination means for determining a classification of the new target data according to the response.
The information processing apparatus according to claim 2, wherein the storage means stores the feature data of the new target data whose classification has been determined by the classification determination means, in association with the determined classification, as classified target data.
The information processing apparatus according to claim 2 or 3, further comprising estimation means for estimating, when as a result of the clustering the cluster containing the new target data also contains other classified target data, a classification of the new target data based on the feature data of the classified target data stored by the storage means, the classifications of the classified target data, and the feature data of the new target data.
The information processing apparatus according to claim 4, wherein the estimation means estimates the classification of the new target data using a classification model generated based on the feature data of the classified target data and the classifications of the classified target data, together with the feature data of the new target data.
The information processing apparatus according to claim 5, further comprising classification model management means for managing the generated classification model, wherein the classification model management means generates or updates the classification model based on the feature data of the new target data and the classification determined by the classification determination means.
The information processing apparatus according to claim 4, wherein the estimation means estimates the classification of the new target data by comparing the feature data of the classified target data with the feature data of the new target data and identifying the classification of the classified target data that approximates the new target data.
The information processing apparatus according to any one of claims 4 to 7, further comprising confirmation output means for performing an output for confirming a result of the estimation by the estimation means, wherein
the response accepting means accepts an input of a response to the confirmation output, and
the classification determination means determines the classification of the new target data according to the response to the confirmation output.
The information processing apparatus according to any one of claims 4 to 8, further comprising determination means for determining, before the clustering, whether the feature data of the classified target data stored by the storage means includes feature data that is the same as or approximates the feature data of the new target data generated by the feature extraction means, wherein
the estimation means performs the estimation without the clustering being performed when the determination means determines that the same or approximate feature data exists.
The information processing apparatus according to any one of claims 1 to 9, wherein the inquiry output means further performs the output inquiring about the classification of the new target data when the number of classifications into which the classified target data is classified, or the number of items of classified target data stored by the storage means, is zero.
A method in which a computer executes:
an accumulation step of accumulating feature data of classified target data in association with the classification of that target data;
a target data receiving step of receiving input of new target data;
a feature extraction step of extracting features from the new target data to generate feature data;
a clustering step of clustering a set of target data, consisting of the classified target data and the new target data, into a number of clusters equal to the number of classifications into which the classified target data is classified plus one, based on the feature data of the classified target data accumulated in the accumulation step and the feature data of the new target data generated in the feature extraction step; and
an inquiry output step of performing an output inquiring about the classification of the new target data when, as a result of the clustering, a cluster containing only the new target data appears.
A program for causing a computer to function as:
storage means for storing feature data of classified target data in association with the classification of that target data;
target data receiving means for receiving input of new target data;
feature extraction means for extracting features from the new target data to generate feature data;
clustering means for clustering a set of target data, consisting of the classified target data and the new target data, into a number of clusters equal to the number of classifications into which the classified target data is classified plus one, based on the feature data of the classified target data stored by the storage means and the feature data of the new target data generated by the feature extraction means; and
inquiry output means for performing an output inquiring about the classification of the new target data when, as a result of the clustering, a cluster containing only the new target data appears.