WO2023166578A1 - Labeling support system, labeling support method, and labeling support program - Google Patents
Labeling support system, labeling support method, and labeling support program
- Publication number
- WO2023166578A1 (application PCT/JP2022/008749)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- clusters
- data group
- labeled
- labeling
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/75—Clustering; Classification
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Definitions
- the present invention relates to a labeling support system, a labeling support method, and a labeling support program that support labeling of unlabeled data.
- Patent Document 1 describes a sensor data classification device that classifies sensor data obtained from a large number of sensors according to their characteristics.
- the device described in Patent Document 1 associates each set of sensor data divided at preset time intervals with a sensor identifier and a divided-section identifier, and calculates a plurality of types of feature parameters from the data included in each set of divided data.
- an object of the present invention is to provide a labeling support system, a labeling support method, and a labeling support program that can support labeling work for clusters in which unlabeled data are classified.
- a labeling support system according to the present invention includes first classification means for generating a first plurality of clusters by classifying a first data group, which is a data group to be labeled, by unsupervised learning; second classification means for generating a second plurality of clusters by classifying a second data group, which is a data group including at least part of the data to be labeled; and output means for outputting, from among the data included in the second plurality of clusters, data classified into different clusters in the first plurality of clusters.
- a labeling support method according to the present invention is characterized in that a computer classifies a first data group, which is a data group to be labeled, by unsupervised learning to generate a first plurality of clusters; the computer classifies a second data group, which is a data group including at least part of the data to be labeled, to generate a second plurality of clusters; and the computer outputs, from among the data included in the second plurality of clusters, data classified into different clusters in the first plurality of clusters.
- a labeling support program according to the present invention causes a computer to execute a first classification process of generating a first plurality of clusters by classifying a first data group, which is a data group to be labeled, by unsupervised learning; a second classification process of generating a second plurality of clusters by classifying a second data group, which is a data group including at least part of the data to be labeled; and an output process of outputting, from among the data included in the second plurality of clusters, data classified into different clusters in the first plurality of clusters.
- FIG. 1 is a block diagram showing a configuration example of an embodiment of a labeling support system according to the present invention
- FIG. 2 is an explanatory diagram showing an example of data used in the labeling support system
- FIG. 3 is an explanatory diagram showing an example of feature amounts
- FIG. 4 is an explanatory diagram showing an example of visualization of dimension-reduced data in a graph
- FIG. 5 is an explanatory diagram showing another example of visualizing the dimension-reduced data with a graph
- FIG. 6 is an explanatory diagram showing an example of processing for labeling data in a cluster
- FIG. 7 is an explanatory diagram showing an example of processing for selecting some clusters
- FIG. 8 is an explanatory diagram showing an example of processing for excluding part of the data;
- FIG. 9 is an explanatory diagram showing an example of an overlay display of results before and after refinement;
- FIGS. 10 and 11 are explanatory diagrams showing examples of displaying results before and after refinement in parallel windows;
- FIG. 12 is an explanatory diagram showing an example of displaying, in another window, a list of data with different results before and after refinement;
- FIG. 13 is an explanatory diagram showing an example of an overlay display of the results of multiple refinements;
- FIG. 14 is an explanatory diagram showing an example of displaying, in a separate window, a list of data with different results due to multiple refinements;
- FIG. 15 is an explanatory diagram showing an example of displaying statistical information of each cluster;
- FIG. 16 is an explanatory diagram showing another example of displaying statistical information of each cluster;
- FIG. 17 is a flow chart showing an operation example of the labeling support system;
- FIG. 18 is a block diagram showing an overview of a labeling support system according to the present invention;
- FIG. 19 is a schematic block diagram showing a configuration of a computer according to at least one embodiment;
- unlabeled data is not limited to moving images, and may be still images, music data, text data, and the like. Further, unlabeled data (data to be labeled) may be hereinafter referred to as unclassified data.
- FIG. 1 is a block diagram showing a configuration example of one embodiment of a labeling support system according to the present invention.
- the labeling support system 1 of this embodiment includes a data acquisition unit 10, a related information acquisition unit 20, an object identification unit 30, a data processing unit 40, a text information input unit 50, a feature extraction unit 60, a feature storage unit 70, a visualization processing unit 80, an input/output device 90, and a data refinement unit 100.
- the data acquisition unit 10 acquires data to be labeled (that is, unclassified data). For example, when a camera (not shown) captures an image of a traveling vehicle, the data acquisition unit 10 may acquire a moving image of the vehicle captured by the camera as data to be labeled.
- the data acquired by the data acquisition unit 10 is not limited to data acquired in real time.
- the data acquisition unit 10 may acquire the data to be labeled, for example, from a storage server (not shown) in which the data to be labeled is stored.
- the related information acquisition unit 20 acquires information related to data to be labeled (hereinafter referred to as related information).
- the related information is information indicating the situation in which the data to be labeled were generated, and includes, for example, data obtained from sensors (hereinafter referred to as sensor data).
- for example, when the data to be labeled is video data captured by an in-vehicle camera (drive recorder), information acquired based on GPS (Global Positioning System) information representing the vehicle position and information provided via CAN (Controller Area Network) are used as the related information.
- sensor data acquired in this case are velocity, acceleration, and position (latitude, longitude, altitude, etc.).
- when a video showing the operating status of a thermal power plant is used as the data to be labeled, the sensor data include, for example, fuel flow rate, pressure, temperature, rotation speed, and power generation amount.
- when images showing farm conditions are used as the data to be labeled, the sensor data include, for example, time, temperature, humidity, pH, soil water content, solar radiation, wind direction and speed, water level, and the like.
- the object identification unit 30 identifies objects included in the acquired data and generates information specifying the identified objects (hereinafter referred to as an object list). For example, when the object to be identified is a vehicle, the object identification unit 30 may identify the vehicle in the data acquired by the data acquisition unit 10 and generate information specifying the vehicle (for example, coordinates indicating its position in the image) as the object list. Methods for identifying objects in images and videos are widely known, so a detailed description is omitted here.
- the data processing unit 40 processes the data (more specifically, the object list) into a form that can be used when the feature extraction unit 60, which will be described later, performs processing. Specifically, the data processing unit 40 processes the data so as to improve the accuracy of feature extraction and clustering.
- the data processing unit 40, for example, thins out data, interpolates missing values, excludes outliers, and deletes unnecessary data items. Further, when the data to be labeled is video data, for example, the data processing unit 40 may convert the video data into numerical time-series data.
- the text information input unit 50 accepts input of text data including information to be added to each data to be labeled (hereinafter referred to as additional information).
- the additional information is information that indicates the content of the labeling target data and can be acquired separately from the related information. Categories of additional information include, for example, weather, types of plants, and traffic participants. Examples of weather categorical values include sunny, cloudy, rainy, and snowy. Examples of plant-type categorical values include rice, wheat, and barley. Examples of traffic-participant categorical values include pedestrians and the like.
- labeling target data associated with additional information is also simply referred to as labeling target data.
- FIG. 2 is an explanatory diagram showing an example of data used in the labeling support system 1 of this embodiment.
- the example shown in FIG. 2 indicates that the data acquisition unit 10 has acquired the image 11 as data to be labeled, and the related information acquisition unit 20 has acquired related information 21 regarding the location where the image 11 was shot.
- the example shown in FIG. 2 also indicates that the data processing unit 40 processes the video 11 and the related information 21 (more specifically, the object list generated by the object identification unit 30) to generate numerical time-series data 41. Furthermore, the example shown in FIG. 2 indicates that the text information input unit 50 has received input of text data 51 including information on the weather, scene, time period, and objects as additional information.
- the feature extraction unit 60 extracts features from each data to be labeled.
- the feature extraction unit 60 of the present embodiment firstly generates a plurality of clusters by automatically classifying each data to be labeled including additional information by unsupervised learning. Any method can be used to generate clusters by unsupervised learning, and examples thereof include the k-means method and the Gaussian mixture model.
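As a minimal sketch of the first classification process, the following Python code runs Lloyd's k-means (one of the methods named above) with NumPy. It assumes each data item to be labeled has already been turned into a fixed-length numeric feature vector; the function name `kmeans`, the farthest-first initialisation, and the iteration cap are illustrative choices, not details taken from the patent.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means on the rows of X; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-first initialisation: a random first centroid, then repeatedly
    # the point farthest from every centroid chosen so far.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2)
        centroids.append(X[d2.min(axis=1).argmax()])
    centroids = np.array(centroids)

    def assign(c):
        # Index of the nearest centroid for every point (squared Euclidean distance).
        return ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign(centroids), centroids
```

`labels[i]` is the index of the first-stage cluster of item `i`. The number of clusters `k` must be chosen in advance in this sketch.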
- the process in which the feature extraction unit 60 classifies the data group to be labeled by unsupervised learning to generate a plurality of clusters will be referred to as the first classification process.
- a plurality of clusters generated by the first classification process will be referred to as a first plurality of clusters, and a data group classified into the first plurality of clusters will be referred to as a first data group.
- since the feature extraction unit 60 performs a process of classifying the data to be labeled by unsupervised learning, the feature extraction unit 60 can also be called a classification means.
- the feature extraction unit 60 extracts the feature amount of each data included in the generated cluster.
- the feature extraction unit 60 may extract, for example, additional information included in the text data as a feature amount.
- the feature extraction unit 60 may extract feature amounts indicated by numerical time-series data.
- the feature extraction unit 60 may extract feature amounts based on sensor values included in the data to be labeled (more specifically, numerical time-series data).
- any method can be used to extract feature amounts from numerical time-series data. For example, for each cluster generated by the k-means method, the feature extraction unit 60 may extract, as a feature amount, the distance from the centroid of the numerical time-series data included in the cluster to each data item (a cluster distance feature).
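The cluster distance feature can be sketched as below, assuming the numerical time-series data of each item have been flattened into equal-length vectors and that "distance" means Euclidean distance to the centroid of the item's own cluster. The function name `cluster_distance_feature` is hypothetical.

```python
import numpy as np


def cluster_distance_feature(X, labels):
    """Euclidean distance from each data item to the centroid of its own cluster."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    feature = np.empty(len(X))
    for c in np.unique(labels):
        members = labels == c
        centroid = X[members].mean(axis=0)  # centre of gravity of this cluster
        feature[members] = np.linalg.norm(X[members] - centroid, axis=1)
    return feature
```

Items far from their centroid get large values, which can help flag borderline members of a cluster.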
- the above describes the case in which the object identification unit 30 identifies objects from the information obtained by the data acquisition unit 10 and the related information acquisition unit 20, the data processing unit 40 processes the data using the identification result, and the feature extraction unit 60 uses the processed data.
- alternatively, the data acquisition unit 10 may directly acquire data in the format used by the feature extraction unit 60 and input the acquired data to the feature extraction unit 60.
- in that case, the labeling support system 1 does not have to include the related information acquisition unit 20, the object identification unit 30, and the data processing unit 40.
- the feature storage unit 70 stores the feature amounts of each data item extracted by the feature extraction unit 60.
- the feature storage unit 70 may also store information on labels added by the data refinement unit 100, which will be described later. Note that the mode in which the feature storage unit 70 stores the feature amount for each data is arbitrary.
- FIG. 3 is an explanatory diagram showing an example of feature amounts stored in the feature storage unit 70.
- in the example shown in FIG. 3, the vertical direction represents one feature point, and the horizontal direction represents the feature amount (categorical value) of each category (for example, weather, traffic participants, types of plants, etc.).
- the feature storage unit 70 is implemented by, for example, a magnetic disk.
- the visualization processing unit 80 performs processing for visualizing information that contributes to the labeling work for the generated clusters.
- the visualization processing unit 80 of the present embodiment visualizes how the data to be labeled are clustered by reducing the dimensionality of the data to be labeled and drawing the dimension-reduced data as a graph on the input/output device 90, so that a person can observe the clustering.
- the visualization processing unit 80 may, for example, use UMAP (Uniform Manifold Approximation and Projection) or the like to reduce the data to be labeled to two or three dimensions, and visualize the dimension-reduced data as a graph such as a distribution map.
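The text names UMAP for dimension reduction. To keep the example dependency-free, the sketch below swaps in plain PCA computed with NumPy's SVD; it is a different, linear technique that only illustrates the step of projecting the labeling targets to two dimensions for plotting. `reduce_to_2d` is a hypothetical name; where UMAP itself is wanted, the umap-learn package provides a `UMAP` class for this purpose.

```python
import numpy as np


def reduce_to_2d(X):
    """Project the rows of X onto their first two principal components.
    PCA via SVD: a linear stand-in for UMAP, not the method named in the text."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre every feature
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T  # (n_samples, 2) coordinates for plotting
```

Each row of the result can then be plotted with a colour or marker per cluster, as in FIG. 4.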
- the visualization processing unit 80 may display the data classified into the same cluster in a manner different from that of other clusters (for example, by changing the color, changing the symbol, etc.).
- FIG. 4 is an explanatory diagram showing an example of visualizing the dimension-reduced data in a graph.
- the graph illustrated in FIG. 4 shows an example in which the data reduced to two dimensions by UMAP are displayed in different manners (hatching, blacking, etc.) for each cluster to which they belong.
- FIG. 5 is an explanatory diagram showing another example of visualizing the dimension-reduced data in a graph.
- the graph illustrated in FIG. 5 is a graph displayed by changing symbols plotted for each type of video data.
- the visualization processing unit 80 may display the range surrounded by a dotted line so that the range of data included in the cluster can be identified.
- the visualization processing unit 80 may display all data, or may display or hide only data that satisfy a specific condition.
- for example, the visualization processing unit 80 may decide whether to display clusters that satisfy a specific condition (for example, clusters containing more data than a predetermined number) and unclassified data (that is, unlabeled data).
- the visualization processing unit 80 of the present embodiment outputs data that belong to different clusters as a result of re-learning processing, which will be described later.
- a data output method will be described later.
- the input/output device 90 displays the output result from the visualization processing unit 80.
- the input/output device 90 also receives input from the user regarding the displayed result, and executes processing according to the input.
- the processing of the data refinement unit 100, which will be described later, is performed based on the user's input specifying clusters in response to the output of the input/output device 90.
- the input/output device 90 may be realized by a tablet terminal or the like. Alternatively, the input/output device 90 may be realized by a device having a display device and a pointing device.
- the data refinement unit 100 performs each process on the data group to be labeled based on the clusters generated by the feature extraction unit 60. Specifically, the data refinement unit 100 generates a second data group from the labeling target data group according to the generated first plurality of clusters. In this embodiment, the data refinement unit 100 performs the following three types of processing.
- the first process is the process of labeling the data within the cluster.
- the data refinement unit 100 labels, cluster by cluster, the data in the data group to be labeled that are classified into any of the first plurality of clusters, and thereby generates the second data group. The data refinement unit 100 may label any of the clusters.
- the data refinement unit 100 may label all clusters, or may label clusters specified by the user via the input/output device 90 .
- the data refinement unit 100 may add an arbitrary temporary label to the data in the target cluster, or may add a label whose content is specified by the user. The data refinement unit 100 may then store the data (more specifically, the feature amount of the data) in the feature storage unit 70 in association with the added label.
- FIG. 6 is an explanatory diagram showing an example of processing for labeling data within a cluster.
- the example shown in FIG. 6 indicates that the data refinement unit 100 added the temporary labels “A”, “B”, and “C” to the respective clusters illustrated in FIG. 5. Note that when the user designates the clusters to be labeled among the clusters illustrated in FIG. 5, the data refinement unit 100 may add temporary labels only to the designated clusters.
- the feature extraction unit 60 regenerates a plurality of clusters by learning (supervised learning) using the labeled data.
- the feature extraction unit 60 may perform learning (unsupervised learning) by adding unlabeled data.
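The first process (temporary labels per cluster, then supervised re-learning) might look like the sketch below. The patent does not name a supervised learner, so a nearest-centroid classifier stands in as the simplest one; the label names "A", "B", and "C" follow FIG. 6, while both function names are hypothetical.

```python
import numpy as np


def assign_temporary_labels(cluster_ids, names=("A", "B", "C")):
    """Give each item the temporary label of its first-stage cluster.
    Raises IndexError if there are more clusters than names."""
    mapping = {c: names[i] for i, c in enumerate(sorted(set(cluster_ids)))}
    return np.array([mapping[c] for c in cluster_ids])


def relearn_supervised(X_labeled, y, X_all):
    """Nearest-centroid classifier used as a stand-in supervised learner:
    fit one centroid per label, then reassign every row of X_all to the closest one."""
    X_labeled = np.asarray(X_labeled, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    centroids = np.array([X_labeled[y == c].mean(axis=0) for c in classes])
    X_all = np.asarray(X_all, dtype=float)
    d2 = ((X_all[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]
```

Passing items that were never labeled as `X_all` assigns them to the regenerated clusters as well; the returned labels play the role of the second plurality of clusters.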
- the process in which the feature extraction unit 60 generates a plurality of clusters by classifying a data group including at least part of the data to be labeled will be referred to as the second classification process.
- a plurality of clusters generated by the second classification process will be referred to as a second plurality of clusters.
- a data group classified into the second plurality of clusters will be referred to as a second data group.
- in the second classification process, at least part of the labeling target data used in the first classification process is used to generate a plurality of clusters again and thereby refine them.
- This can be called a relearning process or refinement. This makes it possible to semi-automate labeling through unsupervised learning, and also contributes to the discovery of new labels.
- the feature extraction unit 60 may extract the feature amounts of each data item included in the clusters generated by the second classification process (the second plurality of clusters) and store the extracted feature amounts in the feature storage unit 70.
- the visualization processing unit 80 outputs data classified into different clusters in the first plurality of clusters among the data included in the second plurality of clusters. This corresponds to the process of visualizing data belonging to different clusters as a result of re-learning. Note that specific processing for visualization will be described later.
- the second process is a process of selecting at least some clusters and learning again (unsupervised learning).
- the data refinement unit 100 generates, as the second data group, the data in the data group to be labeled that are classified into clusters selected from the first plurality of clusters.
- the data refinement unit 100 selects at least some clusters from among the first plurality of clusters.
- the data refinement unit 100 may select a cluster specified by the user via the input/output device 90, or may automatically select a cluster that satisfies a condition.
- the condition is arbitrary; examples include clusters containing at least a predetermined number of data items and clusters whose share of the classified data exceeds a predetermined threshold.
- the data group within the cluster selected here corresponds to the above-described second data group.
- FIG. 7 is an explanatory diagram showing an example of the process of selecting some clusters.
- the example shown in FIG. 7 indicates that two clusters have been selected from the three generated clusters.
- the data refinement unit 100 may add arbitrary cluster identification information to the data in each cluster so that the clusters classified in the first classification process can be identified.
- the feature extraction unit 60 regenerates a plurality of clusters (that is, performs re-learning processing) by learning (unsupervised learning) targeting data in the selected cluster.
- This process corresponds to the above-described second classification process, and the generated clusters correspond to the second plurality of clusters.
- the feature extraction unit 60 may also perform the learning with new data added separately. Re-learning on the selected clusters makes it possible to dig deeper into the data within them, so the data can be expected to be classified in more detail.
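For the second process, automatic cluster selection by the example conditions above (a minimum item count, or a share of all classified items above a threshold) and construction of the second data group can be sketched as follows. Keeping each item's first-stage cluster ID alongside it plays the role of the cluster identification information; the function names and the `min_count`/`min_ratio` parameters are assumptions.

```python
import numpy as np


def select_clusters(first_ids, min_count=None, min_ratio=None):
    """IDs of first-stage clusters that satisfy every given condition:
    at least `min_count` items and/or a share of all items above `min_ratio`."""
    ids, counts = np.unique(np.asarray(first_ids), return_counts=True)
    keep = np.ones(len(ids), dtype=bool)
    if min_count is not None:
        keep &= counts >= min_count
    if min_ratio is not None:
        keep &= counts / counts.sum() > min_ratio
    return ids[keep]


def build_second_group(X, first_ids, selected):
    """Items of the selected clusters, paired with their first-stage cluster IDs
    (the cluster identification information used later to spot moved items)."""
    first_ids = np.asarray(first_ids)
    mask = np.isin(first_ids, selected)
    return np.asarray(X)[mask], first_ids[mask]
```

The returned items would then be reclustered by the same unsupervised method used in the first classification process.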
- as in the first process, the visualization processing unit 80 outputs, from among the data included in the second plurality of clusters, data classified into different clusters in the first plurality of clusters.
- for example, for each cluster, the visualization processing unit 80 may output the data whose cluster identification information is in the minority within that cluster (that is, other than the identification information with the largest share) as data classified into different clusters in the first plurality of clusters.
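One way to realise the minority rule just described is shown below: within each second-stage cluster, find the most frequent first-stage cluster ID and flag every item carrying a different ID. This is a sketch assuming the cluster IDs of both stages are held in two parallel arrays; `changed_items` is a hypothetical name, and ties are broken in favour of the ID encountered first.

```python
from collections import Counter

import numpy as np


def changed_items(first_ids, second_ids):
    """Indices of items whose first-stage cluster ID is not the most common
    first-stage ID within their second-stage cluster."""
    first_ids = np.asarray(first_ids)
    second_ids = np.asarray(second_ids)
    flagged = np.zeros(len(first_ids), dtype=bool)
    for c in np.unique(second_ids):
        members = np.flatnonzero(second_ids == c)
        # most_common(1) returns [(id, count)]; ties go to the ID seen first.
        majority, _ = Counter(first_ids[members].tolist()).most_common(1)[0]
        flagged[members] = first_ids[members] != majority
    return np.flatnonzero(flagged)
```

The returned indices are the items to highlight in an overlay display or to list with their coordinates.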
- the third process is a process of excluding at least part of the data not classified into clusters, such as outliers, and learning again (unsupervised learning or supervised learning).
- the data refinement unit 100 generates, as a second data group, a data group excluding one or more data not classified into any of the first plurality of clusters from the data group to be labeled.
- FIG. 8 is an explanatory diagram showing an example of processing for excluding part of the data.
- the example shown in FIG. 8 indicates that the data in the range surrounded by a solid line circle is excluded as an outlier.
- when the data to be labeled is video data, this corresponds to processing for excluding noise scenes.
- after the exclusion, either or both of the above-described first process and second process are performed. This is expected to improve classification accuracy.
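The text does not say which clusterer leaves data unclassified in the third process. Assuming a density-based method in the style of DBSCAN, the sketch below marks as noise every point that lies farther than `eps` from every core point (a core point has at least `min_samples` neighbours within `eps`, itself included), then removes those points to form the second data group. `eps` and `min_samples` mirror DBSCAN's usual parameters; the function names are hypothetical.

```python
import numpy as np


def dbscan_noise_mask(X, eps, min_samples):
    """Boolean mask of the points DBSCAN would label as noise (not in any cluster)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all points (O(n^2) memory; fine for a sketch).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    within = dist <= eps
    # Core points have at least `min_samples` neighbours within eps, counting themselves.
    core = within.sum(axis=1) >= min_samples
    # Core and border points lie within eps of some core point; everything else is noise.
    in_cluster = within[:, core].any(axis=1)
    return ~in_cluster


def exclude_outliers(X, eps, min_samples):
    """Second data group for the third process: the input with noise points removed."""
    X = np.asarray(X, dtype=float)
    return X[~dbscan_noise_mask(X, eps, min_samples)]
```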
- the three types of processing performed by the data refinement unit 100 have been described above. However, the processing executed by the data refinement unit 100 is not limited to the three types of processing described above.
- the data refinement unit 100 may also perform data maintenance processing. Also, after each of the first process, the second process, and the third process, the same process may be performed again, or a different process may be performed.
- the data refinement unit 100 may output a file containing a data group to which labels have been added or a data group from which outliers have been removed.
- for example, the data refinement unit 100 may create a label file describing the assigned labels, copy only the labeled data to the folder for the next learning, or sort (move or copy) the original data into per-label folders based on the labels.
- the data refinement unit 100 may create a data list file describing only the data belonging to the selected cluster, copy only the data belonging to the selected cluster to the next learning folder, and the like.
- the data refinement unit 100 may create a data list file describing only the data other than the specified data (outliers), or copy only the data other than the specified data (outliers) to the folder for the next learning.
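The file outputs described above (a label file, per-label folders for the next learning run, and a data list that omits outliers) can be sketched with the Python standard library. The CSV layout, file names, and folder structure are hypothetical; the patent does not specify any formats.

```python
import csv
import shutil
from pathlib import Path


def write_label_file(labels: dict, path: Path) -> None:
    """Write one 'file,label' row per labeled item (hypothetical CSV layout)."""
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "label"])
        for name, label in sorted(labels.items()):
            writer.writerow([name, label])


def sort_into_label_folders(src_dir: Path, labels: dict, dst_dir: Path) -> None:
    """Copy each labeled file into dst_dir/<label>/; unlabeled files are left out."""
    for name, label in labels.items():
        target = dst_dir / label
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_dir / name, target / name)


def write_data_list(all_files, excluded, path: Path) -> None:
    """List every file except the excluded ones (e.g. outliers), one name per line."""
    excluded = set(excluded)
    kept = [name for name in all_files if name not in excluded]
    path.write_text("".join(name + "\n" for name in kept))
```

Copying rather than moving keeps the original data intact, so an earlier refinement can be repeated.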
- a method for the visualization processing unit 80 to visualize data belonging to a different cluster as a result of re-learning will be specifically described below.
- the visualization processing unit 80 performs dimension reduction on the data group to be labeled, and draws graphs of the dimension-reduced data contained in the first plurality of clusters and of the dimension-reduced data contained in the second plurality of clusters so that each cluster can be identified.
- the visualization processing unit 80 displays, among the dimension-reduced data included in the second plurality of clusters, the data classified into different clusters in the first plurality of clusters in a manner different from other data.
- Examples of different aspects include changing the shade of color, changing the color itself, changing the line of the outer frame, and blinking.
- FIG. 9 is an explanatory diagram showing an example of an overlay display of results before and after refinement.
- the example shown in FIG. 9 indicates that the visualization processing unit 80 superimposes the data distributions of the individual refinements and displays the data outside the layer of interest (that is, the refinement of interest) in a manner different from the data of the layer of interest.
- the result of the first elaboration and the result of the second elaboration are superimposed and displayed.
- the data d1 included in the target cluster only in the second refinement is shown in a manner different from other data.
- the data d2 which is included in the cluster of interest only in the first refinement, is shown in a manner different from the other data.
- FIGS. 10 and 11 are explanatory diagrams showing examples of displaying results before and after refinement in parallel windows.
- the visualization processing unit 80 may display the results before and after refinement in separate windows.
- the visualization processing unit 80 may display the data that changed before and after refinement in a manner different from other data, as illustrated in FIG. 11.
- the visualization processing unit 80 may display a list of data with different results before and after refinement (that is, data classified into different clusters).
- FIG. 12 is an explanatory diagram showing an example of displaying, in a separate window, a list of data d3 that have different results before and after refinement. In the example shown in FIG. 12, the list gives the coordinates at which the data showing different results before and after refinement are displayed.
- FIGS. 9 to 12 exemplify the case of comparing two refinement results.
- comparison targets are not limited to two results, and may be three or more.
- FIG. 13 is an explanatory diagram showing an example of an overlay display of the results of refinement performed multiple times.
- FIG. 14 is an explanatory diagram showing an example of displaying, in another window, a list of data that have different results due to multiple refinements. Compared with the example shown in FIG. 9, the example shown in FIG. 13 has four refinement results. Similarly, compared with the example shown in FIG. 12, the example shown in FIG. 14 has four refinement results.
- the visualization processing unit 80 may display cluster statistical information for each data group classification process (that is, refinement) separately from the above-described graph or together with the above-described graph. Note that the creation of the statistical information may be performed by the visualization processing unit 80 or by the feature extraction unit 60 .
- FIG. 15 is an explanatory diagram showing an example of displaying statistical information of each cluster.
- the example shown in FIG. 15 shows an example of displaying the number of data in the cluster, the center of gravity of the data, and the variance (x-direction and y-direction) as the cluster statistical information.
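The statistics listed for FIG. 15 (item count, centroid, and variance in the x and y directions) could be computed per cluster from the two-dimensional, dimension-reduced coordinates as sketched below. The function name is hypothetical, and population variance (NumPy's default `ddof=0`) is an assumption.

```python
import numpy as np


def cluster_statistics(points_2d, labels):
    """Per-cluster item count, centroid, and x/y variance of 2-D points."""
    P = np.asarray(points_2d, dtype=float)
    labels = np.asarray(labels)
    stats = {}
    for c in np.unique(labels):
        members = P[labels == c]
        cx, cy = members.mean(axis=0)
        stats[c.item()] = {
            "count": int(len(members)),
            "centroid": (float(cx), float(cy)),
            # Population variance (ddof=0) along each axis.
            "var_x": float(members[:, 0].var()),
            "var_y": float(members[:, 1].var()),
        }
    return stats
```

Running it once per refinement yields one table per refinement, which can then be switched or shown side by side.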
- the visualization processing unit 80 may switch and display the statistical information for each refinement, or may display them side by side.
- FIG. 16 is an explanatory diagram showing another example of displaying the statistical information of each cluster.
- the visualization processing unit 80 may display cluster statistical information (e.g., false positive rate) in graph and tabular form.
- the example shown in FIG. 16 represents the degree of matching between labels and assigned clusters when supervised learning is performed. In the example shown in FIG. 16, the first refinement is assumed to use unsupervised learning, so there is no evaluation result for it.
- the data acquisition unit 10, the related information acquisition unit 20, the object identification unit 30, the data processing unit 40, the text information input unit 50, the feature extraction unit 60, the visualization processing unit 80, and the data refinement unit 100 are implemented by a computer processor (e.g., a CPU (Central Processing Unit)) that operates according to a program (the labeling support program).
- the program is stored in a storage unit (not shown) of the labeling support system 1, the processor reads the program, and according to the program, the data acquisition unit 10, the related information acquisition unit 20, the object identification unit 30, the data processing It may operate as the unit 40 , the text information input unit 50 , the feature extraction unit 60 , the visualization processing unit 80 and the data refinement unit 100 .
- the functions of the labeling support system 1 may be provided in a SaaS (Software as a Service) format.
- a data acquisition unit 10, a related information acquisition unit 20, an object identification unit 30, a data processing unit 40, a text information input unit 50, a feature extraction unit 60, a visualization processing unit 80, and a data refinement unit 100 may be implemented by dedicated hardware. Also, part or all of each component of each device may be implemented by general-purpose or dedicated circuitry, processors, etc., or combinations thereof. These may be composed of a single chip, or may be composed of multiple chips connected via a bus. A part or all of each component of each device may be implemented by a combination of the above-described circuits and the like and programs.
- when the components of the labeling support system 1 are realized by a plurality of information processing devices, circuits, or the like, those devices and circuits may be arranged centrally or in a distributed manner.
- for example, the information processing devices, circuits, and the like may be implemented in a form in which they are connected to one another via a communication network, such as a client-server system or a cloud computing system.
- FIG. 17 is a flow chart showing an operation example of the labeling support system 1.
- FIG. 17 shows an operation example in which the data acquisition unit 10 directly acquires data in the format used by the feature extraction unit 60 and inputs the acquired data to the feature extraction unit 60.
- the feature extraction unit 60 generates a first plurality of clusters from the data group to be labeled (the first data group) (step S11). Next, the feature extraction unit 60 generates a second plurality of clusters from a data group that includes at least part of the data to be labeled (the second data group) (step S12). The visualization processing unit 80 then outputs those data, among the data included in the second plurality of clusters, that were classified into different clusters in the first plurality of clusters (step S13).
- in this way, the feature extraction unit 60 classifies the first data group by unsupervised learning to generate the first plurality of clusters and classifies the second data group to generate the second plurality of clusters, and the visualization processing unit 80 outputs those data, among the data included in the second plurality of clusters, that were classified into different clusters in the first plurality of clusters. This supports the work of labeling the clusters into which unlabeled data are classified.
- the data refinement unit 100 generates the second data group from the data group to be labeled according to the generated first plurality of clusters. This can improve the accuracy of re-learning performed on the generated second data group.
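Steps S11 and S12 can use any clustering method, and the text leaves open how "classified into different clusters" is decided, since cluster IDs from two separate runs are not aligned. One plausible reading, assumed here, is majority matching: for each second-round cluster, find the first-round cluster most of its members came from, and report the members that came from elsewhere. The function `moved_items` and its arguments are hypothetical; the sketch covers step S13 only and takes the two clustering results as plain lists of cluster IDs.

```python
from collections import Counter


def moved_items(first, second, indices):
    """Sketch of step S13 under a majority-matching assumption.

    first:   first-round cluster ID of every item in the first data group.
    second:  second-round cluster ID of each item in the second data group.
    indices: positions (into `first`) of the items in the second data group,
             listed in the same order as `second`.
    Returns the positions whose first-round cluster differs from the
    first-round cluster that is most common among their second-round
    cluster-mates.
    """
    if len(second) != len(indices):
        raise ValueError("second and indices must have the same length")
    origins = {}
    for i, cid in zip(indices, second):
        origins.setdefault(cid, []).append(first[i])
    # Majority first-round cluster per second-round cluster (ties: first seen).
    majority = {cid: Counter(ids).most_common(1)[0][0] for cid, ids in origins.items()}
    return [i for i, cid in zip(indices, second) if first[i] != majority[cid]]
```

For example, with `first = [0, 0, 0, 1, 1, 1]` and a second round over all six items giving `[0, 0, 1, 1, 1, 1]`, item 2 has joined a cluster whose other members came from first-round cluster 1, so `moved_items(first, second, list(range(6)))` returns `[2]`. Passing `indices` keeps the sketch usable when the second data group is only a refined subset of the first.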
- FIG. 18 is a block diagram showing an overview of a labeling support system according to the present invention.
- a labeling support system 180 (for example, the labeling support system 1) according to the present invention includes first classifying means 181, second classifying means 182, and output means 183.
- the first classifying means 181 (for example, the feature extraction unit 60) classifies a first data group, which is the data group to be labeled, by unsupervised learning to generate a first plurality of clusters.
- the second classifying means 182 (for example, the feature extraction unit 60) classifies (that is, re-learns) a second data group, which is a data group including at least part of the data to be labeled, to generate a second plurality of clusters.
- the output means 183 (for example, the visualization processing unit 80) outputs those data, among the data included in the second plurality of clusters, that were classified into different clusters in the first plurality of clusters.
- the labeling support system 180 may further include data refinement means (for example, the data refinement unit 100).
- the data refining means may generate the second data group by labeling, cluster by cluster, the data in the data group to be labeled that are classified into any of the first plurality of clusters (for example, the first processing by the data refinement unit 100).
- alternatively, the data refining means may generate, as the second data group, the data in the data group to be labeled that are classified into clusters selected from the first plurality of clusters (for example, the second processing by the data refinement unit 100).
- alternatively, the data refining means may generate, as the second data group, the data group obtained by excluding from the data group to be labeled one or more data that are not classified into any of the first plurality of clusters (for example, the third processing by the data refinement unit 100).
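The three refinement options can be sketched as simple filters over the first-round cluster assignments. This sketch assumes, as DBSCAN-style clustering does, that data not classified into any cluster carry the ID `-1`; the `NOISE` constant and the three function names are hypothetical. In the first processing, members of clusters that received no label are skipped, which is a design choice of this sketch rather than something the text specifies.

```python
NOISE = -1  # assumed ID for data not classified into any cluster


def refine_by_cluster_labels(items, clusters, cluster_labels):
    """First processing: attach each labeled cluster's label to its members.

    Noise items and members of clusters that received no label are left out.
    """
    return [
        (item, cluster_labels[c])
        for item, c in zip(items, clusters)
        if c != NOISE and c in cluster_labels
    ]


def refine_by_selected_clusters(items, clusters, selected):
    """Second processing: keep only items whose cluster was selected."""
    return [item for item, c in zip(items, clusters) if c in selected]


def refine_by_dropping_noise(items, clusters):
    """Third processing: drop items not classified into any cluster."""
    return [item for item, c in zip(items, clusters) if c != NOISE]
```

With items `["a", "b", "c", "d", "e"]` and first-round clusters `[0, 0, 1, -1, 2]`, labeling clusters 0 and 1 as `"car"` and `"person"` yields `[("a", "car"), ("b", "car"), ("c", "person")]`, selecting clusters `{1, 2}` yields `["c", "e"]`, and dropping noise yields `["a", "b", "c", "e"]`.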
- the output means may reduce the dimensionality of the data group to be labeled, graph the dimension-reduced data included in the first plurality of clusters and the dimension-reduced data included in the second plurality of clusters so that each cluster can be identified, and display those dimension-reduced data in the second plurality of clusters that were classified into different clusters in the first plurality of clusters in a manner different from the other data.
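The text names neither the dimensionality-reduction method nor the display style. The sketch below assumes PCA, approximated with power iteration and deflation so that it needs only the standard library, colors every point by its cluster ID, and marks data that moved between clusters with an `x` marker. `top_components`, `project_2d`, and `styled_points` are hypothetical helpers; the records they produce would still have to be handed to a plotting library.

```python
import math
import random


def top_components(rows, n_components=2, iterations=200, seed=0):
    """Approximate the leading principal axes via power iteration with deflation."""
    n, dims = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    # Population covariance matrix (dims x dims).
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(dims)]
           for i in range(dims)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(dims)]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient = variance along v; subtracting v's contribution
        # lets the next pass converge to the next-largest axis.
        eig = sum(v[i] * cov[i][j] * v[j] for i in range(dims) for j in range(dims))
        components.append(v)
        cov = [[cov[i][j] - eig * v[i] * v[j] for j in range(dims)]
               for i in range(dims)]
    return mean, components


def project_2d(rows):
    """Project every row of the data group to be labeled onto two principal axes."""
    mean, (pc1, pc2) = top_components(rows, n_components=2)
    coords = []
    for row in rows:
        c = [x - m for x, m in zip(row, mean)]
        coords.append((sum(a * b for a, b in zip(c, pc1)),
                       sum(a * b for a, b in zip(c, pc2))))
    return coords


def styled_points(coords, clusters, moved, indices=None):
    """Plot-ready records: color by cluster ID, 'x' marker for moved items.

    indices: positions (into coords) of the items that `clusters` describes;
    defaults to all items in order.
    """
    if indices is None:
        indices = range(len(clusters))
    return [
        {"x": coords[i][0], "y": coords[i][1], "color": cid,
         "marker": "x" if i in moved else "o"}
        for i, cid in zip(indices, clusters)
    ]
```

Projecting the whole data group to be labeled once and reusing `coords` for both the first-round and second-round views keeps every data item at the same position in both graphs, so only its color and marker change between views.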
- the output means may display cluster statistical information for each data group classification process.
- FIG. 19 is a schematic block diagram showing the configuration of a computer according to at least one embodiment.
- a computer 1000 comprises a processor 1001 , a main storage device 1002 , an auxiliary storage device 1003 and an interface 1004 .
- the labeling support system 180 described above is implemented in the computer 1000.
- the operation of each processing unit described above is stored in the auxiliary storage device 1003 in the form of a program (labeling support program).
- the processor 1001 reads the program from the auxiliary storage device 1003, loads it into the main storage device 1002, and executes the above processing according to the program.
- the auxiliary storage device 1003 is an example of a non-transitory tangible medium.
- other examples of non-transitory tangible media include magnetic disks, magneto-optical disks, CD-ROMs (Compact Disc Read-Only Memory), DVD-ROMs (Digital Versatile Disc Read-Only Memory), and semiconductor memories connected via the interface 1004.
- when the program is distributed to the computer 1000 via a communication line, the computer 1000 that receives it may load the program into the main storage device 1002 and execute the above processing.
- the program may implement only some of the functions described above.
- the program may also be a so-called difference file (difference program) that implements the above-described functions in combination with another program already stored in the auxiliary storage device 1003.
- the data refining means generates the second data group by labeling, cluster by cluster, the data in the data group to be labeled that are classified into any of the first plurality of clusters.
- the labeling support system according to appendix 1 or appendix 2.
- the data refining means generates, as the second data group, the data in the data group to be labeled that are classified into clusters selected from the first plurality of clusters.
- the data refining means generates, as a second data group, a data group excluding one or more data not classified into any of the first plurality of clusters from the data group to be labeled.
- the labeling support system according to any one of appendices 1 to 4.
- the labeling support system according to any one of appendices 1 to 5, wherein the output means reduces the dimensionality of the data group to be labeled, graphs the dimension-reduced data included in the first plurality of clusters and the dimension-reduced data included in the second plurality of clusters so that each cluster can be identified, and displays those dimension-reduced data in the second plurality of clusters that were classified into different clusters in the first plurality of clusters in a manner different from the other data.
- a labeling support method in which a computer classifies a first data group, which is the data group to be labeled, by unsupervised learning to generate a first plurality of clusters; classifies a second data group, which is a data group including at least part of the data to be labeled, to generate a second plurality of clusters; and outputs those data, among the data included in the second plurality of clusters, that were classified into different clusters in the first plurality of clusters.
- appendix 11: the program storage medium according to appendix 10, storing a labeling support program that causes the computer to execute a data refinement process of generating the second data group according to the first plurality of clusters generated from the data group to be labeled.
- reference signs: 1 labeling support system, 10 data acquisition unit, 20 related information acquisition unit, 30 object identification unit, 40 data processing unit, 50 text information input unit, 60 feature extraction unit, 70 feature storage unit, 80 visualization processing unit, 90 input/output device, 100 data refinement unit
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2024504060A JP7758149B2 (ja) | 2022-03-02 | 2022-03-02 | Labeling support system, labeling support method, and labeling support program |
| PCT/JP2022/008749 WO2023166578A1 (ja) | 2022-03-02 | 2022-03-02 | Labeling support system, labeling support method, and labeling support program |
| US18/836,420 US20250200116A1 (en) | 2022-03-02 | 2022-03-02 | Labeling assistance system, labeling assistance method, and labeling assistance program |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2022/008749 WO2023166578A1 (ja) | 2022-03-02 | 2022-03-02 | Labeling support system, labeling support method, and labeling support program |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023166578A1 true WO2023166578A1 (ja) | 2023-09-07 |
Family ID: 87883223
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2022/008749 Ceased WO2023166578A1 (ja) | 2022-03-02 | 2022-03-02 | Labeling support system, labeling support method, and labeling support program |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20250200116A1 |
| JP (1) | JP7758149B2 |
| WO (1) | WO2023166578A1 |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118152826A (zh) * | 2024-05-09 | 2024-06-07 | 深圳市翔飞科技股份有限公司 | Intelligent camera alarm system based on behavior analysis |
| WO2025224959A1 (ja) * | 2024-04-26 | 2025-10-30 | 三菱電機株式会社 | Information processing device, learning method, and program |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2008084151A (ja) * | 2006-09-28 | 2008-04-10 | Just Syst Corp | Information display device and information display method |
| JP2008084203A (ja) * | 2006-09-28 | 2008-04-10 | Nec Corp | Label assignment system, label assignment method, and label assignment program |
| JP2014063343A (ja) * | 2012-09-21 | 2014-04-10 | Nippon Telegr & Teleph Corp <Ntt> | Clustering quality improvement method |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP7087851B2 (ja) | 2018-09-06 | 2022-06-21 | 株式会社リコー | Information processing device, data classification method, and program |
| EP3929927A1 (en) | 2020-06-23 | 2021-12-29 | KWS SAAT SE & Co. KGaA | Associating pedigree scores and similarity scores for plant feature prediction |
2022
- 2022-03-02 JP JP2024504060A patent/JP7758149B2/ja active Active
- 2022-03-02 WO PCT/JP2022/008749 patent/WO2023166578A1/ja not_active Ceased
- 2022-03-02 US US18/836,420 patent/US20250200116A1/en active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2023166578A1 | 2023-09-07 |
| JP7758149B2 (ja) | 2025-10-22 |
| US20250200116A1 (en) | 2025-06-19 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
|  | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22929726; Country of ref document: EP; Kind code of ref document: A1 |
|  | ENP | Entry into the national phase | Ref document number: 2024504060; Country of ref document: JP; Kind code of ref document: A |
|  | WWE | WIPO information: entry into national phase | Ref document number: 18836420; Country of ref document: US |
|  | NENP | Non-entry into the national phase | Ref country code: DE |
|  | 122 | EP: PCT application non-entry in European phase | Ref document number: 22929726; Country of ref document: EP; Kind code of ref document: A1 |
|  | WWP | WIPO information: published in national office | Ref document number: 18836420; Country of ref document: US |