WO2023105610A1 - Information processing device, information processing method, and program - Google Patents
- Publication number
- WO2023105610A1 (PCT/JP2021/044858)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- learning
- learning data
- classification
- information processing
- data elements
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2148—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/771—Feature selection, e.g. selecting representative features from a multi-dimensional feature space
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/7747—Organisation of the process, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Definitions
- the present invention relates to an information processing device, an information processing method, and a program, and particularly to a technique for learning a learning model.
- Electronic commerce (EC, e-commerce) sells products using the Internet.
- Non-Patent Document 1 discloses a technique for hierarchically classifying images using a convolutional neural network model.
- Non-Patent Document 1 discloses a learning model for hierarchically classifying objects contained in images. There is a problem that the efficiency of
- the present invention has been made in view of the above problems, and aims to provide a learning model learning method that effectively prevents a decrease in accuracy even in learning a learning model for increasingly complex tasks.
- One aspect of an information processing apparatus includes: generating means for selecting one or more learning data elements from a data group including a plurality of learning data elements to which different correct labels are assigned, and generating a plurality of learning data sets so that the number of learning data elements changes in order; and learning means for applying the plurality of learning data sets to a learning model for machine learning in the order in which they were generated, and repeatedly training the learning model.
- the generating means may generate the plurality of learning data sets from the plurality of learning data elements so that the learning data elements change randomly.
- the generating means may generate the plurality of learning data sets from the plurality of learning data elements such that the number of the learning data elements increases or decreases in order.
- The generating means may randomly select one or more learning data elements from the data group to generate an initial learning data set and, using the initial learning data set as a starting point, generate the plurality of learning data sets by adding and deleting one or more learning data elements randomly selected from the data group.
- the learning model can be configured including a convolutional neural network.
- The learning model includes a main network that receives an object image including an object as input and extracts a plurality of feature amounts for hierarchical classification of the object based on the object image, and a sub-network that outputs a hierarchical classification for the object using the plurality of feature amounts.
- The main network may comprise a plurality of extractors that extract each of the plurality of feature amounts, the sub-network may comprise a plurality of classifiers that output classifications for the object from each of the plurality of feature amounts, and a higher-level classifier may be configured to have connections to one or more lower-level classifiers.
- Each of the plurality of extractors in the main network may be configured including a plurality of convolution layers.
- Each of the plurality of classifiers in the sub-network may be composed of a fully-connected neural network.
- the label can indicate a classification with a hierarchical structure for the object.
- the information processing apparatus may further include output means for outputting the two or more hierarchical classifications determined by the classification means.
- One aspect of an information processing method includes: a generation step of selecting one or more learning data elements from a data group including a plurality of learning data elements to which different correct labels are assigned, and generating a plurality of learning data sets so that the number of learning data elements changes in order; and a learning step of applying the plurality of learning data sets to a learning model for machine learning in the order in which they were generated, and repeatedly training the learning model.
- one aspect of an information processing program is an information processing program for causing a computer to execute information processing, wherein the computer is provided with different correct labels.
- FIG. 1 is a block diagram showing an example of the functional configuration of an information processing apparatus according to an embodiment of the invention.
- FIG. 2 shows an example of the architecture of a classification prediction model.
- FIG. 3 shows another example of the architecture of a classification prediction model.
- FIG. 4 shows a conceptual diagram of hierarchical classification for commodities.
- FIG. 5 shows examples of multiple training datasets for a classification prediction model.
- FIG. 6 is a block diagram showing an example of the hardware configuration of the information processing device according to the embodiment of the present invention.
- FIG. 7 shows a flow chart of the learning phase.
- FIG. 8 shows a flow chart of the classification prediction phase.
- FIG. 9 shows an example of classification results.
- The information processing apparatus 100 acquires an image including a product (product image), applies the product image to a learning model, and predicts and outputs the classification of the product image (the product included in the product image).
- an example of predicting a hierarchical classification (a classification having a hierarchical structure) for products will be described.
- the classification prediction target is not limited to products, and may be any object. Therefore, the present embodiment can also be applied to the case of predicting a hierarchical classification of an object from an image (object image) containing an arbitrary object.
- FIG. 1 shows an example of the functional configuration of an information processing apparatus 100 according to this embodiment.
- the acquisition unit 101 acquires product images.
- The acquisition unit 101 may acquire the product image through an input operation by the user (operator) via the input unit 605 (FIG. 6), or may acquire it from the storage unit (ROM 602 or RAM 603 in FIG. 6) in response to a user operation. The acquisition unit 101 may also acquire product images received from an external device via the communication interface (I/F) 607 (FIG. 6).
- the product image may be an image expressing colors with three colors, red (R), green (G), and blue (B).
- The acquisition unit 101 outputs the acquired product image to the classification prediction unit 102.
- the classification prediction unit 102 applies the product image acquired by the acquisition unit 101 to the classification prediction model 107, and predicts the classification of the product included in the product image.
- the classification prediction model 107 will be described later.
- the data set generation unit 103 generates a learning data set used for learning the classification prediction model 107 from a plurality of learning data elements (learning data) included in the learning data group 106 .
- The data set generation unit 103 generates a plurality of learning data sets, and the learning unit 104 uses the plurality of learning data sets sequentially (in chronological order) to train the classification prediction model 107.
- the classification prediction model 107 is a learning model that receives as input a product image including a product and predicts the classification of the product.
- the taxonomy may be a hierarchical taxonomy.
- The classification prediction model 107 shown in FIG. 2 consists of a neural network including a main network 201, which includes multiple subparts (first subpart 211 to fifth subpart 215), and a sub-network 202, which includes multiple classification blocks (first classification block 221 to third classification block 223). A subpart is also called an extractor, and a classification block is also called a classifier.
- In the sub-network 202, the output of the first classification block 221 is input to the second classification block 222 and the third classification block 223, and the output of the second classification block 222 is input to the third classification block 223.
- a skip connection (shortcut connection), which is a configuration in which the output of one block in a neural network model is input to another block that is not adjacent, is known in neural network models such as ResNet. In ResNet, skip connections are used in feature quantity extraction.
- By contrast, the classification prediction model 107 differs in that, among a plurality of classifiers such as the first classification block 221 to the third classification block 223, a higher classifier has connections to one or more lower classifiers. That is, in the model shown in FIG. 2, the output of a higher classifier is input to one or more lower classifiers.
- the main network 201 shown in FIG. 2 is a neural network based on a 16-layer version of the well-known VGG network (VGGNet), which is one of the convolutional neural network (CNN) models.
- The first subpart 211 and the second subpart 212 each consist of two convolution layers (Conv) and one pooling layer (Pooling), the third subpart 213 and the fourth subpart 214 each consist of three convolution layers and one pooling layer, and the fifth subpart 215 has three convolution layers.
- In every subpart, convolution with a kernel size of 3 × 3 is performed, while the number of filters (number of channels) differs between subparts.
- The number of filters is 64 in the first subpart 211, 128 in the second subpart 212, 256 in the third subpart 213, and 512 in the fourth subpart 214 and the fifth subpart 215.
- Each pooling layer performs maximum pooling (MaxPooling) of size 2 × 2 with a stride of 2.
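The subpart layout described above can be written down as a small table and checked numerically. The following is a minimal sketch, not the patent's implementation: the layer and filter counts come from the text, while the 224-pixel input size, the size-preserving padding of the convolutions, and the names `SUBPARTS` and `output_shapes` are assumptions made for illustration.

```python
# Illustrative description of the five subparts of main network 201.
# Layer counts and filter counts follow the text; everything else is assumed.
SUBPARTS = [
    {"name": "subpart 211", "convs": 2, "filters": 64, "pool": True},
    {"name": "subpart 212", "convs": 2, "filters": 128, "pool": True},
    {"name": "subpart 213", "convs": 3, "filters": 256, "pool": True},
    {"name": "subpart 214", "convs": 3, "filters": 512, "pool": True},
    {"name": "subpart 215", "convs": 3, "filters": 512, "pool": False},
]


def output_shapes(input_size, subparts):
    """Return (height, width, channels) of the feature map after each subpart."""
    shapes = []
    size = input_size
    for part in subparts:
        # Assumption: 3x3 convolutions use padding 1, so they keep the size.
        if part["pool"]:
            size //= 2  # 2x2 max pooling with stride 2 halves each side
        shapes.append((size, size, part["filters"]))
    return shapes


shapes = output_shapes(224, SUBPARTS)  # 224 is an assumed input size
total_convs = sum(part["convs"] for part in SUBPARTS)
```

Under these assumptions the main network contains 13 convolution layers, and the feature maps fed to the three classification blocks come from the last three entries of `shapes`.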
- a product image is input to the main network 201 as an input image, and the first subpart 211 to the fifth subpart 215 extract the feature amount of the product image and output it as an output of the main network 201 .
- the feature amount may be a feature amount for hierarchical classification.
- Product classification is learned in order from top to bottom. Therefore, the plurality of feature quantities output from the first subpart 211 to the fifth subpart 215 range, in order, from feature quantities showing the features of the higher (coarse) classification of the product to feature quantities showing the features of the lower (subdivided) classification (a plurality of feature quantities for hierarchical classification of products).
- The output of the main network 201 is input to the sub-network 202.
- The sub-network 202 predicts the hierarchical classification from each of the plurality of feature quantities from the main network 201.
- Each of the first classification block 221 to the third classification block 223 shown in FIG. 2 is composed of a fully-connected neural network.
- a classification label (class) for the product is output, and the classification (classification name) is determined from the label.
- The classifications (first classification to third classification) output from the first classification block 221 to the third classification block 223 run in order from the higher (coarse) classification of the product to the lower (subdivided) classification. An example of such hierarchical classification will be described later with reference to FIG. 4.
- the operations of the first classification block 221 to the third classification block 223 will be described more specifically.
- the first classification block 221 outputs the label of the first classification from the feature quantity output from the third subpart 213 of the main network 201 and determines the first classification.
- The second classification block 222 outputs the second classification label from the feature quantity output from the fourth subpart 214 of the main network 201 and the first classification label output from the first classification block 221, and determines the second classification.
- The third classification block 223 outputs the third classification label from the feature quantity output from the fifth subpart 215 of the main network 201, the first classification label output from the first classification block 221, and the second classification label output from the second classification block 222, and determines the third classification.
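The wiring of the three classification blocks can be sketched as follows. This is a toy illustration under stated assumptions, not the patent's network: each block is reduced to a single randomly initialised linear layer, higher-level labels are passed down as one-hot vectors (the text only says the labels are input), and the names `ClassificationBlock`, `predict_hierarchy`, and all sizes are hypothetical.

```python
import random


def one_hot(index, size):
    """Encode a label as a one-hot list (an assumed encoding)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


class ClassificationBlock:
    """Toy stand-in for a fully connected classifier: one linear layer."""

    def __init__(self, in_dim, n_classes, rng):
        self.n_classes = n_classes
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
            for _ in range(n_classes)
        ]

    def predict(self, inputs):
        scores = [sum(w * x for w, x in zip(row, inputs)) for row in self.weights]
        return max(range(self.n_classes), key=scores.__getitem__)


def predict_hierarchy(f3, f4, f5, blocks, n1, n2):
    """f3, f4, f5 stand for the features from subparts 213, 214, and 215."""
    block1, block2, block3 = blocks
    label1 = block1.predict(f3)
    # The second block sees its own features plus the first label.
    label2 = block2.predict(f4 + one_hot(label1, n1))
    # The third block sees its own features plus both higher-level labels.
    label3 = block3.predict(f5 + one_hot(label1, n1) + one_hot(label2, n2))
    return label1, label2, label3


rng = random.Random(0)
FEATURE_DIM, N1, N2, N3 = 4, 2, 3, 5  # hypothetical sizes
blocks = (
    ClassificationBlock(FEATURE_DIM, N1, rng),
    ClassificationBlock(FEATURE_DIM + N1, N2, rng),
    ClassificationBlock(FEATURE_DIM + N1 + N2, N3, rng),
)
f3, f4, f5 = ([rng.random() for _ in range(FEATURE_DIM)] for _ in range(3))
labels = predict_hierarchy(f3, f4, f5, blocks, N1, N2)
```

The point of the sketch is the input widths: each lower block's input grows by the number of classes of every higher block connected to it.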
- the configuration is not limited to that shown in FIG. 2, and may be configured such that the classification result of a higher classifier is input to one or more lower classifiers.
- The classification label output from the first classification block 221 may be configured to be input to the third classification block 223 without being input to the second classification block 222.
- sub-network 202 may be configured to output the second and/or third classification without the first classification.
- The classification prediction model 107 shown in FIG. 3 consists of a neural network including a main network 301, which includes multiple subparts (first subpart 311 to fifth subpart 315), and a sub-network 302, which includes multiple classification blocks (first classification block 321 to third classification block 323). In the sub-network 302, the output of the first classification block 321 is input to the second classification block 322 and the third classification block 323, and the output of the second classification block 322 is input to the third classification block 323.
- the main network 301 is a neural network based on the 19-layer version of the VGG network.
- The first subpart 311 and the second subpart 312 are the same as the first subpart 211 and the second subpart 212, whereas the third subpart 313 to the fifth subpart 315 each include one more convolutional layer than the third subpart 213 to the fifth subpart 215. This may result in a more accurate prediction of the output classification compared to the model shown in FIG. 2. Since other configurations are the same as those of the model shown in FIG. 2, description thereof is omitted.
- The learning unit 104 sequentially applies the plurality of learning data sets generated by the data set generation unit 103 to the classification prediction model 107 configured as shown in FIGS. 2 and 3 to train the model. That is, the learning unit 104 repeats learning of the classification prediction model 107 using the plurality of learning data sets. Then, the learning unit 104 stores the learned classification prediction model 107 in a storage unit such as the RAM 603 (FIG. 6).
- The classification prediction unit 102 applies the product image acquired by the acquisition unit 101 to the learned classification prediction model 107 stored in a storage unit such as the RAM 603 (FIG. 6), and predicts the classification of the product in the product image.
- FIG. 4 shows a conceptual diagram of hierarchical classification as an example of product classification predicted by the classification prediction unit 102 .
- FIG. 4 shows an example of hierarchical classification of products 42 included in product images 41 .
- By applying the product image 41 to the classification prediction model 107, the first classification to the third classification are estimated hierarchically.
- the classification indicated in bold is predicted. That is, it is predicted that the first category is "men's fashion", the second category is “tops”, and the third category is "T-shirts".
- By using the main network 201 or the main network 301 in the classification prediction model 107, classifications from the higher (coarse) first classification to the lower (subdivided) third classification can be predicted as the classification result.
- The data set generation unit 103 generates a plurality of learning data sets used for learning the classification prediction model 107 from a plurality of learning data elements included in the learning data group 106.
- the learning unit 104 learns the classification prediction model 107 by sequentially using the plurality of learning data sets (in chronological order).
- the learning data group 106 is composed of learning data elements to which 10 different correct labels (classes) from "0" to "9" are assigned.
- Each data element for learning is composed of a plurality of sets each including a product image including the product and the same label (correct data) indicating the classification of the product. Therefore, one learning data element is given the same label.
- The labels are configured to indicate a classification having a hierarchical structure, and correspond to, for example, the labels attached to the third classification associated with the first and second classifications described above. Therefore, referring to FIG. 4, label "0" corresponds, for example, to "T-shirts" associated with "men's fashion" and "tops", and the learning data element with label "0" includes a plurality of sets, each set consisting of an image and the label "0".
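The relationship between a leaf label and its hierarchical classification, and the shape of one learning data element, can be sketched as below. This is an assumption-laden illustration: only the entry for label 0 follows the text, and the other entries, the file names, and the names `LABEL_HIERARCHY` and `expand_label` are hypothetical.

```python
# Leaf label -> (first classification, second classification, third classification).
# Only label 0 is taken from the text; the remaining entries are hypothetical.
LABEL_HIERARCHY = {
    0: ("men's fashion", "tops", "T-shirts"),
    1: ("men's fashion", "tops", "shirts"),      # hypothetical
    2: ("men's fashion", "bottoms", "jeans"),    # hypothetical
}


def expand_label(label):
    """Return the three-level classification that a leaf label stands for."""
    first, second, third = LABEL_HIERARCHY[label]
    return {"first": first, "second": second, "third": third}


# One learning data element: several (image, label) sets sharing the same label.
# The file names are placeholders, not data from the patent.
element_0 = [(f"tshirt_{i}.jpg", 0) for i in range(3)]
```

With this layout, a single integer label is enough to recover the correct answer at every level of the hierarchy during learning.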
- the data set generation unit 103 generates a plurality of learning data sets by applying predetermined generation rules (selection rules).
- The generation rules (selection rules) according to this embodiment will be described. Note that these rules are only an example, and a plurality of learning data sets may be generated by another rule or method so that the number of learning data elements changes in order. Also, a plurality of learning data sets may be generated such that the learning data elements change randomly.
- The learning data elements to be added or deleted in rule (2) are randomly selected. That is, the learning data elements to be added are randomly selected from the learning data elements not included in the learning data set to which they are added, and the learning data elements to be deleted are randomly selected from the learning data set from which they are deleted.
- FIG. 5 shows an example of a plurality of learning data sets generated by the data set generation unit 103 according to the generation rule.
- ten learning data sets 1-10 are shown.
- numbers surrounded by squares correspond to learning data elements labeled with the numbers.
- Learning data set 1 is the initial learning data set that follows rule (1); in the example of FIG. 5, it is generated by randomly selecting the six learning data elements labeled "1", "2", "4", "6", "8", and "9".
- Learning data set 2 is obtained by adding to learning data set 1 the two learning data elements labeled "3" and "7", randomly selected from the learning data elements not included in learning data set 1.
- Learning data set 3 is obtained by deleting the one learning data element labeled "8" from the learning data elements included in learning data set 2.
- Learning data set 4 and subsequent ones are also generated with additions and deletions according to the above generation rules.
- Under these generation rules, the number of learning data elements to be added is greater than the number of learning data elements to be deleted, and learning data sets up to learning data set 10 are generated.
- the learning data sets 1 to 10 are data sets in which the number of learning data elements increases or decreases in order.
- the data set generation unit 103 may generate the learning data set by alternately adding and deleting learning data elements. Further, the data set generation unit 103 may generate a learning data set by deleting or adding learning data elements after successive additions or deletions of learning data elements.
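One way to realise such a sequence of learning data sets is sketched below. It is a minimal sketch, not the patent's rule set: the initial size of six, the strict alternation of "add two" and "delete one", the fixed seed, and the name `generate_learning_data_sets` are all assumptions chosen to resemble the FIG. 5 example.

```python
import random


def generate_learning_data_sets(all_labels, initial_size, n_sets,
                                n_add, n_delete, seed=0):
    """Build learning data sets by alternately adding and deleting random elements."""
    rng = random.Random(seed)
    universe = sorted(all_labels)
    # Initial learning data set: a random selection from the data group.
    current = set(rng.sample(universe, initial_size))
    data_sets = [sorted(current)]
    add_next = True
    while len(data_sets) < n_sets:
        if add_next:
            # Add elements chosen at random from those not yet in the set.
            candidates = sorted(set(universe) - current)
            picked = rng.sample(candidates, min(n_add, len(candidates)))
            current |= set(picked)
        else:
            # Delete elements chosen at random from the set, keeping at least one.
            count = min(n_delete, len(current) - 1)
            picked = rng.sample(sorted(current), count)
            current -= set(picked)
        data_sets.append(sorted(current))
        add_next = not add_next
    return data_sets


data_sets = generate_learning_data_sets(
    range(10), initial_size=6, n_sets=10, n_add=2, n_delete=1
)
sizes = [len(data_set) for data_set in data_sets]
```

Which labels end up in each set depends on the seed, but the sizes follow directly from the assumed rule: they rise and fall in turn while trending upward because more elements are added than deleted.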
- The learning data sets 1 to 10 generated in this way are applied to the classification prediction model 107 in chronological order to train the classification prediction model 107. That is, learning data set 1 is used at time t to train the classification prediction model 107, and then learning data set 2 is used at time t+1 to train the classification prediction model 107. Such a learning process continues until learning data set 10 is used at time t+9 to train the classification prediction model 107. In this embodiment, the classification prediction model 107 is trained with different learning data sets in chronological order by such learning processing.
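The chronological application of the data sets to one and the same model can be sketched as a plain loop. This is only an illustration of the ordering: the convolutional model is replaced by a toy per-label running mean, the 1-D "features" are made up, only the first three data sets of the FIG. 5 example are used, and the names `train_on`, `samples`, and `history` are hypothetical.

```python
def train_on(model, data_set, samples):
    """One learning pass: update a per-label running mean (toy stand-in for training)."""
    for label in data_set:
        for x in samples[label]:
            count, mean = model.get(label, (0, 0.0))
            count += 1
            mean += (x - mean) / count
            model[label] = (count, mean)
    return model


# Hypothetical 1-D features: the samples for label k cluster around k.
samples = {label: [label - 0.1, label + 0.1] for label in range(10)}

# First three learning data sets, following the FIG. 5 example.
data_sets = [
    [1, 2, 4, 6, 8, 9],
    [1, 2, 3, 4, 6, 7, 8, 9],
    [1, 2, 3, 4, 6, 7, 9],
]

model = {}
history = []
for step, data_set in enumerate(data_sets):  # step 0 is time t, step 1 is t+1, ...
    model = train_on(model, data_set, samples)
    history.append((step, len(data_set)))
```

Because the same `model` object is carried from one step to the next, elements that appear in several data sets are learned repeatedly, while elements added later are seen fewer times.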
- The data set generation unit 103 may also generate learning data set 1, after which the learning unit 104 trains the classification prediction model 107 using that data set; the data set generation unit 103 may then generate learning data set 2, after which the learning unit 104 trains the classification prediction model 107 using that data set, and such processing may be continued up to learning data set 10.
- FIG. 6 is a block diagram showing an example of the hardware configuration of the information processing apparatus 100 according to this embodiment.
- The information processing apparatus 100 according to this embodiment can be implemented on any single or multiple computers, mobile devices, or any other processing platform. FIG. 6 shows an example in which the information processing apparatus 100 is implemented in a single computer, but the information processing apparatus 100 according to the present embodiment may be implemented in a computer system including a plurality of computers. The plurality of computers may be communicably connected to each other by a wired or wireless network.
- The information processing apparatus 100 may include a CPU 601, ROM 602, RAM 603, HDD 604, input unit 605, display unit 606, communication I/F 607, and system bus 608.
- Information processing apparatus 100 may also include an external memory.
- a CPU (Central Processing Unit) 601 comprehensively controls operations in the information processing apparatus 100, and controls each component (602 to 607) via a system bus 608, which is a data transmission path.
- a ROM (Read Only Memory) 602 is a non-volatile memory that stores control programs and the like necessary for the CPU 601 to execute processing.
- the program may be stored in a non-volatile memory such as a HDD (Hard Disk Drive) 604 or an SSD (Solid State Drive) or an external memory such as a removable storage medium (not shown).
- a RAM (Random Access Memory) 603 is a volatile memory and functions as a main memory, a work area, and the like for the CPU 601 . That is, the CPU 601 loads necessary programs and the like from the ROM 602 to the RAM 603 when executing processing, and executes the programs and the like to realize various functional operations.
- the HDD 604 stores, for example, various data and information necessary for the CPU 601 to perform processing using programs.
- the HDD 604 also stores various data, information, and the like obtained by the CPU 601 performing processing using programs and the like, for example.
- An input unit 605 is configured by a keyboard and a pointing device such as a mouse.
- a display unit 606 is configured by a monitor such as a liquid crystal display (LCD).
- the display unit 606 may function as a GUI (Graphical User Interface) by being configured in combination with the input unit 605 .
- a communication I/F 607 is an interface that controls communication between the information processing apparatus 100 and an external device.
- a communication I/F 607 provides an interface with a network and executes communication with an external device via the network.
- Various data, various parameters, and the like are transmitted/received to/from an external device via the communication I/F 607 .
- the communication I/F 607 may perform communication via a wired LAN (Local Area Network) conforming to a communication standard such as Ethernet (registered trademark) or a dedicated line.
- the network that can be used in this embodiment is not limited to this, and may be configured as a wireless network.
- This wireless network includes a wireless PAN (Personal Area Network) such as Bluetooth (registered trademark), ZigBee (registered trademark), and UWB (Ultra Wide Band). It also includes a wireless LAN (Local Area Network) such as Wi-Fi (Wireless Fidelity) (registered trademark) and a wireless MAN (Metropolitan Area Network) such as WiMAX (registered trademark). Furthermore, wireless WANs (Wide Area Networks) such as LTE/3G, 4G, and 5G are included. It should be noted that the network may connect each device so as to be able to communicate with each other as long as communication is possible, and the communication standard, scale, and configuration are not limited to the above.
- At least some of the functions of the elements of the information processing apparatus 100 shown in FIG. 6 can be realized by the CPU 601 executing a program. However, at least some of the functions of the elements of the information processing apparatus 100 shown in FIG. 6 may operate as dedicated hardware. In this case, the dedicated hardware operates under the control of the CPU 601 .
- FIG. 7 shows a flowchart of the processing (learning phase) for learning the classification prediction model 107, and FIG. 8 shows a flowchart of the processing (classification prediction phase) for predicting the classification of product images (products included in product images) using the trained classification prediction model 107.
- the processing shown in FIGS. 7 and 8 can be realized by the CPU 601 of the information processing apparatus 100 loading a program stored in the ROM 602 or the like into the RAM 603 and executing the program.
- FIG. 5 will be referred to.
- In S71, the data set generation unit 103 generates learning data set 1 as the initial learning data set. Subsequently, in S72, the data set generation unit 103 repeats addition and deletion of learning data elements starting from learning data set 1, and generates learning data sets 2 to 10 in order.
- In S73, the learning unit 104 trains the classification prediction model 107 by sequentially using the learning data sets 1 to 10 generated in S71 and S72 in chronological order. That is, the learning unit 104 trains the classification prediction model 107 by sequentially using the plurality of learning data sets 1 to 10, which are generated such that the number of learning data elements changes dynamically in order. As a result, a learning data element that has been used for learning once is used again at a later time, and new learning data elements are also used for learning.
- The learning unit 104 stores the classification prediction model 107 learned in S73 as the learned classification prediction model 107 in a storage unit such as the RAM 603.
- In S81, the acquisition unit 101 acquires a product image including a product whose classification is to be predicted. For example, when the operator of the information processing apparatus 100 operates the information processing apparatus 100 to access an arbitrary EC site and select a product image including an arbitrary product, the acquisition unit 101 acquires that product image. Further, the acquisition unit 101 can acquire a product image by receiving a product image, or a URL indicating the product image, transmitted from an external device such as a user device.
- the number of products targeted for classification prediction included in one product image is not limited to one, and the product image may include a plurality of products targeted for classification prediction.
- In S82, the classification prediction unit 102 inputs the product image acquired by the acquisition unit 101 to the classification prediction model 107 and predicts and determines the hierarchical classification of the product.
- An example of the classification prediction model 107 is as shown in FIG. 2 or FIG. The model uses a main network that extracts a plurality of feature values from the product image and a subnetwork that uses those feature values to output a hierarchical classification for the product.
- The main network is composed of a plurality of subparts (extractors), each of which extracts one of the plurality of feature values. The subnetwork includes a plurality of classification blocks (classifiers) that output a classification of the product from each of the plurality of feature values, and a higher classification block is configured to have connections to one or more lower classification blocks.
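The wiring between the main network and the subnetwork can be sketched as follows. This is a structural illustration under several assumptions: the extractors are chained in sequence, each classification block receives only the scores of the block directly above it, and the arithmetic inside `make_extractor` and `make_classifier` is a placeholder for real convolutional and fully connected layers. None of these names come from the source.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Extractor = Callable[[Vector], Vector]
Classifier = Callable[[Vector, Vector], Vector]


def make_extractor(scale: float) -> Extractor:
    # Placeholder subpart; a real extractor would be a stack of convolutional layers.
    def extract(x: Vector) -> Vector:
        return [scale * v for v in x]

    return extract


def make_classifier(num_classes: int) -> Classifier:
    # Placeholder classification block: its scores depend on its own feature
    # values and on the scores handed down from the higher block.
    def classify(features: Vector, higher_scores: Vector) -> Vector:
        base = sum(features) + sum(higher_scores)
        return [base * (k + 1) for k in range(num_classes)]

    return classify


def forward(
    image: Vector, extractors: Sequence[Extractor], classifiers: Sequence[Classifier]
) -> List[Vector]:
    """Return one score vector per hierarchy level, from highest to lowest."""
    outputs: List[Vector] = []
    features = image
    higher_scores: Vector = []
    for extract, classify in zip(extractors, classifiers):
        features = extract(features)                 # main network subpart
        scores = classify(features, higher_scores)   # subnetwork block
        outputs.append(scores)
        higher_scores = scores                       # connection to the next lower block
    return outputs


extractors = [make_extractor(1.0), make_extractor(0.5), make_extractor(2.0)]
classifiers = [make_classifier(2), make_classifier(3), make_classifier(4)]
outputs = forward([1.0, 2.0], extractors, classifiers)
print([len(scores) for scores in outputs])
```

The three output vectors correspond to three hierarchy levels with 2, 3, and 4 classes; the class counts are arbitrary.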
- The classification prediction unit 102 predicts and determines the first classification, the second classification, and the third classification as the hierarchical classification of the product in the product image acquired in S81, as shown in FIG.
- The output unit 105 outputs the classification result predicted and determined by the classification prediction unit 102 in S82.
- The output unit 105 may display the classification result on the display unit 606 of the information processing apparatus 100 or transmit it to an external device such as a user device via the communication I/F 607.
- The acquisition unit 101 can acquire a plurality of product regions (Regions of Interest), each including one of the plurality of products, by, for example, a known image processing technique, and output them to the classification prediction unit 102. The classification prediction unit 102 can then perform the processing of S82 on each product region (partial image) to predict and determine the classification for each product.
- the output unit 105 may output the (hierarchical) classification for each product separately as classification results, or may output them as one classification result.
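Per-region classification for an image containing several products can be sketched like this. The region boxes are assumed to be supplied by some detector (the text only says "a known image processing technique"), and `toy_classify` with its label names is a hypothetical stand-in for the learned classification prediction model 107.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]              # grayscale pixels, row-major
Box = Tuple[int, int, int, int]      # (top, left, bottom, right), end-exclusive
Hierarchy = Tuple[str, str, str]     # (first, second, third classification)


def crop(image: Image, box: Box) -> Image:
    """Cut one product region (partial image) out of the product image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def classify_regions(
    image: Image, boxes: Sequence[Box], classify: Callable[[Image], Hierarchy]
) -> List[Hierarchy]:
    """Run the classifier once per product region and collect the results."""
    return [classify(crop(image, box)) for box in boxes]


def toy_classify(region: Image) -> Hierarchy:
    # Hypothetical rule standing in for the model: bright regions are shirts.
    pixels = [p for row in region for p in row]
    mean = sum(pixels) / len(pixels)
    if mean > 100:
        return ("clothing", "tops", "shirt")
    return ("goods", "shoes", "sneaker")


image = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
]
boxes = [(0, 0, 2, 2), (0, 2, 2, 4)]
results = classify_regions(image, boxes, toy_classify)
for box, hierarchy in zip(boxes, results):
    print(box, " > ".join(hierarchy))
```

The list `results` can be output as separate per-product results or bundled into a single classification result, matching the two options described above.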
- FIG. 9 shows an example of the classification result output by the output unit 105 as a classification result 90.
- the classification result 90 may be displayed on the display unit 606 of the information processing apparatus 100, or may be transmitted to an external device such as a user device via the communication I/F 607 and displayed on the external device.
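- A product image 91 is a product image including a product to be classified and predicted. It is, for example, the image selected after the operator of the information processing apparatus 100 operates the apparatus to access an arbitrary EC site. The product image 91 may also be an image transmitted from an external device such as a user device.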
- the classification prediction unit 102 of the information processing apparatus 100 applies the product image 91 to the learned classification prediction model 107 shown in FIGS. 2 and 3 to predict a hierarchical classification 93 for the product 92 .
- the hierarchical classification 93 is similar to the classification shown in FIG.
- the output unit 105 can combine the hierarchical classification 93 and the product image 91 to form a classification result 90 and output it.
- Table 1 shows simulation results of performance evaluation of classification prediction processing. Specifically, Table 1 shows the correct answer rate (accuracy) for the correct data when the classification prediction process is performed using each of the plurality of learning models. In this simulation, the Fashion-MNIST data set that can be obtained on the Internet or the like was used as the learning data group 106 .
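The correct answer rate for each hierarchy level is the fraction of samples whose predicted label matches the correct label at that level. A minimal sketch follows; the function names and the sample labels are invented for illustration and are not the values in Table 1.

```python
from typing import List, Sequence, Tuple

Labels = Tuple[int, int, int]  # (first, second, third classification)


def accuracy(predicted: Sequence[int], correct: Sequence[int]) -> float:
    """Fraction of positions where the prediction equals the correct label."""
    if len(predicted) != len(correct) or not correct:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == c for p, c in zip(predicted, correct)) / len(correct)


def accuracy_per_level(predicted: Sequence[Labels], correct: Sequence[Labels]) -> List[float]:
    """Accuracy computed separately for each level of the hierarchy."""
    levels = len(correct[0])
    return [
        accuracy([p[i] for p in predicted], [c[i] for c in correct])
        for i in range(levels)
    ]


# Invented predictions and correct data for four samples.
predicted = [(0, 1, 3), (0, 2, 5), (1, 3, 7), (1, 3, 8)]
correct = [(0, 1, 3), (0, 1, 4), (1, 3, 7), (1, 3, 7)]
print(accuracy_per_level(predicted, correct))
```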
- As the present invention model, a model was used in which the classification prediction model 107 described with reference to FIG. 3 in the first embodiment was trained by sequentially applying the learning data sets 1 to 10 described with reference to FIG.
- As the first comparison model, a model was used in which a learning model composed only of the main network 301, the third classification block 223, and the third output block 243 of the classification prediction model 107 shown in FIG. 3 was trained by applying, once, the learning data set with labels “0” to “9” described above with reference to FIG. This learning model can output only the third classification.
- The second comparison model was the classification prediction model 107 described in the first embodiment with reference to FIG., trained by sequentially applying the six learning data sets generated by the method.
- The present invention model has a high accuracy rate in all of the first to third classifications. It can also be seen that the present invention model has a higher accuracy rate in the third classification than the first comparison model. This is a synergistic effect of using the classification prediction model 107 described in the first embodiment and training it with the plurality of time-series learning data sets described in the present embodiment.
- A plurality of learning data sets are applied to the classification prediction model 107 in chronological order to train the classification prediction model 107.
- Through this training, a learning model is constructed that makes it possible to accurately predict increasingly complex classifications.
- The classification prediction model 107 has a configuration in which skip connections are applied to the outputs of the first classification block 211 to the third classification block 223 and the outputs of the first classification block 311 to the third classification block 323. With such a configuration, high-level classification results are combined with low-level classification results to predict low-level classifications, making it possible to accurately predict the hierarchical classification of products.
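How a higher-level result can influence a lower-level prediction is illustrated below. The source does not state the operation used by the skip connections, so this sketch swaps in one simple possibility: adding the log-probability of each parent class to the logits of its child classes. The `parent_of` mapping, the logit values, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_with_higher(
    lower_logits: Sequence[float],
    higher_probs: Sequence[float],
    parent_of: Sequence[int],
) -> List[float]:
    """Add each parent's log-probability to the logits of its child classes."""
    return [
        logit + math.log(higher_probs[parent_of[child]])
        for child, logit in enumerate(lower_logits)
    ]


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


higher_probs = softmax([2.0, 0.0])      # higher level strongly favors class 0
parent_of = [0, 0, 1, 1]                # lower classes 0-1 belong to 0, 2-3 to 1
lower_logits = [1.0, 0.2, 1.1, 0.5]     # on its own, the lower level picks class 2

combined = combine_with_higher(lower_logits, higher_probs, parent_of)
print("lower level alone:", argmax(lower_logits))
print("with higher-level result:", argmax(combined))
```

In this example the lower level alone would choose a class whose parent the higher level considers unlikely; once the higher-level result is combined, the choice moves to a class under the favored parent.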
- The learning process is performed by applying the plurality of learning data sets described with reference to FIG. 5 to a learning model that is configured by, or includes, a convolutional neural network (CNN).
- According to the present embodiment, it is possible to construct a learning model that targets increasingly complex classification tasks, so that classifications can be predicted with high accuracy. As a result, for example, the predictability of trends in products purchased on the EC site and the ease of product selection by the user are improved.
- 100 Information processing device
- 101 Acquisition unit
- 102 Classification prediction unit
- 103 Data set generation unit
- 104 Learning unit
- 105 Output unit
- 106 Learning data group
- 107 Classification prediction model
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP21943318.2A EP4216114B1 (en) | 2021-12-07 | 2021-12-07 | Information processing device, information processing method, and program |
| US18/009,831 US12327403B2 (en) | 2021-12-07 | 2021-12-07 | Information processing apparatus, information processing method, and non-transitory computer readable medium |
| JP2022561532A JP7445782B2 (ja) | 2021-12-07 | 2021-12-07 | 情報処理装置、情報処理方法、およびプログラム |
| PCT/JP2021/044858 WO2023105610A1 (ja) | 2021-12-07 | 2021-12-07 | 情報処理装置、情報処理方法、およびプログラム |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2021/044858 WO2023105610A1 (ja) | 2021-12-07 | 2021-12-07 | 情報処理装置、情報処理方法、およびプログラム |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023105610A1 (ja) | 2023-06-15 |
Family
ID=86729837
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2021/044858 Ceased WO2023105610A1 (ja) | 2021-12-07 | 2021-12-07 | 情報処理装置、情報処理方法、およびプログラム |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US12327403B2 |
| EP (1) | EP4216114B1 |
| JP (1) | JP7445782B2 |
| WO (1) | WO2023105610A1 |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP7788115B1 (ja) | 2024-07-08 | 2025-12-18 | ソフトバンク株式会社 | 情報処理装置及び情報処理方法 |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2010231768A (ja) * | 2009-03-27 | 2010-10-14 | Mitsubishi Electric Research Laboratories Inc | マルチクラス分類器をトレーニングする方法 |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP6418211B2 (ja) | 2016-09-15 | 2018-11-07 | オムロン株式会社 | 識別情報付与システム、識別情報付与装置、識別情報付与方法及びプログラム |
| JP7357551B2 (ja) | 2020-01-17 | 2023-10-06 | 株式会社日立ソリューションズ・クリエイト | 画像判定システム |
-
2021
- 2021-12-07 US US18/009,831 patent/US12327403B2/en active Active
- 2021-12-07 WO PCT/JP2021/044858 patent/WO2023105610A1/ja not_active Ceased
- 2021-12-07 EP EP21943318.2A patent/EP4216114B1/en active Active
- 2021-12-07 JP JP2022561532A patent/JP7445782B2/ja active Active
Non-Patent Citations (4)
| Title |
|---|
| KUZBORSKIJ ILJA, ORABONA FRANCESCO, CAPUTO BARBARA: "Transfer Learning Through Greedy Subset Selection ", ARXIV.ORG, 6 August 2010 (2010-08-06), XP093069973 * |
| LIU YUANJUN; LUO GAOFENG; DONG FENG: "Convolutional Network Model using Hierarchical Prediction and its Application in Clothing Image Classification", 2019 3RD INTERNATIONAL CONFERENCE ON DATA SCIENCE AND BUSINESS ANALYTICS (ICDSBA), IEEE, 11 October 2019 (2019-10-11), pages 157 - 160, XP033867912, DOI: 10.1109/ICDSBA48748.2019.00041 * |
| See also references of EP4216114A4 |
| SEO YIANSHIN KYUNG-SHIK: "Hierarchical convolutional neural networks for fashion image classification", EXP. SYS. APPL., vol. 116, 2019, pages 328 - 329 |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP7788115B1 (ja) | 2024-07-08 | 2025-12-18 | ソフトバンク株式会社 | 情報処理装置及び情報処理方法 |
| JP2026009771A (ja) * | 2024-07-08 | 2026-01-21 | ソフトバンク株式会社 | 情報処理装置及び情報処理方法 |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4216114B1 (en) | 2025-08-13 |
| JPWO2023105610A1 | 2023-06-15 |
| JP7445782B2 (ja) | 2024-03-07 |
| EP4216114A4 (en) | 2024-02-14 |
| EP4216114A1 (en) | 2023-07-26 |
| US12327403B2 (en) | 2025-06-10 |
| US20240135693A1 (en) | 2024-04-25 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | WWE | Wipo information: entry into national phase | Ref document number: 2022561532; Country of ref document: JP |
| | WWE | Wipo information: entry into national phase | Ref document number: 18009831; Country of ref document: US |
| | ENP | Entry into the national phase | Ref document number: 2021943318; Country of ref document: EP; Effective date: 20221206 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWG | Wipo information: grant in national office | Ref document number: 18009831; Country of ref document: US |
| | WWG | Wipo information: grant in national office | Ref document number: 2021943318; Country of ref document: EP |