WO2023112199A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number
WO2023112199A1
WO2023112199A1 PCT/JP2021/046260 JP2021046260W WO2023112199A1 WO 2023112199 A1 WO2023112199 A1 WO 2023112199A1 JP 2021046260 W JP2021046260 W JP 2021046260W WO 2023112199 A1 WO2023112199 A1 WO 2023112199A1
Authority
WO
WIPO (PCT)
Prior art keywords
attribute
information processing
attribute values
product
learning model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2021/046260
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
フアレス ホスエ クエバス
ハサン アルサラン
ラジャセイカル サナガヴァラプ
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rakuten Group Inc
Original Assignee
Rakuten Group Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rakuten Group Inc
Priority to US18/008,730 (patent US12361682B2, en)
Priority to PCT/JP2021/046260 (patent WO2023112199A1, ja)
Priority to JP2022573687A (patent JP7265688B1, ja)
Priority to EP21943330.7A (patent EP4220551B1, en)
Publication of WO2023112199A1
Legal status: Ceased

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715: Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects

Definitions

  • the present invention relates to an information processing device, an information processing method, and a program, and particularly to technology for predicting all attributes related to objects.
  • Electronic commerce (e-commerce, EC), in which products are sold over the Internet
  • Japanese Patent Application Laid-Open No. 2002-200001 discloses a technique for storing a preset attribute of a product in association with an image of the product.
  • Patent Literature 1 discloses a technique for storing a preset product attribute in association with an image of the product. , the processing load increases. One effective technique for avoiding such an increase in processing load is to efficiently predict all attributes associated with each product.
  • the present invention has been made in view of the above problems, and aims to provide technology for efficiently predicting all attributes related to objects such as products.
  • One aspect of an information processing apparatus comprises: an acquisition unit that acquires an object image including an object; and prediction means for predicting all attributes associated with the object by applying the object image acquired by the acquisition unit to a learning model, wherein the learning model is a learning model common to a plurality of different objects including the object, and includes a plurality of estimation layers that estimate a plurality of attribute values for a plurality of attributes associated with the plurality of different objects, and an output layer that concatenates and outputs the plurality of attribute values output from the plurality of estimation layers.
  • the learning model may be composed of a first part and a second part.
  • the first part receives the object image as input and outputs a feature vector representing features of the object.
  • the second part has the plurality of estimation layers and the output layer. The plurality of estimation layers receive the feature vector as input and output a value indicating the object type of the object and the plurality of attribute values, and the output layer may concatenate and output the value indicating the object type and the plurality of attribute values output from the plurality of estimation layers.
  • the prediction means can predict all the attributes from the plurality of attribute values output from the second part of the learning model.
  • one or more valid attribute values among the plurality of attribute values are set in advance according to the value indicating the object type. The prediction means obtains, from the plurality of attribute values, the one or more valid attribute values according to the value indicating the object type, and predicts the attributes corresponding to the one or more valid attribute values as all the attributes.
  • the prediction means can predict all attributes associated with each of the plurality of objects.
  • the object image may be image data generated from Y elements, Cb elements, and Cr elements in data DCT-transformed from a YCbCr image.
  • the image data may be data in which the Y element, the Cb element, and the Cr element of the DCT-transformed data are aligned and connected.
  • the information processing device may further include output means for outputting all the attributes predicted by the prediction means.
  • One aspect of an information processing method comprises: an acquisition step of acquiring an object image including an object; and a prediction step of predicting all attributes associated with the object by applying the object image acquired in the acquisition step to a learning model, wherein the learning model is a learning model common to a plurality of different objects including the object, and includes a plurality of estimation layers that estimate a plurality of attribute values for a plurality of attributes associated with the plurality of different objects, and an output layer that concatenates and outputs the plurality of attribute values output from the plurality of estimation layers.
  • One aspect of an information processing program is a program for causing a computer to execute information processing, the program causing the computer to execute: an acquisition process of acquiring an object image including an object; and a prediction process of predicting all attributes related to the object by applying the object image acquired in the acquisition process to a learning model, wherein the learning model is a learning model common to a plurality of different objects including the object, and includes a plurality of estimation layers that estimate a plurality of attribute values for a plurality of attributes associated with the plurality of different objects, and an output layer that concatenates and outputs the plurality of attribute values output from the plurality of estimation layers.
  • FIG. 1 is a block diagram showing an example of functional configuration of an information processing apparatus according to an embodiment of the present invention.
  • FIG. 2(a) shows a conceptual diagram of the configuration of learning data
  • FIG. 2(b) shows specific examples of the first to N-th attributes.
  • FIG. 3(a) shows an example of the configuration of learning data
  • FIG. 3(b) shows an example of attribute values.
  • FIG. 4A shows an example architecture of the first part of the attribute prediction model.
  • FIG. 4B shows a conceptual diagram of the processing flow in the input layer of FIG. 4A.
  • FIG. 5A shows an example architecture for the second part of the attribute prediction model.
  • FIG. 5B shows another configuration example of the composite attribute vector shown in FIG. 5A.
  • FIG. 6 is a block diagram showing an example of the hardware configuration of the information processing device according to the embodiment of the present invention.
  • FIG. 7 is a flow chart showing processing executed by the information processing apparatus according to the embodiment of the present invention.
  • FIG. 8 is a diagram showing an example of an attribute prediction result.
  • FIGS. 9A to 9C are diagrams explaining another example of an attribute prediction result.
  • the information processing apparatus 100 acquires an image (product image) including a product and predicts all attributes uniquely related to the product.
  • the attribute can be an index when a user purchases a product, such as the visual characteristics of the product or the target gender.
  • although this embodiment describes predicting attributes related to a product, it can also be applied when predicting attributes of objects other than products.
  • FIG. 1 shows an example of the functional configuration of the information processing apparatus 100 according to this embodiment.
  • the learning model storage unit 105 stores attribute prediction models 106 .
  • the acquisition unit 101 acquires product images.
  • the acquisition unit 101 may acquire the product image through an input operation by the user (operator) via the input unit 605 (FIG. 6), or may acquire it from the storage unit (ROM 602 or RAM 603 in FIG. 6) in response to the user's operation. The acquisition unit 101 may also acquire a product image received from an external device via the communication I/F 607 (FIG. 6).
  • the product image may be an image expressing colors with three colors, red (R), green (G), and blue (B).
  • the product image may instead be an image expressed by luminance (Y (Luma)) representing brightness and color components (Cb, Cr (Chroma)), that is, a YCbCr image converted from an RGB image (JPEG image/YCbCr image).
  • the product image may also be data (coefficient values) obtained by applying a DCT (Discrete Cosine Transform) (compression) to a YCbCr image in an encoding unit (not shown) provided in the information processing apparatus 100.
  • the acquisition unit 101 may be configured to acquire data as a product image that has undergone (YCbCr conversion and) DCT conversion by a device other than the information processing device 100 .
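  • The YCbCr conversion and DCT mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the source does not give the conversion constants or the DCT block size, so the full-range JFIF (JPEG) constants and an 8x8 orthonormal DCT-II are assumed here, and the function names `rgb_to_ycbcr` and `dct2_block` are hypothetical.

```python
import math


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF constants, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def dct2_block(block):
    """Orthonormal 2D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        # Normalisation factor that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for i in range(n):
                for j in range(n):
                    total += (
                        block[i][j]
                        * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * j + 1) * v * math.pi / (2 * n))
                    )
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


# A flat 8x8 luminance block has all of its energy in the DC coefficient.
flat_block = [[10.0] * 8 for _ in range(8)]
dc = dct2_block(flat_block)[0][0]
```

  • With an 8x8 block, each block yields 64 coefficients, which would be one plausible reading of the 64-channel DCT data described later; the source does not state this explicitly.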
  • the acquisition unit 101 outputs the acquired product image to the attribute prediction unit 102 .
  • the attribute prediction unit 102 applies the product image acquired by the acquisition unit 101 to the attribute prediction model 106 (a neural network model), described later, to predict all attributes related to the product included in the product image. Processing by the attribute prediction unit 102 will be described later.
  • the learning unit 103 causes the attribute prediction model 106 to learn, and stores the learned attribute prediction model 106 in the learning model storage unit 105.
  • the configuration of learning (teacher) data used by the learning unit 103 for learning the attribute prediction model 106 will be described with reference to FIGS. 2 and 3.
  • FIG. 2(a) shows a conceptual diagram of the configuration of learning data.
  • learning data includes a product image, the product type (object type) of the product included in the product image, and a plurality of attributes related to the product (first to N-th attributes (N>1)).
  • each attribute corresponds to the visual characteristics of the product (hereinafter referred to as "attribute category”), and the attribute value of each attribute is the type of the attribute category (hereinafter referred to as "attribute type").
  • the plurality of attribute values associated with the product may indicate values indicative of attribute types for each of the plurality of attribute categories.
  • product types include, but are not limited to, “shirts,” “skirts,” “jacket,” “caps,” “sneakers,” and “boots.”
  • product is assumed to be goods worn by a person, but the “product” is not limited to this.
  • merchandise is not limited to goods/merchandise handled on electronic commerce sites.
  • FIG. 2(b) shows specific examples of the first to Nth attributes as attribute categories.
  • the first to N-th attributes include, for example, “length” and “neck type”, which correspond to the first attribute and the second attribute, respectively.
  • Attribute categories may also include, for example, “sleeve length”, “sleeve type”, “heel height (of shoes)”, “gender (targeted by the product)”, and “color”. Which attribute category corresponds to which attribute is assumed to be mapped in advance using a predetermined table or the like.
  • an attribute category can be valid or invalid depending on the product type, and this relationship is set in advance. For example, when the product type is “shirt”, the attribute category “length” is valid, but “heel height” is invalid. In this case, a valid attribute value (valid label) is set as the attribute value corresponding to the attribute category “length”, whereas an invalid attribute value (dummy value/invalid label) is set for the attribute category “heel height”.
  • in other words, which of the first to N-th attribute values are valid attribute values and which are invalid attribute values is assumed to be preset according to the product type. In this embodiment, a valid attribute value is set to a value that indicates the attribute type for the attribute category (see FIGS. 3(a) and 3(b)), and an invalid attribute value is set to “-1”.
  • Fig. 3(a) shows an example of the configuration of learning data.
  • each product image is given a product image number in order to identify the product image.
  • the learning data includes, for each product image, a value indicating the product type of the product included in the product image and values indicating the attribute types of the attribute categories of the product (the first attribute value to the N-th attribute value).
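  • The labelling scheme described above (valid attribute values per product type, with “-1” for invalid ones) might be encoded as in the following sketch. The category list, the validity table, and the name `encode_labels` are hypothetical; the source only states that “length” is valid and “heel height” is invalid for “shirt”.

```python
INVALID = -1  # dummy value the source assigns to invalid attribute values

# Hypothetical ordering of attribute categories (first to N-th attribute).
ATTRIBUTE_CATEGORIES = ["length", "neck type", "heel height"]

# Hypothetical validity table: which categories are valid for each product type.
VALID_CATEGORIES = {
    "shirt": {"length", "neck type"},
    "boots": {"heel height"},
}


def encode_labels(product_type, attribute_types):
    """Return the first to N-th attribute values for one training sample.

    attribute_types maps a valid category name to its integer attribute type;
    categories that are invalid for the product type are filled with INVALID.
    """
    valid = VALID_CATEGORIES[product_type]
    return [
        attribute_types[category] if category in valid else INVALID
        for category in ATTRIBUTE_CATEGORIES
    ]


shirt_labels = encode_labels("shirt", {"length": 2, "neck type": 0})
boots_labels = encode_labels("boots", {"heel height": 1})
```

  • Keeping every sample at the same length N, with a dummy value in the invalid slots, is what lets one model be shared across product types.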
  • the output unit 104 outputs all attribute information (attribute prediction results) predicted by the attribute prediction unit 102 .
  • the output unit 104 may output the attribute prediction result in association with the product image acquired by the acquisition unit 101, for example.
  • the output unit 104 may display the attribute prediction result on the display unit 606 (FIG. 6). Further, when the product image targeted for attribute prediction is acquired from an external device such as a user device, the output unit 104 may transmit the attribute prediction result to the external device via the communication I/F 607 (FIG. 6) so that it is displayed on the display unit of the external device.
  • the attribute prediction unit 102 applies the product image acquired by the acquisition unit 101 to the attribute prediction model 106 and performs supervised learning to generate (extract) a shared feature vector, and predicts all of the attributes associated with the product from that vector.
  • a feature vector represents a value/information representing a feature.
  • the attribute prediction model 106 is a learning model common to a plurality of different products (product types) that can be included in product images, and is characterized by being shared among them. For example, if the product types that can be included in the product image are “shirt”, “skirt”, “jacket”, “cap”, “sneakers”, and “boots”, the attribute prediction model 106 is configured to predict at least all of the attributes associated with each of “shirt”, “skirt”, “jacket”, “cap”, “sneakers”, and “boots”.
  • the attribute prediction model 106 consists of a first part and a second part.
  • the first part is a layer (shared feature vector extraction layers) for generating (extracting) shared feature vectors from product images.
  • the second part is multiple layers (attribute specific (estimation) layers) for predicting and outputting a value indicating the product type and multiple attribute values from the generated shared feature vector.
  • FIG. 4A shows an example architecture of the first part of attribute prediction model 106 .
  • a first part of the attribute prediction model 106 is a learning model for machine learning applying an image recognition model.
  • the first part of the attribute prediction model 106 is composed of an input layer L41 and multiple layers (first layer L42 to N-th layer L45).
  • the first layer L42 to the N-th layer L45 are composed of an intermediate layer including a plurality of convolution layers and an output layer for classifying/predicting classes, and generate and output a shared feature vector 42 from the input image 41.
  • as the intermediate layer, for example, EfficientNet by Google Research is used.
  • when EfficientNet is used, each convolutional layer uses MBConv (Mobile Inverted Bottleneck Convolution).
  • the intermediate layer extracts the feature map, and the output layer is configured to reduce the dimensionality from the map to generate the final shared feature vector 42 .
  • the number of convolution layers is not limited to a specific number.
  • FIG. 4B shows a conceptual diagram of the flow of processing in the input layer L41 of FIG. 4A.
  • the attribute prediction unit 102 is configured to use, as the input image 41, data that has been YCbCr-converted and DCT-converted from an image expressed in RGB.
  • the conversion process may be performed by the acquisition unit 101 or may be performed by a device other than the information processing device 100 .
  • DCT-Y411, DCT-Cb412, and DCT-Cr413 each have components of [64,80,80], [64,40,40], and [64,40,40], and each dimension (Dimensionality ) represents [number of channels (n_channels), width (width), height (height)].
  • the attribute prediction unit 102 performs upsampling processing on DCT-Cb412 and DCT-Cr413 to generate DCT-Cb414 and DCT-Cr415.
  • attribute prediction section 102 concatenates DCT-Y 411 , DCT-Cb 414 and DCT-Cr 415 for each channel to generate concatenated DCT data 416 .
  • Concatenated DCT data 416 is input to the next first layer L42. That is, the sizes of the Y, Cb, and Cr elements are adjusted to generate concatenated DCT data 416 (image data).
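  • The size adjustment and channel-wise concatenation described above can be sketched with plain nested lists. The source does not name the upsampling method, so nearest-neighbour 2x upsampling is assumed here, and the helper names (`zeros`, `shape`, `upsample2x`, `concat_channels`) are hypothetical.

```python
def zeros(channels, width, height):
    """Build a [channels][width][height] nested list filled with 0.0."""
    return [[[0.0] * height for _ in range(width)] for _ in range(channels)]


def shape(tensor):
    """Return [n_channels, width, height] of a nested-list tensor."""
    return [len(tensor), len(tensor[0]), len(tensor[0][0])]


def upsample2x(tensor):
    """Nearest-neighbour 2x upsampling of every plane (assumed method)."""
    out = []
    for plane in tensor:
        new_plane = []
        for row in plane:
            wide = [value for value in row for _ in range(2)]
            new_plane.append(wide)
            new_plane.append(list(wide))
        out.append(new_plane)
    return out


def concat_channels(*tensors):
    """Concatenate tensors along the channel axis after checking plane sizes."""
    reference = shape(tensors[0])[1:]
    for tensor in tensors:
        if shape(tensor)[1:] != reference:
            raise ValueError("all tensors must share width and height")
    return [plane for tensor in tensors for plane in tensor]


dct_y = zeros(64, 80, 80)   # DCT-Y 411
dct_cb = zeros(64, 40, 40)  # DCT-Cb 412
dct_cr = zeros(64, 40, 40)  # DCT-Cr 413

concatenated = concat_channels(dct_y, upsample2x(dct_cb), upsample2x(dct_cr))
print(shape(concatenated))  # [192, 80, 80]
```

  • Under these assumptions the concatenated data has 192 channels of 80x80; the source gives the per-component sizes but not the size of the concatenated result.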
  • the attribute prediction unit 102 generates a shared feature vector 42 from the concatenated DCT data 416 input to the first layer L42 via the first layer L42 to the Nth layer L45 as shown in FIG. 4A.
  • Shared feature vector 42 is a feature vector shared by the second part of attribute prediction model 106 .
  • FIG. 5A shows an example architecture of the second part of the attribute prediction model 106.
  • the second part of the attribute prediction model 106 includes a plurality of layers (a plurality of estimation layers) provided in parallel that output attribute values for each of a plurality of attributes related to a product, with a shared feature vector as input, and the and an output layer that concatenates and outputs a plurality of attribute values output from a plurality of layers (layer branches).
  • the second part includes, as the plurality of layers, sublayers L511 to L514 that predict the product type (product type value 51) from the shared feature vector, and sublayers that predict attribute values of a plurality of attributes for the product (first attribute value 52, second attribute value 53, ..., N-th attribute value 54). It further includes a connecting unit 55 that concatenates and outputs the plurality of output attribute values. With such a configuration, the second part predicts and outputs the value indicating the product type and each of the plurality of attribute values.
  • the sublayers that predict the product type value 51 consist of, for example, a convolutional layer (Conv) L511, a batch normalization layer (Batch Normalization) L512, an activation function layer (ReLU) L513, and a prediction (output) layer L514, as shown in FIG. 5A. The sublayers for predicting the first attribute value 52 to the N-th attribute value 54 are constructed similarly.
  • after predicting the product type value 51 and the first attribute value 52 to the N-th attribute value 54, the attribute prediction unit 102 concatenates the product type value 51 and the first attribute value 52 to the N-th attribute value 54, embeds them in a common feature space, and generates a composite attribute vector (composite attribute value) 56 in that feature space. As shown in FIG. 5A, the composite attribute vector 56 is concatenated so that the first attribute value 52 to the N-th attribute value 54 follow in order, starting from the product type value 51. Note that this configuration is one form of representing the plurality of output attribute values, and the composite attribute vector is not limited to a specific configuration.
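  • The parallel estimation branches and the concatenating output can be illustrated with a deliberately simplified sketch. It is not the architecture of FIG. 5A: each branch here is a small dense layer, ReLU, and a prediction layer with random weights, standing in for the Conv, Batch Normalization, ReLU, and prediction sublayers, and all names and sizes (`make_head`, `predict_composite`, the dimensions) are hypothetical.

```python
import random


def make_head(in_dim, hidden_dim, n_classes, seed):
    """Build one estimation branch: dense -> ReLU -> dense -> argmax.

    Dense layers replace the convolution and batch normalization of the
    source purely to keep the sketch short; weights are random placeholders.
    """
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden_dim)] for _ in range(n_classes)]

    def head(features):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, features))) for row in w1]
        scores = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
        return scores.index(max(scores))  # predicted class index

    return head


def predict_composite(features, type_head, attribute_heads):
    """Run every branch on the same shared features and concatenate the outputs.

    The product type value comes first, followed by the first to N-th
    attribute values, mirroring the ordering described in the text.
    """
    return [type_head(features)] + [head(features) for head in attribute_heads]


shared_features = [0.5, -0.2, 0.1, 0.9]           # stand-in for shared feature vector 42
type_head = make_head(4, 8, 6, seed=0)            # 6 hypothetical product types
attribute_heads = [make_head(4, 8, 3, seed=s) for s in (1, 2, 3)]  # N = 3
composite = predict_composite(shared_features, type_head, attribute_heads)
```

  • Because every branch reads the same shared features, adding an attribute only adds one branch and one slot in the composite vector.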
  • after generating the composite attribute vector 56, the attribute prediction unit 102 reads (decodes) and acquires the product type value 51 located at the beginning of the composite attribute vector 56. The attribute prediction unit 102 then reads out and acquires from the composite attribute vector 56 only the attribute values corresponding to the acquired product type value 51 (that is, the valid attribute values).
  • for example, if the first attribute value 52 is valid and the second attribute value 53 is invalid for the acquired product type value 51, the attribute prediction unit 102 reads the first attribute value 52 and does not read the second attribute value 53.
  • the attribute prediction unit 102 likewise reads, for the third and subsequent attribute values, those that correspond to the product type value 51 (valid attribute values) and skips those that do not (invalid attribute values), and then completes the read process.
  • because the attribute prediction unit 102 reads only the attribute values that are valid for the product type value 51 and does not read invalid attribute values, the processing load is low.
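  • The read process described above can be sketched as a table lookup keyed by the product type value. The table contents and the name `read_valid_attributes` are hypothetical; the source only says that the valid positions are preset per product type.

```python
def read_valid_attributes(composite, valid_positions):
    """Read the product type value, then only the attribute values valid for it.

    composite: [product_type_value, first_attribute_value, ..., Nth_attribute_value]
    valid_positions: maps a product type value to the 1-based positions of the
        attribute values that are valid for that type (hypothetical table).
    """
    product_type = composite[0]
    positions = valid_positions[product_type]
    # Invalid positions are never touched, so the work scales with the number
    # of valid attributes rather than with N.
    return product_type, {position: composite[position] for position in positions}


# Hypothetical table: type 0 uses attributes 1 and 2, type 1 uses attribute 3.
valid_positions = {0: [1, 2], 1: [3]}

result_a = read_valid_attributes([0, 2, 1, 7], valid_positions)
result_b = read_valid_attributes([1, 5, 5, 3], valid_positions)
```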
  • the attribute prediction unit 102 generates attribute information corresponding to the one or more read attribute values as the attribute prediction result. For example, the attribute prediction unit 102 can acquire the name of the attribute type corresponding to each of the read attribute values from the correspondence table and generate it as the attribute prediction result together with the product type.
  • a plurality of product regions (Region of Interest) including each of the plurality of products can be used as the input image (DCT-transformed image data) of the attribute prediction model 106.
  • the attribute prediction unit 102 applies the plurality of product regions to the attribute prediction model 106 and predicts the product type value and the first to N-th attribute values for each of the plurality of product regions. The predicted values are concatenated and embedded in a common feature space to generate a composite feature vector (composite attribute value) in that feature space.
  • FIG. 5B shows a configuration example of the composite feature vector generated in this way.
  • FIG. 5B shows a configuration example of a composite feature vector output from the connecting unit 55 as an output layer when the number of products is n.
  • the output layer concatenates and outputs attribute values for each of the plurality of attributes associated with each of the plurality of products.
  • the attribute prediction unit 102 can read, for each of the product type values (1) to (n), the corresponding attribute values (for example, for the product type value (1), those among the first attribute value (1) to the N-th attribute value (1) that are valid), and can complete the read process without reading the non-corresponding attribute values. Accordingly, the information processing apparatus 100 can predict all attributes related to each of the plurality of products.
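  • For n products, the same read process can be applied per product. The sketch below assumes the composite vector stores one segment of length N+1 per product, laid out one after another; the exact layout of FIG. 5B is not recoverable from the text, and the names `split_products` and `decode_all` are hypothetical.

```python
def split_products(composite, n_attributes):
    """Split a flat composite vector into per-product segments of length N+1."""
    segment_length = n_attributes + 1
    if len(composite) % segment_length != 0:
        raise ValueError("composite length is not a multiple of N+1")
    return [
        composite[start:start + segment_length]
        for start in range(0, len(composite), segment_length)
    ]


def decode_all(composite, n_attributes, valid_positions):
    """Decode every product: product type value plus its valid attribute values."""
    results = []
    for segment in split_products(composite, n_attributes):
        product_type = segment[0]
        valid = {p: segment[p] for p in valid_positions[product_type]}
        results.append((product_type, valid))
    return results


# Hypothetical table and a composite vector for n = 2 products with N = 3.
valid_positions = {0: [1, 2], 1: [3]}
decoded = decode_all([0, 2, 1, 9, 1, 4, 4, 3], 3, valid_positions)
```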
  • FIG. 6 is a block diagram showing an example of the hardware configuration of the information processing apparatus 100 according to this embodiment.
  • the information processing apparatus 100 according to this embodiment can be implemented on any single or multiple computers, mobile devices, or any other processing platform. FIG. 6 shows an example in which the information processing apparatus 100 is implemented in a single computer, but the information processing apparatus 100 according to the present embodiment may be implemented in a computer system including a plurality of computers. The plurality of computers may be connected to one another by a wired or wireless network.
  • information processing apparatus 100 may include CPU 601 , ROM 602 , RAM 603 , HDD 604 , input section 605 , display section 606 , communication I/F 607 , and system bus 608 .
  • Information processing apparatus 100 may also include an external memory.
  • a CPU (Central Processing Unit) 601 comprehensively controls operations in the information processing apparatus 100, and controls each component (602 to 607) via a system bus 608, which is a data transmission path.
  • a ROM (Read Only Memory) 602 is a non-volatile memory that stores control programs and the like necessary for the CPU 601 to execute processing.
  • the program may be stored in a non-volatile memory such as a HDD (Hard Disk Drive) 604 or an SSD (Solid State Drive) or an external memory such as a removable storage medium (not shown).
  • a RAM (Random Access Memory) 603 is a volatile memory and functions as a main memory, a work area, and the like for the CPU 601 . That is, the CPU 601 loads necessary programs and the like from the ROM 602 to the RAM 603 when executing processing, and executes the programs and the like to realize various functional operations.
  • the HDD 604 stores, for example, various data and information necessary for the CPU 601 to perform processing using programs.
  • the HDD 604 also stores various data, information, and the like obtained by the CPU 601 performing processing using programs and the like, for example.
  • An input unit 605 is configured by a keyboard and a pointing device such as a mouse.
  • a display unit 606 is configured by a monitor such as a liquid crystal display (LCD).
  • the display unit 606 may function as a GUI (Graphical User Interface) by being configured in combination with the input unit 605 .
  • a communication I/F 607 is an interface that controls communication between the information processing apparatus 100 and an external device.
  • a communication I/F 607 provides an interface with a network and executes communication with an external device via the network.
  • Various data, various parameters, and the like are transmitted/received to/from an external device via the communication I/F 607 .
  • the communication I/F 607 may perform communication via a wired LAN (Local Area Network) conforming to a communication standard such as Ethernet (registered trademark) or a dedicated line.
  • the network that can be used in this embodiment is not limited to this, and may be configured as a wireless network.
  • This wireless network includes a wireless PAN (Personal Area Network) such as Bluetooth (registered trademark), ZigBee (registered trademark), and UWB (Ultra Wide Band). It also includes a wireless LAN (Local Area Network) such as Wi-Fi (Wireless Fidelity) (registered trademark) and a wireless MAN (Metropolitan Area Network) such as WiMAX (registered trademark). Furthermore, wireless WANs (Wide Area Networks) such as LTE/3G, 4G, and 5G are included. It should be noted that the network connects each device so as to be able to communicate with each other, and the communication standard, scale, and configuration are not limited to those described above.
  • At least some of the functions of the elements of the information processing apparatus 100 shown in FIG. 6 can be realized by the CPU 601 executing a program. However, at least some of the functions of the elements of the information processing apparatus 100 shown in FIG. 6 may operate as dedicated hardware. In this case, the dedicated hardware operates under the control of the CPU 601 .
  • FIG. 7 shows a flowchart of processing executed by the information processing apparatus 100 according to this embodiment.
  • the processing shown in FIG. 7 can be realized by the CPU 601 of the information processing apparatus 100 loading a program stored in the ROM 602 or the like into the RAM 603 and executing the program.
  • in S71, the acquisition unit 101 acquires a product image including a product whose attributes are to be predicted. For example, when the operator of the information processing apparatus 100 operates the information processing apparatus 100 to access an arbitrary e-commerce site and select a product image including an arbitrary product, the acquisition unit 101 acquires that product image. The acquisition unit 101 can also acquire a product image by receiving the product image, or a URL indicating the product image, transmitted from an external device such as a user device.
  • the number of products targeted for attribute prediction included in one product image is not limited to one, and the product image may include a plurality of products targeted for attribute prediction.
  • in S72, the attribute prediction unit 102 inputs the product image acquired by the acquisition unit 101 to the attribute prediction model 106 to generate the shared feature vector 42 (see FIG. 4A). Subsequently, in S73, the attribute prediction unit 102 receives as input the shared feature vector 42 generated in S72 and generates a composite attribute vector (composite attribute value) 56 (see FIGS. 5A and 5B). In S74, the attribute prediction unit 102 predicts and acquires all of the multiple attributes related to the product from the composite attribute vector 56 generated in S73. The prediction processing is as described above.
  • when the product image includes a plurality of products, the acquisition unit 101 can acquire a plurality of product regions (Region of Interest), each including one of the plurality of products, by, for example, a known image processing technique, and output them to the attribute prediction unit 102. The attribute prediction unit 102 can then perform the processing of S72 to S74 on each product region (partial image).
  • FIG. 8 shows an attribute prediction result 81 for a product image 80 as an example of the attribute prediction result.
  • the attribute prediction result 81 may be displayed on the display unit 606 of the information processing apparatus 100, or may be transmitted to an external device such as a user device via the communication I/F 607 and displayed on the external device.
  • the product image 80 is a product image including a product whose attributes are to be predicted, for example an image selected by the operator of the information processing apparatus 100. The product image 80 may also be an image transmitted from an external device such as a user device.
  • for the product image 80, the product type is predicted to be “jacket”, and the attributes (attribute types for each attribute category) are predicted to be “Length: to the hips”, “Neck type: round”, “Sleeve length: to wrist”, “Sleeve style: general”, “Collar type: hood”, “Close: full”, “Close type: zipper”, “Gender: male”, and “Color: gray”.
  • in the attribute prediction result 81, all the attributes related to the product (product type: “jacket”) included in the product image 80 are displayed. This allows the operator or user who receives the attribute prediction result to better understand the features of the product.
  • FIGS. 9A to 9C are diagrams for explaining another example of attribute prediction results.
  • A product image 90 includes a plurality of products (a cap and a shirt), and the acquisition unit 101 obtains a plurality of product regions (Regions of Interest) 91 and 92 using, for example, a known image processing technique (FIG. 9B).
  • The acquisition unit 101 may be configured to acquire the product regions 91 and 92 by machine learning using a YOLO (You Only Look Once) learning model.
  • The attribute prediction unit 102 performs attribute prediction processing on each of the product regions (partial images) 91 and 92.
  • FIG. 9C shows the attribute prediction result 93 for the product regions 91 and 92 included in the product image 90. Specifically, "cap type: baseball", "gender: male", and "color: white" are predicted for the "cap", and "neck type: round", "sleeve length: short", "gender: unisex", and "color: gray" are predicted for the "shirt". As in the example of FIG. 8, the attribute prediction result 93 may be displayed on the display unit 606 of the information processing apparatus 100, or may be transmitted to an external device such as a user device via the communication I/F 607 and displayed on the external device.
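The per-region processing for an image with several products can be sketched as below. This is an illustrative sketch under stated assumptions: the bounding boxes are taken as given (in the text they would come from a known image processing technique or a YOLO model), the image is a toy nested list rather than a real image type, and `crop`, `predict_per_region`, and the `predict` callable are hypothetical names.

```python
from typing import Callable, List, Tuple, TypeVar

Image = List[List[int]]          # rows of pixel values (toy grayscale image)
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive
T = TypeVar("T")


def crop(image: Image, box: Box) -> Image:
    """Cut one product region (partial image) out of the product image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def predict_per_region(
    image: Image,
    boxes: List[Box],
    predict: Callable[[Image], T],
) -> List[T]:
    """Run the same prediction routine on every detected product region."""
    return [predict(crop(image, box)) for box in boxes]
```

Because every region goes through the same `predict` callable, the result list has one entry per product region, in the order of `boxes`.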
  • The information processing apparatus 100 predicts, by machine learning, all related attributes of one or more products included in a product image, and the learning model used for that machine learning is common to all possible products. The information processing apparatus 100 is also configured to predict only attributes related to a product and not to predict attributes unrelated to it. This makes it possible to predict all relevant attributes for each product with a single learning model, enabling efficient attribute prediction.
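One simple way to picture "only attributes related to the product" is a relevance table applied to the output of a single shared model. The sketch below is an assumption made for illustration, not the mechanism described in the specification; `RELEVANT_CATEGORIES` and `keep_relevant` are hypothetical names, and the category names are borrowed from the examples in the text.

```python
from typing import Dict, List

# Hypothetical relevance table: which attribute categories apply to which
# product type. Category names are taken from the examples in the text.
RELEVANT_CATEGORIES: Dict[str, List[str]] = {
    "jacket": ["length", "neck type", "sleeve length", "color"],
    "cap": ["cap type", "gender", "color"],
}


def keep_relevant(product_type: str, predictions: Dict[str, str]) -> Dict[str, str]:
    """Drop predictions for categories that do not apply to this product type."""
    allowed = set(RELEVANT_CATEGORIES.get(product_type, []))
    return {cat: val for cat, val in predictions.items() if cat in allowed}
```

For a "cap", a prediction for "sleeve length" would be discarded while "cap type" and "color" are kept.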
  • The product image input to the learning model is image data (coefficient values) obtained by applying a DCT (discrete cosine transform) to the YCbCr image.
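The preprocessing named here can be sketched in two steps: convert RGB pixels to YCbCr, then apply a 2-D DCT to a block of one channel. The sketch assumes the full-range JFIF (BT.601) conversion constants and an orthonormal DCT-II; the text does not state which variant or block size is used, so those choices and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def rgb_to_ycbcr(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Full-range JFIF (BT.601) conversion for 8-bit RGB values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def dct_1d(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * total)
    return out


def dct_2d(block: Sequence[Sequence[float]]) -> List[List[float]]:
    """Separable 2-D DCT: transform each row, then each column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

For a flat block, all of the energy ends up in the single DC coefficient at index `[0][0]`, which is why DCT coefficients are a compact representation of smooth image areas.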
  • The composite feature vector 311 is generated from four feature vectors, but the number of feature vectors combined is not limited to four.
  • The composite feature vector 311 may be generated from the second feature vector 302 and the color feature vector 304, and a similar image may be retrieved using that composite feature vector 311.
  • A similar image may also be retrieved using a composite feature vector 311 that combines other feature vectors generated by machine learning.
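Combining feature vectors and retrieving a similar image can be sketched as below. Two assumptions are made for illustration only: that "combining" means concatenation, and that similarity is measured by cosine similarity. The text specifies neither, and `combine`, `cosine`, and `most_similar` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence


def combine(*vectors: Sequence[float]) -> List[float]:
    """Concatenate any number of feature vectors into one composite vector."""
    return [x for vector in vectors for x in vector]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query: Sequence[float], catalog: Dict[str, Sequence[float]]) -> str:
    """Return the key of the catalog vector closest to the query."""
    return max(catalog, key=lambda name: cosine(query, catalog[name]))
```

Because `combine` accepts any number of inputs, the same routine covers both the four-vector case and the two-vector case mentioned above.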
  • 10 user device
  • 100 information processing device
  • 101 acquisition unit
  • 102 attribute prediction unit
  • 103 learning unit
  • 104 output unit
  • 105 learning model storage unit
  • 106 attribute prediction model

PCT/JP2021/046260 2021-12-15 2021-12-15 Information processing apparatus, information processing method, and program Ceased WO2023112199A1 (ja)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US18/008,730 US12361682B2 (en) 2021-12-15 2021-12-15 Information processing apparatus, information processing method, and non-transitory computer readable medium
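PCT/JP2021/046260 WO2023112199A1 (ja) 2021-12-15 2021-12-15 Information processing apparatus, information processing method, and program
JP2022573687A JP7265688B1 (ja) 2021-12-15 2021-12-15 Information processing apparatus, information processing method, and program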
EP21943330.7A EP4220551B1 (en) 2021-12-15 2021-12-15 Information processing device, information processing method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/046260 WO2023112199A1 (ja) 2021-12-15 2021-12-15 Information processing apparatus, information processing method, and program

Publications (1)

Publication Number Publication Date
WO2023112199A1 (ja) 2023-06-22

Family

ID=86100872

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/046260 Ceased WO2023112199A1 (ja) Information processing apparatus, information processing method, and program 2021-12-15 2021-12-15

Country Status (4)

Country Link
US (1) US12361682B2
EP (1) EP4220551B1
JP (1) JP7265688B1
WO (1) WO2023112199A1

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20260080187A1 (en) 2024-09-18 2026-03-19 Arcade Studio, Inc. Using artificial intelligence to generate images of product based on user input

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
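JP2005101712A (ja) * 2003-09-22 2005-04-14 Fuji Xerox Co Ltd Image processing apparatus and program
JP2005250712A (ja) * 2004-03-03 2005-09-15 Univ Waseda Person attribute identification method and system therefor
JP2016139189A (ja) 2015-01-26 2016-08-04 Fast Retailing Co., Ltd. Product display program
JP2018106284A (ja) * 2016-12-22 2018-07-05 Rakuten, Inc. Information processing apparatus, information processing method, and information processing program
JP2020071871A (ja) * 2019-08-28 2020-05-07 Neural Pocket Inc. Information processing system, information processing apparatus, server apparatus, program, or method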
US20200320769A1 (en) * 2016-05-25 2020-10-08 Metail Limited Method and system for predicting garment attributes using deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2765277T3 (es) * 2014-12-22 2020-06-08 Reactive Reality Gmbh Method and system for generating garment model data

Also Published As

Publication number Publication date
JPWO2023112199A1 2023-06-22
JP7265688B1 (ja) 2023-04-26
EP4220551A1 (en) 2023-08-02
EP4220551B1 (en) 2026-04-22
US20240303969A1 (en) 2024-09-12
US12361682B2 (en) 2025-07-15

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 2022573687; Country of ref document: JP)
WWE Wipo information: entry into national phase (Ref document number: 18008730; Country of ref document: US)
ENP Entry into the national phase (Ref document number: 2021943330; Country of ref document: EP; Effective date: 20221208)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21943330; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
WWG Wipo information: grant in national office (Ref document number: 18008730; Country of ref document: US)