WO2023058233A1 - Information processing device, information processing method, information processing system, and program - Google Patents

Information processing device, information processing method, information processing system, and program

Info

Publication number
WO2023058233A1
WO2023058233A1 PCT/JP2021/037384 JP2021037384W WO2023058233A1 WO 2023058233 A1 WO2023058233 A1 WO 2023058233A1 JP 2021037384 W JP2021037384 W JP 2021037384W WO 2023058233 A1 WO2023058233 A1 WO 2023058233A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
information processing
color
object image
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2021/037384
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
ハサン アルサラン
フアレス ホスエ クエバス
ラジャセイカル サナガヴァラプ
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rakuten Group Inc
Original Assignee
Rakuten Group Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rakuten Group Inc
Priority to EP21931923.3A (patent EP4187485B1, en)
Priority to US17/915,857 (patent US12548289B2, en)
Priority to JP2022508496A (patent JP7138264B1, ja)
Priority to PCT/JP2021/037384 (patent WO2023058233A1, ja)
Publication of WO2023058233A1 (patent WO2023058233A1, ja)
Anticipated expiration
Ceased (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/532 Query formulation, e.g. graphical querying
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/56 Information retrieval; Database structures therefor; File system structures therefor of still image data having vectorial format
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20048 Transform domain processing
    • G06T2207/20052 Discrete cosine transform [DCT]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • the present invention relates to an information processing device, an information processing method, an information processing system, and a program, and particularly to technology for extracting colors from an image with high accuracy.
  • E-commerce (electronic commerce, EC) sells products over the Internet.
  • Patent Literature 1 discloses a technique for removing a background image from a product image, extracting a product area, and retrieving an image containing an area similar to the product area. Such a function can also be used when searching for similar products in response to a user's request using a terminal (store terminal) provided in a store that sells products handled on an EC site.
  • With the technology disclosed in Patent Literature 1, a similar image is searched for based on a product area extracted from a product image, and the search takes the color of the product area into account. To extract the color of the product from the product image, the area of the product must first be extracted accurately. However, the technique disclosed in Patent Literature 1 simply separates the foreground and the background of the image, so the area of the product targeted for similarity search is not accurately extracted, and the extracted color may not appropriately represent the color of the product.
  • the present invention has been made in view of the above problems, and aims to provide a technique for extracting, from an input image, a color to be used for retrieval with high precision.
  • one aspect of an information processing apparatus is an acquisition means for acquiring an object image including one or more objects, and applying the object image to a first learning model.
  • a first predicting means for predicting a rectangular area surrounding each of said one or more objects in said object image and a type of each of said one or more objects;
  • second prediction means for predicting a region of an object of interest in said object image by applying a model; and extracting means for extracting the
  • A color may be determined for each pixel in the region of the target object, and, among the one or more colors determined in that region, a predetermined number of top-ranked colors may be extracted as the color of the target object.
  • the first learning model may be a learning model based on YOLO (You only look once).
  • the second learning model may be a learning model composed of FCN (Fully Convolutional Networks).
  • the object image may be image data generated from Y elements, Cb elements, and Cr elements in data DCT-transformed from a YCbCr image.
  • the image data may be data in which the Y, Cb, and Cr elements of the DCT-transformed data are size-matched and connected.
  • The information processing apparatus may further include: a generation means for generating a plurality of feature vectors for the object by applying the object image to a plurality of learning models; a concatenating means for concatenating a color feature vector, which represents the color of the object extracted by the extraction means, with the plurality of feature vectors and embedding them in a common feature space to generate a composite feature vector on the feature space; and a search means for retrieving images.
  • one aspect of the information processing method is an acquisition step of acquiring an object image including one or more objects, and applying the object image to a first learning model.
  • An information processing program for causing a computer to execute information processing, the program causing the computer to execute: an acquisition process of acquiring an object image including one or more objects; a first prediction process of predicting, by applying the object image to a first learning model, a rectangular area surrounding each of the one or more objects in the object image and a type of each of the one or more objects; a second prediction process of predicting a region of a target object in the object image by applying the rectangular area and the type to a second learning model; and an extraction process of determining the color of each pixel in the region of the target object and extracting the color of the target object.
  • An information processing system having a user device and an information processing device, wherein the user device has a transmitting means for transmitting an object image including one or more objects to the information processing device, and the information processing device has: an acquisition means for acquiring the object image; a first prediction means for predicting, by applying the object image to a first learning model, a rectangular area surrounding each of the one or more objects in the object image and a type of each of the one or more objects; a second prediction means for predicting a region of a target object in the object image by applying the rectangular area and the type to a second learning model; and an extraction means for determining the color of each pixel in the region of the target object and extracting the color of the target object.
  • FIG. 1 shows a configuration example of an information processing system according to an embodiment of the present invention.
  • FIG. 2 is a block diagram showing an example of the functional configuration of the information processing device according to the embodiment of the present invention.
  • FIG. 3A shows a conceptual diagram of each feature vector and composite feature vectors.
  • FIG. 3B shows a conceptual diagram of similarity search processing.
  • FIG. 4 shows the schematic architecture of the image recognition model.
  • FIG. 5A shows a conceptual diagram of the processing flow of the color extractor.
  • FIG. 5B shows an example architecture of the segment extraction model.
  • FIG. 5C shows the flow of color extraction processing.
  • FIG. 6 is a block diagram showing an example of the hardware configuration of the information processing device according to the embodiment of the present invention.
  • FIG. 7 is a flow chart showing processing executed by the information processing apparatus according to the embodiment of the present invention.
  • FIG. 8A shows a screen display example of the user device according to the first embodiment.
  • FIG. 8B shows a screen display example of the user device according to the first embodiment.
  • FIG. 9A shows a screen display example of the user device according to the second embodiment.
  • FIG. 9B shows a screen display example of the user device according to the second embodiment.
  • FIG. 9C shows a screen display example of the user device according to the second embodiment.
  • FIG. 10A shows a screen display example of the user device according to the third embodiment.
  • FIG. 10B shows a screen display example of the user device according to the third embodiment.
  • FIG. 1 shows the configuration of an information processing system according to this embodiment.
  • This information processing system includes a user device 10 such as a terminal device or a shop terminal provided in a shop, and an information processing device 100 .
  • The user device 10 is, for example, a device such as a smartphone or a tablet, and is configured to be able to communicate with the information processing device 100 via a public network such as LTE (Long Term Evolution) or a wireless communication network such as a wireless LAN (Local Area Network). The user device 10 has a display unit (display surface) such as a liquid crystal display, and the user can perform various operations using a GUI (Graphical User Interface) provided on the liquid crystal display.
  • the operation includes various operations for content such as an image displayed on the screen, such as a tap operation, a slide operation, and a scroll operation using a finger, stylus, or the like.
  • the user device 10 may be a device such as a desktop PC (Personal Computer) or a notebook PC. In that case, a user's operation can be performed using an input device such as a mouse or a keyboard.
  • the user device 10 may have a separate display surface.
  • the user device 10 transmits a search query to the information processing device 100 according to the user's operation.
  • The search query corresponds to a request to retrieve, for an image containing a product (object), that is, a product image (object image), similar images associated with that product image (images containing products similar to the product).
  • the product image for which similar images are to be searched may also be referred to as a query image.
  • For example, the user can send a search query by selecting one product image as a query image from among one or more product images displayed on the display unit of the user device 10 and then selecting a predetermined search button.
  • the search query can include (associate with) query image information in a format that can be decoded by the information processing apparatus 100 or in a URL format.
  • the information processing device 100 is a server device capable of building an EC site and distributing web content, and in this embodiment, is configured to be able to provide a search service. As the search service, the information processing apparatus 100 can generate content (search results) corresponding to a search query received from the user device 10 and distribute (output) the content to the user device 10 .
  • The information processing device 100 acquires a product image associated with the search query received from the user device 10, generates a plurality of feature vectors in light of a plurality of attributes of the product included in the product image, generates a composite feature vector by concatenating the plurality of feature vectors, and searches for similar images similar to the product image using the composite feature vector.
  • FIG. 2 shows an example of the functional configuration of the information processing device 100 according to this embodiment.
  • The information processing device 100 shown in FIG. 2 includes an acquisition unit 101, a first feature estimation unit 102, a second feature estimation unit 103, a gender estimation unit 104, a color extraction unit 105, a connecting unit 106, a similarity search unit 107, a learning unit 108, an output unit 109, a learning model storage unit 110, and a search database 115.
  • The learning model storage unit 110 stores various learning models (a first feature estimation model 111, a second feature estimation model 112, a gender estimation model 113, and a segment extraction model 114). The various learning models will be described later.
  • the search database 115 is a database that stores information related to similar image search, and may be provided outside the information processing apparatus 100 .
  • the acquisition unit 101 acquires a product image (query image).
  • the acquisition unit 101 receives a search query transmitted by the user device 10 and acquires product images associated with (included in) the search query.
  • the product image may be an image expressing colors with three colors, red (R), green (G), and blue (B).
  • Alternatively, the product image may be an image expressed by luminance (Y (Luma)), representing brightness, and color components (Cb, Cr (Chroma)), that is, a YCbCr image obtained by YCbCr conversion of an RGB image.
  • the product image may be data (coefficients) obtained by DCT (Discrete Cosine Transform) conversion (compression) from a YCbCr image by an encoding unit (not shown) provided in the information processing apparatus 100 .
  • the acquisition unit 101 may be configured to acquire data as a product image that has undergone (YCbCr conversion and) DCT conversion by a device other than the information processing device 100 .
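  • The conversion chain described above (RGB to YCbCr, then DCT) can be illustrated with a small pure-Python sketch. It assumes the JFIF/BT.601 full-range conversion coefficients and an orthonormal 8x8 block DCT-II as used in JPEG; the source does not state which variant the encoding unit uses, and all function names here are hypothetical.

```python
import math

N = 8  # block size; 8x8 is an assumption borrowed from JPEG


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF coefficients, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def _alpha(k):
    # Normalisation factor that makes the DCT-II basis orthonormal.
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


# BASIS[k][n]: k-th DCT-II basis function sampled at position n.
BASIS = [
    [_alpha(k) * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)]
    for k in range(N)
]


def dct_block(block):
    """2-D DCT-II of one 8x8 block given as 8 rows of 8 values."""
    # Transform along rows first, then along columns.
    tmp = [[sum(block[i][j] * BASIS[v][j] for j in range(N)) for v in range(N)]
           for i in range(N)]
    return [[sum(BASIS[u][i] * tmp[i][v] for i in range(N)) for v in range(N)]
            for u in range(N)]


def block_dct_plane(plane):
    """Split an HxW plane into 8x8 blocks and return coefficients as [64][H/8][W/8]."""
    bh, bw = len(plane) // N, len(plane[0]) // N
    out = [[[0.0] * bw for _ in range(bh)] for _ in range(N * N)]
    for by in range(bh):
        for bx in range(bw):
            block = [row[bx * N:(bx + 1) * N] for row in plane[by * N:(by + 1) * N]]
            coeffs = dct_block(block)
            for u in range(N):
                for v in range(N):
                    # Each of the 64 frequencies becomes one channel.
                    out[u * N + v][by][bx] = coeffs[u][v]
    return out
```

  • Under these assumptions each plane yields 64 coefficient channels with one value per 8x8 block, so the spatial size shrinks by a factor of eight in each direction.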
  • Acquisition section 101 outputs the acquired product image to first feature estimation section 102 , second feature estimation section 103 , gender estimation section 104 , and color extraction section 105 .
  • FIG. 3A shows a conceptual diagram of each feature vector and a compound feature vector (Compounded Feature Vector).
  • The first feature estimation unit 102 applies the product image (corresponding to the input image 30 in FIG. 3A) acquired by the acquisition unit 101 to the first feature estimation model 111 and performs supervised learning to estimate (predict) the first feature of the product and generate a first feature vector 301 representing the first feature.
  • the first feature indicates a high-level (aggregated) classification of the product, also called category.
  • a feature vector represents a value/information representing a feature.
  • The second feature estimation unit 103 applies the product image acquired by the acquisition unit 101 to the second feature estimation model 112 and performs supervised learning to estimate (predict) the second feature of the product and generate a second feature vector 302 representing the second feature.
  • the second feature indicates a lower-level (subdivided) classification of the product and is associated with the first feature.
  • the second feature is also called genre.
  • the second feature estimation unit 103 may be configured to estimate the first feature by applying it to the first feature estimation model 111, and to estimate the second feature from the estimated first feature.
  • the first feature indicates a higher level (aggregated) product classification type
  • the second feature indicates a lower level (subdivided) product classification type.
  • the first feature (category) includes product classification types such as men's fashion, ladies' fashion, fashion goods, innerwear, shoes, accessories, and watches.
  • the second feature (genre) includes product classification types such as pants, shirts, blouses, skirts, and dresses when the first feature is women's fashion.
  • The first feature estimation unit 102 and the second feature estimation unit 103 output the generated first feature vector 301 and second feature vector 302, respectively, to the connecting unit 106.
  • The gender estimation unit 104 applies the product image acquired by the acquisition unit 101 to the gender estimation model 113 and performs supervised learning to estimate (predict) the gender targeted by the product and generate a gender feature vector 303 indicating the gender.
  • the gender estimation unit 104 can identify not only genders such as male and female, but also categories such as kids and unisex.
  • Gender estimating section 104 outputs generated gender feature vector 303 to connecting section 106 .
  • The color extraction unit 105 applies the product image acquired by the acquisition unit 101 to the segment extraction model 114 and performs supervised learning to acquire the product area as a segment, extracts the color of the segment (segmented area) (corresponding to color extraction 32 in FIG. 3A), and generates a color feature vector 304 indicating the color. Processing of the color extraction unit 105 will be described later.
  • The color extraction unit 105 outputs the generated color feature vector 304 to the connecting unit 106.
  • The connecting unit 106 concatenates the feature vectors output from the first feature estimation unit 102, the second feature estimation unit 103, the gender estimation unit 104, and the color extraction unit 105 and embeds them in a multi-dimensional feature space (hereinafter referred to as the feature space) to generate a composite feature vector 311 (corresponding to concatenation 31 in FIG. 3A). That is, the connecting unit 106 concatenates the first feature vector 301, the second feature vector 302, the gender feature vector 303, and the color feature vector 304 and embeds them in one (common) feature space to generate the composite feature vector 311.
  • The first feature vector 301 is 200-dimensional (200D), the second feature vector 302 is 153-dimensional (153D), the gender feature vector 303 is 4-dimensional (4D), and the color feature vector 304 is 6-dimensional (6D). The composite feature vector 311 is therefore 363-dimensional (200 + 153 + 4 + 6 = 363D). The composite feature vector 311 may be concatenated in the order of the gender feature vector 303, the second feature vector 302, the color feature vector 304, and the first feature vector 301, as shown in FIG. 3A.
  • the order of connection is an example, and is not limited to this order.
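  • The concatenation can be sketched in a few lines of Python. This is a minimal illustration using plain lists; the function name `make_composite` and the dimension check are assumptions for illustration, and the order follows the example order given for FIG. 3A.

```python
# Dimensions stated in the description.
FIRST_DIM, SECOND_DIM, GENDER_DIM, COLOR_DIM = 200, 153, 4, 6


def make_composite(first, second, gender, color):
    """Concatenate the four feature vectors into one 363-dimensional vector.

    Order (gender, second, color, first) follows the example order for FIG. 3A;
    the description notes that other orders are possible.
    """
    parts = [
        (gender, GENDER_DIM),
        (second, SECOND_DIM),
        (color, COLOR_DIM),
        (first, FIRST_DIM),
    ]
    composite = []
    for vec, dim in parts:
        if len(vec) != dim:
            raise ValueError(f"expected {dim} dimensions, got {len(vec)}")
        composite.extend(vec)
    return composite
```

  • Whatever order is chosen, the same order would have to be used for the query image and for every indexed image so that distances in the common feature space stay comparable.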
  • the linking unit 106 outputs the generated composite feature vector 311 to the similarity searching unit 107 .
  • the similarity search unit 107 receives as input the composite feature vector 311 generated by the connection unit 106 and searches for images similar to the product image acquired by the acquisition unit 101 .
  • the similarity search unit 107 performs similar image search on the feature space.
  • the similarity search unit 107 is configured to search for similar images using, for example, a known Nearest Neighbor Search engine.
  • As a nearest neighbor search engine, for example, one using the FAISS (Facebook AI Similarity Search) algorithm is known. All or part of the configuration of the similarity search unit 107 may be provided outside the information processing device 100 and associated with it.
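  • To make the idea of nearest neighbor search on the feature space concrete, the following sketch uses an exhaustive scan with Euclidean distance. It is a stand-in only: it does not use FAISS, the distance metric is an assumption (the source does not name one), and `nearest_neighbors` and the `index` mapping are hypothetical names.

```python
import math


def nearest_neighbors(query, index, k=3):
    """Return the k image IDs whose vectors are closest to `query`.

    `index` maps image ID -> feature vector of the same length as `query`.
    A dedicated engine would replace this exhaustive scan at scale.
    """
    scored = sorted(
        (math.dist(query, vec), image_id) for image_id, vec in index.items()
    )
    return [image_id for _, image_id in scored[:k]]
```

  • The returned image IDs correspond to the similar images that are then passed on as search results.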
  • the output unit 109 outputs information including images (similar images) corresponding to one or more image IDs that are the search results of the similarity search unit 107 .
  • the output unit 109 can provide the information via the communication I/F 507 (FIG. 5).
  • the learning unit 108 learns (trains) each of the first feature estimation model 111, the second feature estimation model 112, the gender estimation model 113, and the segment extraction model 114, and stores these learned learning models in the learning model storage unit 110.
  • the first feature estimation model 111, the second feature estimation model 112, and the gender estimation model 113 are all learning models for machine learning to which image recognition models are applied.
  • An example of the schematic architecture of the image recognition model is shown in FIG. 4. The segment extraction model 114 will be described later.
  • The image recognition model according to this embodiment is composed of an intermediate layer that includes a plurality of convolution layers and an output layer that classifies/predicts classes, and outputs a feature vector.
  • As the intermediate layer, for example, EfficientNet by Google Research is used. When EfficientNet is used, each convolutional layer uses MBConv (Mobile Inverted Bottleneck Convolution). The intermediate layer extracts a feature map, and the output layer is configured to reduce the dimensionality of the map to generate the final feature vector. Note that the number of convolution layers is not limited to a specific number.
  • The first feature estimation model 111, the second feature estimation model 112, and the gender estimation model 113 can each be configured with an architecture like the image recognition model shown in FIG. 4, and output the first feature vector 301, the second feature vector 302, and the gender feature vector 303, respectively.
  • the first feature estimation model 111, the second feature estimation model 112, and the gender estimation model 113 are each subjected to learning processing using individual learning (teacher) data. Here, learning processing for these learning models will be described.
  • First feature estimation model 111: a model that predicts a first feature (category (higher-level classification of products)) from a product image and outputs a first feature vector 301.
  • categories for products are set in advance, and in this embodiment, it is assumed that there are 200 types of categories. Examples of categories are men's fashion, women's fashion, fashion goods, innerwear, shoes, accessories, and watches, as mentioned above, with respect to wearables. Categories may also include food, gardening, computers/peripherals, and the like.
  • The first feature estimation model 111 is configured to be able to classify 200 types of categories, and the first feature vector 301 is a vector capable of representing 200 dimensions.
  • Second feature estimation model 112: a model that predicts a second feature (genre (lower-level classification of products)) from a product image and outputs a second feature vector 302.
  • As learning data, a combination of a product image (input image) and the genre of the product as correct data is used.
  • Genres for products are set in advance in a form that associates them with each category, which is the higher-level classification.
  • The second feature estimation model 112 is configured to be able to estimate 153 types of genres for each first feature vector 301 (category) generated by the first feature estimation unit 102, and the second feature vector 302 is a vector capable of expressing 153 dimensions.
  • The second feature estimation model 112 may instead be configured to estimate a first feature to generate a first feature vector 301, and to estimate a second feature from the first feature to generate a second feature vector 302.
  • Gender estimation model 113: a model that predicts gender from product images and outputs a gender feature vector 303.
  • As learning data, a combination of a product image (input image) and the gender targeted by the product as correct data is used.
  • gender includes not only male and female, but also kids and unisex.
  • the gender estimation model 113 is configured to be able to estimate four types of gender (male, female, kids, and unisex), and the gender feature vector 303 is a vector capable of expressing four dimensions.
  • The gender estimation model 113 may be configured to predict gender based on the first feature vector 301 and/or the second feature vector 302 rather than with the image recognition model shown in FIG. 4.
  • FIG. 5A shows a conceptual diagram of the processing flow of the color extraction unit 105 according to this embodiment.
  • the color extracting unit 105 is configured to extract the color of the product by inputting the data converted from the RGB image to the YCbCr image and DCT-converted as the product image.
  • the conversion process may be performed by the acquisition unit 101 or may be performed by a device other than the information processing device 100 .
  • a block 51 in FIG. 5A shows processing from the product image acquired by the color extraction unit 105 to the input to the segment extraction model 114 .
  • The input YCbCr-converted and DCT-converted images are denoted as DCT-Y 501, DCT-Cb 502, and DCT-Cr 503.
  • DCT-Y 501, DCT-Cb 502, and DCT-Cr 503 have dimensions of [64,80,80], [64,40,40], and [64,40,40], respectively, where each dimensionality represents [number of channels (n_channels), width, height].
  • the color extraction unit 105 performs upsampling processing on DCT-Cb502 and DCT-Cr503 to generate DCT-Cb504 and DCT-Cr505.
  • the color extraction unit 105 concatenates the DCT-Y 501 , DCT-Cb 504 and DCT-Cr 505 channel-wise to generate concatenated DCT data 506 . That is, the sizes of the Y, Cb, and Cr elements are adjusted to generate concatenated DCT data 506 (image data).
  • Concatenated DCT data 506 is input to segment extraction model 114 .
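  • The size matching and channel-wise concatenation can be sketched as follows. The sketch assumes nearest-neighbour 2x upsampling, since the source does not say which upsampling method is used, and it represents each tensor as a nested list indexed [channel][row][column]; all function names are hypothetical.

```python
def upsample2x(stack):
    """Nearest-neighbour 2x upsampling of a [C][H][W] nested list (method assumed)."""
    out = []
    for plane in stack:
        new_plane = []
        for row in plane:
            wide = [v for v in row for _ in range(2)]  # repeat each column twice
            new_plane.append(wide)
            new_plane.append(list(wide))  # repeat each row twice
        out.append(new_plane)
    return out


def concat_dct(dct_y, dct_cb, dct_cr):
    """Match Cb/Cr to the Y size, then stack all channels into one tensor."""
    return dct_y + upsample2x(dct_cb) + upsample2x(dct_cr)


def shape(tensor):
    """Return (channels, rows, columns) of a nested-list tensor."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))
```

  • With the dimensions given in the text, stacking three 64-channel tensors of 80x80 in this way would give a concatenated tensor of 192 channels at 80x80.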
  • the segment extraction model 114 includes an object detection model 52 and a semantic segmentation model 53.
  • the object detection model 52 is, for example, a learning model based on YOLO (You only look once).
  • the semantic segmentation model 53 is, for example, a learning model composed of a neural network such as FCN (Fully Convolutional Networks).
  • The object detection model 52 is a learning model that takes the concatenated DCT data 506 as input and predicts the bounding box (rectangular area surrounding the object) of each of one or more objects contained in the input image and the type (class) of each object. The predictions may include not only the bounding box and type of the object to be searched (i.e., the product to be searched), but also the bounding boxes and types of objects not to be searched.
  • the predicted bounding box and type are input to semantic segmentation model 53 .
  • the semantic segmentation model 53 is a learning model that predicts the region (segmentation) of the search target product (target object) using the predicted bounding box and class as input.
  • the semantic segmentation model 53 is configured to output pixel (picture element) information in the segmented region (ie region of the product).
  • the semantic segmentation model 53 may be configured to take only the predicted bounding box as input and output pixel information in the segmented region.
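  • The two-stage flow (detection, then segmentation of the target object) can be outlined as below. This is only a structural sketch: `detect` and `segment` are hypothetical callables standing in for the object detection model 52 and the semantic segmentation model 53, the box convention and the coordinate-list mask are assumptions, and choosing the target by a set of class names is an illustrative assumption rather than something the source specifies.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), assumed convention


@dataclass
class Detection:
    box: Box  # predicted bounding box
    cls: str  # predicted type (class)


def extract_target_pixels(
    image: list,
    detect: Callable[[list], List[Detection]],
    segment: Callable[[list, Box, str], List[Tuple[int, int]]],
    target_classes: Set[str],
) -> list:
    """Detect objects, segment the first one of a target class, return its pixels."""
    for det in detect(image):
        if det.cls in target_classes:
            # The segmentation stand-in returns (row, column) coordinates of the region.
            region = segment(image, det.box, det.cls)
            return [image[row][col] for row, col in region]
    return []  # no object to be searched was detected
```

  • Objects that are detected but not searched for (for example, a person wearing the product) are simply skipped in this sketch.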
  • FIG. 5B shows an example architecture of the segment extraction model 114 .
  • segment extraction model 114 includes object detection model 52 and semantic segmentation model 53 .
  • the object detection model 52 includes a backbone portion 521 for pre-training and a head portion 522 for predicting bounding boxes and classes (kinds).
  • In FIG. 5B, Conv denotes a convolution layer, BottleneckCSP denotes a Cross Stage Partial Network, SPP denotes Spatial Pyramid Pooling, and Identity denotes an activation function that does nothing.
  • the object detection model 52 and the semantic segmentation model 53 are learned by the learning unit 108 using arbitrary product images as learning data.
  • the color extraction unit 105 acquires pixel information (S51), and then extracts the colors of the color palette for each pixel using the color map (S52).
  • the color extractor 105 converts the segmented region (from RGB or YCbCr) to the Lab color space and represents the colors of all pixels in the region as color values. It is also assumed that the colors of the color palette have been converted into the Lab color space.
  • the color extractor 105 can determine the distance (difference) between both color values for each pixel and extract the color for each pixel.
  • the color extraction unit 105 selects the top two colors among the colors extracted from all pixels of the area as a primary color (first color) and a secondary color (second color) (S53).
  • a color information vector 304 containing RGB information of two colors is output (S54).
  • The color information vector 304 is a vector capable of expressing six dimensions: 3 (RGB) × 2 (primary color and secondary color).
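  • Steps S51 to S54 can be sketched as follows. This is a minimal illustration under stated assumptions: the four-entry `PALETTE` is invented for the example, the sRGB to Lab conversion uses the standard D65 formulas, the per-pixel distance is plain Euclidean distance in Lab, and the function names are hypothetical. The text does not specify the palette, the distance formula, or what happens when only one color is found (the sketch repeats the primary color).

```python
import math
from collections import Counter

# Hypothetical color palette (name -> RGB); the actual palette is not given in the text.
PALETTE = {
    "red": (255, 0, 0),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
    "blue": (0, 0, 255),
}


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE Lab (D65 white point)."""
    def linear(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(c) for c in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        d = 6 / 29
        return t ** (1 / 3) if t > d ** 3 else t / (3 * d * d) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def color_vector(pixels, palette=PALETTE):
    """Map each pixel to its nearest palette color, then return the RGB of the top two."""
    if not pixels:
        raise ValueError("segmented region contains no pixels")
    palette_lab = {name: srgb_to_lab(rgb) for name, rgb in palette.items()}
    counts = Counter()
    for px in pixels:                                   # S52: per-pixel palette color
        lab = srgb_to_lab(px)
        nearest = min(palette_lab, key=lambda n: math.dist(lab, palette_lab[n]))
        counts[nearest] += 1
    top = [name for name, _ in counts.most_common(2)]   # S53: primary and secondary
    if len(top) == 1:
        top.append(top[0])                              # assumption: repeat the primary
    return [c for name in top for c in palette[name]]   # S54: 3 (RGB) x 2 values


pixels = [(200, 0, 0)] * 5 + [(250, 250, 250)] * 3 + [(10, 10, 10)]
vector = color_vector(pixels)
print(vector)  # [255, 0, 0, 255, 255, 255]
```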
  • FIG. 6 is a block diagram showing an example of the hardware configuration of the information processing apparatus 100 according to this embodiment.
  • The information processing apparatus 100 according to this embodiment can be implemented on any single or multiple computers, mobile devices, or any other processing platform. FIG. 6 shows an example in which the information processing apparatus 100 is implemented in a single computer, but the information processing apparatus 100 according to this embodiment may be implemented in a computer system including a plurality of computers. The plurality of computers may be connected to one another by a wired or wireless network.
  • information processing apparatus 100 may include CPU 601 , ROM 602 , RAM 603 , HDD 604 , input section 605 , display section 606 , communication I/F 607 , and system bus 608 .
  • Information processing apparatus 100 may also include an external memory.
  • a CPU (Central Processing Unit) 601 comprehensively controls operations in the information processing apparatus 100, and controls each component (602 to 607) via a system bus 608, which is a data transmission path.
  • a ROM (Read Only Memory) 602 is a non-volatile memory that stores control programs and the like necessary for the CPU 601 to execute processing.
  • the program may be stored in a non-volatile memory such as a HDD (Hard Disk Drive) 604 or an SSD (Solid State Drive) or an external memory such as a removable storage medium (not shown).
  • a RAM (Random Access Memory) 603 is a volatile memory and functions as a main memory, a work area, and the like for the CPU 601 . That is, the CPU 601 loads necessary programs and the like from the ROM 602 to the RAM 603 when executing processing, and executes the programs and the like to realize various functional operations.
  • the HDD 604 stores, for example, various data and information necessary for the CPU 601 to perform processing using programs.
  • the HDD 604 also stores various data, information, and the like obtained by the CPU 601 performing processing using programs and the like, for example.
  • An input unit 605 is configured by a keyboard and a pointing device such as a mouse.
  • a display unit 606 is configured by a monitor such as a liquid crystal display (LCD).
  • the display unit 606 may function as a GUI (Graphical User Interface) by being configured in combination with the input unit 605 .
  • a communication I/F 607 is an interface that controls communication between the information processing apparatus 100 and an external device.
  • a communication I/F 607 provides an interface with a network and executes communication with an external device via the network.
  • Various data, various parameters, and the like are transmitted/received to/from an external device via the communication I/F 607 .
  • the communication I/F 607 may perform communication via a wired LAN (Local Area Network) conforming to a communication standard such as Ethernet (registered trademark) or a dedicated line.
  • the network that can be used in this embodiment is not limited to this, and may be configured as a wireless network.
  • This wireless network includes a wireless PAN (Personal Area Network) such as Bluetooth (registered trademark), ZigBee (registered trademark), and UWB (Ultra Wide Band). It also includes a wireless LAN (Local Area Network) such as Wi-Fi (Wireless Fidelity) (registered trademark) and a wireless MAN (Metropolitan Area Network) such as WiMAX (registered trademark). Furthermore, wireless WANs (Wide Area Networks) such as LTE/3G, 4G, and 5G are included. It should be noted that the network connects each device so as to be able to communicate with each other, and the communication standard, scale, and configuration are not limited to those described above.
  • At least some of the functions of the elements of the information processing apparatus 100 shown in FIG. 6 can be realized by the CPU 601 executing a program. However, at least some of the functions of the elements of the information processing apparatus 100 shown in FIG. 6 may operate as dedicated hardware. In this case, the dedicated hardware operates under the control of the CPU 601 .
  • The hardware configuration of the user device 10 shown in FIG. 1 can be the same as in FIG. 6. That is, the user device 10 can include a CPU 601, a ROM 602, a RAM 603, an HDD 604, an input unit 605, a display unit 606, a communication I/F 607, and a system bus 608.
  • The user device 10 can display various information provided by the information processing device 100 on the display unit 606, and can perform processing corresponding to input operations received from the user via the GUI (composed of the input unit 605 and the display unit 606).
  • the user device 10 can include a camera (not shown), and is configured to perform photographing processing under the control of the CPU 601 according to user's operation.
  • FIG. 7 shows a flowchart of processing executed by the information processing apparatus 100 according to this embodiment.
  • The processing shown in FIG. 7 can be realized by the CPU 601 of the information processing apparatus 100 loading a program stored in the ROM 602 or the like into the RAM 603 and executing the program.
  • the acquisition unit 101 acquires a product image as a query image.
  • the acquisition unit 101 can acquire a product image by acquiring an image included in a search query transmitted from the user device 10 or a URL indicating the image.
  • S72 to S75 are processes for generating (estimating) feature vectors (first feature vector 301, second feature vector 302, gender feature vector 303, color feature vector 304) for the product image acquired in S71. The processes of S72 to S75 may be performed in an order different from that shown in FIG. 7, or may be performed in parallel.
  • the first feature estimation unit 102 applies the product image acquired by the acquisition unit 101 to the first feature estimation model 111 to generate the first feature vector 301 .
  • The first feature estimation model 111 is configured to be able to estimate 200 kinds of first features (categories), and the first feature vector 301 is a vector that can express 200 dimensions.
  • the second feature estimation unit 103 applies the product image acquired by the acquisition unit 101 to the second feature estimation model 112 to generate the second feature vector 302 .
  • The second feature estimation model 112 is configured to be able to estimate 153 types of second features (genres) for each first feature (category), and the second feature vector 302 is a vector that can express 153 dimensions.
  • The second feature vector 302 may be configured to have multiple levels. For example, if the category of the product estimated by the first feature estimation unit 102 is women's fashion, the genre of the product estimated by the second feature estimation unit 103 may be configured to have two levels from the upper level to the lower level, such as women's fashion_bottoms/pants.
  • the gender estimation unit 104 applies the product image acquired by the acquisition unit 101 to the gender estimation model 113 to generate the gender feature vector 303.
  • The gender estimation model 113 is configured to be able to estimate four types of gender (male, female, kids, and unisex), and the gender feature vector 303 is a vector that can express four dimensions.
  • the color extraction unit 105 generates the color feature vector 304 from the product image acquired by the acquisition unit 101.
  • the processing for generating the color feature vector 304 is as described above, and the vector can express six dimensions.
  • The connecting unit 106 connects the first feature vector 301, the second feature vector 302, the gender feature vector 303, and the color feature vector 304 output in S72 to S75, and embeds them in the feature space to generate the composite feature vector 311.
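  • The connection step can be illustrated with plain list concatenation. This is a sketch only: the text does not describe how the embedding into the feature space is performed, so the example simply joins the four vectors end to end, and the names `FEATURE_DIMS` and `build_composite` are hypothetical. With the dimensions given above, the result has 200 + 153 + 4 + 6 = 363 elements.

```python
# Dimensions stated in the description for each feature vector.
FEATURE_DIMS = {"first": 200, "second": 153, "gender": 4, "color": 6}


def build_composite(first, second, gender, color):
    """Concatenate the four feature vectors after checking their lengths."""
    parts = {"first": first, "second": second, "gender": gender, "color": color}
    for name, vec in parts.items():
        if len(vec) != FEATURE_DIMS[name]:
            raise ValueError(f"{name} vector has {len(vec)} elements, "
                             f"expected {FEATURE_DIMS[name]}")
    return [*first, *second, *gender, *color]


composite = build_composite(
    first=[0.0] * 200,            # first feature vector 301 (category)
    second=[0.0] * 153,           # second feature vector 302 (genre)
    gender=[0.0, 1.0, 0.0, 0.0],  # gender feature vector 303
    color=[255, 0, 0, 255, 255, 255],  # color feature vector 304
)
print(len(composite))  # 363
```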
  • the similarity search unit 107 receives the composite feature vector 311 generated by the connection unit 106 and searches for an image (similar image) similar to the product image acquired by the acquisition unit 101.
  • the search process can be performed using the FAISS (Facebook AI Similarity Search) algorithm.
  • FAISS is a neighborhood search algorithm using LSH (Locality Sensitive Hashing).
  • Prior to the search process, the similarity search unit 107 generates a composite feature vector 311 for each of multiple product images used as learning data.
  • each product image is assigned an image ID (index/identifier) for identifying the image.
  • the similarity search unit 107 associates (maps) the composite feature vector 311 with the image ID of the product image indicated by the vector and stores it in the search database 115 .
  • the format of the image ID is not limited to a specific one, and may be information corresponding to a URL or the like.
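  • The similarity search unit 107 calculates the degree of similarity (Euclidean distance) in the feature space between the composite feature vector 311 of the query image and the composite feature vectors stored in the search database 115, and obtains one or more composite feature vectors similar to the composite feature vector 311. Such processing corresponds to the neighborhood search processing.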
  • the similarity search unit 107 acquires one or more image IDs corresponding to one or more similar composite feature vectors, and outputs similar images corresponding to the image IDs.
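  • The mapping from image IDs to composite feature vectors and the Euclidean-distance search can be sketched as follows. This example swaps in an exhaustive (brute-force) search over a small dictionary in place of FAISS, purely to show the logic; the function name `search_similar`, the image IDs, and the two-dimensional vectors are illustrative assumptions.

```python
import math


def search_similar(query, search_db, k=3):
    """Return the k image IDs whose stored vectors are closest to `query`.

    search_db maps image ID -> composite feature vector; a smaller Euclidean
    distance is treated as a higher degree of similarity.
    """
    ranked = sorted(search_db, key=lambda image_id: math.dist(query, search_db[image_id]))
    return ranked[:k]


# Toy search database with 2-dimensional vectors instead of full composite vectors.
search_db = {
    "img-001": [0.0, 0.0],
    "img-002": [1.0, 1.0],
    "img-003": [5.0, 5.0],
}
result = search_similar([0.9, 1.2], search_db, k=2)
print(result)  # ['img-002', 'img-001']
```

  • An exhaustive scan is linear in the number of stored images; an approximate index is the usual choice for a large product catalog, which is the role FAISS plays in the description.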
  • When a composite feature vector 311 has already been generated and stored for the query image, the processing for generating the four feature vectors is not performed, and the similar image search can still be performed. For example, when there is a composite feature vector corresponding to the image ID of the product image associated with the search query received from the user device 10, the similarity search unit 107 retrieves the corresponding composite feature vector from the search database 115 using the image ID, and can search for similar images from that composite feature vector.
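  • The shortcut described above, reusing a stored composite feature vector when the image ID is already known, can be sketched as follows. The function `composite_for_query` and the `generate` callback are hypothetical names; `generate` stands in for the whole of S72 to S76.

```python
def composite_for_query(image_id, image, search_db, generate):
    """Return the stored composite vector for a known image ID, else generate one."""
    stored = search_db.get(image_id) if image_id is not None else None
    if stored is not None:
        return stored            # feature vector generation is skipped
    return generate(image)       # unknown image: run the full generation


calls = []


def fake_generate(image):
    """Stand-in for the four estimation steps and the connection step."""
    calls.append(image)
    return [0.5] * 363


search_db = {"img-001": [0.1] * 363}
known = composite_for_query("img-001", b"product image bytes", search_db, fake_generate)
unknown = composite_for_query(None, b"camera image bytes", search_db, fake_generate)
print(len(calls))  # 1
```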
  • FIG. 3B shows a conceptual diagram of the similar image search processing in S77 described above.
  • the neighborhood search process is performed from the composite feature vector 311 generated from the product image or the composite feature vector 311 retrieved from the image ID of the product image.
  • a composite feature vector having a high degree of similarity with the composite feature vector 311 is searched.
  • vectors with close Euclidean distances are determined to have high similarity in the feature space.
  • an image having an image ID corresponding to the searched composite feature vector is searched from the image ID database (included in the search database 115), and the searched image is output as a similar image.
  • The similarity search unit 107 may read the feature vectors in order from the beginning of the composite feature vector 311 and perform the similarity search. For example, as shown in FIG. 3A, when the composite feature vector 311 is connected in the order of the gender feature vector 303, the second feature vector 302, the color feature vector 304, and the first feature vector 301, the similarity search unit 107 can first read the gender feature vector 303 to perform search processing, and then read the second feature vector 302 to perform search processing.
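  • Reading the component vectors in order from the start of the composite feature vector amounts to slicing it at known offsets. The sketch below assumes the FIG. 3A order and the dimensions given earlier (4, 153, 6, 200); `SEGMENTS` and `split_composite` are hypothetical names, and how each segment is then used in the search is not specified in the text.

```python
# Segment order as described for FIG. 3A, with the dimensions stated earlier.
SEGMENTS = [("gender", 4), ("second", 153), ("color", 6), ("first", 200)]


def split_composite(composite, segments=SEGMENTS):
    """Slice a composite vector into named segments, reading from the beginning."""
    parts, start = {}, 0
    for name, dim in segments:
        parts[name] = composite[start:start + dim]
        start += dim
    if start != len(composite):
        raise ValueError(f"expected {start} elements, got {len(composite)}")
    return parts


composite = list(range(363))       # dummy values so the offsets are visible
parts = split_composite(composite)
print(parts["gender"])             # [0, 1, 2, 3]
print(parts["second"][0], parts["color"][0], parts["first"][0])  # 4 157 163
```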
  • the output unit 109 outputs (distributes) information including images (similar images) corresponding to one or more image IDs retrieved by the similarity search unit 107 to the user device 10 . That is, as a response (search result) to the search query received from the user device 10 by the acquisition unit 101 , information including the similar image is provided to the user device 10 .
  • FIGS. 8A and 8B show screen display examples of the user device 10 according to the present embodiment.
  • a screen 80 is an example of a screen displayed on the display unit 606 of the user device 10 .
  • The user operates the user device 10 to access an arbitrary e-commerce site (a website such as an EC site), enters an arbitrary search word, and transmits the search word to the information processing device 100, whereby the data of the screen 80 is provided and displayed on the display unit 606 of the user device 10.
  • When the user selects the area 81 (selection actions include actions such as pressing and touching; the same applies hereinafter), a product image 82 in the area 81 and a search button 83 for the product image 82 are displayed.
  • The search button 83 is displayed so as to be selectable. In this state, when the user selects the search button 83, the search query associated with the product image 82 as the query image is transmitted to the information processing device 100.
  • the image ID attached to the product image 82 can be included in the search query and transmitted.
  • The information processing device 100 that has received the search query generates a first feature vector 301, a second feature vector 302, a gender feature vector 303, and a color feature vector 304 from the product image 82 associated with the search query. Subsequently, the information processing apparatus 100 generates a composite feature vector 311 from the four feature vectors, searches for one or more similar images from the composite feature vector 311, and outputs the search results (one or more similar images and various information related to the images) to the user device 10.
  • FIG. 8B shows a screen example in which the search results received by the user device 10 from the information processing device 100 are displayed on the display unit 606.
  • The information processing apparatus 100 extracts the product area from the product image with high accuracy and extracts the color of the product area, and can thereby generate a color feature vector better suited to the product. Further, the information processing apparatus 100 searches for a similar image from a composite feature vector obtained by combining a plurality of feature vectors including the color feature vector. As a result, it becomes possible to search for similar images from the point of view of every feature of the product, and it is possible to provide similar images with higher precision than in the past, thereby improving usability.
  • the composite feature vector 311 is generated from four feature vectors, but the number of combined feature vectors is not limited to four.
  • a composite feature vector 311 may be generated from the second feature vector 302 and the color feature vector 304, and a similar image may be retrieved from the composite feature vector 311.
  • a similar image may be retrieved from a composite feature vector 311 that combines other feature vectors generated by machine learning.
  • In the first embodiment, the user device 10 selects one product image on a website such as an EC site, and the information processing device 100 searches for similar images similar to the selected product image and provides them to the user device 10.
  • On the other hand, when the user device 10 is equipped with a camera (imaging means), it is assumed that the user searches not only among the products handled by the EC site that the user has accessed, but also for products similar to a product contained in an image captured by the camera, and considers purchasing them.
  • It is also assumed that the user arbitrarily selects an image from among images already captured by the camera and images acquired from an external device, which are stored in the storage unit of the user device 10, searches for products similar to the product included in the selected image, and considers purchasing them.
  • an embodiment will be described in which the user searches for similar images from images captured by a camera or images selected from the storage section of the user device 10 .
  • the description of matters common to the first embodiment will be omitted.
  • the configuration of the information processing apparatus 100 according to this embodiment is the same as that of the first embodiment.
  • The flow of processing executed by the information processing apparatus 100 according to this embodiment is also the same as the processing shown in FIG. 7 described in the first embodiment.
  • a product image as a query image in the first embodiment corresponds to an image captured by the user device 10 or an image selected from the storage unit.
  • FIGS. 9A to 9C show screen display examples of the user device 10 according to this embodiment.
  • a screen 90 in FIG. 9A is an example of a screen displayed on the display unit 606 of the user device 10 .
  • The user operates the user device 10 to access an arbitrary electronic commerce site (EC site), inputs an arbitrary search word, and transmits the search word to the information processing device 100, whereby the information of the screen 90 is provided and displayed on the display unit 606 of the user device 10.
  • the CPU 601 of the user device 10 controls the display unit 606 of the user device 10 to display the camera button 91 and the photo library button 92 together according to the user's operation.
  • The camera button 91 and the photo library button 92 are controlled to be displayed on the screen 90 provided from the information processing apparatus 100, but they may be displayed on any screen associated with the EC site accessed by the user.
  • the camera button 91 and the photo library button 92 may be configured in other forms, such as physical buttons.
  • The camera button 91 is a button for activating a camera function (camera application) provided in the user device 10. When the camera button 91 is selected, the user device 10 enters a state (shooting mode) in which an arbitrary subject can be shot.
  • The photo library button 92 is a button for browsing one or more images stored in a storage unit such as the RAM 603 of the user device 10. When the photo library button 92 is selected, one or more images stored in the storage unit are displayed on the display unit 606 of the user device 10.
  • FIG. 9B shows an example of the screen when the user selects the camera button 91 on the screen 90 of FIG. 9A and captures an image as a query image for searching for similar images.
  • image 94 shows the captured image.
  • a search button 95 for an image 94 is also displayed on the screen 93 .
  • Search button 95 is displayed to be selectable. In this state, when the user selects the search button 95 , a search query associated with the image 94 as the query image is transmitted to the information processing apparatus 100 .
  • The information processing device 100 that has received the search query generates a first feature vector 301, a second feature vector 302, a gender feature vector 303, and a color feature vector 304 from the image 94 associated with the search query. Subsequently, the information processing apparatus 100 generates a composite feature vector 311 from the four feature vectors, searches for one or more similar images from the composite feature vector 311, and outputs the search results (one or more similar images and various information related to the images) to the user device 10.
  • FIG. 9C shows an example of the screen when the user selects the photo library button 92 on the screen 90 of FIG. 9A.
  • a captured image stored in the storage unit of the user device 10 or an image acquired from the outside is displayed on the screen 96 of FIG. 9C.
  • a user can change one or more images displayed on screen 96 by, for example, swiping screen 96 right or left.
  • the image 97 displayed in the center of the screen 96 is the query image.
  • a search button 98 for the image 97 is displayed.
  • The search button 98 is displayed so as to be selectable. In this state, when the user selects the search button 98, the search query associated with the image 97 as the query image is transmitted to the information processing apparatus 100.
  • In the example of FIG. 9C, it suffices to treat the image displayed in the center of the screen 96 as the query image.
  • The information processing device 100 that has received the search query generates a first feature vector 301, a second feature vector 302, a gender feature vector 303, and a color feature vector 304 from the image 97 associated with the search query. Subsequently, the information processing apparatus 100 generates a composite feature vector 311 from the four feature vectors, searches for one or more similar images from the composite feature vector 311, and outputs the search results (one or more similar images and various information related to the images) to the user device 10.
  • a query image is selected not from a website such as an EC site, but from an image taken by the user, an image already taken, or an image acquired from the outside. This allows the user to more freely select a query image and search for similar images similar to the query image, which contributes to improving usability.
  • In the first embodiment, the user device 10 selects one product image on a website such as an EC site, and the information processing device 100 searches for similar images similar to the selected product image and provides them to the user device 10. In the second embodiment, the user device 10 selects one image from the images captured by the device or the images already acquired, and the information processing device 100 searches for similar images similar to the selected image and provides them to the user device 10.
  • an embodiment combining the first embodiment and the second embodiment will be described.
  • the description of matters common to the first embodiment and the second embodiment will be omitted.
  • the configuration of the information processing apparatus 100 according to this embodiment is the same as that of the first embodiment.
  • The flow of processing executed by the information processing apparatus 100 according to this embodiment is also the same as the processing shown in FIG. 7 described in the first embodiment.
  • the processing of the similarity search unit 107 is different from the above embodiment.
  • The user device 10 transmits a search query that associates a product image as a query image with an image (text image) containing text information selected in the product image, and the similarity search unit 107 of the information processing device 100 searches for similar images using the product image and the text image.
  • FIGS. 10A and 10B show screen display examples of the user device 10 according to the present embodiment.
  • a screen 1000 in FIG. 10A is an example of a screen displayed on the display unit 606 of the user device 10 .
  • The user operates the user device 10 to access an arbitrary electronic commerce site (EC site), inputs an arbitrary search word, and transmits the search word to the information processing device 100, whereby the information of the screen 1000 is provided and displayed on the display unit 606 of the user device 10.
  • The CPU 601 of the user device 10 controls the display unit 606 of the user device 10 to display the camera button 1001 and a photo library button together according to the user's operation.
  • The function of the camera button 1001 is similar to that of the camera button 91 of FIG. 9A.
  • a product image 1002 is displayed on the screen 1000 in FIG. 10A in response to a user's search operation.
  • the user selects the camera button 1001 to enter the shooting mode and shoots the area 1003 .
  • An image 1004 displayed on the display unit 606 after the shooting is an image corresponding to the area 1003 and is an image including text information (text image).
  • the image 1004 is not limited to an image obtained by a shooting operation, and may be an image obtained by an arbitrary selection operation by a user operation.
  • Together with the image 1004, a search button 1005 for the product image 1002 (or the area 1003) is displayed.
  • a search button 1005 is displayed to be selectable. In this state, when the user selects a search button 1005 , a search query associated with the product image 1002 and the image (text image) 1004 is transmitted to the information processing apparatus 100 .
  • the information processing apparatus 100 that has received the search query generates a first feature vector 301, a second feature vector 302, a gender feature vector 303, and a color feature vector 304 from the image 1002 associated with the search query. Subsequently, the information processing apparatus 100 generates a composite feature vector 311 from the four feature vectors. If the composite feature vector 311 has already been generated from the image 1002, the similarity search unit 107 searches and acquires the composite feature vector 311 from the image ID.
  • the similarity search unit 107 analyzes the image 1004 associated with the search query and extracts text information.
  • Various known image processing techniques and machine learning can be used to extract the text information.
  • the similarity search unit 107 is configured to extract text information (eg, at least one of product name and brand name) from the image 1004 using machine learning.
  • the product name extracted is "Mineral Sunscreen” and the brand name extracted is "ABC WHITE”.
  • The similarity search unit 107 searches for one or more similar images to the image 1004 based on the composite feature vector 311 and the extracted text information, and outputs the search results (one or more similar images and various information related to the images) to the user device 10.
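  • The text does not state how the composite feature vector and the extracted text information are combined. One possible reading, shown below purely as an assumption, is to use the extracted product name and brand name as a filter on product titles and then rank the remaining candidates by Euclidean distance. The function `search_with_text`, the catalog entries, and the two-dimensional vectors are all hypothetical.

```python
import math


def search_with_text(query_vec, keywords, catalog, k=2):
    """Rank catalog items by distance, preferring items whose title matches a keyword.

    catalog maps image ID -> (composite vector, product title). If no title
    matches any keyword, the search falls back to the whole catalog.
    """
    wanted = [w.lower() for w in keywords]
    matches = {
        image_id: vec
        for image_id, (vec, title) in catalog.items()
        if any(w in title.lower() for w in wanted)
    }
    pool = matches or {image_id: vec for image_id, (vec, _) in catalog.items()}
    return sorted(pool, key=lambda image_id: math.dist(query_vec, pool[image_id]))[:k]


catalog = {
    "A": ([0.0, 0.0], "ABC WHITE Mineral Sunscreen 50ml"),
    "B": ([0.1, 0.0], "XYZ Body Lotion"),
    "C": ([1.0, 1.0], "ABC WHITE Mineral Sunscreen SPF50"),
}
hits = search_with_text([0.1, 0.1], ["Mineral Sunscreen", "ABC WHITE"], catalog)
print(hits)  # ['A', 'C']
```

  • In this toy catalog, item B is visually close to the query but is excluded because its title matches neither the product name nor the brand name.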
  • FIG. 10B shows a screen example in which the search results received by the user device 10 from the information processing device 100 are displayed on the display unit 606.
  • In FIG. 10B, it is assumed that two similar images 1008A and 1008B are retrieved from the image 1004, and the screen 1007 displays the two similar images 1008A and 1008B.
  • various information such as price and attribute information related to each image can also be displayed.
  • The information processing apparatus 100 predicts a plurality of attributes (features) of the product from the product image, generates a plurality of feature vectors, and combines the plurality of feature vectors to generate a composite feature vector. Furthermore, the information processing apparatus 100 extracts text information from the text image in the product image. Then, the information processing apparatus 100 searches for similar images from the composite feature vector and the text information. As a result, it is possible to provide similar images with higher accuracy than in the past, and improve usability.
  • In the embodiments described above, the acquisition unit 101 acquires one product image.
  • When a plurality of images are acquired, the information processing apparatus 100 may search for similar images for each image.
  • 10: User device, 100: Information processing device, 101: Acquisition unit, 102: First feature estimation unit, 103: Second feature estimation unit, 104: Gender estimation unit, 105: Color extraction unit, 106: Connection unit, 107: Similarity search unit, 108: Learning unit, 109: Output unit, 110: Learning model storage unit, 111: First feature estimation model, 112: Second feature estimation model, 113: Gender estimation model, 114: Segment extraction model, 115: Search database

PCT/JP2021/037384 2021-10-08 2021-10-08 情報処理装置、情報処理方法、情報処理システム、およびプログラム Ceased WO2023058233A1 (ja)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP21931923.3A EP4187485B1 (en) 2021-10-08 2021-10-08 Information processing device, information processing method, information processing system, and program
US17/915,857 US12548289B2 (en) 2021-10-08 2021-10-08 Information processing apparatus, information processing method, and non-transitory computer readable medium
JP2022508496A JP7138264B1 (ja) 2021-10-08 2021-10-08 情報処理装置、情報処理方法、情報処理システム、およびプログラム
PCT/JP2021/037384 WO2023058233A1 (ja) 2021-10-08 2021-10-08 情報処理装置、情報処理方法、情報処理システム、およびプログラム

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/037384 WO2023058233A1 (ja) 2021-10-08 2021-10-08 情報処理装置、情報処理方法、情報処理システム、およびプログラム

Publications (1)

Publication Number Publication Date
WO2023058233A1 true WO2023058233A1 (ja) 2023-04-13

Family

ID=83282307

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/037384 Ceased WO2023058233A1 (ja) 2021-10-08 2021-10-08 情報処理装置、情報処理方法、情報処理システム、およびプログラム

Country Status (4)

Country Link
US (1) US12548289B2 (en)
EP (1) EP4187485B1 (en)
JP (1) JP7138264B1 (ja)
WO (1) WO2023058233A1 (ja)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
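CN114429594B (zh) * 2022-01-26 2024-12-13 华北电力大学 Method and system for detecting typical targets on power transmission lines based on UAV federated learning
WO2024127554A1 (ja) * 2022-12-14 2024-06-20 日本電気株式会社 Information processing device, inference method, inference program, and method for generating a feature generation model
JP2024120324A (ja) * 2023-02-24 2024-09-05 株式会社オーイーシー Lost item estimation system
JP7644914B2 (ja) * 2023-08-16 2025-03-13 株式会社マーケットヴィジョン Information processing system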

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
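JP2009251850A (ja) 2008-04-04 2009-10-29 Albert:Kk Product recommendation system using similar image search
JP2017201454A (ja) * 2016-05-02 2017-11-09 日本放送協会 Image processing device and program
CN112154451A (zh) * 2018-05-18 2020-12-29 悟图索知 Method, device, and computer program for extracting representative features of an object in an image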

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
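KR101599875B1 (ko) * 2008-04-17 2016-03-14 삼성전자주식회사 Method and apparatus for multimedia encoding based on content characteristics of multimedia, and method and apparatus for multimedia decoding based on content characteristics of multimedia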
WO2018232378A1 (en) * 2017-06-16 2018-12-20 Markable, Inc. Image processing system

Also Published As

Publication number Publication date
EP4187485B1 (en) 2024-12-25
EP4187485A4 (en) 2023-06-14
US20240212311A1 (en) 2024-06-27
JP7138264B1 (ja) 2022-09-15
EP4187485A1 (en) 2023-05-31
JPWO2023058233A1 2023-04-13
US12548289B2 (en) 2026-02-10

Similar Documents

Publication Publication Date Title
JP7138264B1 (ja) Information processing device, information processing method, information processing system, and program
US11232324B2 (en) Methods and apparatus for recommending collocating dress, electronic devices, and storage media
US20200387763A1 (en) Item recommendations based on image feature data
US10127688B2 (en) System and process for automatically finding objects of a specific color
US9348844B2 (en) System and method for normalization and codification of colors for dynamic analysis
WO2016123538A1 (en) Mobile visual commerce system
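JP7569382B2 (ja) Information processing device, information processing method, information processing system, and program
JP7265688B1 (ja) Information processing device, information processing method, and program
JP6154545B2 (ja) Information processing device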
WO2013184804A1 (en) System and method for normalization and codificaton of colors for dyanamic analysis
Chae et al. Color navigation by qualitative attributes for fashion recommendation
TW202514538A (zh) Automated illustration generation system and method

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase (Ref document number: 2022508496; Country of ref document: JP)
ENP Entry into the national phase (Ref document number: 2021931923; Country of ref document: EP; Effective date: 20220927)
WWE WIPO information: entry into national phase (Ref document number: 17915857; Country of ref document: US)
NENP Non-entry into the national phase (Ref country code: DE)
WWG WIPO information: grant in national office (Ref document number: 17915857; Country of ref document: US)