WO2023053364A1 - Information processing device, information processing method, and information processing program - Google Patents
- Publication number
- WO2023053364A1 (international application PCT/JP2021/036195)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information processing
- machine learning
- heat map
- input image
- attributes
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—Two-dimensional [2D] image generation
- G06T11/10—Texturing; Colouring; Generation of textures or colours
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Definitions
- The present invention relates to an information processing device, an information processing method, and an information processing program.
- Non-Patent Document 1 describes so-called image cropping, an image processing technique for extracting the important part of an image. In that technique, an attention map in a convolutional neural network is used to obtain an aesthetic evaluation value, and the frame of the main part to be extracted is obtained based on that value.
- Non-Patent Document 1 determines the position of the main part in the image by a single evaluation criterion (the aesthetic evaluation value).
- However, the important part of an image inherently differs depending on the purpose for which the image is used. For example, even for photographs of the same scene, if the photograph is used as a landscape picture, the main part lies in the background objects, whereas if it is used as a portrait, the main part lies in the person shown in the foreground. Similarly, even for photographs of the same person, if the photograph is used to identify the person, the important part is the person's face, whereas if it is used to introduce fashion, the important part should be the clothing the person is wearing.
- The present invention has been made in view of such circumstances, and its purpose is to perform image cropping appropriately according to the purpose of the image.
- (1) An information processing device comprising: a plurality of machine learning models each outputting an intermediate heat map for an input image; and a generation unit that generates a heat map based on the attributes of the input image and the intermediate heat maps.
- The information processing device according to (1), further comprising a machine learning model selection unit that selects, based on the attribute, at least one machine learning model from the plurality of machine learning models as an input target of the input image.
- The information processing device, further comprising an intermediate heat map selection unit that selects, based on the attribute, at least one intermediate heat map from among the plurality of intermediate heat maps output by the plurality of machine learning models.
- An information processing program that causes a computer to function as: an output unit that outputs one or more intermediate heat maps for an input image using one or more machine learning models; and a generation unit that generates a heat map based on the attributes of the input image and the intermediate heat maps.
- FIG. 1 is a functional conceptual diagram of an information processing device commonly conceived in various preferred embodiments of the present invention. FIG. 2 is an example of an input image. FIG. 3 is an example of various "important parts" in the input image.
- FIG. 4 is an example of an intermediate heat map showing CTR prediction.
- FIG. 5 is an example of an intermediate heat map showing aesthetic evaluation values.
- FIG. 6 is an example of an intermediate heat map showing clothing.
- FIG. 7 is an example of an intermediate heat map showing a bag.
- FIGS. 8 and 9 are examples of generated heat maps. FIG. 10 is a diagram explaining an example of the processing performed by the clipping unit. FIG. 11 is an example of an obtained main part. FIG. 12 is a diagram showing the processing flow executed by the clipping unit.
- FIG. 13 is a configuration diagram showing a typical physical configuration of a general computer.
- FIG. 14 is a diagram showing the functional configuration of an information processing apparatus according to a first embodiment of the present invention.
- FIG. 15 is a diagram showing the functional configuration of an information processing apparatus according to a second embodiment of the present invention.
- FIG. 16 is a diagram showing the functional configuration of an information processing apparatus according to a third embodiment of the present invention.
- FIG. 17 is a diagram showing a common processing flow executed by the information processing apparatus according to the present invention.
- FIG. 1 is a functional conceptual diagram of an information processing device 100 commonly conceived in various preferred embodiments of the present invention.
- The information processing apparatus 100 is implemented by realizing the functions shown in the figure with appropriate physical means, for example, a computer executing an appropriate computer program.
- The information processing device 100 is a kind of image processing device that includes a machine learning model group 10, an output unit 12, a generation unit 20, and a clipping unit 30. More specifically, the machine learning model group 10 includes a plurality of trained machine learning models 11, each of which can output an intermediate heat map when given the input image. The output unit 12 inputs the input image to at least one machine learning model 11 and obtains one intermediate heat map from each model that receives it. Whether the input image is input to all of the machine learning models 11 depends on the embodiment.
- The generation unit 20 generates a heat map based on at least one obtained intermediate heat map. Normally, the generation unit 20 obtains the heat map by synthesizing a plurality of intermediate heat maps by a predetermined method, and in doing so uses the attributes of the input image directly or indirectly. That is, the generation unit 20 generates a heat map from the intermediate heat maps based on the attributes of the input image.
- The information processing device 100 may use the heat map output by the generation unit 20 as its final product. Here, however, it is assumed that the clipping unit 30 is further provided.
- The clipping unit 30 clips a main part, which is a part of the input image, based on the heat map obtained by the generation unit 20.
- In other words, the information processing apparatus 100 can be described as a device that clips the main part of the input image based on the attributes of the input image.
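The data flow of FIG. 1 can be sketched roughly as follows. This is a minimal illustration only: the type aliases, the stub models, and the helpers `run_device`, `pick`, and `peak_cell` are hypothetical names, and a real machine learning model 11 would be a trained network rather than a lambda returning a fixed grid.

```python
from typing import Callable, Dict, List, Tuple

HeatMap = List[List[float]]         # 2D grid of importance values
Image = List[List[float]]           # placeholder for an input image (hypothetical)
Model = Callable[[Image], HeatMap]  # stands in for a trained machine learning model 11
Box = Tuple[int, int, int, int]     # (top, left, height, width) of the main part


def run_device(image: Image, attribute: str,
               models: Dict[str, Model],
               generate: Callable[[str, Dict[str, HeatMap]], HeatMap],
               clip: Callable[[HeatMap], Box]) -> Tuple[HeatMap, Box]:
    # Output unit 12: one intermediate heat map per model that receives the image.
    intermediate = {name: model(image) for name, model in models.items()}
    # Generation unit 20: turn the intermediate heat maps into a heat map using the attribute.
    heat_map = generate(attribute, intermediate)
    # Clipping unit 30: locate the main part on the heat map.
    return heat_map, clip(heat_map)


def pick(attribute: str, maps: Dict[str, HeatMap]) -> HeatMap:
    # Toy generation rule: use the intermediate heat map named by the attribute.
    return maps[attribute]


def peak_cell(hm: HeatMap) -> Box:
    # Toy clipping rule: a 1x1 box on the highest-scoring cell.
    cells = [(i, j) for i in range(len(hm)) for j in range(len(hm[0]))]
    i, j = max(cells, key=lambda ij: hm[ij[0]][ij[1]])
    return (i, j, 1, 1)


# Usage with stub models: the same image yields different main parts per attribute.
toy_models: Dict[str, Model] = {
    "clothing": lambda img: [[0.0, 0.9], [0.0, 0.8]],
    "bag": lambda img: [[0.7, 0.0], [0.0, 0.0]],
}
_, clothing_box = run_device([[0.0]], "clothing", toy_models, pick, peak_cell)
_, bag_box = run_device([[0.0]], "bag", toy_models, pick, peak_cell)
```

Passing the generation and clipping rules in as callables keeps the sketch neutral about which embodiment is used; the three embodiments described later differ mainly in what `generate` does with the attribute.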
- FIG. 2 is an example of an input image.
- The figure shows a photograph of a person, but the subject is not particularly limited.
- The format of the input image is likewise not particularly limited: whether it is a raster or vector image, its resolution, and its file format are arbitrary. At least when it is input to the machine learning model group 10, however, the input image is prepared as electronic data.
- The "important part" of an input image can differ depending on how the input image is used. As a specific example, take the input image shown in FIG. 3, which is the same as that shown in FIG. 2. If a human figure is requested as the "important part", the area within the frame enclosing the person in the figure is considered appropriate. If clothing (or fashion) is requested instead, the area within the frame indicated by the dashed line B in the figure would be appropriate, and for yet another requested part, the area within the frame indicated by the chain double-dashed line C in the figure would be appropriate.
- The "important part" of the input image must therefore be determined based on information indicating how the input image is to be used, and such information is given by some method separately from the input image.
- Such information is hereinafter referred to as the "attributes" of the input image.
- The attributes of the input image may be given based on structured data, such as text data associated with the input image.
- Alternatively, the attribute of the input image may be selected from at least one attribute obtained by inputting the input image into an object detection model or an image classification model. These models are, for example, trained machine learning models.
- In the technical field of image analysis, it is already known that, by preparing appropriate learning data, one can create an evaluation image that numerically indicates the evaluated importance of each pixel constituting an image. Such evaluation images are hereinafter referred to as "heat maps".
- The resolution of the heat map does not necessarily have to match that of the input image; each evaluation value may cover a block of pixels such as 3×3 or 5×5.
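When each evaluation value covers a block of pixels, a coarse heat map can be brought to pixel resolution by repeating every value over its block. The sketch below assumes square blocks and plain nested lists; `expand_heat_map` is a hypothetical helper, and other upsampling methods (for example bilinear interpolation) could serve equally well.

```python
from typing import List


def expand_heat_map(coarse: List[List[float]], block: int) -> List[List[float]]:
    """Repeat each coarse evaluation value over a block x block patch of pixels."""
    out: List[List[float]] = []
    for row in coarse:
        # Widen the row: each value is repeated `block` times horizontally.
        expanded_row = [v for v in row for _ in range(block)]
        # Heighten it: the widened row is copied `block` times vertically.
        out.extend(list(expanded_row) for _ in range(block))
    return out


fine = expand_heat_map([[0.2, 0.9]], 3)  # 1x2 coarse map -> 3x6 pixel map
```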
- The viewpoint (that is, the attribute) from which such a heat map is generated depends on the learning data prepared for machine learning, so it is usually not possible to prepare a single machine learning model that distinguishes among various attributes and outputs a different heat map for each. One could imagine a machine learning model that takes an attribute as input in addition to the input image and outputs a heat map, but it is not easy to prepare training data for such a model. Therefore, in the information processing apparatus 100, a plurality of trained machine learning models 11, which are relatively easy to implement, are prepared as the machine learning model group 10.
- Each machine learning model 11 outputs a heat map from a specific viewpoint determined for that model, rather than the ultimately required heat map, which may differ depending on the attributes of the input image.
- Hereinafter, the heat map finally obtained in line with the attributes of the input image is simply referred to as a heat map, and the heat map from a single viewpoint obtained by an individual machine learning model 11 is referred to as an intermediate heat map.
- An intermediate heat map corresponds to an output such as an attention map or attention image generated using an attention model included in an individual machine learning model, for example.
- For example, the machine learning model applies the attention model to the feature map output by a feature extractor, such as a CNN (convolutional neural network), included in the model, and outputs an attention map, attention image, or the like as the intermediate heat map.
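As a rough illustration of how a CNN feature map can become an attention-style intermediate heat map, the sketch below scores each spatial position by its channel-averaged activation and normalises the scores with a softmax. This pooling-plus-softmax rule is an assumed stand-in, not the attention model of the source, and `spatial_attention` is a hypothetical name.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def spatial_attention(features: FeatureMap) -> List[List[float]]:
    channels = len(features)
    rows, cols = len(features[0]), len(features[0][0])
    # Score each spatial position by its mean activation across channels.
    scores = [[sum(features[c][i][j] for c in range(channels)) / channels
               for j in range(cols)] for i in range(rows)]
    # Softmax over all positions (shifted by the maximum for numerical stability),
    # so the resulting map is non-negative and sums to 1.
    peak = max(max(r) for r in scores)
    exp = [[math.exp(s - peak) for s in r] for r in scores]
    total = sum(sum(r) for r in exp)
    return [[e / total for e in r] for r in exp]


attention = spatial_attention([[[0.0, 2.0]], [[0.0, 2.0]]])  # 2 channels, 1x2 grid
```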
- The attention map here may be a map generated based on the attention model, or a map generated without the attention model.
- For example, an attention map serving as the intermediate heat map in CTR prediction or aesthetic evaluation value prediction corresponds to an attention map generated based on an attention model.
- FIGS. 4 to 7 are examples of intermediate heat maps for the input image illustrated in FIG. 2.
- The intermediate heat map in FIG. 4 is a CTR (click-through rate) prediction.
- The machine learning model 11 that outputs the CTR prediction as an intermediate heat map can be obtained, for example, by using a machine learning architecture known as a CNN and training it with images annotated with scores corresponding to CTR. Such training data can be obtained, for example, by tracking user operations on images displayed on an EC (electronic commerce) site.
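The text does not specify how CTR scores are derived from tracked user operations. One plausible reading, sketched below under that assumption, is to compute an image-level CTR (clicks divided by impressions) from an interaction log and use it as the annotation score; the log format and `ctr_labels` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def ctr_labels(events: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Per-image click-through rate from (image_id, event_kind) pairs (hypothetical log format)."""
    shown: Counter = Counter()
    clicked: Counter = Counter()
    for image_id, kind in events:
        if kind == "impression":
            shown[image_id] += 1
        elif kind == "click":
            clicked[image_id] += 1
    # CTR = clicks / impressions; images that were never shown get no label.
    return {image_id: clicked[image_id] / n for image_id, n in shown.items()}


log = [("a", "impression"), ("a", "impression"), ("a", "click"),
       ("b", "impression"), ("b", "impression")]
labels = ctr_labels(log)
```

These per-image scores would only be the training targets; the spatial intermediate heat map itself would come from the trained model, for example from its attention map.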
- The intermediate heat map in FIG. 5 shows the aesthetic evaluation value.
- It can be obtained by training a machine learning model 11, such as a CNN, using learning data.
- The learning data is created by annotating images with their aesthetic evaluation. The aesthetic evaluation value here can also be called an aesthetic score.
- The intermediate heat map in FIG. 6 shows clothing. That is, it indicates the portion of the image corresponding to "clothing", namely the area of the input image in which the clothes worn by a person are shown.
- Dedicated learning data may be created for this purpose one item at a time. Alternatively, regions in images are extracted and labeled using an image segmentation technique, and the machine learning model 11 is trained using the data of regions labeled "clothes" as learning data.
- As such image segmentation techniques, R-CNN and Faster R-CNN are well known and can be used.
- Alternatively, R-CNN or Faster R-CNN may be used directly as the machine learning model 11, with only the data of the regions labeled "clothes" extracted and used as the intermediate heat map.
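Using a detector directly as the machine learning model 11 can be sketched as rasterising the regions of one label into a map. The sketch assumes the detector returns labelled boxes with confidence scores (as Faster R-CNN does); a segmentation model returning pixel masks would fill the mask instead of the box. The detection format and `label_heat_map` are hypothetical.

```python
from typing import List, Tuple

HeatMap = List[List[float]]
# (label, confidence, (top, left, bottom, right)); bottom/right are exclusive (hypothetical format)
Detection = Tuple[str, float, Tuple[int, int, int, int]]


def label_heat_map(detections: List[Detection], label: str,
                   height: int, width: int) -> HeatMap:
    hm = [[0.0] * width for _ in range(height)]
    for det_label, score, (top, left, bottom, right) in detections:
        if det_label != label:
            continue  # keep only regions labelled with the requested class
        for i in range(max(top, 0), min(bottom, height)):
            for j in range(max(left, 0), min(right, width)):
                # Where regions overlap, keep the higher confidence.
                hm[i][j] = max(hm[i][j], score)
    return hm


detections = [("clothes", 0.9, (1, 1, 3, 3)), ("bag", 0.8, (0, 3, 2, 4))]
clothes_map = label_heat_map(detections, "clothes", 4, 4)
bag_map = label_heat_map(detections, "bag", 4, 4)
```

The same helper with `label="bag"` gives the bag map of FIG. 7, which matches the note that bags may be handled exactly like clothing.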
- The intermediate heat map in FIG. 7 shows a bag; it indicates the area of the input image in which the bag is shown.
- For the bag, the same processing as described for the clothing in FIG. 6 may be performed.
- In general, as many types of machine learning models 11 as needed are prepared so that the necessary types of intermediate heat maps can be obtained.
- Although these intermediate heat maps are drawn as if they were binary images, they are in practice multi-level grayscale images with some bit depth.
- The architecture of the machine learning model is not limited to the above. Not only a DNN (deep neural network) such as a CNN but also other machine learning methods may be used, and the architecture may differ for each intermediate heat map to be obtained, that is, for each machine learning model 11. The format of the input image is also converted to suit the machine learning model 11 it is input to; for example, it is converted into a raster image of a predetermined size and resolution.
- The obtained intermediate heat maps are passed to the generation unit 20, which creates a heat map from them based, directly or indirectly, on the attributes.
- "Directly based on the attributes" means that the attributes are used in some way when the heat map is created from the intermediate heat maps delivered to the generation unit 20, for example by selecting, according to the attribute, which intermediate heat maps are used for synthesis, or by varying the synthesis weights according to the attribute.
- "Indirectly based on the attributes" means that the intermediate heat maps passed to the generation unit 20 are prepared in advance using the attributes in some form, for example by being selected according to the attribute, and the intermediate heat maps thus passed are used to generate the heat map.
- FIGS. 8 and 9 are examples of heat maps generated by the generation unit 20.
- The heat map shown in FIG. 8 is generated with clothing designated as the attribute, and the heat map shown in FIG. 9 with bags designated as the attribute; both are generated from the same input image shown in FIG. 2. Completely different heat maps are generated when the designated attributes differ.
- Comparing the heat map shown in FIG. 8 with the intermediate heat map showing clothing in FIG. 6, the two are not the same. The intermediate heat map in FIG. 6 indicates only the area in which clothes are shown, whereas the heat map in FIG. 8 shows an area centered on the clothing and including a moderate margin around it, so an appropriate region is selected as the region showing clothing. In the heat map shown in FIG. 9, an appropriate region is likewise selected.
- The clipping unit 30 clips a main part, which is a part of the input image, based on the heat map generated by the generation unit 20.
- Here, "clipping" means specifying the position and shape of the main part within the input image; it does not necessarily require deleting the portions other than the main part from the image data of the input image itself. Even if all the image data of the input image is retained, only the main part can be displayed when the image is shown, as long as its position and shape are specified.
- In this example the main part is rectangular, but it may have any shape, such as an ellipse, a star, or another irregular shape.
- FIG. 10 is a diagram illustrating an example of the processing performed by the clipping unit 30.
- The method described here uses a technique called a sliding window.
- The clipping unit 30 sets various clipping windows W of different sizes and shapes on the heat map.
- $W_{Al}$, $W_{Bm}$, and $W_{Cn}$ shown in FIG. 10 are all examples of the clipping window W.
- In each subscript, the first letter (A, B, C, ...) indicates the size and shape of the clipping window W, and the second letter (l, m, n, ...) indicates the position of the clipping window W on the heat map.
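Enumerating the clipping windows can be sketched as below: each shape corresponds to a first subscript letter and each placement to a second. The stride, the `(top, left, height, width)` representation, and `sliding_windows` are assumptions for illustration.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, int, int, int]  # (top, left, height, width)


def sliding_windows(map_h: int, map_w: int, shapes: List[Tuple[int, int]],
                    stride: int) -> Iterator[Window]:
    """Yield every placement of every window shape that fits inside the heat map."""
    for h, w in shapes:  # one shape per first subscript letter (A, B, C, ...)
        for top in range(0, map_h - h + 1, stride):  # positions: second letter (l, m, n, ...)
            for left in range(0, map_w - w + 1, stride):
                yield (top, left, h, w)


windows = list(sliding_windows(4, 4, [(2, 2), (4, 2)], stride=2))
```

On a 4×4 map with stride 2, the 2×2 shape fits in four positions and the 4×2 shape in two, so six windows are produced.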
- A window $W_{opt}$ suitable for the main part is then selected from the candidate windows $W_{cand}$ extracted from the clipping windows W.
- For this selection, the method shown in Equation 2 below may be used, for example.
- Equation 2 selects the smallest clipping window W among the candidate windows $W_{cand}$.
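Equation 2 itself is not reproduced in this text. Assuming that the "size" of a window is measured by its area $|W|$ (the text does not say which measure is used), the rule it describes can be written as:

$$
W_{opt} = \mathop{\arg\min}_{W \in W_{cand}} \lvert W \rvert
$$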
- Once the main part $W_{opt}$ is obtained, the input image is trimmed using $W_{opt}$ as the outer frame, which yields an image containing only the portion of the input image considered to be important.
- The method of selecting a window suitable for the main part $W_{opt}$ from the clipping windows W included in $W_{cand}$ is not limited to the above; for example, the method shown in Equation 3 may be used.
- Equation 3 selects the window W in $W_{cand}$ that has the largest aesthetic evaluation value. In the present embodiment, any score based on the sum of one or more scores for the clipping window W may be treated as the aesthetic evaluation value in Equation 3.
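Equation 3 is likewise not reproduced here. Writing $S(W)$ for the aesthetic evaluation value of a window $W$ (a symbol introduced for illustration), the rule corresponds to the first line below; if $S(W)$ is formed from several scores $s_1(W), \dots, s_m(W)$, one reading of the note above is the second line.

$$
\begin{aligned}
W_{opt} &= \mathop{\arg\max}_{W \in W_{cand}} S(W) \\
S(W) &= \sum_{i=1}^{m} s_i(W)
\end{aligned}
$$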
- FIG. 12 is a diagram showing the processing flow executed by the clipping unit 30 in the example described above.
- In step S01, the clipping unit 30 sets clipping windows W of various sizes, shapes, and positions as already described, and in step S02 it extracts the candidate windows $W_{cand}$ by the method shown in Equation 1 or a similar method.
- Next, a window $W_{opt}$ suitable for the main part is selected by the method shown in Equation 2 or Equation 3, or by another method.
- The size, shape, and position of the frame given by the selected $W_{opt}$ indicate the region of the input image to be cut out.
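The flow of FIG. 12 can be sketched end to end as below. Equation 1 is not reproduced in this text, so the candidate rule used here (a window qualifies if it holds at least a fraction `ratio` of the total heat) is an assumption. The tie-break in the Equation 2 branch and the optional `score` callable standing in for the aesthetic evaluation value of Equation 3 are also assumptions, and all names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

HeatMap = List[List[float]]
Window = Tuple[int, int, int, int]  # (top, left, height, width)


def window_sum(hm: HeatMap, win: Window) -> float:
    top, left, h, w = win
    return sum(hm[i][j] for i in range(top, top + h) for j in range(left, left + w))


def clip_main_part(hm: HeatMap, shapes: List[Tuple[int, int]], stride: int = 1,
                   ratio: float = 0.9,
                   score: Optional[Callable[[Window], float]] = None) -> Optional[Window]:
    rows, cols = len(hm), len(hm[0])
    total = sum(map(sum, hm))
    # S01: clipping windows W of various sizes, shapes and positions.
    windows = [(top, left, h, w) for h, w in shapes
               for top in range(0, rows - h + 1, stride)
               for left in range(0, cols - w + 1, stride)]
    # S02: candidate windows W_cand (assumed stand-in for Equation 1).
    cand = [win for win in windows if window_sum(hm, win) >= ratio * total]
    if not cand:
        return None
    if score is None:
        # Equation 2: smallest area; ties go to the window holding more heat (assumption).
        return min(cand, key=lambda win: (win[2] * win[3], -window_sum(hm, win)))
    # Equation 3: the candidate with the highest score.
    return max(cand, key=score)


heat = [[0.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 1.0, 0.0],
        [0.0, 1.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]
square_shapes = [(1, 1), (2, 2), (3, 3), (4, 4)]
smallest = clip_main_part(heat, square_shapes)
largest = clip_main_part(heat, square_shapes, score=lambda win: win[2] * win[3])
```

With the toy scorer that simply prefers larger windows, the Equation 3 branch returns the whole map, which shows how strongly the result depends on the chosen score.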
- Alternatively, the clipping unit 30 may obtain the main part $W_{opt}$ by other processing.
- For example, a machine learning model, preferably a trained R-CNN, may be used to output the size, shape, and position of the main part $W_{opt}$ directly from the heat map.
- Such a machine learning model can be trained with training data consisting of various example heat maps and their corresponding $W_{opt}$.
- In this case, the sliding window method described above as the processing of the clipping unit 30 may also be used.
- FIG. 13 is a configuration diagram showing a typical physical configuration of a general computer 1.
- The computer 1 has a CPU (Central Processing Unit) 1a, a RAM (Random Access Memory) 1b, an external storage device 1c, a GC (Graphics Controller) 1d, an input device 1e, and an I/O (Input/Output) 1f, connected via a data bus 1g so that they can exchange electrical signals with one another.
- The external storage device 1c is a device capable of statically recording information, such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive).
- Signals from the GC 1d are output to a monitor 1h, such as a CRT (Cathode Ray Tube) or a so-called flat panel display, on which the user visually recognizes images, and are displayed as images.
- The input device 1e is a device, such as a keyboard, mouse, or touch panel, with which the user inputs information.
- The I/O 1f is an interface through which the computer 1 exchanges information with external devices.
- A plurality of CPUs 1a may be provided to perform parallel operations.
- An application program containing a sequence of instructions for causing the computer 1 to function as the information processing device 100 is installed in the external storage device 1c, read out to the RAM 1b as necessary, and executed by the CPU 1a. Further, such a program may be provided by being recorded on an appropriate computer-readable information recording medium such as an appropriate optical disk, magneto-optical disk, or flash memory, or may be provided via an information communication line such as the Internet.
- An interface for the computer 1 itself may be implemented so that the user operates the computer 1 directly. Alternatively, a so-called cloud computing method may be used, in which general-purpose software such as a web browser on another computer is used and functions are provided from the computer 1 via the I/O 1f. Further, the computer 1 may operate as the information processing apparatus 100 in response to a request from another computer via an API (application programming interface).
- FIG. 14 is a diagram showing the functional configuration of the information processing device 200 according to the first embodiment of the present invention.
- The first embodiment is an example in which the generation unit 220 generates a heat map indirectly based on the attributes of the input image.
- In the first embodiment, the machine learning models 211c and 211d are selected based on the attributes of the input image. That is, whether the input image is input to the machine learning model 211c, which outputs the intermediate heat map of clothing, and to the machine learning model 211d, which outputs the intermediate heat map of bags, is switched according to the attribute.
- In the example shown, the machine learning model 211c is selected and outputs an intermediate heat map for clothes, whereas the machine learning model 211d is not selected and no intermediate heat map for bags is generated.
- This selection is performed by the machine learning model selection unit 212, shown schematically as a switch in FIG. 14.
- The machine learning model selection unit 212 selects, based on the attribute, at least one machine learning model from a plurality of machine learning models, here the machine learning models 211c and 211d, as an input target of the input image.
- In FIG. 14, the dashed line indicates that the machine learning model 211d was not selected.
- The machine learning models 211a and 211b are not subject to selection by attribute and are configured to always output intermediate heat maps.
- Such a configuration may be adjusted according to the specific purpose of the information processing device 200 and the like. For example, there may or may not be machine learning models that are always used regardless of the attribute, and there is no limit to the number of machine learning models selected based on the attribute. A plurality of machine learning models may also be selected for a specific attribute, and the number of machine learning models selected may differ from attribute to attribute. In this embodiment, as an example, the machine learning models that output the intermediate heat maps for CTR and aesthetic evaluation values are always used regardless of the attribute, and the machine learning models that output the intermediate heat maps for clothing and bags are selected by attribute.
- The number of intermediate heat maps obtained is therefore equal to the number of machine learning models to which the input image is input, here the machine learning models 211a to 211c. These are combined in the generation unit 220 to obtain the final heat map.
- The method of this synthesis is not particularly limited; one example is to multiply each intermediate heat map by an appropriate weight and add the results. That is, the finally obtained heat map $H_0$ is given by Equation 4: $H_0 = \sum_{k} w_k H_k$.
- Here, $H_k$ is the k-th intermediate heat map and $w_k$ is the weighting factor for that intermediate heat map.
- Although $w_k$ may be determined dynamically, as in the third embodiment described later, it may also be given in advance as a fixed value. For example, it may be 0.3 for the intermediate heat maps for CTR and aesthetic evaluation values, and 0.4 for the intermediate heat maps for clothing and bags.
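The first embodiment, model selection by attribute followed by the weighted sum of Equation 4 with the fixed example weights 0.3, 0.3, and 0.4, can be sketched as below. The map names, the one-row stub models, and the functions `weighted_sum` and `first_embodiment` are hypothetical.

```python
from typing import Callable, Dict, List

HeatMap = List[List[float]]
Model = Callable[[object], HeatMap]


def weighted_sum(maps: Dict[str, HeatMap], weights: Dict[str, float]) -> HeatMap:
    # Equation 4: H0 = sum_k w_k * H_k over the intermediate heat maps that were produced.
    first = next(iter(maps.values()))
    rows, cols = len(first), len(first[0])
    return [[sum(weights[name] * hm[i][j] for name, hm in maps.items())
             for j in range(cols)] for i in range(rows)]


def first_embodiment(image: object, attribute: str, always: Dict[str, Model],
                     selectable: Dict[str, Model], weights: Dict[str, float]) -> HeatMap:
    # Models 211a/211b: always receive the input image.
    active = dict(always)
    # Model selection unit 212: only the selectable model matching the attribute is fed the image.
    if attribute in selectable:
        active[attribute] = selectable[attribute]
    maps = {name: model(image) for name, model in active.items()}
    return weighted_sum(maps, weights)


always = {"ctr": lambda img: [[1.0, 0.0]], "aesthetic": lambda img: [[0.0, 1.0]]}
selectable = {"clothing": lambda img: [[0.0, 1.0]], "bag": lambda img: [[1.0, 0.0]]}
fixed_weights = {"ctr": 0.3, "aesthetic": 0.3, "clothing": 0.4, "bag": 0.4}
h_clothing = first_embodiment(None, "clothing", always, selectable, fixed_weights)
h_bag = first_embodiment(None, "bag", always, selectable, fixed_weights)
```

Because the unselected model is never run, only three weights take part in each sum, and with these example values they add up to 1.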
- Based on the heat map thus obtained, the clipping unit 230 clips the main part, which is a part of the input image. This clipping process may be the one already described as common to each embodiment.
- FIG. 15 is a diagram showing the functional configuration of an information processing device 300 according to the second embodiment of the present invention.
- The second embodiment is an example in which the generation unit 320 generates a heat map directly based on the attributes of the input image.
- The information processing device 300 is configured so that at least one intermediate heat map is selected from a plurality of intermediate heat maps. That is, either the intermediate heat map of clothes or the intermediate heat map of bags is selected by the intermediate heat map selection unit 321, shown schematically as a switch in FIG. 15. This selection is based on the attributes, and intermediate heat maps that are not selected are not used in the generation unit 320.
- The intermediate heat maps for CTR and aesthetic evaluation values are not subject to selection by attribute and are always used to synthesize the final heat map.
- Such a configuration may also be adjusted according to the specific purpose of the information processing apparatus 300 and the like. For example, there may or may not be intermediate heatmaps that are always used without being selected based on attributes, and there is no limit to the number of intermediate heatmaps that are selected based on attributes. Also, a plurality of intermediate heat maps may be selected for a specific attribute, or the number of intermediate heat maps selected for each attribute may be different.
- The number of intermediate heat maps used for synthesizing the heat map, including the selected ones, is therefore equal to or less than the number of machine learning models 311a to 311d to which the input image is input. These are combined in the generation unit 320 to obtain the final heat map.
- The synthesis may be performed in the same way as in the previous embodiment, and the clipping unit 330 may clip a main part, which is a part of the input image, based on the heat map thus obtained.
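In contrast to the first embodiment, here all four models run and the selection acts on their outputs. A minimal sketch, assuming string keys for the maps and reusing the fixed example weights; `select_intermediate`, `weighted_sum`, and the toy maps are hypothetical.

```python
from typing import Dict, Iterable, List

HeatMap = List[List[float]]


def select_intermediate(maps: Dict[str, HeatMap], attribute: str,
                        always: Iterable[str] = ("ctr", "aesthetic")) -> Dict[str, HeatMap]:
    # Intermediate heat map selection unit 321: keep the always-used maps plus the one
    # matching the attribute; the rest were computed but are not used by generation unit 320.
    keep = set(always) | {attribute}
    return {name: hm for name, hm in maps.items() if name in keep}


def weighted_sum(maps: Dict[str, HeatMap], weights: Dict[str, float]) -> HeatMap:
    # Equation 4 applied to the selected maps only.
    first = next(iter(maps.values()))
    return [[sum(weights[name] * hm[i][j] for name, hm in maps.items())
             for j in range(len(first[0]))] for i in range(len(first))]


all_maps = {"ctr": [[1.0, 0.0]], "aesthetic": [[0.0, 1.0]],
            "clothing": [[0.0, 1.0]], "bag": [[1.0, 0.0]]}
selected = select_intermediate(all_maps, "bag")
h0 = weighted_sum(selected, {"ctr": 0.3, "aesthetic": 0.3, "clothing": 0.4, "bag": 0.4})
```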
- FIG. 16 is a diagram showing the functional configuration of an information processing device 400 according to the third embodiment of the present invention.
- The third embodiment is another example in which the generation unit 420 generates a heat map directly based on the attributes of the input image.
- The information processing device 400 is configured to use weights based on the attributes of the input image when the generation unit 420 generates a heat map by synthesizing a plurality of intermediate heat maps. That is, in principle, the input image is input to all of the machine learning models 411a to 411d prepared in the machine learning model group 410, the same number of intermediate heat maps as models is obtained, and the weights used when synthesizing these intermediate heat maps are varied according to the attributes.
- The generation unit 420 generates at least part of the weights based on the attribute. Specifically, if the attribute is "clothing", the weights for the intermediate heat maps for CTR, aesthetic evaluation value, clothing, and bags are set to 0.3, 0.3, 0.3, and 0.1, respectively, and if the attribute is "bag", they are set to 0.3, 0.3, 0.1, and 0.3.
- the attribute items do not necessarily correspond to a specific machine learning model. For example, it is possible to provide "fashion item” as an attribute and assign corresponding weights such as 0.3, 0.3, 0.2, 0.2.
- the weight for the intermediate heatmap corresponding to the CTR and the aesthetic evaluation value is always assigned 0.3 without change.
- the weight may be given in advance as a constant.
- the machine learning model selection unit 212 is provided to At least one machine learning model to be used as an input target for the input image may be selected.
- the intermediate heat map selection unit 321 is provided to select at least one intermediate heat map based on the attribute, and the generation unit 420 , or may have both configurations.
- FIG. 17 is a diagram showing a common processing flow of information processing methods executed by the information processing apparatuses 100 to 400 according to each embodiment of the present invention.
- In step S11, one or more intermediate heat maps are output in response to the input image.
- This step is executed by the machine learning model group 10 shown in FIG. 1, which is common to all embodiments, and has already been described. In each embodiment, the processing executed by the machine learning model group 210 shown in FIG. 14, the machine learning model group 310 shown in FIG. 15, or the machine learning model group 410 shown in FIG. 16 corresponds to this step.
- In step S12, a heat map is generated on the basis of the attributes of the input image and the intermediate heat maps.
- This step is executed by the generation unit 20 shown in FIG. 1, which is common to all embodiments, and has already been described.
- In each embodiment, the processing executed by the generation unit 220 shown in FIG. 14, the generation unit 320 shown in FIG. 15, or the generation unit 420 shown in FIG. 16 corresponds to this step.
- The use of the attributes of the input image is realized by the selection of machine learning models by the machine learning model selection unit 212 shown in FIG. 14 in the first embodiment, by the selection of intermediate heat maps by the intermediate heat map selection unit 321 shown in FIG. 15 in the second embodiment, and by the determination of weights by the generation unit 420 shown in FIG. 16 in the third embodiment.
- In step S13, a main portion, which is a part of the input image, is cropped out on the basis of the heat map.
- This step is executed by the clipping unit 30 shown in FIG. 1, which is common to all embodiments, and has already been described; the same applies to each embodiment.
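The attribute-dependent steps S11 to S13 can be illustrated with a minimal Python/NumPy sketch. It is not the implementation of the information processing apparatuses 100 to 400. The model keys (`ctr`, `aesthetic`, `clothing`, `bag`), the dictionary layout, the plain averaging used for the second-embodiment selection, and the fixed-size "window with the largest total heat" crop rule in step S13 are all hypothetical. The only values taken from the description are the third-embodiment weights for the intermediate heat maps for CTR, aesthetic evaluation value, clothing, and bags.

```python
import numpy as np

# Hypothetical keys for the four intermediate heat maps, in the order used by
# the third-embodiment weight example: CTR, aesthetic evaluation value,
# clothing, bag.
MODEL_NAMES = ("ctr", "aesthetic", "clothing", "bag")

# Third embodiment: weights per attribute (values quoted from the description).
ATTRIBUTE_WEIGHTS = {
    "clothing": (0.3, 0.3, 0.3, 0.1),
    "bag": (0.3, 0.3, 0.1, 0.3),
    "fashion item": (0.3, 0.3, 0.2, 0.2),
}

# Second embodiment (assumed mapping): maps used for every attribute, plus
# maps selected per attribute.
ALWAYS_USED = ("ctr", "aesthetic")
SELECTED_BY_ATTRIBUTE = {"clothing": ("clothing",), "bag": ("bag",)}


def intermediate_heatmaps(image, models):
    """Step S11: run each model on the image and collect one heat map per model."""
    return {name: np.asarray(model(image), dtype=float) for name, model in models.items()}


def synthesize_weighted(heatmaps, attribute):
    """Step S12, third embodiment: weighted sum with attribute-dependent weights."""
    weights = ATTRIBUTE_WEIGHTS[attribute]
    return sum(w * heatmaps[name] for name, w in zip(MODEL_NAMES, weights))


def synthesize_selected(heatmaps, attribute):
    """Step S12, second embodiment: average only the selected maps (averaging is assumed)."""
    names = ALWAYS_USED + SELECTED_BY_ATTRIBUTE.get(attribute, ())
    return np.mean([heatmaps[name] for name in names], axis=0)


def crop_main_portion(image, heatmap, crop_h, crop_w):
    """Step S13 (assumed rule): crop the crop_h x crop_w window with the largest total heat.

    crop_h and crop_w must each be at least 1 and no larger than the heat map.
    """
    # Integral image with a leading zero row and column, so that
    # ii[i, j] == heatmap[:i, :j].sum().
    ii = np.pad(heatmap.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    # Total heat of every window, indexed by its top-left corner.
    window_sums = (ii[crop_h:, crop_w:] - ii[:-crop_h, crop_w:]
                   - ii[crop_h:, :-crop_w] + ii[:-crop_h, :-crop_w])
    top, left = np.unravel_index(np.argmax(window_sums), window_sums.shape)
    top, left = int(top), int(left)
    return image[top:top + crop_h, left:left + crop_w], (top, left)
```

For example, if the clothing map is concentrated in the top-left corner and the bag map in the bottom-right corner, `synthesize_weighted(maps, "clothing")` weights the top-left region by 0.3 and the bottom-right region by 0.1, so `crop_main_portion` picks the top-left window. Each weight set quoted in the description sums to 1.0, which keeps the synthesized heat map on the same scale as the intermediate heat maps.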
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biodiversity & Conservation Biology (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Image Analysis (AREA)
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/029,093 US12597176B2 (en) | 2021-09-30 | 2021-09-30 | Image generator and method of image generation |
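| PCT/JP2021/036195 WO2023053364A1 (ja) | 2021-09-30 | 2021-09-30 | Information processing device, information processing method, and information processing program |
| JP2022557886A JP7395767B2 (ja) | 2021-09-30 | 2021-09-30 | Information processing device, information processing method, and information processing program |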
| EP21950363.8A EP4184432A4 (en) | 2021-09-30 | 2021-09-30 | Information processing device, information processing method, and information processing program |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2021/036195 WO2023053364A1 (ja) | 2021-09-30 | 2021-09-30 | Information processing device, information processing method, and information processing program |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023053364A1 true WO2023053364A1 (ja) | 2023-04-06 |
Family
ID=85782009
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2021/036195 Ceased WO2023053364A1 (ja) | Information processing device, information processing method, and information processing program | 2021-09-30 | 2021-09-30 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US12597176B2 |
| EP (1) | EP4184432A4 |
| JP (1) | JP7395767B2 |
| WO (1) | WO2023053364A1 |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP7594075B1 (ja) | 2023-11-17 | 2024-12-03 | 楽天グループ株式会社 | 画像生成装置、画像生成方法、および画像生成プログラム |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2025164632A1 (ja) * | 2024-01-31 | 2025-08-07 | 京セラ株式会社 | 学習方法、学習装置、学習システム、制御プログラムおよび記録媒体 |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2021005301A (ja) * | 2019-06-27 | 2021-01-14 | 株式会社パスコ | 建物抽出処理装置及びプログラム |
| WO2021130856A1 (ja) * | 2019-12-24 | 2021-07-01 | 日本電気株式会社 | 物体識別装置、物体識別方法、学習装置、学習方法、及び、記録媒体 |
Family Cites Families (33)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2000075889A (ja) * | 1998-09-01 | 2000-03-14 | Oki Electric Ind Co Ltd | 音声認識システム及び音声認識方法 |
| US8009921B2 (en) | 2008-02-19 | 2011-08-30 | Xerox Corporation | Context dependent intelligent thumbnail images |
| US20150170053A1 (en) * | 2013-12-13 | 2015-06-18 | Microsoft Corporation | Personalized machine learning models |
| JP6960722B2 (ja) * | 2016-05-27 | 2021-11-05 | ヤフー株式会社 | 生成装置、生成方法、及び生成プログラム |
| GB201705876D0 (en) | 2017-04-11 | 2017-05-24 | Kheiron Medical Tech Ltd | Recist |
| JP7149692B2 (ja) * | 2017-08-09 | 2022-10-07 | キヤノン株式会社 | 画像処理装置、画像処理方法 |
| US10521927B2 (en) * | 2017-08-15 | 2019-12-31 | Siemens Healthcare Gmbh | Internal body marker prediction from surface data in medical imaging |
| US10909401B2 (en) | 2018-05-29 | 2021-02-02 | Sri International | Attention-based explanations for artificial intelligence behavior |
| GB201812050D0 (en) * | 2018-07-24 | 2018-09-05 | Dysis Medical Ltd | Computer classification of biological tissue |
| US12014530B2 (en) * | 2018-12-21 | 2024-06-18 | Hitachi High-Tech Corporation | Image recognition device and method |
| CN111488475B (zh) * | 2019-01-29 | 2025-08-19 | 北京三星通信技术研究有限公司 | 图像检索方法、装置、电子设备及计算机可读存储介质 |
| CN110930547A (zh) | 2019-02-28 | 2020-03-27 | 上海商汤临港智能科技有限公司 | 车门解锁方法及装置、系统、车、电子设备和存储介质 |
| JP7334432B2 (ja) * | 2019-03-15 | 2023-08-29 | オムロン株式会社 | 物体追跡装置、監視システムおよび物体追跡方法 |
| JP6929322B2 (ja) * | 2019-05-31 | 2021-09-01 | 楽天グループ株式会社 | データ拡張システム、データ拡張方法、及びプログラム |
| US11532036B2 (en) * | 2019-11-04 | 2022-12-20 | Adobe Inc. | Digital image ordering using object position and aesthetics |
| JP7493323B2 (ja) | 2019-11-14 | 2024-05-31 | キヤノン株式会社 | 情報処理装置、情報処理装置の制御方法およびプログラム |
| JP7490359B2 (ja) * | 2019-12-24 | 2024-05-27 | キヤノン株式会社 | 情報処理装置、情報処理方法及びプログラム |
| US20220180528A1 (en) * | 2020-02-10 | 2022-06-09 | Nvidia Corporation | Disentanglement of image attributes using a neural network |
| CN111611240B (zh) * | 2020-04-17 | 2024-09-06 | 第四范式(北京)技术有限公司 | 执行自动机器学习过程的方法、装置及设备 |
| CN111629212B (zh) | 2020-04-30 | 2023-01-20 | 网宿科技股份有限公司 | 一种对视频进行转码的方法和装置 |
| US11657230B2 (en) * | 2020-06-12 | 2023-05-23 | Adobe Inc. | Referring image segmentation |
| US12004871B1 (en) * | 2020-08-05 | 2024-06-11 | Amazon Technologies, Inc. | Personalized three-dimensional body models and body change journey |
| CN111709533B (zh) * | 2020-08-19 | 2021-03-30 | 腾讯科技(深圳)有限公司 | 机器学习模型的分布式训练方法、装置以及计算机设备 |
| US12008811B2 (en) * | 2020-12-30 | 2024-06-11 | Snap Inc. | Machine learning-based selection of a representative video frame within a messaging application |
| US12406023B1 (en) | 2021-01-04 | 2025-09-02 | Nvidia Corporation | Neural network training method |
| CN112802034B (zh) | 2021-02-04 | 2024-04-12 | 精英数智科技股份有限公司 | 图像分割、识别方法、模型构建方法、装置及电子设备 |
| US12175703B2 (en) * | 2021-02-19 | 2024-12-24 | Nvidia Corporation | Single-stage category-level object pose estimation |
| US11636663B2 (en) * | 2021-02-19 | 2023-04-25 | Microsoft Technology Licensing, Llc | Localizing relevant objects in multi-object images |
| US12437523B2 (en) * | 2021-04-26 | 2025-10-07 | Jidoka Technologies Private Limited | Anomaly detection using a convolutional neural network and feature based memories |
| US12164556B2 (en) * | 2021-06-01 | 2024-12-10 | Google Llc | Smart suggestions for image zoom regions |
| US20230069310A1 (en) * | 2021-08-10 | 2023-03-02 | Nvidia Corporation | Object classification using one or more neural networks |
| US20230153374A1 (en) * | 2021-11-16 | 2023-05-18 | Nvidia Corporation | High-precision matrix multiplication for neural networks |
| US12417602B2 (en) * | 2023-02-27 | 2025-09-16 | Nvidia Corporation | Text-driven 3D object stylization using neural networks |
2021
- 2021-09-30 JP JP2022557886A patent/JP7395767B2/ja active Active
- 2021-09-30 US US18/029,093 patent/US12597176B2/en active Active
- 2021-09-30 WO PCT/JP2021/036195 patent/WO2023053364A1/ja not_active Ceased
- 2021-09-30 EP EP21950363.8A patent/EP4184432A4/en active Pending
Non-Patent Citations (2)
| Title |
|---|
| See also references of EP4184432A4 |
| WENGUAN WANG, JIANBING SHEN, DEEP CROPPING VIA ATTENTION BOX PREDICTION AND AESTHETICS ASSESSMENT, 13 August 2021 (2021-08-13) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP7594075B1 (ja) | 2023-11-17 | 2024-12-03 | 楽天グループ株式会社 | 画像生成装置、画像生成方法、および画像生成プログラム |
| JP2025082417A (ja) * | 2023-11-17 | 2025-05-29 | 楽天グループ株式会社 | 画像生成装置、画像生成方法、および画像生成プログラム |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4184432A1 (en) | 2023-05-24 |
| US20240362831A1 (en) | 2024-10-31 |
| US12597176B2 (en) | 2026-04-07 |
| JP7395767B2 (ja) | 2023-12-11 |
| EP4184432A4 (en) | 2023-10-11 |
| JPWO2023053364A1 | 2023-04-06 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| JP2023545565A (ja) | 画像検出方法、モデルトレーニング方法、画像検出装置、トレーニング装置、機器及びプログラム | |
| US9501724B1 (en) | Font recognition and font similarity learning using a deep neural network | |
| CN111274981B (zh) | 目标检测网络构建方法及装置、目标检测方法 | |
| JP2019091434A (ja) | 複数のディープ・ラーニング・ニューラル・ネットワークを動的に重み付けすることによるフォント認識の改善 | |
| JP2023527615A (ja) | 目標対象検出モデルのトレーニング方法、目標対象検出方法、機器、電子機器、記憶媒体及びコンピュータプログラム | |
| CN106462940A (zh) | 图像中通用对象检测 | |
| CN108229533A (zh) | 图像处理方法、模型剪枝方法、装置及设备 | |
| JP2022185144A (ja) | 対象検出方法、対象検出モデルのレーニング方法および装置 | |
| JP2021527263A5 | | |
| JP2020166397A (ja) | 画像処理装置、画像処理方法、及びプログラム | |
| Kumar et al. | Performance analysis of KNN, SVM and ANN techniques for gesture recognition system | |
| JP7395767B2 (ja) | 情報処理装置、情報処理方法及び情報処理プログラム | |
| CN104050628A (zh) | 图像处理方法和图像处理装置 | |
| Montserrat et al. | Logo detection and recognition with synthetic images | |
| JP2018206252A (ja) | 画像処理システム、評価モデル構築方法、画像処理方法及びプログラム | |
| CN116802683A (zh) | 图像的处理方法和系统 | |
| KR102864472B1 (ko) | 기계 학습을 위한 이미지 처리 장치 및 방법 | |
| CN109697722B (zh) | 用于生成三分图的方法及装置 | |
| Abdechiri et al. | Chaotic target representation for robust object tracking | |
| US11288534B2 (en) | Apparatus and method for image processing for machine learning | |
| CN109034070A (zh) | 一种置换混叠图像盲分离方法及装置 | |
| Jia et al. | Context-based modeling for accurate logo detection in complex environments | |
| Deepak et al. | Maximizing YOLOv2 efficiency: A study on multiclass detection of indoor objects | |
| JP5413156B2 (ja) | 画像処理プログラム及び画像処理装置 | |
| JP7265690B2 (ja) | 情報処理装置、情報処理方法及び情報処理プログラム |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | WWE | Wipo information: entry into national phase | Ref document number: 2022557886; Country of ref document: JP |
| | ENP | Entry into the national phase | Ref document number: 2021950363; Country of ref document: EP; Effective date: 2023-01-23 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWG | Wipo information: grant in national office | Ref document number: 18029093; Country of ref document: US |