WO2022102075A1 - Learning device, processing device, learning method, processing method, and program - Google Patents

Learning device, processing device, learning method, processing method, and program

Info

Publication number
WO2022102075A1
Authority
WO
WIPO (PCT)
Prior art keywords
pixel
learning
map
image
indicates
Prior art date
Application number
PCT/JP2020/042398
Other languages
English (en)
Japanese (ja)
Inventor
一郁 児島
真宏 谷
圭佑 池田
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社 filed Critical 日本電気株式会社
Priority to PCT/JP2020/042398 priority Critical patent/WO2022102075A1/fr
Priority to JP2022561800A priority patent/JP7439953B2/ja
Publication of WO2022102075A1 publication Critical patent/WO2022102075A1/fr

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis

Definitions

  • the present invention relates to a learning device, a processing device, a learning method, a processing method, and a program.
  • Techniques related to the present invention are described in Non-Patent Document 1 and Patent Documents 1 and 2.
  • Non-Patent Document 1 discloses a technique (R2D2: Repeatable and Reliable Detector and Descriptor) that generates a feature amount map, a map related to reproducibility (a repeatability map), and a map related to reliability (a reliability map) based on an image and, based on these, extracts key points that are characteristic of the appearance of the subject included in the image with high accuracy. Patent Document 1 discloses a technique for extracting corresponding points between two images with high accuracy. Patent Document 2 discloses a technique for calculating a feature amount for each pixel and analyzing an image.
  • By using the technique described in Non-Patent Document 1, it is possible to extract key points of the appearance of the subject included in an image with high accuracy.
  • However, the key points extracted by the technique described in Non-Patent Document 1 are vulnerable to changes in lighting conditions. For example, a key point (pixel) that can be distinguished from the surrounding pixels in an image taken under a certain lighting condition (e.g., daytime) may be indistinguishable from the surrounding pixels in an image taken under a different lighting condition (e.g., nighttime).
  • An object of the present invention is to make it possible to extract key points that are robust to changes in lighting conditions.
  • According to the present invention, there is provided a learning device having: a storage means for storing parameters of a learning model that generates, based on an input image, a feature amount map showing a feature amount for each pixel and a first weighting map showing, for each pixel, a weighting value used in the process of determining a pixel as a key point; an acquisition means for acquiring a combination of a plurality of learning images that have different lighting conditions and include the same subject; and a learning means that adjusts the parameters of the learning model based on a loss function defined using a plurality of parameters calculated based on the plurality of feature amount maps generated respectively from the plurality of learning images and a parameter calculated based on the first weighting map generated from the plurality of learning images.
  • Further, according to the present invention, there is provided a learning method in which a computer stores parameters of a learning model that generates, based on an input image, a feature amount map showing a feature amount for each pixel and a first weighting map showing, for each pixel, a weighting value used in the process of determining a pixel as a key point; acquires a combination of a plurality of learning images that have different lighting conditions and include the same subject; and adjusts the parameters of the learning model based on a loss function defined using a plurality of parameters calculated based on the plurality of feature amount maps generated respectively from the plurality of learning images and a parameter calculated based on the first weighting map generated from the plurality of learning images.
  • Further, according to the present invention, there is provided a program that causes a computer to function as: a storage means for storing parameters of a learning model that generates, based on an input image, a feature amount map showing a feature amount for each pixel and a first weighting map showing, for each pixel, a weighting value used in the process of determining a pixel as a key point; an acquisition means for acquiring a combination of a plurality of learning images that have different lighting conditions and include the same subject; and a learning means that adjusts the parameters of the learning model based on a loss function defined using a plurality of parameters calculated based on the plurality of feature amount maps generated respectively from the plurality of learning images and a parameter calculated based on the first weighting map generated from the plurality of learning images.
  • Further, according to the present invention, there is provided a processing device that determines key points of an input image by using the learning model generated by the learning device.
  • Further, according to the present invention, there is provided a processing method in which a computer determines key points of an input image by using the learning model generated by the learning device.
  • Further, according to the present invention, there is provided a program that causes a computer to function as a means for determining key points of an input image by using the learning model generated by the learning device.
  • The learning device of the present embodiment performs characteristic learning based on characteristic learning data, and adjusts the parameters of the illustrated learning model (e.g., CNN: Convolutional Neural Network).
  • Based on an input image ("H × W image" in the figure), the illustrated learning model generates a feature amount map showing the feature amount for each pixel (the "H′ × W′ × C feature amount group" in the figure) and a first weighting map ("Saliency map" in the figure) showing the weighting value for each pixel.
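  • For illustration only, a model of this shape might be sketched as follows. The publication does not specify a network architecture or a framework, so PyTorch, the layer sizes, and the strides used here are assumptions; the sketch simply maps an H × W image to an H′ × W′ × C feature amount group and an H′ × W′ × 1 saliency (first weighting) map.

```python
# Illustrative sketch only: the publication does not disclose a concrete architecture.
# Layer sizes, strides, and the use of PyTorch are assumptions.
import torch
import torch.nn as nn

class KeypointModel(nn.Module):
    def __init__(self, channels: int = 128):
        super().__init__()
        # Backbone: maps an H x W RGB image to an H' x W' x C feature amount group.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, channels, kernel_size=3, padding=1),
        )
        # First weighting map ("saliency map"): one value per pixel, obtained here
        # by convolving the feature amount map down to a single channel.
        self.saliency_head = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, image: torch.Tensor):
        features = self.backbone(image)          # (B, C, H', W')
        saliency = self.saliency_head(features)  # (B, 1, H', W')
        return features, saliency

model = KeypointModel()
features, saliency = model(torch.randn(1, 3, 256, 256))
print(features.shape, saliency.shape)  # torch.Size([1, 128, 64, 64]) torch.Size([1, 1, 64, 64])
```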
  • Pixels to be key points are determined based on the feature amount map and the first weighting map. Specifically, an evaluation value is calculated for each pixel based on the feature amount map and the first weighting map, and the pixels to be key points are determined based on the evaluation values.
  • The illustrated learning model may itself have the function of determining the key points, or other processing means physically and/or logically separated from the learning model may have that function.
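  • For illustration only, the selection step might be sketched as follows, using the saliency value of each pixel as its evaluation value and keeping the N pixels with the largest values; the publication leaves the exact rule open (thresholding, top-N selection, or other criteria).

```python
# Illustrative sketch: top-N selection on the saliency (first weighting map) values
# is assumed here as the evaluation rule; other criteria are equally possible.
import torch

def select_keypoints(saliency: torch.Tensor, num_keypoints: int = 100):
    """saliency: (1, 1, H', W') first weighting map; returns (row, col) pixel indices."""
    h, w = saliency.shape[-2:]
    flat = saliency.flatten()
    k = min(num_keypoints, flat.numel())
    _, idx = torch.topk(flat, k)  # pixels with the largest evaluation values
    rows = torch.div(idx, w, rounding_mode="floor")
    cols = idx % w
    return torch.stack((rows, cols), dim=1)  # (k, 2) pixel coordinates

saliency_example = torch.rand(1, 1, 64, 64)   # stand-in for a first weighting map
keypoints = select_keypoints(saliency_example)
print(keypoints.shape)                        # torch.Size([100, 2])
```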
  • the learning device of this embodiment executes characteristic learning based on characteristic learning data and adjusts the parameters of the illustrated learning model.
  • As a result, a feature amount map and a first weighting map are generated in which the evaluation values of pixels robust to changes in lighting conditions are relatively high. That is, a feature amount map and a first weighting map are generated in which the evaluation values are relatively high for pixels that can be discriminated from the surrounding pixels not only when the image is taken under a specific lighting condition but also when it is taken under various other lighting conditions. As a result, it becomes easier to determine pixels that are robust to changes in lighting conditions as key points.
  • Each functional unit of the learning device is realized by an arbitrary combination of hardware and software, centered on the CPU (Central Processing Unit) of an arbitrary computer, a memory, a program loaded into the memory, a storage unit such as a hard disk that stores the program (which can store not only programs stored in advance from the stage of shipping the device, but also programs downloaded from storage media such as CDs (Compact Discs) or from servers on the Internet), and a network connection interface. Those skilled in the art will understand that there are various variations in the method of realizing the device and in the device itself.
  • FIG. 2 is a block diagram illustrating the hardware configuration of the learning device.
  • the learning device includes a processor 1A, a memory 2A, an input / output interface 3A, a peripheral circuit 4A, and a bus 5A.
  • the peripheral circuit 4A includes various modules.
  • the learning device does not have to have the peripheral circuit 4A.
  • the learning device may be composed of a plurality of physically and / or logically separated devices, or may be composed of one physically and / or logically integrated device. When the learning device is composed of a plurality of physically and / or logically separated devices, each of the plurality of devices can be provided with the above hardware configuration.
  • the bus 5A is a data transmission path for the processor 1A, the memory 2A, the peripheral circuit 4A, and the input / output interface 3A to transmit and receive data to each other.
  • the processor 1A is, for example, an arithmetic processing unit such as a CPU or a GPU (Graphics Processing Unit).
  • the memory 2A is, for example, a memory such as a RAM (Random Access Memory) or a ROM (Read Only Memory).
  • the input / output interface 3A includes an interface for acquiring information from an input device, an external device, an external server, an external sensor, a camera, etc., and an interface for outputting information to an output device, an external device, an external server, etc.
  • the input device is, for example, a keyboard, a mouse, a microphone, a physical button, a touch panel, or the like.
  • the output device is, for example, a display, a speaker, a printer, a mailer, or the like.
  • the processor 1A can issue a command to each module and perform a calculation based on the calculation result thereof.
  • the learning device 10 has a storage unit 11, an acquisition unit 12, and a learning unit 13.
  • the storage unit 11 stores the parameters of the learning model (hereinafter, may be simply referred to as “learning model”) that generates both the feature amount map and the first weighting map based on the input image. That is, the storage unit 11 stores the parameters of the learning model shown in FIG.
  • the feature map shows the features for each pixel.
  • the feature amount of each pixel is indicated by C values (C is 2 or more).
  • the feature amount map is shown as H′ × W′ × C, as shown in FIG.
  • the type of feature amount and the means for generating the feature amount map are not particularly limited, and any conventional technique can be adopted.
  • A feature amount map is generated in which the evaluation values of pixels that are robust to changes in lighting conditions are relatively high.
  • The first weighting map is generated, for example, from the feature amount map.
  • Alternatively, the output of an intermediate layer of the network may be used to generate the first weighting map.
  • The first weighting map is generated, for example, by convolving the feature amount map.
  • When the feature amount map is represented by H′ × W′ × C, the first weighting map is represented by H′ × W′ × 1. That is, in the first weighting map, each pixel has one value. The value of each pixel is a weighting value referred to in the process of determining a pixel as a key point.
  • The means for generating the H′ × W′ × 1 map by convolving the H′ × W′ × C feature amount map is not particularly limited, and any conventional technique can be adopted.
  • A first weighting map is generated in which the evaluation values of pixels robust to changes in lighting conditions are relatively high.
  • the acquisition unit 12 acquires characteristic learning data. Specifically, the acquisition unit 12 acquires a combination of a plurality of learning images having different lighting conditions and including the same subject as learning data.
  • the plurality of learning images of each combination may be two or three or more.
  • Different lighting conditions mean that the lighting conditions at the time of shooting differ from each other. That is, lighting conditions differ from each other when at least one of the state of natural light (sunlight, moonlight, etc.) at the time of shooting and the state of artificially prepared light (lights, candles, camera flash, etc.) differs between the images.
  • For example, the weather at the time of shooting may differ (e.g., one image shot when it is sunny and the other when it is raining), the states (ON / OFF, strength, etc.) of lights (lights installed in the building, lights prepared for shooting, camera flashes, etc.) may differ, or other conditions may differ.
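  • For illustration only, a loader for such combinations of learning images might look like the following sketch; the directory layout, the file names, and the use of PyTorch's Dataset class are assumptions, since the publication only requires that each combination contain the same subject under mutually different lighting conditions.

```python
# Illustrative sketch: file layout and naming are assumptions. Each item pairs two
# photos of the same subject taken under different lighting conditions.
from pathlib import Path
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image

class LightingPairDataset(Dataset):
    def __init__(self, root: str):
        # Assumed layout: root/<scene>/day.jpg and root/<scene>/night.jpg
        self.pairs = [(d / "day.jpg", d / "night.jpg")
                      for d in sorted(Path(root).iterdir()) if d.is_dir()]
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, index):
        path_a, path_b = self.pairs[index]
        image_a = self.to_tensor(Image.open(path_a).convert("RGB"))
        image_b = self.to_tensor(Image.open(path_b).convert("RGB"))
        return image_a, image_b  # same subject, different lighting conditions
```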
  • the learning unit 13 adjusts the parameters of the learning model stored in the storage unit 11 based on the characteristic loss function.
  • As a result, a feature amount map and a first weighting map can be generated in which the evaluation values of pixels robust to changes in lighting conditions are relatively high.
  • the loss function is designed to generate a feature map and a first weighted map in which the evaluation value of pixels that are robust to changes in lighting conditions is relatively high.
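  • For illustration only, a training loop consistent with this description might look like the following sketch. The optimizer, learning rate, batch size, and the helper lighting_robust_loss (a hedged sketch of which appears after the description of the state values below) are assumptions, not details taken from the publication.

```python
# Illustrative training loop. The optimizer, learning rate, and the helper
# lighting_robust_loss(...) are assumptions, not taken from the publication.
import torch
from torch.utils.data import DataLoader

def train(model, dataset, lighting_robust_loss, epochs: int = 10):
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs):
        for image_a, image_b in loader:
            feat_a, sal_a = model(image_a)   # feature amount map + first weighting map
            feat_b, sal_b = model(image_b)
            loss = lighting_robust_loss(feat_a, feat_b, sal_a, sal_b)
            optimizer.zero_grad()
            loss.backward()                  # adjust the learning-model parameters
            optimizer.step()
    return model
```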
  • a pixel that is robust against changes in lighting conditions is a pixel that can be distinguished from surrounding pixels not only when it is shot under specific lighting conditions but also when it is shot under various lighting conditions.
  • a pixel that can be discriminated from surrounding pixels is a pixel that has a sufficiently large difference in features (features shown in a feature map) from surrounding pixels.
  • the loss function of this embodiment is represented by, for example, the following equation (1).
  • the following loss function may be appropriately changed as long as the same result can be obtained.
  • The loss function below corresponds to an example in which a combination of two learning images is acquired as the learning data.
  • Various parameters are defined as follows.
  • I indicates the first learning image. I′ indicates the second learning image. i and j indicate the coordinate values of a pixel in the first learning image. p_ij indicates a pixel in the area where the subject exists in the first learning image; it may cover all of the pixels in that area, or only some pixels picked up by any means. In this embodiment, information indicating the area in which the subject exists in an image is used. The information indicating the area where the subject exists may be input to the learning device 10 from the outside, or the learning device 10 may analyze the learning image to specify the area where the subject exists. S(p_ij) indicates the state value of the p_ij pixel of the first learning image. The state value will be described later.
  • U(i, j) indicates the pixel of the second learning image corresponding to the (i, j) pixel of the first learning image.
  • A "corresponding pixel" is a pixel that indicates the same part of the same subject.
  • In this embodiment, information indicating the corresponding pixels of the plurality of learning images is used. The information indicating the corresponding pixels may be input to the learning device 10 from the outside, or the learning device 10 may analyze the learning images to identify the corresponding pixels. p′_U(i,j) indicates the pixel of the second learning image corresponding to the p_ij pixel of the first learning image.
  • S(p′_U(i,j)) indicates the state value of the p′_U(i,j) pixel of the second learning image.
  • C_ij indicates the weighting value of the (i, j) pixel of the first weighting map generated based on the first learning image. Alternatively, C_ij may be a statistical value (average value, maximum value, minimum value, etc.) of the weighting value of the (i, j) pixel of the first weighting map generated based on the first learning image and the corresponding weighting value of the first weighting map generated based on the second learning image.
  • P indicates the patch group centered on the p_ij pixel.
  • A patch group is a set of pixels including the pixel of interest and the pixels around it.
  • The positional relationship between the pixel of interest and the pixels to be included as its "peripheral pixels" can be arbitrarily determined based on the required performance and the like.
  • |P| indicates the number of pixels contained in the patch group.
  • the state value of each pixel will be described.
  • The state value S(p_ij) of the p_ij pixel of the first learning image is represented by, for example, the following equation (2). The state value S(p′_U(i,j)) of the p′_U(i,j) pixel of the second learning image is obtained by the same equation.
  • F_ij represents the collection of the plurality of feature amounts (C values, where C is 2 or more) of the p_ij pixel.
  • var(F_ij) indicates the unbiased variance of the C values of F_ij.
  • m and n indicate the coordinate values of the pixels, other than p_ij, included in the patch group centered on p_ij.
  • F_mn indicates the collection of the plurality of feature amounts (C values, where C is 2 or more) of a pixel, other than p_ij, included in the patch group centered on p_ij.
  • var(F_mn) indicates the unbiased variance of the C values of F_mn.
  • |P| - 1 indicates the number obtained by subtracting 1 from the number of pixels contained in the patch group.
  • As described above, the loss function is defined using the state values (S(p_ij), S(p′_U(i,j))) of each pixel calculated based on the feature amount maps, and the weighting value C_ij of each pixel shown in the first weighting map.
  • The state value of each pixel is calculated based on the feature amount of that pixel and the feature amounts of a plurality of pixels around it. Specifically, when the feature amount of each pixel is indicated by C values (C is 2 or more), the state value of each pixel is calculated based on the unbiased variance of the C values of that pixel and the unbiased variances of the C values of each of the plurality of surrounding pixels.
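  • For illustration only: equations (1) and (2) are given as figures in the publication and are not reproduced here, so the sketch below reconstructs them only loosely from the definitions above. It computes, per pixel, the unbiased variance of the C feature values and the unbiased variances of the surrounding patch pixels, combines them into a state value, and penalizes weighted differences of corresponding state values between the two learning images; the specific ratio used for the state value, the averaging of the two weighting maps, and the assumption that corresponding pixels are spatially aligned are all illustrative choices, not the publication's actual formulas.

```python
# Hedged reconstruction: the exact forms of equations (1) and (2) are shown only as
# figures in the publication. The ratio used for the state value and the weighted
# absolute difference used for the loss are assumptions for illustration.
import torch
import torch.nn.functional as F

def state_values(features: torch.Tensor, patch: int = 3, eps: float = 1e-6):
    """features: (B, C, H, W). Returns per-pixel state values (B, 1, H, W)."""
    # Unbiased variance of the C feature values of each pixel.
    pixel_var = features.var(dim=1, unbiased=True, keepdim=True)
    # Average of the unbiased variances of the |P| - 1 surrounding patch pixels
    # (border pixels are approximated via zero padding).
    kernel = torch.ones(1, 1, patch, patch, device=features.device)
    kernel[0, 0, patch // 2, patch // 2] = 0.0
    neighbor_sum = F.conv2d(pixel_var, kernel, padding=patch // 2)
    neighbor_mean = neighbor_sum / (patch * patch - 1)
    # Assumed combination: how much the pixel's variance stands out from its patch.
    return pixel_var / (neighbor_mean + eps)

def lighting_robust_loss(feat_a, feat_b, sal_a, sal_b):
    """Assumed combination of state values and first-weighting-map values."""
    s_a = state_values(feat_a)
    s_b = state_values(feat_b)       # corresponding pixels assumed spatially aligned
    weight = 0.5 * (sal_a + sal_b)   # e.g. average of the two first weighting maps
    return (weight * (s_a - s_b).abs()).mean()
```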
  • For example, the weighting value of each descriptor (each pixel) shown in the first weighting map is used as the evaluation value, and pixels having large evaluation values are determined as key points. For example, pixels whose evaluation value is equal to or higher than a reference value may be determined as key points, a predetermined number of pixels may be determined as key points in descending order of evaluation value, or key points may be determined by other criteria.
  • In this way, a feature amount map and a first weighting map can be generated in which the evaluation values of pixels robust to changes in lighting conditions are relatively high.
  • As a result, pixels that are robust to changes in lighting conditions can be determined as key points.
  • As described above, the learning device of the present embodiment executes characteristic learning (learning based on the characteristic loss function) based on the characteristic learning data, and adjusts the parameters of the learning model. As a result, a feature amount map and a first weighting map are generated in which the evaluation values of pixels robust to changes in lighting conditions are relatively high. That is, a feature amount map and a first weighting map are generated in which the evaluation values are relatively high for pixels that can be discriminated from the surrounding pixels not only when the image is taken under a specific lighting condition but also when it is taken under various other lighting conditions. As a result, it becomes easier to determine pixels that are robust to changes in lighting conditions as key points.
  • ⁇ Second embodiment> "overview"
  • The processing apparatus of this embodiment generates a feature amount map and a first weighting map from an input image by using a learning model whose parameters have been adjusted by the learning apparatus 10 described in the first embodiment, and determines key points based on them.
  • Each functional part of the processing device is realized by an arbitrary combination of hardware and software, centered on the CPU of an arbitrary computer, a memory, a program loaded into the memory, a storage unit such as a hard disk that stores the program (which can store not only programs stored in advance from the stage of shipping the device, but also programs downloaded from storage media such as CDs or from servers on the Internet), and a network connection interface. Those skilled in the art will understand that there are various variations in the method of realizing the device and in the device itself.
  • FIG. 2 is a block diagram illustrating a hardware configuration of the processing device.
  • the processing device includes a processor 1A, a memory 2A, an input / output interface 3A, a peripheral circuit 4A, and a bus 5A.
  • the peripheral circuit 4A includes various modules.
  • the processing device does not have to have the peripheral circuit 4A.
  • the processing device may be composed of a plurality of physically and / or logically separated devices, or may be composed of one physically and / or logically integrated device. When the processing device is composed of a plurality of physically and / or logically separated devices, each of the plurality of devices can be provided with the above hardware configuration. Since the description of each element in FIG. 2 has been performed in the first embodiment, it will be omitted here.
  • the processing device 20 includes a storage unit 21, an input unit 22, an estimation unit 23, and an output unit 24.
  • the storage unit 21 stores a learning model whose parameters have been adjusted by the learning device 10 described in the first embodiment. Specifically, the storage unit 21 stores information (data) necessary for executing the learning model, such as parameters of the learning model.
  • the input unit 22 accepts the input of the input image.
  • the estimation unit 23 inputs an input image to the learning model stored in the storage unit 21 and obtains the estimation result.
  • For example, the estimation unit 23 calculates the evaluation value of each pixel based on the output feature amount map and first weighting map, and determines the pixels to be key points based on the evaluation values. The method of calculating the evaluation values and the method of determining the pixels to be key points are as described in the first embodiment.
  • Alternatively, when the learning model is configured to determine the key points itself, the estimation unit 23 acquires the information indicating the key points output from the learning model (information indicating the pixels determined as key points) as the estimation result.
  • The output unit 24 outputs information indicating the pixels determined as key points.
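  • For illustration only, the flow of the processing device 20 from input image to key-point output might be sketched as follows, reusing the model and select_keypoints helpers sketched in the first embodiment; the file handling and the number of key points are assumptions.

```python
# Illustrative end-to-end inference sketch for the processing device 20:
# input unit -> estimation unit (model + key-point selection) -> output unit.
import torch
from torchvision import transforms
from PIL import Image

def process_image(model, image_path: str, num_keypoints: int = 100):
    image = transforms.ToTensor()(Image.open(image_path).convert("RGB")).unsqueeze(0)
    model.eval()
    with torch.no_grad():
        features, saliency = model(image)  # feature amount map + first weighting map
    keypoints = select_keypoints(saliency, num_keypoints)  # evaluation-value selection
    # Output: information indicating the pixels determined as key points, together
    # with their descriptors (feature amounts) for downstream use such as search.
    descriptors = features[0, :, keypoints[:, 0], keypoints[:, 1]].T  # (N, C)
    return keypoints, descriptors
```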
  • the processing device 20 is used for a process of searching the database for an image including a subject similar to the subject included in the query image and outputting the searched image.
  • the processing device 20 executes the above-mentioned processing and determines a key point from the query image. Then, the information indicating the pixel determined as the key point is output. The output information is input to the similar image search device.
  • the similar image search device searches the database for an image containing a feature amount similar to the feature amount of the pixel determined as a key point, and outputs the searched image.
  • pixels that are robust to changes in lighting conditions are determined as key points and output.
  • In an image search using the feature amounts of such key points, an image including a subject whose appearance is similar to that of the subject included in the query image can be retrieved with high accuracy, regardless of whether the lighting conditions at the time the image was shot are equivalent to the lighting conditions at the time the query image was shot. That is, for example, an image search based on a query image including a building A taken in the daytime can retrieve with high accuracy not only images including the building A taken in the daytime but also images including the building A taken at night.
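  • For illustration only, one way the similar image search device could use such key-point feature amounts is sketched below: each database image is scored by how many query descriptors find a sufficiently close match. The in-memory database, the distance threshold, and the scoring rule are assumptions; the publication does not specify the search device's internals.

```python
# Illustrative similarity search: score each database image by the number of query
# key-point descriptors whose nearest database descriptor is close enough.
import torch
import torch.nn.functional as F

def search_similar(query_desc: torch.Tensor, database: dict, max_distance: float = 0.7):
    """query_desc: (N, C); database: {image_id: (M, C) descriptor tensor}."""
    scores = {}
    q = F.normalize(query_desc, dim=1)
    for image_id, desc in database.items():
        d = F.normalize(desc, dim=1)
        dist = torch.cdist(q, d)  # pairwise descriptor distances
        scores[image_id] = int((dist.min(dim=1).values < max_distance).sum())
    return sorted(scores, key=scores.get, reverse=True)  # best-matching images first
```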
  • the processing device 20 is used for a process of searching the database for an image including a subject similar to the subject included in the query image and outputting the position information associated with the searched image.
  • the processing device 20 executes the above-mentioned processing and determines a key point from the query image. Then, the information indicating the pixel determined as the key point is output. The output information is input to the similar image search device.
  • the similar image search device searches the database for an image containing a feature amount similar to the feature amount of the pixel determined as a key point. In the database, position information indicating the position where each image was taken is stored in association with each image. The similar image search device outputs the position information associated with the searched image.
  • According to the processing device 20 of the present embodiment, it is possible to determine and output pixels that are robust to changes in lighting conditions as key points. By using such key points, an image search that is robust to changes in lighting conditions can be realized.
  • the processing device 20 of the present embodiment generates at least one of a map regarding reproducibility and a map regarding reliability in addition to the feature amount map and the first weighting map based on the input image.
  • the generation of the feature amount map and the first weighting map is realized by the method described in the second embodiment.
  • the generation of the map regarding reproducibility and the map regarding reliability is realized by the method disclosed in Non-Patent Document 1.
  • Both the reproducibility map and the reliability map show the weighted value of each pixel.
  • When the input image has H × W pixels and the feature amount map is shown as H′ × W′ × C, both the map regarding reproducibility and the map regarding reliability are indicated by H′ × W′ × 1. That is, in the map regarding reproducibility and the map regarding reliability, each pixel has one value.
  • The processing apparatus 20 calculates the evaluation value of each pixel using at least one of the map regarding reproducibility and the map regarding reliability in addition to the feature amount map and the first weighting map, and determines key points based on the calculated evaluation values.
  • For example, the evaluation value may be calculated by multiplying the weighting value shown in the first weighting map, the weighting value shown in the map regarding reproducibility, and the weighting value shown in the map regarding reliability.
  • Alternatively, the evaluation value may be calculated by adding the weighting value shown in the first weighting map, the weighting value shown in the map regarding reproducibility, and the weighting value shown in the map regarding reliability.
  • Alternatively, the sum of (the weighting value shown in the first weighting map) × α, (the weighting value shown in the map regarding reproducibility) × β, and (the weighting value shown in the map regarding reliability) × γ may be used as the evaluation value.
  • α, β, and γ are weights for the respective maps.
  • pixels having a high evaluation value calculated by collectively evaluating the three maps are extracted as key points.
  • pixels having a high overall score of each map may be extracted as key points.
  • the score of each map may be filtered under a predetermined condition (greater than the threshold value).
  • For example, where the map regarding reproducibility is A, the map regarding reliability is B, and the first weighting map is C, a pixel group M may first be narrowed down by filtering the scores of some of the maps under a predetermined condition (e.g., greater than a threshold value), the process of calculating the evaluation value may then be performed on the pixel group M, and key points may be extracted from the pixel group M based on the evaluation values.
  • the evaluation value may be calculated by using only one of the weighted values shown in the map regarding reproducibility and the weighted value shown in the map regarding reliability.
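  • For illustration only, the combination options above can be summarized in the following sketch; multiplication, addition, and the weighted sum with α, β, and γ follow the examples listed above, while the concrete weights and the threshold used for narrowing down a candidate pixel group are illustrative values.

```python
# Illustrative combination of the first weighting map (C) with the repeatability
# map (A) and the reliability map (B). alpha/beta/gamma and the threshold are
# example values, not taken from the publication.
import torch

def evaluation_map(first_weighting, repeatability, reliability,
                   mode: str = "product", alpha=1.0, beta=1.0, gamma=1.0):
    if mode == "product":        # multiply the three weighting values
        return first_weighting * repeatability * reliability
    if mode == "sum":            # add the three weighting values
        return first_weighting + repeatability + reliability
    if mode == "weighted_sum":   # alpha*C + beta*A + gamma*B
        return alpha * first_weighting + beta * repeatability + gamma * reliability
    raise ValueError(mode)

# Optional pre-filtering: keep only pixels whose repeatability and reliability
# scores exceed a threshold (a candidate pixel group M), then evaluate those pixels.
def candidate_mask(repeatability, reliability, threshold: float = 0.5):
    return (repeatability > threshold) & (reliability > threshold)
```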
  • the same operation and effect as those of the second embodiment are realized.
  • Further, the key points can be determined by using at least one of the map regarding reproducibility and the map regarding reliability in addition to the feature amount map and the first weighting map.
  • By using such key points, it is possible to realize an image search that is robust against changes in lighting conditions and also robust against changes in shooting angles.
  • In this specification, "acquisition" includes at least one of: the own device actively fetching data stored in another device or a storage medium, based on user input or a program instruction (active acquisition), for example, requesting or inquiring of another device and receiving the result, or accessing and reading another device or a storage medium; and the own device receiving data output from another device, based on user input or a program instruction (passive acquisition), for example, receiving data that is delivered (or transmitted, push-notified, etc.), and selecting and acquiring from the received data or information.
  • Some or all of the above embodiments may also be described as in the following supplementary notes, but are not limited to the following.
  • 1. A learning device having: a storage means for storing parameters of a learning model that generates, based on an input image, a feature amount map showing a feature amount for each pixel and a first weighting map showing, for each pixel, a weighting value used in the process of determining a pixel as a key point; an acquisition means for acquiring a combination of a plurality of learning images that have different lighting conditions and include the same subject; and a learning means that adjusts the parameters of the learning model based on a loss function defined using a plurality of parameters calculated based on the plurality of feature amount maps generated respectively from the plurality of learning images and a parameter calculated based on the first weighting map generated from the plurality of learning images.
  • 2. The learning device according to 1, wherein the learning means adjusts, based on the loss function, the parameters of the learning model that generates both the feature amount map and the first weighting map based on the input image.
  • 3. The learning device according to 1 or 2, wherein the loss function is defined using the state value of each pixel calculated based on the feature amount map and the weighting value of each pixel shown in the first weighting map, and the state value of each pixel is calculated based on the feature amount of each pixel and the feature amounts of a plurality of pixels around each pixel.
  • 4. The learning device as described above, wherein the feature amount of each pixel is indicated by C values (C is 2 or more).
  • 5. The learning device as described above, wherein the loss function is represented by equation (1) and the state value is represented by equation (2), where: I indicates the first learning image; I′ indicates the second learning image; i and j indicate the coordinate values of a pixel in the first learning image; p_ij indicates a pixel in the area where the subject exists in the first learning image; S(p_ij) indicates the state value of the p_ij pixel of the first learning image; U(i, j) indicates the pixel of the second learning image corresponding to the (i, j) pixel of the first learning image; p′_U(i,j) indicates the pixel of the second learning image corresponding to the p_ij pixel of the first learning image; S(p′_U(i,j)) indicates the state value of the p′_U(i,j) pixel of the second learning image; C_ij indicates the weighting value of the (i, j) pixel of the first weighting map generated based on the first learning image; P indicates the patch group centered on p_ij; F_ij indicates the feature amounts (C values, where C is 2 or more) of the p_ij pixel; var(F_ij) indicates the unbiased variance of the C values of F_ij; m and n indicate the coordinate values of the pixels, other than p_ij, included in the patch group centered on p_ij; F_mn indicates the feature amounts (C values, where C is 2 or more) of a pixel, other than p_ij, included in the patch group centered on p_ij; var(F_mn) indicates the unbiased variance of the C values of F_mn; and |P| - 1 indicates the number obtained by subtracting 1 from the number of pixels contained in the patch group.
  • 6. A learning method in which a computer stores parameters of a learning model that generates, based on an input image, a feature amount map showing a feature amount for each pixel and a first weighting map showing, for each pixel, a weighting value used in the process of determining a pixel as a key point; acquires a combination of a plurality of learning images that have different lighting conditions and include the same subject; and adjusts the parameters of the learning model based on a loss function defined using a plurality of parameters calculated based on the plurality of feature amount maps generated respectively from the plurality of learning images and a parameter calculated based on the first weighting map generated from the plurality of learning images.
  • 7. A program that causes a computer to function as: a storage means for storing parameters of a learning model that generates, based on an input image, a feature amount map showing a feature amount for each pixel and a first weighting map showing, for each pixel, a weighting value used in the process of determining a pixel as a key point; an acquisition means for acquiring a combination of a plurality of learning images that have different lighting conditions and include the same subject; and a learning means that adjusts the parameters of the learning model based on a loss function defined using a plurality of parameters calculated based on the plurality of feature amount maps generated respectively from the plurality of learning images and a parameter calculated based on the first weighting map generated from the plurality of learning images.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a learning device (10) comprising: a storage unit (11) that stores a parameter of a learning model for generating, based on an input image, a feature amount map indicating a feature amount of each pixel and a first weighting map indicating, for each pixel, a weighting value used in a process of determining a pixel to serve as a key point; an acquisition unit (12) for acquiring combinations of a plurality of learning images in each of which the lighting conditions differ from one another and which include the same photographic subject; and a learning unit (13) that adjusts the parameter of the learning model on the basis of a loss function defined using a plurality of parameters calculated on the basis of a plurality of feature amount maps generated respectively from the plurality of learning images and a parameter calculated on the basis of the first weighting map generated from the plurality of learning images.
PCT/JP2020/042398 2020-11-13 2020-11-13 Dispositif d'apprentissage, dispositif de traitement, procédé d'apprentissage, procédé de traitement, et programme WO2022102075A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2020/042398 WO2022102075A1 (fr) 2020-11-13 2020-11-13 Dispositif d'apprentissage, dispositif de traitement, procédé d'apprentissage, procédé de traitement, et programme
JP2022561800A JP7439953B2 (ja) 2020-11-13 2020-11-13 学習装置、処理装置、学習方法、処理方法及びプログラム

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/042398 WO2022102075A1 (fr) 2020-11-13 2020-11-13 Dispositif d'apprentissage, dispositif de traitement, procédé d'apprentissage, procédé de traitement, et programme

Publications (1)

Publication Number Publication Date
WO2022102075A1 true WO2022102075A1 (fr) 2022-05-19

Family

ID=81600922

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/042398 WO2022102075A1 (fr) 2020-11-13 2020-11-13 Dispositif d'apprentissage, dispositif de traitement, procédé d'apprentissage, procédé de traitement, et programme

Country Status (2)

Country Link
JP (1) JP7439953B2 (fr)
WO (1) WO2022102075A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018515164A (ja) * 2015-03-27 2018-06-14 シーメンス アクチエンゲゼルシヤフトSiemens Aktiengesellschaft 画像分類を用いた脳腫瘍自動診断方法および脳腫瘍自動診断システム
WO2020042741A1 (fr) * 2018-08-27 2020-03-05 北京百度网讯科技有限公司 Procédé et dispositif de détection de batterie
US20200175673A1 (en) * 2018-11-30 2020-06-04 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and device for detecting defect of meal box, server, and storage medium


Also Published As

Publication number Publication date
JP7439953B2 (ja) 2024-02-28
JPWO2022102075A1 (fr) 2022-05-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20961603

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022561800

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20961603

Country of ref document: EP

Kind code of ref document: A1