WO2018198233A1 - Training apparatus, image recognition apparatus, training method, and program - Google Patents

Training apparatus, image recognition apparatus, training method, and program (Download PDF)

Info

Publication number
WO2018198233A1
WO2018198233A1 (PCT/JP2017/016565)
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature amount
intermediate feature
classifier
offset
Prior art date
Application number
PCT/JP2017/016565
Other languages
English (en)
Japanese (ja)
Inventor
大地 小野 (Daichi Ono)
Original Assignee
Sony Interactive Entertainment Inc. (株式会社ソニー・インタラクティブエンタテインメント)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Interactive Entertainment Inc.
Priority to EP17907458.8A (EP3617991A4)
Priority to JP2019514953A (JP6722351B2)
Priority to PCT/JP2017/016565 (WO2018198233A1)
Priority to CN201780089794.5A (CN110582783B)
Priority to US16/603,085 (US11403560B2)
Publication of WO2018198233A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/69 Microscopic objects, e.g. biological cells or cellular parts
    • G06V20/695 Preprocessing, e.g. image segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present invention relates to a learning device, an image recognition device, a learning method, and a program.
  • In semantic segmentation, which is one of the image recognition techniques for a photographed image using a classifier, the meaning of each pixel included in the input photographed image is specified.
  • The present invention has been made in view of the above problems, and one of its objects is to provide a learning device, an image recognition device, a learning method, and a program capable of improving the image recognition accuracy for photographed images when a classifier trained with CG images is used.
  • The learning device includes: an intermediate feature amount specifying unit that specifies, as an intermediate feature amount associated with an image, the output of one intermediate layer included in a first classifier when input data of the image is input to the first classifier, the first classifier classifying CG images and photographed images; an offset feature amount specifying unit that specifies an offset feature amount based on the intermediate feature amounts associated with one or more CG images and the intermediate feature amounts associated with one or more photographed images; a post-offset intermediate feature amount specifying unit that specifies, based on the intermediate feature amount associated with a CG image and the offset feature amount, a post-offset intermediate feature amount associated with the CG image; and a second classifier learning unit that uses the post-offset intermediate feature amounts associated with a plurality of CG images to train a second classifier to which the intermediate feature amount associated with a photographed image is input at the time of image recognition of the photographed image.
  • In one aspect, the offset feature amount specifying unit specifies the offset feature amount based on a representative value of the intermediate feature amounts associated with a plurality of CG images and a representative value of the intermediate feature amounts associated with a plurality of photographed images.
  • In one aspect, the first classifier is a classifier that has been trained using learning data of CG images and learning data of photographed images.
  • The learning method includes: a step of specifying, as an intermediate feature amount associated with an image, the output of one intermediate layer included in a first classifier when input data of the image is input to the first classifier that classifies CG images and photographed images; a step of specifying an offset feature amount based on the intermediate feature amounts associated with one or more CG images and the intermediate feature amounts associated with one or more photographed images; a step of specifying, based on the intermediate feature amount associated with a CG image and the offset feature amount, a post-offset intermediate feature amount associated with the CG image; and a step of executing, using the post-offset intermediate feature amounts associated with a plurality of CG images, learning of a second classifier to which the intermediate feature amount associated with a photographed image is input at the time of image recognition of the photographed image.
  • The image recognition method includes: a step of specifying, as an intermediate feature amount associated with an image, the output of one intermediate layer included in a first classifier when input data of the image is input to the first classifier that classifies CG images and photographed images; a step of specifying a post-offset intermediate feature amount associated with a photographed image to be recognized, based on an offset feature amount specified from the intermediate feature amounts associated with one or more CG images and the intermediate feature amounts associated with one or more photographed images, and on the intermediate feature amount associated with that photographed image; and a step of specifying, as the result of image recognition of the photographed image, the output of a second classifier that has been trained using the intermediate feature amounts associated with a plurality of CG images when the post-offset intermediate feature amount associated with the photographed image to be recognized is input to the second classifier as input data.
  • Another program causes a computer to specify, as an intermediate feature amount associated with an image, the output of one intermediate layer included in the first classifier when input data of the image is input to the first classifier that classifies CG images and photographed images.
  • In this program, the post-offset intermediate feature amount associated with the image is input to the second classifier that has been trained using the intermediate feature amounts associated with a plurality of CG images.
  • FIG. 1 is a configuration diagram of an information processing apparatus 10 according to an embodiment of the present invention.
  • the information processing apparatus 10 according to the present embodiment is a computer such as a game console or a personal computer, for example.
  • the information processing apparatus 10 according to the present embodiment includes, for example, a processor 12, a storage unit 14, an operation unit 16, and a display unit 18.
  • the processor 12 is a program control device such as a CPU that operates according to a program installed in the information processing apparatus 10, for example.
  • the operation unit 16 is a user interface such as a keyboard, a mouse, and a game console controller, and receives a user operation input and outputs a signal indicating the content to the processor 12.
  • the display unit 18 is a display device such as a liquid crystal display, and displays various images in accordance with instructions from the processor 12.
  • the information processing apparatus 10 may include a communication interface such as a network board, an optical disk drive that reads an optical disk such as a DVD-ROM or a Blu-ray (registered trademark) disk, a USB (Universal Serial Bus) port, and the like.
  • In the information processing apparatus 10 according to the present embodiment, a convolutional neural network classifier is implemented.
  • In the present embodiment, image recognition processing such as semantic segmentation is performed on a photographed image, that is, an image obtained by photographing a real space with a camera.
  • FIG. 2 is a diagram illustrating an example of an execution result image 20, which is an image of the execution result of semantic segmentation for a photographed image. As shown in FIG. 2, the execution result image 20 is divided, in units of pixels, into a plurality of regions each associated with a category corresponding to a recognized object.
  • In the example of FIG. 2, a wall region 22 associated with a wall, a desk region 24 associated with a desk, a chair region 26 associated with a chair, a floor region 28 associated with a floor, and a curtain region 30 associated with a curtain are shown.
  • a plurality of regions separated from each other may be recognized as regions of the same category.
  • the execution result image 20 may be used as an input in various controls such as an automatic driving control of a car.
  • In the present embodiment, the classifier is trained using CG images created by computer graphics technology instead of photographed images, as described below.
  • The information processing apparatus 10 functionally includes, for example, a first classifier 40a, a second classifier 40b, a first classifier learning unit 42, an intermediate feature amount specifying unit 44, an offset feature amount specifying unit 46, a post-offset intermediate feature amount specifying unit 48, a second classifier learning unit 50, and an image recognition execution unit 52.
  • The above elements are implemented mainly by the processor 12 and the storage unit 14.
  • The information processing apparatus 10 according to the present embodiment thus plays the role of a learning device that performs learning of a classifier and also the role of an image recognition device that performs image recognition using the learned classifier.
  • The second classifier 40b is a classifier that performs image recognition, such as the semantic segmentation of a photographed image described above.
  • the second classifier 40b may be a classifier of a convolutional neural network including a convolution layer, a pooling layer, a fully connected layer, and the like, for example.
  • The second classifier 40b according to the present embodiment includes one input layer 72, one output layer 74, and m intermediate layers 76 (76(1), 76(2), ..., 76(m-2), 76(m-1), 76(m)).
  • m is an integer of 1 or more.
  • the value m and the value n described above may be the same or different.
  • Although FIG. 5 shows five intermediate layers 76, the number of intermediate layers 76 may be four or less. FIG. 5 also shows an input 78 to the input layer 72 and an output 80 from the output layer 74, as well as an input 82 to the intermediate layer 76(m). The input 82 will be described later.
  • In the present embodiment, the output of a specific intermediate layer 64 included in the first classifier 40a (the output 70 of the intermediate layer 64(n) in the example of FIG. 4) is used as the input 78 to the second classifier 40b.
  • the output 70 may be an image (map) that is an output of the convolution layer or the pooling layer.
  • the intermediate layer 64 whose output is used as the input 78 to the second classifier 40b is referred to as a specific intermediate layer 64a.
  • In the example of FIG. 4, the intermediate layer 64(n) is the specific intermediate layer 64a.
  • Alternatively, an intermediate layer 64 preceding the intermediate layer 64(n), such as the intermediate layer 64(n-1) or the intermediate layer 64(n-2), may be used as the specific intermediate layer 64a.
  • In that case, the output of the intermediate layer 64(n-1), the intermediate layer 64(n-2), or the like is used as the input 78 to the second classifier 40b. A code sketch of this two-classifier arrangement follows.
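The following is a minimal PyTorch sketch of the two-classifier coupling described above, in which an intermediate layer of the first classifier is tapped as the output 70 and fed to the second classifier as the input 78, with a side input 82 carrying spatial information. The layer counts, channel sizes, and class names (FirstClassifier, SecondClassifier) are illustrative assumptions and are not taken from the publication; only the data flow follows the description.

```python
import torch
import torch.nn as nn

class FirstClassifier(nn.Module):
    """CG-versus-photograph classifier; its specific intermediate layer 64a is tapped as output 70."""
    def __init__(self):
        super().__init__()
        self.early = nn.Sequential(                       # stands in for intermediate layers before 64a
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.specific = nn.Conv2d(32, 64, 3, padding=1)   # specific intermediate layer 64a
        self.late = nn.Sequential(                        # remaining layers up to the output 68
            nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 2),                             # two classes: CG image / photographed image
        )

    def forward(self, x):
        feat = self.specific(self.early(x))   # output 70 = intermediate feature amount
        logits = self.late(feat)              # output 68 = CG / photograph classification
        return logits, feat

class SecondClassifier(nn.Module):
    """Semantic-segmentation classifier; input 78 is the tapped feature map, input 82 an image."""
    def __init__(self, num_categories=5):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        # the intermediate layer 76(m) receives the side input 82 (spatial information) as extra channels
        self.fuse = nn.Conv2d(64 + 3, num_categories, 3, padding=1)

    def forward(self, feat_78, image_82):
        h = self.head(feat_78)
        side = nn.functional.interpolate(image_82, size=h.shape[-2:])  # align spatial size
        return self.fuse(torch.cat([h, side], dim=1))                  # output 80: per-pixel scores


if __name__ == "__main__":
    first, second = FirstClassifier(), SecondClassifier()
    x = torch.randn(1, 3, 64, 64)          # one preprocessed image (assumed format)
    _, feat70 = first(x)                   # intermediate feature amount (output 70)
    out80 = second(feat70, x)              # segmentation scores (output 80)
    print(out80.shape)                     # torch.Size([1, 5, 64, 64])
```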
  • the first classifier learning unit 42 performs learning of the first classifier 40a using a plurality of images as learning data.
  • Learning using learning data of a plurality of CG images and learning data of a plurality of photographed images may be performed.
  • the learning data may include input data input to the first classifier 40a as the input 66 and teacher data to be compared with the output 68 when the input data is input. In this way, supervised learning using learning data including teacher data may be performed.
  • the input data of the CG image refers to data generated by performing predetermined preprocessing on the CG image.
  • the input data of the photographed image refers to data generated by performing predetermined preprocessing on the photographed image.
  • Preprocessing may be executed so that the input data of a CG image and the input data of a photographed image can be handled equivalently.
  • preprocessing may be performed so that the input data of the CG image and the input data of the photographed image can be handled as data having the same format.
  • In the present embodiment, the intermediate feature amount specifying unit 44 specifies the output 70 of the above-described specific intermediate layer 64a, obtained when the input data of an image is input to the first classifier 40a, as the intermediate feature amount associated with that image.
  • Here, the first classifier 40a to which the input data is input is a classifier that has already been trained by the first classifier learning unit 42 using the learning data of CG images and the learning data of photographed images (a learned classifier).
  • Hereinafter, an intermediate feature amount associated with a CG image is referred to as a CG intermediate feature amount,
  • and an intermediate feature amount associated with a photographed image is referred to as a photographed intermediate feature amount.
  • In the present embodiment, the offset feature amount specifying unit 46 specifies an offset feature amount based on the CG intermediate feature amounts associated with one or more CG images and the photographed intermediate feature amounts associated with one or more photographed images.
  • For example, the intermediate feature amount specifying unit 44 may specify, as an individual CG intermediate feature amount, the output 70 from the specific intermediate layer 64a obtained when the input data of a CG image is input to the learned first classifier 40a. The intermediate feature amount specifying unit 44 may then specify the CG intermediate feature amount based on the individual CG intermediate feature amounts specified for a plurality of CG images. For example, a representative value, such as the average, of the individual CG intermediate feature amounts specified for a plurality of CG images may be used as the value of the CG intermediate feature amount. Alternatively, the individual CG intermediate feature amount specified for a single CG image may be used as the value of the CG intermediate feature amount.
  • Similarly, the intermediate feature amount specifying unit 44 may specify, as an individual photographed intermediate feature amount, the output 70 from the specific intermediate layer 64a obtained when the input data of a photographed image is input to the learned first classifier 40a. The intermediate feature amount specifying unit 44 may then specify the photographed intermediate feature amount based on the individual photographed intermediate feature amounts specified for a plurality of photographed images. For example, a representative value, such as the average, of the individual photographed intermediate feature amounts specified for a plurality of photographed images may be used as the value of the photographed intermediate feature amount. Alternatively, the individual photographed intermediate feature amount specified for a single photographed image may be used as the photographed intermediate feature amount.
  • The intermediate feature amount specifying unit 44 may also specify combinations of an individual CG intermediate feature amount associated with a CG image and an individual photographed intermediate feature amount associated with a photographed image.
  • the CG image and the photographed image included in the combination may be similar to each other.
  • the CG image and the photographed image included in the combination may be images representing the same object.
  • In this case, the offset feature amount specifying unit 46 may specify the difference between the individual CG intermediate feature amount and the individual photographed intermediate feature amount of a combination as an individual offset feature amount.
  • The offset feature amount may then be specified based on the individual offset feature amounts specified for each of a plurality of combinations of a CG image and a photographed image. For example, a representative value, such as the average, of the individual offset feature amounts specified for the plurality of combinations may be used as the offset feature amount. A sketch of both ways of computing the offset follows.
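The following is a minimal numerical sketch of the two offset computations described above: the difference of representative (mean) feature amounts, and the mean of per-pair differences. The array shapes and variable names are illustrative assumptions; the random arrays stand in for outputs 70 of the specific intermediate layer 64a.

```python
import numpy as np

# Assumed stand-ins for outputs 70 of the specific intermediate layer 64a:
# one feature map per image, all with the same shape (channels, height, width).
cg_feats = [np.random.rand(64, 16, 16) for _ in range(8)]      # individual CG intermediate feature amounts
photo_feats = [np.random.rand(64, 16, 16) for _ in range(8)]   # individual photographed intermediate feature amounts

# Variant 1: representative values (here, the mean) of each group, then their difference.
cg_rep = np.mean(cg_feats, axis=0)        # CG intermediate feature amount (representative value)
photo_rep = np.mean(photo_feats, axis=0)  # photographed intermediate feature amount (representative value)
offset = photo_rep - cg_rep               # offset feature amount (photographed minus CG convention)

# Variant 2: per-combination differences for paired CG / photographed images
# (for example, images representing the same object), then a representative value of the differences.
individual_offsets = [p - c for c, p in zip(cg_feats, photo_feats)]  # individual offset feature amounts
offset_paired = np.mean(individual_offsets, axis=0)                  # offset feature amount

print(offset.shape, offset_paired.shape)  # both (64, 16, 16)
```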
  • In the present embodiment, the post-offset intermediate feature amount specifying unit 48 specifies, for example, the post-offset intermediate feature amount associated with a CG image based on the intermediate feature amount associated with that CG image and the offset feature amount. The post-offset intermediate feature amount specifying unit 48 likewise specifies, for example, the post-offset intermediate feature amount associated with a photographed image based on the intermediate feature amount associated with that photographed image and the offset feature amount.
  • Suppose the value obtained by subtracting the value of the CG intermediate feature amount from the value of the photographed intermediate feature amount is used as the value of the offset feature amount.
  • In that case, a value obtained by adding the offset feature amount to the intermediate feature amount of a CG image may be calculated as the post-offset intermediate feature amount associated with that CG image,
  • and a value obtained by subtracting the offset feature amount from the intermediate feature amount of a photographed image may be calculated as the post-offset intermediate feature amount associated with that photographed image.
  • Conversely, suppose the value obtained by subtracting the value of the photographed intermediate feature amount from the value of the CG intermediate feature amount is used as the value of the offset feature amount.
  • In that case, a value obtained by subtracting the offset feature amount from the intermediate feature amount of a CG image may be calculated as the post-offset intermediate feature amount associated with that CG image,
  • and a value obtained by adding the offset feature amount to the intermediate feature amount of a photographed image may be calculated as the post-offset intermediate feature amount associated with that photographed image.
  • Hereinafter, the post-offset intermediate feature amount associated with a CG image is referred to as a post-offset CG intermediate feature amount,
  • and the post-offset intermediate feature amount associated with a photographed image is referred to as a post-offset photographed intermediate feature amount.
  • The post-offset CG intermediate feature amount associated with a CG image corresponds to the photographed intermediate feature amount that a photographed image similar to that CG image would have, if such a photographed image existed.
  • Likewise, the post-offset photographed intermediate feature amount associated with a photographed image corresponds to the CG intermediate feature amount that a CG image similar to that photographed image would have, if such a CG image existed. A sketch of this offsetting follows.
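Continuing the numerical sketch above, and assuming for illustration the photographed-minus-CG convention for the offset feature amount, the post-offset feature amounts are obtained by simple addition or subtraction:

```python
import numpy as np

offset = np.random.rand(64, 16, 16)       # offset feature amount (photographed minus CG convention, assumed)
cg_feat = np.random.rand(64, 16, 16)      # CG intermediate feature amount of one CG image
photo_feat = np.random.rand(64, 16, 16)   # photographed intermediate feature amount of one photographed image

# Post-offset CG intermediate feature amount: shifts a CG feature toward the photographed domain.
post_offset_cg = cg_feat + offset

# Post-offset photographed intermediate feature amount: shifts a photographed feature toward the CG domain.
post_offset_photo = photo_feat - offset
```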
  • the second classifier learning unit 50 performs learning of the second classifier 40b.
  • the second classifier learning unit 50 may perform learning of the second classifier 40b using the CG intermediate feature amount associated with the CG image as input data included in the learning data.
  • In this case, the learning data may include, as input data, the CG intermediate feature amount associated with a CG image, and may include, as teacher data, an image associated with that CG image and divided into regions by category, like the execution result image 20 illustrated in FIG. 2. Supervised learning using learning data including such teacher data may then be performed.
  • the second classifier learning unit 50 may perform learning of the second classifier 40b using the post-offset CG intermediate feature amount associated with the CG image as input data included in the learning data.
  • In this case, the learning data may include, as input data, the post-offset CG intermediate feature amount associated with a CG image, and may include, as teacher data, an image associated with that CG image and divided into regions by category, like the execution result image 20 illustrated in FIG. 2. Supervised learning using learning data including such teacher data may then be performed.
  • In the present embodiment, when the second classifier learning unit 50 trains the second classifier 40b, data for interpolating spatial information is input to the intermediate layer 76(m) as the input 82.
  • For example, the input data of the CG image may be input to the intermediate layer 76(m) as the input 82.
  • Through this learning, the parameter values of the second classifier 40b, such as the filter coefficients of the convolution layers and the weights of the fully connected layers, are set to appropriate values.
  • For example, the parameter values of the second classifier 40b are adjusted, for example by the error back-propagation method, based on a comparison between the output 80 obtained when the CG intermediate feature amount associated with a CG image is given as the input 78 and the teacher data associated with that CG image.
  • Alternatively, the parameter values of the second classifier 40b are adjusted based on a comparison between the output 80 obtained when the post-offset CG intermediate feature amount associated with a CG image is given as the input 78 and the teacher data associated with that CG image. A training-step sketch follows.
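The following is a minimal PyTorch sketch of this training step, with a compact SecondClassifier of the same shape as in the earlier sketch; the optimizer choice, learning rate, and toy dataset are illustrative assumptions. It shows supervised learning in which the post-offset CG intermediate feature amount is the input 78, the CG image input data is the side input 82, and a per-pixel category image is the teacher data, with back-propagation adjusting the parameters.

```python
import torch
import torch.nn as nn

class SecondClassifier(nn.Module):
    """Same calling convention as in the earlier sketch: input 78 is a feature map, input 82 an image."""
    def __init__(self, num_categories=5):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(64 + 3, num_categories, 3, padding=1)

    def forward(self, feat_78, image_82):
        h = self.head(feat_78)
        side = nn.functional.interpolate(image_82, size=h.shape[-2:])
        return self.fuse(torch.cat([h, side], dim=1))   # output 80

second = SecondClassifier()
optimizer = torch.optim.SGD(second.parameters(), lr=0.01)   # assumed optimizer and learning rate
criterion = nn.CrossEntropyLoss()                           # per-pixel category loss

# Assumed toy training set: post-offset CG intermediate feature amounts (input 78),
# the corresponding CG image input data (input 82), and teacher data (per-pixel categories).
dataset = [
    (torch.randn(1, 64, 64, 64),          # post-offset CG intermediate feature amount
     torch.randn(1, 3, 64, 64),           # CG image input data (spatial information for input 82)
     torch.randint(0, 5, (1, 64, 64)))    # teacher data: category index per pixel
    for _ in range(4)
]

for epoch in range(3):
    for feat_78, cg_image_82, teacher in dataset:
        optimizer.zero_grad()
        out_80 = second(feat_78, cg_image_82)   # output 80: per-pixel category scores
        loss = criterion(out_80, teacher)       # comparison with the teacher data
        loss.backward()                         # error back-propagation
        optimizer.step()                        # parameter adjustment
```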
  • the image recognition execution unit 52 executes image recognition processing such as semantic segmentation, for example.
  • In the present embodiment, when the image recognition execution unit 52 performs image recognition of a photographed image, data for interpolating spatial information is input to the intermediate layer 76(m) as the input 82.
  • For example, the input data of the photographed image may be input to the intermediate layer 76(m) as the input 82.
  • Suppose the second classifier learning unit 50 has trained the second classifier 40b using, as input data included in the learning data, the post-offset CG intermediate feature amounts associated with a plurality of CG images.
  • In this case, the second classifier 40b has been trained with post-offset CG intermediate feature amounts, that is, with features corresponding to the photographed intermediate feature amounts that photographed images similar to the CG images would have, if such photographed images existed. It is therefore desirable that the input 78 correspond to a photographed image. Accordingly, in this case, the image recognition execution unit 52 inputs the photographed intermediate feature amount associated with the photographed image to be recognized to the second classifier 40b as the input 78.
  • The image recognition execution unit 52 then specifies, as the result of image recognition, the output 80 of the second classifier 40b obtained when the photographed intermediate feature amount associated with the photographed image to be recognized is input to the second classifier 40b.
  • Conversely, suppose the second classifier learning unit 50 has trained the second classifier 40b using, as input data included in the learning data, the CG intermediate feature amounts associated with a plurality of CG images.
  • In this case, the image recognition execution unit 52 inputs the post-offset photographed intermediate feature amount associated with the photographed image to be recognized to the second classifier 40b as the input 78.
  • The post-offset photographed intermediate feature amount corresponds to the CG intermediate feature amount that a CG image similar to the photographed image to be recognized would have, if such a CG image existed.
  • The image recognition execution unit 52 inputs the post-offset photographed intermediate feature amount associated with the photographed image to be recognized to the learned second classifier 40b as input data, and specifies the output 80 of the second classifier 40b as the result of image recognition of the photographed image. A sketch of this recognition step follows.
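The following is a minimal sketch of the recognition step, with toy stand-in modules that follow the same calling conventions as the earlier sketches; all names and sizes are illustrative assumptions. Passing offset=None corresponds to the first configuration (second classifier trained on post-offset CG features), while passing the offset feature amount corresponds to the second configuration, in which it is subtracted before the call.

```python
import torch
import torch.nn as nn

def recognize(first_classifier, second_classifier, photo_input, offset=None):
    """Image recognition of a photographed image.

    photo_input: preprocessed input data of the photographed image (used as input 66 and input 82).
    offset: offset feature amount; pass it only if the second classifier was trained on raw CG
            intermediate feature amounts, otherwise pass None.
    """
    with torch.no_grad():
        _, feat_70 = first_classifier(photo_input)                  # photographed intermediate feature amount
        feat_78 = feat_70 if offset is None else feat_70 - offset   # post-offset photographed feature if needed
        out_80 = second_classifier(feat_78, photo_input)            # input 78 and input 82
    return out_80.argmax(dim=1)                                     # category index per pixel


if __name__ == "__main__":
    # Toy stand-ins with the same calling conventions as the earlier FirstClassifier / SecondClassifier.
    class First(nn.Module):
        def __init__(self):
            super().__init__()
            self.tap = nn.Conv2d(3, 64, 3, padding=1)   # stands in for the layers up to 64a
        def forward(self, x):
            feat = self.tap(x)
            return feat.mean(), feat                    # (output 68 stand-in, output 70)

    class Second(nn.Module):
        def __init__(self):
            super().__init__()
            self.fuse = nn.Conv2d(64 + 3, 5, 3, padding=1)
        def forward(self, feat_78, image_82):
            return self.fuse(torch.cat([feat_78, image_82], dim=1))

    photo = torch.randn(1, 3, 64, 64)
    labels = recognize(First(), Second(), photo)
    print(labels.shape)   # torch.Size([1, 64, 64])
```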
  • Data for interpolating the spatial information may be input to an intermediate layer 76 other than the intermediate layer 76 (m).
  • FIG. 6 shows an example of the learning process flow of the second classifier 40b when the learning of the second classifier 40b using the post-offset CG intermediate feature value is executed.
  • the first classifier learning unit 42 performs learning of the first classifier 40a using learning data of a plurality of CG images and learning data of a plurality of photographed images (S101).
  • the intermediate feature amount specifying unit 44 specifies the individual CG intermediate feature amount of the CG image for each of the input data of the plurality of CG images (S102).
  • For example, the intermediate feature amount specifying unit 44 may input again, as the input 66 to the first classifier 40a trained by the first classifier learning unit 42, the input data included in the learning data of the CG images used in the process shown in S101, and may specify the individual CG intermediate feature amount of each CG image based on the corresponding output 70.
  • Alternatively, the intermediate feature amount specifying unit 44 may input, as the input 66 to the trained first classifier 40a, the input data of new CG images different from the learning data of the CG images used in the process shown in S101, and may specify the individual CG intermediate feature amount of each CG image based on the corresponding output 70.
  • the intermediate feature amount specifying unit 44 specifies the CG intermediate feature amount based on the individual CG intermediate feature amounts specified for the plurality of CG images in the process shown in S102 (S103).
  • The intermediate feature amount specifying unit 44 specifies the individual photographed intermediate feature amount of each photographed image for each of the input data of the plurality of photographed images (S104).
  • For example, the intermediate feature amount specifying unit 44 may input again, as the input 66 to the first classifier 40a trained by the first classifier learning unit 42, the input data included in the learning data of the photographed images used in the process shown in S101, and may specify the individual photographed intermediate feature amount of each photographed image based on the corresponding output 70.
  • Alternatively, the intermediate feature amount specifying unit 44 may input, as the input 66 to the trained first classifier 40a, the input data of new photographed images different from the learning data of the photographed images used in the process shown in S101, and may specify the individual photographed intermediate feature amount of each photographed image based on the corresponding output 70.
  • The intermediate feature amount specifying unit 44 specifies the photographed intermediate feature amount based on the individual photographed intermediate feature amounts specified for the plurality of photographed images in the process shown in S104 (S105).
  • The offset feature amount specifying unit 46 specifies the offset feature amount based on the CG intermediate feature amount specified in the process shown in S103 and the photographed intermediate feature amount specified in the process shown in S105 (S106).
  • the post-offset intermediate feature amount specifying unit 48 specifies the post-offset CG intermediate feature amount associated with each of the plurality of CG images (S107).
  • For example, the input data included in the learning data of new CG images, different from those used in the processes shown in S101 and S102, may be input as the input 66 to the first classifier 40a trained by the first classifier learning unit 42, and the post-offset CG intermediate feature amount associated with each such CG image may be specified based on the corresponding output 70 and the offset feature amount.
  • Learning of the second classifier 40b using the post-offset CG intermediate feature amounts and the teacher data included in the learning data may then be executed (S108).
  • teacher data used for learning in the process shown in S108 may be associated in advance with the CG image associated with the individual CG intermediate feature amount specified in the process shown in S102.
  • Alternatively, the post-offset CG intermediate feature amount associated with a CG image may be specified based on the individual CG intermediate feature amount associated with that CG image, specified in the process shown in S102, and the offset feature amount.
  • In this case, learning of the second classifier 40b using the post-offset CG intermediate feature amounts and the teacher data associated with the CG images may be executed.
  • the execution order of the processing shown in FIG. 6 is not limited to the above.
  • The processing shown in S102 to S105 of FIG. 6 may be executed, for example, in the order of S102, S104, S103, S105, or in the order of S104, S105, S102, S103.
  • The process shown in S108 need not be executed collectively for a plurality of pieces of processing-target data; instead, the series of processes shown in S107 and S108 may be executed repeatedly, once for each processing target. The overall S101 to S108 flow is sketched below.
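As an overview, the following is a compact end-to-end sketch of the S101 to S108 flow using toy stand-ins (a trivial "intermediate feature" function and small arrays) instead of real networks; every function and array here is an illustrative assumption intended only to show the order of the steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3-channel images; segmentation teacher data as per-pixel category indices.
cg_images = [rng.random((3, 32, 32)) for _ in range(8)]
photo_images = [rng.random((3, 32, 32)) for _ in range(8)]
cg_teacher = [rng.integers(0, 5, (32, 32)) for _ in range(8)]

def intermediate_feature(image):
    """Toy stand-in for the output 70 of the specific intermediate layer 64a of the
    trained first classifier (the training of S101 is assumed to have been done)."""
    return image.mean(axis=0)   # a (32, 32) "feature map"

# S102 / S103: individual CG intermediate feature amounts and their representative value.
cg_feats = [intermediate_feature(im) for im in cg_images]
cg_rep = np.mean(cg_feats, axis=0)

# S104 / S105: individual photographed intermediate feature amounts and their representative value.
photo_feats = [intermediate_feature(im) for im in photo_images]
photo_rep = np.mean(photo_feats, axis=0)

# S106: offset feature amount (photographed minus CG convention, assumed).
offset = photo_rep - cg_rep

# S107: post-offset CG intermediate feature amounts.
post_offset_cg_feats = [f + offset for f in cg_feats]

# S108: learning of the second classifier from (post-offset CG feature, teacher data) pairs.
training_pairs = list(zip(post_offset_cg_feats, cg_teacher))
print(len(training_pairs), training_pairs[0][0].shape)   # 8 (32, 32)
```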
  • FIG. 7 shows an example of the flow of an image recognition process for a photographed image in the case where the learning of the second classifier 40b has been executed by the series of processes shown in FIG. 6.
  • The image recognition execution unit 52 acquires the input data of the photographed image to be recognized (S201).
  • the intermediate feature amount specifying unit 44 executes input to the first classifier 40a using the input data of the photographed image acquired in the process shown in S201 as the input 66 (S202).
  • The intermediate feature amount specifying unit 44 specifies the output 70 corresponding to the input 66 executed in the process shown in S202 as the photographed intermediate feature amount of the photographed image (S203).
  • The image recognition execution unit 52 then executes input to the second classifier 40b, giving the photographed intermediate feature amount specified in the process shown in S203 as the input 78 and the input data of the photographed image acquired in the process shown in S201 as the input 82 (S204).
  • The image recognition execution unit 52 specifies the output 80 corresponding to the input 78 and the input 82 executed in the process shown in S204 as the recognition result of the photographed image to be recognized (S205), and the processing shown in this processing example ends.
  • FIG. 8 shows an example of the learning process flow of the second classifier 40b when the learning of the second classifier 40b using the CG intermediate feature amount is executed.
  • In this case, the second classifier learning unit 50 performs learning of the second classifier 40b using the CG intermediate feature amounts associated with the plurality of CG images (S307), and the processing shown in this processing example ends.
  • For example, the input data included in the learning data of new CG images, different from those used in the processes shown in S301 and S302, may be input as the input 66 to the first classifier 40a trained by the first classifier learning unit 42.
  • The output 70 corresponding to that input 66 may then be specified as the CG intermediate feature amount associated with each CG image.
  • Learning of the second classifier 40b using the specified CG intermediate feature amounts and the teacher data included in the learning data may then be executed.
  • FIG. 9 shows an example of the flow of an image recognition process for a photographed image in the case where the learning of the second classifier 40b has been executed by the series of processes shown in FIG. 8.
  • In this case, the post-offset intermediate feature amount specifying unit 48 specifies the post-offset photographed intermediate feature amount associated with the photographed image to be recognized (S404).
  • Here, the post-offset photographed intermediate feature amount is specified based on the photographed intermediate feature amount specified in the process shown in S403 and the offset feature amount specified in the process shown in S306.
  • The image recognition execution unit 52 then executes input to the second classifier 40b, giving the post-offset photographed intermediate feature amount specified in the process shown in S404 as the input 78 and the input data of the photographed image acquired in the process shown in S401 as the input 82 (S405).
  • The image recognition execution unit 52 specifies the output 80 corresponding to the input 78 and the input 82 executed in the process shown in S405 as the recognition result of the photographed image to be recognized (S406), and the processing shown in this processing example ends. A short sketch of the input 78 used in each configuration follows.
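To summarize the difference between the two recognition configurations (FIG. 7 versus FIG. 9), the following toy sketch shows, under the assumed photographed-minus-CG offset convention, what is given to the second classifier as the input 78 in each case; the arrays are placeholders.

```python
import numpy as np

offset = np.random.rand(64, 16, 16)       # offset feature amount (S106 / S306)
photo_feat = np.random.rand(64, 16, 16)   # photographed intermediate feature amount (S203 / S403)

# Configuration of FIG. 6/7: the second classifier was trained on post-offset CG features,
# so the photographed intermediate feature amount itself is given as the input 78 (S204).
input_78_fig7 = photo_feat

# Configuration of FIG. 8/9: the second classifier was trained on raw CG features,
# so the post-offset photographed intermediate feature amount is given as the input 78 (S404, S405).
input_78_fig9 = photo_feat - offset
```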
  • As described above, in the present embodiment, the first classifier 40a that classifies CG images and photographed images also plays the role of a feature extractor, in that the CG intermediate feature amounts and the photographed intermediate feature amounts are extracted as the output 70 from the specific intermediate layer 64a. The difference in feature amounts between a CG image and a photographed image similar to that CG image is then corrected by the offset feature amount specified based on the CG intermediate feature amounts and the photographed intermediate feature amounts. In this way, in the present embodiment, the image recognition accuracy for photographed images using a classifier trained with CG images is improved.
  • the image recognition performed by the image recognition execution unit 52 is not limited to semantic segmentation.
  • For example, the image recognition execution unit 52 may execute processing for identifying what kind of image an input photographed image is, or processing for identifying what is shown in an input photographed image.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a learning device, an image recognition device, a learning method, and a program capable of improving the accuracy with which a photographed image is recognized using a classifier that has been trained with CG images. An intermediate feature amount specifying unit (44) specifies an intermediate feature amount. An offset feature amount specifying unit (46) specifies an offset feature amount based on a CG intermediate feature amount and a photographed intermediate feature amount. A post-offset intermediate feature amount specifying unit (48) specifies a post-offset intermediate feature amount associated with a CG image, based on the intermediate feature amount and the offset feature amount associated with that CG image. A second classifier learning unit (50) uses the post-offset intermediate feature amounts associated with CG images to train a second classifier (40b), to which an intermediate feature amount associated with a photographed image is input when the photographed image is recognized.
PCT/JP2017/016565 2017-04-26 2017-04-26 Training apparatus, image recognition apparatus, training method, and program WO2018198233A1 (fr)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP17907458.8A EP3617991A4 (fr) 2017-04-26 2017-04-26 Training apparatus, image recognition apparatus, training method, and program
JP2019514953A JP6722351B2 (ja) 2017-04-26 2017-04-26 Training apparatus, image recognition apparatus, training method, and program
PCT/JP2017/016565 WO2018198233A1 (fr) 2017-04-26 2017-04-26 Training apparatus, image recognition apparatus, training method, and program
CN201780089794.5A CN110582783B (zh) 2017-04-26 2017-04-26 Training apparatus, image recognition apparatus, training method, and computer-readable information storage medium
US16/603,085 US11403560B2 (en) 2017-04-26 2017-04-26 Training apparatus, image recognition apparatus, training method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/016565 WO2018198233A1 (fr) 2017-04-26 2017-04-26 Training apparatus, image recognition apparatus, training method, and program

Publications (1)

Publication Number Publication Date
WO2018198233A1 true WO2018198233A1 (fr) 2018-11-01

Family

ID=63920228

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/016565 WO2018198233A1 (fr) 2017-04-26 2017-04-26 Training apparatus, image recognition apparatus, training method, and program

Country Status (5)

Country Link
US (1) US11403560B2 (fr)
EP (1) EP3617991A4 (fr)
JP (1) JP6722351B2 (fr)
CN (1) CN110582783B (fr)
WO (1) WO2018198233A1 (fr)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021114048A (ja) * 2020-01-16 2021-08-05 Honda Motor Co., Ltd. Information processing device, vehicle, vehicle control method, program, information processing server, and information processing method
JP2021533493A (ja) * 2019-04-03 2021-12-02 Ping An Technology (Shenzhen) Co., Ltd. Finger vein matching method, apparatus, computer device, and storage medium
JP2022505775A (ja) * 2019-03-26 2022-01-14 Tencent Technology (Shenzhen) Co., Ltd. Image classification model training method, image processing method and apparatus, and computer program
EP4049183A4 (fr) * 2019-10-25 2023-11-08 Mashgin Inc. Method and system for item identification
US11844458B2 (en) 2020-10-13 2023-12-19 June Life, Llc Method and system for automatic cook program determination
JP7420607B2 (ja) 2020-03-19 2024-01-23 Honda Motor Co., Ltd. Information processing device, information processing method, vehicle, information processing server, and program
US12002245B2 (en) 2019-10-25 2024-06-04 Mashgin Inc. Method and system for item identification

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0573663A (ja) * 1991-09-17 1993-03-26 Nippon Telegr & Teleph Corp <Ntt> Image recognition method for three-dimensional objects
JP2014137756A (ja) * 2013-01-17 2014-07-28 Canon Inc Image processing apparatus and image processing method

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4606828B2 (ja) * 2003-09-25 2011-01-05 Fujifilm Corporation Apparatus for selecting images of a specific scene, program, and recording medium recording the program
CN101729916A (zh) * 2008-10-10 2010-06-09 Fujifilm Corporation Image processing device, image recording device, method, and program
JP2012243180A (ja) * 2011-05-23 2012-12-10 Sony Corp Learning device and method, and program
CN102663451A (zh) * 2012-03-29 2012-09-12 Tianjin University of Science and Technology Graphic image classification method based on color space features
KR102146560B1 (ko) * 2014-02-17 2020-08-20 Samsung Electronics Co., Ltd. Image correction method and apparatus
US9652846B1 (en) * 2015-10-22 2017-05-16 International Business Machines Corporation Viewpoint recognition in computer tomography images
CN105335712A (zh) * 2015-10-26 2016-02-17 小米科技有限责任公司 图像识别方法、装置及终端
US9984471B2 (en) * 2016-07-26 2018-05-29 Intuit Inc. Label and field identification without optical character recognition (OCR)
US20190149725A1 (en) * 2017-09-06 2019-05-16 Trax Technologies Solutions Pte Ltd. Using augmented reality for image capturing a retail unit
US10607118B2 (en) * 2017-12-13 2020-03-31 Microsoft Technology Licensing, Llc Ensemble model for image recognition processing

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0573663A (ja) * 1991-09-17 1993-03-26 Nippon Telegr & Teleph Corp <Ntt> Image recognition method for three-dimensional objects
JP2014137756A (ja) * 2013-01-17 2014-07-28 Canon Inc Image processing apparatus and image processing method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KENTARO TAJI ET AL.: "Fingerspelling Recognition based on AdaBoost with Hand-shape CG images", IEICE TECHNICAL REPORT, vol. 111, no. 472, 2 March 2012 (2012-03-02), pages 1 - 2, XP055635507 *
MASAMITSU TSUCHIYA ET AL.: "Hybrid Transfer Learning for Efficient Learning in Object Detection", IEICE TECHNICAL REPORT, vol. 112, no. 386, 16 January 2013 (2013-01-16), pages 1 - 2, XP055635514, DOI: 10.1109/ACPR.2013.8 *
See also references of EP3617991A4 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2022505775A (ja) * 2019-03-26 2022-01-14 Tencent Technology (Shenzhen) Co., Ltd. Image classification model training method, image processing method and apparatus, and computer program
EP3951654A4 (fr) * 2019-03-26 2022-05-25 Tencent Technology (Shenzhen) Company Limited Image classification training method, and image processing method and device
JP7185039B2 (ja) 2019-03-26 2022-12-06 Tencent Technology (Shenzhen) Co., Ltd. Image classification model training method, image processing method and apparatus, and computer program
JP2021533493A (ja) * 2019-04-03 2021-12-02 Ping An Technology (Shenzhen) Co., Ltd. Finger vein matching method, apparatus, computer device, and storage medium
JP7141518B2 (ja) 2019-04-03 2022-09-22 Ping An Technology (Shenzhen) Co., Ltd. Finger vein matching method, apparatus, computer device, and storage medium
US11893773B2 (en) 2019-04-03 2024-02-06 Ping An Technology (Shenzhen) Co., Ltd. Finger vein comparison method, computer equipment, and storage medium
EP4049183A4 (fr) * 2019-10-25 2023-11-08 Mashgin Inc. Method and system for item identification
US12002245B2 (en) 2019-10-25 2024-06-04 Mashgin Inc. Method and system for item identification
JP2021114048A (ja) * 2020-01-16 2021-08-05 Honda Motor Co., Ltd. Information processing device, vehicle, vehicle control method, program, information processing server, and information processing method
JP6992099B2 (ja) 2020-01-16 2022-01-13 Honda Motor Co., Ltd. Information processing device, vehicle, vehicle control method, program, information processing server, and information processing method
JP7420607B2 (ja) 2020-03-19 2024-01-23 Honda Motor Co., Ltd. Information processing device, information processing method, vehicle, information processing server, and program
US11844458B2 (en) 2020-10-13 2023-12-19 June Life, Llc Method and system for automatic cook program determination

Also Published As

Publication number Publication date
JP6722351B2 (ja) 2020-07-15
US20210056464A1 (en) 2021-02-25
CN110582783B (zh) 2023-04-04
EP3617991A4 (fr) 2020-12-09
JPWO2018198233A1 (ja) 2019-08-08
CN110582783A (zh) 2019-12-17
EP3617991A1 (fr) 2020-03-04
US11403560B2 (en) 2022-08-02

Similar Documents

Publication Publication Date Title
WO2018198233A1 (fr) Training apparatus, image recognition apparatus, training method, and program
CN109671126B (zh) Predicting patch displacement maps using a neural network
US10789504B2 (en) Method and device for extracting information in histogram
WO2020063314A1 (fr) Character segmentation recognition method and apparatus, electronic device, and storage medium
CN111783505A (zh) Forged face recognition method and apparatus, and computer-readable storage medium
JP7013489B2 (ja) Learning device, generation system for a photographed-image classification device, generation device for a photographed-image classification device, learning method, and program
CN108334878B (zh) Video image detection method, apparatus, device, and readable storage medium
US11681910B2 (en) Training apparatus, recognition apparatus, training method, recognition method, and program
JP7477260B2 (ja) Information processing device, information processing program, and information processing method
US20230021533A1 (en) Method and apparatus for generating video with 3d effect, method and apparatus for playing video with 3d effect, and device
KR20220066944A (ko) Interactive training of a machine learning model for tissue segmentation
WO2020063835A1 (fr) Model generation
JP6874136B2 (ja) Image recognition device, image recognition method, and program
CN113906468A (zh) System and method for training an image colorization model
EP4220555A1 (fr) Training method and apparatus for an image segmentation model, image segmentation method and apparatus, and device
JP6791593B2 (ja) Recognition device, recognition program, and recognition method
US20160078312A1 (en) Image processing method and apparatus using training dictionary
CN103310222A (zh) Image processor and image processing method
KR101592087B1 (ko) Method for generating an interest map using the position of a background image, and recording medium recording the same
JP7053195B2 (ja) House change estimation device, house change learning device, house change estimation method, classifier parameter generation method, and program
CN115641317A (zh) Dynamic knowledge backtracking multi-instance learning and image classification method for pathological images
CN113837236A (zh) Method and apparatus for recognizing a target object in an image, terminal device, and storage medium
JP6892557B2 (ja) Learning device, image generation device, learning method, image generation method, and program
CN113808192A (zh) Floor plan generation method, apparatus, device, and storage medium
JP6989873B2 (ja) System, image recognition method, and computer

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17907458

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019514953

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017907458

Country of ref document: EP

Effective date: 20191126