WO2021059388A1 - Learning device, image processing device, learning method, and learning program - Google Patents

Learning device, image processing device, learning method, and learning program

Info

Publication number
WO2021059388A1
Authority
WO
WIPO (PCT)
Prior art keywords: learning, conversion, neural network, feature extraction, unit
Application number: PCT/JP2019/037552
Other languages: French (fr), Japanese (ja)
Inventors: 真弥 山口, 美樹 境, 哲哉 塩田, 足立 一樹
Original Assignee: 日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)
Application filed by 日本電信電話株式会社
Priority to PCT/JP2019/037552
Publication of WO2021059388A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis

Definitions

  • The present invention relates to a learning device, an image processing device, a learning method, and a learning program.
  • Deep learning (deep neural network: DNN) has been very successful in image recognition and related tasks. For example, in image recognition using a DNN, when an image is input to the DNN model, the model outputs a classification result indicating what the image shows.
  • A deep learning model requires a large amount of labeled learning data to achieve high accuracy. Teacher labels must be attached to the collected image data, and the cost is particularly high when only a specialist can assign the labels, as with medical images. Creating learning data is a significant cost and a major impediment to the use of deep learning.
  • Several methods have been proposed for training a DNN model even when the amount of labeled data is small. Examples include transfer learning, which reuses a model already trained on another data set; data augmentation, which processes the original data to increase the number of samples; and self-supervised learning (see, for example, Non-Patent Document 1), which creates labels from the data itself and acquires features by solving subtasks.
  • The method described in Non-Patent Document 1 acquires representations useful in subsequent tasks (classification, etc.) by solving a pre-task of predicting a rotation angle. Because this method is simple yet powerful, it has been adopted for training generative adversarial networks (GAN: Generative Adversarial Networks) and for learning in combination with other methods.
  • The method described in Non-Patent Document 1 reuses representations acquired by unsupervised learning. Specifically, the rotation angle of the image is learned in the pre-task, and class classification is learned in the subsequent task. Predicting the rotation angle lets this method capture the geometric features of the image data, but the pre-task is of little benefit for data for which rotation cannot be defined, or for data whose rotation angle can be predicted trivially. For example, in a landscape image the angle can be predicted from the position of the sky alone, so the pre-task has little effect.
  • The present invention has been made in view of the above, and an object of the present invention is to provide a learning device, an image processing device, a learning method, and a learning program capable of capturing the features of an image and improving the accuracy of image processing.
  • To solve the above problems and achieve the object, the learning device according to the present invention includes: a first learning unit that takes image data subjected to a plurality of conversions as input and, by performing multi-task learning with a first neural network so as to estimate the conversion content for each conversion type, updates parameters including shared parameters of a feature extraction layer of the first neural network; and a second learning unit that takes arbitrary image data as input and, using a second neural network in which the shared parameters learned by the first learning unit are applied to the feature extraction layer, learns parameters of the second neural network so as to perform predetermined processing. The first neural network has the feature extraction layer and a plurality of pre-learning neural networks corresponding to the respective conversion types of the plurality of conversions, and the plurality of pre-learning neural networks share the feature extraction layer and each estimate the conversion content for its conversion type.
  • The image processing apparatus according to the present invention has a processing unit that processes input image data using a model having a deep neural network in which trained parameters are set. The trained parameters are based on shared parameters of a feature extraction layer of a first neural network, updated by taking image data subjected to a plurality of conversions as input and performing multi-task learning with the first neural network so as to estimate the conversion content for each conversion type. The first neural network has the feature extraction layer and a plurality of pre-learning neural networks corresponding to the conversion types of the plurality of conversions, and the plurality of pre-learning neural networks share the feature extraction layer and estimate the conversion content for each conversion type.
  • The learning method according to the present invention is a learning method executed by a learning device, and includes: a first learning step of taking image data subjected to a plurality of conversions as input and, by performing multi-task learning with a first neural network so as to estimate the conversion content for each conversion type, updating parameters including shared parameters of a feature extraction layer of the first neural network; and a second learning step of taking arbitrary image data as input and, using a second neural network in which the shared parameters learned in the first learning step are applied to the feature extraction layer, learning parameters of the second neural network so as to perform predetermined processing. The first neural network has the feature extraction layer and a plurality of pre-learning neural networks corresponding to the respective conversion types of the plurality of conversions, and the plurality of pre-learning neural networks share the feature extraction layer and estimate the conversion content for each conversion type.
  • The learning program according to the present invention causes a computer to execute: a first learning step of taking image data subjected to a plurality of conversions as input and, by performing multi-task learning with a first neural network so as to estimate the conversion content for each conversion type, updating parameters including shared parameters of a feature extraction layer of the first neural network; and a second learning step of taking arbitrary image data as input and, using a second neural network in which the shared parameters learned in the first learning step are applied to the feature extraction layer, learning parameters of the second neural network so as to perform predetermined processing. The first neural network has the feature extraction layer and a plurality of pre-learning neural networks corresponding to the respective conversion types of the plurality of conversions, and the plurality of pre-learning neural networks share the feature extraction layer and estimate the conversion content for each conversion type.
  • FIG. 1 is a diagram showing an example of a configuration of an image processing system according to an embodiment.
  • FIG. 2 is a diagram showing an example of the configuration of the learning device shown in FIG.
  • FIG. 3 is a diagram showing the estimation accuracy of the converted content for the image data subjected to the conversion process.
  • FIG. 4 is a diagram illustrating a flow of learning processing according to the embodiment.
  • FIG. 5 is a diagram showing an example of the configuration of the first task learning unit shown in FIG.
  • FIG. 6 is a flowchart showing a processing procedure of image processing performed by the image processing system according to the embodiment.
  • FIG. 7 is a flowchart showing the processing procedure of the learning process shown in FIG. 6.
  • FIG. 8 is a diagram comparing the classification accuracy of a DNN model trained using the learning method according to the embodiment with that of a DNN trained using the learning method described in Non-Patent Document 1.
  • FIG. 9 is a diagram showing an example of a computer in which a learning device or an image processing device is realized by executing a program.
  • In the following, the learning device performs, as a pre-task (first task), a plurality of conversions on image data and carries out self-supervised learning, namely multi-task learning that estimates the content of each conversion; as a subsequent task (second task), it reuses the learning result of the first task to learn image data classification. An example of this will be described. The present invention is not limited to the embodiments described below.
  • FIG. 1 is a diagram showing an example of a configuration of an image processing system according to an embodiment.
  • As shown in FIG. 1, the image processing system 1 in the embodiment has an image processing device 20 that classifies image data using a deep learning (deep neural network: DNN) model, and a learning device 10 that sets the parameters of the DNN model used by the image processing device 20 by learning the features of image data for learning.
  • The learning device 10 has a self-supervised learning unit 11 (first learning unit) and a second task learning unit 13 (second learning unit) provided after the self-supervised learning unit 11.
  • The self-supervised learning unit 11 uses a first neural network (NN) to perform self-supervised learning on image data for learning, applying a plurality of types of conversion to the image data.
  • The first NN has a feature extraction layer that extracts conversion-related features from image data subjected to a plurality of conversions, and a plurality of pre-learning NNs that each estimate the conversion content for one conversion type applied to the image data.
  • The self-supervised learning unit 11 takes the image data subjected to the plurality of conversions as input and, using the first NN, performs multi-task learning so as to estimate the conversion content for each conversion type, thereby carrying out the first task of updating parameters including the shared parameters of the feature extraction layer of the first NN.
  • The self-supervised learning unit 11 has a learning data generation unit 14 and a first task learning unit 15.
  • The learning data generation unit 14 generates self-teacher data based on the input image data 30 for learning, and applies a plurality of types of conversion to the image data for learning.
  • The first task learning unit 15 learns the features of the plurality of conversion contents by multi-task learning, using the self-teacher data generated by the learning data generation unit 14 and the image data subjected to the plurality of types of conversion.
  • Concretely, the first task learning unit 15 shares the shared parameters of the feature extraction layer, which extracts conversion-related features from the image data subjected to the plurality of conversions, across a plurality of pre-learning NNs corresponding to the respective conversions, and performs multi-task learning so as to estimate the conversion content for each conversion type, thereby updating the parameters of the plurality of NNs and the shared parameters of the feature extraction layer.
  • The feature extraction layer is composed of a DNN model containing many non-linear functions.
  • The first task learning unit 15 outputs the DNN model with updated parameters (first learned DNN 16) to the second task learning unit 13. A minimal sketch of this shared-encoder, multi-head structure follows.
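  • The following is a minimal PyTorch-style sketch of the first NN under the description above: a shared feature extraction layer (encoder) and one small four-class head per conversion type. The layer sizes, module names, and the choice of PyTorch are illustrative assumptions, not taken from the patent.

```python
# Minimal sketch of the first NN: a shared feature extraction layer
# (encoder, parameters theta_sh) plus one pre-learning head per conversion
# type (parameters theta_t). Layer sizes and names are illustrative.
import torch
import torch.nn as nn

class FirstNN(nn.Module):
    def __init__(self, task_names, num_classes_per_task=4):
        super().__init__()
        # Shared feature extraction layer: a small convolutional DNN.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One four-class classification head per conversion type; PyTorch
        # initializes these with random values (cf. the random initial
        # values described for the pre-learning NNs).
        self.heads = nn.ModuleDict(
            {t: nn.Linear(64, num_classes_per_task) for t in task_names})

    def forward(self, x):
        z = self.encoder(x)                # shared feature vector
        return {t: head(z) for t, head in self.heads.items()}

model = FirstNN(["rotation", "solarize", "sharpness"])
logits = model(torch.randn(8, 3, 32, 32))  # one 4-class output per task
```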
  • The second task learning unit 13 uses the learning result of the first task to perform the second task of learning the predetermined processing.
  • Specifically, the second task learning unit 13 takes arbitrary image data as input and, using a second NN in which the shared parameters learned by the self-supervised learning unit 11 are applied to the feature extraction layer, learns the parameters of the second NN so as to perform the predetermined processing.
  • The learning data 40 for supervised learning consists of pairs of learning data and teacher data indicating the class of each learning data.
  • The second task learning unit 13 outputs the DNN whose parameters have been updated by learning (second learned DNN 17) to the image processing device 20.
  • In this embodiment the predetermined processing is class classification of image data, but the predetermined processing is not limited to class classification; it may be, for example, segmentation or object range detection.
  • The image processing device 20 has an analysis unit 21 that applies the parameters of the second trained DNN 17 to its DNN model and performs the predetermined processing (class classification, segmentation, object range detection, etc.) on the image data to be processed.
  • The image processing device 20 outputs the classification result produced by the analysis unit 21 as the estimation result for the image data.
  • FIG. 2 is a diagram showing an example of the configuration of the learning device shown in FIG.
  • The learning data generation unit 14 has a self-teacher data generation unit 141 and a data conversion unit 142.
  • The self-teacher data generation unit 141 generates self-teacher data 32 based on the input image data for learning, and outputs it to the first parameter update unit 153 (described later).
  • The self-teacher data generation unit 141 generates a class corresponding to each image data as the self-teacher data.
  • The data conversion unit 142 performs a plurality of conversions on the image data 30 for learning.
  • The learning device 10 executes any two or more of Rotation, ShearX, ShearY, Solarize, Brightness, Color, Contrast, and Sharpness as conversions for the image data.
  • The data conversion unit 142 outputs the image data 31 that has undergone two or more conversions to the feature extraction unit 151 (described later).
  • Rotation is a rotation process.
  • ShearX is a horizontal shearing process.
  • ShearY is a vertical shearing process.
  • Solarize is an inversion process that inverts pixels whose value exceeds a threshold.
  • Brightness is a brightness adjustment process: a factor of 0 converts the image to black, while a factor of 1 leaves it unchanged.
  • Color is a color tone adjustment process: as in a black-and-white television image, a factor of 0 converts the image to black and white, while a factor of 1 leaves it unchanged.
  • Contrast is a contrast adjustment process: a factor of 0 converts the image to solid gray, while a factor of 1 leaves it unchanged.
  • Sharpness is an edge sharpening process.
  • The 12 types of conversion examined are ShearX, ShearY, TranslateX, TranslateY, Solarize, Posterize, Contrast, Color, Brightness, Sharpness, CutOut (size prediction), and CutOut (number).
  • In the following, a category such as ShearX or ShearY is called the conversion type, and the concrete setting within each conversion type (for Rotation, the degree of rotation, set in four stages) is called the conversion content.
  • FIG. 3 is a diagram showing the estimation accuracy of the subsequent task (classification) when image data subjected to each conversion process is used for pre-learning.
  • In FIG. 3, the estimation accuracy of the model under the Random condition, in which the feature extraction layer is initialized with random numbers, is shown at the far right of the graph, and the estimation accuracy of the model whose feature extraction layer was pre-trained with the Rotation estimation task is shown second from the right.
  • As shown in FIG. 3, the estimation accuracy for ShearX, ShearY, Solarize, Brightness, Color, Contrast, and Sharpness was higher than under the Random condition. Therefore, in the present embodiment, any two or more of Rotation, ShearX, ShearY, Solarize, Brightness, Color, Contrast, and Sharpness are set as the conversion types of the conversion processing performed by the data conversion unit 142.
  • The data conversion unit 142 performs a plurality of conversions with differing properties on the input image data 30 for learning. For example, when adding data conversions beyond Rotation, the data conversion unit 142 starts with conversions whose properties differ from those already chosen; in other words, conversions that are unlikely to affect each other, or whose features are orthogonal.
  • The data conversion unit 142 sets the plurality of conversions to be executed on the input image data 30 according to the content of the second task (the predetermined processing) executed by the second task learning unit 13.
  • For example, in the first task the data conversion unit 142 converts in the order Rotation, Sharpness, Solarize. Converting in this order makes it possible to emphasize the edges in the image correctly before the subsequent conversion is applied.
  • The first task learning unit 15 has a feature extraction unit 151, a first task classification unit 152 (estimation unit), and a first parameter update unit 153.
  • The feature extraction unit 151 has a feature extraction layer composed of a DNN, and extracts features (specifically, for example, a feature vector) from the image data 31 subjected to the plurality of conversions by the data conversion unit 142, based on the shared parameters of the DNN constituting the feature extraction layer.
  • In the feature extraction layer of the feature extraction unit 151, shared parameters, which are the parameters updated by the first parameter update unit 153, are set. Through pre-learning that estimates the conversions applied to the input image, the shared parameters are trained to extract features useful for capturing each conversion (for example, features related to the shape, texture, and color of objects present in the image).
  • The first task classification unit 152 has a plurality of pre-learning NNs corresponding to the respective conversions, and estimates the conversion content for each conversion type applied to the image data 31 based on the features extracted by the feature extraction unit 151.
  • In other words, the first task classification unit 152 classifies the image data 31 that has undergone the plurality of conversions based on the feature vector extracted by the feature extraction unit 151.
  • Initial values are set for the plurality of pre-learning NNs; for example, values generated from uniformly distributed random numbers are used. The parameters updated by the first parameter update unit 153 are then set in the plurality of pre-learning NNs.
  • The first parameter update unit 153 executes multi-task learning of the first neural network based on each conversion content estimated by the first task classification unit 152 and the self-teacher data 32 generated by the self-teacher data generation unit 141, updating the parameters of the pre-learning NNs and the shared parameters of the feature extraction layer. As a result, the DNN parameters of the feature extraction layer reflect the features of the image data subjected to the plurality of conversions by the data conversion unit 142.
  • The first task learning unit 15 outputs the DNN model whose parameters have been updated (first learned DNN 16) to the second task learning unit 13.
  • The pre-learning NN parameters correspond to the classification layers for the respective conversion types in the first task classification unit 152.
  • The second task learning unit 13 has a feature extraction unit 131, a second task classification unit 132, and a second parameter update unit 133.
  • The second task learning unit 13 accepts, as the learning data 40 for supervised learning, the input of the image data 41 for learning and the teacher data 42.
  • The feature extraction unit 131 has a feature extraction layer composed of a DNN.
  • The feature extraction unit 131 applies the parameters of the first trained DNN 16 to this DNN and extracts features (for example, feature vectors) from the image data 41 for learning.
  • The features extracted here are of the same kind as those extracted by the feature extraction unit 151.
  • The second task classification unit 132 has an NN, classifies the image data 41 for learning based on the features extracted by the feature extraction unit 131, and outputs the classified class to the second parameter update unit 133 as the estimation result.
  • The second parameter update unit 133 updates the parameters of the feature extraction layer of the feature extraction unit 131 and the parameters of the NN of the second task classification unit 132, based on the estimation result of the second task classification unit 132 and the teacher data 42.
  • Once learning is complete (parameter updates by learning have finished), the second task learning unit 13 outputs the trained NN and DNN (second learned DNN 17) to the image processing device 20.
  • FIG. 4 is a diagram illustrating a flow of learning processing according to the embodiment.
  • FIG. 5 is a diagram showing an example of the configuration of the first task learning unit 15 shown in FIG.
  • As shown in FIG. 4, multi-task learning is performed by sharing the parameters of the feature extraction layer.
  • In the example of FIG. 4, Rotation, Solarize, and Sharpness are executed in succession on the image data 311 for learning by the data converter (self-teacher data generation unit 141).
  • A class generated for the image data 311 is used as the self-teacher data 32.
  • A specific example of the classes follows; a code sketch of this label generation is given after these examples.
  • For Rotation, the class is the four-stage angle {0, 90, 180, 270}.
  • For Sharpness, the class is the four-stage emphasis level {0.0, 1.0, 1.5, 2.0}.
  • For Solarize, the class is the four-stage inversion threshold {0, 96, 192, 256}. The learning data generation unit 14 thus applies a four-stage conversion to the image and generates the class corresponding to the stage.
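  • The sketch below illustrates this label generation under the description above: one of four stages is drawn per conversion type, the conversions are applied in the order Rotation, Sharpness, Solarize, and the stage indices are kept as the self-teacher classes. It uses PIL; the helper names and structure are hypothetical.

```python
# Sketch of self-teacher data generation: pick one of four stages per
# conversion type, apply all conversions to the image, and keep the stage
# indices as the self-teacher classes. Helper names are assumptions.
import random
from PIL import Image, ImageOps, ImageEnhance

STAGES = {
    "rotation":  [0, 90, 180, 270],     # degrees
    "sharpness": [0.0, 1.0, 1.5, 2.0],  # enhancement factors
    "solarize":  [0, 96, 192, 256],     # inversion thresholds
}

def apply_conversion(img, name, value):
    if name == "rotation":
        return img.rotate(value)
    if name == "sharpness":
        return ImageEnhance.Sharpness(img).enhance(value)
    if name == "solarize":
        # PIL thresholds are 0..255; 256 effectively leaves pixels as-is.
        return ImageOps.solarize(img, threshold=min(value, 255))
    raise ValueError(name)

def generate_example(img):
    labels = {}
    # Apply in the order Rotation -> Sharpness -> Solarize (see above).
    for name in ["rotation", "sharpness", "solarize"]:
        cls = random.randrange(4)       # self-teacher class 0..3
        img = apply_conversion(img, name, STAGES[name][cls])
        labels[name] = cls
    return img, labels
```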
  • The feature extraction layer 1511 (feature extraction unit 151), in which the shared parameter θsh is set, extracts the feature vector from the image data 311.
  • The feature vector extracted by the feature extraction layer 1511 is output to the Rotation classification layer (NN) 1521, the Solarize classification layer (NN) 1522, and the Sharpness classification layer (NN) 1523 in the first task classification unit 152.
  • The Rotation classification layer 1521, the Solarize classification layer 1522, and the Sharpness classification layer 1523 each apply their own parameters (θ1 to θ3) to perform four-class classification, estimating the four-stage degree of the corresponding conversion. That is, the Rotation classification layer 1521 classifies the Rotation class, the Solarize classification layer 1522 classifies the Solarize class, and the Sharpness classification layer 1523 classifies the Sharpness class.
  • The first parameter update unit 153 updates the parameters of the Rotation classification layer 1521, the Solarize classification layer 1522, and the Sharpness classification layer 1523 based on their classification results and the self-teacher data 32.
  • Specifically, the first parameter update unit 153 calculates the loss of the four-class classification output by each classification layer using the softmax cross entropy. Using equation (1), it calculates, for each of the Rotation, Solarize, and Sharpness conversions (tasks), the loss between the class output by the corresponding classification layer and the class indicated by the self-teacher data 32.
  • In equation (1), Lt is the loss of each task, and the number of classes of the self-teacher data is 4 in this example. Task 1 is Rotation, task 2 is Solarize, and task 3 is Sharpness. θsh is the shared parameter of the feature extraction layer 1511, θt is the parameter specific to task t, and ct is the output when the feature vector is input to the classification layer to which θt is applied; f corresponds to θsh. A hedged reconstruction of equation (1) follows.
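  • Equation (1) itself is not reproduced in this text. Assuming the standard softmax cross-entropy form and the symbols defined above, a plausible reconstruction is:

```latex
% Hedged reconstruction of equation (1): softmax cross-entropy of task t over
% N training images and C = 4 self-teacher classes; f is the feature
% extraction layer (parameters \theta_{sh}), c_t the classification layer of
% task t (parameters \theta_t), and y_{i,k}^{(t)} the one-hot self-teacher
% class of image x_i for task t.
L_t(\theta_{sh}, \theta_t) =
  -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{C}
  y_{i,k}^{(t)} \,
  \log \operatorname{softmax}\!\bigl( c_t\bigl( f(x_i; \theta_{sh}) \bigr) \bigr)_k
```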
  • The first parameter update unit 153 updates each of the parameters θ1 to θ3 by the error backpropagation method based on the loss and gradient of each task.
  • Specifically, the Rotation parameter update unit 1531 updates the parameter θ1 of the Rotation classification layer 1521, the Solarize parameter update unit 1532 updates the parameter θ2 of the Solarize classification layer 1522, and the Sharpness parameter update unit 1533 updates the parameter θ3 of the Sharpness classification layer 1523.
  • Based on the loss and gradient of each task, the first parameter update unit 153 calculates the total loss with respect to the shared parameter θsh using the Frank-Wolfe method (see Reference 2), and optimizes θsh so that the total loss is minimized. By optimizing the shared parameter θsh with the Frank-Wolfe method for multi-task learning, the first parameter update unit 153 avoids a complicated parameter search and improves accuracy at a small additional computation cost.
  • Reference 2: Ozan Sener and Vladlen Koltun, "Multi-Task Learning as Multi-Objective Optimization", Advances in Neural Information Processing Systems 31 (NeurIPS 2018).
  • The first parameter update unit 153 minimizes the total loss with respect to the shared parameter θsh using equation (2).
  • In equation (2), αt is a weight for coordinating the tasks, and zi is g(xi; θsh). A hedged reconstruction of equation (2) follows.
  • Using equation (2), the feature extraction parameter update unit 1534 of the first parameter update unit 153 computes the weights from the gradients of the task parameters θ1 to θ3, calculates the total loss and gradient over the shared parameter θsh and the task parameters θ1 to θ3, and updates the shared parameter θsh by error backpropagation of the total loss.
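  • Equation (2) is likewise not reproduced in this text. Following Reference 2, whose Frank-Wolfe-based multi-objective formulation the text cites, and the definitions of αt and zi above, it plausibly corresponds to the min-norm problem over per-task gradients taken at the shared representations:

```latex
% Hedged reconstruction of equation (2), after Sener & Koltun (Reference 2):
% choose task weights \alpha_t on the probability simplex so that the
% weighted sum of per-task gradients with respect to the shared
% representations z_i = g(x_i; \theta_{sh}) has minimum norm.
\min_{\alpha_1, \dots, \alpha_T}
  \left\| \sum_{t=1}^{T} \alpha_t \, \nabla_{Z} L_t \right\|_2^2
  \quad \text{s.t.} \quad \sum_{t=1}^{T} \alpha_t = 1, \;\; \alpha_t \ge 0
```

The shared parameter θsh is then updated by backpropagating the weighted total loss Σt αt Lt.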
  • The self-supervised learning unit 11 repeats the processing of the learning data generation unit 14, the feature extraction layer 1511, the first task classification unit 152, and the first parameter update unit 153 until a predetermined end condition is reached; a code sketch of one such iteration follows.
  • The end condition is, for example, that the number of learning steps reaches a preset maximum number of learning steps.
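  • Continuing the FirstNN sketch above, the following illustrates one first-task iteration under these update rules. The min-norm Frank-Wolfe solver follows Reference 2; the single weighted backward pass is a simplification of the separate per-head updates described above, and all names are illustrative.

```python
# Sketch of one first-task iteration (cf. steps S11-S18 in FIG. 7): per-task
# cross-entropy losses, Frank-Wolfe task weights for the shared encoder, and
# one backward pass over the weighted total loss.
import torch
import torch.nn.functional as F

def min_norm_weights(grads, iters=20):
    """Frank-Wolfe iterations for min_alpha ||sum_t alpha_t g_t||^2 on the
    probability simplex (after Sener & Koltun, Reference 2)."""
    T = len(grads)
    alpha = torch.full((T,), 1.0 / T)
    G = torch.stack(grads)           # T x P matrix of flattened gradients
    M = G @ G.t()                    # T x T Gram matrix of gradient dot products
    for k in range(iters):
        t_star = torch.argmin(M @ alpha)  # vertex minimizing the linearization
        step = 2.0 / (k + 2.0)            # standard Frank-Wolfe step size
        e = torch.zeros_like(alpha)
        e[t_star] = 1.0
        alpha = (1.0 - step) * alpha + step * e
    return alpha

def first_task_step(model, opt, x, labels):
    logits = model(x)                # dict: conversion type -> 4-class logits
    losses = [F.cross_entropy(logits[t], labels[t]) for t in logits]
    shared = list(model.encoder.parameters())
    # Per-task gradients with respect to the shared parameters theta_sh.
    grads = [torch.cat([g.flatten() for g in
                        torch.autograd.grad(l, shared, retain_graph=True)])
             for l in losses]
    alpha = min_norm_weights(grads)
    total = sum(a * l for a, l in zip(alpha, losses))
    opt.zero_grad()
    total.backward()                 # updates heads (theta_t) and encoder (theta_sh)
    opt.step()
    return total.item()
```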
  • In the second task, the second task learning unit 13 applies the parameters of the first learned DNN 16 to the feature extraction layer 1311 of the feature extraction unit 131, and performs learning on the learning data 40 for supervised learning.
  • The learning data 40 for supervised learning is a set of pairs each composed of image data and the correct output of the predetermined processing.
  • The feature extraction layer 1311 extracts a feature vector from the image data 41 of the supervised learning data, and the classification layer 1321 of the second task classification unit classifies the image data 41 based on that feature vector.
  • The second parameter update unit 133 updates the parameters of the DNN model of the feature extraction layer 1311 and the parameters of the NN of the classification layer 1321, based on the loss between the class output by the classification layer 1321 and the class indicated by the teacher data 42.
  • The second task learning unit 13 outputs the NN and DNN whose parameters have been updated by learning (second learned DNN 17) to the image processing device 20; a sketch of this second-task update follows.
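  • Continuing the sketch above, a minimal second-task step might look as follows, assuming a 100-class classification task; the optimizer, learning rate, and layer sizes are assumptions.

```python
# Sketch of the second task: copy the pre-trained shared encoder into the
# feature extraction unit 131, attach a fresh classification layer, and
# train on labeled pairs (image data 41, teacher data 42).
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = model.encoder              # feature extraction layer 1311, with
                                     # parameters from the first trained DNN 16
classifier = nn.Linear(64, 100)      # classification layer 1321 (e.g. 100 classes)

opt = torch.optim.SGD(
    list(encoder.parameters()) + list(classifier.parameters()),
    lr=0.01, momentum=0.9)

def second_task_step(x, y):
    loss = F.cross_entropy(classifier(encoder(x)), y)
    opt.zero_grad()
    loss.backward()                  # updates both encoder and classifier
    opt.step()
    return loss.item()
```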
  • The second task learning unit 13 may perform arbitrary processing as the predetermined processing (subsequent task, second task). In the example of FIG. 4 the second task learning unit 13 performs supervised learning, but unsupervised learning or reinforcement learning may be used instead.
  • In this way, the learning device 10 acquires, in a first task that performs conversions other than Rotation, features beyond those obtained from Rotation alone. The learning device 10 uses data conversions other than Rotation together with Rotation and performs multi-task learning in the first task. As a result, the learning device 10 can set, in the first task, a first trained DNN 16 that appropriately captures the features of a plurality of conversions of the image data, not only Rotation, and can therefore raise the accuracy of the classification task in the second task.
  • It suffices to execute any two or more of Rotation, ShearX, ShearY, Solarize, Brightness, Color, Contrast, and Sharpness according to the processing content of the image processing apparatus 20 and then perform multi-task learning.
  • FIG. 6 is a flowchart showing a processing procedure of image processing performed by the image processing system 1 in the embodiment.
  • As shown in FIG. 6, the learning device 10 performs a learning process that learns the features of the image data for learning in order to set the parameters of the DNN model used by the image processing device 20 (step S1).
  • The image processing device 20 then performs image analysis processing that classifies image data using the second trained DNN 17 whose parameters were set by the learning process of the learning device 10 (step S2).
  • FIG. 7 is a flowchart showing a processing procedure of the learning process shown in FIG.
  • First, the learning device 10 sets the learning step count to its initial value and executes the first learning step.
  • The learning data generation unit 14 of the self-supervised learning unit 11 generates self-teacher data based on the input image data 30 for learning, and performs a plurality of conversions on the image data for learning (step S11).
  • The feature extraction layer 1511, in which the shared parameter θsh is set, performs feature extraction processing that extracts a feature vector from the image data 311 subjected to the plurality of conversions (step S12).
  • The first task classification unit 152 performs the first task classification processing, classifying each conversion applied to the image data based on the feature vector extracted by the feature extraction unit 151, using the plurality of pre-learning NNs corresponding to the plurality of conversions (step S13).
  • The first parameter update unit 153 calculates the loss and gradient from the classification result of each NN of the first task classification unit 152 (step S14).
  • Specifically, the first parameter update unit 153 uses equation (1) to calculate, for each NN, the loss between the class output by that NN and the class indicated by the self-teacher data.
  • The first parameter update unit 153 updates the parameters of each NN by error backpropagation based on the calculated loss and gradient of each NN (step S15).
  • The first parameter update unit 153 then performs the weight calculation based on equation (2) using the loss and gradient of each NN (step S16), applying the Frank-Wolfe method. It calculates the total loss and gradient over the shared parameter of the feature extraction layer 1511 and the parameters of each NN (step S17), and updates the shared parameter θsh by error backpropagation of the total loss (step S18).
  • The first parameter update unit 153 determines whether the number of learning steps is smaller than the maximum number of learning steps (step S19). When it is smaller (step S19: Yes), 1 is added to the learning step count and processing returns to step S11 to execute the first task on the learning data for the next iteration.
  • When the number of learning steps is not smaller than the maximum (step S19: No), that is, when the maximum number of learning steps has been reached, the DNN model with updated parameters (first trained DNN 16) is passed to the second task.
  • The second task learning unit 13 uses the first learned DNN 16 to learn the second task (classification of image data) (step S20).
  • The second task learning unit 13 outputs the DNN whose parameters have been updated by learning (second learned DNN 17) to the image processing device 20.
  • The learning of the second task learning unit 13 may use any general method for learning the second task. In the present embodiment the second task is classification, so a general method for learning classification with a neural network may be used.
  • [Evaluation experiment] An evaluation experiment was conducted with image classification as the second task, comparing the classification accuracy in the second task of a DNN model pre-trained with the method according to the present embodiment and of a DNN pre-trained with the method described in Non-Patent Document 1.
  • As the image data set, CIFAR-100 is used: a data set for supervised learning consisting of pairs of images and teacher information (50,000 images for learning and 10,000 for testing).
  • The second task is to classify the 100 classes using the image data of CIFAR-100.
  • Of the learning data set, 45,000 images are used during learning, and 5,000 images with the classes evenly divided are used as a validation data set for cross-validation.
  • In the pre-learning, the teacher information of the data set is not used.
  • The feature extraction layer of the feature extraction unit 151 is trained for 100 epochs.
  • In the present embodiment, the DNN model is pre-trained using image data converted with Rotation, Sharpness, and Solarize; with the learning method described in Non-Patent Document 1, the DNN model is pre-trained using image data converted with Rotation only. The parameters of the trained feature extraction layer are then fixed, and classification is trained as the second task with a Linear-Regression model (5,000 iterations). Test classification accuracy (Top-1 Accuracy) is measured with the trained Linear-Regression model. The second task was run 3 times for each model, and the mean and standard deviation were calculated; a sketch of this protocol follows.
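  • The following is a minimal sketch of this evaluation protocol under the description above: frozen features, a linear probe trained for a fixed number of iterations, and Top-1 accuracy averaged over 3 runs. Loader construction and hyperparameters are assumptions.

```python
# Sketch of the evaluation protocol: freeze the pre-trained feature
# extraction layer, train a linear (logistic-regression-style) probe, and
# report mean and standard deviation of Top-1 accuracy over 3 runs.
import statistics
import torch
import torch.nn.functional as F

def train_linear_probe(encoder, probe, loader, iters=5000):
    opt = torch.optim.SGD(probe.parameters(), lr=0.1)
    encoder.eval()
    done = 0
    while done < iters:
        for x, y in loader:
            with torch.no_grad():
                z = encoder(x)               # frozen features
            loss = F.cross_entropy(probe(z), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            done += 1
            if done >= iters:
                break

def top1_accuracy(encoder, probe, loader):
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            pred = probe(encoder(x)).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
    return 100.0 * correct / total

def evaluate(encoder, make_probe, train_loader, test_loader, runs=3):
    accs = []
    for _ in range(runs):
        probe = make_probe()                 # fresh linear classifier each run
        train_linear_probe(encoder, probe, train_loader)
        accs.append(top1_accuracy(encoder, probe, test_loader))
    return statistics.mean(accs), statistics.stdev(accs)
```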
  • FIG. 8 is a diagram comparing the classification accuracy of the DNN model trained using the learning method according to the embodiment with that of the DNN trained using the learning method described in Non-Patent Document 1.
  • The classification accuracy of the DNN model trained using the learning method described in Non-Patent Document 1 was 42.97%.
  • The classification accuracy of the DNN model trained using the learning method according to the embodiment was 48.88%. Thus it was confirmed that, according to the embodiment, higher classification accuracy than that of the conventional method can be achieved.
  • A more robust classifier can therefore be realized by adding other conversion predictions in addition to Rotation.
  • As described above, in the present embodiment a plurality of types of conversion are performed on a single image as the first task, and self-supervised learning is performed by multi-task learning, so that diverse features of the conversion contents can be acquired, relating not only to geometry but also to color tone and edges. Because the first task combines Rotation with data conversions from which useful features can be acquired, the accuracy of the second task can be improved compared with a single conversion alone. Therefore, according to the present embodiment, the accuracy of image processing can be improved by capturing the features of the image. Furthermore, since learning can be performed with almost the same scheme as conventional self-supervised learning, additional implementation is easy.
  • In addition, since a single image is subjected to multiple conversions in the first task, an effect similar to data augmentation is obtained. Specifically, higher accuracy is obtained than when each conversion is applied one at a time to separate inputs. Moreover, by sharing the input from the feature extraction unit 151 across the first task classification unit 152, an approximate expression can be used when updating the gradient, improving computational efficiency by dozens of times.
  • In the first task, it is also possible to choose, as conversions of the image data, two or more conversions from Rotation, ShearX, ShearY, Solarize, Brightness, Color, Contrast, and Sharpness according to the processing content of the second task.
  • The case where the second task is class classification has been described, but the second task is not limited to this; it may be any task, such as object detection or segmentation, that produces some output from an input image.
  • For example, the input image data may be converted in the order Solarize, Color, and Sharpness.
  • A strong effect can be expected by combining conversions with differing properties and thereby learning the conversion contents from various aspects.
  • Each component of each illustrated device is a functional concept and does not necessarily have to be physically configured as shown in the figures. That is, the specific form of distribution and integration of the devices is not limited to the illustrated one; all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions. Further, each processing function performed by each device may be realized, in whole or in part, by a CPU and a program analyzed and executed by the CPU, or as hardware by wired logic.
  • All or part of the processes described as being performed automatically can be performed manually, and all or part of the processes described as being performed manually can be performed automatically by known methods.
  • The processing procedures, control procedures, specific names, and information including the various data and parameters shown in the above document and drawings can be changed arbitrarily unless otherwise specified. That is, the processes described for the learning method are not only executed in chronological order according to the order of description, but may also be executed in parallel or individually according to the processing capacity of the executing device or as required.
  • FIG. 9 is a diagram showing an example of a computer in which the learning device 10 or the image processing device 20 is realized by executing the program.
  • The computer 1000 has, for example, a memory 1010 and a CPU 1020.
  • The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These parts are connected by a bus 1080.
  • The memory 1010 includes a ROM 1011 and a RAM 1012.
  • The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System).
  • The hard disk drive interface 1030 is connected to the hard disk drive 1031.
  • The disk drive interface 1040 is connected to the disk drive 1100.
  • A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100.
  • The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120.
  • The video adapter 1060 is connected to, for example, a display 1130.
  • The hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the learning device 10 or the image processing device 20 is implemented as a program module 1093 in which code executable by the computer 1000 is described.
  • The program module 1093 is stored in, for example, the hard disk drive 1031.
  • Specifically, a program module 1093 for executing processing equivalent to the functional configuration of the learning device 10 or the image processing device 20 is stored in the hard disk drive 1031.
  • The hard disk drive 1031 may be replaced by an SSD (Solid State Drive).
  • The setting data used in the processing of the above-described embodiment is stored as program data 1094 in, for example, the memory 1010 or the hard disk drive 1031. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1031 into the RAM 1012 as needed, and executes them.
  • The program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1031; they may, for example, be stored in a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like, or be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.) and read by the CPU 1020 via the network interface 1070.


Abstract

A learning device (10) has: a self-supervised learning unit (11) that takes image data subjected to a plurality of conversions as input and updates parameters, including shared parameters of a feature extraction layer of a first NN, by using the first NN and performing multi-task learning so as to estimate the conversion content for each conversion type; and a second task learning unit (13) that takes arbitrary image data as input, uses a second neural network in which the shared parameters learned by the self-supervised learning unit (11) are applied to the feature extraction layer, and learns the parameters of the second NN so as to perform prescribed processing. The first NN has the feature extraction layer and a plurality of pre-learning NNs, each corresponding to one of the conversion types of the plurality of conversions. The plurality of pre-learning NNs share the feature extraction layer and estimate the conversion content for each conversion type.

Description

学習装置、画像処理装置、学習方法及び学習プログラムLearning device, image processing device, learning method and learning program
 本発明は、学習装置、画像処理装置、学習方法及び学習プログラムに関する。 The present invention relates to a learning device, an image processing device, a learning method, and a learning program.
 深層学習(ディープニューラルネットワーク:DNN)は、画像認識などで大きな成功を収めている。例えば、DNNを使った画像認識では、DNNモデルに画像を入力すると、その画像が何を写しているのかという分類結果を出力する。 Deep learning (deep neural network: DNN) has been very successful in image recognition and so on. For example, in image recognition using DNN, when an image is input to the DNN model, a classification result of what the image reflects is output.
 この深層学習モデルでは、高い精度を実現するために大量のラベル付き学習データが必要となる。このラベル付き学習データは、収集した画像データに対して教師ラベルを付加する必要があり、例えば、医療画像等、専門家でなければラベルを付与することができない場合では、特にコストが高くなる。学習データの作成は、重大なコストであり、深層学習の利用を阻む大きな要因となっている。 This deep learning model requires a large amount of labeled learning data to achieve high accuracy. It is necessary to attach a teacher label to the collected image data of this labeled learning data, and the cost is particularly high when the label cannot be attached only to a specialist such as a medical image. Creating learning data is a significant cost and is a major impediment to the use of deep learning.
 ここで、ラベルつきデータが少量であってもDNNモデルを訓練する技術としていくつかの方法が提案されている。例えば、データが少量であってもDNNモデルを訓練する技術として、別のデータセットで学習済みのモデルを流用する転移学習(Transfer Learning)、元データを加工してデータ数を増やすデータ拡張(Data Augmentation)、データからラベルを作成し、サブタスクを解くことで特徴を獲得する自己教師学習(Self-Supervised Learning)(例えば、非特許文献1参照)が提案されている。 Here, some methods have been proposed as a technique for training the DNN model even if the amount of labeled data is small. For example, as a technique for training a DNN model even if the amount of data is small, transfer learning that diverts a model that has been trained in another data set, and data expansion that processes the original data to increase the number of data (Data). Augmentation), self-supervised learning (see, for example, Non-Patent Document 1) in which features are acquired by creating labels from data and solving subtasks has been proposed.
 非特許文献1に記載の方法は、回転角度を予測する事前タスクを解くことによって後段タスク(分類など)で役立つ表現を獲得する。この方法は、簡易でありながら強力な効果を得られるため、敵対的生成ネットワーク(GAN:Generative Adversarial Networks)の学習や、他の方法と組み合わせた学習に採用されている。 The method described in Non-Patent Document 1 acquires expressions useful in subsequent tasks (classification, etc.) by solving a pre-task for predicting a rotation angle. Since this method is simple yet has a powerful effect, it is adopted for learning of hostile generative networks (GAN: Generative Adversarial Networks) and learning in combination with other methods.
 非特許文献1に記載の方法は、教師なし学習によって獲得される表現の転用を行っている。具体的には、非特許文献1に記載の方法は、事前タスクで画像の回転角度を学習し、後段タスクでクラス分類を学習する。この方法では、回転角度を予測することによって、画像データの幾何的特徴を捉えることができるが、回転が定義できないようなデータや、簡単に回転角度を予測できる特徴を有するデータには、事前タスクの効果が低い。例えば、風景画像は、空の位置だけで画像の角度が予測できるため、この方法では事前タスクの効果が低い。 The method described in Non-Patent Document 1 diverts expressions acquired by unsupervised learning. Specifically, in the method described in Non-Patent Document 1, the rotation angle of the image is learned in the pre-task, and the classification is learned in the post-task. In this method, the geometrical features of the image data can be captured by predicting the rotation angle, but for data for which rotation cannot be defined or for data with features that can easily predict the rotation angle, a preliminary task is required. The effect of is low. For example, in a landscape image, the angle of the image can be predicted only by the position of the sky, so that the effect of the pre-task is low in this method.
 本発明は、上記に鑑みてなされたものであって、画像の特徴を捉えて画像処理の精度を高めることができる学習装置、画像処理装置、学習方法及び学習プログラムを提供することを目的とする。 The present invention has been made in view of the above, and an object of the present invention is to provide a learning device, an image processing device, a learning method, and a learning program capable of capturing the features of an image and improving the accuracy of image processing. ..
 上述した課題を解決し、目的を達成するために、本発明に係る学習装置は、複数の変換が行なわれた画像データを入力とし、第1ニューラルネットワークを用いて、変換の変換種別ごとの変換内容をそれぞれ推定するようマルチタスク学習を行うことで、第1ニューラルネットワークの特徴抽出層の共有パラメータを含むパラメータを更新する第1学習部と、任意の画像データを入力とし、第1学習部によって学習済みの共有パラメータを特徴抽出層に適用した第2ニューラルネットワークを用いて、所定処理を行うよう第2ニューラルネットワークのパラメータを学習する第2学習部と、を有し、第1ニューラルネットワークは、特徴抽出層と、複数の変換の変換種別にそれぞれ対応した複数の事前学習用ニューラルネットワークとを有し、複数の事前学習用ニューラルネットワークは、特徴抽出層を共有し、変換種別ごとの変換内容を推定することを特徴とする。 In order to solve the above-mentioned problems and achieve the object, the learning device according to the present invention takes image data in which a plurality of conversions have been performed as an input, and uses a first neural network to perform conversion for each conversion type. By performing multi-task learning so as to estimate the contents, the first learning unit that updates the parameters including the shared parameters of the feature extraction layer of the first neural network and the first learning unit that inputs arbitrary image data The first neural network has a second learning unit that learns the parameters of the second neural network so as to perform predetermined processing by using the second neural network in which the learned shared parameters are applied to the feature extraction layer. It has a feature extraction layer and a plurality of pre-learning neural networks corresponding to each of a plurality of conversion conversion types. The plurality of pre-learning neural networks share the feature extraction layer and display the conversion contents for each conversion type. It is characterized by estimating.
 また、本発明に係る画像処理装置は、学習済みのパラメータが設定されたディープニューラルネットワークを有するモデルを用いて、入力された画像データに対して処理を行う処理部を有し、前記学習済みのパラメータは、複数の変換が行なわれた画像データを入力とし、第1ニューラルネットワークを用いて、前記変換の変換種別ごとの変換内容をそれぞれ推定するようマルチタスク学習を行うことで更新された、前記第1ニューラルネットワークの特徴抽出層の共有パラメータ基づき、前記第1ニューラルネットワークは、前記特徴抽出層と、前記複数の変換の変換種別にそれぞれ対応した複数の事前学習用ニューラルネットワークとを有し、前記複数の事前学習用ニューラルネットワークは、前記特徴抽出層を共有し、前記変換種別ごとの変換内容を推定することを特徴とする。 Further, the image processing apparatus according to the present invention has a processing unit that processes input image data using a model having a deep neural network in which trained parameters are set, and has been trained. The parameters were updated by inputting image data in which a plurality of conversions were performed and performing multitask learning so as to estimate the conversion contents for each conversion type of the conversions using the first neural network. Based on the shared parameters of the feature extraction layer of the first neural network, the first neural network has the feature extraction layer and a plurality of pre-learning neural networks corresponding to the conversion types of the plurality of transformations. A plurality of pre-learning neural networks share the feature extraction layer and estimate the conversion content for each conversion type.
 また、本発明に係る学習方法は、学習装置が実行する学習方法であって、複数の変換が行なわれた画像データを入力とし、第1ニューラルネットワークを用いて、変換の変換種別ごとの変換内容をそれぞれ推定するようマルチタスク学習を行うことで、第1ニューラルネットワークの特徴抽出層の共有パラメータを含むパラメータを更新する第1学習工程と、任意の画像データを入力とし、第1学習工程において学習済みの共有パラメータを特徴抽出層に適用した第2ニューラルネットワークを用いて、所定処理を行うよう第2ニューラルネットワークのパラメータを学習する第2学習工程と、を含み、第1ニューラルネットワークは、特徴抽出層と、複数の変換の変換種別にそれぞれ対応した複数の事前学習用ニューラルネットワークとを有し、複数の事前学習用ニューラルネットワークは、特徴抽出層を共有し、変換種別ごとの変換内容を推定することを特徴とする。 Further, the learning method according to the present invention is a learning method executed by a learning device, in which image data subjected to a plurality of conversions is input and conversion contents for each conversion type of conversion are used by using a first neural network. By performing multi-task learning so as to estimate each of the above, the first learning step of updating the parameters including the shared parameters of the feature extraction layer of the first neural network and the learning in the first learning step by inputting arbitrary image data. The first neural network includes a second learning step of learning the parameters of the second neural network so as to perform a predetermined process using the second neural network in which the already shared parameters are applied to the feature extraction layer. It has a layer and a plurality of pre-learning neural networks corresponding to each of a plurality of conversion conversion types, and the plurality of pre-learning neural networks share a feature extraction layer and estimate the conversion content for each conversion type. It is characterized by that.
 また、本発明に係る学習プログラムは、複数の変換が行なわれた画像データを入力とし、第1ニューラルネットワークを用いて、変換の変換種別ごとの変換内容をそれぞれ推定するようマルチタスク学習を行うことで、第1ニューラルネットワークの特徴抽出層の共有パラメータを含むパラメータを更新する第1学習ステップと、任意の画像データを入力とし、第1学習ステップにおいて学習済みの共有パラメータを特徴抽出層に適用した第2ニューラルネットワークを用いて、所定処理を行うよう第2ニューラルネットワークのパラメータを学習する第2学習ステップと、をコンピュータに実行させ、第1ニューラルネットワークは、特徴抽出層と、複数の変換の変換種別にそれぞれ対応した複数の事前学習用ニューラルネットワークとを有し、複数の事前学習用ニューラルネットワークは、特徴抽出層を共有し、変換種別ごとの変換内容を推定する。 Further, the learning program according to the present invention takes image data in which a plurality of conversions have been performed as an input, and uses a first neural network to perform multitasking learning so as to estimate the conversion contents for each conversion type of conversion. Then, the first learning step of updating the parameters including the shared parameters of the feature extraction layer of the first neural network and the shared parameters learned in the first learning step were applied to the feature extraction layer by inputting arbitrary image data. Using the second neural network, a computer is made to execute a second learning step of learning the parameters of the second neural network so as to perform a predetermined process, and the first neural network is a feature extraction layer and conversion of a plurality of transformations. It has a plurality of pre-learning neural networks corresponding to each type, and the plurality of pre-learning neural networks share a feature extraction layer and estimate the conversion content for each conversion type.
 本発明によれば、画像の特徴を捉えて画像処理の精度を高めることができる。 According to the present invention, it is possible to improve the accuracy of image processing by capturing the features of an image.
図1は、実施の形態に係る画像処理システムの構成の一例を示す図である。FIG. 1 is a diagram showing an example of a configuration of an image processing system according to an embodiment. 図2は、図1に示す学習装置の構成の一例を示す図である。FIG. 2 is a diagram showing an example of the configuration of the learning device shown in FIG. 図3は、変換処理を実施した画像データに対する変換内容の推定精度を示す図である。FIG. 3 is a diagram showing the estimation accuracy of the converted content for the image data subjected to the conversion process. 図4は、実施の形態に係る学習処理の流れを説明する図である。FIG. 4 is a diagram illustrating a flow of learning processing according to the embodiment. 図5は、図1に示す第1タスク学習部の構成の一例を示す図である。FIG. 5 is a diagram showing an example of the configuration of the first task learning unit shown in FIG. 図6は、実施の形態における画像処理システムが実施する画像処理の処理手順を示すフローチャートである。FIG. 6 is a flowchart showing a processing procedure of image processing performed by the image processing system according to the embodiment. 図7は、図6に示す学習処理の処理手順を示すフローチャートである。FIG. 7 is a flowchart showing a processing procedure of the learning process shown in FIG. 図8は、実施の形態に係る学習方法を用いて訓練したDNNモデルと、非特許文献1に記載の学習方法とを用いて訓練したDNNとの各分類精度を示す図である。FIG. 8 is a diagram showing the accuracy of each classification between the DNN model trained using the learning method according to the embodiment and the DNN trained using the learning method described in Non-Patent Document 1. 図9は、プログラムが実行されることにより、学習装置或いは画像処理装置が実現されるコンピュータの一例を示す図である。FIG. 9 is a diagram showing an example of a computer in which a learning device or an image processing device is realized by executing a program.
 以下に、本願に係る学習装置、画像処理装置、学習方法及び学習プログラムの実施の形態を図面に基づいて詳細に説明する。なお、本発明は、事前タスク(第1タスク)として、画像データに複数の変換を行ない、自己教師学習を行って各変換内容を推定するマルチタスク学習を行い、後段タスク(第2タスク)として、第1タスクの学習結果を流用し、画像データの分類処理を学習する例について説明する。また、本発明は、以下に説明する実施の形態により限定されるものではない。 Hereinafter, the learning device, the image processing device, the learning method, and the embodiment of the learning program according to the present application will be described in detail based on the drawings. In the present invention, as a pre-task (first task), a plurality of conversions are performed on the image data, self-supervised learning is performed, and multi-task learning for estimating each conversion content is performed, and as a latter-stage task (second task). , An example of learning the image data classification process by diverting the learning result of the first task will be described. Further, the present invention is not limited to the embodiments described below.
[Embodiment]
 First, an image processing system according to the embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the configuration of the image processing system according to the embodiment.
 As shown in FIG. 1, the image processing system 1 according to the embodiment includes an image processing device 20 that classifies image data using a deep learning (deep neural network: DNN) model, and a learning device 10 that sets the parameters of the DNN model used by the image processing device 20 by learning the features of image data for learning.
 The learning device 10 has a self-supervised learning unit 11 (first learning unit) and a second task learning unit 13 (second learning unit) provided downstream of the self-supervised learning unit 11.
 The self-supervised learning unit 11 performs self-supervised learning on image data for learning using a first neural network (NN). The self-supervised learning unit 11 applies a plurality of types of conversion to the image data. Here, the first NN has a feature extraction layer that extracts conversion-related features from image data to which a plurality of conversions have been applied, and a plurality of pre-training NNs that each estimate the conversion content of one conversion type applied to the image data. The self-supervised learning unit 11 performs the first task: it takes the converted image data as input and performs multitask learning with the first NN so as to estimate the conversion content for each conversion type, thereby updating parameters including the shared parameters of the feature extraction layer of the first NN. The self-supervised learning unit 11 has a learning data generation unit 14 and a first task learning unit 15.
 The learning data generation unit 14 generates self-teacher data based on the input image data 30 for learning, and applies a plurality of types of conversion to the image data for learning.
 The first task learning unit 15 learns the features of the plurality of conversion contents by multitask learning, using the self-teacher data generated by the learning data generation unit 14 and the image data to which the plurality of types of conversion have been applied.
 The first task learning unit 15 performs multitask learning so as to estimate the conversion content for each conversion type, using a plurality of pre-training NNs corresponding to the respective conversions while sharing the shared parameters of the feature extraction layer, which extracts conversion-related features from the converted image data; this updates the parameters of each pre-training NN and the shared parameters of the feature extraction layer. The feature extraction layer is a DNN model containing many nonlinear functions. The first task learning unit 15 outputs the DNN model with the updated parameters (first trained DNN 16) to the second task learning unit 13.
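 A minimal sketch of this first NN is given below, assuming PyTorch: a shared feature extraction layer feeds one small classification head per conversion type, so the heads share the parameters θ_sh while keeping their own parameters θ_t. The module names, the toy CNN standing in for the Wide-ResNet used in the experiments, and the feature dimension are illustrative assumptions, not the patent's exact implementation.

```python
# A sketch of the first NN: shared feature extractor + per-transform heads.
import torch
import torch.nn as nn

class FirstTaskNetwork(nn.Module):
    def __init__(self, feature_dim: int = 256,
                 tasks=("rotation", "solarize", "sharpness"),
                 num_levels: int = 4):
        super().__init__()
        # Shared feature extraction layer (theta_sh); a small CNN stands in
        # for the Wide-ResNet used in the experiments.
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feature_dim), nn.ReLU(),
        )
        # One pre-training head (theta_t) per conversion type; each head
        # predicts the 4-level conversion content of its own transform.
        self.heads = nn.ModuleDict(
            {t: nn.Linear(feature_dim, num_levels) for t in tasks})

    def forward(self, x):
        z = self.features(x)                      # shared representation
        return {t: head(z) for t, head in self.heads.items()}
```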
 The second task learning unit 13 performs the second task of learning a predetermined process by reusing the learning result of the first task. The second task learning unit 13 takes arbitrary image data as input and learns the parameters of a second NN, in which the shared parameters learned by the self-supervised learning unit 11 are applied to the feature extraction layer, so that the second NN performs the predetermined process. The learning data 40 for supervised learning consists of pairs of learning data and teacher data indicating the class of each piece of learning data. The second task learning unit 13 outputs the DNN whose parameters have been updated by learning (second trained DNN 17) to the image processing device 20. In the following, an example in which the second task learning unit 13 performs classification of image data (class classification) as the predetermined process is described, but the predetermined process is not limited to class classification and may be segmentation, object range detection, or the like.
 The image processing device 20 has an analysis unit 21 that applies the parameters of the second trained DNN 17 to a DNN model and performs the predetermined process (class classification, segmentation, object range detection, etc.) on the image data to be processed. The image processing device 20 outputs the classification result of the analysis unit 21 as the estimation result for the image data.
[Learning device]
 Next, the learning device 10 shown in FIG. 1 will be described. FIG. 2 is a diagram showing an example of the configuration of the learning device shown in FIG. 1.
 As shown in FIG. 2, in the learning device 10, the learning data generation unit 14 has a self-teacher data generation unit 141 and a data conversion unit 142.
 The self-teacher data generation unit 141 generates self-teacher data 32 based on the input image data for learning and outputs it to a first parameter update unit 153 (described later). The self-teacher data generation unit 141 generates, as the self-teacher data, a class corresponding to each piece of image data.
 The data conversion unit 142 applies a plurality of conversions to the image data 30 for learning. Here, the learning device 10 executes two or more of Rotation, ShearX, ShearY, Solarize, Brightness, Color, Contrast, and Sharpness as conversions of the image data. The data conversion unit 142 outputs the image data 31, to which two or more conversions have been applied, to a feature extraction unit 151 (described later).
 Rotation is a rotation process. ShearX is a horizontal shear process. ShearY is a vertical shear process. Solarize is an inversion process that inverts pixels whose values exceed a threshold. Brightness is a brightness adjustment process: for example, a factor of 0 produces a black image, and a factor of 1 leaves the image unchanged. Color is a color adjustment process, similar to the controls on a color TV set: a factor of 0 produces a black-and-white image, and a factor of 1 leaves the image unchanged. Contrast is a contrast adjustment process: a factor of 0 produces a gray image, and a factor of 1 leaves the image unchanged. Sharpness is an edge sharpening process.
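 As a concrete reference, these eight conversion types can be realized with Pillow roughly as follows. This is a hedged sketch in the spirit of the AutoAugment operations cited below; the function names and magnitude arguments are illustrative.

```python
# A sketch of the eight conversion types using Pillow. The concrete
# magnitudes passed in by the caller are illustrative, not the patent's
# exact settings.
from PIL import Image, ImageEnhance, ImageOps

def rotation(img: Image.Image, angle: float) -> Image.Image:
    return img.rotate(angle)                                    # Rotation

def shear_x(img: Image.Image, m: float) -> Image.Image:
    return img.transform(img.size, Image.AFFINE, (1, m, 0, 0, 1, 0))  # ShearX

def shear_y(img: Image.Image, m: float) -> Image.Image:
    return img.transform(img.size, Image.AFFINE, (1, 0, 0, m, 1, 0))  # ShearY

def solarize(img: Image.Image, threshold: int) -> Image.Image:
    return ImageOps.solarize(img, threshold)   # invert pixels above threshold

def brightness(img, f): return ImageEnhance.Brightness(img).enhance(f)
def color(img, f):      return ImageEnhance.Color(img).enhance(f)
def contrast(img, f):   return ImageEnhance.Contrast(img).enhance(f)
def sharpness(img, f):  return ImageEnhance.Sharpness(img).enhance(f)
```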
 These processes were selected based on data conversions used in the context of data augmentation. Specifically, from the conversions used in AutoAugment (see Reference 1), those whose degree of conversion can be set as a four-level discrete value were extracted.
Reference 1: Ekin D. Cubuk et al., "AutoAugment: Learning Augmentation Policies from Data", arXiv preprint arXiv:1805.09501 (ICML 2019).
 Here, in setting the conversion types to be executed by the learning device 10, a preliminary experiment was conducted on a total of 12 types of conversion: ShearX, ShearY, TranslateX, TranslateY, Solarize, Posterize, Contrast, Color, Brightness, Sharpness, CutOut (size prediction), and CutOut (count). In the following, the kind of conversion, such as ShearX or ShearY, is called the conversion type, and the content of the conversion for each conversion type (for Rotation, the degree of rotation set in four levels) is called the conversion content.
 FIG. 3 shows the estimation accuracy of the subsequent task (class classification) when image data subjected to each conversion process is used for pre-training. For reference, the rightmost bar and the second bar from the right of the graph show the estimation accuracy of a model under the Random condition, in which the feature extraction layer is initialized with random numbers, and of a model whose feature extraction layer is pre-trained with the Rotation estimation task. As shown in FIG. 3, ShearX, ShearY, Solarize, Brightness, Color, Contrast, and Sharpness were found to exceed the estimation accuracy of the Random condition. Therefore, in the present embodiment, two or more of Rotation, ShearX, ShearY, Solarize, Brightness, Color, Contrast, and Sharpness are set as the conversion types of the conversion process performed by the data conversion unit 142.
 The data conversion unit 142 also applies, to the input image data 30 for learning, a plurality of conversions whose properties differ. For example, when adding data conversions on top of Rotation, the data conversion unit 142 adds conversions with different properties, in other words, conversions that are unlikely to affect one another or whose feature quantities are orthogonal.
 The data conversion unit 142 also sets the plurality of conversions to be executed on the input image data 30 according to the content of the second task (predetermined process) executed by the second task learning unit 13. When the second task is class classification, the data conversion unit 142 performs the conversions in the first task in the order Rotation, Sharpness, Solarize. Converting in this order allows the edges in the image to be correctly emphasized before the conversion that follows.
 The first task learning unit 15 has a feature extraction unit 151, a first task classification unit 152 (estimation unit), and a first parameter update unit 153.
 The feature extraction unit 151 has a feature extraction layer composed of a DNN, and extracts features (specifically, feature vectors, for example) from the image data 31 converted by the data conversion unit 142, based on the shared parameters of the DNN constituting the feature extraction layer. The shared parameters set in the feature extraction layer of the feature extraction unit 151 are the parameters updated by the first parameter update unit 153. Through pre-training that estimates the conversion content applied to the input image, the shared parameters are learned so as to extract features useful for capturing each conversion content (for example, features related to the shape, texture, and color of objects present in the image).
 The first task classification unit 152 has a plurality of pre-training NNs corresponding to the respective conversions, and estimates the conversion content of each conversion type for the image data 31 based on the features extracted by the feature extraction unit 151. The first task classification unit 152 classifies the image data 31, to which the plurality of conversions have been applied, based on the feature vector extracted by the feature extraction unit 151. Initial values are set in the pre-training NNs; for example, values generated from uniformly distributed random numbers are used. The parameters updated by the first parameter update unit 153 are then set in the pre-training NNs.
 The first parameter update unit 153 performs multitask learning on the first neural network based on the conversion contents estimated by the first task classification unit 152 and the self-teacher data 32 generated by the self-teacher data generation unit 141, updating the parameters of the pre-training NNs and the shared parameters of the feature extraction layer. As a result, the DNN parameters of the feature extraction layer come to reflect the features of the image data converted in multiple ways by the data conversion unit 142. The first task learning unit 15 outputs the DNN model whose parameter update is complete (first trained DNN 16) to the second task learning unit 13. The parameters of the pre-training NNs correspond to the classification layers of the first task classification unit 152, one for each conversion type.
 Next, the second task learning unit 13 will be described. The second task learning unit 13 has a feature extraction unit 131, a second task classification unit 132, and a second parameter update unit 133. The second task learning unit 13 accepts, as the learning data 40 for supervised learning, the input of image data 41 for learning and teacher data 42.
 The feature extraction unit 131 has a feature extraction layer composed of a DNN. The feature extraction unit 131 applies the parameters of the first trained DNN 16 to the DNN and extracts features (for example, feature vectors) from the image data 41 for learning. The features extracted here are the same as those of the feature extraction unit 151.
 The second task classification unit 132 has an NN, classifies the image data 41 for learning based on the features extracted by the feature extraction unit 131, and outputs the resulting class to the second parameter update unit 133 as the estimation result.
 The second parameter update unit 133 updates the parameters of the feature extraction layer of the feature extraction unit 131 and the parameters of the NN of the second task classification unit 132, based on the estimation result of the second task classification unit 132 and the teacher data 42. The second task learning unit 13 outputs the trained NN and DNN (second trained DNN 17), whose parameter updates by learning are complete, to the image processing device 20.
[Flow of learning process]
 Next, the flow of the learning process executed by the learning device 10 will be described. FIG. 4 is a diagram illustrating the flow of the learning process according to the embodiment. FIG. 5 is a diagram showing an example of the configuration of the first task learning unit 15 shown in FIG. 1.
 In the first task, multitask learning is performed while sharing the parameters of the feature extraction layer. Specifically, as shown in (1) of FIG. 4 and in FIG. 5, the learning data generation unit 14 applies Rotation, Solarize, and Sharpness in succession to the image data 311 for learning (data conversion unit 142), and generates the class of the image data 311 as the self-teacher data 32 (self-teacher data generation unit 141). Concrete examples of the classes are as follows. For Rotation, the four angle levels {0, 90, 180, 270} form the classes. For Sharpness, the four enhancement levels {0.0, 1.0, 1.5, 2.0} form the classes. For Solarize, the four inversion thresholds {0, 96, 192, 256} form the classes. The learning data generation unit 14 therefore applies a four-level conversion to the image and generates a class according to the level.
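 A minimal sketch of this self-teacher data generation is shown below, assuming the four-level magnitudes just listed and the Rotation, Sharpness, Solarize chaining described earlier for class classification; the level index drawn for each conversion type serves as that task's 4-class self-teacher label. The names and the uniform random sampling policy are illustrative assumptions.

```python
# A sketch of self-teacher data generation with 4-level conversions.
import random
from PIL import Image, ImageEnhance, ImageOps

LEVELS = {
    "rotation":  [0, 90, 180, 270],      # degrees
    "sharpness": [0.0, 1.0, 1.5, 2.0],   # enhancement factors
    "solarize":  [0, 96, 192, 256],      # inversion thresholds
}

def make_self_supervised_sample(img: Image.Image):
    # Draw one level per conversion type; the indices are the class labels.
    labels = {t: random.randrange(4) for t in LEVELS}
    out = img.rotate(LEVELS["rotation"][labels["rotation"]])
    out = ImageEnhance.Sharpness(out).enhance(
        LEVELS["sharpness"][labels["sharpness"]])
    out = ImageOps.solarize(out, LEVELS["solarize"][labels["solarize"]])
    return out, labels   # converted image and per-task 4-class targets
```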
 In the first task, the feature extraction layer 1511 (feature extraction unit 151), in which the shared parameters θ_sh are set, then extracts a feature vector from the image data 311. The feature vector extracted by the feature extraction layer 1511 is output to the Rotation classification layer (NN) 1521, the Solarize classification layer (NN) 1522, and the Sharpness classification layer (NN) 1523 in the first task classification unit 152.
 The Rotation classification layer 1521, the Solarize classification layer 1522, and the Sharpness classification layer 1523 each apply their own parameters (θ_1 to θ_3) to perform 4-class classification, estimating the four-level degree of the corresponding conversion. That is, the Rotation classification layer 1521 performs class classification for Rotation, the Solarize classification layer 1522 for Solarize, and the Sharpness classification layer 1523 for Sharpness.
 The first parameter update unit 153 updates the parameters of the Rotation classification layer 1521, the Solarize classification layer 1522, and the Sharpness classification layer 1523 based on their respective classification results and the self-teacher data 32.
 The first parameter update unit 153 computes the loss of the 4-class classification output from each of the Rotation classification layer 1521, the Solarize classification layer 1522, and the Sharpness classification layer 1523 using softmax cross entropy. Specifically, using Equation (1), the first parameter update unit 153 computes, for each of the Rotation, Solarize, and Sharpness conversions (tasks), the loss between the class predicted by the corresponding classification layer and the class indicated by the self-teacher data 32.
$$ L_t\left(\theta_{sh}, \theta_t\right) = -\sum_{c=1}^{\tau} y^{(t)}_{c} \log \operatorname{softmax}\left(c_t\right)_{c}, \qquad c_t = f_{\theta_t}\!\left(F\left(x; \theta_{sh}\right)\right) \tag{1} $$
 In Equation (1), L_t is the loss of each task, and y^(t) is the one-hot self-teacher label of task t. τ is the number of classes of the self-teacher data, 4 in this example. t (= 1, ..., T) is the task number; in this example, task 1 is Rotation, task 2 is Solarize, and task 3 is Sharpness. θ_sh is the shared parameter of the feature extraction layer 1511. θ_t is the parameter specific to task t. c_t is the output obtained when the feature vector is input to the classification layer to which θ_t is applied. F corresponds to θ_sh, that is, the feature extractor.
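 As a sketch, the per-task loss of Equation (1) can be computed as follows; `outputs` is the head-output dict from the FirstTaskNetwork sketch above and `targets` holds the self-teacher labels, both illustrative names.

```python
# Per-task softmax cross-entropy losses, one per classification head.
import torch
import torch.nn.functional as F

def per_task_losses(outputs: dict, targets: dict) -> dict:
    # F.cross_entropy applies log-softmax internally, matching Eq. (1).
    return {t: F.cross_entropy(outputs[t], targets[t]) for t in outputs}
```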
 The first parameter update unit 153 updates each of the parameters θ_1 to θ_3 by backpropagation based on the loss and gradient of each task. The Rotation parameter update unit 1531 updates the parameter θ_1 of the Rotation classification layer 1521, the Solarize parameter update unit 1532 updates the parameter θ_2 of the Solarize classification layer 1522, and the Sharpness parameter update unit 1533 updates the parameter θ_3 of the Sharpness classification layer 1523.
 Subsequently, the first parameter update unit 153 computes the overall loss for the shared parameters θ_sh using the Frank-Wolfe method (see Reference 2), based on the loss and gradient of each task, and optimizes the shared parameters θ_sh so that the overall loss is minimized. By optimizing the shared parameters θ_sh with the multitask Frank-Wolfe method, the first parameter update unit 153 avoids a cumbersome parameter search and improves accuracy at a small additional computation cost.
Reference 2: Ozan Sener and Vladlen Koltun, "Multi-Task Learning as Multi-Objective Optimization", Advances in Neural Information Processing Systems (NIPS 2018).
 First, the first parameter update unit 153 minimizes the overall loss with respect to the shared parameters θ_sh using Equation (2).
$$ \min_{\alpha_1,\ldots,\alpha_T} \left\| \sum_{t=1}^{T} \alpha_t \, \nabla_{Z} L_t\left(Z, \theta_t\right) \right\|_2^2 \quad \text{subject to} \quad \sum_{t=1}^{T} \alpha_t = 1,\ \alpha_t \ge 0 \tag{2} $$
 In Equation (2), α_t is a weight for balancing the tasks. Z (= z_1, ..., z_t) is the set of representations used to update θ_sh, where z_i = g(x_i; θ_sh).
 Specifically, using Equation (2), the first parameter update unit 153 computes the weights in the feature extraction parameter update unit 1534 from the gradients with respect to the task parameters θ_1 to θ_3, computes the overall loss and gradient for the shared parameters θ_sh and the task parameters θ_1 to θ_3, and updates the shared parameters θ_sh by backpropagation of the overall loss.
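 The Frank-Wolfe weighting step of Equation (2) can be sketched as below, following the min-norm formulation of Reference 2: given each task's gradient with respect to the shared parameters (flattened to one vector per task), find simplex weights α minimizing the squared norm of the combined gradient. This is an illustrative re-implementation under those assumptions, not the patent's exact code.

```python
# A sketch of the Frank-Wolfe solver for the min-norm problem of Eq. (2).
import torch

def frank_wolfe_weights(grads: list, iters: int = 20) -> torch.Tensor:
    T = len(grads)
    G = torch.stack([g.flatten() for g in grads])     # (T, P) task gradients
    M = G @ G.t()                                     # Gram matrix
    alpha = torch.full((T,), 1.0 / T)                 # start at simplex center
    for _ in range(iters):
        # Frank-Wolfe vertex: the task with the smallest inner product
        # against the current combined gradient.
        t_hat = torch.argmin(M @ alpha)
        v = torch.zeros(T)
        v[t_hat] = 1.0
        # Exact line search for min_gamma ||(1 - gamma) a + gamma b||^2,
        # where a and b are the current and vertex combined gradients.
        a_vec, b_vec = alpha @ G, v @ G
        diff = a_vec - b_vec
        denom = diff.dot(diff)
        if denom <= 1e-12:
            break
        gamma = torch.clamp(diff.dot(a_vec) / denom, 0.0, 1.0)
        alpha = (1 - gamma) * alpha + gamma * v
    return alpha   # weights for combining the task losses
```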
 The self-supervised learning unit 11 repeats the processing of the learning data generation unit 14, the feature extraction layer 1511, the first task classification unit 152, and the first parameter update unit 153 until a predetermined end condition is reached. The end condition is, for example, that the number of learning steps reaches a preset maximum number of learning steps.
 Subsequently, as shown in (2) of FIG. 4, the second task learning unit 13 applies the parameters of the first trained DNN 16 to the feature extraction layer 1311 of the feature extraction unit 131 and performs learning on the learning data 40 for supervised learning. The learning data 40 for supervised learning is a set of pairs, each consisting of image data and the correct output of the predetermined process.
 Specifically, the feature extraction layer 1311 extracts a feature vector from the image data 41 of the learning data for supervised learning. The classification layer 1321 of the second task classification unit then classifies the image data 41 based on the feature vector. The second parameter update unit 133 updates the parameters of the DNN model of the feature extraction layer 1311 and the parameters of the NN of the classification layer 1321, based on the loss between the class predicted by the classification layer 1321 and the class indicated by the teacher data 42. The second task learning unit 13 outputs the NN and DNN whose parameters have been updated by learning (second trained DNN 17) to the image processing device 20. The second task learning unit 13 may perform any processing as the predetermined process (subsequent task, second task). In the example of FIG. 4, the second task learning unit 13 performs supervised learning, but learning that does not use teacher labels, such as reinforcement learning, may also be used.
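 A minimal sketch of this second task is shown below, under the assumption that the pre-trained shared feature extractor from the FirstTaskNetwork sketch is reused and a classification head is trained with ordinary supervised cross entropy, updating both the extractor and the head as described above. The 100-class head anticipates the CIFAR-100 experiment below; all names are illustrative.

```python
# A sketch of the second task: reuse theta_sh, train a supervised classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_second_network(pretrained, feature_dim=256, num_classes=100):
    # The shared feature extraction layer is transferred as-is.
    return nn.Sequential(pretrained.features,
                         nn.Linear(feature_dim, num_classes))

def second_task_step(model, optimizer, images, labels):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), labels)   # supervised loss
    loss.backward()                                 # updates extractor + head
    optimizer.step()
    return loss.item()
```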
 In this way, in the first task, which performs conversions other than Rotation, the learning device 10 acquires features other than those obtained from Rotation. The learning device 10 uses data conversions other than Rotation simultaneously with Rotation and performs multitask learning in the first task. As a result, the learning device 10 can set, in the first task, a first trained DNN 16 that appropriately captures the features of a plurality of conversions of the image data, not just Rotation, and can therefore raise the accuracy of the classification task in the second task.
 Note that FIGS. 4 and 5 show one example; in the first task, it suffices to execute two or more of Rotation, ShearX, ShearY, Solarize, Brightness, Color, Contrast, and Sharpness in accordance with the processing content of the image processing device 20 and perform multitask learning on them.
[Image processing procedure]
 Next, the flow of the image processing performed by the image processing system 1 according to the embodiment will be described. FIG. 6 is a flowchart showing the processing procedure of the image processing performed by the image processing system 1 according to the embodiment.
 As shown in FIG. 6, in the image processing system 1, the learning device 10 performs a learning process of learning the features of image data for learning in order to set the parameters of the DNN model used by the image processing device 20 (step S1). The image processing device 20 then performs an image analysis process of classifying image data using the second trained DNN 17 whose parameters were set by the learning process of the learning device 10 (step S2).
[Procedure of learning process]
 Next, the procedure of the learning process (step S1) will be described. FIG. 7 is a flowchart showing the processing procedure of the learning process shown in FIG. 6.
 As shown in FIG. 7, the learning device 10 initializes the learning step counter and executes the first learning step. Specifically, the learning data generation unit 14 of the self-supervised learning unit 11 generates self-teacher data based on the input image data 30 for learning and applies a plurality of conversions to the image data for learning (step S11).
 Subsequently, in the feature extraction unit 151, the feature extraction layer 1511, in which the shared parameters θ_sh are set, performs feature extraction processing to extract a feature vector from the converted image data 311 (step S12). The first task classification unit 152 performs first task classification processing: using the pre-training NNs corresponding to the respective conversions, it classifies each conversion of the image data based on the feature vector extracted by the feature extraction unit 151 (step S13).
 The first parameter update unit 153 computes the loss and gradient from the classification result of each NN of the first task classification unit 152 (step S14). Using Equation (1), the first parameter update unit 153 computes, for each NN, the loss between the class output by the NN and the class indicated by the self-teacher data. The first parameter update unit 153 then updates the parameters of each NN by backpropagation based on the computed per-NN losses and gradients (step S15).
 Subsequently, the first parameter update unit 153 computes the weights from the loss and gradient of each NN based on Equation (2) (step S16); the Frank-Wolfe method is used here. The first parameter update unit 153 then computes the overall loss and gradient of the shared parameters of the feature extraction layer 1511 and the parameters of each NN (step S17), and updates the shared parameters θ_sh by backpropagation of the overall loss (step S18).
 The first parameter update unit 153 determines whether the number of learning steps is smaller than the maximum number of learning steps (step S19). If so (step S19: Yes), 1 is added to the learning step counter, and the first parameter update unit 153 returns to step S11 and executes the first task on the next learning data.
 On the other hand, if the number of learning steps is not smaller than the maximum number of learning steps (step S19: No), that is, if the number of learning steps has reached the maximum, the DNN model with updated parameters (first trained DNN 16) is output to the second task learning unit 13. The second task learning unit 13 then reuses the first trained DNN 16 and learns the second task (classification of image data) (step S20). The second task learning unit 13 outputs the DNN whose parameters have been updated by learning (second trained DNN 17) to the image processing device 20. The learning of the second task learning unit 13 may use a general method for learning the second task; since the second task is described here as class classification, a general method for learning class classification with a neural network may be used.
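 Tying the pieces together, one first-task learning step (steps S12 to S18) might look like the following sketch, which assumes the FirstTaskNetwork, per_task_losses, and frank_wolfe_weights sketches above. As a simplification relative to the separate per-head updates described earlier, the α-weighted overall loss is backpropagated once, updating both the heads and the shared parameters.

```python
# A sketch of one first-task learning step, assuming the earlier sketches.
import torch

def first_task_step(model, optimizer, images, targets):
    outputs = model(images)                      # S12-S13: extract + classify
    losses = per_task_losses(outputs, targets)   # S14: per-task CE losses

    # S16: per-task gradients with respect to the shared parameters only.
    shared = list(model.features.parameters())
    grads = []
    for loss in losses.values():
        g = torch.autograd.grad(loss, shared, retain_graph=True)
        grads.append(torch.cat([x.flatten() for x in g]))
    alpha = frank_wolfe_weights(grads)           # Eq. (2) weights

    # S15, S17-S18: backpropagate the weighted overall loss.
    total = sum(a * losses[t] for a, t in zip(alpha, losses))
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()
```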
[Evaluation experiment]
 An evaluation experiment was performed with the second task set to image class classification, and the classification accuracy in the second task was determined for a DNN model trained using the pre-training method according to the present embodiment and for a DNN trained using the pre-training method described in Non-Patent Document 1.
 The experimental settings are as follows. As the image dataset, CIFAR-100 was used: a dataset for supervised learning consisting of pairs of images and teacher information (50,000 images for training and 10,000 images for testing, with teacher information). The second task classifies the CIFAR-100 image data into 100 classes. For both the first task and the second task, 45,000 images of the training dataset were used during learning, and cross-validation was performed using, as the validation dataset, 5,000 images whose classes were evenly divided. The teacher information of the dataset was not used during first-task learning. At test time for the second task (image analysis), which performs class classification, the 10,000 images of the test dataset were used. Wide-ResNet-40-10 (see Reference 3) was used for feature extraction in the feature extraction unit 151 and the feature extraction unit 131.
Reference 3: Sergey Zagoruyko and Nikos Komodakis, "Wide Residual Networks", arXiv preprint arXiv:1605.07146 (BMVC 2016).
 The experimental procedure is as follows. In the evaluation experiment, the feature extraction layer of the feature extraction unit 151 is trained as the first task (100 epochs). In the first task, the learning method according to the present embodiment pre-trains the DNN model using image data converted with Rotation, Sharpness, and Solarize, while the learning method described in Non-Patent Document 1 pre-trains the DNN model using image data converted with Rotation only. The parameters of the trained feature extraction layer are then fixed, and as the learning of the second task, class classification is trained with a linear regression model (5,000 iterations). In the second task, the test classification accuracy (Top-1 accuracy) is measured with the trained linear regression model. The second task was run three times for each condition, and the mean and standard deviation were obtained.
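 A sketch of this evaluation protocol is given below, under the assumption that the linear model is trained with cross entropy on frozen features; the optimizer, learning rate, and names are illustrative rather than the exact experimental settings.

```python
# A sketch of the linear-probe evaluation: frozen extractor, linear head.
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_probe(features: nn.Module, loader, feature_dim=256,
                 num_classes=100, iterations=5000, lr=1e-3, device="cpu"):
    features.eval()
    for p in features.parameters():
        p.requires_grad_(False)                  # fix pre-trained parameters
    clf = nn.Linear(feature_dim, num_classes).to(device)
    opt = torch.optim.Adam(clf.parameters(), lr=lr)
    it = 0
    while it < iterations:
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                z = features(x)                  # frozen feature vectors
            loss = F.cross_entropy(clf(z), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            it += 1
            if it >= iterations:
                break
    return clf

def top1_accuracy(features, clf, loader, device="cpu"):
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            pred = clf(features(x.to(device))).argmax(dim=1)
            correct += (pred == y.to(device)).sum().item()
            total += y.numel()
    return correct / total
```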
 FIG. 8 shows the classification accuracy of the DNN model trained using the learning method according to the embodiment and of the DNN trained using the learning method described in Non-Patent Document 1. As shown in FIG. 8, the classification accuracy of the DNN model trained using the learning method described in Non-Patent Document 1 (see Rotation in FIG. 8) was 42.97%, whereas the classification accuracy of the DNN model trained using the learning method according to the embodiment (see Rotation+Solarize+Sharpness in FIG. 8) was 48.88%. It was thus confirmed that the embodiment can achieve classification accuracy exceeding that of the conventional method. In the present embodiment, adding other conversion predictions in addition to Rotation realizes an even more robust classifier.
[Effects of the embodiment]
 In the conventional method, only Rotation was used for self-supervised learning in the pre-task (first task), so there was the problem that the improvement in prediction performance of the subsequent task (second task) was small for datasets whose data structure makes the rotation angle easy to predict or for which rotation cannot be defined.
 In contrast, in the present embodiment, a plurality of types of conversion are applied to a single piece of image data in the first task, and self-supervised learning is performed as multitask learning, so that various features of the conversion contents, relating not only to geometry but also to color tone, edges, and the like, can be acquired. In the first task, self-supervised learning combines Rotation with data conversions from which useful features can be acquired, so the accuracy of the second task is higher than with a single conversion alone. Therefore, according to the present embodiment, the accuracy of image processing can be improved by capturing the features of images. Furthermore, since learning is possible with almost the same scheme as conventional self-supervised learning, additional implementation is easy.
 In the present embodiment, since multiple conversions are applied to a single piece of image data in the first task, an effect similar to data augmentation is obtained. Specifically, the accuracy is higher than when inputs each subjected to a single conversion are used. Moreover, since the input from the feature extraction unit 151 to the first task classification unit 152 is shared, an approximation can be used when updating the gradients, improving computational efficiency by tens of times.
 In the present embodiment, the conversions applied to the image data in the first task are not limited to Rotation; two or more conversions can be selected from ShearX, ShearY, Solarize, Brightness, Color, Contrast, and Sharpness according to the processing content of the second task. Although the present embodiment has been described for the case where the second task is class classification, the second task is not limited to this and may be any task that takes an image as input and produces some output, such as object detection or segmentation. For example, in the present embodiment, when the second task is segmentation, the input image data is converted in the first task in the order Solarize, Color, Sharpness. In this way, a high effect can be expected in the embodiment by combining conversions with different properties and learning the conversion contents from multiple perspectives.
[System configuration, etc.]
 The components of the illustrated devices are functional concepts and do not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to the illustrated one; all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or any part of the processing functions performed by each device can be realized by a CPU and a program analyzed and executed by the CPU, or realized as hardware by wired logic.
 Among the processes described in the present embodiment, all or part of the processes described as being performed automatically can also be performed manually, and all or part of the processes described as being performed manually can also be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified. That is, the processes described for the above learning method are not only executed in time series in the order described, but may also be executed in parallel or individually according to the processing capability of the device executing the processes or as needed.
[Program]
 FIG. 9 is a diagram showing an example of a computer that realizes the learning device 10 or the image processing device 20 by executing a program. The computer 1000 has, for example, a memory 1010 and a CPU 1020. The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.
 The memory 1010 includes a ROM 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1031. The disk drive interface 1040 is connected to a disk drive 1100; a removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.
 The hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the learning device 10 or the image processing device 20 is implemented as a program module 1093 in which code executable by the computer 1000 is written. The program module 1093 is stored, for example, in the hard disk drive 1031; for example, a program module 1093 for executing processing similar to the functional configuration of the learning device 10 or the image processing device 20 is stored in the hard disk drive 1031. The hard disk drive 1031 may be replaced by an SSD (Solid State Drive).
 The setting data used in the processing of the embodiment described above is stored as program data 1094, for example, in the memory 1010 or the hard disk drive 1031. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1031 into the RAM 1012 as needed and executes them.
 The program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1031; for example, they may be stored in a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.) and read by the CPU 1020 via the network interface 1070.
 Although an embodiment to which the invention made by the present inventors is applied has been described above, the present invention is not limited by the description and drawings that form part of this disclosure. That is, other embodiments, examples, operational techniques, and the like made by those skilled in the art based on this embodiment are all included within the scope of the present invention.
1 Image processing system
10 Learning device
11 Self-supervised learning unit
13 Second task learning unit
14 Learning data generation unit
15 First task learning unit
16 First trained DNN
17 Second trained DNN
20 Image processing device
21 Analysis unit
31, 41 Image data
32 Self-teacher data
42 Teacher data
131, 151 Feature extraction unit
132 Second task classification unit
133 Second parameter update unit
141 Self-teacher data generation unit
142 Data conversion unit
152 First task classification unit
153 First parameter update unit

Claims (8)

  1. A learning device comprising:
     a first learning unit that takes, as input, image data to which a plurality of conversions have been applied and performs multitask learning with a first neural network so as to estimate the conversion content for each conversion type of the conversions, thereby updating parameters including shared parameters of a feature extraction layer of the first neural network; and
     a second learning unit that takes arbitrary image data as input and learns parameters of a second neural network, in which the shared parameters learned by the first learning unit are applied to a feature extraction layer, so that the second neural network performs a predetermined process,
     wherein the first neural network has the feature extraction layer and a plurality of pre-training neural networks corresponding to the respective conversion types of the plurality of conversions, and
     the plurality of pre-training neural networks share the feature extraction layer and estimate the conversion content for each conversion type.
  2. The learning device according to claim 1, wherein the first learning unit comprises:
     a conversion unit that applies a plurality of conversions to the input image data and generates self-teacher data according to the conversion content for each conversion type of the conversions;
     a feature extraction unit that has the feature extraction layer and extracts features, based on the shared parameters, from the image data to which the plurality of conversions have been applied;
     an estimation unit that has the plurality of pre-training neural networks and estimates the conversion content of each conversion type for the image data based on the features extracted by the feature extraction unit; and
     a first parameter update unit that performs the multitask learning on the first neural network based on the conversion contents estimated by the estimation unit and the self-teacher data, and updates the shared parameters.
  3. The learning device according to claim 2, wherein the conversion unit applies, to the input image data, the plurality of conversions whose conversion properties differ.
  4. The learning device according to claim 2 or 3, wherein the conversion unit sets the plurality of conversions to be executed on the input image data according to the content of the predetermined process.
  5. The learning device according to any one of claims 1 to 4, wherein the conversion types are any two or more of rotation, horizontal or vertical shear, inversion, brightness adjustment, color tone adjustment, contrast adjustment, and edge sharpening.
6.  An image processing device comprising a processing unit that performs predetermined processing on input image data using a model having a neural network in which trained parameters are set,
    wherein the trained parameters are based on shared parameters of a feature extraction layer of a first neural network, the shared parameters having been updated by receiving, as input, image data subjected to a plurality of conversions and performing multitask learning with the first neural network so as to estimate the conversion content for each conversion type of the conversions,
    the first neural network has the feature extraction layer and a plurality of pre-training neural networks respectively corresponding to the conversion types of the plurality of conversions, and
    the plurality of pre-training neural networks share the feature extraction layer and estimate the conversion content for each conversion type.
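    As an informal reading of claim 6, the trained shared parameters can be carried into the downstream model by reusing the pre-trained trunk and attaching a head for the predetermined processing. The 64-dimensional feature size and all names continue the assumptions of the earlier sketches.

    import torch.nn as nn

    class SecondNetwork(nn.Module):
        """Illustrative second neural network for the predetermined processing."""
        def __init__(self, pretrained_trunk: nn.Module, num_classes: int):
            super().__init__()
            self.trunk = pretrained_trunk           # shared parameters applied here
            self.head = nn.Linear(64, num_classes)  # task-specific output layer

        def forward(self, x):
            return self.head(self.trunk(x))

    # e.g. model = SecondNetwork(net.trunk, num_classes=10)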
7.  A learning method executed by a learning device, the learning method comprising:
    a first learning step of receiving, as input, image data subjected to a plurality of conversions and updating parameters, including shared parameters of a feature extraction layer of a first neural network, by performing multitask learning with the first neural network so as to estimate the conversion content for each conversion type of the conversions; and
    a second learning step of receiving arbitrary image data as input and learning parameters of a second neural network so as to perform predetermined processing, using the second neural network in which the shared parameters learned in the first learning step are applied to a feature extraction layer,
    wherein the first neural network has the feature extraction layer and a plurality of pre-training neural networks respectively corresponding to the conversion types of the plurality of conversions, and
    the plurality of pre-training neural networks share the feature extraction layer and estimate the conversion content for each conversion type.
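    Putting the two steps of claim 7 together, a minimal training loop might run the self-supervised pre-training first and then fine-tune on the labeled task. Loader formats, optimizers, epoch counts, and learning rates below are editorial assumptions; first_learning_step, FirstNetwork, and SecondNetwork are the hypothetical helpers from the earlier sketches.

    import torch

    def learning_method(net, model, pretrain_loader, task_loader, epochs=10):
        # First learning step: multitask self-supervised pre-training.
        opt1 = torch.optim.Adam(net.parameters(), lr=1e-3)
        for _ in range(epochs):
            for images, _ in pretrain_loader:   # labels unused; self-teacher data is generated
                first_learning_step(net, opt1, images)

        # Second learning step: supervised learning with the shared parameters applied.
        opt2 = torch.optim.Adam(model.parameters(), lr=1e-4)
        loss_fn = torch.nn.CrossEntropyLoss()
        for _ in range(epochs):
            for x, y in task_loader:
                opt2.zero_grad()
                loss_fn(model(x), y).backward()
                opt2.step()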
8.  A learning program for causing a computer to execute:
    a first learning step of receiving, as input, image data subjected to a plurality of conversions and updating parameters, including shared parameters of a feature extraction layer of a first neural network, by performing multitask learning with the first neural network so as to estimate the conversion content for each conversion type of the conversions; and
    a second learning step of receiving arbitrary image data as input and learning parameters of a second neural network so as to perform predetermined processing, using the second neural network in which the shared parameters learned in the first learning step are applied to a feature extraction layer,
    wherein the first neural network has the feature extraction layer and a plurality of pre-training neural networks respectively corresponding to the conversion types of the plurality of conversions, and
    the plurality of pre-training neural networks share the feature extraction layer and estimate the conversion content for each conversion type.
PCT/JP2019/037552 2019-09-25 2019-09-25 Learning device, image processing device, learning method, and learning program WO2021059388A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2019/037552 WO2021059388A1 (en) 2019-09-25 2019-09-25 Learning device, image processing device, learning method, and learning program

Publications (1)

Publication Number Publication Date
WO2021059388A1 true WO2021059388A1 (en) 2021-04-01

Family

ID=75164884

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/037552 WO2021059388A1 (en) 2019-09-25 2019-09-25 Learning device, image processing device, learning method, and learning program

Country Status (1)

Country Link
WO (1) WO2021059388A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022249415A1 (en) * 2021-05-27 2022-12-01 日本電信電話株式会社 Information provision device, information provision method, and information provision program
WO2023000872A1 (en) * 2021-07-22 2023-01-26 腾讯科技(深圳)有限公司 Supervised learning method and apparatus for image features, device, and storage medium
WO2023139760A1 (en) * 2022-01-21 2023-07-27 日本電気株式会社 Data augmentation device, data augmentation method, and non-transitory computer-readable medium
WO2023181222A1 (en) * 2022-03-23 2023-09-28 日本電信電話株式会社 Training device, training method, and training program
WO2023238258A1 (en) * 2022-06-07 2023-12-14 日本電信電話株式会社 Information provision device, information provision method, and information provision program

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002300417A (en) * 2001-03-30 2002-10-11 Fuji Photo Film Co Ltd Method for setting image processing conditions and image processor


Similar Documents

Publication Publication Date Title
WO2021059388A1 (en) Learning device, image processing device, learning method, and learning program
US10289909B2 (en) Conditional adaptation network for image classification
US10909455B2 (en) Information processing apparatus using multi-layer neural network and method therefor
US8331655B2 (en) Learning apparatus for pattern detector, learning method and computer-readable storage medium
US11270124B1 (en) Temporal bottleneck attention architecture for video action recognition
CN111507993A (en) Image segmentation method and device based on generation countermeasure network and storage medium
EP3664019A1 (en) Information processing device, information processing program, and information processing method
US9443287B2 (en) Image processing method and apparatus using trained dictionary
JP6943291B2 (en) Learning device, learning method, and program
US20220092407A1 (en) Transfer learning with machine learning systems
WO2020260862A1 (en) Facial behaviour analysis
CN113128478B (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN109242097B (en) Visual representation learning system and method for unsupervised learning
US20190362226A1 (en) Facilitate Transfer Learning Through Image Transformation
EP4200762A1 (en) Method and system for training a neural network model using gradual knowledge distillation
Wang et al. JPEG artifacts removal via contrastive representation learning
Radman et al. BiLSTM regression model for face sketch synthesis using sequential patterns
JPWO2016125500A1 (en) Feature conversion device, recognition device, feature conversion method, and computer-readable recording medium
WO2017188048A1 (en) Preparation apparatus, preparation program, and preparation method
JP2010009517A (en) Learning equipment, learning method and program for pattern detection device
CN114492581A (en) Method for classifying small sample pictures based on transfer learning and attention mechanism element learning application
CN113723587A (en) Differential learning for learning networks
CN110717402B (en) Pedestrian re-identification method based on hierarchical optimization metric learning
JP2010086466A (en) Data classification device and program
US20230073175A1 (en) Method and system for processing image based on weighted multiple kernels

Legal Events

Code  Title / Description
121   EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19946382; Country of ref document: EP; Kind code of ref document: A1)
NENP  Non-entry into the national phase (Ref country code: DE)
122   EP: PCT application non-entry in European phase (Ref document number: 19946382; Country of ref document: EP; Kind code of ref document: A1)
NENP  Non-entry into the national phase (Ref country code: JP)