WO2023124278A1 - Training method for image processing model, image classification method and apparatus - Google Patents
Training method for image processing model, image classification method and apparatus
- Publication number
- WO2023124278A1 (PCT/CN2022/120011)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- training
- image processing
- original
- sample pairs
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
Definitions
- the present application relates to the field of machine learning, in particular to an image processing model training method, image classification method and device.
- a large number of training samples can be used to train an image processing model to ensure better performance of the trained image processing model.
- the present application provides an image processing model training method, an image classification method, and a device. The technical solution is as follows:
- a method for training an image processing model comprising:
- each of the original image sets includes multiple original images of the same category, and the original images included in different original image sets are of different categories;
- the training sample set includes a plurality of training samples, each of which is an original image or a sub-image obtained by cropping an original image;
- each of the positive sample pairs includes two training samples obtained based on different original images in the same original image set, and each of the negative sample pairs includes two training samples obtained based on original images in different original image sets;
- An image processing model is trained by using the plurality of positive sample pairs and the plurality of negative sample pairs.
- for each original image used for cropping in the plurality of original image sets: randomly generate a cropping size within a target size range; determine a reference point of the cropping area based on the size of the original image and the cropping size; determine the cropping area in the original image based on the cropping size and the reference point; and crop the cropping area.
- the target size range includes a width range and a height range
- the cropping size includes a width within the width range, and a height within the height range
- the cropping area is a rectangular area
- the reference point of the cropping area is a vertex of the rectangular area, or the center point of the rectangular area.
- each of the candidate sample pairs includes two training samples obtained based on different original images in the same original image set; determine the similarity of each candidate sample pair; and determine candidate sample pairs whose similarity is greater than a similarity threshold as positive sample pairs.
- a convolutional neural network is used to extract the feature vector of each training sample in each candidate sample pair; for each candidate sample pair, a similarity measurement algorithm is used to process the feature vectors of the two training samples in the pair to obtain the similarity of the candidate sample pair.
- a plurality of negative sample pairs, equal in number to the plurality of positive sample pairs, is determined from the training sample set.
- mark the true value of each positive sample pair as 1 and the true value of each negative sample pair as 0; the image processing model is then trained using the marked positive sample pairs and the marked negative sample pairs.
- an image classification method comprising:
- the target image is input into an image classification model to obtain the category of the target image output by the image classification model; wherein, the image classification model is trained by using the training method of the image processing model described in the above aspect.
- the target image is input into the image classification model to obtain the similarity between the target image output by the image classification model and reference images of different categories;
- the category of the reference image with the highest similarity is determined as the category of the target image.
- the target image is input into the image classification model to obtain the similarity between the target image output by the image classification model and image features of different categories;
- the category of the image feature with the highest similarity is determined as the category of the target image; wherein, the image feature of each category is obtained by performing feature extraction on a plurality of training samples of the category.
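As an illustration of the classification rule above (not part of the patent text), the following sketch picks the category whose stored feature is most similar to the target image's feature. The feature dictionary, the `classify` helper, and the choice of cosine similarity are all assumptions:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(target_feature, category_features):
    # Return the category whose stored feature is most similar to the target.
    best_category, best_sim = None, -1.0
    for category, feature in category_features.items():
        sim = cosine_similarity(target_feature, feature)
        if sim > best_sim:
            best_category, best_sim = category, sim
    return best_category

# Hypothetical per-category features, e.g. the mean feature
# extracted from each category's training samples.
features = {"cat": np.array([1.0, 0.0]), "dog": np.array([0.0, 1.0])}
print(classify(np.array([0.9, 0.1]), features))  # -> cat
```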
- a training device for an image processing model comprising:
- An acquisition module configured to acquire a plurality of original image sets, each of which includes multiple original images of the same category, and the categories of original images included in different original image sets are different;
- a cropping module configured to crop a plurality of original images in the plurality of original image sets to obtain a training sample set, where the training sample set includes a plurality of training samples, each of which is an original image or a sub-image obtained by cropping an original image;
- a determining module configured to determine a plurality of positive sample pairs and a plurality of negative sample pairs from the training sample set, wherein each of the positive sample pairs includes two training samples obtained based on different original images in the same original image set, Each negative sample pair includes two training samples obtained based on original images in different original image sets;
- the training module is used for training an image processing model by using the plurality of positive sample pairs and the plurality of negative sample pairs.
- the clipping module is used for:
- the target size range includes a width range and a height range
- the cropping size includes a width within the width range, and a height within the height range
- the cropping area is a rectangular area
- the reference point of the cropping area is a vertex of the rectangular area, or the center point of the rectangular area.
- the determination module is used for:
- each of the candidate sample pairs includes two training samples obtained based on different original images in the same original image set;
- the candidate sample pairs whose similarity is greater than the similarity threshold are determined as positive sample pairs.
- the determination module is used for:
- a similarity measurement algorithm is used to process the feature vectors of two training samples in the pair of candidate samples to obtain the similarity of the pair of candidate samples.
- the determining module is configured to determine a plurality of negative sample pairs whose number is the same as the number of the plurality of positive sample pairs from the training sample set.
- the training module is used for:
- An image processing model is trained by using the multiple marked positive sample pairs and the multiple marked negative sample pairs.
- an image classification device comprising:
- an acquisition module configured to acquire the target image to be classified;
- a classification module configured to input the target image into an image classification model to obtain the category of the target image output by the image classification model; wherein the image classification model is trained by the training device for the image processing model described in the above aspect.
- the classification module is used for:
- the target image is input to the image classification model to obtain the similarity between the target image output by the image classification model and the reference images of different categories;
- the category of the reference image with the highest similarity to the target image among the reference images of different categories is determined as the category of the target image.
- the classification module is used for:
- the target image is input to the image classification model to obtain the similarity between the target image output by the image classification model and image features of different categories;
- the category of the image feature with the highest similarity to the target image is determined as the category of the target image; wherein the image feature of each category is obtained by performing feature extraction on a plurality of training samples of that category.
- in another aspect, an image processing device includes a processor and a memory; instructions are stored in the memory, and the instructions are loaded and executed by the processor to implement the training method of the image processing model described in the above aspect, or the image classification method described in the above aspect.
- a computer-readable storage medium is provided; instructions are stored in the storage medium, and the instructions are loaded and executed by a processor to implement the training method of the image processing model described in the above aspect, or the image classification method described in the above aspect.
- a computer program product includes computer instructions; the computer instructions are loaded and executed by a processor to implement the training method of the image processing model described in the above aspect, or the image classification method described in the above aspect.
- FIG. 1 is a schematic structural diagram of a training system for an image processing model provided in an embodiment of the present application
- FIG. 2 is a schematic flowchart of a training method for an image processing model provided in an embodiment of the present application
- Fig. 3 is a schematic flowchart of another image processing model training method provided by the embodiment of the present application.
- Fig. 4 is a schematic diagram of cropping an original image provided by an embodiment of the present application.
- Fig. 5 is a schematic flow chart of an image classification method provided by an embodiment of the present application.
- FIG. 6 is a schematic structural diagram of a training device for an image processing model provided in an embodiment of the present application.
- FIG. 7 is a schematic structural diagram of an image classification device provided in an embodiment of the present application.
- FIG. 8 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
- the image processing model training method provided in the embodiment of the present application can be applied to scenarios with a small number of samples (that is, small samples).
- the training method can also be called a small sample learning method.
- the goal of small sample learning is to achieve relatively good model training accuracy in the case of limited samples.
- improvements are generally made from three aspects: data (ie, training samples), models, and training algorithms.
- the training samples may be converted to obtain new training samples, thereby expanding the training sample set.
- new training samples can be obtained by converting unlabeled or non-standard samples, so as to expand the training sample set.
- the data in a similar data set of the training sample can be converted to obtain a new training sample, thereby expanding the training sample set.
- the model can be trained using multi-task learning, embedding learning and learning based on external memory.
- to optimize the training algorithm, methods such as refining existing parameters, refining meta-learning parameters, or learning an optimizer can be used.
- FIG. 1 is a schematic structural diagram of an image processing model training system provided by an embodiment of the present application.
- the system includes: a server 110 and a terminal 120 .
- a wired or wireless communication connection is established between the server 110 and the terminal 120 .
- the server 110 may be an independent physical server, or a server cluster or a distributed system composed of multiple physical servers.
- the terminal 120 may be a personal computer (PC), a tablet computer, a smartphone, a wearable device, an intelligent robot, or any other terminal capable of data calculation, processing, and storage.
- the terminal device 120 in the system can be used to obtain the original image sets used for model training, and send them to the server 110.
- the server 110 can further process the original images in the original image set, and use the processed original images as training samples to train the image processing model.
- the trained image processing model can be applied to image classification tasks, image recognition tasks, or image segmentation tasks.
- the system may be a system capable of performing specific image processing tasks, such as image classification tasks.
- the terminal device 120 in the system can be used to acquire the original image to be detected, and send the original image to be detected to the server 110 for detection.
- the image processing model that has been trained is pre-stored in the server 110 .
- after the server 110 acquires the original image to be detected, it can input the original image into the image processing model, which further detects and recognizes the original image and outputs a detection result. The server 110 may then send the detection result to the terminal 120.
- FIG. 2 is a flowchart of a method for training an image processing model provided by an embodiment of the present application.
- the method can be applied to an image processing device, and the image processing device can be the server 110 shown in FIG. 1 .
- the method includes:
- Step 101 acquiring multiple original image sets.
- the image processing device can obtain multiple original image sets stored in it in advance, or can obtain multiple original image sets sent by other devices (such as terminals).
- each original image set includes multiple original images of the same category, and different original image sets include different categories of original images. It can be understood that the categories of all the original images included in each original image set are the same, that is, each original image set only includes original images of one category.
- the category of each original image in the plurality of original image sets may be manually marked, and the category may refer to the category of the main object in the original image.
- Step 102 Crop multiple original images in multiple original image sets to obtain a training sample set.
- the image processing device can crop multiple original images in each original image set to obtain a training sample set.
- the image processing device can crop each original image in each original image set.
- the training sample set obtained by cropping multiple original images by the image processing device includes multiple training samples, and each training sample is an original image, or a sub-image obtained by cropping an original image.
- the number of sub-images obtained by cropping an original image may be greater than or equal to 1.
- any two sub-images cropped from an original image have different sizes and/or positions in the original image.
- the size and/or position of each sub-image in the original image may be randomly determined by the image processing device, or may be pre-configured in the image processing device.
- the category of any sub-image obtained after cropping an original image is the same as the category of the original image.
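The sample-set construction in steps 101 and 102 can be sketched as follows. The dictionary layout, the `build_training_samples` name, and the fixed number of crops per image are illustrative assumptions, not taken from the patent:

```python
import random

def build_training_samples(image_sets, crops_per_image=2, rng=None):
    """Build a training sample set from original image sets.

    `image_sets` maps a category to a list of original images (plain 2-D
    lists standing in for pixel arrays). Each original image, and each
    random sub-image cropped from it, becomes one training sample that
    inherits the original image's category.
    """
    rng = rng or random.Random(0)
    samples = []
    for category, images in image_sets.items():
        for img_id, image in enumerate(images):
            h, w = len(image), len(image[0])
            samples.append({"category": category, "origin": img_id, "image": image})
            for _ in range(crops_per_image):
                ch = rng.randint(1, h)        # random crop height
                cw = rng.randint(1, w)        # random crop width
                top = rng.randint(0, h - ch)  # random position, fully inside the image
                left = rng.randint(0, w - cw)
                crop = [row[left:left + cw] for row in image[top:top + ch]]
                samples.append({"category": category, "origin": img_id, "image": crop})
    return samples
```

With `crops_per_image = T`, each original image yields T + 1 training samples, matching the count discussed later in the description.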
- Step 103 determining multiple positive sample pairs and multiple negative sample pairs from the training sample set.
- each negative sample pair includes two training samples based on original images in different original image sets.
- Each positive sample pair includes two training samples based on different original images in the same original image set. That is, the image processing device can determine two training samples of the same category but from different original images as a positive sample pair, and can determine two training samples of different categories as a negative sample pair.
- the two training samples in each positive sample pair have the same category, it can be ensured that the two training samples in each positive sample pair include some or all image features of the subject object of the same category. Thus, it can be ensured that after the image processing model is trained based on the positive sample pair, the image processing model can accurately learn the features of the subject object of this category.
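Step 103 can be sketched as below, assuming each training sample records its category and the original image it came from (the `make_pairs` helper and the dict keys are hypothetical):

```python
from itertools import combinations

def make_pairs(samples):
    """Split all sample pairs into positive and negative pairs.

    A positive pair holds two samples of the same category derived from
    different original images; a negative pair holds two samples of
    different categories. Pairs derived from the same original image
    are discarded.
    """
    positives, negatives = [], []
    for a, b in combinations(samples, 2):
        if a["category"] == b["category"]:
            if a["origin"] != b["origin"]:
                positives.append((a, b))
        else:
            negatives.append((a, b))
    return positives, negatives
```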
- Step 104 using multiple positive sample pairs and multiple negative sample pairs to train an image processing model.
- an initial image processing model is pre-stored in the image processing device.
- the image processing model may be a convolutional neural network (CNN) model.
- the image processing model may be trained using the plurality of positive sample pairs and an equal number of negative sample pairs.
- the image processing device may stop training the image processing model when the precision of the image processing model reaches a preset precision, or the number of training rounds of the image processing model reaches a preset number of rounds.
- the two training samples in each positive sample pair used to train the image processing model have the same category, while the two training samples in each negative sample pair have different categories; this ensures that the trained image processing model can better learn the features of images of different categories, that is, the features of different categories of subject objects.
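The method marks positive pairs with true value 1 and negative pairs with 0. One common way to train on such labels (an assumption here, the patent does not name a loss) is binary cross-entropy on a similarity score in [0, 1]:

```python
import numpy as np

def bce_loss(predicted, label):
    # Binary cross-entropy for one pair: label 1 = positive pair, 0 = negative pair.
    eps = 1e-7
    p = np.clip(predicted, eps, 1 - eps)
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

def train_epoch(model_similarity, positive_pairs, negative_pairs):
    """One evaluation pass over labeled pairs.

    `model_similarity` is a stand-in for the image processing model: it
    maps a pair of samples to a similarity in [0, 1]. Positive pairs
    carry the true value 1 and negative pairs the true value 0; the
    parameter update itself is omitted for brevity.
    """
    labeled = [(pair, 1.0) for pair in positive_pairs] + \
              [(pair, 0.0) for pair in negative_pairs]
    total = sum(bce_loss(model_similarity(a, b), label)
                for (a, b), label in labeled)
    return total / len(labeled)
```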
- the embodiment of the present application provides a method for training an image processing model.
- the training method can effectively expand the number of training samples by cropping a plurality of original images, so as to ensure a better effect of the image processing model obtained through training.
- the categories of the two training samples in each positive sample pair for training the image processing model are the same, while the categories of the two training samples in the negative sample pair are different.
- the trained image processing model can better learn the features of images of different categories, thereby further improving the effect of the image processing model.
- FIG. 3 is a flow chart of another image processing model training method provided by an embodiment of the present application.
- the method can be applied to an image processing device, and the image processing device can be the server 110 shown in FIG. 1 .
- the method includes:
- Step 201 Acquire multiple original image sets.
- the image processing device may acquire multiple original image sets stored in it in advance, or may acquire multiple original image sets sent by other devices (such as terminals).
- each original image set includes multiple original images of the same category, and different original image sets include different categories of original images.
- the category of each original image in the plurality of original image sets may be manually marked, and the category may refer to the category of the main object in the original image.
- the number of original images included in different original image sets may be the same or different.
- the category of the original image may be an animal category, a plant category, a food category, or a furniture category; this embodiment of the present application does not limit it.
- for example, if the image processing model to be trained is a model for identifying species categories of animals, multiple original image sets of different species may be pre-stored in the image processing device.
- the original image set of each species includes multiple original images of the species.
- Step 202 for each original image used for cropping in the plurality of original image sets, randomly generate a cropping size within the target size range.
- after the image processing device acquires multiple original image sets, it can randomly generate a cropping size within the target size range for each original image used for cropping, so that a sub-image can be cropped from the original image based on each generated cropping size.
- the target size range may be pre-stored by the image processing device, and the target size range may be determined based on the size of an original image. For example, the upper limit of the target size range may be equal to the size of an original image.
- the cropped area may be a rectangular area.
- the target size range may include a width range and a height range, and the crop size includes a width within the width range and a height within the height range.
- the clipping area may also be an area of other shapes, and correspondingly, the target size range may include ranges of other parameters.
- the target size range may be a range of radius or diameter.
- suppose a certain original image set D includes K original images. For the k-th original image I_k among the K original images (k is an integer not greater than K), the image processing device can randomly generate T cropping sizes (T is an integer greater than 1); that is, it may perform T crops on the original image I_k to obtain T sub-images.
- the width w_t of the t-th cropping size among the T cropping sizes (t is an integer not greater than T) satisfies: W_min ≤ w_t ≤ W_max;
- the height h_t satisfies: H_min ≤ h_t ≤ H_max.
- Step 203 Determine the reference point of the cropping area based on the size of the original image and the cropping size.
- the image processing device may also determine the reference point of the cropping area based on the size of the original image to be cropped and a randomly generated cropping size.
- the reference point is used to determine the position of the cropping area in the original image to be cropped.
- the reference point determined by the image processing device must ensure that the cropping area lies entirely within the original image to be cropped.
- the cropping area may be a rectangular area, and the reference point of the cropping area may be a vertex of the rectangular area (for example, the upper-left vertex), or the center point of the rectangular area.
- if the cropping area is circular, the reference point of the cropping area may be the center of the circle.
- Step 204 Determine a cropping area in the original image based on the cropping size and the reference point, and crop the cropping area.
- the image processing device can determine a cropping area in the original image based on the randomly generated cropping size and the determined reference point, and then crop that area to obtain a sub-image.
- the image processing device may determine multiple cropping regions based on the methods shown in steps 202 to 204 above.
- the multiple cropping regions are different in size and/or position in the original image.
- after the image processing device crops each original image, multiple sub-images can be obtained. It can be understood that the image processing device can use each original image as a training sample, and can also use each sub-image as a training sample.
- the number of sub-images obtained by cropping different original images by the image processing device may be the same or different. Assuming that the image processing device crops T sub-images in each original image, the image processing device can generate T+1 training samples based on each original image.
- for example, the image processing device may determine six cropping regions a1 to a6 in the original image. After the image processing device crops the six cropping regions, six sub-images can be obtained.
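Steps 202 to 204 can be combined into one sketch: draw a random cropping size from the target size range, then place the upper-left reference point so the rectangular cropping area stays inside the original image. The function name and the inclusive-range convention are assumptions:

```python
import random

def random_crop_box(img_w, img_h, w_range, h_range, rng=None):
    """Generate one cropping area for an image of size img_w x img_h.

    `w_range` and `h_range` are inclusive (min, max) tuples, i.e. the
    target size range W_min..W_max and H_min..H_max. Returns the
    upper-left reference point and the cropping size.
    """
    rng = rng or random.Random()
    w = rng.randint(w_range[0], min(w_range[1], img_w))  # W_min <= w <= W_max
    h = rng.randint(h_range[0], min(h_range[1], img_h))  # H_min <= h <= H_max
    left = rng.randint(0, img_w - w)  # reference point: upper-left vertex,
    top = rng.randint(0, img_h - h)   # chosen so the area fits inside the image
    return left, top, w, h
```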
- Step 205 determining a plurality of candidate sample pairs from the training sample set.
- a training sample set can be obtained.
- the training sample set includes a plurality of training samples, wherein each training sample is an original image, or a sub-image obtained by cropping an original image.
- the image processing device may determine a plurality of candidate sample pairs from the training sample set, wherein each candidate sample pair includes two training samples obtained based on different original images in the same original image set. That is, the two training samples included in each candidate sample pair are obtained based on two original images of the same category.
- Step 206 using CNN to extract the feature vector of each training sample in each candidate sample pair.
- each training sample in each candidate sample pair can be input into the CNN.
- CNN can perform feature extraction on the input training samples and calculate the feature vector of each training sample.
- the image processing device can first use a large number of labeled image data sets, such as the ImageNet data set, to train an initial CNN, so that the CNN can better extract the features of training samples and has a certain image classification ability.
- the basic structure of the CNN may include a convolutional layer, a pooling layer, and a fully connected layer.
- convolutional layers and pooling layers are alternately distributed.
- the convolution layer can extract the features of the training samples through convolution calculation, and the pooling layer can down-sample the training samples input to the CNN model, that is, shrink the training samples while retaining important information in the training samples.
- Fully connected layers classify images based on the image features determined by the convolutional layers.
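The down-sampling role of the pooling layer described above can be illustrated with a plain 2x2 max-pooling routine. This is a toy sketch on nested lists, assumed for illustration only, not the CNN implementation used by the model.

```python
def max_pool_2x2(img):
    """2x2 max pooling over a nested-list 'image': halves each spatial
    dimension while keeping the largest (most salient) value in each
    window. Any odd trailing row/column is dropped in this toy version.
    """
    return [[max(img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
             for j in range(0, len(img[0]) - 1, 2)]
            for i in range(0, len(img) - 1, 2)]
```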
- the fully connected layer of the CNN can be removed.
- the image processing device may input each training sample in each candidate sample pair to the CNN after removing the fully connected layer.
- CNN can perform feature extraction on the input training samples and calculate the feature vector of each training sample.
- Step 207: for each candidate sample pair, use a similarity measurement algorithm to process the feature vectors of the two training samples in the candidate sample pair to obtain the similarity of the candidate sample pair.
- the similarity measurement algorithm may include algorithms such as cosine distance (also called cosine similarity), Euclidean metric (also called Euclidean distance), and Bhattacharyya distance.
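As one concrete instance of such a similarity measurement algorithm, cosine similarity between two feature vectors can be computed as below. This is a minimal sketch; the application lists cosine similarity only as one of several options.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors: the dot product
    divided by the product of the vector norms, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)
```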
- Step 208: determine candidate sample pairs whose similarity is greater than the similarity threshold as positive sample pairs.
- the image processing device may determine a candidate sample pair whose similarity is greater than a similarity threshold among the plurality of candidate sample pairs as a positive sample pair.
- the similarity threshold may be a fixed value preconfigured in the image processing device.
- the image processing device may use a clustering (for example, K-means clustering) algorithm to cluster the multiple candidate sample pairs according to the similarity of each candidate sample pair.
- the image processing device may cluster the plurality of candidate sample pairs into two categories, and determine a category of candidate sample pairs with higher similarity as positive sample pairs.
- although the two training samples in each candidate sample pair come from two original images of the same category, the image features contained in the two training samples may differ considerably.
- for example, at least one of the two training samples may not contain the image features of the subject object of this category, or the two training samples may contain image features of different parts of the subject object.
- by calculating the similarity of the two training samples and determining candidate sample pairs whose similarity is greater than the similarity threshold as positive sample pairs, the similarity of the two training samples in each positive sample pair determined by the image processing device is relatively high; that is, the probability that both training samples include image features of the subject object of the same category is relatively high.
- using the multiple positive sample pairs to train the image processing model can ensure that the image processing model can better learn the features of multiple subject objects corresponding to the same category.
- assuming that the image processing device uses the K-means algorithm to determine positive sample pairs, it may use the similarity values 0.75 and 0.25 as two cluster centers. Afterwards, the image processing device can calculate the distance between the similarity value of each candidate sample pair and each of the two cluster centers, and assign the similarity value of each candidate sample pair to the nearest cluster center. Finally, the image processing device may determine the candidate sample pairs corresponding to the similarities clustered around the center 0.75 as positive sample pairs.
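The one-dimensional K-means split in this example can be sketched as follows. The initial centers 0.25 and 0.75 follow the example above; the fixed iteration count and function name are assumptions for illustration.

```python
def positive_pair_indices(similarities, centers=(0.25, 0.75), iters=10):
    """1-D K-means with two centers over the pair similarities.

    Returns the indices of the pairs assigned to the higher-similarity
    cluster, i.e. the pairs kept as positive sample pairs.
    """
    lo, hi = centers
    for _ in range(iters):
        lo_vals = [s for s in similarities if abs(s - lo) <= abs(s - hi)]
        hi_vals = [s for s in similarities if abs(s - lo) > abs(s - hi)]
        if lo_vals:
            lo = sum(lo_vals) / len(lo_vals)   # update low-similarity center
        if hi_vals:
            hi = sum(hi_vals) / len(hi_vals)   # update high-similarity center
    return [i for i, s in enumerate(similarities) if abs(s - hi) < abs(s - lo)]
```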
- Step 209: determine, from the training sample set, a plurality of negative sample pairs equal in number to the plurality of positive sample pairs.
- the image processing device may determine a plurality of negative sample pairs whose number is the same as that of the plurality of positive sample pairs from the training sample set.
- each negative sample pair includes two training samples obtained based on original images in different original image sets.
- the multiple positive sample pairs and multiple negative sample pairs determined by the image processing device will be used for training the image processing model. If the number of positive sample pairs used for training is the same as the number of negative sample pairs, the training effect of the image processing model can be improved.
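Drawing an equal number of cross-category negative pairs, as step 209 describes, can be sketched as follows. The `(sample_id, category)` tuple format and the rejection-sampling loop are assumptions for illustration.

```python
import random

def sample_negative_pairs(samples, num_pairs, seed=0):
    """samples: list of (sample_id, category) tuples.

    Draw pairs of training samples whose originals come from different
    original image sets (different categories), until the count matches
    the number of positive pairs.
    """
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < num_pairs:
        a, b = rng.sample(samples, 2)
        if a[1] != b[1]:  # categories differ -> valid negative pair
            negatives.add(tuple(sorted((a[0], b[0]))))
    return sorted(negatives)

samples = [("a", "cat"), ("b", "cat"), ("c", "dog"), ("d", "dog")]
neg_pairs = sample_negative_pairs(samples, num_pairs=2)
```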
- the number of the multiple negative sample pairs may also be different from the number of the multiple positive sample pairs.
- Step 210: mark the true value of each positive sample pair as 1, and mark the true value of each negative sample pair as 0.
- the true value (ground truth) of each positive sample pair among the plurality of positive sample pairs can be marked as 1, and the true value of each negative sample pair among the plurality of negative sample pairs can be marked as 0.
- the true value of a sample pair may also be referred to as a label of the sample pair, and the true value is used to characterize the similarity of two training samples in the sample pair.
- Step 211: train the image processing model by using the multiple marked positive sample pairs and the multiple marked negative sample pairs.
- an initial image processing model is pre-stored in the image processing device.
- the image processing device may use multiple marked positive sample pairs and multiple marked negative sample pairs to perform multiple rounds of training on the initial image processing model.
- the image processing device may stop training the image processing model when the precision of the image processing model reaches a preset precision, or the number of training rounds of the image processing model reaches a preset number of rounds.
- the preset number of rounds may be negatively correlated with the number of sample pairs used for training the image processing model (that is, the total number of positive sample pairs and negative sample pairs). That is, the more sample pairs are used for training, the fewer training rounds the image processing model needs. For example, if the number of sample pairs used to train the image processing model is 1 million, the preset number of rounds may be 10. If the number of sample pairs used for training is 10,000, the preset number of rounds may be 100.
- the image processing device can input the two training samples in each positive sample pair and the two training samples in each negative sample pair to the image processing model in sequence.
- the image processing model can further extract the features of the two training samples in each input sample pair and determine a feature vector for each training sample. Afterwards, the image processing model can determine the similarity of each sample pair based on the feature vectors of the two training samples.
- the image processing device can adjust the parameters of the image processing model based on the difference between the similarity of each positive sample pair and the true value of that positive sample pair, and the difference between the similarity of each negative sample pair and the true value of that negative sample pair, so as to optimize the precision of the image processing model.
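The parameter adjustment described here is driven by the gap between each pair's predicted similarity and its 0/1 true value. A minimal per-pair loss of this kind might look as follows; the squared-error form is an assumption, since the application does not name a specific loss function.

```python
def pair_loss(similarity, truth):
    """Squared difference between a pair's predicted similarity and its
    ground-truth label (1 for positive pairs, 0 for negative pairs).
    The squared-error form is an assumption; the application only states
    that parameters are adjusted based on this difference."""
    return (similarity - truth) ** 2

# Hypothetical mini-batch of (predicted similarity, true value) pairs.
batch = [(0.9, 1), (0.2, 0), (0.6, 1)]
mean_loss = sum(pair_loss(s, t) for s, t in batch) / len(batch)
```

An optimizer would then update the model parameters in the direction that reduces this mean loss.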
- the image processing model trained by the image processing device may be the CNN with the fully connected layer removed that is used in step 206 above.
- the image processing device can apply the image processing model to a specific image processing task (such as an image classification task, an image recognition task, or an image segmentation task, etc.).
- the order of the steps in the method for training the image processing model provided in the embodiment of the present application can be adjusted appropriately, and the steps can also be increased or decreased accordingly according to the situation.
- the above step 202 and step 203 can be deleted according to the situation.
- correspondingly, the image processing device can crop a fixed cropping region in each original image. Any variation readily conceivable by a person familiar with this technical field within the technical scope disclosed in this application shall fall within the protection scope of this application, and details are not repeated here.
- the embodiment of the present application provides a method for training an image processing model.
- the training method can effectively expand the number of training samples by cropping a plurality of original images, thereby ensuring that the image processing model obtained through training performs well.
- the categories of the two training samples in each positive sample pair used to train the image processing model are the same and have a high degree of similarity, while the categories of the two training samples in the negative sample pair are different.
- the trained image processing model can better learn the features of images of different categories, thereby further improving the effect of the image processing model.
- FIG. 5 is a schematic flowchart of an image classification method provided by an embodiment of the present application, and the method can be applied to an image processing device.
- the image processing device may be the server 110 or the terminal 120 in the scenario shown in FIG. 1 .
- the method may include the following steps.
- Step 301: obtain a target image to be classified.
- the server may acquire the target image to be classified sent by the terminal.
- target images to be classified may be pre-stored in the terminal, or the terminal may acquire target images to be classified sent by other devices (such as another terminal). It can be understood that the target image to be classified does not have a manually marked category, that is, the current category of the target image is unknown.
- Step 302: input the target image into the image classification model to obtain the category of the target image output by the image classification model.
- An image classification model is pre-stored in the image processing device, and the image classification model can be trained by using the image processing model training method provided in the above method embodiment. After the image processing device acquires the target image to be classified, it can input the target image to be classified into the image classification model, and the image classification model can further identify the category of the target image and output it.
- the image classification model may be sent to the terminal by the server.
- the above step 302 may include the following steps:
- the image classification model can extract the image features of the input target image, and based on the image features of the target image and the image features of multiple reference images of different categories, determine the similarity between the target image and multiple reference images of different categories. That is, the image classification model can compare the target image with each reference image to determine the similarity between the target image and each reference image.
- the plurality of reference images may be training samples used when training the image classification model.
- the category of the reference image with the highest similarity to the target image is determined as the category of the target image.
- the image processing device may determine the category of the reference image with the highest similarity to the target image as the category of the target image.
- optionally, for each category of reference images, the image processing device may calculate the average of the similarities between the target image and the reference images of that category. Afterwards, the image processing device may determine the category with the highest average similarity as the category of the target image.
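The averaging variant described here can be sketched as follows; the dictionary input format is an assumption for illustration.

```python
def classify_by_mean_similarity(sims_by_category):
    """sims_by_category maps a category name to the similarities between
    the target image and each reference image of that category; the
    category with the highest average similarity is returned."""
    return max(sims_by_category,
               key=lambda c: sum(sims_by_category[c]) / len(sims_by_category[c]))
```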
- the above step 302 may include the following steps:
- Step 302b1: input the target image into the image classification model, and obtain the similarity between the target image output by the image classification model and image features of different categories.
- the image classification model can extract image features of the target image, and based on the image features of the target image and image features of different categories, determine the similarity between the target image and image features of different categories.
- the image features of each category are obtained by feature extraction from a plurality of training samples in the category.
- the mean value of the image features of multiple training samples of the category may be determined as the image feature of the category.
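Computing a category's image feature as the mean of its training-sample feature vectors, as just described, can be sketched as:

```python
def category_prototype(feature_vectors):
    """Element-wise mean of a category's training-sample feature
    vectors, used as that category's image feature."""
    n = len(feature_vectors)
    return [sum(col) / n for col in zip(*feature_vectors)]
```

The target image's feature vector is then compared against each category prototype instead of against every individual reference image, which is why this variant classifies faster.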
- Step 302b2: determine the category of the image feature with the highest similarity to the target image as the category of the target image.
- the image classification model determines the category of the target image based on the similarity between the target image and multiple image features of different categories, which can enable the image processing model to quickly determine the category of the target image, thereby improving the classification efficiency.
- the embodiment of the present application provides an image classification method.
- the method can input the target image to be classified into the image classification model, and the image classification model can then output the category of the target image. Since the image classification model is trained based on the image processing model training method provided in the above method embodiment, the performance of the image classification model is relatively good. That is, the image classification model can better extract the image features of the target image, and accurately determine the category of the target image based on the image features of the target image.
- Fig. 6 is a structural block diagram of a training device for an image processing model provided in an embodiment of the present application. As shown in Fig. 6, the device includes:
- the acquiring module 401 is configured to acquire multiple original image sets, each original image set includes multiple original images of the same category, and different original image sets include different categories of original images.
- the cropping module 402 is used to crop a plurality of original images in a plurality of original image sets to obtain a training sample set; the training sample set includes a plurality of training samples, and each training sample is an original image, or a sub-image obtained by cropping an original image.
- the determination module 403 is used to determine a plurality of positive sample pairs and a plurality of negative sample pairs from the training sample set, wherein each positive sample pair includes two training samples obtained based on different original images in the same original image set, and each negative sample pair includes two training samples obtained based on original images in different original image sets.
- a training module 404 configured to train an image processing model using multiple positive sample pairs and multiple negative sample pairs.
- the cropping module 402 is configured to: for each original image used for cropping in the multiple original image sets, randomly generate a crop size within the target size range; determine the reference point of the cropping region based on the size of the original image and the crop size; determine the cropping region in the original image based on the crop size and the reference point, and crop the cropping region.
- the target size range includes a width range and a height range; the crop size includes a width within the width range and a height within the height range; the cropping region is a rectangular region, and the reference point of the cropping region is a vertex of the rectangular region, or the center point of the rectangular region.
- the determination module 403 is configured to: determine a plurality of candidate sample pairs from the training sample set, each candidate sample pair includes two training samples obtained based on different original images in the same original image set; determine each The similarity of candidate sample pairs; the candidate sample pairs whose similarity is greater than the similarity threshold are determined as positive sample pairs.
- the determination module 403 is configured to: use a convolutional neural network to extract the feature vector of each training sample in each candidate sample pair; for each candidate sample pair, use a similarity measurement algorithm to process the feature vectors of the two training samples in the pair to obtain the similarity of the candidate sample pair.
- the determining module 403 is configured to determine a plurality of negative sample pairs whose number is the same as that of the plurality of positive sample pairs from the training sample set.
- the training module 404 is configured to: mark the true value of each positive sample pair as 1, and mark the true value of each negative sample pair as 0; adopt multiple positive sample pairs after marking, And multiple negative sample pairs after marking to train the image processing model.
- the embodiment of the present application provides a training device for an image processing model, which can effectively expand the number of training samples by cropping multiple original images, ensuring that the trained image processing model performs well.
- the categories of the two training samples in each positive sample pair used to train the image processing model are the same and have a high degree of similarity, while the categories of the two training samples in the negative sample pair are different.
- the trained image processing model can better learn the features of images of different categories, thereby further improving the effect of the image processing model.
- Fig. 7 is a structural block diagram of an image classification device provided by an embodiment of the present application. As shown in Fig. 7, the device includes:
- the acquiring module 501 is configured to acquire target images to be classified.
- the classification module 502 is configured to input the target image into the image classification model to obtain the category of the target image output by the image classification model; wherein, the image classification model is trained by using the training device for the image processing model provided in the above embodiment.
- the classification module 502 is configured to: input the target image into the image classification model to obtain the similarity between the target image output by the image classification model and reference images of different categories; The category of the reference image with the highest similarity is determined as the category of the target image.
- the classification module 502 is configured to: input the target image into the image classification model to obtain the similarity between the target image output by the image classification model and image features of different categories; The category of the image feature with the highest similarity is determined as the category of the target image; wherein, the image feature of each category is obtained by performing feature extraction on multiple training samples of the category.
- the embodiment of the present application provides an image classification device, which can input a target image to be classified into an image classification model, and the image classification model can further output the category of the target image. Since the image classification model is trained based on the image processing model training device provided by the above device embodiment, the performance of the image classification model is relatively good. That is, the image classification model can better extract the image features of the target image, and accurately determine the category of the target image based on the image features of the target image.
- the image processing model training device and the image classification device provided in the above embodiments are illustrated only by the division of the above functional modules. In practical applications, the above functions can be allocated to different functional modules as required; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
- the image processing model training device provided in the above embodiment belongs to the same concept as the image processing model training method embodiment
- the image classification device belongs to the same concept as the image classification method embodiment
- for the specific implementation process, refer to the method embodiments; details are not repeated here.
- Embodiments of the present application also provide an image processing device, where the image processing device may be a computer device, such as a server or a terminal.
- the image processing device may include the image processing model training device and/or the image classification device provided in the above embodiments.
- the image processing device may include a processor 601 and a memory 602; the memory 602 stores instructions, and the instructions are loaded and executed by the processor 601 to implement the training method of the image processing model provided by the above method embodiment, or the image classification method provided by the above method embodiment.
- the embodiment of the present application also provides a computer-readable storage medium; the storage medium stores instructions, and the instructions are loaded and executed by a processor to implement the training method of the image processing model provided by the above method embodiment, or the image classification method provided by the above method embodiment.
- embodiments of the present application also provide a computer program product or computer program; the computer program product or computer program includes computer instructions, which are loaded and executed by a processor to implement the image processing model training method described in the above aspect, or the image classification method described in the above aspect.
- the program can be stored in a computer-readable storage medium.
- the above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
Abstract
This application discloses a training method for an image processing model, an image classification method, and related apparatuses. By cropping multiple original images, the training method can effectively expand the number of training samples, ensuring that the trained image processing model performs well. Moreover, the two training samples in each positive sample pair used to train the image processing model belong to the same category, while the two training samples in each negative sample pair belong to different categories. This ensures that the trained image processing model can better learn the features of images of different categories, further improving the performance of the image processing model.
Description
This application claims priority to Chinese Patent Application No. 202111640853.8, entitled "Training Method for Image Processing Model, Image Classification Method and Apparatus", filed on December 29, 2021, the entire contents of which are incorporated herein by reference.
This application relates to the field of machine learning, and in particular to a training method for an image processing model, an image classification method, and related apparatuses.
In the field of machine learning, a large number of training samples can be used to train an image processing model to ensure that the trained image processing model performs well. For example, for an image classification model, a large number of images of different categories need to be obtained as training samples to train the image classification model.
However, since the number of training samples available in some scenarios is limited (for example, the number of images of certain categories is limited), the trained image processing model performs poorly.
Summary
This application provides a training method for an image processing model, an image classification method, and related apparatuses. The technical solutions are as follows:
In one aspect, a training method for an image processing model is provided, the method including:
acquiring multiple original image sets, where each original image set includes multiple original images of the same category, and different original image sets include original images of different categories;
cropping multiple original images in the multiple original image sets to obtain a training sample set, where the training sample set includes multiple training samples, and each training sample is an original image, or a sub-image obtained by cropping an original image;
determining multiple positive sample pairs and multiple negative sample pairs from the training sample set, where each positive sample pair includes two training samples obtained based on different original images in the same original image set, and each negative sample pair includes two training samples obtained based on original images in different original image sets;
training an image processing model by using the multiple positive sample pairs and the multiple negative sample pairs.
Optionally, for each original image used for cropping in the multiple original image sets, a crop size within a target size range is randomly generated; a reference point of a cropping region is determined based on the size of the original image and the crop size; the cropping region is determined in the original image based on the crop size and the reference point, and the cropping region is cropped.
Optionally, the target size range includes a width range and a height range, and the crop size includes a width within the width range and a height within the height range; the cropping region is a rectangular region, and the reference point of the cropping region is a vertex of the rectangular region, or the center point of the rectangular region.
Optionally, multiple candidate sample pairs are determined from the training sample set, where each candidate sample pair includes two training samples obtained based on different original images in the same original image set; the similarity of each candidate sample pair is determined; and candidate sample pairs whose similarity is greater than a similarity threshold are determined as positive sample pairs.
Optionally, a convolutional neural network is used to extract the feature vector of each training sample in each candidate sample pair; for each candidate sample pair, a similarity measurement algorithm is used to process the feature vectors of the two training samples in the candidate sample pair to obtain the similarity of the candidate sample pair.
Optionally, multiple negative sample pairs equal in number to the multiple positive sample pairs are determined from the training sample set.
Optionally, the true value of each positive sample pair is marked as 1, and the true value of each negative sample pair is marked as 0; the image processing model is trained by using the multiple marked positive sample pairs and the multiple marked negative sample pairs.
In another aspect, an image classification method is provided, the method including:
acquiring a target image to be classified;
inputting the target image into an image classification model to obtain the category of the target image output by the image classification model, where the image classification model is trained by using the training method for an image processing model described in the above aspect.
Optionally, the target image is input into the image classification model to obtain similarities, output by the image classification model, between the target image and reference images of different categories; among the reference images of different categories, the category of the reference image with the highest similarity to the target image is determined as the category of the target image.
Optionally, the target image is input into the image classification model to obtain similarities, output by the image classification model, between the target image and image features of different categories; among the image features of different categories, the category of the image feature with the highest similarity to the target image is determined as the category of the target image, where the image feature of each category is obtained by performing feature extraction on multiple training samples of the category.
In yet another aspect, a training apparatus for an image processing model is provided, the apparatus including:
an acquiring module, configured to acquire multiple original image sets, where each original image set includes multiple original images of the same category, and different original image sets include original images of different categories;
a cropping module, configured to crop multiple original images in the multiple original image sets to obtain a training sample set, where the training sample set includes multiple training samples, and each training sample is an original image, or a sub-image obtained by cropping an original image;
a determining module, configured to determine multiple positive sample pairs and multiple negative sample pairs from the training sample set, where each positive sample pair includes two training samples obtained based on different original images in the same original image set, and each negative sample pair includes two training samples obtained based on original images in different original image sets;
a training module, configured to train an image processing model by using the multiple positive sample pairs and the multiple negative sample pairs.
Optionally, the cropping module is configured to:
for each original image used for cropping in the multiple original image sets, randomly generate a crop size within a target size range;
determine a reference point of a cropping region based on the size of the original image and the crop size;
determine the cropping region in the original image based on the crop size and the reference point, and crop the cropping region.
Optionally, the target size range includes a width range and a height range, and the crop size includes a width within the width range and a height within the height range; the cropping region is a rectangular region, and the reference point of the cropping region is a vertex of the rectangular region, or the center point of the rectangular region.
Optionally, the determining module is configured to:
determine multiple candidate sample pairs from the training sample set, where each candidate sample pair includes two training samples obtained based on different original images in the same original image set;
determine the similarity of each candidate sample pair;
determine candidate sample pairs whose similarity is greater than a similarity threshold as positive sample pairs.
Optionally, the determining module is configured to:
use a convolutional neural network to extract the feature vector of each training sample in each candidate sample pair;
for each candidate sample pair, use a similarity measurement algorithm to process the feature vectors of the two training samples in the candidate sample pair to obtain the similarity of the candidate sample pair.
Optionally, the determining module is configured to determine, from the training sample set, multiple negative sample pairs equal in number to the multiple positive sample pairs.
Optionally, the training module is configured to:
mark the true value of each positive sample pair as 1, and mark the true value of each negative sample pair as 0;
train the image processing model by using the multiple marked positive sample pairs and the multiple marked negative sample pairs.
In still another aspect, an image classification apparatus is provided, the apparatus including:
an acquiring module, configured to acquire a target image to be classified;
a classification module, configured to input the target image into an image classification model to obtain the category of the target image output by the image classification model, where the image classification model is trained by using the training apparatus for an image processing model described in the above aspect.
Optionally, the classification module is configured to:
input the target image into the image classification model to obtain similarities, output by the image classification model, between the target image and reference images of different categories;
determine, among the reference images of different categories, the category of the reference image with the highest similarity to the target image as the category of the target image.
Optionally, the classification module is configured to:
input the target image into the image classification model to obtain similarities, output by the image classification model, between the target image and image features of different categories;
determine, among the image features of different categories, the category of the image feature with the highest similarity to the target image as the category of the target image, where the image feature of each category is obtained by performing feature extraction on multiple training samples of the category.
In still another aspect, an image processing device is provided, including a processor and a memory, where the memory stores instructions, and the instructions are loaded and executed by the processor to implement the training method for an image processing model described in the above aspect, or the image classification method described in the above aspect.
In still another aspect, a computer-readable storage medium is provided, where the storage medium stores instructions, and the instructions are loaded and executed by a processor to implement the training method for an image processing model described in the above aspect, or the image classification method described in the above aspect.
In still another aspect, a computer program product is provided, including computer instructions, where the computer instructions are loaded and executed by a processor to implement the training method for an image processing model described in the above aspect, or the image classification method described in the above aspect.
To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from these drawings without creative efforts.
FIG. 1 is a schematic structural diagram of a training system for an image processing model provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of a training method for an image processing model provided by an embodiment of this application;
FIG. 3 is a schematic flowchart of another training method for an image processing model provided by an embodiment of this application;
FIG. 4 is a schematic diagram of cropping an original image provided by an embodiment of this application;
FIG. 5 is a schematic flowchart of an image classification method provided by an embodiment of this application;
FIG. 6 is a schematic structural diagram of a training apparatus for an image processing model provided by an embodiment of this application;
FIG. 7 is a schematic structural diagram of an image classification apparatus provided by an embodiment of this application;
FIG. 8 is a schematic structural diagram of an image processing device provided by an embodiment of this application.
To make the objectives, technical solutions, and advantages of this application clearer, the following further describes the implementations of this application in detail with reference to the accompanying drawings. The training method for an image processing model provided in the embodiments of this application can be applied to scenarios with a small number of samples (i.e., few-shot scenarios); accordingly, the training method may also be called a few-shot learning method. The goal of few-shot learning is to achieve relatively good model training precision with limited samples.
To improve the performance of few-shot learning methods, improvements are generally made in three aspects: data (i.e., training samples), models, and training algorithms. When improving the training samples, the training samples can be transformed to obtain new training samples, thereby expanding the training sample set. Alternatively, weakly labeled or unlabeled samples can be transformed to obtain new training samples, thereby expanding the training sample set. Alternatively, data in data sets similar to the training samples can be transformed to obtain new training samples, thereby expanding the training sample set. When improving the model, multi-task learning, embedding learning, and external-memory-based learning methods can be used to train the model. When optimizing the training algorithm, approaches such as refining existing parameters, refining meta-learning parameters, or learning an optimizer can be used.
FIG. 1 is a schematic structural diagram of a training system for an image processing model provided by an embodiment of this application. Referring to FIG. 1, the system includes a server 110 and a terminal 120, between which a wired or wireless communication connection is established. Optionally, the server 110 may be an independent physical server, or a server cluster or distributed system composed of multiple physical servers. The terminal 120 may be a personal computer (PC), a tablet computer, a smartphone, a wearable device, an intelligent robot, or another terminal with data computing, processing, and storage capabilities.
In this embodiment of this application, the terminal 120 in the system can be used to acquire original image sets for model training and send the original images to the server 110. The server 110 can then process the original images in the original image sets and use the processed original images as training samples to train the image processing model. The trained image processing model can be applied to image classification tasks, image recognition tasks, image segmentation tasks, and the like.
Alternatively, the system may be a system capable of performing a specific image processing task (for example, an image classification task). Accordingly, the terminal 120 in the system can be used to acquire an original image to be detected and send the original image to the server 110 for detection. The server 110 pre-stores a trained image processing model. After acquiring the original image to be detected, the server 110 can input the original image into the image processing model, which can then detect and recognize the original image and output a detection result. Afterwards, the server 110 can send the detection result to the terminal 120.
FIG. 2 is a flowchart of a training method for an image processing model provided by an embodiment of this application. The method can be applied to an image processing device, which may be the server 110 shown in FIG. 1. Referring to FIG. 2, the method includes:
Step 101: acquire multiple original image sets.
The image processing device can acquire multiple pre-stored original image sets, or acquire multiple original image sets sent by another device (for example, a terminal). Each original image set includes multiple original images of the same category, and different original image sets include original images of different categories. It can be understood that all original images in each original image set belong to the same category; that is, each original image set includes original images of only one category. The category of each original image in the multiple original image sets may be manually annotated, and the category may refer to the category of the subject object in the original image.
Step 102: crop multiple original images in the multiple original image sets to obtain a training sample set.
The image processing device can crop multiple original images in each original image set to obtain a training sample set. For example, the image processing device can crop every original image in each original image set. The training sample set obtained after cropping includes multiple training samples, each of which is an original image, or a sub-image obtained by cropping an original image.
The number of sub-images obtained by cropping one original image may be greater than or equal to 1. Moreover, any two sub-images cropped from the same original image differ in size and/or position within the original image. Optionally, the size and/or position of each sub-image may be randomly determined by the image processing device, or may be preconfigured in the image processing device.
It can be understood that any sub-image obtained by cropping an original image has the same category as that original image. By cropping each original image in the original image sets and using the cropped sub-images as training samples, the number of training samples in the training sample set can be effectively expanded.
Step 103: determine multiple positive sample pairs and multiple negative sample pairs from the training sample set.
After obtaining the training sample set, the image processing device can determine multiple positive sample pairs and multiple negative sample pairs from the training sample set. Each negative sample pair includes two training samples obtained based on original images in different original image sets. Each positive sample pair includes two training samples obtained based on different original images in the same original image set. That is, the image processing device can determine two training samples of the same category that come from different original images as a positive sample pair, and two training samples of different categories as a negative sample pair.
It can be understood that, since the two training samples in each positive sample pair belong to the same category, it can be ensured that both training samples include some or all image features of a subject object of that category. This ensures that, after the image processing model is trained with the positive sample pairs, it can accurately learn the features of subject objects of that category.
Step 104: train an image processing model by using the multiple positive sample pairs and multiple negative sample pairs.
In this embodiment of this application, an initial image processing model is pre-stored in the image processing device. The image processing model may be a convolutional neural network (CNN) model. After determining multiple positive sample pairs and negative sample pairs from the training sample set, the image processing device can train the image processing model by using the multiple positive sample pairs and an equal number of negative sample pairs. Moreover, the image processing device can stop training the image processing model when the precision of the model reaches a preset precision, or the number of training rounds reaches a preset number of rounds.
It can be understood that, in each original image set, the positions and sizes of the subject objects in different original images differ. Therefore, in a scenario where the number of original images in an original image set is limited, if the original image set is used directly to train the image processing model, the model can hardly accurately capture the features of the subject objects in different original images of the same category, and the resulting model performs poorly. In this embodiment of this application, however, the training sample set is obtained by cropping the original images; the two training samples in each positive sample pair used for training belong to the same category, while the two training samples in each negative sample pair belong to different categories. This ensures that the trained image processing model can better learn the features of images of different categories, that is, the features of subject objects of different categories.
In summary, the embodiments of this application provide a training method for an image processing model. By cropping multiple original images, the training method can effectively expand the number of training samples, ensuring that the trained image processing model performs well. Moreover, the two training samples in each positive sample pair used for training belong to the same category, while the two training samples in each negative sample pair belong to different categories. This ensures that the trained image processing model can better learn the features of images of different categories, further improving the performance of the image processing model.
FIG. 3 is a flowchart of another training method for an image processing model provided by an embodiment of this application. The method can be applied to an image processing device, which may be the server 110 shown in FIG. 1. Referring to FIG. 3, the method includes:
Step 201: acquire multiple original image sets.
The image processing device can acquire multiple pre-stored original image sets, or acquire multiple original image sets sent by another device (for example, a terminal). Each original image set includes multiple original images of the same category, and different original image sets include original images of different categories. It can be understood that all original images in each original image set belong to the same category; that is, each original image set includes original images of only one category. The category of each original image may be manually annotated, and the category may refer to the category of the subject object in the original image. Different original image sets may include the same or different numbers of original images.
Optionally, the category of an original image may be an animal species, a plant species, a food type, or a furniture type, which is not limited in the embodiments of this application.
For example, assuming that the image processing model to be trained is a model for recognizing animal species, the image processing device may pre-store original image sets of multiple different species, where the original image set of each species includes multiple original images of that species.
Step 202: for each original image used for cropping in the multiple original image sets, randomly generate a crop size within a target size range.
After acquiring the multiple original image sets, the image processing device can, for each original image used for cropping, randomly generate a crop size within the target size range, so as to crop one sub-image from the original image based on each generated crop size. The target size range may be pre-stored in the image processing device and may be determined based on the size of an original image. For example, the upper limit of the target size range may equal the size of an original image.
Optionally, the cropping region used when the image processing device crops an original image may be a rectangular region. Accordingly, the target size range may include a width range and a height range, and the crop size includes a width within the width range and a height within the height range.
It can be understood that the cropping region may also be a region of another shape, and accordingly the target size range may include ranges of other parameters. For example, if the cropping region is circular, the target size range may be a range of radius or diameter.
For example, assume the width range in the target size range is [W_min, W_max] and the height range is [H_min, H_max]. An original image set D includes K original images. For the k-th original image I_k among the K original images (k is an integer not greater than K), the image processing device can randomly generate T crop sizes; that is, the image processing device can crop the original image I_k T times (T is an integer greater than 1) to obtain T sub-images. The width w_t of the t-th crop size among the T crop sizes (t is an integer not greater than T) satisfies W_min ≤ w_t ≤ W_max, and the height h_t satisfies H_min ≤ h_t ≤ H_max.
Step 203: determine a reference point of the cropping region based on the size of the original image and the crop size.
The image processing device can also determine the reference point of the cropping region based on the size of the original image to be cropped and the randomly generated crop size. The reference point is used to determine the position of the cropping region within the original image to be cropped. In this embodiment of this application, the reference point determined by the image processing device must make the cropping region lie within the original image to be cropped.
Optionally, the cropping region may be a rectangular region, and its reference point may be a vertex of the rectangular region (for example, the top-left vertex) or the center point of the rectangular region. Alternatively, if the cropping region is circular, its reference point may be the center of the circular region.
Step 204: determine the cropping region in the original image based on the crop size and the reference point, and crop the cropping region.
For an original image to be cropped, the image processing device can determine a cropping region in the image based on the randomly determined crop size and the determined reference point, and then crop the region to obtain a sub-image.
Optionally, for each original image used for cropping, the image processing device can determine multiple cropping regions based on the method shown in steps 202 to 204 above. The multiple cropping regions differ in size and/or position within the original image. Accordingly, after cropping each original image, the image processing device can obtain multiple sub-images. It can be understood that the image processing device can use each original image as a training sample, and each sub-image as a training sample.
It can also be understood that the numbers of sub-images obtained by cropping different original images may be the same or different. Assuming that the image processing device crops T sub-images from each original image, it can generate T+1 training samples based on each original image.
For example, assuming that an original image to be cropped is as shown in FIG. 4, the image processing device can determine six cropping regions a1 to a6 in the image. After cropping the six regions, six sub-images can be obtained.
步骤205、从训练样本集中确定多个备选样本对。
图像处理设备对多个原始图形集中的多张原始图像均进行裁剪后,可以得到一个训练样本集。该训练样本集包括多个训练样本,其中每个训练样本为一张原始图像,或者为对一张原始图像进行裁剪得到的子图像。之后,图像处理设备可以从训练样本集中确定出多个备选样本对,其中每个备选样本对均包括基于同一个原始图像集中的不同原始图像得到的两个训练样本。也即是,每个 备选样本对包括的两个训练样本是基于两个相同类别的原始图像得到的。
Step 206: Extract a feature vector of each training sample in each candidate sample pair using a CNN.
After determining the plurality of candidate sample pairs from the training sample set, the image processing device may input each training sample of each candidate sample pair into a CNN. The CNN performs feature extraction on the input training samples and computes a feature vector for each training sample.
In the embodiments of the present application, the image processing device may first train an initial CNN on a large labeled image dataset, such as the ImageNet dataset, so that the CNN can extract training-sample features well and has a certain image classification capability. The basic structure of the CNN may include convolutional layers, pooling layers, and a fully connected layer, with the convolutional and pooling layers alternating in the network structure. The convolutional layers extract the features of a training sample through convolution; the pooling layers downsample the training samples input to the CNN model, i.e., shrink them while preserving their important information; and the fully connected layer classifies the image based on the image features determined by the convolutional layers.
After completing the training of the initial CNN, the image processing device may remove the CNN's fully connected layer, and may then input each training sample of each candidate sample pair into the CNN with the fully connected layer removed. The CNN performs feature extraction on the input training samples and computes a feature vector for each training sample.
Step 207: For each candidate sample pair, process the feature vectors of the two training samples in the candidate sample pair with a similarity measurement algorithm to obtain the similarity of the candidate sample pair.
The similarity measurement algorithm may include algorithms such as the cosine distance (also called cosine similarity), the Euclidean metric (also called Euclidean distance), and the Bhattacharyya distance.
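Of the metrics named above, cosine similarity is the simplest to illustrate. The following is a generic sketch, not an implementation prescribed by the patent:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors: <u, v> / (|u| * |v|).

    Returns 1.0 for parallel vectors and 0.0 for orthogonal ones.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```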
Step 208: Determine candidate sample pairs whose similarity is greater than a similarity threshold as positive sample pairs.
After determining the similarity of each candidate sample pair, the image processing device may determine the candidate sample pairs, among the plurality of candidate sample pairs, whose similarity is greater than a similarity threshold as positive sample pairs. The similarity threshold may be a fixed value preconfigured in the image processing device.
Optionally, the image processing device may use a clustering algorithm (for example, K-means clustering) to cluster the plurality of candidate sample pairs according to their similarities. For example, the image processing device may cluster the candidate sample pairs into two classes and determine the class of candidate sample pairs with the higher similarity as positive sample pairs.
It can be understood that although the two training samples in a candidate sample pair come from two original images of the same category, the image features contained in the two training samples may differ considerably. For example, at least one of the two training samples may not contain the image features of the main object of that category, or the two training samples may contain image features of different parts of the main object. By computing the similarity of the two training samples and determining the candidate sample pairs whose similarity is greater than the similarity threshold as positive sample pairs, the embodiments of the present application ensure that the two training samples in each determined positive sample pair have a high similarity, i.e., a high probability that both contain image features of a main object of the same category. Accordingly, training the image processing model with these positive sample pairs ensures that the model can learn well the features of the main objects corresponding to each category.
For example, assuming the image processing device uses the K-means clustering algorithm to determine positive sample pairs, the image processing device may take the similarity values 0.75 and 0.25 as two cluster centers. It may then compute the distance between each candidate sample pair's similarity value and the two cluster centers, and assign each similarity value to the nearest cluster center. Finally, the image processing device may determine the candidate sample pairs whose similarity values are assigned to the cluster centered at 0.75 as positive sample pairs.
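The K-means example above reduces to one-dimensional clustering of similarity values around two centers. A minimal sketch, assuming the 0.75/0.25 initialization from the example and a fixed iteration count (both choices are illustrative, not mandated):

```python
def select_positive_pairs(similarities, hi_center=0.75, lo_center=0.25, iters=10):
    """1-D K-means with two centers over pair similarities.

    Returns the indices of the candidate pairs that end up in the
    higher-similarity cluster, i.e., the positive sample pairs.
    """
    c_hi, c_lo = hi_center, lo_center
    for _ in range(iters):
        # assignment step: each similarity goes to its nearest center
        hi = [s for s in similarities if abs(s - c_hi) <= abs(s - c_lo)]
        lo = [s for s in similarities if abs(s - c_hi) > abs(s - c_lo)]
        # update step: recompute each center as its cluster mean
        if hi:
            c_hi = sum(hi) / len(hi)
        if lo:
            c_lo = sum(lo) / len(lo)
    return [i for i, s in enumerate(similarities) if abs(s - c_hi) <= abs(s - c_lo)]
```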
Step 209: Determine, from the training sample set, a plurality of negative sample pairs equal in number to the plurality of positive sample pairs.
After determining the plurality of positive sample pairs from the plurality of candidate sample pairs, the image processing device may determine, from the training sample set, a plurality of negative sample pairs equal in number to the plurality of positive sample pairs, where each negative sample pair includes two training samples derived from original images in different original image sets.
It can be understood that the plurality of positive sample pairs and the plurality of negative sample pairs determined by the image processing device will be used for training the image processing model. If the numbers of positive sample pairs and negative sample pairs used for training are equal, the training effect of the image processing model can be better.
It can also be understood that the number of the plurality of negative sample pairs may differ from the number of the plurality of positive sample pairs.
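One way to draw negative pairs from samples grouped by original image set could look like the sketch below. The data layout (a dict keyed by class) and the function name are assumptions for illustration only:

```python
import random

def sample_negative_pairs(samples_by_class, n_pairs, rng=None):
    """Draw n_pairs pairs whose two members come from different classes
    (i.e., from different original image sets)."""
    rng = rng or random.Random()
    classes = list(samples_by_class)
    pairs = []
    for _ in range(n_pairs):
        c1, c2 = rng.sample(classes, 2)  # two distinct classes
        pairs.append((rng.choice(samples_by_class[c1]),
                      rng.choice(samples_by_class[c2])))
    return pairs
```

Passing the number of previously selected positive pairs as `n_pairs` yields the balanced training set described in step 209.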
Step 210: Mark the ground truth of each positive sample pair as 1, and mark the ground truth of each negative sample pair as 0.
After determining the plurality of positive sample pairs and the plurality of negative sample pairs, the image processing device may mark the ground truth of each of the plurality of positive sample pairs as 1, and mark the ground truth of each of the plurality of negative sample pairs as 0. The ground truth of a sample pair may also be called the label of the sample pair, and it characterizes the similarity of the two training samples in the sample pair.
Step 211: Train the image processing model using the marked plurality of positive sample pairs and the marked plurality of negative sample pairs.
In the embodiments of the present application, an initial image processing model is stored in the image processing device in advance. The image processing device may use the marked plurality of positive sample pairs and the marked plurality of negative sample pairs to train the initial image processing model over multiple rounds, and may stop training the image processing model when the model's accuracy reaches a preset accuracy or its number of training rounds reaches a preset number of rounds.
The preset number of rounds may be negatively correlated with the number of sample pairs used for training the image processing model (i.e., the total number of positive and negative sample pairs): the more sample pairs used for training, the fewer training rounds the image processing model may need. For example, if 1,000,000 sample pairs are used for training the image processing model, the preset number of rounds may be 10; if 10,000 sample pairs are used, the preset number of rounds may be 100.
It can be understood that during training, the image processing device may sequentially input the two training samples of each positive sample pair and the two training samples of each negative sample pair into the image processing model. The image processing model then extracts the features of the two training samples of each input sample pair and determines each training sample's feature vector. Based on the feature vectors, the image processing model determines the similarity of each sample pair. Finally, the image processing device may adjust the parameters of the image processing model based on the difference between the similarity determined for each positive sample pair and that pair's ground truth, and the difference between the similarity determined for each negative sample pair and that pair's ground truth, so as to optimize the accuracy of the image processing model.
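The "difference between predicted similarity and ground truth" described above can be expressed as a per-pair loss. A minimal sketch, assuming a squared-difference form (the patent does not fix a particular loss function):

```python
def pair_loss(predicted_similarity, ground_truth):
    """Squared difference between the model's predicted similarity for a
    pair and the pair's 0/1 ground truth. Minimizing it pushes positive
    pairs toward similarity 1 and negative pairs toward similarity 0."""
    return (predicted_similarity - ground_truth) ** 2

def batch_loss(predictions, truths):
    """Mean pair loss over one batch of marked sample pairs."""
    return sum(pair_loss(p, t) for p, t in zip(predictions, truths)) / len(truths)
```

The model's parameters would then be adjusted to reduce this loss, e.g., by gradient descent in a deep-learning framework.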
Optionally, the image processing model trained by the image processing device may be the CNN with the fully connected layer removed that is used in step 206 above.
After completing the training of the image processing model, the image processing device may apply the image processing model to specific image processing tasks (for example, an image classification task, an image recognition task, or an image segmentation task).
It can be understood that the order of the steps of the image processing model training method provided by the embodiments of the present application may be adjusted appropriately, and steps may be added or removed as appropriate. For example, steps 202 and 203 above may be removed as appropriate; accordingly, in step 204 above, the image processing device may crop a fixed crop region in the original image. Any variation readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application, and is therefore not described again.
In summary, the embodiments of the present application provide a training method for an image processing model. By cropping a plurality of original images, the training method can effectively expand the number of training samples and ensure a good effect of the trained image processing model. Moreover, the two training samples in each positive sample pair used for training the image processing model belong to the same category and have a high similarity, while the two training samples in a negative sample pair belong to different categories. This ensures that the trained image processing model can learn well the features of images of different categories, further improving the effect of the image processing model.
FIG. 5 is a schematic flowchart of an image classification method provided by an embodiment of the present application. The method may be applied to an image processing device, which may be the server 110 or the terminal 120 in the scenario shown in FIG. 1. Referring to FIG. 5, the method may include the following steps.
Step 301: Acquire a target image to be classified.
In the embodiments of the present application, if the image processing device is a server, the server may acquire the target image to be classified sent by a terminal. If the image processing device is a terminal, the target image to be classified may be stored in the terminal in advance, or the terminal may acquire the target image to be classified sent by another device (for example, another terminal). It can be understood that the target image to be classified has no manually annotated category, i.e., the category of the target image is currently unknown.
Step 302: Input the target image into an image classification model to obtain the category of the target image output by the image classification model.
An image classification model is stored in the image processing device in advance, and the image classification model may be trained with the image processing model training method provided by the method embodiments above. After acquiring the target image to be classified, the image processing device may input the target image into the image classification model, which can then recognize and output the category of the target image.
It can be understood that if the image processing device is a terminal, the image classification model may be sent to the terminal by a server.
As a possible example, step 302 above may include the following steps.
Step 302a1: Input the target image into the image classification model to obtain the similarities, output by the image classification model, between the target image and reference images of different categories.
The image classification model may extract the image features of the input target image and, based on the image features of the target image and the image features of a plurality of reference images of different categories, determine the similarities between the target image and the plurality of reference images of different categories. That is, the image classification model may compare the target image with each reference image to determine the similarity between the target image and each reference image. The plurality of reference images may be the training samples used when training the image classification model.
Step 302a2: Determine, among the reference images of different categories, the category of the reference image with the highest similarity to the target image as the category of the target image.
In the embodiments of the present application, after computing the similarity between the target image and each reference image, the image processing device may determine the category of the reference image with the highest similarity to the target image as the category of the target image.
For a scenario where each category includes a plurality of reference images, the image processing device may compute the mean of the similarities between the target image and the reference images of each category, and may then determine the category with the highest mean similarity as the category of the target image.
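The mean-similarity decision rule above can be sketched as follows; the data layout (category to list of reference features) and the injectable `similarity` function are illustrative assumptions:

```python
def classify_by_references(target_feature, references, similarity):
    """Return the category whose reference images have the highest mean
    similarity to the target image's feature.

    references: dict mapping category -> list of reference-image features
    similarity: function taking two feature vectors and returning a score
    """
    best_category, best_score = None, float("-inf")
    for category, feats in references.items():
        score = sum(similarity(target_feature, f) for f in feats) / len(feats)
        if score > best_score:
            best_category, best_score = category, score
    return best_category
```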
As another possible embodiment, step 302 above may include the following steps.
Step 302b1: Input the target image into the image classification model to obtain the similarities, output by the image classification model, between the target image and image features of different categories.
The image classification model may extract the image features of the target image and, based on the image features of the target image and the image features of different categories, determine the similarities between the target image and the image features of different categories. The image feature of each category is obtained by performing feature extraction on the plurality of training samples of that category. Optionally, the mean of the image features of the plurality of training samples of the category may be determined as the image feature of the category.
Step 302b2: Determine, among the image features of different categories, the category of the image feature with the highest similarity to the target image as the category of the target image.
Determining the category of the target image based on the similarities between the target image and the image features of a plurality of different categories enables the model to determine the category of the target image quickly, thereby improving classification efficiency.
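The optional per-category feature of step 302b1, the mean of the category's training-sample features, can be computed as below. This is a minimal sketch; plain Python lists stand in for whatever tensor type the model actually uses:

```python
def class_feature(sample_features):
    """Element-wise mean of one category's training-sample feature vectors.

    Comparing the target image against one mean vector per category is
    cheaper than comparing it against every reference image.
    """
    n = len(sample_features)
    dim = len(sample_features[0])
    return [sum(f[i] for f in sample_features) / n for i in range(dim)]
```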
In summary, the embodiments of the present application provide an image classification method. The method inputs a target image to be classified into an image classification model, and the image classification model can then output the category of the target image. Because the image classification model is trained with the image processing model training method provided by the method embodiments above, the performance of the image classification model is good. That is, the image classification model can extract the image features of the target image well, and can accurately determine the category of the target image based on those image features.
FIG. 6 is a structural block diagram of an image processing model training apparatus provided by an embodiment of the present application. As shown in FIG. 6, the apparatus includes:
an acquisition module 401, configured to acquire a plurality of original image sets, each original image set including a plurality of original images of the same category, and different original image sets including original images of different categories;
a cropping module 402, configured to crop a plurality of original images in the plurality of original image sets to obtain a training sample set, the training sample set including a plurality of training samples, each training sample being an original image or a sub-image obtained by cropping an original image;
a determination module 403, configured to determine a plurality of positive sample pairs and a plurality of negative sample pairs from the training sample set, where each positive sample pair includes two training samples derived from different original images in the same original image set, and each negative sample pair includes two training samples derived from original images in different original image sets; and
a training module 404, configured to train an image processing model using the plurality of positive sample pairs and the plurality of negative sample pairs.
Optionally, the cropping module 402 is configured to: for each original image used for cropping in the plurality of original image sets, randomly generate a crop size within a target size range; determine a reference point of the crop region based on the size of the original image and the crop size; and determine the crop region in the original image based on the crop size and the reference point, and crop the crop region.
Optionally, the target size range includes a width range and a height range, and the crop size includes a width within the width range and a height within the height range; the crop region is a rectangular region, and the reference point of the crop region is a vertex of the rectangular region or the center point of the rectangular region.
Optionally, the determination module 403 is configured to: determine a plurality of candidate sample pairs from the training sample set, each candidate sample pair including two training samples derived from different original images in the same original image set; determine the similarity of each candidate sample pair; and determine candidate sample pairs whose similarity is greater than a similarity threshold as positive sample pairs.
Optionally, the determination module 403 is configured to: extract a feature vector of each training sample in each candidate sample pair using a convolutional neural network; and, for each candidate sample pair, process the feature vectors of the two training samples in the candidate sample pair with a similarity measurement algorithm to obtain the similarity of the candidate sample pair.
Optionally, the determination module 403 is configured to determine, from the training sample set, a plurality of negative sample pairs equal in number to the plurality of positive sample pairs.
Optionally, the training module 404 is configured to: mark the ground truth of each positive sample pair as 1, and mark the ground truth of each negative sample pair as 0; and train the image processing model using the marked plurality of positive sample pairs and the marked plurality of negative sample pairs.
In summary, the embodiments of the present application provide an image processing model training apparatus. By cropping a plurality of original images, the apparatus can effectively expand the number of training samples and ensure a good effect of the trained image processing model. Moreover, the two training samples in each positive sample pair used for training the image processing model belong to the same category and have a high similarity, while the two training samples in a negative sample pair belong to different categories. This ensures that the trained image processing model can learn well the features of images of different categories, further improving the effect of the image processing model.
FIG. 7 is a structural block diagram of an image classification apparatus provided by an embodiment of the present application. As shown in FIG. 7, the apparatus includes:
an acquisition module 501, configured to acquire a target image to be classified; and
a classification module 502, configured to input the target image into an image classification model to obtain the category of the target image output by the image classification model, where the image classification model is trained with the image processing model training apparatus provided by the embodiments above.
Optionally, the classification module 502 is configured to: input the target image into the image classification model to obtain the similarities, output by the image classification model, between the target image and reference images of different categories; and determine, among the reference images of different categories, the category of the reference image with the highest similarity to the target image as the category of the target image.
Optionally, the classification module 502 is configured to: input the target image into the image classification model to obtain the similarities, output by the image classification model, between the target image and image features of different categories; and determine, among the image features of different categories, the category of the image feature with the highest similarity to the target image as the category of the target image, where the image feature of each category is obtained by performing feature extraction on the plurality of training samples of the category.
In summary, the embodiments of the present application provide an image classification apparatus. The image classification apparatus can input a target image to be classified into an image classification model, and the image classification model can then output the category of the target image. Because the image classification model is trained with the image processing model training apparatus provided by the apparatus embodiments above, the performance of the image classification model is good. That is, the image classification model can extract the image features of the target image well, and can accurately determine the category of the target image based on those image features.
It can be understood that the image processing model training apparatus and the image classification apparatus provided by the embodiments above are described using the division into the functional modules above merely as an example. In practical applications, the functions above may be assigned to different functional modules as needed, i.e., the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
In addition, the image processing model training apparatus provided by the embodiments above and the image processing model training method embodiments belong to the same concept, and the image classification apparatus and the image classification method embodiments belong to the same concept; for their specific implementation processes, see the method embodiments, which are not repeated here.
The embodiments of the present application further provide an image processing device. The image processing device may be a computer device, for example a server or a terminal, and may include the image processing model training apparatus and/or the image classification apparatus provided by the embodiments above.
As shown in FIG. 8, the image processing device may include a processor 601 and a memory 602, the memory 602 storing instructions that are loaded and executed by the processor 601 to implement the image processing model training method provided by the method embodiments above, or the image classification method provided by the method embodiments above.
The embodiments of the present application further provide a computer-readable storage medium storing instructions that are loaded and executed by a processor to implement the image processing model training method provided by the method embodiments above, or the image classification method provided by the method embodiments above.
The embodiments of the present application further provide a computer program product or computer program, the computer program product or computer program including computer instructions that are loaded and executed by a processor to implement the image processing model training method described in the aspects above, or the image classification method described in the aspects above.
It can be understood that in the present application, the term "at least one" means one or more, and "a plurality of" means two or more.
"And/or" as used herein describes three possible relationships. For example, "A and/or B" may mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
A person of ordinary skill in the art can understand that all or part of the steps of the embodiments above may be implemented by hardware, or by a program instructing related hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely exemplary embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included in the protection scope of the present application.
Claims (14)
- A training method for an image processing model, wherein the method comprises: acquiring a plurality of original image sets, each of the original image sets comprising a plurality of original images of the same category, and different ones of the original image sets comprising original images of different categories; cropping a plurality of original images in the plurality of original image sets to obtain a training sample set, the training sample set comprising a plurality of training samples, each of the training samples being an original image or a sub-image obtained by cropping an original image; determining a plurality of positive sample pairs and a plurality of negative sample pairs from the training sample set, wherein each of the positive sample pairs comprises two training samples derived from different original images in a same original image set, and each of the negative sample pairs comprises two training samples derived from original images in different original image sets; and training an image processing model using the plurality of positive sample pairs and the plurality of negative sample pairs.
- The method according to claim 1, wherein the cropping a plurality of original images in the plurality of original image sets comprises: for each original image used for cropping in the plurality of original image sets, randomly generating a crop size within a target size range; determining a reference point of a crop region based on the size of the original image and the crop size; and determining the crop region in the original image based on the crop size and the reference point, and cropping the crop region.
- The method according to claim 2, wherein the target size range comprises a width range and a height range, and the crop size comprises a width within the width range and a height within the height range; and the crop region is a rectangular region, and the reference point of the crop region is a vertex of the rectangular region or the center point of the rectangular region.
- The method according to any one of claims 1 to 3, wherein determining a plurality of positive sample pairs from the training sample set comprises: determining a plurality of candidate sample pairs from the training sample set, each of the candidate sample pairs comprising two training samples derived from different original images in a same original image set; determining the similarity of each of the candidate sample pairs; and determining the candidate sample pairs whose similarity is greater than a similarity threshold as positive sample pairs.
- The method according to claim 4, wherein the determining the similarity of each of the candidate sample pairs comprises: extracting a feature vector of each training sample in each of the candidate sample pairs using a convolutional neural network; and, for each of the candidate sample pairs, processing the feature vectors of the two training samples in the candidate sample pair with a similarity measurement algorithm to obtain the similarity of the candidate sample pair.
- The method according to any one of claims 1 to 5, wherein determining a plurality of negative sample pairs from the training sample set comprises: determining, from the training sample set, a plurality of negative sample pairs equal in number to the plurality of positive sample pairs.
- The method according to any one of claims 1 to 6, wherein the training an image processing model using the plurality of positive sample pairs and the plurality of negative sample pairs comprises: marking the ground truth of each of the positive sample pairs as 1, and marking the ground truth of each of the negative sample pairs as 0; and training the image processing model using the marked plurality of positive sample pairs and the marked plurality of negative sample pairs.
- An image classification method, wherein the method comprises: acquiring a target image to be classified; and inputting the target image into an image classification model to obtain a category of the target image output by the image classification model, wherein the image classification model is trained with the method according to any one of claims 1 to 7.
- The method according to claim 8, wherein the inputting the target image into an image classification model to obtain a category of the target image output by the image classification model comprises: inputting the target image into the image classification model to obtain similarities, output by the image classification model, between the target image and reference images of different categories; and determining, among the reference images of different categories, the category of the reference image with the highest similarity to the target image as the category of the target image.
- The method according to claim 8, wherein the inputting the target image into an image classification model to obtain a category of the target image output by the image classification model comprises: inputting the target image into the image classification model to obtain similarities, output by the image classification model, between the target image and image features of different categories; and determining, among the image features of different categories, the category of the image feature with the highest similarity to the target image as the category of the target image, wherein the image feature of each category is obtained by performing feature extraction on a plurality of training samples of the category.
- A training apparatus for an image processing model, wherein the apparatus comprises: an acquisition module, configured to acquire a plurality of original image sets, each of the original image sets comprising a plurality of original images of the same category, and different ones of the original image sets comprising original images of different categories; a cropping module, configured to crop a plurality of original images in the plurality of original image sets to obtain a training sample set, the training sample set comprising a plurality of training samples, each of the training samples being an original image or a sub-image obtained by cropping an original image; a determination module, configured to determine a plurality of positive sample pairs and a plurality of negative sample pairs from the training sample set, wherein each of the positive sample pairs comprises two training samples derived from different original images in a same original image set, and each of the negative sample pairs comprises two training samples derived from original images in different original image sets; and a training module, configured to train an image processing model using the plurality of positive sample pairs and the plurality of negative sample pairs.
- An image classification apparatus, wherein the apparatus comprises: an acquisition module, configured to acquire a target image to be classified; and a classification module, configured to input the target image into an image classification model to obtain a category of the target image output by the image classification model, wherein the image classification model is trained with the image processing model training apparatus according to claim 11.
- An image processing device, wherein the image processing device comprises a processor and a memory, the memory storing instructions that are loaded and executed by the processor to implement the image processing model training method according to any one of claims 1 to 7, or the image classification method according to any one of claims 8 to 10.
- A computer-readable storage medium, wherein the storage medium stores instructions that are loaded and executed by a processor to implement the image processing model training method according to any one of claims 1 to 7, or the image classification method according to any one of claims 8 to 10.
Also Published As

Publication number | Publication date
---|---
CN114299363A (zh) | 2022-04-08
US20240203097A1 (en) | 2024-06-20