WO2021143267A1 - Fine-grained classification model processing method based on image detection, and related device - Google Patents
Fine-grained classification model processing method based on image detection, and related device
- Publication number
- WO2021143267A1 (PCT application PCT/CN2020/124434)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- fine, grained, image, model, training
- Prior art date
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor; G06F16/50—of still image data
- G06F16/55—Clustering; Classification
- G06F16/53—Querying; G06F16/532—Query formulation, e.g. graphical querying
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually; G06F16/5866—using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
- G06F18/00—Pattern recognition; G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation; G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24—Classification techniques
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology; G06N3/045—Combinations of networks
- G06N3/08—Learning methods
Definitions
- This application relates to the field of artificial intelligence technology, and in particular to a method, device, computer equipment, and storage medium for processing a fine-grained classification model based on image detection.
- fine-grained image classification is a hot topic in computer vision.
- the goal of fine-grained image classification is to retrieve and identify images of different sub-categories under a broad category, involving image detection in artificial intelligence.
- the inventor realizes that in the traditional fine-grained image classification technology, in order to improve the accuracy of classification, it is usually necessary to prepare a large-scale image data set.
- the images in the image data set are manually labeled before training and application can be carried out, which is time-consuming and laborious, resulting in lower processing efficiency of fine-grained image classification.
- the purpose of the embodiments of the present application is to propose a fine-grained classification model processing method, device, computer equipment, and storage medium based on image detection, so as to solve the problem of low efficiency of fine-grained image classification processing.
- the embodiments of the present application provide a fine-grained classification model processing method based on image detection, which adopts the following technical solutions:
- an embodiment of the present application also provides a fine-grained classification model processing device based on image detection, which adopts the following technical solutions:
- the data set building module is used to build an image data set through a search engine based on the received keywords
- a data set grouping module for randomly grouping the image data set into several training sets
- a data set input module configured to input the several sets of training sets into the fine-grained classification initial model to obtain the attention weight vector of each image in the several sets of training sets;
- An instance generation module configured to pool the attention weight vector to generate training instances corresponding to the several groups of training sets
- the loss calculation module is used to input the obtained training examples into the classifier of the fine-grained classification initial model to calculate the model loss;
- the parameter adjustment module is configured to adjust the model parameters of the fine-grained classification initial model according to the model loss to obtain a fine-grained classification model.
- an embodiment of the present application further provides a computer device, including a memory and a processor, the memory stores computer-readable instructions, and the processor implements the following steps when executing the computer-readable instructions:
- embodiments of the present application further provide a computer-readable storage medium, the computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions implement the following steps when executed by a processor:
- the embodiments of the present application mainly have the following beneficial effects: the image data set can be constructed directly through a search engine according to the keywords and quickly expanded through the Internet, which improves the speed of establishing the image data set; because the images are independent of each other, the image data set is randomly grouped into several training sets, which reduces the negative impact of images that do not match the label; the several training sets are input into the fine-grained classification initial model, which integrates an attention mechanism to calculate the attention weight vector of each input image, enhancing the image regions related to the keywords so that the model can focus on the regions relevant to classification; training instances are generated according to the attention weight vectors, each combining the features of the images in the corresponding training set;
- after the training instances are input into the classifier to obtain the model loss, the model parameters are adjusted according to the model loss to obtain a fine-grained classification model that classifies accurately, thereby realizing fine-grained image classification processing quickly and accurately.
- Figure 1 is an exemplary system architecture diagram to which the present application can be applied;
- FIG. 2 is a flowchart of an embodiment of a method for processing a fine-grained classification model based on image detection according to the present application
- FIG. 3 is a schematic structural diagram of an embodiment of a fine-grained classification model processing device based on image detection according to the present application
- Fig. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
- the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
- the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
- the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
- the user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and so on.
- Various communication client applications, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software, may be installed on the terminal devices 101, 102, and 103.
- the terminal devices 101, 102, and 103 may be various electronic devices with display screens that support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and so on.
- the server 105 may be a server that provides various services, for example, a background server that provides support for pages displayed on the terminal devices 101, 102, and 103.
- the fine-grained classification model processing method based on image detection provided by the embodiments of the present application is generally executed by a server, and accordingly, the fine-grained classification model processing device based on image detection is generally set in the server.
- terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.
- the fine-grained classification model processing method based on image detection includes the following steps:
- step S201: based on the received keywords, an image data set is constructed through a search engine.
- the electronic device (such as the server shown in FIG. 1) on which the fine-grained classification model processing method based on image detection runs can communicate with the terminal through a wired connection or a wireless connection.
- the above-mentioned wireless connection methods may include, but are not limited to, 3G/4G, Wi-Fi, Bluetooth, WiMAX, ZigBee, UWB (ultra-wideband), and other wireless connection methods currently known or developed in the future.
- the keyword may be a character, word, or phrase that instructs the server to search for images; the keyword may be the name of a sub-category in fine-grained image classification.
- the image data set may be a collection of images acquired based on keywords.
- fine-grained image classification requires a subject, that is, the keyword; the name of the sub-category in the fine-grained image classification task can be used as the keyword, which can be manually entered and sent to the server.
- after the server receives the keywords, it searches for images in the search engine according to the keywords and constructs an image data set from the search results.
- the image data set may include positive samples and negative samples, where the positive samples are related to keywords, and the negative samples are not related to keywords.
- building an image data set through a search engine includes: receiving keywords sent by the terminal; sending the keywords to the search engine to instruct the search engine to search for images from the Internet according to the keywords; and building an image data set based on the searched images.
- the user can control the processing of the fine-grained classification initial model at the terminal.
- the user inputs keywords at the terminal, and the terminal sends the keywords to the server.
- the server calls the interface of the search engine, and sends the keywords to the search engine, so as to search for images from the Internet through the search engine.
- the server can directly search for keywords in the search engine, use the searched image as a positive sample, and construct an image data set based on the positive sample.
- the server can also randomly search for images in the search engine to obtain negative samples, and merge the positive and negative samples to obtain an image data set.
- the negative samples will be used as noise interference during training to prevent the model from overfitting.
- the positive sample is taken as an example in the explanation of this application; after a negative sample is input into the model, it undergoes the same data processing as a positive sample and is processed synchronously with the positive samples.
- Black swan is a subcategory of swan.
- Black swan can be used as a keyword, and the server searches for black swan related images in the search engine as a positive sample.
- the positive samples are not necessarily all black swan images; there may also be white swan images, swan paintings, etc., but all positive samples come from the keyword search results.
- Negative samples have nothing to do with fine-grained image classification. For example, negative samples can be images of cars, landscape paintings, and so on.
- searching from the Internet through a search engine can quickly obtain a large number of images, which greatly improves the construction speed of the image data set.
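- this collection flow can be sketched as follows; the `search_images` helper is hypothetical, standing in for whichever search engine's image API the server actually calls:

```python
def search_images(query, count):
    """Hypothetical stand-in for a search-engine image API call."""
    # A real system would call the engine's HTTP interface; here we
    # just fabricate image identifiers for illustration.
    return [f"{query}_{i}.jpg" for i in range(count)]

def build_image_dataset(keyword, n_positive=100, n_negative=20):
    # Positive samples: results of searching the keyword itself.
    positives = [(img, 1) for img in search_images(keyword, n_positive)]
    # Negative samples: images from unrelated random queries, used as
    # noise interference to discourage overfitting.
    random_queries = ["car", "landscape", "building"]
    negatives = [(img, 0)
                 for q in random_queries
                 for img in search_images(q, n_negative // len(random_queries))]
    return positives + negatives

dataset = build_image_dataset("black swan")
```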
- Step S202: Randomly group the image data set into several training sets.
- the server randomly groups the image data set to obtain several training sets. Assuming that the probability that an image in the image data set does not match the keyword is ε, because the images are independent of each other, the probability p that the training set label is correct (that is, that at least one image in the set matches the keyword) is:
- p = 1 − ε^K
- where K is the number of images in the training set, and K is a positive integer. It is easy to see that as K increases, the probability that the training set label is correct rises rapidly.
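- the relation p = 1 − ε^K can be checked numerically; even with fairly noisy search results, a moderately sized group is almost surely labelled correctly:

```python
def label_correct_probability(epsilon, K):
    # Probability that at least one of K independent images matches the
    # keyword, given each image mismatches with probability epsilon.
    return 1 - epsilon ** K

# Even if 30% of search results are noise, a group of 8 images almost
# surely contains at least one true match.
p = label_correct_probability(0.3, 8)
```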
- Step S203: Input the several training sets into the fine-grained classification initial model to obtain the attention weight vectors of each image in the several training sets.
- the fine-grained classification initial model may be a fine-grained classification model that has not been trained yet.
- the attention weighting vector may be a vector representation output after processing each image, which has been weighted by the attention mechanism.
- the server inputs the several training sets into the convolutional layer of the fine-grained classification initial model.
- the convolutional layer performs convolution processing on each image in each training set and, combined with the attention mechanism, applies attention weighting to the vectors in the convolutional layer to obtain the attention weight vector of each image.
- the vectors in the convolutional layer are used for fine-grained image classification.
- the attention mechanism aims to polarize the vectors in the convolutional layer: vectors related to the keywords are strengthened by the attention mechanism, while vectors unrelated to the keywords are weakened, so that the fine-grained classification initial model can learn better from the strengthened vectors, thereby improving classification accuracy.
- the attention detector can be set in the initial model of fine-grained image classification, and the attention mechanism is realized by the attention detector.
- Step S204: Pool the attention weight vectors to generate the training instances corresponding to the several training sets.
- a training instance is a fusion of the images in a training set, combining their attention weight vectors.
- a pooling layer can be set in the fine-grained image classification initial model; it performs global average pooling on the attention weight vectors to generate the training instance of each training set.
- the training example combines the image features of each image in the training set for further fine-grained image classification.
- the formula for global average pooling is:
- h_n = (1 / (K·d·d)) · Σ_{k=1..K} Σ_{i,j=1..d} v̄_{k,i,j}
- where h_n is the training instance, d is the scale of the feature map in the model, k indexes the k-th picture in the training set, and v̄_{k,i,j} is the attention weight vector at position (i, j) of the k-th picture.
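- a minimal numpy sketch of this pooling step, assuming the attention weight vectors of a group are arranged as K feature maps of shape d×d×c:

```python
import numpy as np

def pool_training_instance(weighted_maps):
    """Global average pooling over K attention-weighted feature maps.

    weighted_maps: array of shape (K, d, d, c) -- K images in the
    training set, d x d spatial positions, c channels. Returns the
    training instance h_n of shape (c,), averaging over images and
    spatial positions.
    """
    K, d, _, c = weighted_maps.shape
    return weighted_maps.sum(axis=(0, 1, 2)) / (K * d * d)

maps = np.ones((4, 7, 7, 512))   # toy example: constant feature maps
h = pool_training_instance(maps)
```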
- Step S205: Input the obtained training instance into the classifier of the fine-grained classification initial model to calculate the model loss.
- the server inputs the training instance into the classifier of the fine-grained classification initial model; the classifier classifies according to the training instance and outputs the classification result.
- the server can use keywords as tags, and calculate the model loss based on the classification results and tags.
- Step S206: Adjust the model parameters of the fine-grained classification initial model according to the model loss to obtain the fine-grained classification model.
- the server adjusts the model parameters of the fine-grained classification initial model with the goal of reducing the model loss, and continues training after each adjustment of the model parameters.
- the training stop condition may be that the model loss is less than a preset loss threshold.
- the adjusted model parameters include the parameters in the convolutional layer, the attention detector, and the classifier.
- the attention detector can effectively identify image regions that are not related to the keywords and suppress or weaken their attention weight vectors, while strengthening the attention weight vectors of image regions related to the keywords.
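- the adjust-and-retrain loop with its loss-threshold stop condition can be sketched as follows; `compute_loss` and `update_params` are placeholders for the model's actual loss computation and parameter update, not components named in this application:

```python
def train(model, training_sets, compute_loss, update_params,
          loss_threshold=0.05, max_epochs=100):
    """Sketch of the adjust-and-retrain loop: training stops once the
    model loss drops below the preset loss threshold."""
    loss = compute_loss(model, training_sets)
    for _ in range(max_epochs):
        loss = compute_loss(model, training_sets)
        if loss < loss_threshold:
            break                    # stop condition reached
        update_params(model, loss)   # adjust conv layer, detector, classifier
    return model, loss

# Toy demonstration with stand-in components (hypothetical): the loss
# halves on each parameter update.
model = {"loss": 1.0}
model, final_loss = train(model, None,
                          compute_loss=lambda m, t: m["loss"],
                          update_params=lambda m, l: m.update(loss=l * 0.5))
```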
- model parameters after training can also be stored in a node of a blockchain.
- the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
- a blockchain is essentially a decentralized database, a chain of data blocks associated through cryptographic methods; each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
- the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
- the image data set is constructed directly through a search engine according to the keywords and can be quickly expanded through the Internet, which improves the speed of establishing the image data set; because the images are independent of each other, the image data set is randomly grouped into several training sets, which reduces the negative impact of images that do not match the label; the several training sets are input into the fine-grained classification initial model, which integrates an attention mechanism to calculate the attention weight vector of each input image, enhancing the keyword-related image regions so that the model focuses on the regions relevant to classification; training instances are generated according to the attention weight vectors, each combining the features of the images in the corresponding training set; after the training instances are input into the classifier to obtain the model loss, the model parameters are adjusted according to the model loss to obtain a fine-grained classification model that classifies accurately, realizing fine-grained image classification processing quickly and accurately.
- step S203 may include: inputting each image in the several training sets into the convolutional layer of the fine-grained classification initial model to obtain the convolution feature vector of each image region in each image; calculating, by the attention detector, the regularized attention score of each convolution feature vector, where the regularized attention score characterizes the degree of association between an image region and the keywords; and multiplying the regularized attention scores by the convolution feature vectors to obtain the attention weight vector of each image.
- the convolution feature vector may be a vector representation outputted after the convolution layer performs convolution processing on the image area in each image.
- the server inputs each image in several sets of training sets into the convolution layer of the fine-grained image classification initial model, and the convolution layer outputs the convolution feature vector of each image region in each image after convolution processing.
- an image region may be a single pixel or a block of multiple pixels, for example, 2×2 or 3×3 pixels.
- the server aggregates the convolution feature vectors and inputs them into the attention detector, which calculates the regularized attention score of each convolution feature vector according to its weight and bias.
- the regularized attention score can represent the degree of association between the image area corresponding to the convolutional feature vector and the keyword. The higher the degree of association, the larger the regularized attention score.
- for each image, the server multiplies each convolution feature vector by its corresponding regularized attention score to obtain the attention weight vector.
- the step of inputting the images in the training sets into the convolutional layer of the fine-grained classification initial model to obtain the convolution feature vectors of the image regions in each image includes: inputting the training sets into the convolutional layer of the fine-grained classification initial model; obtaining the convolution feature map output by the last convolutional layer; and taking the vector corresponding to each image region in the convolution feature map as the convolution feature vector.
- the convolution feature map may be a vector matrix, and each sub-matrix of the convolution feature map corresponds to each image region in the image.
- the convolutional layer may be composed of multiple sub-layers, and perform multi-layer convolution processing on the input training set.
- the last convolutional layer is the final sub-layer of the convolutional layer.
- the server obtains the convolution feature map output by the last convolutional layer.
- the sub-matrix at each position in the convolutional feature map corresponds to each image area in the image.
- the vector corresponding to each image area in the convolution feature map is used as the convolution feature vector.
- the training set is input into the convolutional layer, and the convolution feature map output by the last convolutional layer is obtained.
- since the vectors in the convolution feature map correspond to the image regions in the image, the convolution feature vectors can be accurately extracted through this correspondence.
- the attention detector scores each convolution feature vector, for example a_{i,j} = w·v_{i,j} + b, where w ∈ R^c and b ∈ R respectively represent the weight and bias of the attention detector; they are the key factors by which the attention detector strengthens or weakens image regions, and are obtained by adjusting the model parameters.
- after the attention detector obtains the attention scores, it performs a regularization operation that compresses them into the [0, 1] interval to obtain the regularized attention scores, for example ā_{i,j} = a_{i,j} / (ε·d·d + Σ_{i′,j′} a_{i′,j′})
- ε is a constant, which can be an empirical value, used to make the distribution of the regularized attention scores more reasonable: without ε, a very small sum of attention scores could make a very small a_{i,j} correspond to a very large ā_{i,j}; with ε set reasonably, a very small sum instead keeps ā_{i,j} small. Here d is the scale of the feature map in the model.
- the convolution feature vector and its corresponding regularized attention score are multiplied element by element to obtain the attention weight vector, i.e., the vector representation weighted by the regularized attention score: v̄_{i,j} = ā_{i,j} ⊙ v_{i,j}
- ⊙ means element-by-element multiplication.
- the image in the training set is input into the convolutional layer to obtain the convolution feature vector of each image region in the image
- the attention mechanism is introduced through the attention detector
- the convolution feature vector is calculated to obtain the regularized attention score.
- the regularized attention score can be used as the weight of the convolution feature vector, and the attention weight vector is obtained after the corresponding multiplication.
- the attention weight vector has completed the enhancement or suppression of the image area, so that the fine-grained classification initial model can be targeted for learning.
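- a toy numpy sketch of this score-then-weight flow; the linear scoring, the ReLU used here to keep scores non-negative, and the ε-stabilized normalization are illustrative assumptions rather than the application's exact formulas:

```python
import numpy as np

def attention_weight(features, w, b, eps=0.01):
    """Sketch of the attention detector described above.

    features: (d, d, c) convolution feature vectors for one image.
    Scores each of the d x d regions with the detector's weight w and
    bias b, normalizes the scores into [0, 1] (the eps term keeps the
    normalization stable when all scores are small), then multiplies
    each feature vector element-wise by its regularized score.
    """
    d = features.shape[0]
    scores = np.maximum(features @ w + b, 0.0)    # raw region scores, (d, d)
    reg = scores / (eps * d * d + scores.sum())   # regularized scores in [0, 1]
    return features * reg[..., None], reg         # attention weight vectors

rng = np.random.default_rng(0)
feats = rng.random((7, 7, 32))
weighted, reg = attention_weight(feats, rng.random(32), b=0.1)
```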
- step S205 may include: inputting the obtained training examples into the classifier to calculate the classifier loss; calculating the regularization factor according to the convolution feature vector; performing linear operations on the classifier loss and the regularization factor to obtain the model loss.
- the classifier loss may be the loss calculated by the classifier;
- the model loss may be the total loss calculated by the fine-grained classification initial model;
- the regularization factor may be a factor for regularizing the classifier loss.
- the server inputs the training examples into the classifier of the fine-grained classification initial model, the classifier classifies according to the training examples, outputs the classification result, and calculates the classifier loss according to the classification result.
- the attention mechanism in this application aims to make the regularized attention scores of one or several image regions have a high value in training-set images that match the keywords; for images that do not match the keywords or are irrelevant to fine-grained image classification, the regularized attention scores of all image regions should be uniformly low.
- this application also sets a separate regularization factor.
- the negative samples in this application are used as noise interference, which can also realize the regularization of attention calculation.
- the regularization factor is calculated based on the convolution feature vector. After the server obtains the regularization factor, it linearly adds the classifier loss and the regularization factor to obtain the model loss at the model level.
- the training instance is input into the classifier to calculate the classifier loss, and the regularization factor is then calculated according to the convolution feature vectors to further enhance or suppress image regions.
- the model loss is obtained from a linear operation on the classifier loss and the regularization factor, so that the fine-grained classification initial model can adjust the model parameters more reasonably according to the model loss.
- the above step of inputting the obtained training instances into the classifier to calculate the classifier loss includes: inputting the obtained training instances into the classifier to obtain the fine-grained category of each image in the training instance; setting the keyword as the instance label; and calculating the classifier loss of the training instance according to the instance label and the fine-grained category of each image in the training instance.
- the fine-grained category may be the classification result output by the classifier.
- the server inputs the training examples into the classifier of the fine-grained classification initial model, and the classifier classifies according to the training examples, and outputs multiple fine-grained categories.
- the number of fine-grained categories is equal to the number of images in the training set.
- Keywords can be used as instance labels, and the server calculates the classifier loss on the training instance as a whole according to the output fine-grained categories and instance labels.
- the classifier loss is a cross-entropy loss; the calculation formula is:
- L_cls = −Σ_n y_n · log F_n
- where F_n is the fine-grained category distribution output for the training instance and y_n is the instance label.
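- a generic cross-entropy computation for a single training instance, with the labelled category's index standing in for the instance label:

```python
import math

def cross_entropy_loss(probs, label_index):
    """Cross-entropy between the classifier's predicted fine-grained
    category distribution for a training instance and its instance
    label (the index of the keyword's category)."""
    return -math.log(probs[label_index])

# Classifier is 80% confident in the labelled category, e.g. "black
# swan" at index 0.
loss = cross_entropy_loss([0.8, 0.15, 0.05], 0)
```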
- a second attention score is involved here, different from the one used in the calculation of the regularized attention score: it is computed from the convolution feature vectors using the bias b of the attention detector, and applies both to images from the positive samples and to images from the negative samples in the training set.
- when an image comes from the negative samples, the attention mechanism aims to keep the second attention scores of all its image regions low; when an image comes from the positive samples, the attention mechanism aims to make the second attention score high in at least one image region. Combining the two cases gives the regularization factor R in formula (8):
- β_n ∈ {1, −1}: when the image is a positive sample, β_n takes 1; otherwise it takes −1.
- λ is a weight used to adjust the relative importance of the classifier loss and the regularization factor
- R is the regularization factor in formula (8); the model loss is the classifier loss plus λ·R.
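- the linear combination itself is a one-liner; in this sketch `lam` plays the role of the weight λ (its default value here is illustrative, not from this application):

```python
def model_loss(classifier_loss, regularization_factor, lam=0.1):
    # Linear combination of the classifier loss and the attention
    # regularization factor R; lam adjusts their relative importance.
    return classifier_loss + lam * regularization_factor

total = model_loss(0.5, 2.0)
```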
- the specific effects of the attention mechanism are as follows: for a training-set image that is relevant to fine-grained image classification and related to the keywords, the regularized attention scores of the keyword-related image regions are pushed up; for an image that is irrelevant to fine-grained image classification or unrelated to the keywords, the regularized attention scores average close to zero in every image region, so the classifier pays little attention to those regions, i.e., it rarely learns from or considers their features during classification. The attention mechanism in this application can therefore filter out image regions in the training-set images that are unrelated to the fine-grained image classification task or the keywords, while detecting the image regions that help fine-grained image classification.
- the fine-grained categories are obtained after the training instances are input into the classifier; keywords are then used as instance labels, and the training instance is treated as a whole to calculate the classifier loss, which ensures that the classifier loss takes into account the information fused in the training instance.
- after step S206, the method may further include: obtaining the image to be classified; inputting the image to be classified into the fine-grained classification model to obtain its attention weight vector; generating a test instance of the image to be classified based on the attention weight vector; and inputting the test instance into the classifier of the fine-grained classification model to obtain the fine-grained category of the image to be classified.
- the server obtains a fine-grained classification model after completing the training.
- the image to be classified is obtained, and the image to be classified can be sent by the terminal.
- the server inputs the image to be classified into the convolutional layer of the fine-grained classification model, and the output of the last convolutional layer of the convolutional layer is input to the attention detector to obtain the attention weight vector of each image region in the image to be classified.
- during application and testing, one image can be input at a time, so no pooling layer is needed; the test instance of the image to be classified is obtained directly from the attention weight vector.
- in the test instance, the image regions related to fine-grained image classification have been strengthened, and the image regions unrelated to fine-grained image classification are suppressed; the test instance is input into the classifier, which processes it and outputs the fine-grained category of the image to be classified.
- the image to be classified is input into the fine-grained classification model during the application test to obtain a test example.
- the test example strengthens the image area related to the fine-grained image classification and suppresses the image area irrelevant to the fine-grained image classification task. This enables the classifier to accurately output fine-grained categories.
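The single-image test path described above can be sketched in a few lines. This is a minimal NumPy sketch, not the application's implementation: the score form f(w·v + b), the linear classifier, and the final averaging of the weighted region vectors into one test instance are all assumptions made for illustration.

```python
import numpy as np

def softplus(x):
    # f(x) = ln(1 + exp(x)): the attention activation named in formula (4)
    return np.log1p(np.exp(x))

def classify_image(feature_map, w, b, classifier_weights):
    """Single-image test path: no cross-image pooling layer is needed.

    feature_map: (C, H, W) output of the last convolutional layer.
    w, b: attention detector weight (shape (C,)) and bias (scalar).
    classifier_weights: (num_classes, C) linear classifier (illustrative).
    """
    c, h, wd = feature_map.shape
    regions = feature_map.reshape(c, h * wd).T       # one C-dim vector per region
    scores = softplus(regions @ w + b)               # attention score per region
    weighted = regions * scores[:, None]             # strengthen/suppress regions
    # Assumed reduction: average the weighted region vectors of this single
    # image to form the test instance fed to the classifier.
    test_instance = weighted.mean(axis=0)
    logits = classifier_weights @ test_instance
    return int(np.argmax(logits))
```

The averaging step stands in for the training-time pooling over a whole set of images; at test time only one image's regions are fused.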
- the processing of the fine-grained classification model is explained through a specific application scenario. Taking the recognition of swan species as an example: the swan is a major category, the black swan and the white swan are sub-categories, and the model that distinguishes black swans from white swans is the fine-grained classification model.
- in the training stage, a large number of images are obtained from the Internet according to the keyword "black swan" to form an image data set.
- the image data set is randomly grouped into several training sets, and "black swan" is the label of each training set.
- Each image in the training set is input to the convolutional layer of the fine-grained classification initial model to obtain the convolution feature vector, and the convolution feature vector is input to the attention detector to obtain the attention weight vector, and the attention weight vector is pooled to obtain the training example.
- the training instance fuses the features of each image in the training set: image content related to the black swan is strengthened by the attention detector, and images that do not match the black swan (such as images of white swans) are suppressed by the attention detector.
- the attention detector filters the information in the image so that the model can focus on learning.
- the classifier classifies and calculates the model loss according to the training examples.
- the fine-grained classification model adjusts the model parameters according to the model loss to strengthen the attention detector and the classifier. After the training is completed, the fine-grained classification model can be obtained.
- the fine-grained classification initial model can learn the characteristics of the black swan and the white swan during training.
- images of other sub-categories can also be collected for supplementary training. For example, you can collect images of white swan for supplementary training.
- when the fine-grained classification model is in use, an image to be classified is input into the model.
- the fine-grained classification model calculates the attention weight vector of the image to be classified and generates a test instance.
- the test instance weights the image to be classified, and the regions useful for fine-grained classification are strengthened.
- the classifier can accurately identify whether the image is a black swan or a white swan according to the test case, and realize fine-grained image classification.
- the fine-grained classification model processing method based on image detection in this application involves neural networks, machine learning, and computer vision in the field of artificial intelligence.
- the processes in the above-mentioned embodiment methods can be implemented by instructing relevant hardware through computer-readable instructions, which can be stored in a computer-readable storage medium.
- when executed, the computer-readable instructions may include the processes of the above method embodiments.
- the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.
- this application provides an embodiment of an apparatus for processing a fine-grained classification model based on image detection, which corresponds to the method embodiment shown in FIG. 2
- the device can be specifically applied to various electronic devices.
- the apparatus 300 for processing fine-grained classification models based on image detection in this embodiment includes: a data set construction module 301, a data set grouping module 302, a data set input module 303, an instance generation module 304, a loss calculation module 305, and a parameter adjustment module 306, wherein:
- the data set construction module 301 is used to construct an image data set through a search engine based on the received keywords.
- the data set grouping module 302 is used to randomly group the image data set into several training sets.
- the data set input module 303 is used to input several sets of training sets into the fine-grained classification initial model to obtain the attention weight vectors of each image in the several sets of training sets.
- the instance generation module 304 is used to pool the attention weight vector to generate several groups of training instances corresponding to the training set.
- the loss calculation module 305 is used to input the obtained training examples into the classifier of the fine-grained classification initial model to calculate the model loss.
- the parameter adjustment module 306 is configured to adjust the model parameters of the fine-grained classification initial model according to the model loss to obtain the fine-grained classification model.
- the image data set is constructed directly through a search engine according to the keywords, so it can be quickly expanded through the Internet, which improves the speed of building the image data set; because the images are independent of each other, randomly grouping the image data set into several training sets reduces the negative impact of images that do not match the label; the several training sets are input into the fine-grained classification initial model, which integrates an attention mechanism to compute attention weight vectors for the input images, enhancing the image regions related to the keywords so that the model focuses on the regions relevant to classification; training instances are generated from the attention weight vectors and contain the features of each image in the corresponding training set; after the training instances are input into the classifier to obtain the model loss, the model parameters are adjusted according to the model loss, yielding a fine-grained classification model that classifies accurately, thereby realizing fine-grained image classification quickly and accurately.
- the aforementioned data set construction module 301 includes: a receiving submodule, a search submodule, and a construction submodule, where:
- the receiving sub-module is used to receive keywords sent by the terminal.
- the search sub-module is used to send keywords to the search engine to instruct the search engine to search for images from the Internet according to the keywords.
- the construction sub-module is used to construct an image data set based on the searched images.
- searching from the Internet through a search engine can quickly obtain a large number of images, which greatly improves the construction speed of the image data set.
- the aforementioned data set input module 303 includes: a data set input submodule, a score calculation submodule, and a multiplication submodule, wherein:
- the data set input sub-module is used to input each image in the training set into the convolution layer of the fine-grained classification initial model to obtain the convolution feature vector of each image region in each image.
- the score calculation sub-module is used to calculate the regularized attention score of the convolution feature vector through the attention detector; among them, the regularized attention score is used to characterize the degree of association between the image area and the keyword.
- the multiplication sub-module is used to multiply the regularized attention score and the convolution feature vector to obtain the attention weight vector of each image.
- the images in the training set are input into the convolutional layers to obtain the convolution feature vector of each image region; the attention mechanism is introduced through the attention detector, which computes the regularized attention scores from the convolution feature vectors; the regularized attention scores can serve as weights for the convolution feature vectors, and the attention weight vectors are obtained after the corresponding multiplication; the attention weight vectors have already strengthened or suppressed the image regions, so the fine-grained classification initial model can learn in a targeted manner.
- the aforementioned data set input submodule includes:
- the training set input unit is used to input several sets of training sets into the convolutional layer of the fine-grained classification initial model.
- the output obtaining unit is used to obtain the convolution feature map output by the last convolution layer of the convolution layer.
- the vector setting unit is used to set the vector corresponding to each image area in the convolution feature map as the convolution feature vector.
- the training set is input into the convolutional layers to obtain the convolution feature map output by the last convolutional layer; the vectors in the convolution feature map correspond to the image regions in the image, and the convolution feature vectors can be accurately extracted according to this correspondence.
- the above-mentioned loss calculation module includes: a loss calculation sub-module, a factor calculation sub-module, and a linear operation sub-module, wherein:
- the loss calculation sub-module is used to input the obtained training examples into the classifier to calculate the classifier loss.
- the factor calculation sub-module is used to calculate the regularization factor according to the convolution feature vector.
- the linear operation sub-module is used to perform linear operations on the classifier loss and the regularization factor to obtain the model loss.
- the training instances are input into the classifier to calculate the classifier loss, and the regularization factor is then calculated according to the convolution feature vectors to further strengthen or suppress the image; the model loss is obtained by a linear operation on the classifier loss and the regularization factor, so that the fine-grained classification initial model can adjust the model parameters more reasonably according to the model loss.
- the aforementioned loss calculation submodule includes: an instance input unit, a label setting unit, and a loss calculation unit, where:
- the instance input unit is used to input the obtained training instance into the classifier to obtain the fine-grained category of each image in the training instance.
- the label setting unit is used to set keywords as instance labels.
- the loss calculation unit is used to calculate the classifier loss of the training instance according to the instance label and the fine-grained category of each image in the training instance.
- the fine-grained categories are obtained after the training instances are input into the classifier; the keywords are then used as instance labels, and the classifier loss is calculated with each training instance treated as a whole, which ensures that the classifier loss takes into account the information fused in the training instance.
- the above-mentioned fine-grained classification model processing device 300 based on image detection further includes: a to-be-classified acquisition module, a to-be-classified input module, a test generation module, and a test input module, wherein:
- the acquisition module to be classified is used to acquire the image to be classified.
- the input module to be classified is used to input the image to be classified into the fine-grained classification model to obtain the attention weight vector of the image to be classified.
- the test generation module is used to generate a test instance of the image to be classified based on the attention weight vector.
- the test input module is used to input the test instance into the classifier of the fine-grained classification model to obtain the fine-grained category of the image to be classified.
- the image to be classified is input into the fine-grained classification model during the application test to obtain a test example.
- the test example strengthens the image area related to the fine-grained image classification and suppresses the image area irrelevant to the fine-grained image classification task. This enables the classifier to accurately output fine-grained categories.
- FIG. 4 is a block diagram of the basic structure of the computer device in this embodiment.
- the computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are connected to each other in communication via a system bus. It should be pointed out that the figure only shows the computer device 4 with components 41-43, but it should be understood that it is not required to implement all the shown components, and more or fewer components may be implemented instead. Among them, those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
- Its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded devices, etc.
- the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
- the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
- the memory 41 includes at least one type of computer-readable storage medium.
- the computer-readable storage medium may be non-volatile or volatile.
- the computer-readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, etc.
- the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4.
- the memory 41 may also be an external storage device of the computer device 4, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 4.
- the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device.
- the memory 41 is generally used to store an operating system and various application software installed in the computer device 4, such as computer-readable instructions of a fine-grained classification model processing method based on image detection.
- the memory 41 can also be used to temporarily store various types of data that have been output or will be output.
- the processor 42 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
- the processor 42 is generally used to control the overall operation of the computer device 4.
- the processor 42 is configured to run computer-readable instructions or processed data stored in the memory 41, for example, run the computer-readable instructions of the fine-grained classification model processing method based on image detection.
- the network interface 43 may include a wireless network interface or a wired network interface, and the network interface 43 is generally used to establish a communication connection between the computer device 4 and other electronic devices.
- the computer device provided in this embodiment can execute the steps of the above-mentioned fine-grained classification model processing method based on image detection.
- the steps of the fine-grained classification model processing method based on image detection may be the steps in the fine-grained classification model processing method based on image detection in each of the foregoing embodiments.
- the image data set is constructed directly through a search engine according to the keywords, so it can be quickly expanded through the Internet, which improves the speed of building the image data set; because the images are independent of each other, randomly grouping the image data set into several training sets reduces the negative impact of images that do not match the label; the several training sets are input into the fine-grained classification initial model, which integrates an attention mechanism to compute attention weight vectors for the input images, enhancing the image regions related to the keywords so that the model focuses on the regions relevant to classification; training instances are generated from the attention weight vectors and contain the features of each image in the corresponding training set; after the training instances are input into the classifier to obtain the model loss, the model parameters are adjusted according to the model loss, yielding a fine-grained classification model that classifies accurately, thereby realizing fine-grained image classification quickly and accurately.
- the present application also provides another implementation manner, that is, a computer-readable storage medium is provided with computer-readable instructions stored thereon, and the computer-readable instructions can be executed by at least one processor to The at least one processor is caused to execute the steps of the above-mentioned fine-grained classification model processing method based on image detection.
- the image data set is constructed directly through a search engine according to the keywords, so it can be quickly expanded through the Internet, which improves the speed of building the image data set; because the images are independent of each other, randomly grouping the image data set into several training sets reduces the negative impact of images that do not match the label; the several training sets are input into the fine-grained classification initial model, which integrates an attention mechanism to compute attention weight vectors for the input images, enhancing the image regions related to the keywords so that the model focuses on the regions relevant to classification; training instances are generated from the attention weight vectors and contain the features of each image in the corresponding training set; after the training instances are input into the classifier to obtain the model loss, the model parameters are adjusted according to the model loss, yielding a fine-grained classification model that classifies accurately, thereby realizing fine-grained image classification quickly and accurately.
- the technical solution of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions to cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in the various embodiments of this application.
Abstract
A fine-grained classification model processing method based on image detection, belonging to the field of artificial intelligence, includes: receiving keywords and constructing an image data set through a search engine; randomly grouping the image data set into several training sets; inputting the several training sets into a fine-grained classification initial model to obtain attention weight vectors of each image in the several training sets; pooling the attention weight vectors to respectively generate training instances corresponding to the several training sets; inputting the training instances into the classifier of the fine-grained classification initial model to calculate a model loss; and adjusting the model parameters according to the model loss to obtain a fine-grained classification model. A fine-grained classification model processing apparatus based on image detection, a computer device, and a storage medium are also provided. Blockchain technology is also involved: the trained model parameters may be stored in a blockchain. Fine-grained image classification can thus be realized quickly and accurately.
Description
This application claims priority to the Chinese patent application filed with the China Patent Office on September 7, 2020, with application number 202010930234.1 and the invention title "Fine-grained classification model processing method based on image detection and related devices", the entire content of which is incorporated herein by reference.
This application relates to the field of artificial intelligence technology, and in particular to a fine-grained classification model processing method, apparatus, computer device, and storage medium based on image detection.
With the development of computer technology, research on and application of computer vision have become increasingly widespread, and fine-grained image classification is a popular topic in computer vision. The goal of fine-grained image classification is to retrieve and recognize images of different sub-categories under a major category, which involves image detection in artificial intelligence.
The inventors realized that in traditional fine-grained image classification techniques, in order to improve classification accuracy, a large-scale image data set usually needs to be prepared, and the images in the data set must be manually annotated before training and application. This is time-consuming and labor-intensive, resulting in low processing efficiency for fine-grained image classification.
Summary of the Invention
The purpose of the embodiments of this application is to propose a fine-grained classification model processing method, apparatus, computer device, and storage medium based on image detection, so as to solve the problem of low processing efficiency in fine-grained image classification.
In order to solve the above technical problem, an embodiment of this application provides a fine-grained classification model processing method based on image detection, which adopts the following technical solution:
constructing an image data set through a search engine based on received keywords;
randomly grouping the image data set into several training sets;
inputting the several training sets into a fine-grained classification initial model to obtain attention weight vectors of each image in the several training sets;
pooling the attention weight vectors to respectively generate training instances corresponding to the several training sets;
inputting the obtained training instances into a classifier of the fine-grained classification initial model to calculate a model loss;
adjusting model parameters of the fine-grained classification initial model according to the model loss to obtain a fine-grained classification model.
In order to solve the above technical problem, an embodiment of this application further provides a fine-grained classification model processing apparatus based on image detection, which adopts the following technical solution:
a data set construction module, configured to construct an image data set through a search engine based on received keywords;
a data set grouping module, configured to randomly group the image data set into several training sets;
a data set input module, configured to input the several training sets into a fine-grained classification initial model to obtain attention weight vectors of each image in the several training sets;
an instance generation module, configured to pool the attention weight vectors to respectively generate training instances corresponding to the several training sets;
a loss calculation module, configured to input the obtained training instances into a classifier of the fine-grained classification initial model to calculate a model loss;
a parameter adjustment module, configured to adjust model parameters of the fine-grained classification initial model according to the model loss to obtain a fine-grained classification model.
In order to solve the above technical problem, an embodiment of this application further provides a computer device, including a memory and a processor, wherein computer-readable instructions are stored in the memory, and the processor implements the following steps when executing the computer-readable instructions:
constructing an image data set through a search engine based on received keywords;
randomly grouping the image data set into several training sets;
inputting the several training sets into a fine-grained classification initial model to obtain attention weight vectors of each image in the several training sets;
pooling the attention weight vectors to respectively generate training instances corresponding to the several training sets;
inputting the obtained training instances into a classifier of the fine-grained classification initial model to calculate a model loss;
adjusting model parameters of the fine-grained classification initial model according to the model loss to obtain a fine-grained classification model.
In order to solve the above technical problem, an embodiment of this application further provides a computer-readable storage medium storing computer-readable instructions, which implement the following steps when executed by a processor:
constructing an image data set through a search engine based on received keywords;
randomly grouping the image data set into several training sets;
inputting the several training sets into a fine-grained classification initial model to obtain attention weight vectors of each image in the several training sets;
pooling the attention weight vectors to respectively generate training instances corresponding to the several training sets;
inputting the obtained training instances into a classifier of the fine-grained classification initial model to calculate a model loss;
adjusting model parameters of the fine-grained classification initial model according to the model loss to obtain a fine-grained classification model.
Compared with the prior art, the embodiments of this application mainly have the following beneficial effects: the image data set is constructed directly through a search engine according to the keywords, so it can be quickly expanded through the Internet, which improves the speed of building the image data set; because the images are independent of each other, randomly grouping the image data set into several training sets reduces the negative impact of images that do not match the label; the several training sets are input into the fine-grained classification initial model, which integrates an attention mechanism to compute attention weight vectors for the input images, enhancing the image regions related to the keywords so that the model focuses on the regions relevant to classification; training instances are generated from the attention weight vectors and contain the features of each image in the corresponding training set; after the training instances are input into the classifier to obtain the model loss, the model parameters are adjusted according to the model loss, yielding a fine-grained classification model that classifies accurately, thereby realizing fine-grained image classification quickly and accurately.
In order to explain the solutions in this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative labor.
FIG. 1 is an exemplary system architecture diagram to which this application can be applied;
FIG. 2 is a flowchart of an embodiment of the fine-grained classification model processing method based on image detection according to this application;
FIG. 3 is a schematic structural diagram of an embodiment of the fine-grained classification model processing apparatus based on image detection according to this application;
FIG. 4 is a schematic structural diagram of an embodiment of the computer device according to this application.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used herein in the specification are only for the purpose of describing specific embodiments and are not intended to limit this application. The terms "including" and "having" and any variations thereof in the specification, claims, and description of the drawings above are intended to cover non-exclusive inclusion. The terms "first", "second", etc. in the specification, claims, or drawings above are used to distinguish different objects, not to describe a specific order.
Reference to an "embodiment" herein means that a specific feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
In order to enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application will be described clearly and completely below with reference to the accompanying drawings.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages, etc. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and so on.
The server 105 may be a server providing various services, for example a background server providing support for the pages displayed on the terminal devices 101, 102, 103.
It should be noted that the fine-grained classification model processing method based on image detection provided by the embodiments of this application is generally executed by the server; accordingly, the fine-grained classification model processing apparatus based on image detection is generally arranged in the server.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
Continuing to refer to FIG. 2, a flowchart of an embodiment of the fine-grained classification model processing method based on image detection according to this application is shown. The fine-grained classification model processing method based on image detection includes the following steps:
Step S201: based on received keywords, construct an image data set through a search engine.
In this embodiment, the electronic device (for example, the server shown in FIG. 1) on which the fine-grained classification model processing method based on image detection runs may communicate with the terminal through a wired or wireless connection. It should be noted that the above wireless connection may include but is not limited to 3G/4G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (ultra wideband) connections, and other wireless connections now known or developed in the future.
The keywords may be characters, words, or phrases that instruct the server to search for images; the keywords may be the names of sub-categories in fine-grained image classification. The image data set may be a collection of images obtained based on the keywords.
Specifically, fine-grained image classification requires a topic, i.e., keywords, and the name of a sub-category in the fine-grained image classification task can serve as the keyword. Keywords may be input manually and sent to the server. After receiving the keywords, the server performs an image search in the search engine according to the keywords and constructs an image data set from the search results.
In one embodiment, the image data set may include positive samples and negative samples, where the positive samples are related to the keywords and the negative samples are unrelated to the keywords.
In one embodiment, constructing an image data set through a search engine based on the received keywords includes: receiving keywords sent by the terminal; sending the keywords to the search engine to instruct the search engine to search for images from the Internet according to the keywords; and constructing the image data set based on the searched images.
Specifically, the user may control the processing of the fine-grained classification initial model at the terminal. The user inputs keywords at the terminal, and the terminal sends the keywords to the server. The server calls the interface of the search engine and sends the keywords to the search engine, thereby searching for images from the Internet through the search engine.
The server may directly search for the keywords in the search engine, use the searched images as positive samples, and construct the image data set based on the positive samples. In addition, the server may also randomly search for images in the search engine to obtain negative samples, and merge the positive and negative samples into the image data set; in this case, the negative samples serve as noise during training and prevent the model from overfitting. It is stated here that this application takes positive samples as the example in its explanations; after being input into the model, negative samples undergo the same data processing as positive samples and are processed synchronously with them.
For example, suppose swans consist of black swans and white swans, and the black swan is a sub-category of swans. "Black swan" can then be used as the keyword, and the server searches the search engine for images related to black swans as positive samples. It should be noted that the positive samples are not necessarily all images of black swans; there may also be images of white swans, paintings of swans, and so on, but all positive samples come from the search results for the keyword. Negative samples are unrelated to fine-grained image classification; for example, they may be images of cars, landscape paintings, etc.
In this embodiment, after the keywords are received, a large number of images can be quickly obtained by searching the Internet through the search engine, which greatly improves the construction speed of the image data set.
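The dataset construction just described can be sketched as follows. The `search_images` function is a hypothetical stand-in for a real search-engine API (this application does not specify one), and the labels +1/-1 for positive/negative samples are an illustrative convention.

```python
import random

def search_images(query, n):
    # Hypothetical stub: a real implementation would call a search-engine
    # API and return image URLs or files matching `query`.
    return [f"{query}_{i}.jpg" for i in range(n)]

def build_dataset(keyword, n_pos=1000, n_neg=200, seed=0):
    """Build an image data set from keyword search results.

    Positive samples come from the keyword search; negative samples are
    unrelated images that act as noise to prevent overfitting.
    """
    positives = search_images(keyword, n_pos)
    negatives = search_images("random", n_neg)
    dataset = [(img, 1) for img in positives] + [(img, -1) for img in negatives]
    random.Random(seed).shuffle(dataset)   # images are independent of each other
    return dataset
```

Positive and negative samples are then processed identically downstream, as the passage above states.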
Step S202: randomly group the image data set into several training sets.
Specifically, if a single image is taken directly from the image data set, there is a certain probability that it does not match the keywords; when multiple images are taken from the image data set, the probability that none of them matches the keywords is extremely small. As long as one of the images matches the keywords, the set of images as a whole can be considered to match the keywords, and the keywords can be regarded as the label of that whole.
Therefore, the server randomly groups the image data set to obtain several training sets. Suppose the probability that an image in the image data set does not match the keywords is ζ. Since the images are independent of each other, the probability p that the label of a training set is correct is:
p = 1 - ζ^K  (1)
where K is the number of images in the training set and is a positive integer. It is easy to see that as K increases, the probability that the training-set label is correct grows rapidly.
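Formula (1) is easy to check numerically; even with a substantial fraction of mismatched images, a modest group size K makes the group label almost surely correct:

```python
# Probability (formula (1)) that a randomly grouped training set of K images
# carries a correct label, when each image independently fails to match the
# keyword with probability zeta.
def bag_label_correct_prob(zeta, K):
    return 1.0 - zeta ** K

# Even with 30% mismatched images, a group of 8 is almost surely labeled correctly.
p1 = bag_label_correct_prob(0.3, 1)   # 0.7
p8 = bag_label_correct_prob(0.3, 8)   # ~0.99993
```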
Step S203: input the several training sets into the fine-grained classification initial model to obtain the attention weight vectors of each image in the several training sets.
The fine-grained classification initial model may be a fine-grained classification model that has not yet completed training. The attention weight vector may be the vector representation output after each image is processed, weighted by the attention mechanism.
Specifically, the server inputs the several training sets into the convolutional layers of the fine-grained classification initial model. The convolutional layers perform convolution on each image in each training set and, combined with the attention mechanism, apply attention weighting to the vectors in the convolutional layers to obtain the attention weight vector of each image.
The vectors in the convolutional layers are used for fine-grained image classification. The attention mechanism aims to polarize the vectors in the convolutional layers: vectors related to the keywords are strengthened by the attention mechanism, while vectors unrelated to the keywords are weakened, so that the fine-grained classification initial model can learn better from the strengthened vectors, thereby improving classification accuracy. An attention detector may be provided in the fine-grained classification initial model to implement the attention mechanism.
Step S204: pool the attention weight vectors to respectively generate the training instances corresponding to the several training sets.
A training instance is a fusion of the images in a training set, merging the attention weight vectors of the images in the training set.
Specifically, a pooling layer may be provided in the fine-grained classification initial model, and the pooling layer performs global average pooling on the attention weight vectors, thereby generating a training instance for each training set. The training instance fuses the image features of the images in the training set and is used for further fine-grained image classification.
In one embodiment, the formula for global average pooling is:
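The pooling formula itself does not survive in this text. One reading consistent with the surrounding description is a plain global average over all attention weight vectors of the training set; the sketch below makes that assumption explicit and is not the application's exact formula:

```python
import numpy as np

def global_average_pool(weighted_vectors):
    """Fuse a training set into one training instance.

    weighted_vectors: array of shape (N, C) holding the attention weight
    vectors of every image region across all K images in the training set.
    Averaging yields a single C-dimensional training instance; this is one
    plausible reading of the unreproduced pooling formula, not a quotation
    of it.
    """
    return np.asarray(weighted_vectors).mean(axis=0)
```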
Step S205: input the obtained training instances into the classifier of the fine-grained classification initial model to calculate the model loss.
Specifically, the server inputs the training instances into the classifier of the fine-grained classification initial model; the classifier classifies according to the training instances and outputs the classification results. The server may use the keywords as labels and calculate the model loss based on the classification results and the labels.
Step S206: adjust the model parameters of the fine-grained classification initial model according to the model loss to obtain the fine-grained classification model.
Specifically, the server adjusts the model parameters of the fine-grained classification initial model with the goal of reducing the model loss, and continues training after each parameter adjustment. When the model loss satisfies the training stop condition, training stops and the fine-grained classification model is obtained. The training stop condition may be that the model loss is less than a preset loss threshold.
The adjusted model parameters include the parameters in the convolutional layers, the attention detector, and the classifier. After training, the attention detector can effectively identify the image regions unrelated to the keywords and suppress or weaken their attention weight vectors, while strengthening the attention weight vectors of the image regions related to the keywords.
It should be emphasized that, in order to further ensure the privacy and security of the above model parameters, the trained model parameters may also be stored in a node of a blockchain.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database, a chain of data blocks generated in association using cryptographic methods; each data block contains a batch of network transaction information used to verify the validity (anti-counterfeiting) of its information and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
In this embodiment, the image data set is constructed directly through a search engine according to the keywords, so it can be quickly expanded through the Internet, which improves the speed of building the image data set; because the images are independent of each other, randomly grouping the image data set into several training sets reduces the negative impact of images that do not match the label; the several training sets are input into the fine-grained classification initial model, which integrates an attention mechanism to compute attention weight vectors for the input images, enhancing the image regions related to the keywords so that the model focuses on the regions relevant to classification; training instances are generated from the attention weight vectors and contain the features of each image in the corresponding training set; after the training instances are input into the classifier to obtain the model loss, the model parameters are adjusted according to the model loss, yielding a fine-grained classification model that classifies accurately, thereby realizing fine-grained image classification quickly and accurately.
Further, the above step S203 may include: inputting each image in the several training sets into the convolutional layers of the fine-grained classification initial model respectively, to obtain the convolution feature vector of each image region in each image; calculating the regularized attention scores of the convolution feature vectors through the attention detector, where the regularized attention score is used to characterize the degree of association between an image region and the keywords; and multiplying the regularized attention scores with the corresponding convolution feature vectors to obtain the attention weight vector of each image.
The convolution feature vector may be the vector representation output after the convolutional layers convolve the image regions in each image.
Specifically, the server inputs each image in the several training sets into the convolutional layers of the fine-grained classification initial model, and after convolution the convolutional layers output the convolution feature vector of each image region in each image. An image region may be a single pixel or multiple pixels, for example 2×2 pixels or 3×3 pixels.
For each training set, the server aggregates the convolution feature vectors and inputs them into the attention detector, which calculates the regularized attention scores of the convolution feature vectors according to its weight and bias.
The regularized attention score can characterize the degree of association between the image region corresponding to a convolution feature vector and the keywords; the higher the degree of association, the larger the regularized attention score can be. For each image, the server multiplies each convolution feature vector by its corresponding regularized attention score to obtain the attention weight vector.
In one embodiment, the step of inputting each image in the several training sets into the convolutional layers of the fine-grained classification initial model to obtain the convolution feature vector of each image region in each image includes: inputting the several training sets into the convolutional layers of the fine-grained classification initial model; obtaining the convolution feature map output by the last convolutional layer; and setting the vectors corresponding to the image regions in the convolution feature map as the convolution feature vectors.
The convolution feature map may be a vector matrix, in which each sub-matrix corresponds to an image region in the image.
Specifically, the convolutional layers may consist of multiple sub-layers that perform multi-layer convolution on the input training set. The last convolutional layer is the final layer among the convolutional layers; the server obtains the convolution feature map it outputs. The sub-matrices at each position in the convolution feature map correspond to the image regions in the image, and the vectors corresponding to the image regions in the convolution feature map are taken as the convolution feature vectors.
In this embodiment, the training set is input into the convolutional layers to obtain the convolution feature map output by the last convolutional layer; the vectors in the convolution feature map correspond to the image regions in the image, and the convolution feature vectors can be accurately extracted according to this correspondence.
f(x) = ln(1 + exp(x))  (4)
where w ∈ R^c and b ∈ R respectively denote the weight and bias of the attention detector; they are the key factors by which the attention detector strengthens or weakens image regions, and can be obtained through the adjustment of the model parameters.
In this embodiment, the images in the training set are input into the convolutional layers to obtain the convolution feature vectors of the image regions in each image; the attention mechanism is introduced through the attention detector, which computes the regularized attention scores from the convolution feature vectors. The regularized attention scores can serve as weights for the convolution feature vectors, and the attention weight vectors are obtained after the corresponding multiplication. The attention weight vectors have already strengthened or suppressed the image regions, so that the fine-grained classification initial model can learn in a targeted manner.
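The score-then-weight procedure of this embodiment can be sketched as follows. It is a minimal sketch, not the application's implementation: the score layout a = f(w·v + b), with f from formula (4) and w ∈ R^c, b ∈ R, is an assumption consistent with the surrounding text, and the normalization that makes the scores "regularized" is not specified, so raw softplus scores are used.

```python
import numpy as np

def softplus(x):
    # f(x) = ln(1 + exp(x)), formula (4)
    return np.log1p(np.exp(x))

def attention_weight(feature_map, w, b):
    """Compute attention weight vectors for one image.

    feature_map: (C, H, W) convolution feature map from the last conv layer;
    each spatial position holds the C-dim convolution feature vector of one
    image region. w (shape (C,)) and b (scalar) are the attention detector's
    weight and bias.
    """
    c, h, wd = feature_map.shape
    regions = feature_map.reshape(c, h * wd).T       # (H*W, C) region vectors
    scores = softplus(regions @ w + b)               # attention score per region
    return regions * scores[:, None]                 # attention weight vectors
```

Regions with high scores are thereby strengthened and regions with near-zero scores suppressed, matching the polarization described above.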
Further, the above step S205 may include: inputting the obtained training instances into the classifier to calculate the classifier loss; calculating a regularization factor according to the convolution feature vectors; and performing a linear operation on the classifier loss and the regularization factor to obtain the model loss.
The classifier loss may be the loss calculated by the classifier; the model loss may be the total loss calculated by the fine-grained classification initial model; the regularization factor may be a factor that regularizes the classifier loss.
Specifically, the server inputs the training instances into the classifier of the fine-grained classification initial model; the classifier classifies according to the training instances, outputs the classification results, and calculates the classifier loss according to the classification results.
The attention mechanism in this application aims to make the regularized attention scores of one or several image regions take relatively high values in the training-set images that match the keywords; for images that do not match the keywords or are unrelated to fine-grained image classification, the regularized attention scores of all image regions should be close together and low. To achieve this during training, this application sets a separate regularization factor in addition to the classifier loss. The negative samples in this application, serving as noise, can also realize the regularization of the attention calculation.
Specifically, the regularization factor is calculated from the convolution feature vectors. After obtaining the regularization factor, the server linearly adds the classifier loss and the regularization factor to obtain the model-level model loss.
In this embodiment, the training instances are input into the classifier to calculate the classifier loss, and the regularization factor is then calculated from the convolution feature vectors to further strengthen or suppress the image; the model loss is obtained by a linear operation on the classifier loss and the regularization factor, so that the fine-grained classification initial model can adjust the model parameters more reasonably according to the model loss.
Further, the step of inputting the obtained training instances into the classifier to calculate the classifier loss includes: inputting the obtained training instances into the classifier to obtain the fine-grained category of each image in the training instances; setting the keywords as instance labels; and calculating the classifier loss of the training instances according to the instance labels and the fine-grained categories of the images in the training instances.
The fine-grained category may be the classification result output by the classifier.
Specifically, the server inputs a training instance into the classifier of the fine-grained classification initial model; the classifier classifies according to the training instance and outputs multiple fine-grained categories, the number of which equals the number of images in the training set.
The keywords can serve as instance labels. Based on the output fine-grained categories and the instance labels, the server calculates the classifier loss with the training instance as a whole.
In one embodiment, the classifier loss is the cross-entropy loss, calculated as follows:
where F_n is the fine-grained category output for the training instance, y_n is the instance label, and L_class is the classifier loss.
The convolution feature vector here may come from a positive sample in the training set or from a negative sample in the training set; b is the bias of the attention detector. When it comes from a negative sample, the attention mechanism aims to keep the regularized attention scores low in every image region; when it comes from a positive sample, the attention mechanism aims to ensure that there is at least one image region whose regularized attention score is high. Combining the two cases yields the regularization factor R given in formula (8),
where δ_n ∈ {1, -1}, taking the value 1 when the image is a positive sample and -1 otherwise. Performing a linear operation on the regularization factor and the classifier loss gives the model loss:
L = L_class + λR  (9)
where λ is a weight used to adjust the relative importance of the classifier loss and the regularization factor, and R is the regularization factor in formula (8).
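Formula (9) can be sketched numerically. Formula (8) for R does not survive in this text, so the regularizer below is a schematic stand-in consistent with the description (δ = +1 for a positive sample rewards one high peak attention score; δ = -1 for a negative sample penalizes any high score); it is not the application's exact formula, and the cross-entropy layout is likewise an assumption:

```python
import numpy as np

def cross_entropy(probs, label_index):
    # Classifier loss L_class: negative log-likelihood of the instance label.
    return -np.log(probs[label_index])

def model_loss(probs, label_index, attention_scores, delta, lam=0.1):
    """L = L_class + lambda * R  (formula (9)).

    delta = +1 for a positive sample, -1 for a negative sample; the
    regularizer below is an assumed surrogate for the unreproduced
    formula (8).
    """
    l_class = cross_entropy(probs, label_index)
    r = -delta * np.max(attention_scores)   # schematic stand-in for R
    return l_class + lam * r
```

With this sign convention, high attention peaks lower the loss for positive samples and raise it for negative ones, matching the regularization goal described above.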
The specific effect of the attention mechanism is as follows: for an image from the training set that is relevant both to fine-grained image classification and to the keywords, the regularized attention score is pushed up in the image regions related to the keywords; for an image that is irrelevant to fine-grained image classification or unrelated to the keywords, the regularized attention score tends uniformly toward zero in every image region, and the classifier will not pay much attention to these regions, that is, the features of these regions are given little consideration during learning or classification. Therefore, the attention mechanism in this application can filter out the image regions in training-set images that are irrelevant to the fine-grained image classification task or unrelated to the keywords, and can also detect the image regions that are helpful for fine-grained image classification.
In this embodiment, the training instances are input into the classifier to obtain the fine-grained categories; the keywords are then used as instance labels, and the classifier loss is calculated with each training instance treated as a whole, which ensures that the classifier loss takes into account the information fused in the training instance.
Further, after the above step S206, the method may further include: obtaining an image to be classified; inputting the image to be classified into the fine-grained classification model to obtain the attention weight vector of the image to be classified; generating a test instance of the image to be classified based on the attention weight vector; and inputting the test instance into the classifier of the fine-grained classification model to obtain the fine-grained category of the image to be classified.
Specifically, the server obtains the fine-grained classification model after training is completed. In application, the image to be classified is obtained; it may be sent by the terminal. The server inputs the image to be classified into the convolutional layers of the fine-grained classification model, and the output of the last convolutional layer is fed into the attention detector to obtain the attention weight vector of each image region in the image to be classified.
Unlike training, where multiple images are input at a time, one image at a time can be input during testing and application, so no pooling layer is needed at that stage; the test instance of the image to be classified can be obtained directly from the attention weight vector. In the test instance, the image regions related to fine-grained image classification have been strengthened and the unrelated regions suppressed. The test instance is input into the classifier, which processes it and outputs the fine-grained category of the image to be classified.
In this embodiment, during application and testing the image to be classified is input into the fine-grained classification model to obtain a test instance. The test instance strengthens the image regions related to fine-grained image classification and suppresses the regions irrelevant to the fine-grained classification task, enabling the classifier to output the fine-grained category accurately.
The processing of the fine-grained classification model is now illustrated through a specific application scenario. Taking the recognition of swan species as an example: the swan is a major category, the black swan and the white swan are sub-categories, and the model that distinguishes black swans from white swans is the fine-grained classification model.
In the training stage, a large number of images are obtained from the Internet according to "black swan" to form an image data set. The image data set is randomly grouped into several training sets, with "black swan" as the label of each training set. Each image in a training set is input into the convolutional layers of the fine-grained classification initial model to obtain convolution feature vectors; the convolution feature vectors are input into the attention detector to obtain attention weight vectors; and the attention weight vectors are pooled to obtain the training instance. The training instance fuses the features of the images in the training set: image content related to the black swan is strengthened by the attention detector, while images that do not match the black swan (such as images of white swans) are suppressed by it; that is, the attention detector filters the information in the images so that the model can focus on learning. The classifier classifies according to the training instance and calculates the model loss; the fine-grained classification model adjusts the model parameters according to the model loss to strengthen the attention detector and the classifier. After training is completed, the fine-grained classification model is obtained.
During training, the fine-grained classification initial model can learn the features of both black swans and white swans. When the fine-grained image classification task has many sub-categories, images of other sub-categories can also be collected for supplementary training; for example, images of white swans can be collected for supplementary training.
When the fine-grained classification model is in use, an image to be classified is input into the model. The fine-grained classification model calculates the attention weight vector of the image and generates a test instance; the test instance weights the image to be classified, and the regions useful for fine-grained classification are strengthened. After the test instance is input into the classifier, the classifier can accurately identify whether the image shows a black swan or a white swan according to the test instance, realizing fine-grained image classification.
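The training walkthrough above can be glued together as one schematic forward pass. This is a toy sketch under stated assumptions: region feature matrices stand in for conv-layer outputs, "black swan" is taken to be class 0, the regularizer is a stand-in for formula (8), and gradient updates are omitted (a real implementation would backpropagate the returned loss to adjust the detector and classifier).

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def training_step(bag, w, b, W_cls, lam=0.1):
    """One schematic training step on a positive bag labeled class 0.

    bag: list of (N_regions, C) feature matrices, one per image (stand-ins
    for conv-layer outputs). w, b: attention detector parameters.
    W_cls: (num_classes, C) linear classifier (illustrative).
    """
    weighted, all_scores = [], []
    for regions in bag:
        scores = softplus(regions @ w + b)         # attention per region
        weighted.append(regions * scores[:, None]) # strengthen/suppress
        all_scores.append(scores)
    instance = np.concatenate(weighted).mean(axis=0)   # global average pooling
    logits = W_cls @ instance
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    l_class = -np.log(probs[0])                    # instance label = class 0
    r = -np.max(np.concatenate(all_scores))        # stand-in regularizer (positive bag)
    return l_class + lam * r
```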
The fine-grained classification model processing method based on image detection in this application involves neural networks, machine learning, and computer vision in the field of artificial intelligence.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through computer-readable instructions, which can be stored in a computer-readable storage medium; when executed, the computer-readable instructions may include the processes of the embodiments of the above methods. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.
It should be understood that although the steps in the flowchart of the drawings are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least part of the steps in the flowchart may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
With further reference to FIG. 3, as an implementation of the method shown in FIG. 2, this application provides an embodiment of an image-detection-based fine-grained classification model processing apparatus. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be applied to various electronic devices.
As shown in FIG. 3, the image-detection-based fine-grained classification model processing apparatus 300 of this embodiment includes: a dataset construction module 301, a dataset grouping module 302, a dataset input module 303, an instance generation module 304, a loss computation module 305, and a parameter adjustment module 306, where:
the dataset construction module 301 is configured to construct an image dataset through a search engine based on a received keyword;
the dataset grouping module 302 is configured to randomly group the image dataset into several training sets;
the dataset input module 303 is configured to input the several training sets into an initial fine-grained classification model to obtain attention-weighted vectors of the images in the several training sets;
the instance generation module 304 is configured to pool the attention-weighted vectors to respectively generate the training instances corresponding to the several training sets;
the loss computation module 305 is configured to input the obtained training instances into the classifier of the initial fine-grained classification model to compute the model loss;
the parameter adjustment module 306 is configured to adjust the model parameters of the initial fine-grained classification model according to the model loss to obtain the fine-grained classification model.
In this embodiment, the image dataset is constructed directly through a search engine from the keyword, so it can be expanded quickly via the Internet, which speeds up dataset construction; because the images are mutually independent, randomly grouping the image dataset into several training sets reduces the negative influence of images that do not match the label; the several training sets are input into the initial fine-grained classification model, which incorporates an attention mechanism to compute attention-weighted vectors of the input images, strengthening the image regions related to the keyword so that the model focuses on the regions relevant to classification; training instances generated from the attention-weighted vectors contain the features of the images in the corresponding training sets; after the training instances are input into the classifier to obtain the model loss, the model parameters are adjusted according to the model loss, yielding a fine-grained classification model that classifies accurately, thus achieving fast and accurate processing of fine-grained image classification.
In some optional implementations of this embodiment, the dataset construction module 301 includes a receiving sub-module, a search sub-module, and a construction sub-module, where:
the receiving sub-module is configured to receive a keyword sent by a terminal;
the search sub-module is configured to send the keyword to a search engine to instruct the search engine to perform an image search on the Internet according to the keyword;
the construction sub-module is configured to construct an image dataset based on the retrieved images.
In this embodiment, after the keyword is received, a large number of images can be obtained quickly by searching the Internet through a search engine, greatly speeding up the construction of the image dataset.
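The dataset construction and random grouping described by these modules can be sketched as follows. The `search` callable, the dictionary layout, and the seed are illustrative assumptions standing in for a real search-engine API, which the application does not specify.

```python
import random

def build_image_dataset(keyword, search, num_images=500):
    """Collect keyword-labelled images.

    `search` is a hypothetical callable (keyword, limit) -> list of image
    URLs, standing in for the search engine the server instructs.
    """
    urls = search(keyword, limit=num_images)
    # Every retrieved image inherits the keyword as its (weak) label.
    return [{"url": u, "label": keyword} for u in urls]

def group_into_training_sets(dataset, group_size, seed=0):
    """Randomly group the dataset into several training sets. Because the
    images are mutually independent, random grouping dilutes the effect of
    images that do not match the label."""
    rng = random.Random(seed)
    shuffled = list(dataset)
    rng.shuffle(shuffled)
    return [shuffled[i:i + group_size] for i in range(0, len(shuffled), group_size)]
```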
In some optional implementations of this embodiment, the dataset input module 303 includes a dataset input sub-module, a score computation sub-module, and a multiplication sub-module, where:
the dataset input sub-module is configured to respectively input each image of the several training sets into the convolutional layers of the initial fine-grained classification model to obtain convolutional feature vectors of the image regions of each image;
the score computation sub-module is configured to compute, through the attention detector, normalized attention scores of the convolutional feature vectors, where a normalized attention score characterizes the degree of association between an image region and the keyword;
the multiplication sub-module is configured to multiply the normalized attention scores with the corresponding convolutional feature vectors to obtain the attention-weighted vector of each image.
In this embodiment, the images of a training set are fed into the convolutional layers to obtain convolutional feature vectors of their image regions. The attention detector introduces the attention mechanism and computes normalized attention scores from the convolutional feature vectors; the scores can serve as weights of the feature vectors, and element-wise multiplication yields the attention-weighted vectors, in which the image regions have already been strengthened or suppressed, so the initial fine-grained classification model can learn in a targeted manner.
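The score-computation and multiplication sub-modules can be sketched as follows; the softmax normalization and the names w_att, b_att are assumptions, since the application only specifies that the scores characterize region–keyword association.

```python
import numpy as np

def attention_weighted_vectors(conv_features, w_att, b_att):
    """Weight each region's convolutional feature vector by its attention score.

    conv_features: (num_regions, dim) convolutional feature vectors of one
    image's regions. Returns (num_regions, dim) attention-weighted vectors.
    """
    scores = conv_features @ w_att + b_att   # raw attention score per region
    scores = np.exp(scores - scores.max())
    alpha = scores / scores.sum()            # normalized attention scores
    # Multiply each score with its region's feature vector: regions related
    # to the keyword are strengthened, others suppressed.
    return alpha[:, None] * conv_features
```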
In some optional implementations of this embodiment, the dataset input sub-module includes:
a training set input unit configured to input the several training sets into the convolutional layers of the initial fine-grained classification model;
an output acquisition unit configured to acquire the convolutional feature map output by the last convolutional layer;
a vector setting unit configured to set the vectors corresponding to the image regions in the convolutional feature map as the convolutional feature vectors.
In this embodiment, the training sets are fed into the convolutional layers and the convolutional feature map output by the last convolutional layer is obtained; the vectors in the feature map respectively correspond to the image regions of the image, so the convolutional feature vectors can be extracted accurately according to this correspondence.
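The correspondence between feature-map positions and image regions can be sketched as a reshape; the (H, W, C) channel-last layout is an assumption.

```python
import numpy as np

def feature_map_to_region_vectors(feature_map):
    """Extract one convolutional feature vector per image region.

    feature_map: (H, W, C) output of the last convolutional layer; each
    spatial position (h, w) corresponds to one image region, so reshaping
    yields H*W region vectors of dimension C.
    """
    h, w, c = feature_map.shape
    return feature_map.reshape(h * w, c)
```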
In some optional implementations of this embodiment, the loss computation module includes a loss computation sub-module, a factor computation sub-module, and a linear operation sub-module, where:
the loss computation sub-module is configured to input the obtained training instances into the classifier to compute the classifier loss;
the factor computation sub-module is configured to compute the regularization factor from the convolutional feature vectors;
the linear operation sub-module is configured to perform a linear operation on the classifier loss and the regularization factor to obtain the model loss.
In this embodiment, the training instances are fed into the classifier to compute the classifier loss, and the regularization factor is computed from the convolutional feature vectors to further strengthen or suppress the images; a linear operation on the classifier loss and the regularization factor yields the model loss, allowing the initial fine-grained classification model to adjust its model parameters more reasonably according to the model loss.
In some optional implementations of this embodiment, the loss computation sub-module includes an instance input unit, a label setting unit, and a loss computation unit, where:
the instance input unit is configured to input the obtained training instance into the classifier to obtain the fine-grained category of each image in the training instance;
the label setting unit is configured to set the keyword as the instance label;
the loss computation unit is configured to compute the classifier loss of the training instance according to the instance label and the fine-grained categories of the images in the training instance.
In this embodiment, the training instance is input into the classifier to obtain fine-grained categories; the keyword then serves as the instance label, and the classifier loss is computed over the training instance as a whole, ensuring that the classifier loss takes into account the information fused in the training instance.
In some optional implementations of this embodiment, the image-detection-based fine-grained classification model processing apparatus 300 further includes a to-be-classified acquisition module, a to-be-classified input module, a test generation module, and a test input module, where:
the to-be-classified acquisition module is configured to acquire an image to be classified;
the to-be-classified input module is configured to input the image to be classified into the fine-grained classification model to obtain the attention-weighted vector of the image;
the test generation module is configured to generate a test instance of the image based on the attention-weighted vector;
the test input module is configured to input the test instance into the classifier of the fine-grained classification model to obtain the fine-grained category of the image.
In this embodiment, at test time the image to be classified is fed into the fine-grained classification model to obtain a test instance that strengthens the image regions relevant to fine-grained classification and suppresses those irrelevant to the task, enabling the classifier to output the fine-grained category accurately.
To solve the above technical problem, an embodiment of this application further provides a computer device. Refer to FIG. 4, a block diagram of the basic structure of the computer device of this embodiment.
The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to each other through a system bus. Note that the figure only shows a computer device 4 with components 41-43; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. A person skilled in the art can understand that the computer device here is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, and the like.
The computer device may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The computer device may interact with a user through a keyboard, mouse, remote control, touchpad, voice-control device, or other means.
The memory 41 includes at least one type of computer-readable storage medium, which may be non-volatile or volatile and includes flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as the hard disk or internal memory of the computer device 4. In other embodiments, the memory 41 may be an external storage device of the computer device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device 4. Of course, the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device. In this embodiment, the memory 41 is generally used to store the operating system and various application software installed on the computer device 4, such as the computer-readable instructions of the image-detection-based fine-grained classification model processing method; it may also be used to temporarily store various data that have been output or are to be output.
The processor 42 may, in some embodiments, be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 generally controls the overall operation of the computer device 4. In this embodiment, the processor 42 is used to run the computer-readable instructions stored in the memory 41 or to process data, for example to run the computer-readable instructions of the image-detection-based fine-grained classification model processing method.
The network interface 43 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 4 and other electronic devices.
The computer device provided in this embodiment can execute the steps of the above image-detection-based fine-grained classification model processing method, which may be the steps of the method in any of the above embodiments.
In this embodiment, the image dataset is constructed directly through a search engine from the keyword, so it can be expanded quickly via the Internet, which speeds up dataset construction; because the images are mutually independent, randomly grouping the image dataset into several training sets reduces the negative influence of images that do not match the label; the several training sets are input into the initial fine-grained classification model, which incorporates an attention mechanism to compute attention-weighted vectors of the input images, strengthening the image regions related to the keyword so that the model focuses on the regions relevant to classification; training instances generated from the attention-weighted vectors contain the features of the images in the corresponding training sets; after the training instances are input into the classifier to obtain the model loss, the model parameters are adjusted according to the model loss, yielding a fine-grained classification model that classifies accurately, thus achieving fast and accurate processing of fine-grained image classification.
This application further provides another implementation, namely a computer-readable storage medium storing computer-readable instructions that are executable by at least one processor to cause the at least one processor to execute the steps of the above image-detection-based fine-grained classification model processing method.
In this embodiment, the image dataset is constructed directly through a search engine from the keyword, so it can be expanded quickly via the Internet, which speeds up dataset construction; because the images are mutually independent, randomly grouping the image dataset into several training sets reduces the negative influence of images that do not match the label; the several training sets are input into the initial fine-grained classification model, which incorporates an attention mechanism to compute attention-weighted vectors of the input images, strengthening the image regions related to the keyword so that the model focuses on the regions relevant to classification; training instances generated from the attention-weighted vectors contain the features of the images in the corresponding training sets; after the training instances are input into the classifier to obtain the model loss, the model parameters are adjusted according to the model loss, yielding a fine-grained classification model that classifies accurately, thus achieving fast and accurate processing of fine-grained image classification.
Through the description of the above implementations, a person skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of this application.
Obviously, the embodiments described above are only some, rather than all, of the embodiments of this application; the drawings show preferred embodiments of this application, but this does not limit its patent scope. This application may be implemented in many different forms; rather, these embodiments are provided so that the disclosure of this application will be thorough and complete. Although this application has been described in detail with reference to the foregoing embodiments, a person skilled in the art can still modify the technical solutions described in the foregoing specific implementations or make equivalent substitutions for some of their technical features. Any equivalent structure made using the contents of the specification and drawings of this application, applied directly or indirectly in other related technical fields, falls likewise within the patent protection scope of this application.
Claims (20)
- An image-detection-based fine-grained classification model processing method, comprising the following steps: constructing an image dataset through a search engine based on a received keyword; randomly grouping the image dataset into several training sets; inputting the several training sets into an initial fine-grained classification model to obtain attention-weighted vectors of the images in the several training sets; pooling the attention-weighted vectors to respectively generate the training instances corresponding to the several training sets; inputting the obtained training instances into the classifier of the initial fine-grained classification model to compute a model loss; and adjusting the model parameters of the initial fine-grained classification model according to the model loss to obtain a fine-grained classification model.
- The image-detection-based fine-grained classification model processing method of claim 1, wherein the step of constructing an image dataset through a search engine based on a received keyword comprises: receiving a keyword sent by a terminal; sending the keyword to a search engine to instruct the search engine to perform an image search on the Internet according to the keyword; and constructing the image dataset based on the retrieved images.
- The image-detection-based fine-grained classification model processing method of claim 1, wherein the step of inputting the several training sets into the initial fine-grained classification model to obtain attention-weighted vectors of the images in the several training sets comprises: respectively inputting each image of the several training sets into the convolutional layers of the initial fine-grained classification model to obtain convolutional feature vectors of the image regions of each image; computing, through an attention detector, normalized attention scores of the convolutional feature vectors, wherein a normalized attention score characterizes the degree of association between an image region and the keyword; and multiplying the normalized attention scores with the corresponding convolutional feature vectors to obtain the attention-weighted vector of each image.
- The image-detection-based fine-grained classification model processing method of claim 3, wherein the step of respectively inputting each image of the several training sets into the convolutional layers of the initial fine-grained classification model to obtain convolutional feature vectors of the image regions of each image comprises: inputting the several training sets into the convolutional layers of the initial fine-grained classification model; acquiring the convolutional feature map output by the last convolutional layer; and setting the vectors corresponding to the image regions in the convolutional feature map as the convolutional feature vectors.
- The image-detection-based fine-grained classification model processing method of claim 3, wherein the step of inputting the obtained training instances into the classifier of the initial fine-grained classification model to compute a model loss comprises: inputting the obtained training instances into the classifier to compute a classifier loss; computing a regularization factor from the convolutional feature vectors; and performing a linear operation on the classifier loss and the regularization factor to obtain the model loss.
- The image-detection-based fine-grained classification model processing method of claim 5, wherein the step of inputting the obtained training instances into the classifier to compute a classifier loss comprises: inputting the obtained training instance into the classifier to obtain the fine-grained category of each image in the training instance; setting the keyword as the instance label; and computing the classifier loss of the training instance according to the instance label and the fine-grained categories of the images in the training instance.
- The image-detection-based fine-grained classification model processing method of any one of claims 1-6, wherein, after the step of adjusting the model parameters of the initial fine-grained classification model according to the model loss to obtain a fine-grained classification model, the method further comprises: acquiring an image to be classified; inputting the image to be classified into the fine-grained classification model to obtain the attention-weighted vector of the image; generating a test instance of the image based on the attention-weighted vector; and inputting the test instance into the classifier of the fine-grained classification model to obtain the fine-grained category of the image.
- An image-detection-based fine-grained classification model processing apparatus, comprising: a dataset construction module configured to construct an image dataset through a search engine based on a received keyword; a dataset grouping module configured to randomly group the image dataset into several training sets; a dataset input module configured to input the several training sets into an initial fine-grained classification model to obtain attention-weighted vectors of the images in the several training sets; an instance generation module configured to pool the attention-weighted vectors to respectively generate the training instances corresponding to the several training sets; a loss computation module configured to input the obtained training instances into the classifier of the initial fine-grained classification model to compute a model loss; and a parameter adjustment module configured to adjust the model parameters of the initial fine-grained classification model according to the model loss to obtain a fine-grained classification model.
- A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the following steps: constructing an image dataset through a search engine based on a received keyword; randomly grouping the image dataset into several training sets; inputting the several training sets into an initial fine-grained classification model to obtain attention-weighted vectors of the images in the several training sets; pooling the attention-weighted vectors to respectively generate the training instances corresponding to the several training sets; inputting the obtained training instances into the classifier of the initial fine-grained classification model to compute a model loss; and adjusting the model parameters of the initial fine-grained classification model according to the model loss to obtain a fine-grained classification model.
- The computer device of claim 9, wherein the step of inputting the several training sets into the initial fine-grained classification model to obtain attention-weighted vectors of the images in the several training sets comprises: respectively inputting each image of the several training sets into the convolutional layers of the initial fine-grained classification model to obtain convolutional feature vectors of the image regions of each image; computing, through an attention detector, normalized attention scores of the convolutional feature vectors, wherein a normalized attention score characterizes the degree of association between an image region and the keyword; and multiplying the normalized attention scores with the corresponding convolutional feature vectors to obtain the attention-weighted vector of each image.
- The computer device of claim 10, wherein the step of respectively inputting each image of the several training sets into the convolutional layers of the initial fine-grained classification model to obtain convolutional feature vectors of the image regions of each image comprises: inputting the several training sets into the convolutional layers of the initial fine-grained classification model; acquiring the convolutional feature map output by the last convolutional layer; and setting the vectors corresponding to the image regions in the convolutional feature map as the convolutional feature vectors.
- The computer device of claim 10, wherein the step of inputting the obtained training instances into the classifier of the initial fine-grained classification model to compute a model loss comprises: inputting the obtained training instances into the classifier to compute a classifier loss; computing a regularization factor from the convolutional feature vectors; and performing a linear operation on the classifier loss and the regularization factor to obtain the model loss.
- The computer device of claim 12, wherein the step of inputting the obtained training instances into the classifier to compute a classifier loss comprises: inputting the obtained training instance into the classifier to obtain the fine-grained category of each image in the training instance; setting the keyword as the instance label; and computing the classifier loss of the training instance according to the instance label and the fine-grained categories of the images in the training instance.
- The computer device of any one of claims 9-13, wherein, after the step of adjusting the model parameters of the initial fine-grained classification model according to the model loss to obtain a fine-grained classification model, the following steps are further included: acquiring an image to be classified; inputting the image to be classified into the fine-grained classification model to obtain the attention-weighted vector of the image; generating a test instance of the image based on the attention-weighted vector; and inputting the test instance into the classifier of the fine-grained classification model to obtain the fine-grained category of the image.
- A computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the following steps: constructing an image dataset through a search engine based on a received keyword; randomly grouping the image dataset into several training sets; inputting the several training sets into an initial fine-grained classification model to obtain attention-weighted vectors of the images in the several training sets; pooling the attention-weighted vectors to respectively generate the training instances corresponding to the several training sets; inputting the obtained training instances into the classifier of the initial fine-grained classification model to compute a model loss; and adjusting the model parameters of the initial fine-grained classification model according to the model loss to obtain a fine-grained classification model.
- The computer-readable storage medium of claim 15, wherein the step of inputting the several training sets into the initial fine-grained classification model to obtain attention-weighted vectors of the images in the several training sets comprises: respectively inputting each image of the several training sets into the convolutional layers of the initial fine-grained classification model to obtain convolutional feature vectors of the image regions of each image; computing, through an attention detector, normalized attention scores of the convolutional feature vectors, wherein a normalized attention score characterizes the degree of association between an image region and the keyword; and multiplying the normalized attention scores with the corresponding convolutional feature vectors to obtain the attention-weighted vector of each image.
- The computer-readable storage medium of claim 16, wherein the step of respectively inputting each image of the several training sets into the convolutional layers of the initial fine-grained classification model to obtain convolutional feature vectors of the image regions of each image comprises: inputting the several training sets into the convolutional layers of the initial fine-grained classification model; acquiring the convolutional feature map output by the last convolutional layer; and setting the vectors corresponding to the image regions in the convolutional feature map as the convolutional feature vectors.
- The computer-readable storage medium of claim 16, wherein the step of inputting the obtained training instances into the classifier of the initial fine-grained classification model to compute a model loss comprises: inputting the obtained training instances into the classifier to compute a classifier loss; computing a regularization factor from the convolutional feature vectors; and performing a linear operation on the classifier loss and the regularization factor to obtain the model loss.
- The computer-readable storage medium of claim 18, wherein the step of inputting the obtained training instances into the classifier to compute a classifier loss comprises: inputting the obtained training instance into the classifier to obtain the fine-grained category of each image in the training instance; setting the keyword as the instance label; and computing the classifier loss of the training instance according to the instance label and the fine-grained categories of the images in the training instance.
- The computer-readable storage medium of any one of claims 15-19, wherein, after the step of adjusting the model parameters of the initial fine-grained classification model according to the model loss to obtain a fine-grained classification model, the following steps are further included: acquiring an image to be classified; inputting the image to be classified into the fine-grained classification model to obtain the attention-weighted vector of the image; generating a test instance of the image based on the attention-weighted vector; and inputting the test instance into the classifier of the fine-grained classification model to obtain the fine-grained category of the image.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010930234.1A CN112101437B (zh) | 2020-09-07 | 2020-09-07 | 基于图像检测的细粒度分类模型处理方法、及其相关设备 |
CN202010930234.1 | 2020-09-07 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021143267A1 true WO2021143267A1 (zh) | 2021-07-22 |
Family
ID=73750691
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/124434 WO2021143267A1 (zh) | 2020-09-07 | 2020-10-28 | 基于图像检测的细粒度分类模型处理方法、及其相关设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112101437B (zh) |
WO (1) | WO2021143267A1 (zh) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113723256A (zh) * | 2021-08-24 | 2021-11-30 | 北京工业大学 | 一种花粉颗粒识别方法及装置 |
CN114049255A (zh) * | 2021-11-08 | 2022-02-15 | Oppo广东移动通信有限公司 | 图像处理方法及其装置、存算一体芯片和电子设备 |
CN114419336A (zh) * | 2022-01-25 | 2022-04-29 | 南京理工大学 | 一种基于离散小波注意力模块的图像分类方法及系统 |
CN114529574A (zh) * | 2022-02-23 | 2022-05-24 | 平安科技(深圳)有限公司 | 基于图像分割的图像抠图方法、装置、计算机设备及介质 |
CN115131608A (zh) * | 2022-06-17 | 2022-09-30 | 广东技术师范大学 | 一种细粒度图像分类方法、装置、计算机设备及存储介质 |
CN115457308A (zh) * | 2022-08-18 | 2022-12-09 | 苏州浪潮智能科技有限公司 | 细粒度图像识别方法、装置和计算机设备 |
CN115953622A (zh) * | 2022-12-07 | 2023-04-11 | 广东省新黄埔中医药联合创新研究院 | 一种结合注意力互斥正则的图像分类方法 |
CN116109629A (zh) * | 2023-04-10 | 2023-05-12 | 厦门微图软件科技有限公司 | 一种基于细粒度识别与注意力机制的缺陷分类方法 |
CN116310425A (zh) * | 2023-05-24 | 2023-06-23 | 山东大学 | 一种细粒度图像检索方法、系统、设备及存储介质 |
CN117372791A (zh) * | 2023-12-08 | 2024-01-09 | 齐鲁空天信息研究院 | 细粒度定向能毁伤区域检测方法、装置及存储介质 |
CN118504543A (zh) * | 2024-07-19 | 2024-08-16 | 蒲惠智造科技股份有限公司 | 用于SaaS软件的细粒度实施计划生成方法及系统 |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113094171B (zh) * | 2021-03-31 | 2024-07-26 | 北京达佳互联信息技术有限公司 | 数据处理方法、装置、电子设备和存储介质 |
CN115082432B (zh) * | 2022-07-21 | 2022-11-01 | 北京中拓新源科技有限公司 | 基于细粒度图像分类的小目标螺栓缺陷检测方法及装置 |
CN117115565B (zh) * | 2023-10-19 | 2024-07-23 | 南方科技大学 | 一种基于自主感知的图像分类方法、装置及智能终端 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107704877A (zh) * | 2017-10-09 | 2018-02-16 | 哈尔滨工业大学深圳研究生院 | 一种基于深度学习的图像隐私感知方法 |
CN107730553A (zh) * | 2017-11-02 | 2018-02-23 | 哈尔滨工业大学 | 一种基于伪真值搜寻法的弱监督物体检测方法 |
CN107958272A (zh) * | 2017-12-12 | 2018-04-24 | 北京旷视科技有限公司 | 图片数据集更新方法、装置、系统及计算机存储介质 |
CN108805259A (zh) * | 2018-05-23 | 2018-11-13 | 北京达佳互联信息技术有限公司 | 神经网络模型训练方法、装置、存储介质及终端设备 |
CN111079862A (zh) * | 2019-12-31 | 2020-04-28 | 西安电子科技大学 | 基于深度学习的甲状腺乳头状癌病理图像分类方法 |
CN111178458A (zh) * | 2020-04-10 | 2020-05-19 | 支付宝(杭州)信息技术有限公司 | 分类模型的训练、对象分类方法及装置 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10074041B2 (en) * | 2015-04-17 | 2018-09-11 | Nec Corporation | Fine-grained image classification by exploring bipartite-graph labels |
CN109086792A (zh) * | 2018-06-26 | 2018-12-25 | 上海理工大学 | 基于检测和识别网络架构的细粒度图像分类方法 |
CN110647912A (zh) * | 2019-08-15 | 2020-01-03 | 深圳久凌软件技术有限公司 | 细粒度图像识别方法、装置、计算机设备及存储介质 |
CN111126459A (zh) * | 2019-12-06 | 2020-05-08 | 深圳久凌软件技术有限公司 | 一种车辆细粒度识别的方法及装置 |
- 2020-09-07: CN application CN202010930234.1A filed (patent CN112101437B, active)
- 2020-10-28: WO application PCT/CN2020/124434 filed (WO2021143267A1, active application filing)
Non-Patent Citations (2)
Title |
---|
LUO XIONGWEN: "Research about Deep Learning Based Two Stage Disease Diagnosis Method for Medical Image", MEDICINE & PUBLIC HEALTH, CHINA MASTER’S THESES FULL-TEXT DATABASE, 15 February 2020 (2020-02-15), XP055829663 * |
WANG PEISEN, SONG YAN;DAI LIRONG: "Fine-Grained Image Classification with Multi-channel Visual Attention", SHUJU CAIJI YU CHULI - JOURNAL OF DATA ACQUISITION & PROCESSING, SHUJU CAIJI YU CHULI, XINXIANG, CN, vol. 34, no. 1, 1 January 2019 (2019-01-01), CN, pages 157 - 166, XP055829666, ISSN: 1004-9037, DOI: 10.16337/j.1004-9037.2019.01.016 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113723256A (zh) * | 2021-08-24 | 2021-11-30 | 北京工业大学 | 一种花粉颗粒识别方法及装置 |
CN114049255A (zh) * | 2021-11-08 | 2022-02-15 | Oppo广东移动通信有限公司 | 图像处理方法及其装置、存算一体芯片和电子设备 |
CN114419336A (zh) * | 2022-01-25 | 2022-04-29 | 南京理工大学 | 一种基于离散小波注意力模块的图像分类方法及系统 |
CN114529574A (zh) * | 2022-02-23 | 2022-05-24 | 平安科技(深圳)有限公司 | 基于图像分割的图像抠图方法、装置、计算机设备及介质 |
CN115131608A (zh) * | 2022-06-17 | 2022-09-30 | 广东技术师范大学 | 一种细粒度图像分类方法、装置、计算机设备及存储介质 |
CN115131608B (zh) * | 2022-06-17 | 2024-08-27 | 广东技术师范大学 | 一种细粒度图像分类方法、装置、计算机设备及存储介质 |
CN115457308A (zh) * | 2022-08-18 | 2022-12-09 | 苏州浪潮智能科技有限公司 | 细粒度图像识别方法、装置和计算机设备 |
CN115457308B (zh) * | 2022-08-18 | 2024-03-12 | 苏州浪潮智能科技有限公司 | 细粒度图像识别方法、装置和计算机设备 |
CN115953622B (zh) * | 2022-12-07 | 2024-01-30 | 广东省新黄埔中医药联合创新研究院 | 一种结合注意力互斥正则的图像分类方法 |
CN115953622A (zh) * | 2022-12-07 | 2023-04-11 | 广东省新黄埔中医药联合创新研究院 | 一种结合注意力互斥正则的图像分类方法 |
CN116109629B (zh) * | 2023-04-10 | 2023-07-25 | 厦门微图软件科技有限公司 | 一种基于细粒度识别与注意力机制的缺陷分类方法 |
CN116109629A (zh) * | 2023-04-10 | 2023-05-12 | 厦门微图软件科技有限公司 | 一种基于细粒度识别与注意力机制的缺陷分类方法 |
CN116310425B (zh) * | 2023-05-24 | 2023-09-26 | 山东大学 | 一种细粒度图像检索方法、系统、设备及存储介质 |
CN116310425A (zh) * | 2023-05-24 | 2023-06-23 | 山东大学 | 一种细粒度图像检索方法、系统、设备及存储介质 |
CN117372791A (zh) * | 2023-12-08 | 2024-01-09 | 齐鲁空天信息研究院 | 细粒度定向能毁伤区域检测方法、装置及存储介质 |
CN117372791B (zh) * | 2023-12-08 | 2024-03-22 | 齐鲁空天信息研究院 | 细粒度定向能毁伤区域检测方法、装置及存储介质 |
CN118504543A (zh) * | 2024-07-19 | 2024-08-16 | 蒲惠智造科技股份有限公司 | 用于SaaS软件的细粒度实施计划生成方法及系统 |
Also Published As
Publication number | Publication date |
---|---|
CN112101437B (zh) | 2024-05-31 |
CN112101437A (zh) | 2020-12-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20914598 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20914598 Country of ref document: EP Kind code of ref document: A1 |