WO2017096758A1 - Image classification method, electronic device and storage medium - Google Patents
Image classification method, electronic device and storage medium
- Publication number
- WO2017096758A1 (PCT/CN2016/083064)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- neural network
- network model
- classified
- probability value
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2111—Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/809—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/1916—Validation; Performance evaluation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19173—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
Definitions
- the present invention relates to the field of computer vision technology, and in particular, to an image classification method, an electronic device, and a storage medium.
- The image classification method is an image processing method that separates objects of different categories according to the different features reflected in image information. Specifically, a computer quantitatively analyzes an image and classifies the image, or each pixel or region in it, into one of several categories to replace human visual interpretation. After images are classified, further applications can be built on the classification results, such as image retrieval, video surveillance, and semantic analysis involving images.
- At present, neural network models can achieve fairly accurate image classification. However, as image classification applications continue to expand and become more refined, the requirements for image classification accuracy keep rising; how to improve the accuracy of image classification has therefore become an important problem to be solved.
- According to various embodiments of the present application, an image classification method, an electronic device, and a storage medium that can improve image classification accuracy are provided.
- An image classification method comprising:
- inputting an image to be classified into a plurality of different neural network models, and acquiring data output by a specified plurality of non-input layers of each neural network model to generate a corresponding plurality of image features;
- inputting the plurality of image features respectively into linear classifiers, corresponding to each neural network model, for discriminating a preset category, to obtain corresponding probability values that the image to be classified contains an object image of the preset category, the linear classifiers being trained on features of the corresponding training images extracted by the corresponding neural network models; and
- determining, according to the obtained probability values, whether the image to be classified contains an object image of the preset category.
- An electronic device comprising a memory and a processor, wherein the memory stores instructions that, when executed by the processor, cause the processor to perform the following steps:
- inputting an image to be classified into a plurality of different neural network models, and acquiring data output by a specified plurality of non-input layers of each neural network model to generate a corresponding plurality of image features;
- inputting the plurality of image features respectively into linear classifiers, corresponding to each neural network model, for discriminating a preset category, to obtain corresponding probability values that the image to be classified contains an object image of the preset category, the linear classifiers being trained on features of the corresponding training images extracted by the corresponding neural network models; and
- determining, according to the obtained probability values, whether the image to be classified contains an object image of the preset category.
- One or more computer readable non-volatile storage media storing computer readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of:
- inputting an image to be classified into a plurality of different neural network models, and acquiring data output by a specified plurality of non-input layers of each neural network model to generate a corresponding plurality of image features;
- inputting the plurality of image features respectively into linear classifiers, corresponding to each neural network model, for discriminating a preset category, to obtain corresponding probability values that the image to be classified contains an object image of the preset category, the linear classifiers being trained on features of the corresponding training images extracted by the corresponding neural network models; and
- determining, according to the obtained probability values, whether the image to be classified contains an object image of the preset category.
- FIG. 1 is a schematic structural diagram of an electronic device for implementing an image classification method in an embodiment
- FIG. 2 is a schematic flow chart of an image classification method in an embodiment
- FIG. 3 is a schematic structural diagram of a simplified neural network model in a specific example
- FIG. 4 is a schematic diagram of the curve of a mapping function in an embodiment;
- FIG. 5 is a schematic flowchart of a step of inputting an image to be classified into a plurality of different neural network models, and acquiring data of a plurality of non-input layer outputs of each neural network model to generate a corresponding plurality of image features;
- FIG. 6 is a schematic diagram of a matrix of output probability values when a training image larger than a standard size is input when retraining a neural network model in an embodiment
- FIG. 7 is a schematic flowchart of the step of determining, according to each obtained probability value, whether an image to be classified includes an object image of a preset category in an embodiment;
- FIG. 8 is a structural block diagram of an electronic device in an embodiment
- FIG. 9 is a structural block diagram of an image feature extraction module of an electronic device in an embodiment
- FIG. 10 is a structural block diagram of an electronic device in another embodiment
- FIG. 11 is a structural block diagram of a discriminating module of an electronic device in an embodiment.
- an electronic device for implementing an image classification method including a processor, a non-volatile storage medium, and an internal memory connected by a system bus.
- the processor has a computing function and a function of controlling the operation of the electronic device, the processor being configured to perform an image classification method.
- the non-volatile storage medium includes at least one of a magnetic storage medium, an optical storage medium, and a flash storage medium, and the non-volatile storage medium stores an operating system.
- the non-volatile storage medium and internal memory can store computer readable instructions that, when executed by a processor, cause the processor to perform an image classification method.
- an image classification method is provided.
- This embodiment is exemplified by applying the method to the electronic device shown in FIG. 1. The method specifically includes the following steps:
- Step 202 Input the image to be classified into a plurality of different neural network models, and acquire data of the specified plurality of non-input layer outputs of each neural network model to generate corresponding plurality of image features.
- the image to be classified refers to an image that needs to be classified, and can be carried in a preset format, such as a JPEG format, a PNG format, a BMP format, or a GIF format.
- the neural network model also known as Artificial Neural Networks (abbreviated as ANNs), is a machine learning model that simulates the structure of the brain. In the field of machine learning, neural networks are often used to model more complex tasks. The size of the neural network, including depth and width, can be adjusted, depending on the application area and the size of the problem. Because of its strong expressive ability, neural networks are widely used in applications such as speech recognition, image classification, face recognition, natural language processing, and advertising.
- The structure of the neural network model includes multiple layers: the first layer is the input layer, the last layer is the output layer, and in between there are zero or more intermediate layers, each layer including one or more nodes.
- the input layer size is determined by the number of input variables, and the output layer size depends on the number of classification categories.
- the hidden layer includes multiple neurons, and adjusting the number of neurons can adjust the complexity and expressive power of the neural network model. In general, the wider and deeper the neural network, the stronger its modeling capabilities.
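- As an illustrative sketch only (the layer sizes, the ReLU activation, and the softmax output are assumptions chosen for the example, not details fixed by this disclosure), a forward pass through such a layered network might look as follows in Python:

```python
import numpy as np

def forward(x, weights, biases):
    # Hidden layers use ReLU; the output layer width equals the
    # number of classification categories.
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, a @ W + b)
    logits = a @ weights[-1] + biases[-1]
    e = np.exp(logits - logits.max())
    return e / e.sum()  # per-category probabilities (softmax)

# Example: 8 input variables, one hidden layer of 16 neurons, 3 categories.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 16)), rng.normal(size=(16, 3))]
biases = [np.zeros(16), np.zeros(3)]
print(forward(rng.normal(size=8), weights, biases))
```

Widening the hidden layer or adding more hidden layers increases the model's complexity and expressive power, as described above.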
- the multiple neural network models are at least two neural network models.
- The different neural network models differ mainly in the training sets used during training. Different training sets here mean that the training images in the training sets differ, although a small number of identical training images across different training sets is acceptable. Each training image is an image whose category is known.
- The architecture of the non-output layers of different neural network models may be uniform, with the same number of layers and the same layer widths; the architecture here does not include the coefficients connecting the different layers. The non-output layers are the input layer and the intermediate layers, while the non-input layers are the intermediate layers and the output layer; the output layer includes multiple nodes.
- the neural network model may preferably be a convolutional neural network model.
- In a convolutional neural network model, the connection between neurons in two adjacent layers is changed from the original full connection to each neuron connecting to only a few neurons, and the coefficients (or weights) of these connections are identical across neurons; they are called a shared convolution kernel, or shared weights.
- This convolution-like connection method can greatly reduce the learning parameters, and learn some invariant features, which are very suitable for processing image data, and can further improve the classification accuracy when used for image classification.
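- A minimal sketch of such a shared-kernel connection, assuming a single-channel image and a single "valid" convolution kernel purely for illustration:

```python
import numpy as np

def conv2d(image, kernel):
    # 'Valid' 2-D convolution with a single shared kernel: every output
    # neuron connects only to a small neighbourhood of the input, and
    # all output neurons reuse the same coefficients (shared weights).
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x3 kernel has 9 learned parameters regardless of the image size,
# versus height*width parameters per neuron in a fully connected layer.
print(conv2d(np.ones((5, 5)), np.ones((3, 3))))
```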
- the output layer output of the neural network model may be the probability that the image to be classified belongs to a preset category, and each node of the output layer represents a preset category.
- When acquiring data output by non-input layers, the non-input layers are preferably selected starting from the output layer toward the input layer, such as selecting the output layer and the penultimate layer, or the output layer, the penultimate layer, and the third-to-last layer.
- Step 204 Input the plurality of image features respectively into the linear classifiers, corresponding to each neural network model, for discriminating a preset category, and obtain corresponding probability values that the image to be classified contains an object image of the preset category; each linear classifier is trained on the features of the corresponding training images extracted by the corresponding neural network model.
- For each preset category, each neural network model has a separately trained linear classifier for discriminating that category. The linear classifier of a preset category is trained on image features that the neural network model corresponding to that classifier extracts from training images whose true probability values of containing an object image of the preset category are known. If it is to be determined whether the image to be classified contains an object image of one specific preset category, the plurality of image features may be input into the linear classifier, corresponding to each neural network model, for discriminating that specific category; if it is to be determined which preset category or categories the image contains, the image features may be input into all linear classifiers corresponding to each neural network model, each linear classifier discriminating one preset category.
- An image of an object containing a preset category such as an image containing a television set, an image containing a dog, or an image containing a human.
- The raw result output by a linear classifier can be any real number. A mapping function whose range is [0, 1] can therefore be used to map the linear classifier output to the probability value that the image to be classified contains an object image of the preset category.
- the linear classifier is a linear classifier based on SVM (Support Vector Machine).
- the dependent variable of the mapping function is positively correlated with the independent variable, that is, the dependent variable increases with the increase of the independent variable, and decreases with the decrease of the independent variable.
- The mapping function can use the Sigmoid function, specifically S(x) = 1 / (1 + e^(-x)), where e is the natural base, x is the independent variable, and S(x) is the dependent variable.
- the curve of the Sigmoid function is shown in Figure 4.
- the mapping function can be integrated into the linear classifier such that the linear classifier directly outputs the probability value of the image to be classified containing the object image of the preset category.
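- A minimal sketch of this score-to-probability mapping, where the weight vector w and bias b of the linear classifier are hypothetical names introduced for illustration:

```python
import numpy as np

def sigmoid(x):
    # S(x) = 1 / (1 + e^(-x)): monotonically increasing, values in (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def classifier_probability(w, b, image_feature):
    # Map the raw linear classifier score w.f + b (any real number)
    # to a probability value in (0, 1).
    score = np.dot(w, image_feature) + b
    return sigmoid(score)

print(classifier_probability(np.array([0.5, -0.2]), 0.1,
                             np.array([1.0, 2.0])))  # ~0.55
```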
- Step 206 Determine, according to the obtained probability values, whether the image to be classified includes an object image of a preset category.
- Specifically, the probability values obtained in step 204 may be averaged or weighted-averaged to obtain a comprehensive probability value, which is then compared with the probability value threshold of the corresponding preset category: if the comprehensive probability value is greater than or equal to the threshold, it is determined that the image to be classified contains an object image of the preset category; if it is less than the threshold, it is determined that the image does not contain an object image of the preset category.
- When calculating a weighted average, a plurality of weight combinations may be prepared in advance, the image classification accuracy of each weight combination verified separately on a verification set, and the weight combination with the highest accuracy selected as the weights of the probability values in the weighted average.
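- A sketch of this fusion and validation-based weight selection; the candidate weight combinations and the data layout (one row of per-model probability values per verification image) are assumptions made for the example:

```python
import numpy as np

def fuse(probabilities, weights):
    # Weighted average of the per-model probability values.
    return float(np.dot(weights, probabilities) / np.sum(weights))

def pick_weights(candidates, val_probs, val_labels, threshold=0.5):
    # Pick the weight combination with the highest classification
    # accuracy on the verification set.
    best, best_acc = None, -1.0
    for w in candidates:
        preds = np.array([fuse(p, w) >= threshold for p in val_probs])
        acc = float(np.mean(preds == val_labels))
        if acc > best_acc:
            best, best_acc = w, acc
    return best

val_probs = np.array([[0.9, 0.7], [0.2, 0.4], [0.8, 0.3]])  # 2 models
val_labels = np.array([True, False, True])
print(pick_weights([(0.5, 0.5), (0.8, 0.2), (0.2, 0.8)],
                   val_probs, val_labels))
```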
- The image classification method described above uses data output by a plurality of non-input layers of each neural network model to extract features of the image to be classified, which express the characteristics of the image more accurately. The image features are then input into the linear classifiers, corresponding to the respective neural network models, for discriminating the preset category, and the probability values output by the linear classifiers more accurately reflect the probability that the image to be classified contains an object image of the preset category. Synthesizing the probability values of the linear classifiers of the different neural network models for the preset category further improves the accuracy of image classification.
- step 202 specifically includes the following steps:
- Step 502 Input the image to be classified into each neural network model.
- In one embodiment, step 502 includes inputting the image to be classified into each neural network model at multiple scales. Images at multiple scales are obtained by scaling the length and width of the image to be classified; for example, the image to be classified can be scaled so that its shorter side is 256, 384, and 512 respectively, and the three scaled images are each input into each neural network model.
- Step 504 Acquire a vector of a plurality of layer outputs specified in an intermediate layer and an output layer of each neural network model.
- The plurality of layers specified among the intermediate layers and the output layer of each neural network model means at least two layers specified in advance from the set of layers consisting of the intermediate layers and the output layer. For example, the vector output by the output layer, the vector output by the penultimate layer, and the vector output by the third-to-last layer of each neural network model can be acquired.
- the vector output by each layer is a fixed length vector.
- Step 506 Splice the vectors of different layers of each neural network model to obtain a plurality of image features respectively corresponding to each neural network model.
- Specifically, the vectors of different layers of each neural network model are spliced according to a predetermined splicing order to obtain the image feature corresponding to that neural network model; the number of image features obtained is consistent with the number of neural network models.
- In one embodiment, step 506 specifically includes: splicing the vectors of different layers corresponding to images of the same scale for each neural network model, and averaging the vectors corresponding to images of different scales, to obtain the plurality of image features respectively corresponding to each neural network model.
- the vector length of the output of the same layer of each neural network model is fixed, and the characteristics of the images of different scales can be integrated by obtaining the average value.
- The lengths of the vectors output by different layers of a neural network model generally differ, and splicing integrates the features of the different layers. The splicing of different-layer vectors for images of the same scale and the averaging of vectors for images of different scales may be performed in either order; the resulting image features accurately represent the characteristics of the image to be classified.
- the image features of the plurality of layers in the non-input layer of the neural network model are used to generate image features, so that the image features can better express the characteristics of the image to be classified, thereby facilitating better classification accuracy.
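- The per-model feature generation can be sketched as follows, assuming (hypothetically) that the specified layers output vectors of 20, 4096, and 4096 elements for each of the scales 256, 384, and 512:

```python
import numpy as np

def image_feature(layer_vectors_by_scale):
    # layer_vectors_by_scale: {scale: [one fixed-length vector per
    # specified layer]}. Splice (concatenate) the layer vectors per
    # scale, then average the spliced vectors over the scales.
    spliced = [np.concatenate(vectors)
               for vectors in layer_vectors_by_scale.values()]
    return np.mean(spliced, axis=0)

rng = np.random.default_rng(0)
feats = {s: [rng.random(20), rng.random(4096), rng.random(4096)]
         for s in (256, 384, 512)}
print(image_feature(feats).shape)  # (8212,)
```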
- In one embodiment, the image classification method further includes: clearing the coefficients of the output layer of an original neural network model trained with one training set, adjusting the output layer to be adapted to another training set, and retraining with the other training set to obtain a retrained neural network model.
- One of the training sets refers to a training set used when training the original neural network model, including several training images known to contain true probability values of object images of preset categories.
- the other training set is a different training set than the one used to train the original neural network model.
- Different training sets have different numbers of preset categories; therefore, the nodes of the output layer need to be adjusted according to the number of preset categories of the other training set, and the coefficients of the output layer cleared, before training again.
- the original neural network model can use the neural network model trained by the ImageNet training set published by the VGG Laboratory of Oxford University.
- other public neural network models can be used, such as Google's open source neural network model.
- ImageNet is a computer vision system recognition project established by American computer scientists to simulate human recognition systems for the establishment of deep learning models for identifying objects from pictures.
- For example, the original neural network model is trained using the ImageNet training set with 1000 categories, and the coefficient matrix of its output layer has a scale of 4096*1000 (where 4096 is the output width of the penultimate layer). Other data sets do not necessarily have 1000 categories; if there are 20 categories, the scale of the output layer is 4096*20, so the output layer must be adjusted to match the other training set and retrained.
- the training can be performed using FCN (Fully Convolutional Networks, see Fully Convolutional Networks for Semantic Segmentation, arxiv: 1411.4038v2) algorithm.
- Specifically, when retraining, the coefficients of the adjusted output layer of the neural network model may be initialized, each training image in the corresponding training set scaled to the same size and input to the neural network model, and the probability value of containing an object image of a preset category output by the output layer.
- the true probability value may take the proportion of the object image of the preset category in the corresponding training image.
- the output probability value is compared with the true probability value of the corresponding training image to adjust the coefficients of the retrained neural network model, so that the difference between the probability value of the output layer output and the corresponding true probability value is reduced, and finally the training is completed.
- If the aspect ratios of the training images differ, a square window whose side length equals the shorter side may traverse the scaled image to obtain the sub-images input into the neural network model, until all pixels of the scaled image are traversed.
- the training image can be scaled to 256 according to the shorter side, and 256*256 sub-images are input multiple times at intervals of 16 pixels according to the size of the longer side until all the pixels of the scaled image are traversed.
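- A sketch of this square-window traversal, returning the top-left corners of the sub-images; clamping the final window so that the last pixels are still covered is an assumption about how "until all pixels are traversed" is realized:

```python
def square_crops(height, width, side=256, stride=16):
    # Top-left corners of square sub-images traversing the scaled image
    # along its longer side; the shorter side is assumed already scaled
    # to `side`. The last window is clamped so every pixel is covered.
    if width >= height:
        last = width - side
        return [(0, x) for x in list(range(0, last, stride)) + [last]]
    last = height - side
    return [(y, 0) for y in list(range(0, last, stride)) + [last]]

# A 256x400 scaled image yields windows at x = 0, 16, ..., 128, 144.
print(square_crops(256, 400))
```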
- Further, the dense probability spatial distribution of the object images of each preset category in the training image may be obtained based on the OverFeat algorithm, and the real dense probability spatial distribution calculated according to the real positions of the object images in the training image. The back propagation gradient can then be calculated from the OverFeat-based dense probability spatial distribution and the real dense probability spatial distribution, and the coefficients of the retrained neural network model adjusted according to the back propagation gradient, so that the difference between the probability value output by the output layer and the corresponding true probability value is reduced.
- Specifically, denote the coefficients of the retrained neural network model by the vector X, the input training image by I, and the probability value output by the output layer by y(X, I); given X and I, y can be computed. The true probability value y' is known, and the vector X needs to be adjusted so that y and y' are as close as possible. Let E be an error measuring the difference between y and y' (for example, E = (y - y')^2). Taking the partial derivative of E with respect to X gives the gradient direction ∆X, and adjusting the value of X along -∆X, the direction opposite to the gradient, makes E decrease.
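- A toy sketch of one such gradient step, standing in for the full network with a single sigmoid output y(X, I) = S(X·I) and assuming the squared error E = (y - y')^2 suggested above:

```python
import numpy as np

def sgd_step(X, I, y_true, lr=0.1):
    # One step on E = (y - y')^2 with toy model y(X, I) = S(X.I).
    y = 1.0 / (1.0 + np.exp(-np.dot(X, I)))
    E = (y - y_true) ** 2
    grad = 2.0 * (y - y_true) * y * (1.0 - y) * I  # dE/dX
    return X - lr * grad, E  # move opposite to the gradient direction

X = np.zeros(4)
I = np.array([1.0, -0.5, 0.25, 2.0])
for _ in range(200):
    X, E = sgd_step(X, I, y_true=0.9)
print(round(float(E), 4))  # E shrinks toward 0 as y approaches y'
```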
- the output layer outputs a matrix of probability values corresponding to the preset categories. For example, inputting an image of 256*256 will result in a 2*2 probability value matrix, and each probability value in the probability value matrix corresponds to a sub-image of the input training image. As shown in FIG. 6, the value of the upper left corner of the probability value matrix is determined only by the sub-image of the upper left corner of the training image having a size of 224*224.
- The probability value in the probability value matrix may take the proportion of the part of the preset category's object image that falls within the corresponding sub-image, relative to the whole object image.
- the neural network model required for image classification can be quickly trained, which greatly saves the training time.
- step 206 specifically includes the following steps:
- Step 702 Traverse the image to be classified with a window to extract window images, and scale them to the same size.
- Specifically, 100 window images may be extracted from the image to be classified by the Selective Search algorithm and uniformly scaled to a size of 256*256.
- the scaled size should meet the dimensions required for the input image of the neural network model.
- Step 704 Input each window image into the retrained neural network model, and acquire the data output by the non-input layers to generate window image features.
- each window image may be input to a retrained neural network model, and vectors of a plurality of layer outputs specified in the intermediate layer and the output layer may be acquired, and vectors of different layers may be spliced to obtain window image features.
- Alternatively, each window image may be input into the retrained neural network model at multiple scales, the vectors of different layers corresponding to images of the same scale spliced, and the vectors corresponding to images of different scales averaged, to obtain the window image features.
- Step 706 Input each window image feature into the linear classifier, corresponding to the retrained neural network model, for discriminating the preset category, and obtain, from the output of the corresponding linear classifier, the probability value that each window image contains an object image of the preset category.
- Step 708 Select a probability value with the largest value from the probability values corresponding to the respective window images.
- Specifically, the largest probability value among the probability values corresponding to the respective window images is denoted P3, the probability value corresponding to the original neural network model is denoted P2, and the probability value corresponding to the retrained neural network model is denoted P1.
- Step 710 Select the probability value with the largest value from the selected probability value and the probability value corresponding to the original neural network model; specifically, the larger of P2 and P3 is denoted max(P2, P3).
- Step 712 Calculate a weighted average of the selected probability value and the probability value corresponding to the retrained neural network model.
- the weighted average of P1 and max(P2, P3) is calculated.
- the weights of P1 and max(P2, P3) can be determined by verifying the accuracy of the image classification by the verification set. Specifically, a plurality of weight combinations, such as 0.1 and 0.9, 0.2 and 0.8, and 0.3 and 0.7, may be prepared in advance, and the image classification accuracy rate under different weight combinations is separately verified by using the verification set, thereby selecting the weight combination with the highest image classification accuracy rate.
- The selected weight combination is then used as the weights for the weighted average calculated in step 712.
- the validation set includes a collection of images that are known to contain real probability values of object images of the preset categories.
- Step 714 Determine whether the image to be classified includes an object image of a preset category according to a magnitude relationship between the weighted average value and a probability value threshold corresponding to the preset category.
- Specifically, the weighted average is compared with the probability value threshold of the preset category: if the weighted average is greater than or equal to the threshold, it is determined that the image to be classified contains an object image of the preset category; if the weighted average is less than the threshold, it is determined that the image does not contain an object image of the preset category.
- the probability value threshold may be, for example, 0.5.
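- Steps 708 to 714 can be sketched as follows; the weights w1 and w2 are placeholders that would in practice be chosen using the verification set:

```python
def final_decision(p1, p2, window_probs, w1=0.5, w2=0.5, threshold=0.5):
    # p1: probability from the retrained neural network model.
    # p2: probability from the original neural network model.
    # window_probs: probability values of the individual window images.
    p3 = max(window_probs)                     # step 708
    best = max(p2, p3)                         # step 710: max(P2, P3)
    score = (w1 * p1 + w2 * best) / (w1 + w2)  # step 712
    return score >= threshold                  # step 714

print(final_decision(0.8, 0.3, [0.1, 0.7, 0.4]))  # True (score 0.75)
```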
- When the size of the image to be classified is larger than the size of the training images, the image to be classified may be divided into a plurality of partially overlapping sub-images, and a weighted average obtained for each sub-image through steps 202 and 204 and steps 702 to 712 above. Each weighted average represents a comprehensive probability value integrating the various neural network models, and the comprehensive probability values of the sub-images constitute a probability spatial distribution. The maximum probability value of the probability spatial distribution of the image to be classified can represent the probability that the entire image contains an object image of the preset category, and the maximum probability values for the different preset categories determine which preset categories of object images the image to be classified contains.
- The discrimination of some categories depends on context information. For example, discriminating a ship often requires a sea background, and the corresponding P2 is then larger than P3; for categories that do not depend on context information, P3 is larger than P2. If no object image of the preset category is contained, both P2 and P3 are relatively low. Therefore, as long as either P2 or P3 is high, it can basically be determined that the probability that the image to be classified contains an object of the preset category is very high, which further improves the image classification accuracy.
- an electronic device 800 is provided.
- the internal structure of the electronic device 800 may correspond to the structure of the electronic device as shown in FIG. 1.
- Each of the following modules may be implemented in whole or in part by software, hardware, or a combination thereof.
- the electronic device 800 includes an image feature extraction module 810, a linear classifier classification module 820, and a discrimination module 830.
- the image feature extraction module 810 is configured to input the image to be classified into a plurality of different neural network models, and acquire data of the specified plurality of non-input layer outputs of each neural network model to generate a corresponding plurality of image features.
- the image to be classified refers to an image that needs to be classified, and can be carried in a preset format, such as a JPEG format, a PNG format, a BMP format, or a GIF format.
- the neural network model also known as Artificial Neural Networks (abbreviated as ANNs), is a machine learning model that simulates the structure of the brain. In the field of machine learning, neural networks are often used to model more complex tasks. The size of the neural network, including depth and width, can be adjusted, depending on the application area and the size of the problem. Because of its strong expressive ability, neural networks are widely used in applications such as speech recognition, image classification, face recognition, natural language processing, and advertising.
- the structure of the neural network model includes multiple layers, the first layer is the input layer, the last layer is the output layer, and the middle layer includes zero or more intermediate layers, each layer including one Or multiple nodes.
- the input layer size is determined by the number of input variables, and the output layer size depends on the number of classification categories.
- the hidden layer includes multiple neurons, and adjusting the number of neurons can adjust the complexity and expressive power of the neural network model. In general, the wider and deeper the neural network, the stronger its modeling capabilities.
- the multiple neural network models are at least two neural network models.
- The different neural network models differ mainly in the training sets used during training. Different training sets here mean that the training images in the training sets differ, although a small number of identical training images across different training sets is acceptable. Each training image is an image whose category is known.
- The architecture of the non-output layers of different neural network models may be uniform, with the same number of layers and the same layer widths; the architecture here does not include the coefficients connecting the different layers. The non-output layers are the input layer and the intermediate layers, while the non-input layers are the intermediate layers and the output layer; the output layer includes multiple nodes.
- the neural network model may preferably be a convolutional neural network model.
- In a convolutional neural network model, the connection between neurons in two adjacent layers is changed from the original full connection to each neuron connecting to only a few neurons, and the coefficients of these connections are identical across neurons; they are called a shared convolution kernel, or shared weights.
- This convolution-like connection method can greatly reduce the learning parameters, and learn some invariant features, which are very suitable for processing image data, and can further improve the classification accuracy when used for image classification.
- The image feature extraction module 810 is configured to input the image to be classified into a plurality of different neural network models, acquire data output by at least one of the intermediate layers and the output layer of each neural network model, preferably data output by at least two layers among the intermediate layers and the output layer of each model, and generate the plurality of image features corresponding to the neural network models according to the acquired data.
- the output layer output of the neural network model may be the probability that the image to be classified belongs to a preset category, and each node of the output layer represents a preset category.
- When the image feature extraction module 810 acquires data output by non-input layers, the non-input layers are preferably selected starting from the output layer toward the input layer, such as selecting the output layer and the penultimate layer, or the output layer, the penultimate layer, and the third-to-last layer.
- The linear classifier classification module 820 is configured to input the plurality of image features respectively into the linear classifiers, corresponding to each neural network model, for discriminating a preset category, and obtain the corresponding probability values that the image to be classified contains an object image of the preset category.
- the linear classifier is trained based on the characteristics of the corresponding training images extracted by the corresponding neural network model.
- For each preset category, each neural network model has a separately trained linear classifier for discriminating that category; the linear classifier of a preset category is trained on image features that the corresponding neural network model extracts from training images whose true probability values of containing an object image of the preset category are known. If it is to be determined whether the image to be classified contains an object image of one specific preset category, the plurality of image features may be input into the linear classifier, corresponding to each neural network model, for discriminating that specific category; if it is to be determined which preset category or categories the image contains, the image features may be input into all linear classifiers corresponding to each neural network model, each linear classifier discriminating one preset category.
- An image of an object containing a preset category such as an image containing a television set, an image containing a dog, or an image containing a human.
- The raw result output by a linear classifier can be any real number. A mapping function whose range is [0, 1] can therefore be used to map the linear classifier output to the probability value that the image to be classified contains an object image of the preset category.
- the dependent variable of the mapping function is positively correlated with the independent variable, that is, the dependent variable increases with the increase of the independent variable, and decreases with the decrease of the independent variable.
- The mapping function can use the Sigmoid function, specifically S(x) = 1 / (1 + e^(-x)), where e is the natural base, x is the independent variable, and S(x) is the dependent variable.
- the mapping function can be integrated into the linear classifier such that the linear classifier directly outputs the probability value of the image to be classified containing the object image of the preset category.
- the determining module 830 is configured to determine, according to the obtained each probability value, whether the image to be classified includes an object image of a preset category.
- Specifically, the probability values obtained by the linear classifier classification module 820 may be averaged or weighted-averaged to obtain a comprehensive probability value, which is then compared with the probability value threshold of the corresponding preset category: if the comprehensive probability value is greater than or equal to the threshold, it is determined that the image to be classified contains an object image of the preset category; if it is less than the threshold, it is determined that the image does not contain an object image of the preset category.
- When calculating a weighted average, a plurality of weight combinations may be prepared in advance, the image classification accuracy of each combination verified separately on a verification set, and the combination with the highest accuracy selected as the weights of the probability values in the weighted average.
- The electronic device 800 uses data output by a plurality of non-input layers of each neural network model to extract features of the image to be classified, which express the characteristics of the image more accurately. The image features are then input into the linear classifiers, corresponding to the respective neural network models, for discriminating the preset category, and the probability values output by the linear classifiers more accurately reflect the probability that the image to be classified contains an object image of the preset category. Synthesizing the probability values of the linear classifiers of the different neural network models for the preset category further improves the accuracy of image classification.
- the image feature extraction module 810 includes an input module 811 , a vector acquisition module 812 , and an image feature generation module 813 .
- the input module 811 is configured to input an image to be classified into each neural network model.
- the vector obtaining module 812 is configured to acquire a vector of the plurality of layer outputs specified in the middle layer and the output layer of each neural network model.
- The plurality of layers specified among the intermediate layers and the output layer of each neural network model means at least two layers specified in advance from the set of layers consisting of the intermediate layers and the output layer. For example, the vector output by the output layer, the vector output by the penultimate layer, and the vector output by the third-to-last layer of each neural network model can be acquired.
- the vector output by each layer is a fixed length vector.
- The image feature generation module 813 is configured to splice the vectors of different layers of each neural network model to obtain the plurality of image features respectively corresponding to each neural network model.
- Specifically, the image feature generation module 813 is configured to splice the vectors of different layers of each neural network model according to a predetermined splicing order to obtain the image feature corresponding to each neural network model one by one; the number of image features obtained is consistent with the number of neural network models.
- the image features of the plurality of layers in the non-input layer of the neural network model are used to generate image features, so that the image features can better express the characteristics of the image to be classified, thereby facilitating better classification accuracy.
- The input module 811 is specifically configured to input the image to be classified into each neural network model at multiple scales. Images at multiple scales are obtained by scaling the length and width of the image to be classified; for example, the image to be classified can be scaled so that its shorter side is 256, 384, and 512 respectively, and the scaled images are each input into each neural network model.
- Specifically, the image feature generation module 813 is configured to splice the vectors of different layers corresponding to images of the same scale for each neural network model, and average the vectors corresponding to images of different scales, obtaining the plurality of image features respectively corresponding to each neural network model.
- the vector length of the output of the same layer of each neural network model is fixed, and the characteristics of the images of different scales can be integrated by obtaining the average value.
- The lengths of the vectors output by different layers of a neural network model generally differ, and splicing integrates the features of the different layers. The splicing of different-layer vectors for images of the same scale and the averaging of vectors for images of different scales may be performed in either order; the resulting image features accurately represent the characteristics of the image to be classified.
- The electronic device 800 further includes a training module 840 for clearing the coefficients of the output layer of an original neural network model trained with one training set, adjusting the output layer to be adapted to another training set, and retraining with the other training set to obtain a retrained neural network model.
- One of the training sets refers to a training set used when training the original neural network model, including several training images known to contain true probability values of object images of preset categories.
- the other training set is a different training set than the one used to train the original neural network model.
- Different training sets have different numbers of preset categories; therefore, the nodes of the output layer need to be adjusted according to the number of preset categories of the other training set, and the coefficients of the output layer cleared, before training again.
- the original neural network model can use the neural network model trained by the ImageNet training set published by the VGG Laboratory of Oxford University.
- other public neural network models can be used, such as Google's open source neural network model.
- ImageNet is a computer vision system recognition project established by American computer scientists to simulate human recognition systems for the establishment of deep learning models for identifying objects from pictures.
- For example, the original neural network model is trained using the ImageNet training set with 1000 categories, and the coefficient matrix of its output layer has a scale of 4096*1000 (where 4096 is the output width of the penultimate layer). Other data sets do not necessarily have 1000 categories; if there are 20 categories, the scale of the output layer is 4096*20, so the output layer must be adjusted to match the other training set and retrained.
- Specifically, the training can be performed using the FCN (Fully Convolutional Networks, see Fully Convolutional Networks for Semantic Segmentation, arxiv: 1411.4038v2) algorithm.
- When retraining the neural network model, the training module 840 may initialize the coefficients of the adjusted output layer, scale each training image in the corresponding training set to the same size, input it into the neural network model, and obtain from the output layer the probability value of containing an object image of a preset category.
- the true probability value may take the proportion of the object image of the preset category in the corresponding training image.
- the output probability value is compared with the true probability value of the corresponding training image to adjust the coefficients of the retrained neural network model, so that the difference between the probability value of the output layer output and the corresponding true probability value is reduced, and finally the training is completed.
- The training module 840 scales each training image in the corresponding training set to the same size before inputting it into the neural network model. If the aspect ratios differ, a square window whose side length equals the shorter side traverses the scaled image to obtain the sub-images input into the neural network model, until all pixels of the scaled image are traversed. For example, the training image can be scaled so that its shorter side is 256, and 256*256 sub-images are input multiple times at intervals of 16 pixels along the longer side until all pixels of the scaled image are traversed.
- When retraining, the training module 840 may obtain the dense probability spatial distribution of the object images of each preset category in the training image based on the OverFeat algorithm, calculate the real dense probability spatial distribution according to the real positions of the object images in the training image, and calculate the back propagation gradient from the OverFeat-based dense probability spatial distribution and the real dense probability spatial distribution, so as to adjust the coefficients of the retrained neural network model according to the back propagation gradient and reduce the difference between the probability value output by the output layer and the corresponding true probability value.
- Specifically, denote the coefficients of the retrained neural network model by the vector X, the input training image by I, and the probability value output by the output layer by y(X, I); given X and I, y can be computed. The true probability value y' is known, and the vector X needs to be adjusted so that y and y' are as close as possible. Let E be an error measuring the difference between y and y' (for example, E = (y - y')^2). Taking the partial derivative of E with respect to X gives the gradient direction ∆X, and adjusting the value of X along -∆X, the direction opposite to the gradient, makes E decrease.
- The output layer outputs a matrix of probability values corresponding to the preset categories. For example, inputting a 256*256 image yields a 2*2 probability value matrix, and each probability value in the matrix corresponds to a sub-image of the input training image. As shown in FIG. 6, the value in the upper left corner of the probability value matrix is determined only by the 224*224 sub-image in the upper left corner of the training image.
- The probability value in the probability value matrix may take the proportion of the part of the preset category's object image that falls within the corresponding sub-image, relative to the whole object image. For example, in the 224*224 sub-image in the upper left corner of FIG. 6, all the triangles are located in the sub-image, so the corresponding probability value is 1; the five-pointed star is not in the sub-image, so the corresponding probability value is 0; and an object only half of which lies in the sub-image has a corresponding probability value of 0.5.
- the neural network model required for image classification can be quickly trained, which greatly saves the training time.
- the determination module 830 includes a window image extraction module 831 , a window image feature generation module 832 , a probability value obtaining module 833 , a probability value screening module 834 , a calculation module 835 , and an execution module 836 .
- the window image extraction module 831 is configured to traverse the image to be classified by using a window to extract the window image and scale to the same size.
- the window image extraction module 831 can use the Selective Search algorithm to extract 100 window images from the image to be classified and uniformly scale to 256*256 size.
- the scaled size should meet the dimensions required for the input image of the neural network model.
- The window image feature generation module 832 is configured to input each window image into the retrained neural network model and acquire the data output by the non-input layers to generate window image features.
- the window image feature generation module 832 can input the respective window images into the retrained neural network model, acquire vectors of the plurality of layer outputs specified in the intermediate layer and the output layer, and splicing the vectors of the different layers to obtain the window image features.
- Alternatively, the window image feature generation module 832 can input each window image into the retrained neural network model at multiple scales, splice the vectors of different layers corresponding to images of the same scale, and average the vectors corresponding to images of different scales to obtain the window image features.
- The probability value obtaining module 833 is configured to input each window image feature into the linear classifier, corresponding to the retrained neural network model, for discriminating the preset category, and obtain, from the result output by the corresponding linear classifier, the probability value that each window image contains an object image of the preset category.
- The probability value screening module 834 is configured to select the largest probability value among the probability values corresponding to the respective window images, and then select the larger of that value and the probability value corresponding to the original neural network model. Specifically, the largest probability value among the probability values corresponding to the respective window images is denoted P3, the probability value corresponding to the original neural network model is denoted P2, and the probability value corresponding to the retrained neural network model is denoted P1.
- The larger of P2 and P3 is denoted max(P2, P3).
- the calculating module 835 is configured to calculate a weighted average of the selected probability value and the probability value corresponding to the retrained neural network model.
- the weighted average of P1 and max(P2, P3) is calculated.
- the weights of P1 and max(P2, P3) can be determined by verifying the accuracy of the image classification by the verification set. Specifically, a plurality of weight combinations, such as 0.1 and 0.9, 0.2 and 0.8, and 0.3 and 0.7, may be prepared in advance, and the image classification accuracy rate under different weight combinations is separately verified by using the verification set, thereby selecting the weight combination with the highest image classification accuracy rate.
- The selected weight combination is then used as the weights for calculating the weighted average.
- the validation set includes a collection of images that are known to contain real probability values of object images of the preset categories.
- the executing module 836 is configured to determine, according to the magnitude relationship of the weighted average value and the probability value threshold corresponding to the preset category, whether the image to be classified includes an object image of a preset category.
- Specifically, the execution module 836 compares the weighted average with the probability value threshold of the preset category: if the weighted average is greater than or equal to the threshold, it determines that the image to be classified contains an object image of the preset category; if the weighted average is less than the threshold, it determines that the image to be classified does not contain an object image of the preset category.
- the probability value threshold may be, for example, 0.5.
- The discrimination of some categories depends on context information. For example, discriminating a ship often requires a sea background, and the corresponding P2 is then larger than P3; for categories that do not depend on context information, P3 is larger than P2. If no object image of the preset category is contained, both P2 and P3 are relatively low. Therefore, as long as either P2 or P3 is high, it can basically be determined that the probability that the image to be classified contains an object of the preset category is very high, which further improves the image classification accuracy.
- the storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
Abstract
An image classification method, the method comprising: inputting an image to be classified into a plurality of different neural network models, and obtaining data output by a plurality of specified non-input layers of each neural network model to generate a corresponding plurality of image features; inputting the plurality of image features respectively into linear classifiers, corresponding to the respective neural network models, for discriminating a preset category, to obtain corresponding probability values that the image to be classified contains an object image of the preset category, the linear classifiers being trained on features of corresponding training images extracted by the corresponding neural network models; and determining, according to the obtained probability values, whether the image to be classified contains an object image of the preset category.
Description
This application claims priority to Chinese Patent Application No. 201510921073.9, entitled "Image classification method and apparatus" and filed with the Chinese Patent Office on December 11, 2015, which is incorporated herein by reference in its entirety.
The present invention relates to the field of computer vision, and in particular to an image classification method, an electronic device and a storage medium.
An image classification method is an image processing method that distinguishes targets of different categories according to the different features reflected in image information. Specifically, a computer performs quantitative analysis on an image and assigns the image, or each pixel or region in it, to one of several categories, replacing human visual interpretation. Once images are classified, the results can serve further applications such as image retrieval, video surveillance and image-related semantic analysis.
Neural network models currently enable fairly accurate image classification, but as image classification applications keep expanding and becoming more refined, the requirements on classification accuracy keep rising. How to improve the accuracy of image classification has therefore become an important problem to be solved.
Summary of the Invention
According to various embodiments of the present application, an image classification method, an electronic device and a storage medium that can improve image classification accuracy are provided.
An image classification method includes:
inputting an image to be classified into a plurality of different neural network models, and obtaining data output by a plurality of specified non-input layers of each neural network model to generate a corresponding plurality of image features;
inputting the plurality of image features respectively into linear classifiers, corresponding to the respective neural network models, for discriminating a preset category, to obtain corresponding probability values that the image to be classified contains an object image of the preset category, the linear classifiers being trained on features of corresponding training images extracted by the corresponding neural network models; and
determining, according to the obtained probability values, whether the image to be classified contains an object image of the preset category.
An electronic device includes a memory and a processor, the memory storing instructions which, when executed by the processor, cause the processor to perform the following steps:
inputting an image to be classified into a plurality of different neural network models, and obtaining data output by a plurality of specified non-input layers of each neural network model to generate a corresponding plurality of image features;
inputting the plurality of image features respectively into linear classifiers, corresponding to the respective neural network models, for discriminating a preset category, to obtain corresponding probability values that the image to be classified contains an object image of the preset category, the linear classifiers being trained on features of corresponding training images extracted by the corresponding neural network models; and
determining, according to the obtained probability values, whether the image to be classified contains an object image of the preset category.
One or more computer-readable non-volatile storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
inputting an image to be classified into a plurality of different neural network models, and obtaining data output by a plurality of specified non-input layers of each neural network model to generate a corresponding plurality of image features;
inputting the plurality of image features respectively into linear classifiers, corresponding to the respective neural network models, for discriminating a preset category, to obtain corresponding probability values that the image to be classified contains an object image of the preset category, the linear classifiers being trained on features of corresponding training images extracted by the corresponding neural network models; and
determining, according to the obtained probability values, whether the image to be classified contains an object image of the preset category.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects and advantages of the invention will become apparent from the specification, the drawings and the claims.
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the prior art. Apparently, the accompanying drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic structural diagram of an electronic device for implementing an image classification method in an embodiment;
FIG. 2 is a schematic flowchart of an image classification method in an embodiment;
FIG. 3 is a schematic structural diagram of a simplified neural network model in a specific example;
FIG. 4 is a schematic curve of a mapping function in an embodiment;
FIG. 5 is a schematic flowchart of the step of inputting an image to be classified into a plurality of different neural network models and obtaining data output by a plurality of specified non-input layers of each neural network model to generate a corresponding plurality of image features, in an embodiment;
FIG. 6 is a schematic diagram of the probability value matrix output when a training image larger than the standard size is input while retraining a neural network model, in an embodiment;
FIG. 7 is a schematic flowchart of the step of determining, according to the obtained probability values, whether the image to be classified contains an object image of a preset category, in an embodiment;
FIG. 8 is a structural block diagram of an electronic device in an embodiment;
FIG. 9 is a structural block diagram of the image feature extraction module of the electronic device in an embodiment;
FIG. 10 is a structural block diagram of an electronic device in another embodiment;
FIG. 11 is a structural block diagram of the discrimination module of the electronic device in an embodiment.
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described below in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are merely intended to explain the present invention and are not intended to limit it.
As shown in FIG. 1, in one embodiment, an electronic device for implementing an image classification method is provided, including a processor, a non-volatile storage medium and an internal memory connected through a system bus. The processor has computing capability and controls the operation of the electronic device, and is configured to perform an image classification method. The non-volatile storage medium includes at least one of a magnetic storage medium, an optical storage medium and a flash storage medium, and stores an operating system. The non-volatile storage medium and the internal memory may store computer-readable instructions which, when executed by the processor, cause the processor to perform an image classification method.
As shown in FIG. 2, in one embodiment, an image classification method is provided. This embodiment is illustrated by applying the method to the electronic device shown in FIG. 1. The method specifically includes the following steps:
Step 202: input an image to be classified into a plurality of different neural network models, and obtain data output by a plurality of specified non-input layers of each neural network model to generate a corresponding plurality of image features.
Here, the image to be classified is an image that needs to be classified, and may be carried in a picture of a preset format, such as JPEG, PNG, BMP or GIF. A neural network model, also called an artificial neural network (ANN), is a machine learning model that simulates the structure of the brain. In machine learning, neural networks are often used to model relatively complex tasks; their scale, including depth and width, can be adjusted depending on the application field and the problem size. Owing to their strong expressive power, neural networks are widely used in speech recognition, image classification, face recognition, natural language processing, advertisement placement and other applications.
As shown in the simplified neural network model of FIG. 3, the structure of a neural network model includes multiple layers: the first layer is the input layer, the last layer is the output layer, and zero or more intermediate layers lie in between, each layer including one or more nodes. The size of the input layer is determined by the number of input variables, while the size of the output layer depends on the number of categories. A hidden layer includes multiple neurons, and adjusting the number of neurons adjusts the complexity and expressive power of the model. Generally, the wider and deeper a neural network is, the stronger its modeling capability.
The plurality of neural network models is at least two. Different neural network models mainly differ in the training sets used for training, meaning that the training images in the training sets differ; a small overlap of identical training images between different training sets is acceptable. A training image is an image whose category is known. The architectures of the non-output layers of the different neural network models may be identical, i.e. the number and width of the non-output layers may be the same, where the architecture does not include the coefficients connecting different layers. The non-output layers are the input layer and the intermediate layers, while the non-input layers are the intermediate layers and the output layer. The output layer has multiple nodes.
The neural network model may preferably be a convolutional neural network model. In a convolutional neural network, the connections between neurons in adjacent layers change from fully connected to each neuron connecting to only a few neurons, and the connection coefficients (or weights) are shared between neurons, known as a shared convolution kernel, or shared weights. This convolution-like connection pattern greatly reduces the number of parameters to learn and learns features with certain invariance, which makes it well suited to image data and further improves accuracy when used for image classification.
The image to be classified is input into the plurality of different neural network models, and data output by at least one of the intermediate layers and the output layer of each neural network model is obtained; preferably, data output by at least two of the intermediate layers and the output layer of each model may be obtained. A plurality of image features, in one-to-one correspondence with the neural network models, is generated from the obtained data. The output layer of a neural network model may output the probability that the image to be classified belongs to a preset category, each output-layer node representing one preset category.
When obtaining data output by the non-input layers, the non-input layers may preferably be selected from the output layer towards the input layer, for example the output layer and the penultimate layer, or the output layer, the penultimate layer and the antepenultimate layer.
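The following is a minimal sketch of tapping the last few layers of a model, assuming a Keras/TensorFlow model (the patent does not prescribe a framework); the helper name and the choice of layer indices mirror the example above and are illustrative only.

```python
# A minimal sketch of pulling data from specified non-input layers: here the
# output layer, the penultimate layer and the antepenultimate layer.
import numpy as np
import tensorflow as tf

def non_input_layer_outputs(model: tf.keras.Model, image: np.ndarray):
    """Return the vectors output by the last three layers for one image."""
    probe = tf.keras.Model(
        inputs=model.input,
        outputs=[model.layers[i].output for i in (-3, -2, -1)],
    )
    batch = image[np.newaxis, ...]             # add a batch dimension
    return [np.ravel(v) for v in probe.predict(batch)]
```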
Step 204: input the plurality of image features respectively into the linear classifiers, corresponding to the respective neural network models, for discriminating a preset category, to obtain corresponding probability values that the image to be classified contains an object image of the preset category; a linear classifier is trained on features of corresponding training images extracted by the corresponding neural network model.
Specifically, for each preset category, each neural network model has a linear classifier trained to discriminate that category. The linear classifier for a preset category is trained on training images whose true probability values of containing object images of that category are known, after their image features have been extracted by the neural network model corresponding to that classifier. If it is to be determined whether the image to be classified contains an object image of one specific preset category, the plurality of image features may be input into the linear classifiers, corresponding to the respective neural network models, for that specific category; if it is to be determined which preset category or categories the image contains, the plurality of image features may be input into all linear classifiers corresponding to the respective models, each linear classifier discriminating one preset category. An image containing an object of a preset category is, for example, an image containing a television set, a dog or a human.
The result output by a linear classifier may range over the real numbers; a mapping function whose domain is the real numbers and whose range is [0, 1] may be used to map the classifier output to a probability value that the image to be classified contains an object image of the preset category. The linear classifier is an SVM-based (Support Vector Machine) linear classifier. The dependent variable of the mapping function is positively correlated with the independent variable, i.e. it increases as the independent variable increases and decreases as it decreases. For example, the mapping function may be the Sigmoid function, S(x) = 1 / (1 + e^(-x)), where e is the natural base, x is the independent variable and S(x) is the dependent variable. The curve of the Sigmoid function is shown in FIG. 4. The mapping function may be integrated into the linear classifier so that the classifier directly outputs the probability value that the image to be classified contains an object image of the preset category.
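As a concrete illustration, here is a minimal sketch of squashing an SVM decision score through the Sigmoid; it uses scikit-learn's LinearSVC as a stand-in for the SVM-based linear classifier, which is an assumption, not the patent's implementation.

```python
# Map the signed distance to the SVM separating hyperplane into [0, 1]
# with S(x) = 1 / (1 + e^(-x)).
import math
from sklearn.svm import LinearSVC

def category_probability(classifier: LinearSVC, feature) -> float:
    """Return the probability that the feature's image contains the category."""
    score = classifier.decision_function([feature])[0]  # real-valued output
    return 1.0 / (1.0 + math.exp(-score))               # Sigmoid mapping
```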
Step 206: determine, according to the obtained probability values, whether the image to be classified contains an object image of the preset category.
Specifically, the probability values obtained in step 204 may be averaged, or combined into a weighted average, to obtain a combined probability value, which is then compared with the probability value threshold of the corresponding preset category. If it is greater than or equal to the threshold, the image to be classified is determined to contain an object image of the preset category; if it is smaller, the image is determined not to contain one. The weights of the individual probability values in the weighted average can be obtained by preparing several weight combinations in advance, measuring the image classification accuracy of each combination on a validation set, and selecting the combination with the highest accuracy as the weights used when computing the weighted average.
In the above image classification method, data output by a plurality of non-input layers of the neural network models is used to extract features of the image to be classified, which expresses the characteristics of the image more accurately. The image features are then input into the linear classifiers, corresponding to the respective neural network models, for discriminating the preset category, and the probability values obtained from the classifier outputs reflect more accurately the probability that the image to be classified contains an object image of the preset category. Combining the probability values of the linear classifiers corresponding to the different neural network models further improves classification accuracy.
As shown in FIG. 5, in one embodiment, step 202 specifically includes the following steps:
Step 502: input the image to be classified into each neural network model.
In one embodiment, step 502 includes: inputting the image to be classified into each neural network model at multiple scales. The images at the multiple scales are all obtained by rescaling the image to be classified with its aspect ratio preserved. For example, the image may be proportionally rescaled so that its shorter side is 256, 384 and 512, and input into each neural network model at these three scales.
Step 504: obtain the vectors output by a plurality of specified layers among the intermediate layers and the output layer of each neural network model.
Here, the plurality of specified layers means at least two layers selected in advance from the set consisting of the intermediate layers and the output layer; for example, the vectors output by the output layer, the penultimate layer and the antepenultimate layer of each model may be obtained. Each layer outputs a vector of fixed length.
Step 506: concatenate the vectors of the different layers of each neural network model to obtain a plurality of image features corresponding one-to-one to the neural network models.
Specifically, the vectors of the different layers of each neural network model are concatenated in a predetermined order, yielding image features in one-to-one correspondence with the models; the number of image features obtained equals the number of neural network models.
In one embodiment, step 506 specifically includes: concatenating, for each neural network model, the vectors of the different layers corresponding to images of the same scale, and averaging the vectors corresponding to images of different scales, to obtain the plurality of image features corresponding to the respective models; a sketch of this construction follows below.
Specifically, after images of different scales are input into a neural network model, the vectors output by a given layer have fixed length, so the features of images at different scales can be combined by averaging. The vectors output by different layers generally have different lengths, so the features of different layers can be combined by concatenation. Concatenating the vectors of different layers for images of the same scale and averaging the vectors for images of different scales can be done in either order. The resulting image feature accurately expresses the characteristics of the image to be classified.
In this embodiment, the vectors output by multiple non-input layers of the neural network models are used to generate the image features, so the features express the characteristics of the image to be classified better, which helps achieve better classification accuracy.
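The sketch below shows the concatenate-then-average construction under stated assumptions: it reuses the hypothetical non_input_layer_outputs() helper from the earlier sketch, and relies on the document's premise that each tapped layer outputs a fixed-length vector regardless of input scale.

```python
# Build one image feature per model: concatenate the per-layer vectors
# within each scale, then average across scales.
import numpy as np

def image_feature(model, images_at_scales) -> np.ndarray:
    """images_at_scales: the same image rescaled to e.g. 256/384/512."""
    per_scale = [
        np.concatenate(non_input_layer_outputs(model, img))  # concat layers
        for img in images_at_scales
    ]
    # Per the description, each scale yields a vector of the same length,
    # so the scales can be combined by a plain element-wise mean.
    return np.mean(per_scale, axis=0)
```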
In one embodiment, the image classification method further includes: clearing the coefficients of the output layer of an original neural network model trained with one training set, adapting the output layer to another training set, and retraining with the other training set to obtain a retrained neural network model.
Here, the one training set is the training set used to train the original neural network model, which includes a number of training images whose true probability values of containing object images of preset categories are known. The other training set is a training set different from the one used to train the original model. Different training sets have different numbers of preset categories, so the nodes of the output layer need to be adjusted according to the number of preset categories of the other training set, and the coefficients of the output layer cleared before retraining.
The original neural network model may be the neural network model trained on the ImageNet training set published by the VGG group at the University of Oxford; in other embodiments, other public neural network models may be used, such as Google's open-source models. ImageNet is a computer vision recognition project, established by American computer scientists to simulate the human recognition system, for building deep learning models that recognize objects from pictures.
The ImageNet training set used to train the original neural network model has 1000 categories, and the scale of the output layer coefficients is 4096*1000 (where 4096 is the number of outputs of the penultimate layer). Another data set does not necessarily have 1000 categories; with 20 categories, for example, the scale of the output layer coefficients is 4096*20, so the output layer should be adapted to the other training set before retraining. Retraining may use the FCN algorithm (Fully Convolutional Networks, see Fully Convolutional Networks for Semantic Segmentation, arxiv:1411.4038v2).
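A minimal sketch of the output-layer swap, assuming Keras and using its bundled VGG16 ImageNet weights as a stand-in for "the original neural network model"; the 20-class figure matches the example above, and the layer name is illustrative.

```python
# Adapt a pretrained 1000-class model to 20 classes: drop the 4096*1000
# output layer and attach a freshly initialised 4096*20 one.
import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet")   # 1000-way output
penultimate = base.layers[-2].output                     # 4096-d fc2 layer
new_output = tf.keras.layers.Dense(20, activation="softmax",
                                   name="retrained_output")(penultimate)
retrained = tf.keras.Model(inputs=base.input, outputs=new_output)
```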
In one embodiment, when retraining the neural network model, the coefficients of the adjusted output layer may be initialized, and each training image in the corresponding training set scaled to the same size and input into the model, with the output layer outputting probability values of containing object images of the preset categories. The true probability value may be taken as the proportion of the preset-category object image within the corresponding training image. The output probability values are compared with the true probability values of the corresponding training images to adjust the coefficients of the retrained model, so that the difference between the output probability values and the true probability values decreases, until training completes.
When each training image in the corresponding training set is scaled to the same size and input into the model, if the aspect ratios differ, the scaled image may be traversed with a square whose side equals the shorter side, producing sub-images that are input into the model until all pixels of the scaled image have been covered. For example, a training image may be scaled so that its shorter side is 256, and 256*256 sub-images input repeatedly at a stride of 16 pixels along the longer side until all pixels of the scaled image have been covered.
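A minimal sketch of that square traversal; `image` is assumed to be a NumPy array already rescaled so that its shorter side equals `side`, and the function name is illustrative.

```python
# Slide a 256*256 window along the longer side at a 16-pixel stride,
# appending one extra crop at the end so every pixel is covered.
import numpy as np

def square_crops(image: np.ndarray, side: int = 256, stride: int = 16):
    h, w = image.shape[:2]
    crops = []
    if w >= h:  # landscape: slide horizontally
        for x in range(0, w - side + 1, stride):
            crops.append(image[:, x:x + side])
        if (w - side) % stride:            # cover the trailing pixels
            crops.append(image[:, w - side:])
    else:       # portrait: slide vertically
        for y in range(0, h - side + 1, stride):
            crops.append(image[y:y + side, :])
        if (h - side) % stride:
            crops.append(image[h - side:, :])
    return crops
```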
In one embodiment, when retraining the neural network model, the OverFeat algorithm may be used to obtain a dense spatial probability distribution of the object images of each preset category in a training image. From the true positions of the object images in the training image, the true dense spatial probability distribution can be computed; from the distribution obtained by OverFeat and the true distribution, the backpropagation gradient can be computed, and the coefficients of the retrained model adjusted according to the gradient so that the difference between the output probability values and the true probability values decreases.
For example, let the coefficients of the retrained neural network model be a vector X, the input training image be I, and the probability value output by the output layer be y(X, I); given X and I, y can be computed, and the true probability value y' is known. The vector X must be adjusted so that y is as close to y' as possible, i.e. X is optimized to minimize the cost function E = |y - y'|^2. Taking the partial derivative of E with respect to X gives the gradient direction ΔX, and adjusting X in the direction -ΔX, opposite to the gradient, decreases E.
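To make the update rule concrete, here is a runnable toy sketch: a linear model y(X, I) = X · I stands in for the network (an assumption purely for illustration), and X is moved against the gradient of E = |y - y'|^2.

```python
# Gradient descent on E = (y - y')^2 with a linear toy model.
import numpy as np

X = np.array([0.5, -0.3])          # coefficients to optimise
I = np.array([1.0, 2.0])           # one training "image"
y_true = 0.8                       # known true probability value y'

for _ in range(100):
    y = X @ I                      # forward pass: y(X, I)
    grad = 2 * (y - y_true) * I    # dE/dX
    X -= 0.01 * grad               # step in -ΔX, opposite to the gradient
print(abs(X @ I - y_true) < 1e-3)  # True: E has been driven near zero
```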
Assume the standard input size of the retrained neural network model is 224*224. If a training image larger than the standard size is input, the output layer outputs a matrix of probability values for the corresponding preset category.
For example, inputting a 256*256 image yields a 2*2 probability value matrix, each probability value corresponding to one sub-image of the input training image. As shown in FIG. 6, the value in the top-left corner of the matrix is determined only by the 224*224 sub-image in the top-left corner of the training image. A probability value in the matrix may be taken as the proportion of the preset-category object image lying within the corresponding sub-image relative to the whole object. For example, in the top-left 224*224 sub-image of FIG. 6, the triangle lies entirely within the sub-image, so its probability value is 1; the five-pointed star is not in the sub-image, so its value is 0; half of the circle is in the sub-image, so its value is 0.5.
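The ground-truth rule above reduces to an overlap fraction; a minimal sketch follows, with boxes as (x0, y0, x1, y1) tuples and the example coordinates chosen purely to reproduce the 0.5 case of FIG. 6.

```python
# Target probability of a sub-image = fraction of the object's bounding box
# that falls inside the sub-image.
def true_probability(obj_box, sub_box) -> float:
    ox0, oy0, ox1, oy1 = obj_box
    sx0, sy0, sx1, sy1 = sub_box
    iw = max(0, min(ox1, sx1) - max(ox0, sx0))   # overlap width
    ih = max(0, min(oy1, sy1) - max(oy0, sy0))   # overlap height
    area = (ox1 - ox0) * (oy1 - oy0)             # whole-object area
    return (iw * ih) / area if area else 0.0

# A circle half inside the 224*224 top-left sub-image -> 0.5
print(true_probability(obj_box=(204, 0, 244, 40), sub_box=(0, 0, 224, 224)))
```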
In this embodiment, adjusting an existing neural network model and retraining it quickly produces the neural network model needed for image classification, greatly saving training time.
As shown in FIG. 7, in one embodiment, step 206 specifically includes the following steps:
Step 702: traverse the image to be classified with windows to extract window images and scale them to the same size.
Specifically, the Selective Search algorithm may be used to extract 100 window images from the image to be classified, all uniformly scaled to 256*256. The scaled size should satisfy the input size required by the neural network model.
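A minimal sketch of this step, assuming the Selective Search implementation shipped with opencv-contrib-python (the patent does not name a library); the function name and parameters are illustrative.

```python
# Extract 100 Selective Search proposals and resize them to 256*256.
import cv2

def window_images(image, num_windows: int = 100, size: int = 256):
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image)
    ss.switchToSelectiveSearchFast()
    rects = ss.process()[:num_windows]        # keep the first 100 proposals
    return [cv2.resize(image[y:y + h, x:x + w], (size, size))
            for (x, y, w, h) in rects]
```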
Step 704: input each window image into the retrained neural network model, and obtain data output by the non-input layers to generate window image features.
Specifically, each window image may be input into the retrained model, the vectors output by a plurality of specified layers among the intermediate layers and the output layer obtained, and the vectors of the different layers concatenated into a window image feature.
In one embodiment, each window image may be input into the retrained neural network model at multiple scales, the vectors of the different layers corresponding to images of the same scale concatenated, and the vectors corresponding to images of different scales averaged, to obtain the window image feature.
Step 706: input each window image feature into the linear classifier, corresponding to the retrained neural network model, for discriminating the preset category, and obtain from the classifier outputs a probability value that each window image contains an object image of the preset category.
Step 708: select the largest probability value among those corresponding to the window images. Specifically, this value is denoted P3; also, the probability value corresponding to the original neural network model is denoted P2, and the probability value corresponding to the retrained neural network model is denoted P1.
Step 710: select the larger of the selected probability value and the probability value corresponding to the original neural network model. Specifically, the larger of P2 and P3 is denoted max(P2, P3).
Step 712: calculate a weighted average of the selected probability value and the probability value corresponding to the retrained neural network model.
Specifically, the weighted average of P1 and max(P2, P3) is calculated. The weights of P1 and max(P2, P3) can be determined by verifying image classification accuracy on a validation set: several weight combinations may be prepared in advance, such as 0.1 and 0.9, 0.2 and 0.8, and 0.3 and 0.7, the classification accuracy under each combination measured on the validation set, and the combination with the highest accuracy selected as the weights used to compute the weighted average in step 712. The validation set is a collection of images whose true probability values of containing object images of the preset categories are known; a sketch of this selection follows below.
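A minimal sketch of the weight selection, assuming each validation sample carries precomputed scores (p1, p2, p3) and a 0/1 label indicating whether it really contains the preset category; all names are illustrative.

```python
# Pick the (w1, w2) combination with the highest accuracy on the validation set.
def select_weights(val_samples, threshold=0.5,
                   combos=((0.1, 0.9), (0.2, 0.8), (0.3, 0.7))):
    def accuracy(w1, w2):
        hits = sum(
            ((w1 * p1 + w2 * max(p2, p3)) >= threshold) == bool(label)
            for p1, p2, p3, label in val_samples
        )
        return hits / len(val_samples)
    return max(combos, key=lambda w: accuracy(*w))
```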
Step 714: determine, according to the magnitude relationship between the weighted average and the probability value threshold corresponding to the preset category, whether the image to be classified contains an object image of the preset category.
Specifically, the weighted average is compared with the probability value threshold of the preset category. If the weighted average is greater than or equal to the threshold, the image to be classified is determined to contain an object image of the preset category; if it is smaller, the image is determined not to contain one. The threshold may be, for example, 0.5.
In one embodiment, when the size of the image to be classified is larger than that of the training images, the image may be divided into several possibly overlapping sub-images, and the weighted average of each sub-image obtained through steps 202 and 204 and steps 702 to 712 above. This weighted average is a combined probability value over the neural network models, and the combined values of the sub-images form a spatial probability distribution. The maximum probability value of this distribution can represent the probability that the whole image contains an object image of the preset category, and the maximum values for different preset categories indicate which categories of object images the image contains.
In this embodiment, it is taken into account that the discrimination of some categories depends on context information: a ship, for example, often requires a sea background, so the corresponding P2 is larger than P3; for categories whose recognition does not depend on context, P3 is larger than P2. If the image contains no object of the preset category, both P2 and P3 are low. Therefore, as long as either P2 or P3 is high, it is very likely that the image to be classified contains an object of the preset category, which further improves classification accuracy.
As shown in FIG. 8, in one embodiment, an electronic device 800 is provided. The internal structure of the electronic device 800 may correspond to that of the electronic device shown in FIG. 1, and each of the modules below may be implemented in whole or in part by software, hardware or a combination thereof. The electronic device 800 includes an image feature extraction module 810, a linear classifier classification module 820 and a discrimination module 830.
The image feature extraction module 810 is configured to input an image to be classified into a plurality of different neural network models, and obtain data output by a plurality of specified non-input layers of each neural network model to generate a corresponding plurality of image features.
Here, the image to be classified is an image that needs to be classified, and may be carried in a picture of a preset format, such as JPEG, PNG, BMP or GIF. A neural network model, also called an artificial neural network (ANN), is a machine learning model that simulates the structure of the brain. In machine learning, neural networks are often used to model relatively complex tasks; their scale, including depth and width, can be adjusted depending on the application field and the problem size. Owing to their strong expressive power, neural networks are widely used in speech recognition, image classification, face recognition, natural language processing, advertisement placement and other applications.
As shown in the simplified neural network model of FIG. 3, the structure of a neural network model includes multiple layers: the first layer is the input layer, the last layer is the output layer, and zero or more intermediate layers lie in between, each layer including one or more nodes. The size of the input layer is determined by the number of input variables, while the size of the output layer depends on the number of categories. A hidden layer includes multiple neurons, and adjusting the number of neurons adjusts the complexity and expressive power of the model. Generally, the wider and deeper a neural network is, the stronger its modeling capability.
The plurality of neural network models is at least two. Different neural network models mainly differ in the training sets used for training, meaning that the training images in the training sets differ; a small overlap of identical training images between different training sets is acceptable. A training image is an image whose category is known. The architectures of the non-output layers of the different models may be identical, i.e. the number and width of the non-output layers may be the same, where the architecture does not include the coefficients connecting different layers. The non-output layers are the input layer and the intermediate layers, while the non-input layers are the intermediate layers and the output layer. The output layer has multiple nodes.
The neural network model may preferably be a convolutional neural network model. In a convolutional neural network, the connections between neurons in adjacent layers change from fully connected to each neuron connecting to only a few neurons, and the connection coefficients are shared between neurons, known as a shared convolution kernel, or shared weights. This convolution-like connection pattern greatly reduces the number of parameters to learn and learns features with certain invariance, which makes it well suited to image data and further improves accuracy when used for image classification.
The image feature extraction module 810 is configured to input the image to be classified into the plurality of different neural network models and obtain data output by at least one of the intermediate layers and the output layer of each model; preferably, data output by at least two of those layers may be obtained, and a plurality of image features in one-to-one correspondence with the models generated from the obtained data. The output layer of a model may output the probability that the image belongs to a preset category, each output-layer node representing one preset category.
When the image feature extraction module 810 obtains data output by the non-input layers, the non-input layers may preferably be selected from the output layer towards the input layer, for example the output layer and the penultimate layer, or the output layer, the penultimate layer and the antepenultimate layer.
The linear classifier classification module 820 is configured to input the plurality of image features respectively into the linear classifiers, corresponding to the respective neural network models, for discriminating a preset category, to obtain corresponding probability values that the image to be classified contains an object image of the preset category; a linear classifier is trained on features of corresponding training images extracted by the corresponding neural network model.
Specifically, for each preset category, each neural network model has a linear classifier trained to discriminate that category. The linear classifier for a preset category is trained on training images whose true probability values of containing object images of that category are known, after their image features have been extracted by the neural network model corresponding to that classifier. If it is to be determined whether the image to be classified contains an object image of one specific preset category, the plurality of image features may be input into the linear classifiers, corresponding to the respective models, for that specific category; if it is to be determined which preset category or categories the image contains, the plurality of image features may be input into all linear classifiers corresponding to the respective models, each linear classifier discriminating one preset category. An image containing an object of a preset category is, for example, an image containing a television set, a dog or a human.
The result output by a linear classifier may range over the real numbers; a mapping function whose domain is the real numbers and whose range is [0, 1] may be used to map the classifier output to a probability value that the image to be classified contains an object image of the preset category. The dependent variable of the mapping function is positively correlated with the independent variable, i.e. it increases as the independent variable increases and decreases as it decreases. For example, the mapping function may be the Sigmoid function, S(x) = 1 / (1 + e^(-x)), where e is the natural base, x is the independent variable and S(x) is the dependent variable. The mapping function may be integrated into the linear classifier so that the classifier directly outputs the probability value that the image to be classified contains an object image of the preset category.
The discrimination module 830 is configured to determine, according to the obtained probability values, whether the image to be classified contains an object image of the preset category.
Specifically, the probability values obtained by the linear classifier classification module 820 may be averaged, or combined into a weighted average, to obtain a combined probability value, which is compared with the probability value threshold of the corresponding preset category; if it is greater than or equal to the threshold, the image to be classified is determined to contain an object image of the preset category, and if it is smaller, the image is determined not to contain one. The weights of the individual probability values in the weighted average can be obtained by preparing several weight combinations in advance, measuring the image classification accuracy of each combination on a validation set, and selecting the combination with the highest accuracy.
The electronic device 800 described above uses data output by a plurality of non-input layers of the neural network models to extract features of the image to be classified, which expresses the characteristics of the image more accurately. The image features are then input into the linear classifiers, corresponding to the respective models, for discriminating the preset category, and the probability values obtained from the classifier outputs reflect more accurately the probability that the image contains an object image of the preset category. Combining the probability values of the linear classifiers corresponding to the different models further improves classification accuracy.
As shown in FIG. 9, in one embodiment, the image feature extraction module 810 includes an input module 811, a vector obtaining module 812 and an image feature generation module 813.
The input module 811 is configured to input the image to be classified into each neural network model.
The vector obtaining module 812 is configured to obtain the vectors output by a plurality of specified layers among the intermediate layers and the output layer of each neural network model.
Here, the plurality of specified layers means at least two layers selected in advance from the set consisting of the intermediate layers and the output layer; for example, the vectors output by the output layer, the penultimate layer and the antepenultimate layer of each model may be obtained. Each layer outputs a vector of fixed length.
The image feature generation module 813 is configured to concatenate the vectors of the different layers of each neural network model to obtain a plurality of image features corresponding one-to-one to the models.
Specifically, the image feature generation module 813 concatenates the vectors of the different layers of each model in a predetermined order, yielding image features in one-to-one correspondence with the models; the number of image features obtained equals the number of neural network models.
In this embodiment, the vectors output by multiple non-input layers of the neural network models are used to generate the image features, so the features express the characteristics of the image to be classified better, which helps achieve better classification accuracy.
In one embodiment, the input module 811 is specifically configured to input the image to be classified into each neural network model at multiple scales. The images at the multiple scales are all obtained by rescaling the image with its aspect ratio preserved; for example, the image may be proportionally rescaled so that its shorter side is 256, 384 and 512, and input into each model at these three scales.
The image feature generation module 813 is specifically configured to concatenate, for each neural network model, the vectors of the different layers corresponding to images of the same scale, and to average the vectors corresponding to images of different scales, to obtain the plurality of image features corresponding to the respective models.
Specifically, after images of different scales are input into a neural network model, the vectors output by a given layer have fixed length, so the features of images at different scales can be combined by averaging. The vectors output by different layers generally have different lengths, so the features of different layers can be combined by concatenation. Concatenating the vectors of different layers for images of the same scale and averaging the vectors for images of different scales can be done in either order. The resulting image feature accurately expresses the characteristics of the image to be classified.
As shown in FIG. 10, in one embodiment, the electronic device 800 further includes a training module 840, configured to clear the coefficients of the output layer of an original neural network model trained with one training set, adapt the output layer to another training set, and retrain with the other training set to obtain a retrained neural network model.
Here, the one training set is the training set used to train the original neural network model, which includes a number of training images whose true probability values of containing object images of preset categories are known. The other training set is a training set different from the one used to train the original model. Different training sets have different numbers of preset categories, so the nodes of the output layer need to be adjusted according to the number of preset categories of the other training set, and the coefficients of the output layer cleared before retraining.
The original neural network model may be the neural network model trained on the ImageNet training set published by the VGG group at the University of Oxford; in other embodiments, other public neural network models may be used, such as Google's open-source models. ImageNet is a computer vision recognition project, established by American computer scientists to simulate the human recognition system, for building deep learning models that recognize objects from pictures.
The ImageNet training set used to train the original neural network model has 1000 categories, and the scale of the output layer coefficients is 4096*1000 (where 4096 is the number of outputs of the penultimate layer). Another data set does not necessarily have 1000 categories; with 20 categories, the scale of the output layer coefficients is 4096*20, so the output layer should be adapted to the other training set before retraining. Retraining may use the FCN (fully convolutional networks for semantic segmentation) algorithm.
In one embodiment, when retraining the neural network model, the training module 840 may initialize the coefficients of the adjusted output layer, scale each training image in the corresponding training set to the same size and input it into the model, with the output layer outputting probability values of containing object images of the preset categories. The true probability value may be taken as the proportion of the preset-category object image within the corresponding training image. The output probability values are compared with the true probability values of the corresponding training images to adjust the coefficients of the retrained model, so that the difference between them decreases, until training completes.
When the training module 840 scales each training image in the corresponding training set to the same size and inputs it into the model, if the aspect ratios differ, the scaled image may be traversed with a square whose side equals the shorter side, producing sub-images that are input into the model until all pixels of the scaled image have been covered. For example, a training image may be scaled so that its shorter side is 256, and 256*256 sub-images input repeatedly at a stride of 16 pixels along the longer side until all pixels have been covered.
In one embodiment, when retraining the neural network model, the training module 840 may use the OverFeat algorithm to obtain a dense spatial probability distribution of the object images of each preset category in a training image. From the true positions of the object images in the training image, the true dense spatial probability distribution can be computed; from the OverFeat distribution and the true distribution, the backpropagation gradient can be computed, and the coefficients of the retrained model adjusted according to the gradient so that the difference between the output probability values and the true probability values decreases.
For example, let the coefficients of the retrained neural network model be a vector X, the input training image be I, and the probability value output by the output layer be y(X, I); given X and I, y can be computed, and the true probability value y' is known. The vector X must be adjusted so that y is as close to y' as possible, i.e. X is optimized to minimize the cost function E = |y - y'|^2. Taking the partial derivative of E with respect to X gives the gradient direction ΔX, and adjusting X in the direction -ΔX, opposite to the gradient, decreases E.
Assume the standard input size of the retrained neural network model is 224*224. If a training image larger than the standard size is input, the output layer outputs a matrix of probability values for the corresponding preset category; for example, a 256*256 input yields a 2*2 probability value matrix, each probability value corresponding to one sub-image of the input training image. As shown in FIG. 6, the top-left value of the matrix is determined only by the 224*224 sub-image in the top-left corner of the training image. A probability value in the matrix may be taken as the proportion of the preset-category object image lying within the corresponding sub-image relative to the whole object: in the top-left 224*224 sub-image of FIG. 6, the triangle lies entirely within the sub-image, so its value is 1; the five-pointed star is not in the sub-image, so its value is 0; half of the circle is in the sub-image, so its value is 0.5.
In this embodiment, adjusting an existing neural network model and retraining it quickly produces the neural network model needed for image classification, greatly saving training time.
As shown in FIG. 11, in one embodiment, the discrimination module 830 includes a window image extraction module 831, a window image feature generation module 832, a probability value obtaining module 833, a probability value screening module 834, a calculating module 835 and an executing module 836.
The window image extraction module 831 is configured to traverse the image to be classified with windows to extract window images and scale them to the same size.
Specifically, the window image extraction module 831 may use the Selective Search algorithm to extract 100 window images from the image to be classified and scale them all to 256*256; the scaled size should satisfy the input size required by the neural network model.
The window image feature generation module 832 is configured to input each window image into the retrained neural network model and obtain data output by the non-input layers to generate window image features.
Specifically, the window image feature generation module 832 may input each window image into the retrained model, obtain the vectors output by a plurality of specified layers among the intermediate layers and the output layer, and concatenate the vectors of the different layers into window image features.
In one embodiment, the window image feature generation module 832 may input each window image into the retrained model at multiple scales, concatenate the vectors of the different layers corresponding to images of the same scale, and average the vectors corresponding to images of different scales, to obtain the window image features.
The probability value obtaining module 833 is configured to input each window image feature into the linear classifier, corresponding to the retrained neural network model, for discriminating the preset category, and to obtain from the classifier outputs a probability value that each window image contains an object image of the preset category.
The probability value screening module 834 is configured to select the largest probability value among those corresponding to the window images, and then to select the larger of that value and the probability value corresponding to the original neural network model. Specifically, the largest probability value among the window images is denoted P3, the probability value corresponding to the original model P2, and the probability value corresponding to the retrained model P1; the larger of P2 and P3 is denoted max(P2, P3).
The calculating module 835 is configured to calculate a weighted average of the selected probability value and the probability value corresponding to the retrained neural network model.
Specifically, the weighted average of P1 and max(P2, P3) is calculated. The weights of P1 and max(P2, P3) can be determined by verifying image classification accuracy on a validation set: several weight combinations, such as 0.1 and 0.9, 0.2 and 0.8, and 0.3 and 0.7, may be prepared in advance, the accuracy under each combination measured on the validation set, and the combination with the highest accuracy selected as the weights used in step 712. The validation set is a collection of images whose true probability values of containing object images of the preset categories are known.
The executing module 836 is configured to determine, according to the magnitude relationship between the weighted average and the probability value threshold corresponding to the preset category, whether the image to be classified contains an object image of the preset category.
Specifically, the executing module 836 compares the weighted average with the probability value threshold of the preset category. If the weighted average is greater than or equal to the threshold, the image to be classified is determined to contain an object image of the preset category; if it is smaller, the image is determined not to contain one. The threshold may be, for example, 0.5.
In this embodiment, it is taken into account that the discrimination of some categories depends on context information: a ship, for example, often requires a sea background, so the corresponding P2 is larger than P3; for categories whose recognition does not depend on context, P3 is larger than P2. If the image contains no object of the preset category, both P2 and P3 are low. Therefore, as long as either P2 or P3 is high, it is very likely that the image to be classified contains an object of the preset category, which further improves classification accuracy.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc or a read-only memory (ROM), or a random access memory (RAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it shall be considered within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their descriptions are specific and detailed, but they shall not therefore be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of the present invention, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this invention patent shall be subject to the appended claims.
Claims (15)
- An image classification method, comprising: inputting an image to be classified into a plurality of different neural network models, and obtaining data output by a plurality of specified non-input layers of each neural network model to generate a corresponding plurality of image features; inputting the plurality of image features respectively into linear classifiers, corresponding to the respective neural network models, for discriminating a preset category, to obtain corresponding probability values that the image to be classified contains an object image of the preset category, the linear classifiers being trained on features of corresponding training images extracted by the corresponding neural network models; and determining, according to the obtained probability values, whether the image to be classified contains an object image of the preset category.
- The method according to claim 1, wherein the inputting an image to be classified into a plurality of different neural network models and obtaining data output by a plurality of specified non-input layers of each neural network model to generate a corresponding plurality of image features comprises: inputting the image to be classified into each neural network model; obtaining the vectors output by a plurality of specified layers among the intermediate layers and the output layer of each neural network model; and concatenating the vectors of the different layers of each neural network model to obtain a plurality of image features corresponding to the respective neural network models.
- The method according to claim 2, wherein the inputting the image to be classified into each neural network model comprises: inputting the image to be classified into each neural network model at multiple scales; and the concatenating the vectors of the different layers of each neural network model to obtain a plurality of image features corresponding to the respective neural network models comprises: concatenating, for each neural network model, the vectors of the different layers corresponding to images of the same scale, and averaging the vectors corresponding to images of different scales, to obtain the plurality of image features corresponding to the respective neural network models.
- The method according to claim 1, further comprising: clearing the coefficients of the output layer of an original neural network model trained with one training set, adapting the output layer to another training set, and retraining with the other training set to obtain a retrained neural network model.
- The method according to claim 4, wherein the determining, according to the obtained probability values, whether the image to be classified contains an object image of the preset category comprises: traversing the image to be classified with windows to extract window images and scaling them to the same size; inputting each window image into the retrained neural network model and obtaining data output by the non-input layers to generate window image features; inputting each window image feature into the linear classifier, corresponding to the retrained neural network model, for discriminating the preset category, and obtaining from the classifier outputs a probability value that each window image contains an object image of the preset category; selecting the largest probability value among those corresponding to the window images; selecting the larger of the selected probability value and the probability value corresponding to the original neural network model; calculating a weighted average of the selected probability value and the probability value corresponding to the retrained neural network model; and determining, according to the magnitude relationship between the weighted average and the probability value threshold corresponding to the preset category, whether the image to be classified contains an object image of the preset category.
- An electronic device comprising a memory and a processor, the memory storing instructions which, when executed by the processor, cause the processor to perform the following steps: inputting an image to be classified into a plurality of different neural network models, and obtaining data output by a plurality of specified non-input layers of each neural network model to generate a corresponding plurality of image features; inputting the plurality of image features respectively into linear classifiers, corresponding to the respective neural network models, for discriminating a preset category, to obtain corresponding probability values that the image to be classified contains an object image of the preset category, the linear classifiers being trained on features of corresponding training images extracted by the corresponding neural network models; and determining, according to the obtained probability values, whether the image to be classified contains an object image of the preset category.
- The electronic device according to claim 6, wherein the inputting an image to be classified into a plurality of different neural network models and obtaining data output by a plurality of specified non-input layers of each neural network model to generate a corresponding plurality of image features comprises: inputting the image to be classified into each neural network model; obtaining the vectors output by a plurality of specified layers among the intermediate layers and the output layer of each neural network model; and concatenating the vectors of the different layers of each neural network model to obtain a plurality of image features corresponding to the respective neural network models.
- The electronic device according to claim 7, wherein the inputting the image to be classified into each neural network model comprises: inputting the image to be classified into each neural network model at multiple scales; and the concatenating the vectors of the different layers of each neural network model to obtain a plurality of image features corresponding to the respective neural network models comprises: concatenating, for each neural network model, the vectors of the different layers corresponding to images of the same scale, and averaging the vectors corresponding to images of different scales, to obtain the plurality of image features corresponding to the respective neural network models.
- The electronic device according to claim 6, wherein the instructions, when executed by the processor, further cause the processor to perform the following step: clearing the coefficients of the output layer of an original neural network model trained with one training set, adapting the output layer to another training set, and retraining with the other training set to obtain a retrained neural network model.
- The electronic device according to claim 9, wherein the determining, according to the obtained probability values, whether the image to be classified contains an object image of the preset category comprises: traversing the image to be classified with windows to extract window images and scaling them to the same size; inputting each window image into the retrained neural network model and obtaining data output by the non-input layers to generate window image features; inputting each window image feature into the linear classifier, corresponding to the retrained neural network model, for discriminating the preset category, and obtaining from the classifier outputs a probability value that each window image contains an object image of the preset category; selecting the largest probability value among those corresponding to the window images; selecting the larger of the selected probability value and the probability value corresponding to the original neural network model; calculating a weighted average of the selected probability value and the probability value corresponding to the retrained neural network model; and determining, according to the magnitude relationship between the weighted average and the probability value threshold corresponding to the preset category, whether the image to be classified contains an object image of the preset category.
- One or more computer-readable non-volatile storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps: inputting an image to be classified into a plurality of different neural network models, and obtaining data output by a plurality of specified non-input layers of each neural network model to generate a corresponding plurality of image features; inputting the plurality of image features respectively into linear classifiers, corresponding to the respective neural network models, for discriminating a preset category, to obtain corresponding probability values that the image to be classified contains an object image of the preset category, the linear classifiers being trained on features of corresponding training images extracted by the corresponding neural network models; and determining, according to the obtained probability values, whether the image to be classified contains an object image of the preset category.
- The computer-readable non-volatile storage media according to claim 11, wherein the inputting an image to be classified into a plurality of different neural network models and obtaining data output by a plurality of specified non-input layers of each neural network model to generate a corresponding plurality of image features comprises: inputting the image to be classified into each neural network model; obtaining the vectors output by a plurality of specified layers among the intermediate layers and the output layer of each neural network model; and concatenating the vectors of the different layers of each neural network model to obtain a plurality of image features corresponding to the respective neural network models.
- The computer-readable non-volatile storage media according to claim 12, wherein the inputting the image to be classified into each neural network model comprises: inputting the image to be classified into each neural network model at multiple scales; and the concatenating the vectors of the different layers of each neural network model to obtain a plurality of image features corresponding to the respective neural network models comprises: concatenating, for each neural network model, the vectors of the different layers corresponding to images of the same scale, and averaging the vectors corresponding to images of different scales, to obtain the plurality of image features corresponding to the respective neural network models.
- The computer-readable non-volatile storage media according to claim 11, wherein the computer-readable instructions, when executed by the one or more processors, further cause the one or more processors to perform the following step: clearing the coefficients of the output layer of an original neural network model trained with one training set, adapting the output layer to another training set, and retraining with the other training set to obtain a retrained neural network model.
- The computer-readable non-volatile storage media according to claim 14, wherein the determining, according to the obtained probability values, whether the image to be classified contains an object image of the preset category comprises: traversing the image to be classified with windows to extract window images and scaling them to the same size; inputting each window image into the retrained neural network model and obtaining data output by the non-input layers to generate window image features; inputting each window image feature into the linear classifier, corresponding to the retrained neural network model, for discriminating the preset category, and obtaining from the classifier outputs a probability value that each window image contains an object image of the preset category; selecting the largest probability value among those corresponding to the window images; selecting the larger of the selected probability value and the probability value corresponding to the original neural network model; calculating a weighted average of the selected probability value and the probability value corresponding to the retrained neural network model; and determining, according to the magnitude relationship between the weighted average and the probability value threshold corresponding to the preset category, whether the image to be classified contains an object image of the preset category.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP16871963.1A EP3388978B1 (en) | 2015-12-11 | 2016-05-23 | Image classification method, electronic device, and storage medium |
US15/703,027 US10325181B2 (en) | 2015-12-11 | 2017-09-13 | Image classification method, electronic device, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510921073.9A CN106874921B (zh) | 2015-12-11 | 2015-12-11 | 图像分类方法和装置 |
CN201510921073.9 | 2015-12-11 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/703,027 Continuation US10325181B2 (en) | 2015-12-11 | 2017-09-13 | Image classification method, electronic device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017096758A1 true WO2017096758A1 (zh) | 2017-06-15 |
Family
ID=59012622
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/083064 WO2017096758A1 (zh) | 2015-12-11 | 2016-05-23 | 图像分类方法、电子设备和存储介质 |
Country Status (4)
Country | Link |
---|---|
US (1) | US10325181B2 (zh) |
EP (1) | EP3388978B1 (zh) |
CN (1) | CN106874921B (zh) |
WO (1) | WO2017096758A1 (zh) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109124635A (zh) * | 2018-09-25 | 2019-01-04 | 上海联影医疗科技有限公司 | 模型生成方法、磁共振成像扫描方法及系统 |
WO2019041406A1 (zh) * | 2017-08-28 | 2019-03-07 | 平安科技(深圳)有限公司 | 不雅图片识别方法、终端、设备及计算机可读存储介质 |
CN109766823A (zh) * | 2019-01-07 | 2019-05-17 | 浙江大学 | 一种基于深层卷积神经网络的高分辨率遥感船舶检测方法 |
CN109766742A (zh) * | 2018-11-20 | 2019-05-17 | 安徽农业大学 | 一种玉米籽裂纹识别方法、装置、系统、设备和存储介质 |
CN110427970A (zh) * | 2019-07-05 | 2019-11-08 | 平安科技(深圳)有限公司 | 图像分类方法、装置、计算机设备和存储介质 |
CN111507989A (zh) * | 2020-04-15 | 2020-08-07 | 上海眼控科技股份有限公司 | 语义分割模型的训练生成方法、车辆外观检测方法、装置 |
CN111814813A (zh) * | 2019-04-10 | 2020-10-23 | 北京市商汤科技开发有限公司 | 神经网络训练和图像分类方法与装置 |
CN111931840A (zh) * | 2020-08-04 | 2020-11-13 | 中国建设银行股份有限公司 | 一种图片分类的方法、装置、设备及存储介质 |
CN112446403A (zh) * | 2019-09-03 | 2021-03-05 | 顺丰科技有限公司 | 装载率识别方法、装置、计算机设备和存储介质 |
CN112580716A (zh) * | 2020-12-16 | 2021-03-30 | 北京百度网讯科技有限公司 | 图谱中边类型的识别方法、装置、设备及存储介质 |
CN112966623A (zh) * | 2021-03-16 | 2021-06-15 | 长安大学 | 一种多层宽度神经网络及其训练方法和应用 |
CN113535951A (zh) * | 2021-06-21 | 2021-10-22 | 深圳大学 | 用于进行信息分类的方法、装置、终端设备及存储介质 |
WO2021217775A1 (zh) * | 2020-04-27 | 2021-11-04 | 江苏金恒信息科技股份有限公司 | 一种基于神经网络模型融合的废钢评级方法及装置 |
US11259770B2 (en) * | 2019-11-14 | 2022-03-01 | GE Precision Healthcare LLC | Methods and systems for noise reduction in x-ray imaging |
CN114140637A (zh) * | 2021-10-21 | 2022-03-04 | 阿里巴巴达摩院(杭州)科技有限公司 | 图像分类方法、存储介质和电子设备 |
Families Citing this family (70)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR3050855B1 (fr) * | 2016-04-27 | 2019-05-03 | Commissariat A L'energie Atomique Et Aux Energies Alternatives | Dispositif et procede de calcul de convolution d'un reseau de neurones convolutionnel |
US10043088B2 (en) * | 2016-06-23 | 2018-08-07 | Siemens Healthcare Gmbh | Image quality score using a deep generative machine-learning model |
JP6633462B2 (ja) * | 2016-06-29 | 2020-01-22 | 株式会社東芝 | 情報処理装置および情報処理方法 |
EP3474189A1 (en) * | 2017-10-18 | 2019-04-24 | Aptiv Technologies Limited | A device and a method for assigning labels of a plurality of predetermined classes to pixels of an image |
CN110399929B (zh) | 2017-11-01 | 2023-04-28 | 腾讯科技(深圳)有限公司 | 眼底图像分类方法、装置以及计算机可读存储介质 |
CN107679525B (zh) * | 2017-11-01 | 2022-11-29 | 腾讯科技(深圳)有限公司 | 图像分类方法、装置及计算机可读存储介质 |
US11495057B2 (en) | 2017-12-08 | 2022-11-08 | Nec Corporation | Person verification device and method and non-transitory computer readable media |
CN108156519B (zh) * | 2017-12-25 | 2020-12-11 | 深圳Tcl新技术有限公司 | 图像分类方法、电视设备及计算机可读存储介质 |
CN108319968A (zh) * | 2017-12-27 | 2018-07-24 | 中国农业大学 | 一种基于模型融合的果蔬图像分类识别方法及系统 |
US11734567B2 (en) * | 2018-02-13 | 2023-08-22 | Samsung Electronics Co., Ltd. | Method and system for reducing deep neural network architectures |
CN108280490A (zh) * | 2018-02-28 | 2018-07-13 | 北京邮电大学 | 一种基于卷积神经网络的细粒度车型识别方法 |
WO2019171116A1 (en) * | 2018-03-05 | 2019-09-12 | Omron Corporation | Method and device for recognizing object |
CN108416168B (zh) * | 2018-03-29 | 2021-11-02 | 北京航空航天大学 | 基于分层决策的地形适配区选取方案 |
CN108629772B (zh) * | 2018-05-08 | 2023-10-03 | 上海商汤智能科技有限公司 | 图像处理方法及装置、计算机设备和计算机存储介质 |
CN108765449B (zh) * | 2018-05-16 | 2022-04-26 | 南京信息工程大学 | 一种基于卷积神经网络的图像背景分割和识别方法 |
JP6977886B2 (ja) * | 2018-06-28 | 2021-12-08 | 株式会社島津製作所 | 機械学習方法、機械学習装置、及び機械学習プログラム |
CN110163049B (zh) * | 2018-07-18 | 2023-08-29 | 腾讯科技(深圳)有限公司 | 一种人脸属性预测方法、装置及存储介质 |
US10991092B2 (en) * | 2018-08-13 | 2021-04-27 | Siemens Healthcare Gmbh | Magnetic resonance imaging quality classification based on deep machine-learning to account for less training data |
US10769428B2 (en) * | 2018-08-13 | 2020-09-08 | Google Llc | On-device image recognition |
CN109447097B (zh) * | 2018-08-23 | 2021-01-08 | 浙江理工大学 | 一种基于卷积神经网络的面料主成分检测方法 |
TWI691930B (zh) * | 2018-09-19 | 2020-04-21 | 財團法人工業技術研究院 | 基於神經網路的分類方法及其分類裝置 |
US10915734B2 (en) | 2018-09-28 | 2021-02-09 | Apple Inc. | Network performance by including attributes |
CN111104954B (zh) * | 2018-10-26 | 2023-11-14 | 华为云计算技术有限公司 | 一种对象分类的方法与装置 |
CN109151615B (zh) * | 2018-11-02 | 2022-01-25 | 湖南双菱电子科技有限公司 | 视频处理方法、计算机设备和计算机存储介质 |
US11640522B2 (en) | 2018-12-13 | 2023-05-02 | Tybalt, Llc | Computational efficiency improvements for artificial neural networks |
US10963757B2 (en) * | 2018-12-14 | 2021-03-30 | Industrial Technology Research Institute | Neural network model fusion method and electronic device using the same |
CN111382758B (zh) * | 2018-12-28 | 2023-12-26 | 杭州海康威视数字技术股份有限公司 | 训练图像分类模型、图像分类方法、装置、设备及介质 |
CN109784415B (zh) * | 2019-01-25 | 2021-02-26 | 北京地平线机器人技术研发有限公司 | 图像识别方法及装置、训练卷积神经网络的方法及装置 |
CN111488893B (zh) * | 2019-01-25 | 2023-05-30 | 银河水滴科技(北京)有限公司 | 一种图像分类方法及装置 |
US11003947B2 (en) * | 2019-02-25 | 2021-05-11 | Fair Isaac Corporation | Density based confidence measures of neural networks for reliable predictions |
CN111626400B (zh) * | 2019-02-28 | 2024-03-15 | 佳能株式会社 | 多层神经网络模型的训练和应用方法、装置及存储介质 |
CN110009800B (zh) * | 2019-03-14 | 2023-04-07 | 北京京东乾石科技有限公司 | 一种识别方法和设备 |
WO2020197501A1 (en) * | 2019-03-26 | 2020-10-01 | Agency For Science, Technology And Research | Method and system for image classification |
CN111800287B (zh) * | 2019-04-09 | 2023-07-18 | Oppo广东移动通信有限公司 | 数据处理方法、装置、存储介质及电子设备 |
CN110070122B (zh) * | 2019-04-15 | 2022-05-06 | 沈阳理工大学 | 一种基于图像增强的卷积神经网络模糊图像分类方法 |
CN110188613A (zh) * | 2019-04-28 | 2019-08-30 | 上海鹰瞳医疗科技有限公司 | 图像分类方法及设备 |
CN110222733B (zh) * | 2019-05-17 | 2021-05-11 | 嘉迈科技(海南)有限公司 | 一种高精度的多阶神经网络分类方法及系统 |
CN110246134A (zh) * | 2019-06-24 | 2019-09-17 | 株洲时代电子技术有限公司 | 一种钢轨伤损分类装置 |
CN110321968B (zh) * | 2019-07-11 | 2023-05-05 | 广东工业大学 | 一种超声图像分类装置 |
CN110472675B (zh) * | 2019-07-31 | 2023-04-18 | Oppo广东移动通信有限公司 | 图像分类方法、图像分类装置、存储介质与电子设备 |
US10993465B2 (en) | 2019-08-08 | 2021-05-04 | NotCo Delaware, LLC | Method of classifying flavors |
WO2021051268A1 (zh) * | 2019-09-17 | 2021-03-25 | 深圳市大疆创新科技有限公司 | 基于机器视觉的树木种类识别方法及装置 |
CN112529146B (zh) * | 2019-09-18 | 2023-10-17 | 华为技术有限公司 | 神经网络模型训练的方法和装置 |
CN110956101B (zh) * | 2019-11-19 | 2020-08-07 | 广东省城乡规划设计研究院 | 一种基于随机森林算法的遥感影像黄河冰凌检测方法 |
CN110675415B (zh) * | 2019-12-05 | 2020-05-15 | 北京同方软件有限公司 | 一种基于深度学习增强实例分割的道路积水区域检测方法 |
CN111046949A (zh) * | 2019-12-10 | 2020-04-21 | 东软集团股份有限公司 | 一种图像分类方法、装置及设备 |
CN111401215B (zh) * | 2020-03-12 | 2023-10-31 | 杭州涂鸦信息技术有限公司 | 一种多类别目标检测的方法及系统 |
CN111091132B (zh) * | 2020-03-19 | 2021-01-15 | 腾讯科技(深圳)有限公司 | 基于人工智能的图像识别方法、装置、计算机设备及介质 |
CN111860491A (zh) * | 2020-04-17 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | 一种车辆真伪的认证方法、认证装置以及可读存储介质 |
CN111582372B (zh) * | 2020-05-09 | 2024-06-14 | 西交利物浦大学 | 图像分类方法、模型、存储介质及电子设备 |
JP7486349B2 (ja) * | 2020-05-28 | 2024-05-17 | キヤノン株式会社 | ニューラルネットワーク、ニューラルネットワークの学習方法、プログラム、画像処理装置 |
CN111738316B (zh) * | 2020-06-10 | 2024-02-02 | 北京字节跳动网络技术有限公司 | 零样本学习的图像分类方法、装置及电子设备 |
WO2022041222A1 (en) * | 2020-08-31 | 2022-03-03 | Top Team Technology Development Limited | Process and system for image classification |
CN112163631A (zh) * | 2020-10-14 | 2021-01-01 | 山东黄金矿业(莱州)有限公司三山岛金矿 | 一种用于溜井处基于视频分析的金矿矿质分析方法 |
US10962473B1 (en) | 2020-11-05 | 2021-03-30 | NotCo Delaware, LLC | Protein secondary structure prediction |
CN112580617B (zh) * | 2021-03-01 | 2021-06-18 | 中国科学院自动化研究所 | 自然场景下的表情识别方法和装置 |
US11893792B2 (en) * | 2021-03-25 | 2024-02-06 | Adobe Inc. | Integrating video content into online product listings to demonstrate product features |
EP4083858A1 (en) | 2021-04-29 | 2022-11-02 | Siemens Aktiengesellschaft | Training data set reduction and image classification |
US11514350B1 (en) | 2021-05-04 | 2022-11-29 | NotCo Delaware, LLC | Machine learning driven experimental design for food technology |
US11205101B1 (en) * | 2021-05-11 | 2021-12-21 | NotCo Delaware, LLC | Formula and recipe generation with feedback loop |
CN114972834B (zh) * | 2021-05-12 | 2023-09-05 | 中移互联网有限公司 | 多层次多分类器的图像分类方法及装置 |
CN113344040A (zh) * | 2021-05-20 | 2021-09-03 | 深圳索信达数据技术有限公司 | 图像分类方法、装置、计算机设备和存储介质 |
CN113033518B (zh) * | 2021-05-25 | 2021-08-31 | 北京中科闻歌科技股份有限公司 | 图像检测方法、装置、电子设备及存储介质 |
US11348664B1 (en) | 2021-06-17 | 2022-05-31 | NotCo Delaware, LLC | Machine learning driven chemical compound replacement technology |
CN113313063A (zh) * | 2021-06-21 | 2021-08-27 | 暨南大学 | 麦穗检测方法、电子装置和存储介质 |
US11404144B1 (en) | 2021-11-04 | 2022-08-02 | NotCo Delaware, LLC | Systems and methods to suggest chemical compounds using artificial intelligence |
US11373107B1 (en) | 2021-11-04 | 2022-06-28 | NotCo Delaware, LLC | Systems and methods to suggest source ingredients using artificial intelligence |
US20230326215A1 (en) * | 2022-04-07 | 2023-10-12 | Waymo Llc | End-to-end object tracking using neural networks with attention |
CN115187819B (zh) * | 2022-08-23 | 2023-05-16 | 北京医准智能科技有限公司 | 图像分类模型的训练方法、装置、电子设备及存储介质 |
US11982661B1 (en) | 2023-05-30 | 2024-05-14 | NotCo Delaware, LLC | Sensory transformer method of generating ingredients and formulas |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001331799A (ja) * | 2000-03-16 | 2001-11-30 | Toshiba Corp | 画像処理装置および画像処理方法 |
US20050105780A1 (en) * | 2003-11-14 | 2005-05-19 | Sergey Ioffe | Method and apparatus for object recognition using probability models |
US7274822B2 (en) * | 2003-06-30 | 2007-09-25 | Microsoft Corporation | Face annotation for photo management |
CN103927510A (zh) * | 2013-01-11 | 2014-07-16 | 富士施乐株式会社 | 图像识别装置和图像识别方法 |
CN104156464A (zh) * | 2014-08-20 | 2014-11-19 | 中国科学院重庆绿色智能技术研究院 | 基于微视频特征数据库的微视频检索方法及装置 |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7016885B1 (en) * | 2001-08-28 | 2006-03-21 | University Of Central Florida Research Foundation, Inc. | Self-designing intelligent signal processing system capable of evolutional learning for classification/recognition of one and multidimensional signals |
CN102622607B (zh) * | 2012-02-24 | 2013-09-25 | 河海大学 | 一种基于多特征融合的遥感图像分类方法 |
CN102629320B (zh) * | 2012-03-27 | 2014-08-27 | 中国科学院自动化研究所 | 基于特征层定序测量统计描述的人脸识别方法 |
CN102831447B (zh) * | 2012-08-30 | 2015-01-21 | 北京理工大学 | 多类别面部表情高精度识别方法 |
CN104899856B (zh) * | 2014-03-07 | 2018-11-27 | 清华大学 | 图像处理方法及装置 |
CN104281853B (zh) * | 2014-09-02 | 2017-11-17 | 电子科技大学 | 一种基于3d卷积神经网络的行为识别方法 |
CN104504658A (zh) * | 2014-12-15 | 2015-04-08 | 中国科学院深圳先进技术研究院 | 基于bp神经网络的单一图像去雾方法及装置 |
CN107430677B (zh) * | 2015-03-20 | 2022-04-12 | 英特尔公司 | 基于对二进制卷积神经网络特征进行提升的目标识别 |
US20160283864A1 (en) * | 2015-03-27 | 2016-09-29 | Qualcomm Incorporated | Sequential image sampling and storage of fine-tuned features |
CN104850890B (zh) * | 2015-04-14 | 2017-09-26 | 西安电子科技大学 | 基于实例学习和Sadowsky分布的卷积神经网络参数调整方法 |
US10860887B2 (en) * | 2015-11-16 | 2020-12-08 | Samsung Electronics Co., Ltd. | Method and apparatus for recognizing object, and method and apparatus for training recognition model |
US11003988B2 (en) * | 2016-11-23 | 2021-05-11 | General Electric Company | Hardware system design improvement using deep learning algorithms |
- 2015
  - 2015-12-11 CN CN201510921073.9A patent/CN106874921B/zh active Active
- 2016
  - 2016-05-23 EP EP16871963.1A patent/EP3388978B1/en active Active
  - 2016-05-23 WO PCT/CN2016/083064 patent/WO2017096758A1/zh unknown
- 2017
  - 2017-09-13 US US15/703,027 patent/US10325181B2/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001331799A (ja) * | 2000-03-16 | 2001-11-30 | Toshiba Corp | 画像処理装置および画像処理方法 |
US7274822B2 (en) * | 2003-06-30 | 2007-09-25 | Microsoft Corporation | Face annotation for photo management |
US20050105780A1 (en) * | 2003-11-14 | 2005-05-19 | Sergey Ioffe | Method and apparatus for object recognition using probability models |
CN103927510A (zh) * | 2013-01-11 | 2014-07-16 | 富士施乐株式会社 | 图像识别装置和图像识别方法 |
CN104156464A (zh) * | 2014-08-20 | 2014-11-19 | 中国科学院重庆绿色智能技术研究院 | 基于微视频特征数据库的微视频检索方法及装置 |
Non-Patent Citations (1)
Title |
---|
See also references of EP3388978A4 * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019041406A1 (zh) * | 2017-08-28 | 2019-03-07 | 平安科技(深圳)有限公司 | 不雅图片识别方法、终端、设备及计算机可读存储介质 |
CN109124635A (zh) * | 2018-09-25 | 2019-01-04 | 上海联影医疗科技有限公司 | 模型生成方法、磁共振成像扫描方法及系统 |
CN109766742A (zh) * | 2018-11-20 | 2019-05-17 | 安徽农业大学 | 一种玉米籽裂纹识别方法、装置、系统、设备和存储介质 |
CN109766823A (zh) * | 2019-01-07 | 2019-05-17 | 浙江大学 | 一种基于深层卷积神经网络的高分辨率遥感船舶检测方法 |
CN111814813A (zh) * | 2019-04-10 | 2020-10-23 | 北京市商汤科技开发有限公司 | 神经网络训练和图像分类方法与装置 |
CN110427970A (zh) * | 2019-07-05 | 2019-11-08 | 平安科技(深圳)有限公司 | 图像分类方法、装置、计算机设备和存储介质 |
CN110427970B (zh) * | 2019-07-05 | 2023-08-01 | 平安科技(深圳)有限公司 | 图像分类方法、装置、计算机设备和存储介质 |
CN112446403A (zh) * | 2019-09-03 | 2021-03-05 | 顺丰科技有限公司 | 装载率识别方法、装置、计算机设备和存储介质 |
US11259770B2 (en) * | 2019-11-14 | 2022-03-01 | GE Precision Healthcare LLC | Methods and systems for noise reduction in x-ray imaging |
CN111507989A (zh) * | 2020-04-15 | 2020-08-07 | 上海眼控科技股份有限公司 | 语义分割模型的训练生成方法、车辆外观检测方法、装置 |
WO2021217775A1 (zh) * | 2020-04-27 | 2021-11-04 | 江苏金恒信息科技股份有限公司 | 一种基于神经网络模型融合的废钢评级方法及装置 |
CN111931840A (zh) * | 2020-08-04 | 2020-11-13 | 中国建设银行股份有限公司 | 一种图片分类的方法、装置、设备及存储介质 |
CN112580716A (zh) * | 2020-12-16 | 2021-03-30 | 北京百度网讯科技有限公司 | 图谱中边类型的识别方法、装置、设备及存储介质 |
CN112580716B (zh) * | 2020-12-16 | 2023-07-11 | 北京百度网讯科技有限公司 | 图谱中边类型的识别方法、装置、设备及存储介质 |
CN112966623A (zh) * | 2021-03-16 | 2021-06-15 | 长安大学 | 一种多层宽度神经网络及其训练方法和应用 |
CN112966623B (zh) * | 2021-03-16 | 2024-03-19 | 长安大学 | 一种多层宽度神经网络及其训练方法和应用 |
CN113535951A (zh) * | 2021-06-21 | 2021-10-22 | 深圳大学 | 用于进行信息分类的方法、装置、终端设备及存储介质 |
CN114140637A (zh) * | 2021-10-21 | 2022-03-04 | 阿里巴巴达摩院(杭州)科技有限公司 | 图像分类方法、存储介质和电子设备 |
CN114140637B (zh) * | 2021-10-21 | 2023-09-12 | 阿里巴巴达摩院(杭州)科技有限公司 | 图像分类方法、存储介质和电子设备 |
Also Published As
Publication number | Publication date |
---|---|
CN106874921B (zh) | 2020-12-04 |
CN106874921A (zh) | 2017-06-20 |
EP3388978A1 (en) | 2018-10-17 |
EP3388978B1 (en) | 2021-02-24 |
US10325181B2 (en) | 2019-06-18 |
US20180012107A1 (en) | 2018-01-11 |
EP3388978A4 (en) | 2019-05-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017096758A1 (zh) | 图像分类方法、电子设备和存储介质 | |
CN109859190B (zh) | 一种基于深度学习的目标区域检测方法 | |
CN111738231B (zh) | 目标对象检测方法、装置、计算机设备和存储介质 | |
US10810745B2 (en) | Method and apparatus with image segmentation | |
KR102224253B1 (ko) | 심층 네트워크와 랜덤 포레스트가 결합된 앙상블 분류기의 경량화를 위한 교사-학생 프레임워크 및 이를 기반으로 하는 분류 방법 | |
US20230154170A1 (en) | Method and apparatus with multi-modal feature fusion | |
CN112991413A (zh) | 自监督深度估测方法和系统 | |
KR20160096460A (ko) | 복수의 분류기를 포함하는 딥 러닝 기반 인식 시스템 및 그 제어 방법 | |
CN109783666A (zh) | 一种基于迭代精细化的图像场景图谱生成方法 | |
KR102252439B1 (ko) | 이미지에서 오브젝트 검출 및 표현 | |
CN107784288A (zh) | 一种基于深度神经网络的迭代定位式人脸检测方法 | |
Ouyang et al. | Vehicle target detection in complex scenes based on YOLOv3 algorithm | |
CN109903339B (zh) | 一种基于多维融合特征的视频群体人物定位检测方法 | |
EP4404148A1 (en) | Image processing method and apparatus, and computer-readable storage medium | |
US11367206B2 (en) | Edge-guided ranking loss for monocular depth prediction | |
CN108875505A (zh) | 基于神经网络的行人再识别方法和装置 | |
Khellal et al. | Pedestrian classification and detection in far infrared images | |
CN112465847A (zh) | 一种基于预测清晰边界的边缘检测方法、装置及设备 | |
CN116977265A (zh) | 缺陷检测模型的训练方法、装置、计算机设备和存储介质 | |
CN111950545B (zh) | 一种基于MSDNet和空间划分的场景文本检测方法 | |
CN113837062A (zh) | 一种分类方法、装置、存储介质及电子设备 | |
CN115731530A (zh) | 一种模型训练方法及其装置 | |
CN113192085A (zh) | 三维器官图像分割方法、装置及计算机设备 | |
KR20220062961A (ko) | 자율주행 자동차의 판단에 대한 근거 설명 모델 | |
KR101763259B1 (ko) | 데이터를 구분하는 전자 장치 및 그 방법 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16871963; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |