CN109993201B - Image processing method, device and readable storage medium - Google Patents
- Publication number
- CN109993201B CN201910114848.XA
- Authority
- CN
- China
- Prior art keywords
- image
- neural network
- convolutional neural
- information
- classifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Abstract
The invention relates to artificial intelligence technology, and in particular discloses an image processing method, an image processing device, and a readable storage medium, wherein the method comprises the following steps: acquiring image information; performing feature extraction with a feature extractor according to the image information to obtain feature information; transmitting the feature information to a classifier; and the classifier classifying the feature information to obtain classification information. In this technical scheme, the minimum feature distance in the classifier is judged, and the classification information is taken as the final classification result only when the minimum feature distance is greater than the preset distance threshold, which improves the accuracy of image classification. In addition, a convolutional layer of the neural network is used as the feature extractor for extracting the feature information, which further improves recognition accuracy. During image training, the invention applies pattern-distortion enhancement to the images, further increasing the accuracy of image recognition.
Description
Technical Field
The invention relates to the technical field of image processing in the field of artificial intelligence, in particular to an image processing method based on a deep neural network, a terminal and a computer-readable storage medium.
Background
With the rapid development of machine learning technology, convolutional neural networks are increasingly applied to computer vision, especially in the field of image classification.
Convolutional Neural Networks (CNNs) are a powerful machine learning technique from the field of deep learning. Simple neural networks have long been used as machine learning techniques for character recognition and image classification; however, training such a neural network requires a large amount of labeled data. A simple way to exploit the power of CNNs without spending the time and effort to train one is to use a pre-trained CNN as a feature extractor. For generating complex decision surfaces, Support Vector Machines (SVMs) are a very economical way to represent complex surfaces in high-dimensional space, including polynomials and other types of surfaces.
In the related art, simple convolutional neural networks have been used as machine learning techniques in the fields of character recognition and image classification. However, in the prior art, the classification accuracy achieved with a simple CNN or SVM alone is not high, so it is desirable to design a scheme that combines the two techniques to improve the classification accuracy of image recognition.
Disclosure of Invention
In order to solve at least one of the above technical problems, the present invention proposes an image processing method, apparatus and readable storage medium.
The first aspect of the present invention provides an image processing method, including:
acquiring image information;
performing feature extraction with a feature extractor according to the image information to obtain feature information;
transmitting the feature information to a classifier;
the classifier calculating, from the feature information, the minimum feature distance between the feature information and the corresponding hyperplane in the classifier;
judging whether the minimum feature distance is greater than a preset distance threshold;
and if it is greater, taking the classification information of the classifier as the final classification information.
In this scheme, the feature extractor is set up through pre-training, specifically:
acquiring a large-scale object image dataset;
performing image training on the large-scale object image dataset to obtain a deep convolutional neural network AlexNet;
acquiring a convolutional layer of the deep convolutional neural network AlexNet;
using the convolutional layer as the feature extractor.
In this solution, the first layer of the deep convolutional neural network AlexNet includes filters for capturing edge features, the middle layers of the deep convolutional neural network AlexNet include a plurality of convolutional layers and max pooling layers, the last layer of the deep convolutional neural network AlexNet is a classification layer, and the weights of the layers of the deep convolutional neural network AlexNet are determined by training.
In this scheme, the seventeenth layer of the deep convolutional neural network AlexNet is used as a feature extractor, and feature information obtained at the seventeenth layer of the deep convolutional neural network AlexNet is sent to a classifier.
In this scheme, the classifier is set through pre-training.
In this scheme, performing the image training on the large-scale object image dataset to obtain the deep convolutional neural network AlexNet further includes:
processing the images in the large-scale object image dataset with a pattern enhancement technique to obtain processed images;
training on the processed images to obtain a new deep convolutional neural network AlexNet.
Specifically, the pattern enhancement technique is one or more of rotation, tilting, elastic distortion, and cosine-function enhancement.
A second aspect of the present invention provides an image processing apparatus, comprising a memory and a processor, wherein the memory stores an image processing method program which, when executed by the processor, implements the following steps:
acquiring image information;
performing feature extraction with a feature extractor according to the image information to obtain feature information;
transmitting the feature information to a classifier;
the classifier calculating, from the feature information, the minimum feature distance between the feature information and the corresponding hyperplane in the classifier;
judging whether the minimum feature distance is greater than a preset distance threshold;
and if it is greater, taking the classification information of the classifier as the final classification information.
In this scheme, the feature extractor is set up through pre-training, specifically:
acquiring a large-scale object image dataset;
performing image training on the large-scale object image dataset to obtain a deep convolutional neural network AlexNet;
acquiring a convolutional layer of the deep convolutional neural network AlexNet;
using the convolutional layer as the feature extractor.
In this solution, the first layer of the deep convolutional neural network AlexNet includes filters for capturing edge features, the middle layers of the deep convolutional neural network AlexNet include a plurality of convolutional layers and max pooling layers, the last layer of the deep convolutional neural network AlexNet is a classification layer, and the weights of the layers of the deep convolutional neural network AlexNet are determined by training.
In this scheme, the seventeenth layer of the deep convolutional neural network AlexNet is used as a feature extractor, and feature information obtained at the seventeenth layer of the deep convolutional neural network AlexNet is sent to a classifier.
In this scheme, the classifier is set through pre-training.
In this scheme, performing the image training on the large-scale object image dataset to obtain the deep convolutional neural network AlexNet further includes:
processing the images in the large-scale object image dataset with a pattern enhancement technique to obtain processed images;
training on the processed images to obtain a new deep convolutional neural network AlexNet.
Specifically, the pattern enhancement technique is one or more of rotation, tilting, elastic distortion, and cosine-function enhancement.
A third aspect of the present invention provides a computer-readable storage medium having embodied therein an image processing method program which, when executed by a processor, implements the steps of an image processing method as described above.
With the above image processing method, device, and readable storage medium, the minimum feature distance in the classifier is judged, and the classification information is taken as the final classification result only when the minimum feature distance is greater than the preset distance threshold, which improves the accuracy of image classification. In addition, using a convolutional layer of the neural network as the feature extractor for extracting the feature information further improves recognition accuracy. During image training, the invention applies pattern-distortion enhancement to the images, further increasing the accuracy of image recognition.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 shows a flow chart of an image processing method of the present invention;
FIG. 2 shows a layer structure diagram of the deep convolutional neural network AlexNet of the present invention;
FIG. 3 shows a first layer weight structure diagram of the deep convolutional neural network AlexNet of the present invention;
FIG. 4 shows a schematic diagram of image enhancement by cosine-function patterns according to the present invention;
FIG. 5 shows a schematic representation of image enhancement by rotation and tilting according to the present invention;
fig. 6 shows a block diagram of an image processing apparatus of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
Fig. 1 shows a flow chart of an image processing method of the present invention.
As shown in fig. 1, the present invention discloses an image processing method, comprising:
S102, acquiring image information;
S104, performing feature extraction with a feature extractor according to the image information to obtain feature information;
S106, transmitting the feature information to a classifier;
S108, the classifier calculating, from the feature information, the minimum feature distance between the feature information and the corresponding hyperplane in the classifier;
S110, judging whether the minimum feature distance is greater than a preset distance threshold;
S112, if it is greater, taking the classification information of the classifier as the final classification information.
After an image is acquired, that is, after the image information is obtained, the image information is input into a feature extractor for feature extraction to obtain feature information. In other words, the feature extractor converts the input original image into a feature vector, and this feature vector is the feature information. The feature information is then sent to a classifier for classification, and the input image is finally assigned to one of several classes. For example, when a picture containing a cat is input, the image information of the picture is first acquired; feature extraction is then performed on the image information to obtain feature information; the feature information is sent to the classifier for classification and recognition; and finally classification information is output, indicating that the recognition result for the picture is a cat, possibly with the position of the cat marked in the picture. Of course, the classification information may also include other content, for example, a detection box drawn in the picture around the detected species, species information labeled on the detection box, and so on. A person skilled in the art can set the type of the classification information according to actual needs, and any method that outputs classification information based on the technical scheme of the invention falls within the protection scope of the invention.
It should be noted that a classifier such as an SVM classifier is a discriminative classifier defined by a separating hyperplane. That is, given a set of labeled training samples, the algorithm outputs an optimal hyperplane that classifies new (test) samples. In other words, the trained classifier has hyperplanes that separate feature vectors into different classes, and the farther the feature vector in a region lies from the hyperplane, the more accurate the classification can be. When a feature vector falls into a certain region, there is a minimum distance between the feature vector and the hyperplane, and whether this minimum distance is greater than a preset distance threshold is judged: if it is smaller than the preset distance threshold, a certain error may exist, so image preprocessing can be performed and the feature vector computed and classified again; if it is greater, the classification result is accurate, and the classification information can be taken as the final classification result. The preprocessing may include contrast enhancement, color enhancement, image denoising, and the like.
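As a minimal illustration of this decision step (a sketch, not code from the patent), the logic can be expressed with scikit-learn's LinearSVC standing in for the classifier; the threshold value, the multi-class one-vs-rest setup, and the retry convention are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVC

def classify_with_threshold(svm: LinearSVC, feature: np.ndarray,
                            threshold: float = 0.5):
    """Accept the SVM's label only when the feature vector lies farther
    than `threshold` from every one-vs-rest separating hyperplane."""
    scores = np.atleast_1d(svm.decision_function(feature.reshape(1, -1))[0])
    # Geometric distance to each hyperplane: |w.x + b| / ||w||.
    distances = np.abs(scores) / np.linalg.norm(svm.coef_, axis=1)
    if distances.min() > threshold:
        return svm.classes_[int(np.argmax(scores))]  # confident: final result
    return None  # too close to a hyperplane: preprocess the image and retry
```

A None result corresponds to the retry branch described above: apply preprocessing (contrast enhancement, color enhancement, denoising) and run feature extraction and classification again.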
According to an embodiment of the present invention, the feature extractor is set through pre-training, specifically:
acquiring a large-scale object image dataset;
performing image training on the large-scale object image dataset to obtain a deep convolutional neural network AlexNet;
acquiring a convolutional layer of the deep convolutional neural network AlexNet;
using the convolutional layer as the feature extractor.
It should be noted that the feature extractor is a preset feature extractor, namely a convolutional layer of the pre-trained deep convolutional neural network AlexNet. The training of the deep convolutional neural network AlexNet is specifically as follows: a large-scale object image dataset is acquired, with 1000 object categories and 1.2 million training images; image training is performed on the large-scale object image dataset to obtain the deep convolutional neural network AlexNet; the convolutional-layer information of the deep convolutional neural network AlexNet is then acquired. The deep convolutional neural network AlexNet has a plurality of convolutional layers, and only one of them needs to be acquired and used as the feature extractor.
Specifically, FIG. 2 shows the layer structure of the deep convolutional neural network AlexNet of the present invention. As shown in fig. 2, the input to the deep convolutional neural network AlexNet is an image of one of 1000 different classes (e.g., cat, dog, etc.), and the output is a vector of 1000 numbers, where the i-th element of the output vector is the probability that the input image belongs to the i-th class. The first layer of the deep convolutional neural network AlexNet defines the input image size as 227×227×3. The layer weights are pre-trained on the large-scale object image dataset. The first layer of the network learns filters for capturing blob and edge features. The middle layers are a series of five convolutional layers and three fully connected layers, interspersed with rectified linear units (ReLUs) and max pooling layers. The last layer is the classification layer, with 1000 classes. The CNN pre-trained on the object image dataset serves as the feature extractor: rather than training on the target images themselves, such as those in the STL-10 database, the CNN pre-trained on the object images (i.e., AlexNet) is used as a feature extractor. The first-convolutional-layer weights of the deep convolutional neural network AlexNet are shown in fig. 3. The 17th layer, named 'fc7', in the layer diagram of the deep convolutional neural network AlexNet is connected to the classifier as the feature extractor; that is, the feature vectors obtained at the 17th layer are passed to the classifier.
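A sketch of this kind of fc7 feature extraction, assuming torchvision's ImageNet-pretrained AlexNet as a stand-in for the network described above; truncating the classifier head after the second fully connected layer and its ReLU yields the 4096-dimensional 'fc7' activation, and the 224×224 preprocessing follows torchvision's convention rather than the 227×227 figure quoted here:

```python
import torch
import torch.nn as nn
from PIL import Image
from torchvision import models, transforms

# ImageNet-pretrained AlexNet with the classifier truncated after the
# second fully connected layer + ReLU, i.e. the 'fc7' activation.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier = nn.Sequential(*list(model.classifier.children())[:6])
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_fc7(image_path: str) -> torch.Tensor:
    """Return the 4096-dimensional fc7 feature vector for one image."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(img).squeeze(0)  # shape: (4096,)
```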
According to an embodiment of the present invention, the first layer of the deep convolutional neural network AlexNet includes filters for capturing edge features, the middle layers of the deep convolutional neural network AlexNet include a plurality of convolutional layers and max pooling layers, the last layer of the deep convolutional neural network AlexNet is a classification layer, and the weights of the layers of the deep convolutional neural network AlexNet are determined through training.
According to the embodiment of the invention, the seventeenth layer of the deep convolutional neural network AlexNet is adopted as a feature extractor, and feature information obtained at the seventeenth layer of the deep convolutional neural network AlexNet is sent to a classifier.
It should be noted that the seventeenth layer of the deep convolutional neural network AlexNet, namely the layer named 'fc7' among the fully connected layers of AlexNet, is used as the feature extractor in the present invention. The feature vectors obtained at this 17th layer are passed into the classifier.
According to the embodiment of the invention, the classifier is set through pre-training.
It should be noted that the classifier may use a support vector machine (Support Vector Machine, SVM) for classification. The classifier classifies data by finding the best hyperplane that separates all data points of one class from the data points of the other classes; a support vector machine can represent complex surfaces, including polynomials and radial basis functions. The best hyperplane is the one with the largest margin between the two classes; in other words, the margin is the maximal width of the slab, parallel to the hyperplane, that contains no interior data points, and the support vectors are the data points closest to the separating hyperplane.
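In standard SVM notation (the symbols w, b, and x are conventional, not taken from the patent text), the distance being thresholded and the margin being maximized can be written as:

```latex
% Distance from a feature vector x to the separating hyperplane
% w^T x + b = 0; the scheme compares the minimum such distance
% with a preset threshold before accepting the classification.
d(x) = \frac{\lvert w^{\top} x + b \rvert}{\lVert w \rVert},
\qquad \text{margin} = \frac{2}{\lVert w \rVert}
```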
It should be noted that the classifier is trained on the feature vectors extracted by the convolutional neural network: the acquired image feature information, namely the feature vectors, is input into the classifier for training. Because the CNN feature vectors are high-dimensional (4096 dimensions each), a stochastic gradient descent (SGD) solver is used to accelerate training. To measure the classification accuracy of the trained SVM classifier, test-image features are extracted by the CNN and passed to the SVM classifier. Using the trained classifier for image recognition improves the accuracy of image recognition.
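A sketch of this training step under the same stand-in libraries: scikit-learn's SGDClassifier with hinge loss fits a linear SVM by stochastic gradient descent, matching the accelerating role the SGD solver plays here; the data shapes and hyperparameters are illustrative:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X_train: (n_samples, 4096) fc7 feature vectors; y_train: class labels.
X_train = np.random.randn(200, 4096)           # placeholder data
y_train = np.random.randint(0, 10, size=200)   # placeholder labels

# Hinge loss + SGD is a linear SVM fitted by stochastic gradient descent.
svm = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss="hinge", alpha=1e-4, max_iter=1000, tol=1e-3),
)
svm.fit(X_train, y_train)

# Accuracy is measured by extracting test-image features with the CNN
# and scoring them with the trained classifier.
print("training accuracy:", svm.score(X_train, y_train))
```

Combined with the extractor sketched earlier, X_train would hold the fc7 vectors of the training images, and the fitted pipeline would be the classifier that receives the test-image features.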
According to an embodiment of the present invention, performing the image training on the large-scale object image dataset to obtain the deep convolutional neural network AlexNet further includes:
processing the images in the large-scale object image dataset with a pattern enhancement technique to obtain processed images;
training on the processed images to obtain a new deep convolutional neural network AlexNet.
It should be noted that, when performing image training to obtain the deep convolutional neural network AlexNet, the images may additionally be subjected to enhancement processing, which may be pattern-distortion enhancement. Affine transformation and elastic distortion are well known as data enhancement methods in character recognition; they generate new samples from the original samples and thereby expand the training set. By applying affine displacement fields to patterns, simple distortions such as translation, rotation, scaling, and tilting can be generated. Elastic distortion is an image transformation that mimics variations in handwriting style.
Specifically, the pattern enhancement technique is one or more of rotation, tilting, elastic distortion, and cosine-function enhancement. It should be clear to a person skilled in the art that the invention is not limited to the above enhancement modes; any method of pattern enhancement under the technical solution of the present invention falls within the scope of the invention.
Fig. 4 shows a schematic diagram of image enhancement by cosine-function patterns according to the present invention.
As shown in fig. 4, the present invention uses a pattern transformation method based on a cosine function. In addition, augmentation methods such as rotation, tilting, and elastic distortion are also applied. Fig. 4 shows pattern examples of cosine-function enhancement. The number of training images is increased to 31 times the original 5k training samples. Using cosine functions, the original pattern is left-aligned, right-aligned, top-aligned, or bottom-aligned, and the pattern is also center-aligned and enlarged.
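The exact cosine coordinate mapping is not spelled out in the text, so the NumPy sketch below is one plausible reading: a monotonic cosine remapping of the column index that pushes the pattern toward one edge (here, the right edge); the mapping itself is an illustrative assumption:

```python
import numpy as np

def cosine_align(img: np.ndarray) -> np.ndarray:
    """Warp a grayscale image with a cosine column remapping that pushes
    the pattern toward the right edge (illustrative assumption)."""
    h, w = img.shape
    out = np.empty_like(img)
    for x_out in range(w):
        # Monotonic mapping 1 - cos(pi*t/2): output column x_out samples a
        # source column x_src <= x_out, so content shifts rightward.
        t = x_out / (w - 1)
        x_src = int(round((w - 1) * (1.0 - np.cos(0.5 * np.pi * t))))
        out[:, x_out] = img[:, min(x_src, w - 1)]
    return out
```

Mirroring or transposing the same mapping gives left, top, or bottom alignment, and applying such a family of warps together with centering and enlargement to each original would be consistent with the reported 31-fold expansion of the 5k training samples.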
Fig. 5 shows a schematic representation of image enhancement by rotation and tilting according to the present invention.
As shown in fig. 5, the present invention rotates an image by a certain angle using rotation processing, or applies tilting processing, the aim being to transform the image into one at a certain angle to the original image. Image processing can also be performed by elastic distortion, transforming the original image into a distorted image.
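These three operations map directly onto standard augmentation transforms; a sketch using torchvision, where the rotation angle, shear, and elasticity parameters are illustrative values rather than numbers from the patent:

```python
from torchvision import transforms

# Rotation, tilt (shear), and elastic distortion as composable
# augmentations; parameter values are illustrative only.
augment = transforms.Compose([
    transforms.ToTensor(),
    transforms.RandomRotation(degrees=15),               # rotate by an angle
    transforms.RandomAffine(degrees=0, shear=10),        # tilt via shearing
    transforms.ElasticTransform(alpha=50.0, sigma=5.0),  # elastic distortion
])

# augmented = augment(pil_image)  # apply to each PIL image before training
```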
Training with the processed images yields a new deep convolutional neural network AlexNet. By training on these distorted versions of the images, the invention increases the probability that an image is matched in the network, thereby improving the accuracy of image recognition.
Fig. 6 shows a block diagram of an image processing apparatus of the present invention.
As shown in fig. 6, a second aspect of the present invention provides an image processing apparatus, comprising a memory 61 and a processor 62, the memory storing an image processing method program which, when executed by the processor, performs the following steps:
acquiring image information;
performing feature extraction with a feature extractor according to the image information to obtain feature information;
transmitting the feature information to a classifier;
the classifier calculating, from the feature information, the minimum feature distance between the feature information and the corresponding hyperplane in the classifier;
judging whether the minimum feature distance is greater than a preset distance threshold;
and if it is greater, taking the classification information of the classifier as the final classification information.
After an image is acquired, that is, after the image information is obtained, the image information is input into a feature extractor for feature extraction to obtain feature information. In other words, the feature extractor converts the input original image into a feature vector, and this feature vector is the feature information. The feature information is then sent to a classifier for classification, and the input image is finally assigned to one of several classes. For example, when a picture containing a cat is input, the image information of the picture is first acquired; feature extraction is then performed on the image information to obtain feature information; the feature information is sent to the classifier for classification and recognition; and finally classification information is output, indicating that the recognition result for the picture is a cat, possibly with the position of the cat marked in the picture. Of course, the classification information may also include other content, for example, a detection box drawn in the picture around the detected species, species information labeled on the detection box, and so on. A person skilled in the art can set the type of the classification information according to actual needs, and any method that outputs classification information based on the technical scheme of the invention falls within the protection scope of the invention.
It should be noted that a classifier such as an SVM classifier is a discriminative classifier defined by a separating hyperplane. That is, given a set of labeled training samples, the algorithm outputs an optimal hyperplane that classifies new (test) samples. In other words, the trained classifier has hyperplanes that separate feature vectors into different classes, and the farther the feature vector in a region lies from the hyperplane, the more accurate the classification can be. When a feature vector falls into a certain region, there is a minimum distance between the feature vector and the hyperplane, and whether this minimum distance is greater than a preset distance threshold is judged: if it is smaller than the preset distance threshold, a certain error may exist, so image preprocessing can be performed and the feature vector computed and classified again; if it is greater, the classification result is accurate, and the classification information can be taken as the final classification result. The preprocessing may include contrast enhancement, color enhancement, image denoising, and the like.
According to an embodiment of the present invention, the feature extractor is set through pre-training, specifically:
acquiring a large-scale object image dataset;
performing image training on the large-scale object image dataset to obtain a deep convolutional neural network AlexNet;
acquiring a convolutional layer of the deep convolutional neural network AlexNet;
using the convolutional layer as the feature extractor.
It should be noted that the feature extractor is a preset feature extractor, namely a convolutional layer of the pre-trained deep convolutional neural network AlexNet. The training of the deep convolutional neural network AlexNet is specifically as follows: a large-scale object image dataset is acquired, with 1000 object categories and 1.2 million training images; image training is performed on the large-scale object image dataset to obtain the deep convolutional neural network AlexNet; the convolutional-layer information of the deep convolutional neural network AlexNet is then acquired. The deep convolutional neural network AlexNet has a plurality of convolutional layers, and only one of them needs to be acquired and used as the feature extractor.
Specifically, FIG. 2 shows the layer structure of the deep convolutional neural network AlexNet of the present invention. As shown in fig. 2, the input to the deep convolutional neural network AlexNet is an image of one of 1000 different classes (e.g., cat, dog, etc.), and the output is a vector of 1000 numbers, where the i-th element of the output vector is the probability that the input image belongs to the i-th class. The first layer of the deep convolutional neural network AlexNet defines the input image size as 227×227×3. The layer weights are pre-trained on the large-scale object image dataset. The first layer of the network learns filters for capturing blob and edge features. The middle layers are a series of five convolutional layers and three fully connected layers, interspersed with rectified linear units (ReLUs) and max pooling layers. The last layer is the classification layer, with 1000 classes. The CNN pre-trained on the object image dataset serves as the feature extractor: rather than training on the target images themselves, such as those in the STL-10 database, the CNN pre-trained on the object images (i.e., AlexNet) is used as a feature extractor. The first-convolutional-layer weights of the deep convolutional neural network AlexNet are shown in fig. 3. The 17th layer, named 'fc7', in the layer diagram of the deep convolutional neural network AlexNet is connected to the classifier as the feature extractor; that is, the feature vectors obtained at the 17th layer are passed to the classifier.
According to an embodiment of the present invention, the first layer of the deep convolutional neural network AlexNet includes filters for capturing edge features, the middle layers of the deep convolutional neural network AlexNet include a plurality of convolutional layers and max pooling layers, the last layer of the deep convolutional neural network AlexNet is a classification layer, and the weights of the layers of the deep convolutional neural network AlexNet are determined through training.
According to the embodiment of the invention, the seventeenth layer of the deep convolutional neural network AlexNet is adopted as a feature extractor, and feature information obtained at the seventeenth layer of the deep convolutional neural network AlexNet is sent to a classifier.
It should be noted that the seventeenth layer of the deep convolutional neural network AlexNet, namely the layer named 'fc7' among the fully connected layers of AlexNet, is used as the feature extractor in the present invention. The feature vectors obtained at this 17th layer are passed into the classifier.
According to the embodiment of the invention, the classifier is set through pre-training.
It should be noted that the classifier may use a support vector machine (Support Vector Machine, SVM) for classification. The classifier classifies data by finding the best hyperplane that separates all data points of one class from the data points of the other classes; a support vector machine can represent complex surfaces, including polynomials and radial basis functions. The best hyperplane is the one with the largest margin between the two classes; in other words, the margin is the maximal width of the slab, parallel to the hyperplane, that contains no interior data points, and the support vectors are the data points closest to the separating hyperplane.
It should be noted that the classifier is trained on the feature vectors extracted by the convolutional neural network: the acquired image feature information, namely the feature vectors, is input into the classifier for training. Because the CNN feature vectors are high-dimensional (4096 dimensions each), a stochastic gradient descent (SGD) solver is used to accelerate training. To measure the classification accuracy of the trained SVM classifier, test-image features are extracted by the CNN and passed to the SVM classifier. Using the trained classifier for image recognition improves the accuracy of image recognition.
According to an embodiment of the present invention, performing the image training on the large-scale object image dataset to obtain the deep convolutional neural network AlexNet further includes:
processing the images in the large-scale object image dataset with a pattern enhancement technique to obtain processed images;
training on the processed images to obtain a new deep convolutional neural network AlexNet.
Specifically, the pattern enhancement technique is one or more of rotation, tilting, elastic distortion, and cosine-function enhancement. It should be clear to a person skilled in the art that the invention is not limited to the above enhancement modes; any method of pattern enhancement under the technical solution of the present invention falls within the scope of the invention.
It should be noted that, when performing image training to obtain the deep convolutional neural network AlexNet, the images may additionally be subjected to enhancement processing, which may be pattern-distortion enhancement. Affine transformation and elastic distortion are well known as data enhancement methods in character recognition; they generate new samples from the original samples and thereby expand the training set. By applying affine displacement fields to patterns, simple distortions such as translation, rotation, scaling, and tilting can be generated. Elastic distortion is an image transformation that mimics variations in handwriting style.
Fig. 4 shows a schematic diagram of image enhancement by cosine-function patterns according to the present invention.
As shown in fig. 4, the present invention uses a pattern transformation method based on a cosine function. In addition, augmentation methods such as rotation, tilting, and elastic distortion are also applied. Fig. 4 shows pattern examples of cosine-function enhancement. The number of training images is increased to 31 times the original 5k training samples. Using cosine functions, the original pattern is left-aligned, right-aligned, top-aligned, or bottom-aligned, and the pattern is also center-aligned and enlarged.
Fig. 5 shows a schematic representation of image enhancement by rotation and tilting according to the present invention.
As shown in fig. 5, the present invention rotates an image by a certain angle using rotation processing, or applies tilting processing, the aim being to transform the image into one at a certain angle to the original image. Image processing can also be performed by elastic distortion, transforming the original image into a distorted image.
Training with the processed images yields a new deep convolutional neural network AlexNet. By training on these distorted versions of the images, the invention increases the probability that an image is matched in the network, thereby improving the accuracy of image recognition.
A third aspect of the present invention provides a computer-readable storage medium having embodied therein an image processing method program which, when executed by a processor, implements the steps of an image processing method as described above.
With the above image processing method, device, and readable storage medium, the minimum feature distance in the classifier is judged, and the classification information is taken as the final classification result only when the minimum feature distance is greater than the preset distance threshold, which improves the accuracy of image classification. In addition, using a convolutional layer of the neural network as the feature extractor for extracting the feature information further improves recognition accuracy. During image training, the invention applies pattern-distortion enhancement to the images, further increasing the accuracy of image recognition.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division into units is only a logical functional division, and there may be other divisions in practice, such as: multiple units or components may be combined or integrated into another device, or some features may be omitted or not performed. In addition, the components shown or discussed may be coupled to each other directly, or indirectly through some interface, and the couplings or communication connections between devices or units may be electrical, mechanical, or of other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may all be integrated in one processing unit, or each unit may serve as a single unit on its own, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware, or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be completed by hardware under the control of program instructions; the foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; and the aforementioned storage medium includes: a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
Alternatively, if the above integrated units of the invention are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic disk, an optical disk, or other media capable of storing program code.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (4)
1. An image processing method, comprising:
acquiring image information;
performing feature extraction with a feature extractor according to the image information to obtain feature information;
transmitting the feature information to a classifier;
the classifier calculating, from the feature information, the minimum feature distance between the feature information and the corresponding hyperplane in the classifier;
judging whether the minimum feature distance is greater than a preset distance threshold;
if it is greater, taking the classification information of the classifier as the final classification information; otherwise, performing image preprocessing and then performing feature vector calculation and classification again;
wherein the feature extractor is a feature extractor set through pre-training, specifically:
acquiring a large-scale object image dataset;
performing image training on the large-scale object image dataset to obtain a deep convolutional neural network AlexNet; wherein the first layer of the deep convolutional neural network AlexNet includes filters for capturing blob and edge features, the middle layers of the deep convolutional neural network AlexNet include five convolutional layers and three fully connected layers interspersed with rectified linear units and max pooling layers, the last layer of the deep convolutional neural network AlexNet is a classification layer, and the weights of the layers of the deep convolutional neural network AlexNet are determined by training;
acquiring a convolutional layer of the deep convolutional neural network AlexNet;
taking the convolutional layer as the feature extractor;
wherein performing the image training on the large-scale object image dataset to obtain the deep convolutional neural network AlexNet further includes:
processing the images in the large-scale object image dataset with a pattern enhancement technique to obtain processed images;
training on the processed images to obtain a new deep convolutional neural network AlexNet, and taking a convolutional layer of the new deep convolutional neural network AlexNet as the feature extractor;
wherein the classifier is a classifier trained on the feature vectors extracted by the convolutional neural network; and when the high-dimensional CNN feature vectors are input into the classifier for training, a stochastic gradient descent solver is used to accelerate training.
2. An image processing method according to claim 1, wherein:
the seventeenth layer of the deep convolutional neural network AlexNet is used as a feature extractor, and feature information obtained at the seventeenth layer of the deep convolutional neural network AlexNet is sent to a classifier.
3. An image processing apparatus, comprising a memory and a processor, wherein the memory stores an image processing method program which, when executed by the processor, implements the following steps:
acquiring image information;
performing feature extraction with a feature extractor according to the image information to obtain feature information;
transmitting the feature information to a classifier;
the classifier calculating, from the feature information, the minimum feature distance between the feature information and the corresponding hyperplane in the classifier;
judging whether the minimum feature distance is greater than a preset distance threshold;
if it is greater, taking the classification information of the classifier as the final classification information; otherwise, performing image preprocessing and then performing feature vector calculation and classification again;
wherein the feature extractor is a feature extractor set through pre-training, specifically:
acquiring a large-scale object image dataset;
performing image training on the large-scale object image dataset to obtain a deep convolutional neural network AlexNet; wherein the first layer of the deep convolutional neural network AlexNet includes filters for capturing blob and edge features, the middle layers of the deep convolutional neural network AlexNet include five convolutional layers and three fully connected layers interspersed with rectified linear units and max pooling layers, the last layer of the deep convolutional neural network AlexNet is a classification layer, and the weights of the layers of the deep convolutional neural network AlexNet are determined by training;
acquiring a convolutional layer of the deep convolutional neural network AlexNet;
taking the convolutional layer as the feature extractor;
wherein performing the image training on the large-scale object image dataset to obtain the deep convolutional neural network AlexNet further includes:
processing the images in the large-scale object image dataset with a pattern enhancement technique to obtain processed images;
training on the processed images to obtain a new deep convolutional neural network AlexNet, and taking a convolutional layer of the new deep convolutional neural network AlexNet as the feature extractor;
wherein the classifier is a classifier trained on the feature vectors extracted by the convolutional neural network; and when the high-dimensional CNN feature vectors are input into the classifier for training, a stochastic gradient descent solver is used to accelerate training.
4. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises an image processing method program which, when executed by a processor, implements the steps of an image processing method according to claim 1 or 2.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910114848.XA CN109993201B (en) | 2019-02-14 | 2019-02-14 | Image processing method, device and readable storage medium |
PCT/CN2019/118277 WO2020164278A1 (en) | 2019-02-14 | 2019-11-14 | Image processing method and device, electronic equipment and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910114848.XA CN109993201B (en) | 2019-02-14 | 2019-02-14 | Image processing method, device and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109993201A CN109993201A (en) | 2019-07-09 |
CN109993201B true CN109993201B (en) | 2024-07-16 |
Family
ID=67130148
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910114848.XA Active CN109993201B (en) | 2019-02-14 | 2019-02-14 | Image processing method, device and readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109993201B (en) |
WO (1) | WO2020164278A1 (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109993201B (en) * | 2019-02-14 | 2024-07-16 | 平安科技(深圳)有限公司 | Image processing method, device and readable storage medium |
CN110570419A (en) * | 2019-09-12 | 2019-12-13 | 杭州依图医疗技术有限公司 | Method and device for acquiring characteristic information and storage medium |
CN112345531B (en) * | 2020-10-19 | 2024-04-09 | 国网安徽省电力有限公司电力科学研究院 | Transformer fault detection method based on bionic robot fish |
CN114639008A (en) * | 2020-12-01 | 2022-06-17 | 中移(成都)信息通信科技有限公司 | Method, device and equipment for extracting remote sensing image building and computer storage medium |
CN112668449A (en) * | 2020-12-24 | 2021-04-16 | 杭州电子科技大学 | Low-risk landform identification method for outdoor autonomous mobile robot |
CN113129279B (en) * | 2021-04-08 | 2024-04-30 | 合肥工业大学 | Composite insulator bird pecking damage risk level assessment method |
CN113239739B (en) * | 2021-04-19 | 2023-08-01 | 深圳市安思疆科技有限公司 | Wearing article identification method and device |
CN113642655B (en) * | 2021-08-18 | 2024-02-13 | 杭州电子科技大学 | Small sample image classification method based on support vector machine and convolutional neural network |
CN113901647B (en) * | 2021-09-24 | 2024-08-13 | 成都飞机工业(集团)有限责任公司 | Part technical specification compiling method and device, storage medium and electronic equipment |
CN115713763A (en) * | 2022-11-25 | 2023-02-24 | 青海卓旺智慧信息科技有限公司 | Potato image recognition system based on deep learning |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106372648A (en) * | 2016-10-20 | 2017-02-01 | 中国海洋大学 | Multi-feature-fusion-convolutional-neural-network-based plankton image classification method |
CN106934339A (en) * | 2017-01-19 | 2017-07-07 | 上海博康智能信息技术有限公司 | A kind of target following, the extracting method of tracking target distinguishing feature and device |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6826300B2 (en) * | 2001-05-31 | 2004-11-30 | George Mason University | Feature based classification |
CN104361096B (en) * | 2014-11-20 | 2016-02-24 | 合肥工业大学 | The image search method of a kind of feature based rich region set |
CN109033107B (en) * | 2017-06-09 | 2021-09-17 | 腾讯科技(深圳)有限公司 | Image retrieval method and apparatus, computer device, and storage medium |
CN109145143A (en) * | 2018-08-03 | 2019-01-04 | 厦门大学 | Sequence constraints hash algorithm in image retrieval |
CN109993201B (en) * | 2019-02-14 | 2024-07-16 | 平安科技(深圳)有限公司 | Image processing method, device and readable storage medium |
2019
- 2019-02-14 CN CN201910114848.XA patent/CN109993201B/en active Active
- 2019-11-14 WO PCT/CN2019/118277 patent/WO2020164278A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN109993201A (en) | 2019-07-09 |
WO2020164278A1 (en) | 2020-08-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||