CN109993201B - Image processing method, device and readable storage medium

Info

Publication number: CN109993201B
Application number: CN201910114848.XA
Authority: CN
Inventors: 赵峰; 王健宗; 肖京
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-02-14
Filing date: 2019-02-14
Publication date: 2024-07-16
Anticipated expiration: 2039-02-14
Also published as: CN109993201A; WO2020164278A1

Abstract

The invention relates to artificial intelligence technology and discloses an image processing method, an image processing device and a readable storage medium. The method comprises: acquiring image information; performing feature extraction on the image information with a feature extractor to obtain feature information; sending the feature information to a classifier; and classifying the feature information with the classifier to obtain classification information. In this technical scheme, the minimum feature distance in the classifier is evaluated, and the classification information is taken as the final classification result when the minimum feature distance is larger than a preset distance threshold, which can improve the accuracy of image classification. In addition, a convolutional layer of a neural network is used as the feature extractor, which can further improve recognition accuracy. During image training, the invention applies mode distortion enhancement to the images, which further increases the accuracy of image recognition.

Description

Image processing method, device and readable storage medium

Technical Field

The invention relates to the technical field of image processing in the field of artificial intelligence, in particular to an image processing method based on a deep neural network, a terminal and a computer-readable storage medium.

Background

With the rapid development of machine learning technology, convolutional neural networks are increasingly applied to computer vision, especially in the field of image classification.

Convolutional Neural Networks (CNNs) are a powerful machine learning technique used in the field of deep learning. Simple neural networks have been used as machine learning techniques in the fields of character recognition and image classification. However, training such a neural network requires a large amount of labeled data. A simple way to exploit the capabilities of a CNN without spending the time and effort to train one is to use a pre-trained CNN as a feature extractor. For generating complex decision surfaces, Support Vector Machines (SVMs) are a very economical method for representing complex surfaces in high-dimensional space, including polynomials and other types of surfaces.

In the related art, convolutional neural networks are a powerful machine learning technique used in the field of deep learning. Simple convolutional neural networks have been used as machine learning techniques in the fields of character recognition and image classification. However, in the prior art, the classification accuracy achieved by simple CNN or SVM techniques alone is not high, so it is desirable to design a scheme that combines the two technologies to improve the classification accuracy of image recognition.

Disclosure of Invention

In order to solve at least one of the above technical problems, the present invention proposes an image processing method, apparatus and readable storage medium.

The first aspect of the present invention provides an image processing method, including:

Acquiring image information;

according to the image information, carrying out feature extraction by using a feature extractor to obtain feature information;

transmitting the characteristic information to a classifier;

The classifier calculates according to the characteristic information to obtain the minimum characteristic distance between the characteristic information and the hyperplane in the corresponding classifier;

judging whether the minimum characteristic distance is larger than a preset distance threshold value or not;

if the minimum feature distance is larger than the preset distance threshold, taking the classification information of the classifier as the final classification information.

In this scheme, the feature extractor is set up in advance through training, specifically by:

acquiring a large-scale object image dataset;

Performing image training on the large-scale object image dataset to obtain a deep convolutional neural network AlexNet;

acquiring a convolution layer of the deep convolution neural network AlexNet;

the convolution layer is used as a feature extractor.

In this solution, the first layer of the deep convolutional neural network AlexNet includes a filter for capturing edge features, the middle layer of the deep convolutional neural network AlexNet includes a plurality of convolutional layers and a maximum pool layer, the last layer of the deep convolutional neural network AlexNet is a classification layer, and the weights of the layers of the deep convolutional neural network AlexNet are determined by training.

In this scheme, the seventeenth layer of the deep convolutional neural network AlexNet is used as a feature extractor, and feature information obtained at the seventeenth layer of the deep convolutional neural network AlexNet is sent to a classifier.

In the scheme, the classifier is a classifier which is set through pre-training.

In this scheme, performing image training on the large-scale object image dataset to obtain the deep convolutional neural network AlexNet further includes:

processing the images in the large-scale object image dataset by adopting a mode enhancement technology to obtain processed images;

Training is carried out on the processed image to obtain a new deep convolutional neural network AlexNet.

Specifically, the mode enhancement technique is one or more of rotation, tilting, elastic torsion and cosine function enhancement.

A second aspect of the present invention provides an image processing apparatus, comprising: the image processing device comprises a memory and a processor, wherein the memory comprises an image processing method program, and the image processing method program realizes the following steps when being executed by the processor:

Acquiring image information;

transmitting the characteristic information to a classifier;

acquiring a large-scale object image dataset;

acquiring a convolution layer of the deep convolution neural network AlexNet;

the convolution layer is used as a feature extractor.

A third aspect of the present invention provides a computer-readable storage medium having embodied therein an image processing method program which, when executed by a processor, implements the steps of an image processing method as described above.

By the image processing method, the device and the readable storage medium, the minimum feature distance in the classifier is judged, and the classification information is used as a final classification result when the minimum feature distance is larger than the preset distance threshold, so that the accuracy of image classification can be improved. In addition, the convolutional layer in the neural network is used as the feature extractor for extracting the feature information, so that the accuracy of recognition can be further improved. In the image training process, the invention carries out mode distortion enhancement on the image, and further increases the accuracy of image identification.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:

FIG. 1 shows a flow chart of an image processing method of the present invention;

FIG. 2 shows a layer structure diagram of the deep convolutional neural network AlexNet of the present invention;

FIG. 3 shows a first layer weight structure diagram of the deep convolutional neural network AlexNet of the present invention;

FIG. 4 is a schematic diagram of the present invention for image enhancement by cosine mode;

FIG. 5 shows a schematic representation of the image enhancement by rotational tilt of the present invention;

FIG. 6 shows a block diagram of an image processing apparatus of the present invention.

Detailed Description

In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.

Fig. 1 shows a flow chart of an image processing method of the present invention.

As shown in fig. 1, the present invention discloses an image processing method, comprising:

s102, acquiring image information;

S104, carrying out feature extraction by using a feature extractor according to the image information to obtain feature information;

s106, the characteristic information is sent to a classifier;

s108, the classifier calculates according to the characteristic information to obtain the minimum characteristic distance between the characteristic information and the hyperplane in the corresponding classifier;

S110, judging whether the minimum feature distance is larger than a preset distance threshold;

S112, if the minimum feature distance is larger than the preset distance threshold, taking the classification information of the classifier as the final classification information.

After the image is acquired, that is, the image information is acquired, the image information is input into a feature extractor to perform feature extraction, so as to obtain feature information. That is, the feature extractor converts the input original image into feature vectors, which are feature information. And then the characteristic information is sent to a classifier for classification processing, and finally the input image is output into a plurality of classified contents. For example, a picture including a cat is input, image information of the picture is firstly obtained, then feature extraction is carried out according to the image information to obtain feature information, the feature information is sent to a classifier for classification and identification, finally classification information is output, the identification result of the picture displayed in the classification information is the cat, and the position of the cat in the picture can be marked. Of course, the classification information may also include other information, for example, marking a detection frame in the picture, detecting the position of the frame circumscribing the species; labeling species information in a detection box, and the like. The person skilled in the art can set the type of the classification information according to the actual needs, but any method for outputting the classification information based on the technical scheme of the invention falls within the protection scope of the invention.

It should be noted that a classifier, such as an SVM classifier, is a discriminative classifier defined by a separating hyperplane. That is, given a set of labeled training samples, the algorithm outputs an optimal hyperplane with which new samples (test samples) are classified. The trained classifier thus has a hyperplane that separates the feature vectors into different classes, and the farther a feature vector in a region lies from the hyperplane, the more reliable its classification. When a feature vector falls into a certain region, there is a minimum distance between the feature vector and the hyperplane, and it is judged whether this minimum distance is larger than a preset distance threshold. If the minimum distance is smaller than the preset distance threshold, the classification may be in error; image preprocessing can then be performed, after which the feature vector is calculated and classified again. If the minimum distance is larger than the preset distance threshold, the classification result is considered accurate, and the classification information can be used as the final classification result. The preprocessing may include increasing contrast, enhancing color, removing image noise, and the like.
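The distance test described above can be illustrated with a minimal sketch. It assumes a linear hyperplane w·x + b = 0, for which the distance of a feature vector x is |w·x + b| / ||w||; the function names, the 2-D toy hyperplane and the threshold value 0.5 are hypothetical and are not taken from the invention.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Return (distance, label) of feature vector x relative to w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    label = 1 if score >= 0 else -1
    return abs(score) / norm, label

def classify_with_threshold(w, b, x, threshold):
    """Return (label, accepted); accepted is False when x lies too close
    to the hyperplane, in which case the caller would preprocess the image
    and classify again."""
    distance, label = distance_to_hyperplane(w, b, x)
    return label, distance > threshold

# Hypothetical 2-D example: hyperplane x0 + x1 - 1 = 0, threshold 0.5.
w, b = [1.0, 1.0], -1.0
far = classify_with_threshold(w, b, [3.0, 2.0], threshold=0.5)
near = classify_with_threshold(w, b, [0.6, 0.5], threshold=0.5)
print(far, near)
```

In this sketch the first vector lies well away from the boundary and is accepted, while the second lies close to it and would be sent back for preprocessing and reclassification.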

According to an embodiment of the present invention, the feature extractor is a feature extractor that is set through pre-training, specifically:

acquiring a large-scale object image dataset;

acquiring a convolution layer of the deep convolution neural network AlexNet;

the convolution layer is used as a feature extractor.

The feature extractor is preset: it is a convolutional layer of a pre-trained deep convolutional neural network AlexNet. AlexNet is trained as follows. A large-scale object image dataset is acquired, which has 1000 object categories and 1.2 million training images. Image training is performed on the large-scale object image dataset to obtain the deep convolutional neural network AlexNet, and the convolutional layer information of AlexNet is then acquired. AlexNet has a plurality of convolutional layers, and only one of them needs to be acquired and used as the feature extractor.

Specifically, FIG. 2 shows a layer structure diagram of the deep convolutional neural network AlexNet of the present invention. As shown in FIG. 2, the input to AlexNet is an image belonging to one of 1000 different classes (e.g., cat, dog, etc.), and the output is a vector of 1000 numbers. The i-th element of the output vector is the probability that the input image belongs to the i-th class. The first layer of AlexNet defines the size of the input image as 227×227×3. The layer weights are pre-trained on the large-scale object image dataset. The first layer of the network learns filters that capture blob and edge features. The middle layers are a series of five convolutional layers and three fully connected layers interspersed with rectified linear units (ReLUs) and max-pooling layers. The last layer is the classification layer, with 1000 classes. The CNN pre-trained on the object image dataset serves as the feature extractor. The first convolutional layer weights of AlexNet are shown in FIG. 3. Instead of training on the object images themselves, such as those in the STL-10 database, a CNN (i.e., AlexNet) that is pre-trained on object images is used as the feature extractor. The 17th layer, named 'fc7' in the layer diagram of AlexNet, is connected to the classifier as the feature extractor; that is, the feature vectors obtained at the 17th layer are passed to the classifier.
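As a worked illustration of the layer structure, the following sketch traces the spatial size of the feature maps from the 227×227×3 input through the convolutional and pooling layers. The kernel sizes, strides, paddings and channel counts are assumptions taken from the commonly published AlexNet configuration; the description above does not state them.

```python
def window_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, pad, out_channels); None keeps the channel count.
# Values are assumed from the commonly published AlexNet configuration.
LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]

def trace_shapes(size=227, channels=3):
    """Return (name, height, width, channels) after each layer."""
    shapes = []
    for name, kernel, stride, pad, out_channels in LAYERS:
        size = window_out(size, kernel, stride, pad)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, size, size, channels))
    return shapes

shapes = trace_shapes()
# Number of values fed into the first fully connected layer.
flattened = shapes[-1][1] * shapes[-1][2] * shapes[-1][3]
for shape in shapes:
    print(shape)
print("flattened:", flattened)
```

Under these assumptions the last pooling layer produces a 6×6×256 volume, which is flattened before the fully connected layers; the 'fc7' layer then outputs the feature vector that is passed to the classifier.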

According to an embodiment of the present invention, the first layer of the deep convolutional neural network AlexNet includes a filter for capturing edge features, the middle layer of the deep convolutional neural network AlexNet includes a plurality of convolutional layers and a maximum pool layer, the last layer of the deep convolutional neural network AlexNet is a classification layer, and the weights of the layers of the deep convolutional neural network AlexNet are determined through training.

According to the embodiment of the invention, the seventeenth layer of the deep convolutional neural network AlexNet is adopted as a feature extractor, and feature information obtained at the seventeenth layer of the deep convolutional neural network AlexNet is sent to a classifier.

It should be noted that the present invention uses the seventeenth layer of the deep convolutional neural network AlexNet as the feature extractor. This layer is named "fc7" and is one of the fully connected layers of AlexNet. The feature vectors obtained at layer 17 are passed to the classifier.

According to the embodiment of the invention, the classifier is a classifier which is set through pre-training.

It should be noted that the classifier may use a support vector machine (SVM) for classification. The classifier classifies data by finding the best hyperplane separating all data points of one class from the data points of the other classes, and a support vector machine can represent complex surfaces, including polynomials and radial basis functions. The best hyperplane is the one with the largest margin between the two classes, where the margin is the maximum width of the slab parallel to the hyperplane that contains no interior data points. The support vectors are the data points closest to the separating hyperplane.
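The margin and support vectors can be made concrete with a small sketch. It assumes a linear SVM in canonical form, where the closest points satisfy y(w·x + b) = 1 and the margin width is therefore 2 / ||w||; the hyperplane and the four labeled points are hypothetical toy values.

```python
import math

def margin_width(w):
    """Width of the slab between the two classes for a canonical linear SVM."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def support_vectors(w, b, samples, tol=1e-9):
    """Return the points with the smallest functional margin y * (w.x + b)."""
    margins = [y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
               for x, y in samples]
    smallest = min(margins)
    return [x for (x, _), m in zip(samples, margins) if m - smallest <= tol]

# Hypothetical separating hyperplane x0 - 2 = 0 and labeled points (x, y).
w, b = [1.0, 0.0], -2.0
samples = [
    ([1.0, 0.0], -1),
    ([3.0, 1.0], +1),
    ([0.0, 2.0], -1),
    ([5.0, 0.0], +1),
]
width = margin_width(w)
closest = support_vectors(w, b, samples)
print(width, closest)
```

Here the two points at distance 1 from the hyperplane are the support vectors, and no data point lies inside the slab of width 2 between them.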

It should be noted that the classifier is trained using convolutional neural network features. The acquired image feature information, namely the feature vectors, is input into the classifier for training. Because the CNN feature vectors are high-dimensional (each of length 4096), a stochastic gradient descent (SGD) solver is used to accelerate training. To measure the classification accuracy of the trained SVM classifier, test image features are extracted by the CNN and passed to the SVM classifier. Using the trained classifier for image recognition can improve the accuracy of image recognition.

According to an embodiment of the present invention, performing image training on the large-scale object image dataset to obtain the deep convolutional neural network AlexNet further includes:
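A minimal sketch of training a linear SVM with gradient steps on the hinge loss is given below. It is an illustration under stated assumptions, not the solver used by the invention: the 2-D points stand in for the 4096-dimensional CNN feature vectors, the learning rate, regularization strength and epoch count are hypothetical, and the samples are visited in a fixed order for reproducibility, whereas a practical SGD solver would shuffle them.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm_sgd(samples, epochs=50, lr=0.1, lam=0.01):
    """Per-sample subgradient steps on the L2-regularized hinge loss."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in samples:
            if y * (dot(w, x) + b) < 1:
                # Margin violated: step along the hinge-loss subgradient.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Margin satisfied: only the regularizer shrinks w.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1

# Hypothetical stand-ins for CNN feature vectors with labels +1 / -1.
train = [
    ([2.0, 2.0], 1), ([3.0, 3.0], 1), ([2.0, 3.0], 1), ([3.0, 2.0], 1),
    ([-2.0, -2.0], -1), ([-3.0, -3.0], -1),
    ([-2.0, -3.0], -1), ([-3.0, -2.0], -1),
]
w, b = train_linear_svm_sgd(train)
train_accuracy = sum(predict(w, b, x) == y for x, y in train) / len(train)
print(w, b, train_accuracy)
```

Each update touches only one sample, which is why this style of solver scales to large sets of high-dimensional feature vectors.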

When the image training is performed to obtain the deep convolutional neural network AlexNet, the image may be further subjected to enhancement processing. The enhancement process may be mode distortion enhancement. Among them, affine transformation and elastic distortion are well known in character recognition as data enhancement methods. They are used to generate new samples from the original samples and to expand the training set. By applying affine displacement fields to patterns, simple distortions such as translation, rotation, scaling and tilting can be generated. Elastic distortion is an image transformation that mimics changes in handwriting style.
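The simple distortions named above can each be written as a 2×3 affine matrix applied to pixel coordinates. The following sketch assumes the standard matrix forms for translation, rotation, scaling and tilting (shear); the helper names and the sample stroke are hypothetical.

```python
import math

def translate(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty]]

def rotate(degrees):
    t = math.radians(degrees)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0]]

def scale(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0]]

def tilt(k):
    """Horizontal shear: x is displaced in proportion to y."""
    return [[1.0, k, 0.0], [0.0, 1.0, 0.0]]

def apply_affine(m, point):
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

def augment(points, transforms):
    """Generate one distorted copy of the pattern per transform."""
    return [[apply_affine(m, p) for p in points] for m in transforms]

# Hypothetical stroke made of two points, expanded into four new samples.
stroke = [(1.0, 0.0), (0.0, 1.0)]
copies = augment(stroke, [translate(2.0, 3.0), rotate(90.0),
                          scale(2.0, 2.0), tilt(0.5)])
for copy in copies:
    print(copy)
```

Applying several such matrices to every original pattern is how a single training sample is expanded into multiple new samples.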

Specifically, the mode enhancement technique is one or more of rotation, tilting, elastic torsion and cosine function enhancement. It should be clear to a person skilled in the art that the present invention is not limited to the above enhancement modes, but any method for mode enhancement by the technical solution of the present invention falls within the scope of the present invention.

Fig. 4 shows a schematic diagram of the present invention for image enhancement by cosine mode.

As shown in FIG. 4, the present invention uses a mode transformation method based on a cosine function. Other enhancement methods such as rotation, tilting and elastic twisting are also applied. FIG. 4 shows an example of patterns produced by cosine function enhancement. The number of training images is increased 31-fold over the original 5k training samples. By using cosine functions, the original pattern is left-aligned, right-aligned, top-aligned or bottom-aligned, and the pattern is also center-aligned and enlarged.

Fig. 5 shows a schematic representation of the image enhancement by rotational tilting of the present invention.

As shown in fig. 5, the present invention rotates an image by a certain angle by adopting a rotation process of the image; or adopting image tilting processing, the aim is to transform the image into an image which presents a certain angle with the original image. Image processing can also be performed by elastic warping to transform the original image into a warped image.
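Elastic warping is not specified in detail here, so the following is only a sketch of one commonly used formulation: a random displacement field is smoothed with a Gaussian kernel, scaled, and used to resample the image. The parameters alpha and sigma, the nearest-neighbor sampling and the toy image are all assumptions made for illustration.

```python
import math
import random

def gaussian_kernel(sigma, radius):
    k = [math.exp(-(i * i) / (2 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def smooth(field, kernel):
    """Separable Gaussian blur with clamped borders."""
    h, w, r = len(field), len(field[0]), len(kernel) // 2
    rows = [[sum(kernel[j + r] * row[min(max(x + j, 0), w - 1)]
                 for j in range(-r, r + 1)) for x in range(w)]
            for row in field]
    return [[sum(kernel[j + r] * rows[min(max(y + j, 0), h - 1)][x]
                 for j in range(-r, r + 1)) for x in range(w)]
            for y in range(h)]

def elastic_distort(image, alpha=2.0, sigma=1.5, seed=0):
    """Resample image along a smoothed random displacement field."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    kernel = gaussian_kernel(sigma, radius=int(3 * sigma))
    dx = smooth([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)],
                kernel)
    dy = smooth([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)],
                kernel)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Nearest-neighbor lookup at the displaced, clamped position.
            sx = min(max(int(round(x + alpha * dx[y][x])), 0), w - 1)
            sy = min(max(int(round(y + alpha * dy[y][x])), 0), h - 1)
            row.append(image[sy][sx])
        out.append(row)
    return out

# Hypothetical 6x6 binary image containing a vertical stroke.
image = [[1 if x in (2, 3) else 0 for x in range(6)] for _ in range(6)]
warped = elastic_distort(image)
for row in warped:
    print(row)
```

A larger alpha gives stronger warping and a larger sigma gives smoother, more global deformations; with alpha set to zero the image is returned unchanged.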

By training with the processed images, a new deep convolutional neural network AlexNet is obtained. Because the training uses images that have been distorted in these ways, the network is more likely to match a given input image, which improves the accuracy of image recognition.

As shown in fig. 6, a second aspect of the present invention provides an image processing apparatus comprising: a memory 61, a processor 62, the memory comprising an image processing method program which, when executed by the processor, performs the steps of:

Acquiring image information;

transmitting the characteristic information to a classifier;

acquiring a large-scale object image dataset;

acquiring a convolution layer of the deep convolution neural network AlexNet;

the convolution layer is used as a feature extractor.

In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are only illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in practice: multiple units or components may be combined or integrated into another device, or some features may be omitted or not performed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interface, and the indirect coupling or communication connection between devices or units may be electrical, mechanical or in other forms.

The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; can be located in one place or distributed to a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present invention may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.

Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be completed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium that can store program code.

Or the above-described integrated units of the invention may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in essence or a part contributing to the prior art in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.

The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. An image processing method, comprising:

Acquiring image information;

transmitting the characteristic information to a classifier;

If the minimum feature distance is larger than the preset distance threshold, the classification information of the classifier is used as final classification information; otherwise, feature vector calculation and classification are carried out again after image preprocessing;

The feature extractor is a feature extractor which is set through pre-training, and specifically comprises:

acquiring a large-scale object image dataset;

Performing image training on the large-scale object image dataset to obtain a deep convolutional neural network AlexNet; wherein a first layer of the deep convolutional neural network AlexNet includes a filter for capturing blobs and edge features, a middle layer of the deep convolutional neural network AlexNet includes five convolutional layers and three fully connected layers interspersed with rectifying linear units and a maximum pool layer, a last layer of the deep convolutional neural network AlexNet is a classification layer, and weights of layers of the deep convolutional neural network AlexNet are determined by training;

acquiring a convolution layer of the deep convolution neural network AlexNet;

taking the convolution layer as a feature extractor;

The image training is performed on the large-scale object image dataset to obtain a deep convolutional neural network AlexNet, and the method further includes:

Training the processed image to obtain a new deep convolutional neural network AlexNet, and taking the new deep convolutional neural network AlexNet convolutional layer as a feature extractor;

The classifier is a classifier trained by using convolutional neural network features; wherein, when the high-dimensional CNN feature vectors are input to the classifier for training, training is accelerated by using a stochastic gradient descent solver.

2. An image processing method according to claim 1, wherein:

the seventeenth layer of the deep convolutional neural network AlexNet is used as a feature extractor, and feature information obtained at the seventeenth layer of the deep convolutional neural network AlexNet is sent to a classifier.

3. An image processing apparatus, comprising: the image processing device comprises a memory and a processor, wherein the memory comprises an image processing method program, and the image processing method program realizes the following steps when being executed by the processor:

Acquiring image information;

transmitting the characteristic information to a classifier;

acquiring a large-scale object image dataset;

acquiring a convolution layer of the deep convolution neural network AlexNet;

taking the convolution layer as a feature extractor;

4. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises an image processing method program which, when executed by a processor, implements the steps of an image processing method according to claim 1 or 2.