CN111898683A - Image classification method and device based on deep learning and computer equipment - Google Patents

Image classification method and device based on deep learning and computer equipment

Info

Publication number
CN111898683A
CN111898683A
Authority
CN
China
Prior art keywords
training
net
super
picture
image classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010761098.8A
Other languages
Chinese (zh)
Other versions
CN111898683B (en)
Inventor
沈赞
庄伯金
王少军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010761098.8A priority Critical patent/CN111898683B/en
Priority to PCT/CN2020/122131 priority patent/WO2021151318A1/en
Publication of CN111898683A publication Critical patent/CN111898683A/en
Application granted granted Critical
Publication of CN111898683B publication Critical patent/CN111898683B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an image classification method and device based on deep learning, and computer equipment, relating to the technical field of artificial intelligence. The method comprises the following steps: first, configuring search space information of a neural architecture based on a MobileNet network; constructing a supernet according to the search space information, and configuring a spring structure corresponding to each convolution layer of the supernet, wherein during supernet training the spring structure fixes the channel numbers corresponding to the different operation items of the same convolution layer to the same channel number before outputting to the next convolution layer; then, training the supernet with a first picture training set to determine a target neural architecture suitable for image classification; and finally, training a model of the target neural architecture with a second picture training set, and classifying pictures to be classified with the model once it reaches the accuracy standard after training. The method and device can improve the accuracy of image classification. In addition, the application also relates to blockchain technology: the model training data can be stored in a blockchain to ensure data privacy and security.

Description

Image classification method and device based on deep learning and computer equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for classifying images based on deep learning, and a computer device.
Background
Images can be classified intelligently using deep learning, a highly successful branch of machine learning from which many classic and effective network structures have emerged. However, designing these network structures relies on the rich experience of domain experts and demands a great deal of time and effort for design and experimentation. Neural architecture search has therefore become a popular research field in recent years: by defining a search space, an optimal network structure is searched for automatically using methods such as reinforcement learning and evolutionary algorithms. These methods, however, are very time consuming and require significant GPU resources.
Therefore, in order to solve the above problems, a One-Shot method using weight sharing has been proposed: a directed acyclic graph containing all operation options, i.e., a supernet, is constructed and trained only once; single-path networks composed of different operation items are then sampled from the trained supernet, their accuracy on a test set is evaluated, and the optimal neural architecture is selected.
However, the inventors of the present invention found that, because the output of the previous layer and the input of the next layer in a convolutional neural network must have the same number of channels, the supernet cannot search over the channel-number dimension; instead, the channel number of each layer is defined manually in advance. This limits the accuracy of the obtained result, so the obtained neural architecture is not an ideal one, which in turn limits the accuracy of image classification when a model of that architecture is used.
Disclosure of Invention
In view of this, the present application provides an image classification method and device based on deep learning, and a computer device, mainly aiming to solve the technical problem that image classification accuracy is limited in the prior art.
According to an aspect of the present application, there is provided an image classification method based on deep learning, the method including:
configuring search space information of a neural architecture based on a MobileNet network;
constructing a supernet according to the search space information, and configuring a spring structure corresponding to each convolution layer of the supernet, wherein during supernet training the spring structure fixes the channel numbers corresponding to the different operation items of the same convolution layer to the same channel number before outputting to the next convolution layer;
training the supernet by utilizing a first picture training set to determine a target neural architecture suitable for image classification;
and training a model of the target neural architecture by utilizing a second picture training set, and carrying out image classification on the picture to be classified by utilizing the model that reaches the accuracy standard after training.
According to another aspect of the present application, there is provided an image classification apparatus based on deep learning, the apparatus including:
a configuration module, used for configuring search space information of the neural architecture based on the MobileNet network;
a construction module, used for constructing a supernet according to the search space information and configuring a spring structure corresponding to each convolution layer of the supernet, wherein during supernet training the spring structure fixes the channel numbers corresponding to the different operation items of the same convolution layer to the same channel number before outputting to the next convolution layer;
a training module, used for training the supernet by utilizing a first picture training set so as to determine a target neural architecture suitable for image classification;
the training module being further used for training a model of the target neural architecture by utilizing a second picture training set;
and a classification module, used for carrying out image classification on pictures to be classified by utilizing the model that reaches the accuracy standard after training.
According to yet another aspect of the present application, there is provided a non-transitory readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described deep learning-based image classification method.
According to yet another aspect of the present application, there is provided a computer device comprising a non-volatile readable storage medium, a processor, and a computer program stored on the non-volatile readable storage medium and executable on the processor, the processor implementing the above deep learning based image classification method when executing the program.
By means of the above technical scheme, the present application provides an image classification method and device based on deep learning, and computer equipment. In the prior art, the number of channels of each network layer cannot be searched in the supernet mode of the One-Shot framework and can only be defined manually in advance. By contrast, the present method first configures search space information of a neural architecture based on the MobileNet network, then constructs a supernet according to the search space information and configures a spring structure corresponding to each convolution layer of the supernet. During supernet training, the spring structure fixes the channel numbers corresponding to the different operation items of the same convolution layer to the same channel number before outputting to the next convolution layer. This guarantees that the input channel number of the next convolution layer is always fixed, so that the output of the previous convolution layer and the input of the next convolution layer remain consistent in channel number, and avoids the situation in which the supernet cannot be trained because differing output channel numbers of the previous convolution layer make the input channel number of the next convolution layer inconsistent. A supernet trained in this way can accurately determine the optimal neural architecture for image classification, so that image classification is performed accurately with the model of that optimal architecture once it reaches the accuracy standard after training, improving the accuracy of image classification.
The above description is only an outline of the technical solution of the present application. To make the technical means of the present application clearer and implementable according to the contents of the specification, and to make the above and other objects, features, and advantages of the present application more comprehensible, the detailed description of the present application is given below.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application to the disclosed embodiment. In the drawings:
FIG. 1 is a schematic flowchart of an image classification method based on deep learning according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of another image classification method based on deep learning according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an image classification device based on deep learning according to an embodiment of the present application.
Detailed Description
The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
To solve the technical problem that, in the prior art, the number of channels of each network layer cannot be searched in the supernet mode of the One-Shot framework but can only be defined manually in advance, which limits image classification accuracy, this embodiment provides an image classification method based on deep learning. As shown in FIG. 1, the method comprises the following steps:
101. Configure search space information of the neural architecture based on the MobileNet network.
The search space information may include search space range parameters for the optimal neural architecture, which may specifically include the number, stride, and size of the convolution kernels, the number of convolution layers, the number of neurons, whether skip connections are used, and the type of activation function. Different neural architectures can be constructed from different search space range parameters, and the optimal neural architecture for image classification can then be searched for among these architectures.
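The range parameters listed above could be expressed, for instance, as a simple configuration structure. This is only an illustrative sketch; the key names, concrete values, and data format are assumptions, not the patent's actual representation.

```python
# Hypothetical encoding of the search space range parameters described
# above (kernel count/size/stride, layer count, neurons, skip
# connections, activation type). All names and values are illustrative
# assumptions, not the patent's actual data format.
search_space = {
    "kernel_sizes": [3, 5, 7],          # convolution kernel sizes
    "strides": [1, 2],                  # convolution strides (step sizes)
    "num_conv_layers": 19,              # number of convolution layers
    "expansion_factors": [3, 6],        # inverted-residual expansion
    "channel_options_per_layer": 3,     # searched channel-number choices
    "use_skip_connection": [True, False],
    "activations": ["relu", "h-swish"],
}

# Each distinct combination of these choices defines one candidate
# neural architecture inside the search space.
num_kernel_choices = len(search_space["kernel_sizes"])
```

In practice the search phase below only varies kernel size, expansion factor, and channel number per layer; the remaining entries define the fixed backbone.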
This embodiment specifically adopts the idea of the One-Shot method, and the search space of the neural architecture is based on the MobileNet network, a lightweight deep neural network designed for mobile and embedded devices such as mobile phones. The benefits of selecting the MobileNet network are that the model has few parameters and runs fast, which reduces server-side latency and increases the detection throughput in queries per second (QPS). On the other hand, the stored MobileNet model is very small, so it can conveniently be deployed on a mobile terminal (such as a mobile phone or tablet client), enabling offline picture detection on the device. If the method is built into an app, a picture can be detected and, if illegal, intercepted before the user uploads it, further reducing the pressure on the server; the detection capability can then be scaled up almost without limit.
The execution subject of this embodiment may be an image classification device or equipment based on deep learning, which may be deployed at a client or a server, and which can improve the accuracy of image classification.
102. Construct a supernet according to the search space information of the neural architecture, and configure a spring structure corresponding to each convolution layer of the supernet.
A directed acyclic graph containing all operation options, i.e., a supernet, is constructed according to the search space information of the neural architecture. Single-path networks composed of different operation items can subsequently be sampled from the trained supernet, their accuracy on the test set evaluated, and the optimal neural architecture selected.
Because the output of the previous layer and the input of the next layer in a convolutional neural network must have the same number of channels, and the supernet cannot search over the channel-number dimension but must have the channel number of each layer defined manually in advance, the accuracy of the obtained result is limited. Therefore, to make the number of channels of each network layer searchable in the supernet mode, this embodiment introduces a new spring structure (spring block), which easily adapts to different channel-number choices while preserving the stability of the network. During supernet training, the spring structure fixes the channel numbers corresponding to the different operation items of the same convolution layer to the same channel number before outputting to the next convolution layer. The input channel number of the next convolution layer is therefore always fixed, and the output of the previous convolution layer and the input of the next convolution layer remain consistent in channel number, which avoids the situation in which training fails because differing output channel numbers of the previous convolution layer make the input channel number of the next convolution layer inconsistent.
103. Train the supernet by utilizing the first picture training set to determine a target neural architecture suitable for image classification.
For the specific application scene of image classification, a first picture training set is created in advance and used for training the supernet to find the optimal neural architecture, i.e., the target neural architecture, and thereby a deep learning model structure suitable for image classification. The first picture training set comprises different picture features (such as content features of the patterns, colors, and line shapes in the pictures) and the picture labels respectively corresponding to those features (such as "girl", "fresh", "car", "animal", "animation", "advertisement"). The constructed supernet is trained with this first picture training set; because each layer is configured with its corresponding spring structure, the situation in which training fails because differing channel numbers of the previous layer make the input channel numbers inconsistent is avoided.
104. Train the model of the target neural architecture by utilizing the second picture training set, and carry out image classification on pictures to be classified by utilizing the model that reaches the accuracy standard after training.
Compared with the first picture training set, the second picture training set may contain more sample features and the label data corresponding to those features. The first picture training set may be a subset drawn from the second picture training set. The purpose of the first picture training set is to let the supernet find the optimal neural architecture model for picture classification; the second picture training set is then used to train that optimal model until it becomes a classification model whose accuracy exceeds a certain threshold, which is used to classify pictures to be classified and determine their classification results, such as "girl", "fresh", "car", "animal", "animation", or "advertisement". The model with the optimal neural architecture may be a MobileNet model, such as MobileNetV2 or MobileNetV3. After the MobileNet model with the optimal neural architecture is trained and passes testing, it can be used as the classification model for image classification.
For example, the trained MobileNetV3 model can be deployed on the smartphone side. When a user picture is to be uploaded from the smartphone, the picture features of the user picture are extracted locally and input into the MobileNetV3 model, the picture label corresponding to the most similar sample feature is found, and a classification result is output according to that label; the smartphone client then determines, according to the classification result, whether to upload the user picture to the server. If the classification result is "girl", "fresh", "cartoon", "advertisement", or the like, the upload request can first be refused locally and the user prompted to choose another, legal picture to upload. In this way, the pressure on the server of identifying and classifying user-uploaded pictures is reduced, and illegal pictures are intercepted locally at the first opportunity.
It should be noted that the solution of this embodiment is only described through the exemplary application scene of image classification; the method of this embodiment can also be applied in other fields, such as the various technical fields in which classification is performed with a deep learning model.
The above describes the image classification method based on deep learning of this embodiment. In the prior art, the number of channels of each network layer cannot be searched in the supernet mode of the One-Shot framework and can only be defined manually in advance. By contrast, this embodiment first configures search space information of a neural architecture based on the MobileNet network, then constructs a supernet according to the search space information and configures a spring structure corresponding to each convolution layer of the supernet. During supernet training, the spring structure fixes the channel numbers corresponding to the different operation items of the same convolution layer to the same channel number before outputting to the next convolution layer. This guarantees that the input channel number of the next convolution layer is always fixed, so that the output of the previous convolution layer and the input of the next convolution layer remain consistent in channel number, and avoids the situation in which the supernet cannot be trained because differing output channel numbers of the previous convolution layer make the input channel number of the next convolution layer inconsistent. A supernet trained in this way can accurately determine the optimal neural architecture for image classification, so that image classification is performed accurately with the model of that optimal architecture once it reaches the accuracy standard after training, improving the accuracy of image classification.
Further, as a refinement and extension of the specific implementation of the above embodiment, and in order to fully explain the implementation process of this embodiment, another image classification method based on deep learning is provided. As shown in FIG. 2, the method comprises:
201. Set the number of convolution layers in the MobileNet network.
202. Define the dimension information and the search space size information of the neural architecture search space according to the configured MobileNet network.
The dimension information comprises at least the convolution kernel size, the expansion coefficient, and the channel number of each convolution layer.
For example, based on the One-Shot framework, the search space is based on a MobileNetV2 network designed for mobile end devices, with 19 layers, and each candidate operation of each convolution layer is defined as an inverted residual block. The dimensions of the search space include the convolution kernel size k: 3 × 3, 5 × 5, or 7 × 7; the expansion coefficient t: 3 or 6; and the channel number c (three options per convolution layer). The size of the search space is therefore (3 × 2 × 3)^19 = 18^19.
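The search space size above follows directly from multiplying the per-layer choices and raising the result to the number of layers. A minimal sketch of that arithmetic, under the dimensions just listed:

```python
# Search space size for the example above: kernel size k in {3, 5, 7},
# expansion coefficient t in {3, 6}, three channel options per layer,
# over 19 convolution layers.
kernel_sizes = [3, 5, 7]
expansion_factors = [3, 6]
channel_options = 3
num_layers = 19

# candidate operation items per layer
ops_per_layer = len(kernel_sizes) * len(expansion_factors) * channel_options

# total number of distinct single-path networks in the supernet
search_space_size = ops_per_layer ** num_layers
```

This is why the shrink procedure described later starts from 18 selectable operation items per layer.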
203. Construct a supernet according to the search space information of the neural architecture, and configure a spring structure corresponding to each convolution layer of the supernet.
For example, according to the search space parameters exemplified in step 202, part of the structure of the constructed supernet is shown in Table 1 below:
TABLE 1
(Table 1 is reproduced only as an image in the original publication and is not available in this text.)
Optionally, the spring structure may be obtained by modifying each convolution layer of the supernet based on the inverted residual structure. The middle depth-wise convolution layer of the spring structure performs deep feature extraction, and a 1 × 1 convolution layer is arranged before and after it. The 1 × 1 convolution layer before the depth-wise layer expands the diversity of the input features; the 1 × 1 convolution layer after it restores the extracted deep features to a fixed channel number and outputs them to the next convolution layer, the fixed channel number being the maximum channel number selectable by that convolution layer's structure. The last, linear 1 × 1 convolution layer in this structure can thus transform the output to practically any channel number. Exploiting this, the channel numbers corresponding to the different operation items of the same layer are all fixed to the same channel number when they pass through the last 1 × 1 convolution layer, which ensures that the input channel number of the next layer's inverted residual structure is always fixed (i.e., that the output of the previous layer and the input of the next layer in the convolutional neural network are consistent in channel number) and avoids the situation in which training fails because differing channel numbers of the previous layer make the input channel numbers inconsistent. Meanwhile, to ensure that no original feature information is lost through a fixed channel number smaller than the original one, the fixed channel number after conversion is the maximum channel number selectable by the layer structure.
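The channel bookkeeping of the spring structure described above can be sketched in plain Python. This is only an illustration under simplifying assumptions: a feature map is a list of channels (each an H × W grid), the 1 × 1 convolutions use random weights, and the depth-wise step is reduced to a per-channel identity; the function names are invented here and do not appear in the patent.

```python
import random

# Sketch of the spring block's channel flow: 1x1 expand -> depth-wise
# (simplified to identity) -> final 1x1 projection to the layer's
# fixed maximum channel count c_max. Weights are random; only the
# channel arithmetic is meaningful, not the learned mapping.

def conv1x1(x, out_channels):
    """1x1 convolution: mix channels independently at every pixel."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    weights = [[random.uniform(-0.1, 0.1) for _ in range(c_in)]
               for _ in range(out_channels)]
    return [[[sum(weights[o][c] * x[c][i][j] for c in range(c_in))
              for j in range(w)] for i in range(h)]
            for o in range(out_channels)]

def spring_block(x, expansion, c_max):
    """Expand feature diversity, extract (stubbed), restore to c_max."""
    h = conv1x1(x, len(x) * expansion)        # first 1x1: widen features
    h = [[[v for v in row] for row in ch] for ch in h]  # depth-wise stub
    return conv1x1(h, c_max)                  # last 1x1: fixed channels

random.seed(0)
x = [[[random.random() for _ in range(4)] for _ in range(4)]
     for _ in range(8)]                       # 8 channels, 4x4 map
y3 = spring_block(x, expansion=3, c_max=32)
y6 = spring_block(x, expansion=6, c_max=32)
# Both candidate operation items emit 32 channels, so the next layer's
# input channel number is fixed regardless of which op was sampled.
```

The point of the sketch is the last line of `spring_block`: whatever width the expansion produced, the final 1 × 1 projection always emits the layer's maximum selectable channel number.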
204. When training the supernet with the first picture training set, divide the training process of the supernet into a plurality of sub-training processes according to a preset time interval.
To ensure the security and privacy of the data in the first picture training set, the first picture training set may optionally be stored in a blockchain; correspondingly, training the supernet with the first picture training set specifically comprises: acquiring the first picture training set from the blockchain and training the supernet with it. For example, the first picture training set data may be obtained from a target node of the blockchain before the supernet is trained. It should be noted that the blockchain in this embodiment is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks associated with one another cryptographically, each data block containing the information of a batch of network transactions, used to verify the validity (tamper resistance) of that information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
During supernet training, in order to relieve the weight coupling and model-averaging effects brought by a very large search space, this scheme divides supernet training into two stages. The first stage is normal training: one path of the supernet is randomly sampled each time and its weights updated. The second stage shrinks the search space step by step on the basis of the model trained in the first stage. Specifically, the process shown in step 205 may be performed.
205. In each sub-training process, randomly sample one path of the supernet for weight updating based on the supernet obtained in the previous sub-training process, and continue training based on the supernet after the path's weights are updated, so as to shrink the search space corresponding to the supernet.
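The random single-path sampling in this step can be sketched as follows. The encoding of a path as one operation index per layer is an illustrative assumption; the dimensions match the example of step 202.

```python
import random

# Sketch of single-path sampling: each training step picks one
# candidate operation item per layer uniformly at random, and only
# the weights along that path would be updated.

NUM_LAYERS = 19
OPS_PER_LAYER = 18   # 3 kernel sizes x 2 expansions x 3 channel options

def sample_path(num_layers=NUM_LAYERS, ops_per_layer=OPS_PER_LAYER):
    """Return one operation index per layer, defining one single-path
    network drawn from the supernet."""
    return [random.randrange(ops_per_layer) for _ in range(num_layers)]

random.seed(0)
path = sample_path()
```

Because the spring structure fixes every layer's output width, any such path is a valid network without manual channel alignment.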
Optionally, step 205 may specifically comprise: randomly sampling a preset number of models from the supernet obtained in the previous sub-training process; testing the sampled models with a picture test set, which may be determined from the first picture training set; ranking the sampled models by test accuracy; for each operation item of each convolution layer, counting a first number, the number of models containing it that rank within a preset top proportion, and a second number, the number of models containing it that rank within a preset bottom proportion; according to the difference between the first number and the second number, retaining in each convolution layer a first preset number of operation items whose difference is greater than 0, and deleting the remaining, unretained operation items; executing the next sub-training process after the unretained operation items have been deleted from each convolution layer; and, when the number of operation items remaining in each convolution layer after supernet training is less than or equal to a preset number threshold, obtaining through training the supernet whose search space has been shrunk.
For example, in the first step, when training the supernet, the number of selectable operation items in each layer is reduced from 18 to 9. The method is to randomly sample 18 × 200 models from the currently trained supernet, test their accuracy on a picture test set (which may, for example, be split off from the first picture training set), and rank the models by accuracy; for each operation item of each layer, count the difference between the number of times it appears in the top third of models and the number of times it appears in the bottom third; rank the operation items by this difference; retain the 9 operation items with the largest differences greater than zero; and then train for a period of time on the remaining search space.
The second and third steps apply the same method: building on the training of the previous step, the number of selectable operation items is reduced to 5 and then to 3, so that each layer finally retains at most 3 operation items, and only the operation items that consistently perform well survive the gradual shrinking. The final search space is far smaller than the initial one and is contracted to a suitable size, which greatly alleviates the coupling and averaging effects caused by weight sharing among models, makes the performance differences between models easier to distinguish, preserves the correlation of the ranking, and also improves training efficiency.
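The shrinking step described above can be sketched in Python. The function name and data layout below (per-layer candidate lists; each sampled architecture given as its chosen operation per layer) are illustrative assumptions, not the patent's implementation:

```python
from collections import Counter

def shrink_search_space(layer_ops, models, accuracies, keep_k, frac=1/3):
    """layer_ops[i]: candidate operation items of convolutional layer i.
    models: sampled architectures, each a list giving the chosen operation
    per layer; accuracies: picture-test-set accuracy of each sampled model."""
    # Rank the sampled models by test accuracy (best first).
    ranked = [m for _, m in sorted(zip(accuracies, models),
                                   key=lambda p: p[0], reverse=True)]
    cut = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:cut], ranked[-cut:]

    kept = []
    for layer, ops in enumerate(layer_ops):
        first = Counter(m[layer] for m in top)      # "first number" per op
        second = Counter(m[layer] for m in bottom)  # "second number" per op
        diffs = {op: first[op] - second[op] for op in ops}
        # Retain up to keep_k operation items whose difference is > 0.
        positive = sorted((op for op in ops if diffs[op] > 0),
                          key=lambda op: diffs[op], reverse=True)
        kept.append(positive[:keep_k])
    return kept
```

Calling this repeatedly with `keep_k` set to 9, 5 and then 3 (with further super-net training between calls) reproduces the 18 → 9 → 5 → 3 schedule of the example.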
206. Searching for a target neural architecture suitable for image classification in the super-net after the search space has been shrunk.
Through the evolutionary algorithm of step 205, an optimal structure (the optimal neural architecture for image classification) is searched for in the super-net after the search space has been contracted. Because the model weights used in the search stage are inherited directly from the super-net and require no retraining, the search is greatly accelerated.
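A minimal sketch of such an evolutionary search, assuming an `evaluate` function that scores a candidate architecture using weights inherited from the trained super-net (all names and hyper-parameters here are illustrative assumptions):

```python
import random

def evolutionary_search(layer_ops, evaluate, population=50, generations=20,
                        parents=10, mutate_prob=0.1):
    """Evolve architectures over the shrunken search space. evaluate(arch)
    is assumed to return validation accuracy with inherited super-net
    weights, so no candidate is retrained."""
    def random_arch():
        return [random.choice(ops) for ops in layer_ops]

    def mutate(arch):
        # Resample each layer's operation with probability mutate_prob.
        return [random.choice(ops) if random.random() < mutate_prob else op
                for op, ops in zip(arch, layer_ops)]

    pop = [random_arch() for _ in range(population)]
    best = None
    for _ in range(generations):
        scored = sorted(pop, key=evaluate, reverse=True)
        if best is None or evaluate(scored[0]) > evaluate(best):
            best = scored[0]
        elites = scored[:parents]
        # Next generation: elites plus mutated copies of random elites.
        pop = elites + [mutate(random.choice(elites))
                        for _ in range(population - parents)]
    return best
```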
207. And training the model of the target neural architecture by utilizing the second picture training set.
Similarly, in order to ensure the security and privacy of the data in the second picture training set, the second picture training set may optionally also be stored in the blockchain; correspondingly, step 207 may specifically include: acquiring the second picture training set from the blockchain and training the model of the target neural architecture.
During model training, since the spring structure was added when searching for the target neural architecture in this embodiment, in order to avoid the spring structure affecting the training effect, step 207 may optionally include: when the model of the target neural architecture is trained independently from scratch, adjusting all spring structures that output the maximum number of channels back to the original number of channels of the operation item selected at each convolutional layer, so as to restore the standard inverted residual structure; and then training the restored model of the target neural architecture with the second picture training set.
For example, in this embodiment, when the searched network model of the target neural architecture is trained independently from scratch, all spring structures that output the maximum number of channels are adjusted to the original number of channels of the operation selected at each layer, restoring the standard inverted residual structure, before model training is performed. Experiments show that introducing the spring structure does not distort the relative merits of the networks and does not affect their performance ranking.
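As a hedged illustration of this restoration step, the helper below maps each layer back from the spring structure's fixed maximum width to the selected operation's original channel count; the data layout (per-layer maxima, per-operation original widths) is an assumption made for the sketch:

```python
def restore_spring_blocks(arch, max_channels, original_channels):
    """arch: selected operation name per layer of the searched architecture.
    max_channels[i]: fixed output width used by layer i's spring structure
    during super-net training. original_channels[op]: that operation's own
    output width. Returns the per-layer widths for the standalone model,
    i.e. the standard inverted-residual configuration trained from scratch."""
    widths = []
    for i, op in enumerate(arch):
        # The spring output was only ever padded UP to the layer maximum.
        assert original_channels[op] <= max_channels[i]
        widths.append(original_channels[op])  # undo the padding
    return widths
```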
208. And using the model which reaches the standard after training to classify the images to be classified.
For example, the scheme of this embodiment searches two network structures in the new search space, BS-NAS-A and BS-NAS-B, which achieve 75.9% and 76.3% top-1 accuracy respectively on the public large-scale ImageNet classification dataset, reaching an internationally advanced level among mobile-side models.
The method breaks the limitation that the number of channels of a network layer cannot be searched under the One-Shot framework: by introducing the new spring-block structure, different channel-number choices can be accommodated easily while avoiding damage to the stability of the network. In addition, a new training strategy that gradually shrinks the search space is provided: by ranking the performance of each layer's operations and progressively eliminating the poorly performing ones, the search space is contracted to a suitable size. This effectively alleviates the averaging effect between good and bad models caused by indiscriminate weight sharing and maintains the ranking correlation between them, making it easier to find the optimal model.
Further, as a specific implementation of the method shown in fig. 1 and fig. 2, the present embodiment provides an image classification apparatus based on deep learning, as shown in fig. 3, the apparatus includes: configuration module 31, construction module 32, training module 33, classification module 34.
A configuration module 31, configured to configure search space information of the neural architecture based on a MobileNet network;
a constructing module 32, configured to construct a super-net according to the search space information, and configure a spring structure corresponding to each convolution layer of the super-net, where the spring structure is used to fix the number of channels corresponding to different operation items of the same convolution layer to the same number of channels and output the channel number to a next convolution layer during the super-net training;
a training module 33, configured to train the super-net by using a first image training set to determine a target neural architecture suitable for image classification;
the training module 33 is further configured to train the model of the target neural architecture by using a second picture training set;
and the classification module 34 is used for carrying out image classification on the picture to be classified by using the model which reaches the standard after training.
In a specific application scenario, optionally, the spring structure is obtained by transforming each convolutional layer of the super-net based on an inverted residual structure; an intermediate depthwise convolutional layer of the spring structure performs deep feature extraction, with one 1 × 1 convolutional layer before it and one after it. The 1 × 1 convolutional layer before the intermediate depthwise convolutional layer expands the diversity of the input features, while the 1 × 1 convolutional layer after it restores the extracted deep features to a fixed number of channels and outputs them to the next convolutional layer, the fixed number of channels being the maximum number of channels selectable for the structure of that convolutional layer.
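A hedged PyTorch sketch of such a spring block (the class and argument names are assumptions, not the patent's code): the final 1 × 1 projection always emits the layer's maximum selectable channel count, so every candidate operation of a layer hands the next layer the same width:

```python
import torch
import torch.nn as nn

class SpringBlock(nn.Module):
    """MobileNetV2-style inverted residual whose last 1x1 projection outputs
    the layer's FIXED maximum channel count instead of the operation's own
    width, keeping the next layer's input width constant during training."""
    def __init__(self, in_ch, op_ch, max_ch, kernel_size=3, expand=6):
        super().__init__()
        mid = in_ch * expand
        self.expand = nn.Sequential(              # first 1x1: widen features
            nn.Conv2d(in_ch, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True))
        self.depthwise = nn.Sequential(           # deep feature extraction
            nn.Conv2d(mid, mid, kernel_size, padding=kernel_size // 2,
                      groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True))
        self.project = nn.Sequential(             # last 1x1: fixed max width
            nn.Conv2d(mid, max_ch, 1, bias=False),
            nn.BatchNorm2d(max_ch))
        self.op_ch = op_ch  # remembered so the block can later be restored

    def forward(self, x):
        return self.project(self.depthwise(self.expand(x)))

# Three candidate operations with different nominal widths and kernels
# all produce the same output width (the layer maximum, here 64).
x = torch.randn(2, 16, 8, 8)
ops = [SpringBlock(16, op_ch, max_ch=64, kernel_size=k)
       for op_ch, k in [(24, 3), (32, 5), (64, 7)]]
assert all(op(x).shape == (2, 64, 8, 8) for op in ops)
```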
In a specific application scenario, the training module 33 is specifically configured to, when training the model of the target neural architecture independently from scratch, adjust all spring structures that output the maximum number of channels to the original number of channels of the operation item selected by each convolutional layer, so as to restore the standard inverted residual structure; and train the restored model of the target neural architecture with the second picture training set.
In a specific application scenario, the training module 33 is further configured to divide the training process of the super-net into a plurality of sub-training processes according to a preset time interval; each time a sub-training process is executed, randomly sample a path of the super-net for weight updating based on the super-net obtained in the previous sub-training process, and continue training based on the super-net after the path weight update, so as to shrink the search space corresponding to the super-net; and search for the target neural architecture in the super-net after the search space has been shrunk.
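The interval-wise single-path training loop can be sketched as follows; `train_step` and `shrink` are assumed callbacks standing in for the actual weight update on the sampled path and for the search-space shrinking step between sub-training processes:

```python
import random

def train_supernet(layer_ops, train_step, num_intervals, steps_per_interval,
                   shrink):
    """Split super-net training into sub-processes. Within each, one path
    (one operation per layer) is sampled uniformly per step and only that
    path's weights are updated; between sub-processes the space is shrunk."""
    for _ in range(num_intervals):
        for _ in range(steps_per_interval):
            path = [random.choice(ops) for ops in layer_ops]  # single path
            train_step(path)     # update only the sampled path's weights
        layer_ops = shrink(layer_ops)  # e.g. 18 -> 9 -> 5 -> 3 per layer
    return layer_ops
```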
In a specific application scenario, the training module 33 is further specifically configured to randomly sample a preset number of models from the super-net obtained through training in the previous sub-training process; test the sampled models with a picture test set, where the picture test set is determined according to the first picture training set; rank the sampled models by test accuracy; for each operation item in each convolutional layer, count a first number of models ranked within a preset top proportion and a second number of models ranked within a preset bottom proportion that contain the operation item; according to the difference between the first number and the second number, retain in each convolutional layer a first preset number of operation items whose difference is greater than 0 and delete the remaining operation items; execute the next sub-training process after the deletion; and when the number of operation items remaining in each convolutional layer after the super-net training is less than or equal to a preset number threshold, take the trained super-net as the super-net whose search space has been contracted.
In a specific application scenario, the configuration module 31 is specifically configured to set the number of convolution layers in the MobileNet network; and defining dimension information of a search space and size information of the search space according to the set MobileNet network, wherein the dimension information at least comprises the size of a convolution kernel, an expansion coefficient and the number of channels of each convolution layer.
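For illustration only, a search space along these three dimensions could be configured as below; the concrete kernel sizes, expansion coefficients, and channel numbers are assumptions, chosen so each layer has 18 operation items as in the shrinking example described earlier:

```python
from itertools import product

def build_search_space(num_layers, kernel_sizes, expand_ratios, channels):
    """Each layer's operation items are the cross product of the three
    searchable dimensions; returns a list of per-layer candidate lists."""
    ops = [{"kernel": k, "expand": e, "channels": c}
           for k, e, c in product(kernel_sizes, expand_ratios, channels)]
    return [list(ops) for _ in range(num_layers)]

space = build_search_space(20, kernel_sizes=(3, 5, 7),
                           expand_ratios=(3, 6), channels=(32, 64, 96))
# 3 kernels x 2 expansion ratios x 3 widths = 18 operation items per layer
assert len(space) == 20 and len(space[0]) == 18
```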
In a specific application scenario, optionally, the first picture training set and the second picture training set are stored in a blockchain;
correspondingly, the training module 33 is specifically further configured to acquire the first picture training set from the blockchain and train the super-net;
correspondingly, the training module 33 is further specifically configured to acquire the second picture training set from the blockchain and train the model of the target neural architecture.
It should be noted that other corresponding descriptions of the functional units related to the image classification device based on deep learning provided in this embodiment may refer to the corresponding descriptions in fig. 1 and fig. 2, and are not repeated herein.
Based on the above-mentioned methods as shown in fig. 1 and fig. 2, correspondingly, the present embodiment further provides a storage medium, on which a computer program is stored, which when executed by a processor implements the above-mentioned deep learning-based image classification method as shown in fig. 1 and fig. 2.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method of the embodiments of the present application.
Based on the method shown in fig. 1 and fig. 2 and the virtual device embodiment shown in fig. 3, in order to achieve the above object, this embodiment further provides a computer device, which may specifically be a personal computer, a notebook computer, a server, a network device, and the like, where the entity device includes a storage medium and a processor; a storage medium for storing a computer program; a processor for executing a computer program to implement the above-described image classification method based on deep learning as shown in fig. 1 and 2.
Optionally, the computer device may further include a user interface, a network interface, a camera, Radio Frequency (RF) circuitry, sensors, audio circuitry, a WI-FI module, and so forth. The user interface may include a Display screen (Display), an input unit such as a keypad (Keyboard), etc., and the optional user interface may also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a bluetooth interface, WI-FI interface), etc.
It will be understood by those skilled in the art that the computer device structure provided in the present embodiment is not limited to the physical device, and may include more or less components, or combine some components, or arrange different components.
The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the above-described physical devices, and supports the operation of the information processing program as well as other software and/or programs. The network communication module is used for realizing communication among components in the storage medium and other hardware and software in the entity device.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general hardware platform, or by hardware. In the prior art, the number of network-layer channels cannot be searched in super-net mode under the One-Shot framework, and the channel number of each network layer can only be defined artificially in advance. By contrast, in this embodiment, search space information of a neural architecture is first configured based on a MobileNet network; a super-net is then constructed according to the search space information, and a spring structure is configured for each convolutional layer of the super-net. During super-net training, the spring structure fixes the channel numbers corresponding to different operation items of the same convolutional layer to the same channel number before outputting to the next convolutional layer. This guarantees that the input channel number of the next convolutional layer is always fixed, so that the output of the preceding convolutional layer and the input of the following convolutional layer are consistent in channel number, avoiding the situation in which the super-net cannot be trained because differing output channel numbers of the preceding convolutional layer make the input channel number of the following convolutional layer inconsistent.
The super-net obtained by training in this way can accurately determine the optimal neural architecture suitable for image classification, so that image classification is performed accurately using the trained, standard-meeting model of that optimal neural architecture, improving the accuracy of image classification.
Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present application. Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The above application serial numbers are for description purposes only and do not represent the superiority or inferiority of the implementation scenarios. The above disclosure is only a few specific implementation scenarios of the present application, but the present application is not limited thereto, and any variations that can be made by those skilled in the art are intended to fall within the scope of the present application.

Claims (10)

1. An image classification method based on deep learning is characterized by comprising the following steps:
configuring search space information of a neural architecture based on a MobileNet network;
constructing a super-net according to the search space information, and configuring a spring structure corresponding to each convolution layer of the super-net, wherein the spring structure is used for fixing the channel numbers corresponding to different operation items of the same convolution layer to the same channel number during the super-net training and outputting the channel numbers to the next convolution layer;
training the super-net by utilizing a first picture training set to determine a target neural architecture suitable for image classification;
and training the model of the target neural architecture by utilizing a second picture training set, and carrying out image classification on the picture to be classified by utilizing the model which reaches the standard after training.
2. The method of claim 1, wherein the spring structure is obtained by transforming each convolutional layer of the super-net based on an inverted residual structure, wherein an intermediate depthwise convolutional layer of the spring structure performs deep feature extraction, with one 1 × 1 convolutional layer before it and one after it;
the 1 × 1 convolutional layer before the intermediate depthwise convolutional layer expands the diversity of the input features, the 1 × 1 convolutional layer after it restores the extracted deep features to a fixed number of channels and outputs them to the next convolutional layer, and the fixed number of channels is the maximum number of channels selectable for the structure of that convolutional layer.
3. The method of claim 2, wherein training the model of the target neural architecture using the second picture training set comprises:
when the model of the target neural architecture is trained independently from scratch, adjusting all spring structures that output the maximum number of channels to the original number of channels of the operation item currently selected by each convolutional layer, so as to restore the standard inverted residual structure;
training the model of the target neural architecture restored to the standard inverted residual structure with the second picture training set.
4. The method of claim 1, wherein the training the hypermesh with the first picture training set to determine a target neural architecture suitable for image classification comprises:
dividing the training process of the super-net into a plurality of sub-training processes according to a preset time interval;
randomly sampling a path of the super-net for weight updating based on the super-net obtained in the last sub-training process when the sub-training process is executed each time, and continuing training based on the super-net after the path weight updating so as to shrink the search space corresponding to the super-net;
searching for the target neural architecture in the super-net after the search space has been shrunk.
5. The method according to claim 4, wherein each time the sub-training process is performed, based on the super-net obtained in the previous sub-training process, randomly sampling a path of the super-net to perform weight update, and continuing training based on the super-net after the path weight update, so as to shrink the search space corresponding to the super-net, specifically includes:
randomly sampling a preset number of models from the super-net obtained by training in the previous sub-training process;
testing the sampled model by using a picture test set, wherein the picture test set is determined according to the first picture training set;
sequencing the sampled models according to the test accuracy;
counting a first number of models of which the operation items are positioned in a preset proportion before ranking and a second number of models of which the operation items are positioned in a preset proportion after ranking in each convolutional layer;
according to the difference value between the first quantity and the second quantity, reserving a first preset quantity of operation items with the difference value larger than 0 in each convolution layer, and deleting the rest operation items which are not reserved;
after deleting the other operation items which are not reserved in each convolution layer, executing the sub-training process;
and when the number of operation items remaining in each convolutional layer after the super-net training is less than or equal to a preset number threshold, determining the trained super-net as the super-net whose search space has been contracted.
6. The method according to claim 1, wherein the configuring search space information of the neural architecture based on the MobileNet network specifically includes:
setting the number of convolution layers in the MobileNet network;
and defining dimension information of a search space and size information of the search space according to the set MobileNet network, wherein the dimension information comprises the size of a convolution kernel, an expansion coefficient and the number of channels of each convolution layer.
7. The method of claim 1, wherein the first picture training set and the second picture training set are stored in a blockchain;
the training of the super net by utilizing the first picture training set specifically comprises the following steps:
acquiring the first picture training set from the block chain, and training the super-network;
the training of the model of the target neural architecture by using the second picture training set specifically includes:
and acquiring the second picture training set from the block chain, and training the model of the target neural architecture.
8. An image classification device based on deep learning, comprising:
the configuration module is used for configuring search space information of the neural framework based on the MobileNet network;
the constructing module is used for constructing a super-net according to the search space information and configuring a spring structure corresponding to each convolution layer of the super-net, wherein the spring structure is used for fixing the channel numbers corresponding to different operation items of the same convolution layer to the same channel number during the super-net training and outputting the channel numbers to the next convolution layer;
the training module is used for training the super-net by utilizing a first picture training set so as to determine a target neural architecture suitable for image classification;
the training module is further used for training the model of the target neural architecture by utilizing a second picture training set;
and the classification module is used for carrying out image classification on the pictures to be classified by using the model which reaches the standard after training.
9. A non-transitory readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the deep learning based image classification method of any one of claims 1 to 7.
10. A computer device comprising a non-volatile readable storage medium, a processor, and a computer program stored on the non-volatile readable storage medium and executable on the processor, wherein the processor implements the deep learning based image classification method of any one of claims 1 to 7 when executing the program.
CN202010761098.8A 2020-07-31 2020-07-31 Image classification method and device based on deep learning and computer equipment Active CN111898683B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010761098.8A CN111898683B (en) 2020-07-31 2020-07-31 Image classification method and device based on deep learning and computer equipment
PCT/CN2020/122131 WO2021151318A1 (en) 2020-07-31 2020-10-20 Image classification method and apparatus based on deep learning, and computer device


Publications (2)

Publication Number Publication Date
CN111898683A true CN111898683A (en) 2020-11-06
CN111898683B CN111898683B (en) 2023-07-28

Family

ID=73184168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010761098.8A Active CN111898683B (en) 2020-07-31 2020-07-31 Image classification method and device based on deep learning and computer equipment

Country Status (2)

Country Link
CN (1) CN111898683B (en)
WO (1) WO2021151318A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112734015A (en) * 2021-01-14 2021-04-30 北京市商汤科技开发有限公司 Network generation method and device, electronic equipment and storage medium
CN113076938A (en) * 2021-05-06 2021-07-06 广西师范大学 Deep learning target detection method combined with embedded hardware information
CN113780146A (en) * 2021-09-06 2021-12-10 西安电子科技大学 Hyperspectral image classification method and system based on lightweight neural architecture search
CN114266769A (en) * 2022-03-01 2022-04-01 北京鹰瞳科技发展股份有限公司 System and method for identifying eye diseases based on neural network model
CN115170973A (en) * 2022-09-05 2022-10-11 广州艾米生态人工智能农业有限公司 Intelligent paddy field weed identification method, device, equipment and medium

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023055689A1 (en) * 2021-09-29 2023-04-06 Subtle Medical, Inc. Systems and methods for noise-aware self-supervised enhancement of images using deep learning
CN114445674B (en) * 2021-12-13 2024-06-21 上海悠络客电子科技股份有限公司 Target detection model searching method based on multi-scale fusion convolution
CN114936625B (en) * 2022-04-24 2024-03-19 西北工业大学 Underwater acoustic communication modulation mode identification method based on neural network architecture search
CN115631388B (en) * 2022-12-21 2023-03-17 第六镜科技(成都)有限公司 Image classification method and device, electronic equipment and storage medium
CN117173446A (en) * 2023-06-26 2023-12-05 北京百度网讯科技有限公司 Image classification and training method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740534A (en) * 2018-12-29 2019-05-10 北京旷视科技有限公司 Image processing method, device and processing equipment
CN110414570A (en) * 2019-07-04 2019-11-05 北京迈格威科技有限公司 Image classification model generating method, device, equipment and storage medium
US20190370648A1 (en) * 2018-05-29 2019-12-05 Google Llc Neural architecture search for dense image prediction tasks

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190122104A1 (en) * 2017-10-19 2019-04-25 General Electric Company Building a binary neural network architecture
US10223611B1 (en) * 2018-03-08 2019-03-05 Capital One Services, Llc Object detection using image classification models


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112734015A (en) * 2021-01-14 2021-04-30 北京市商汤科技开发有限公司 Network generation method and device, electronic equipment and storage medium
CN112734015B (en) * 2021-01-14 2023-04-07 北京市商汤科技开发有限公司 Network generation method and device, electronic equipment and storage medium
CN113076938A (en) * 2021-05-06 2021-07-06 广西师范大学 Deep learning target detection method combined with embedded hardware information
CN113076938B (en) * 2021-05-06 2023-07-25 广西师范大学 Deep learning target detection method combining embedded hardware information
CN113780146A (en) * 2021-09-06 2021-12-10 西安电子科技大学 Hyperspectral image classification method and system based on lightweight neural architecture search
CN113780146B (en) * 2021-09-06 2024-05-10 西安电子科技大学 Hyperspectral image classification method and system based on lightweight neural architecture search
CN114266769A (en) * 2022-03-01 2022-04-01 北京鹰瞳科技发展股份有限公司 System and method for identifying eye diseases based on neural network model
CN115170973A (en) * 2022-09-05 2022-10-11 广州艾米生态人工智能农业有限公司 Intelligent paddy field weed identification method, device, equipment and medium
CN115170973B (en) * 2022-09-05 2022-12-20 广州艾米生态人工智能农业有限公司 Intelligent paddy field weed identification method, device, equipment and medium

Also Published As

Publication number Publication date
WO2021151318A1 (en) 2021-08-05
CN111898683B (en) 2023-07-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant