CN107220667A - Image classification method, device and computer-readable recording medium - Google Patents


Info

Publication number
CN107220667A
CN107220667A (application CN201710388824.4A)
Authority
CN
China
Prior art keywords
label
image
models
feature
target image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710388824.4A
Other languages
Chinese (zh)
Other versions
CN107220667B (en)
Inventor
万韶华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201710388824.4A
Publication of CN107220667A
Application granted
Publication of CN107220667B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The present disclosure relates to an image classification method, an image classification device, and a computer-readable storage medium, and belongs to the field of image processing technology. The method includes: determining the feature of a target image through a CNN model, and processing the feature of the target image through an RNN model to obtain multiple labels for the target image. Because the feature of the target image is the output of the last pooling layer of the CNN model, the CNN model does not determine the probability of each category from the feature after obtaining it; instead, the feature is passed directly to the RNN model to obtain the multiple labels of the target image, so that images whose content spans at least two categories can be classified.

Description

Image classification method, device and computer-readable recording medium
Technical field
The present disclosure relates to the field of image processing technology, and in particular to an image classification method, an image classification device, and a computer-readable storage medium.
Background
When a computer receives an image, it usually needs to classify the image in order to determine the category to which the image belongs, and to set that category as the image's label so that users can search for the image. For example, when a computer receives a landscape image, it may determine that the category of the image is landscape, that is, set the image's label to landscape.
In the related art, a computer classifies an image mainly through a single-label classification method. That is, for a plurality of preset categories, the feature of the image is determined by at least one convolutional layer and at least one pooling layer of a CNN (Convolutional Neural Network) model; a fully connected layer of the CNN model then determines, from that feature, the probability that the image belongs to each of the preset categories, and the category with the highest probability is determined as the category of the image, yielding the image's single label.
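As an illustration of this single-label baseline, the following Python sketch shows a fully connected layer followed by softmax and argmax, which yields exactly one label per image. The weights and class names are hypothetical; the patent does not give concrete parameters.

```python
import numpy as np

def single_label_classify(features, fc_weights, fc_bias, class_names):
    """Single-label baseline: a fully connected layer maps the CNN
    feature to per-class scores, softmax turns scores into probabilities,
    and the argmax class becomes the image's only label."""
    scores = features @ fc_weights + fc_bias
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    probs = exp / exp.sum()
    return class_names[int(np.argmax(probs))], probs

# Toy example with made-up weights: a 4-dimensional feature, 3 classes.
rng = np.random.default_rng(0)
label, probs = single_label_classify(
    rng.standard_normal(4), rng.standard_normal((4, 3)),
    np.zeros(3), ["landscape", "cat", "dog"])
```

Note that the argmax step is exactly what prevents this baseline from emitting two labels for one image, which motivates the RNN-based approach below.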
Summary
To overcome the problems in the related art, the present disclosure provides an image classification method, an image classification device, and a computer-readable storage medium. The technical solution is as follows:
According to a first aspect of the embodiments of the present disclosure, an image classification method is provided. The method includes:
processing a target image to be classified through a pre-trained CNN model, to determine the feature output by the last pooling layer of the CNN model;
taking the determined feature as the feature of the target image, and processing the feature of the target image through a pre-trained RNN (Recurrent Neural Network) model, to obtain multiple labels for the target image, each of the multiple labels indicating a category to which image content in the target image belongs.
Optionally, processing the feature of the target image through the pre-trained RNN model to obtain the multiple labels of the target image includes:
taking the feature of the target image as input to the RNN model, and processing the feature of the target image through the RNN model to obtain a first label, the first label being one of the multiple labels;
taking the first label as input to the RNN model, and processing the feature of the target image and the first label through the RNN model to obtain a second label, the second label being one of the multiple labels;
determining a cycle count, the cycle count being the number of times the RNN model has cyclically processed an input label together with the feature of the target image;
when the cycle count is less than or equal to a preset count, continuing to take the second label as input to the RNN model and cyclically performing the above operations, until the cycle count exceeds the preset count, at which point the multiple labels are obtained.
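The fixed-count feedback loop described above can be sketched as follows; `dummy_step` is a hypothetical stand-in for the real RNN step, which the text does not specify at this level of detail.

```python
import numpy as np

def generate_labels_fixed(image_feature, rnn_step, preset_count):
    """Feed the CNN feature to the RNN first, then feed each predicted
    label back in as the next input together with the feature, stopping
    once the cycle count exceeds the preset count."""
    labels, hidden, prev_label = [], np.zeros_like(image_feature), None
    for _ in range(preset_count):         # cycle-count stopping rule
        hidden, label = rnn_step(image_feature, prev_label, hidden)
        labels.append(label)
        prev_label = label                # output becomes the next input
    return labels

# Hypothetical stand-in for the RNN step: emits consecutive label ids.
def dummy_step(feature, prev_label, hidden):
    next_label = 0 if prev_label is None else prev_label + 1
    return hidden, next_label

labels = generate_labels_fixed(np.zeros(3), dummy_step, 3)
print(labels)  # [0, 1, 2]
```

The design point is that the loop itself, not the classifier head, decides how many labels an image gets.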
Optionally, processing the feature of the target image through the pre-trained RNN model to obtain the multiple labels of the target image includes:
taking the feature of the target image as input to the RNN model, and processing the feature of the target image through the RNN model to obtain a first label, the first label being one of the multiple labels;
taking the first label as input to the RNN model, and processing the feature of the target image and the first label through the RNN model to obtain a second label, the second label being one of the multiple labels;
determining the probability that the target image simultaneously belongs to the categories corresponding to all labels determined so far;
when the probability is greater than or equal to a preset probability, continuing to take the second label as input to the RNN model and cyclically performing the above operations, until the probability falls below the preset probability, at which point the multiple labels are obtained.
Optionally, before the target image is processed through the pre-trained CNN model, the method further includes:
determining a training sample set, the training sample set including a plurality of images and multiple labels preset for each image;
training an initialized CNN model on the plurality of images, to obtain a trained CNN model and the feature of each image in the plurality of images;
training an initialized RNN model on the feature of each image in the plurality of images, to obtain a trained RNN model and multiple labels for each image in the plurality of images;
when the error between the multiple labels obtained by training for each image in the plurality of images and the multiple labels preset for the corresponding image is greater than a preset error, adjusting the initialized CNN model and the initialized RNN model by gradient descent, and returning to the step of training the initialized CNN model on the plurality of images, until that error is less than or equal to the preset error; the finally trained CNN model is then determined as the pre-trained CNN model, and the finally trained RNN model as the pre-trained RNN model.
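The training procedure above amounts to a loop that repeats gradient-descent updates until the error falls to a preset threshold. The sketch below illustrates only that control flow, with a toy one-parameter model standing in for the CNN and RNN (all names and the learning rate are assumptions for illustration).

```python
def train_until(error_fn, update_fn, params, preset_error, max_iters=10000):
    """Skeleton of the training loop: keep applying the gradient-descent
    update until the error drops to the preset error or below."""
    for _ in range(max_iters):
        err = error_fn(params)
        if err <= preset_error:
            break
        params = update_fn(params)
    return params, error_fn(params)

# Toy stand-in: fit a scalar w so that w * x matches y (mean squared error).
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
error = lambda w: sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / 3
grad = lambda w: sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / 3
update = lambda w: w - 0.05 * grad(w)   # one gradient-descent step
w, final_err = train_until(error, update, 0.0, 1e-8)
print(round(w, 4))  # converges toward 2.0
```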
According to a second aspect of the embodiments of the present disclosure, an image classification device is provided. The device includes:
a first processing module, configured to process a target image to be classified through a pre-trained CNN model, to determine the feature output by the last pooling layer of the CNN model;
a second processing module, configured to take the determined feature as the feature of the target image and process it through a pre-trained RNN model, to obtain multiple labels for the target image, each of the multiple labels indicating a category to which image content in the target image belongs.
Optionally, the second processing module includes:
a first processing submodule, configured to take the feature of the target image as input to the RNN model and process the feature of the target image through the RNN model, to obtain a first label, the first label being one of the multiple labels;
a second processing submodule, configured to take the first label as input to the RNN model and process the feature of the target image and the first label through the RNN model, to obtain a second label, the second label being one of the multiple labels;
a first determination submodule, configured to determine a cycle count, the cycle count being the number of times the RNN model has cyclically processed an input label together with the feature of the target image;
a first loop submodule, configured to, when the cycle count is less than or equal to a preset count, continue to take the second label as input to the RNN model and cyclically perform the above operations, until the cycle count exceeds the preset count, at which point the multiple labels are obtained.
Optionally, the second processing module includes:
a first processing submodule, configured to take the feature of the target image as input to the RNN model and process the feature of the target image through the RNN model, to obtain a first label, the first label being one of the multiple labels;
a second processing submodule, configured to take the first label as input to the RNN model and process the feature of the target image and the first label through the RNN model, to obtain a second label, the second label being one of the multiple labels;
a second determination submodule, configured to determine the probability that the target image simultaneously belongs to the categories corresponding to all labels determined so far;
a second loop submodule, configured to, when the probability is greater than or equal to a preset probability, continue to take the second label as input to the RNN model and cyclically perform the above operations, until the probability falls below the preset probability, at which point the multiple labels are obtained.
Optionally, the device further includes:
a first determination module, configured to determine a training sample set, the training sample set including a plurality of images and multiple labels preset for each image;
a first training module, configured to train an initialized CNN model on the plurality of images, to obtain a trained CNN model and the feature of each image in the plurality of images;
a second training module, configured to train an initialized RNN model on the feature of each image in the plurality of images, to obtain a trained RNN model and multiple labels for each image in the plurality of images;
a second determination module, configured to, when the error between the multiple labels obtained by training for each image in the plurality of images and the multiple labels preset for the corresponding image is greater than a preset error, adjust the initialized CNN model and the initialized RNN model by gradient descent and return to the step of training the initialized CNN model on the plurality of images, until that error is less than or equal to the preset error, and to determine the finally trained CNN model as the pre-trained CNN model and the finally trained RNN model as the pre-trained RNN model.
According to a third aspect of the embodiments of the present disclosure, another image classification device is provided. The device includes:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the image classification method described in the first aspect above.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided. Instructions are stored on the computer-readable storage medium, and when executed by a processor, the instructions implement the image classification method described in the first aspect above.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
In the embodiments of the present disclosure, the feature of a target image is determined through a CNN model, and the feature of the target image is processed through an RNN model to obtain multiple labels for the target image. Because the feature of the target image is the output of the last pooling layer of the CNN model, the CNN model does not determine the probability of each category from the feature after obtaining it; instead, the feature is passed directly to the RNN model to obtain multiple labels, so images whose content spans at least two categories can be classified.
It should be understood that the general description above and the detailed description below are merely exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings herein are incorporated into and constitute a part of this specification. They illustrate embodiments consistent with the present invention and, together with the specification, serve to explain the principles of the invention.
Fig. 1A is an architecture diagram of a CNN model provided by an embodiment of the present disclosure.
Fig. 1B is a schematic diagram of the pixel distribution of an image provided by an embodiment of the present disclosure.
Fig. 1C is a schematic diagram of a pooling process provided by an embodiment of the present disclosure.
Fig. 1D is an architecture diagram of an RNN model provided by an embodiment of the present disclosure.
Fig. 2 is a flowchart of an image classification method provided by an embodiment of the present disclosure.
Fig. 3 is a flowchart of another image classification method provided by an embodiment of the present disclosure.
Fig. 4 is a flowchart of another image classification method provided by an embodiment of the present disclosure.
Fig. 5A is a block diagram of an image classification device provided by an embodiment of the present disclosure.
Fig. 5B is a block diagram of another image classification device provided by an embodiment of the present disclosure.
Fig. 6 is a block diagram of another image classification device provided by an embodiment of the present disclosure.
Fig. 7 is a block diagram of another image classification device provided by an embodiment of the present disclosure.
Detailed description of the embodiments
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of apparatuses and methods consistent with some aspects of the invention as detailed in the appended claims.
Before the embodiments of the present disclosure are explained in detail, the two neural network models involved in the embodiments are first introduced.
(1) CNN models
A CNN model is a neural network for image classification and recognition developed on the basis of the traditional multilayer neural network. Compared with the traditional multilayer neural network, the CNN model introduces convolution operations and pooling operations. A convolution operation is a mathematical operation that computes a weighted sum of the data in a local region; a pooling operation is a mathematical operation that samples the data in a local region.
The function of the CNN model is described in detail below with reference to its architecture.
Fig. 1A is an architecture diagram of a CNN model 101 provided by an embodiment of the present disclosure. As shown in Fig. 1A, the CNN model 101 includes an input layer 1011, convolutional layers 1012 and 1014, pooling layers 1013 and 1015, and a fully connected layer 1016. The input layer 1011 is connected to the convolutional layer 1012, the convolutional layer 1012 to the pooling layer 1013, the pooling layer 1013 to the convolutional layer 1014, the convolutional layer 1014 to the pooling layer 1015, and the pooling layer 1015 to the fully connected layer 1016.
The input layer 1011 determines the pixel values of all pixels of the input image and passes them to the convolutional layer 1012. The convolutional layer 1012 performs a first convolution operation on the received pixel values to obtain the pixels after the first convolution, and passes them to the pooling layer 1013. The pooling layer 1013 performs a first pooling operation on the pixel values after the first convolution to obtain the pixels after the first pooling, and passes them to the next convolutional layer 1014.
The convolutional layer 1014 performs a second convolution operation on the pixels after the first pooling to obtain the pixels after the second convolution, and passes them to the next pooling layer 1015. The pooling layer 1015 performs a second pooling operation on the pixels after the second convolution to obtain the pixels after the second pooling, and passes them to the fully connected layer 1016. The fully connected layer 1016 determines, from the pixel values after the second pooling, the probability that the image belongs to each of a plurality of preset categories, so as to obtain the label of the image.
It should be noted that the first and second convolution operations are essentially the same, as are the first and second pooling operations; that is, the convolutional layers 1012 and 1014 process received pixels in essentially the same way, and so do the pooling layers 1013 and 1015. Therefore, only the first convolution operation and the first pooling operation are described below as examples.
Fig. 1B is a schematic diagram of the pixel distribution of an image provided by an embodiment of the present disclosure. As shown in Fig. 1B, when the convolutional layer 1012 receives all pixels of the image, it determines the pixels of the first local region according to a first preset order and a first preset local region, computes a weighted sum of the pixel values of all pixels in that region according to preset weights, and takes the result as the pixel value of the first pixel after the first convolution.
The first preset local region is then moved by one column or one row of pixels according to the first preset order, giving the second local region; the pixel values of all pixels in the second local region are weighted and summed according to the preset weights, and the result is taken as the pixel value of the second pixel after the first convolution. This continues until the last local region contains the last batch of pixels of the image, whose weighted sum gives the pixel value of the last pixel after the first convolution. At this point, the convolutional layer 1012 has obtained the pixel values of all pixels after the first convolution and passes them to the pooling layer 1013.
The first preset order may be left to right and then top to bottom, or top to bottom and then left to right. The first preset local region may be, for example, a region of 5 × 5 pixels or a region of 6 × 6 pixels.
In addition, the preset weights are weights set in advance for each pixel in the preset region. For example, for the local region shown in Fig. 1B, the preset weight of every pixel in the region is 1/16; the pixel value of each pixel in the first local region is multiplied by its preset weight, and the products are summed to give the pixel value of the first pixel after the first convolution.
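The sliding-window weighted sum described above can be sketched as a plain "valid" convolution with NumPy. The uniform 1/16 weights from the example are applied here over an assumed 4 × 4 window (the window size for that example is not stated; 1/16 per pixel averages a 16-pixel region).

```python
import numpy as np

def conv2d_valid(image, weights):
    """Slide the weight window over the image (stride 1, no padding),
    taking the weighted sum of each local region, as the convolutional
    layer in the description does."""
    kh, kw = weights.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):                  # top to bottom
        for j in range(ow):              # left to right
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * weights)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
out = conv2d_valid(img, np.full((4, 4), 1 / 16))
print(out.shape)   # (2, 2)
print(out[0, 0])   # 9.0, the mean of the top-left 4 x 4 block
```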
As shown in Fig. 1C, when the pooling layer 1013 receives the pixel values of all pixels after the first convolution, it determines the pixels of the first local region according to a second preset order and a second preset local region, selects the maximum of the pixel values of all pixels in that region, and takes that maximum as the pixel value of the first pixel after the first pooling.
The second preset local region is then moved according to the second preset order by a preset number of pixels equal to the number of columns or rows of the region, giving the second local region; the maximum pixel value within it is taken as the pixel value of the second pixel after the first pooling. This continues until the last local region contains the last batch of pixels after the first convolution, whose maximum gives the pixel value of the last pixel after the first pooling. At this point, the pooling layer 1013 has obtained the pixel values of all pixels after the first pooling and passes them to the convolutional layer 1014.
The second preset order may be left to right and then top to bottom, or top to bottom and then left to right. The second preset local region may be, for example, a region of 2 × 2 pixels or a region of 3 × 3 pixels.
For example, suppose the first and second preset orders are both left to right and then top to bottom, the first preset region is 5 × 5 pixels, and the second preset region is 2 × 2 pixels. If the image obtained by the input layer is 28 × 28 pixels, the convolutional layer 1012 produces 24 × 24 pixels after the first convolution, and the pooling layer 1013 produces 12 × 12 pixels after the first pooling.
It should also be noted that, besides taking the maximum pixel value of the selected local region as the pixel value of the corresponding pooled pixel, the pooling layer may determine that value in other ways. For example, it may take the average of the pixel values of all pixels in the selected local region instead.
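The max and average pooling variants just described can be sketched as follows; because the window moves by its own size, a 24 × 24 input with a 2 × 2 window yields 12 × 12, matching the worked example above.

```python
import numpy as np

def pool2d(x, size=2, op=np.max):
    """Non-overlapping pooling: the window steps by its own width and
    height, and each region is reduced to one value (max by default;
    pass op=np.mean for the average-pooling variant)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = op(x[i*size:(i+1)*size, j*size:(j+1)*size])
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
max_out = pool2d(x)
print(max_out)                           # [[ 5.  7.] [13. 15.]]
print(pool2d(np.zeros((24, 24))).shape)  # (12, 12), as in the text
```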
It is worth noting that the CNN model provided by the embodiments of the present disclosure may include more convolutional layers and pooling layers; their number is not specifically limited here, and Fig. 1A uses two convolutional layers and two pooling layers only as an example.
It should further be noted that, because the pixel value of a pixel is generally represented by the three RGB (red/green/blue) components, each of the three components needs to be processed separately. Therefore, in the CNN model 101 shown in Fig. 1A, the convolutional layers 1012 and 1014 and the pooling layers 1013 and 1015 each include three modules, which process the R, G, and B components of the pixels respectively.
(2) RNN models
A traditional neural network has no memory; that is, its inputs are independent data with no contextual relationship. In practice, however, inputs are often sequences with obvious contextual features. For example, to predict the content of the next frame of a video, the output of the neural network must depend on the previous input. In other words, the neural network should have memory, and the RNN model is a neural network with memory.
Fig. 1D is an architecture diagram of an RNN model 102 provided by an embodiment of the present disclosure. As shown on the left of Fig. 1D, the RNN model has a three-layer structure of an input layer 1021, a hidden layer 1022, and an output layer 1023, where the hidden layer 1022 is recurrent. The input layer 1021 is connected to the hidden layer 1022, and the hidden layer 1022 to the output layer 1023.
To describe the function of the RNN model 102, its structure shown in Fig. 1D is unrolled as shown on the right of Fig. 1D. The data received by the input layer of the RNN model are arranged in a time sequence, that is, they are sequence data. For convenience, the sequence data are denoted x1, x2, ..., xn, corresponding to times t1, t2, ..., tn respectively.
As shown on the right of Fig. 1D, in the unrolled RNN model, at time t1 the input layer 1021 receives data x1 and passes it to the hidden layer 1022; the hidden layer 1022 processes x1 and passes the result to the output layer 1023, giving the output at time t1. At time t2 the input layer 1021 receives x2 and passes it to the hidden layer 1022; the hidden layer 1022 now processes x2 according to the output of time t1 and passes the result to the output layer 1023, giving the output at time t2. That is, at any time ti, the hidden layer 1022 receives not only the data xi passed from the input layer 1021 at time ti, but also the output of the previous time t(i-1), and processes xi according to that output to obtain the output at time ti.
The application scenario of the embodiments of the present disclosure is introduced below. In real life, when a user sees an image, the user can determine its category directly from what the eyes see: seeing an image of a cat, the user determines that its category is cat; seeing an image of a dog, that its category is dog. For a computer, however, what is actually received when an image arrives is the pixel values of all pixels of the image. The computer therefore needs to determine the category of the image from those pixel values; that is, the computer needs to classify the image. The image classification method provided by the embodiments of the present disclosure applies to this scenario, in which a computer classifies an image. For convenience of the following description, the image to be classified is referred to as the target image.
In the related art, a computer classifies an image mainly through the single-label classification method. When an image contains content of at least two categories, the single-label classification method can only determine that the category of the image is one of those categories; it cannot determine that the image belongs to at least two categories simultaneously. That is, the single-label classification method cannot classify an image whose content spans at least two categories.
Therefore, in the embodiments of the present disclosure, after the feature of the target image is determined through the CNN model, the feature is processed through the RNN model to obtain multiple labels for the target image. Because the feature of the target image is the output of the last pooling layer of the CNN model, the CNN model does not determine the probability of each category from the feature after obtaining it; instead, the feature is passed directly to the RNN model to obtain multiple labels, so an image whose content spans at least two categories can be classified.
The image classification method provided by the embodiments of the present disclosure is described in detail below with reference to the accompanying drawings.
Fig. 2 is a flowchart of an image classification method provided by an embodiment of the present disclosure. The method applies to any device that needs to classify images, and the device may be a terminal or a server. As shown in Fig. 2, the method includes the following steps.
In step 201, a target image that needs to be classified is processed by a pre-trained CNN model, to determine the feature output by the last pooling layer included in the CNN model.
In step 202, the determined feature is taken as the feature of the target image, and the feature of the target image is processed by a pre-trained RNN model, to obtain multiple labels of the target image. Each of the multiple labels is used to indicate a category to which image content in the target image belongs.
In the embodiments of the present disclosure, the feature of the target image is determined by the CNN model, and the feature of the target image is processed by the RNN model to obtain the multiple labels of the target image. The feature of the target image is the output of the last pooling layer included in the CNN model; that is, after obtaining the feature of the target image, the CNN model does not determine the probability of each category to which the target image belongs. Instead, the feature of the target image is passed directly to the RNN model for processing, to obtain the multiple labels of the target image, so that an image including content of at least two categories can be classified.
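The two-stage flow of steps 201 and 202 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the CNN is reduced to a stub that maps an image to a fixed-length feature vector (standing in for the output of the last pooling layer), and the RNN is reduced to a stub that emits one label per cycle. All names here (`cnn_feature`, `rnn_step`, `LABELS`) are illustrative assumptions.

```python
LABELS = ["cat", "dog", "grass", "sky"]

def cnn_feature(image):
    """Stand-in for the CNN: returns the last-pooling-layer feature."""
    # Here simply a per-channel mean; a real CNN would convolve and pool.
    n = len(image)
    return [sum(px[c] for px in image) / n for c in range(3)]

def rnn_step(feature, prev_label_index):
    """Stand-in for one RNN cycle: emits the next label index."""
    # A real RNN would combine the feature, hidden state and previous label.
    if prev_label_index is None:
        return 0
    return (prev_label_index + 1) % len(LABELS)

def classify(image, max_labels=3):
    feature = cnn_feature(image)   # step 201: feature of the target image
    labels, prev = [], None
    for _ in range(max_labels):    # step 202: RNN emits labels one per cycle
        prev = rnn_step(feature, prev)
        labels.append(LABELS[prev])
    return labels

image = [(0.1, 0.5, 0.9), (0.3, 0.5, 0.7)]  # two RGB "pixels"
print(classify(image))                       # → ['cat', 'dog', 'grass']
```

The point of the sketch is the data flow: the pooled feature is handed to the label-emitting loop directly, with no per-category probability layer in between.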
Optionally, processing the feature of the target image by the pre-trained RNN model to obtain the multiple labels of the target image includes:
taking the feature of the target image as the input of the RNN model, and processing the feature of the target image by the RNN model to obtain a first label, the first label being one of the multiple labels;
taking the first label as the input of the RNN model, and processing the feature of the target image and the first label by the RNN model to obtain a second label, the second label being one of the multiple labels;
determining a cycle count, the cycle count referring to the number of times the RNN model has cyclically processed the input label and the feature of the target image;
when the cycle count is less than or equal to a preset count, continuing to take the second label as the input of the RNN model and cyclically performing the above operations, until the cycle count is greater than the preset count, at which point the multiple labels are obtained.
Optionally, processing the feature of the target image by the pre-trained RNN model to obtain the multiple labels of the target image includes:
taking the feature of the target image as the input of the RNN model, and processing the feature of the target image by the RNN model to obtain a first label, the first label being one of the multiple labels;
taking the first label as the input of the RNN model, and processing the feature of the target image and the first label by the RNN model to obtain a second label, the second label being one of the multiple labels;
determining the probability that the target image simultaneously belongs to the categories corresponding to all labels currently determined;
when the probability is greater than or equal to a preset probability, continuing to take the second label as the input of the RNN model and cyclically performing the above operations, until the probability is less than the preset probability, at which point the multiple labels are obtained.
Optionally, before the target image is processed by the pre-trained CNN model, the method further includes:
determining a training sample set, the training sample set including multiple images and multiple labels set in advance for each image;
training an initialized CNN model according to the multiple images, to obtain a trained CNN model and the feature of each of the multiple images;
training an initialized RNN model according to the feature of each of the multiple images, to obtain a trained RNN model and multiple labels of each of the multiple images;
when the error between the multiple labels of each image obtained by training and the multiple labels set in advance for the corresponding image is greater than a preset error, adjusting the initialized CNN model and the initialized RNN model by gradient descent, and returning to the step of training the initialized CNN model according to the multiple images, until the error between the multiple labels of each image obtained by training and the multiple labels set in advance for the corresponding image is less than or equal to the preset error; the finally trained CNN model is determined as the pre-trained CNN model, and the finally trained RNN model is determined as the pre-trained RNN model.
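The alternating train / check-error / adjust loop described above can be sketched on a toy one-parameter "model", so that only the control flow is visible. The quadratic loss, learning rate, and threshold below are illustrative assumptions; in the disclosure the error is the disagreement between predicted labels and the labels set in advance, and the adjusted parameters are CNN and RNN weights.

```python
def train_until_error_small(preset_error=0.01, lr=0.4):
    w = 5.0        # stands in for the initialized CNN+RNN parameters
    target = 2.0   # stands in for the labels set in advance
    rounds = 0
    while True:
        error = (w - target) ** 2   # error between trained and preset labels
        if error <= preset_error:
            return w, rounds        # the finally trained model is "pre-trained"
        gradient = 2 * (w - target)
        w -= lr * gradient          # gradient-descent adjustment, then retrain
        rounds += 1

w, rounds = train_until_error_small()
print(rounds)  # → 3
```

Each pass through the loop corresponds to one round of "train, compare labels against the preset labels, adjust by gradient descent and return to training" from the paragraph above.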
All of the above optional technical solutions may be combined in any manner to form optional embodiments of the present disclosure, which are not described again here one by one.
Fig. 3 is a flowchart of another image classification method provided by an embodiment of the present disclosure. The method applies to any device that needs to classify images, and the device may be a terminal or a server. As shown in Fig. 3, the method includes the following steps.
In step 301, a pre-trained CNN model and a pre-trained RNN model are determined.
Because the embodiments of the present disclosure classify the target image by means of the CNN model and the RNN model, the CNN model and the RNN model need to be trained before the target image is classified; that is, the pre-trained CNN model and the pre-trained RNN model are determined.
The implementation of step 301 may be as follows. A training sample set is determined; the training sample set includes multiple images and multiple labels set in advance for each image. An initialized CNN model is trained according to the multiple images, to obtain a trained CNN model and the feature of each of the multiple images. An initialized RNN model is trained according to the feature of each of the multiple images, to obtain a trained RNN model and multiple labels of each of the multiple images.
When the error between the multiple labels of each image obtained by training and the multiple labels set in advance for the corresponding image is greater than a preset error, the initialized CNN model and the initialized RNN model are adjusted by gradient descent, and the step of training the initialized CNN model according to the multiple images is returned to, until the error between the multiple labels of each image obtained by training and the multiple labels set in advance for the corresponding image is less than or equal to the preset error. The finally trained CNN model is determined as the pre-trained CNN model, and the finally trained RNN model is determined as the pre-trained RNN model.
That is, after the initialized CNN model and the initialized RNN model are trained for the first time, it is judged, according to the multiple labels of each image obtained by training and the multiple labels set in advance for the corresponding image, whether the initialized CNN model and the initialized RNN model need to continue to be trained. If so, the initialized CNN model and the initialized RNN model are adjusted as described above, to obtain the CNN model and the RNN model after the first adjustment; the CNN model and the RNN model after the first adjustment are then trained according to the above process of training the initialized CNN model and the initialized RNN model, the multiple labels of each image obtained by this round of training are determined, and the operation of judging whether training needs to continue is performed again. This process is performed cyclically, until it is judged that the adjusted CNN model and the adjusted RNN model no longer need to be trained.
The training sample set is a predetermined image collection that includes multiple images and multiple labels set in advance for each image. Setting multiple labels for each image may be implemented by having an administrator add them; that is, when a label addition instruction sent by the administrator through a preset operation is received, the multiple labels carried in the label addition instruction are set as the multiple labels of the corresponding image.
For example, when the training sample set includes 1000 images, the administrator may add 5 labels in advance to each of the 1000 images; that is, the training sample set includes 1000 images and 5 labels set in advance for each image.
In addition, adjusting the initialized CNN model and the initialized RNN model by gradient descent means determining a gradient vector according to the error between the multiple labels of each image obtained by training and the multiple labels set in advance for the corresponding image. The gradient vector is the direction vector along which that error decreases. The initialized CNN model and the initialized RNN model are then adjusted according to the gradient vector.
Because the gradient vector is the direction vector along which the error between the multiple labels of each image obtained by training and the multiple labels set in advance for the corresponding image decreases, when the adjusted CNN model and the adjusted RNN model are trained again on the training sample set, the error between the multiple labels of each image obtained by this round of training and the multiple labels set in advance for the corresponding image will be lower than the error obtained in the previous round of training.
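The claim above — that a step along the negative gradient vector lowers the label error on the next round — can be checked numerically on a small two-parameter example. The quadratic error surface and the step size 0.1 are illustrative stand-ins for the CNN+RNN label error and learning rate.

```python
import numpy as np

PRESET = np.array([1.0, -2.0])  # stands in for the preset labels

def error(params):
    return float(np.sum((params - PRESET) ** 2))

params = np.array([4.0, 3.0])   # stands in for the model parameters
for _ in range(3):
    before = error(params)
    grad = 2 * (params - PRESET)   # gradient vector of the error
    params = params - 0.1 * grad   # adjust along the descent direction
    assert error(params) < before  # the error drops on every round
```

Each iteration shrinks the parameter-to-target gap by a constant factor, so the error of round i+1 is always below that of round i, matching the paragraph above.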
In step 302, the target image that needs to be classified is processed by the pre-trained CNN model, to determine the feature output by the last pooling layer included in the CNN model.
Step 302 is described in detail by taking the CNN model shown in Fig. 1A as an example. That is, the pre-trained CNN model includes two convolution-pooling layer pairs, each pair including one convolutional layer and one pooling layer.
As shown in Fig. 1A, the input layer 1011 included in the pre-trained CNN model transmits the pixel values of all pixels included in the target image to the first convolution-pooling pair included in the CNN model. The first convolution-pooling pair processes the target image according to the pixel values of all pixels included in the target image, and transmits the processed pixels to the convolutional layer included in the next convolution-pooling pair, which continues to process the processed pixels, until the second convolution-pooling pair outputs the feature of the target image.
Specifically, the first convolutional layer 1012 included in the first convolution-pooling pair performs a first convolution operation on all pixels included in the target image, to obtain the pixels after the first convolution processing, and inputs them to the first pooling layer 1013 included in the first convolution-pooling pair. The first pooling layer 1013 performs a first pooling operation on the pixels after the first convolution processing, to obtain the pixels after the first pooling processing, and transmits them to the second convolution-pooling pair, which performs a second convolution operation and a second pooling operation in the same manner.
The detailed processes of the first and second convolution operations, and of the first and second pooling operations, may refer to the convolution and pooling processes of the CNN model shown in Fig. 1A, and are not elaborated here.
It should be noted that, compared with the process in which the CNN model shown in Fig. 1A classifies an image, the pre-trained CNN model provided by the embodiments of the present disclosure, upon determining the feature output by the last pooling layer included in the CNN model, does not transmit that feature to a fully connected layer; instead, it transmits the feature output by the last pooling layer to the pre-trained RNN model.
That is, when the feature output by the last pooling layer included in the CNN model is determined, the determined feature is taken as the feature of the target image, and the feature of the target image is processed by the pre-trained RNN model to obtain multiple labels of the target image, each of the multiple labels indicating a category to which image content in the target image belongs. Processing the feature of the target image by the pre-trained RNN model to obtain the multiple labels of the target image may be implemented through the following steps 303 to 306.
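Step 302 can be sketched under illustrative sizes: two convolution-pooling pairs applied to a single-channel image, with the output of the second (last) pooling layer returned as the target-image feature instead of being passed to a fully connected layer. The kernel values, the 8x8 input, and the 2x2 max pooling are assumptions for illustration only.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D convolution (correlation) of image x with kernel k."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool2(x):
    """Non-overlapping 2x2 max pooling."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2 * h, :2 * w].reshape(h, 2, w, 2).max(axis=(1, 3))

def last_pool_feature(image):
    k1 = np.ones((3, 3)) / 9.0                  # first convolutional layer
    k2 = np.array([[1.0, -1.0], [-1.0, 1.0]])   # second convolutional layer
    x = max_pool2(conv2d(image, k1))   # first convolution-pooling pair
    x = max_pool2(conv2d(x, k2))       # second (last) convolution-pooling pair
    return x.ravel()                   # handed to the RNN; no fully connected layer

img = np.arange(64, dtype=float).reshape(8, 8)  # toy 8x8 "target image"
feat = last_pool_feature(img)
print(feat.shape)  # → (1,)
```

The shape bookkeeping mirrors the text: 8x8 → 6x6 (first convolution) → 3x3 (first pooling) → 2x2 (second convolution) → 1x1 (last pooling), and that last pooled output is exactly what gets flattened and forwarded to the RNN.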
In step 303, the feature of the target image is taken as the input of the RNN model, and the feature of the target image is processed by the RNN model to obtain a first label, the first label being one of the multiple labels.
It can be seen from step 301 that the pre-trained RNN model is determined according to the multiple images in the training sample set and the multiple labels set in advance for each image. Therefore, when the feature of the target image is taken as the input of the RNN model, the RNN model outputs one label, namely the first label among the multiple labels.
Taking the RNN model shown in Fig. 1D as an example, at time t1, when the input layer of the RNN model receives the feature of the target image, the input layer transmits the feature of the target image to the hidden layer, and the hidden layer processes the feature of the target image to obtain the first label. That is, the first label is the output data at time t1; the hidden layer transmits the first label to the output layer, and the output layer outputs the first label.
In step 304, the first label is taken as the input of the RNN model, and the feature of the target image and the first label are processed by the RNN model to obtain a second label, the second label being one of the multiple labels.
As shown in Fig. 1D, because the input layer of the RNN model does not receive new data at time t2, the hidden layer continues to use the feature of the target image, received by the input layer of the RNN model at time t1, as the input data x2 at time t2, and continues to process the feature of the target image according to the first label, which is the output data at time t1, to obtain the second label.
That is, at any time ti shown in Fig. 1D, the hidden layer of the RNN model directly uses the feature of the target image, received by the input layer of the RNN model at time t1, as the input data xi at time ti, and continues to process the feature of the target image according to the (i-1)-th label obtained at the previous time, to obtain the i-th label.
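Steps 303 to 306 can be sketched as a decode loop: at every cycle the hidden layer reuses the time-t1 image feature as xi, combines it with the previous label, and emits one more label, until a preset cycle count is exceeded. The weights, sizes, and label names below are illustrative assumptions, and the "RNN" is collapsed to a single tanh layer.

```python
import numpy as np

rng = np.random.default_rng(0)
LABELS = ["cat", "dog", "grass", "sky", "person"]
W_x = rng.standard_normal((8, 6))            # image feature -> hidden
W_y = rng.standard_normal((8, len(LABELS)))  # previous label -> hidden
W_o = rng.standard_normal((len(LABELS), 8))  # hidden -> label scores

def decode(feature, preset_count=3):
    labels, prev = [], np.zeros(len(LABELS))
    while True:
        h = np.tanh(W_x @ feature + W_y @ prev)  # xi is always the t1 feature
        idx = int(np.argmax(W_o @ h))            # the i-th label
        labels.append(LABELS[idx])
        prev = np.eye(len(LABELS))[idx]          # feed the label back as input
        # after the i-th label, the cycle count is i-1; stop once it exceeds
        # the preset count (steps 305-306)
        if len(labels) - 1 > preset_count:
            return labels

feature = rng.standard_normal(6)   # stands in for the last-pooling-layer output
out = decode(feature)
print(len(out))  # → 5
```

With a preset count of 3 the loop emits labels until the cycle count i-1 first exceeds 3, i.e. exactly five labels, which matches the counting convention given in step 305 below.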
In step 305, a cycle count is determined. The cycle count refers to the number of times the RNN model has cyclically processed the input label and the feature of the target image.
Because the RNN model shown in Fig. 1D is a recurrent neural network model, in practical applications a loop termination condition usually needs to be set for the RNN model, to prevent the RNN model from cycling indefinitely.
In one possible implementation, the termination condition may be set for the RNN model by means of the cycle count. That is, in the cyclic process performed by the RNN model as shown in Fig. 1D, after a label is obtained, the cycle count is determined, and whether the cyclic operation needs to continue is judged according to the cycle count. The judgment of whether the cyclic operation needs to continue may be implemented through the following step 306.
In step 306, when the cycle count is less than or equal to a preset count, the second label continues to be taken as the input of the RNN model and the above operations are performed cyclically, until the cycle count is greater than the preset count, at which point the multiple labels are obtained.
After the second label is obtained, the cycle count is determined to be 1; after the third label is obtained, the cycle count is determined to be 2; and so on, after the i-th label is obtained, the cycle count is determined to be i-1.
The preset count is a count set in advance, and may be, for example, 5, 10, or 15. For example, if the preset count is 5, then after the i-th label is obtained, the cycle count is determined to be i-1 and whether i-1 is greater than 5 is judged; when i-1 is less than or equal to 5, the above cyclic operation continues, until it is judged that i-1 is greater than 5, at which point the multiple labels of the target image are obtained.
In the embodiments of the present disclosure, the feature of the target image is determined by the CNN model, and the feature of the target image is processed by the RNN model to obtain the multiple labels of the target image. The feature of the target image is the output of the last pooling layer included in the CNN model; that is, after obtaining the feature of the target image, the CNN model does not determine the probability of each category to which the target image belongs. Instead, the feature of the target image is passed directly to the RNN model for processing, to obtain the multiple labels of the target image, so that an image including content of at least two categories can be classified.
Optionally, in the cyclic process performed by the RNN model as shown in Fig. 1D, besides setting the loop termination condition for the RNN model through steps 305 and 306 of the Fig. 3 embodiment, the loop termination condition may also be set for the RNN model in another manner. The following embodiment provides another implementation of setting the loop termination condition.
Fig. 4 is a flowchart of another image classification method provided by an embodiment of the present disclosure. The method applies to any device that needs to classify images, and the device may be a terminal or a server. As shown in Fig. 4, the method includes the following steps.
In step 401, a pre-trained CNN model and a pre-trained RNN model are determined.
The implementation of step 401 may refer to the implementation of step 301 in the Fig. 3 embodiment and is not elaborated here.
In step 402, the target image that needs to be classified is processed by the pre-trained CNN model, to determine the feature output by the last pooling layer included in the CNN model.
The implementation of step 402 may refer to the implementation of step 302 in the Fig. 3 embodiment and is not elaborated here.
At this point, the determined feature is taken as the feature of the target image, and the feature of the target image is processed by the pre-trained RNN model to obtain multiple labels of the target image, each of the multiple labels indicating a category to which image content in the target image belongs. Processing the feature of the target image by the pre-trained RNN model to obtain the multiple labels of the target image may be implemented through the following steps 403 to 406.
In step 403, the feature of the target image is taken as the input of the RNN model, and the feature of the target image is processed by the RNN model to obtain a first label, the first label being one of the multiple labels.
The implementation of step 403 may refer to the implementation of step 303 in the Fig. 3 embodiment and is not elaborated here.
In step 404, the first label is taken as the input of the RNN model, and the feature of the target image and the first label are processed by the RNN model to obtain a second label, the second label being one of the multiple labels.
The implementation of step 404 may refer to the implementation of step 304 in the Fig. 3 embodiment and is not elaborated here.
In step 405, the probability that the target image simultaneously belongs to the categories corresponding to all labels currently determined is determined.
The embodiments of the present disclosure provide another implementation of setting the loop termination condition for the RNN model. That is, in the cyclic process performed by the RNN model as shown in Fig. 1D, after a label is obtained, the probability that the target image simultaneously belongs to the categories corresponding to all labels currently determined is determined, and whether the cyclic operation needs to continue is judged according to the probability. The judgment of whether the cyclic operation needs to continue may be implemented through the following step 406.
In an existing RNN model, at any time ti, after the output data at time ti is determined, the probability that all currently available output data occur simultaneously is determined. Therefore, in the embodiments of the present disclosure, each time a label is obtained, the RNN model can determine the probability that all currently available labels occur simultaneously, namely the probability that the target image simultaneously belongs to the categories corresponding to all labels currently determined.
As the cycle count increases, the probability that all currently available output data occur simultaneously becomes smaller and smaller; that is, as the cycle count increases, the probability that the target image simultaneously belongs to the categories corresponding to all labels currently determined also becomes smaller and smaller. Therefore, a preset probability may be set and determined as the loop termination condition. That is, during the cyclic process of the RNN model, after a label is obtained, the probability is determined, and whether the cyclic operation needs to continue is judged according to the probability. This judgment may be implemented through the following step 406.
In step 406, when the probability is greater than or equal to the preset probability, the second label continues to be taken as the input of the RNN model and the above operations are performed cyclically, until the probability is less than the preset probability, at which point the multiple labels are obtained.
The preset probability is a probability set in advance, and may be, for example, 0.01, 0.02, or 0.03. For example, if the preset probability is 0.01, then after the i-th label is obtained, the probability that the target image simultaneously belongs to the categories corresponding to all labels currently determined is determined, and whether the probability is greater than or equal to 0.01 is judged; when the probability is greater than or equal to 0.01, the above cyclic operation continues, until it is judged that the probability is less than 0.01, at which point the multiple labels of the target image are obtained.
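Steps 405 and 406 can be sketched with the joint probability maintained as a running product: after each new label, the probability that all labels determined so far hold simultaneously is updated, and the loop stops once it falls below the preset probability. The per-step label probabilities here are fixed illustrative numbers rather than RNN outputs, and dropping the label that pushed the joint probability below the threshold is one possible reading of the stopping rule.

```python
def decode_until_unlikely(step_probs, preset_probability=0.01):
    labels, joint = [], 1.0
    for i, p in enumerate(step_probs):
        joint *= p                    # P(all labels so far hold simultaneously)
        if joint < preset_probability:
            break                     # this label made the joint too unlikely
        labels.append("label_%d" % (i + 1))
    return labels, joint

# The joint probability shrinks with every cycle, roughly:
# 0.9 -> 0.54 -> 0.162 -> 0.0162 -> 0.00162
labels, joint = decode_until_unlikely([0.9, 0.6, 0.3, 0.1, 0.1])
print(len(labels))  # → 4
```

Because each factor is at most 1, the running product can only decrease, which is exactly why a single probability threshold is sufficient as a termination condition.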
It should be noted that, in the embodiments of the present disclosure, the loop termination condition may be set for the RNN model through steps 305 and 306 of the Fig. 3 embodiment, or through steps 405 and 406 of the Fig. 4 embodiment. Of course, in practice, the loop termination condition may also be set for the RNN model through steps 305 and 306 of the Fig. 3 embodiment and steps 405 and 406 of the Fig. 4 embodiment simultaneously; that is, two loop termination conditions are set for the RNN model. In that case, in the cyclic process performed by the RNN model as shown in Fig. 1D, after a label is obtained, the cycle count and the above probability are both determined, and as soon as either the cycle count or the probability meets its corresponding loop termination condition, the cyclic process of the RNN model can be terminated.
In the embodiments of the present disclosure, the feature of the target image is determined by the CNN model, and the feature of the target image is processed by the RNN model to obtain the multiple labels of the target image. The feature of the target image is the output of the last pooling layer included in the CNN model; that is, after obtaining the feature of the target image, the CNN model does not determine the probability of each category to which the target image belongs. Instead, the feature of the target image is passed directly to the RNN model for processing, to obtain the multiple labels of the target image, so that an image including content of at least two categories can be classified.
In addition to the above image classification method, the embodiments of the present disclosure also provide an image classification device, which is described in detail below.
Fig. 5A is a block diagram of an image classification device 500 provided by an embodiment of the present disclosure. As shown in Fig. 5A, the image classification device 500 includes a first processing module 501 and a second processing module 502.
The first processing module 501 is configured to process, by a pre-trained CNN model, a target image that needs to be classified, to determine the feature output by the last pooling layer included in the CNN model.
The second processing module 502 is configured to take the determined feature as the feature of the target image, and process the feature of the target image by a pre-trained RNN model to obtain multiple labels of the target image, each of the multiple labels being used to indicate a category to which image content in the target image belongs.
Optionally, the second processing module includes:
a first processing submodule, configured to take the feature of the target image as the input of the RNN model, and process the feature of the target image by the RNN model to obtain a first label, the first label being one of the multiple labels;
a second processing submodule, configured to take the first label as the input of the RNN model, and process the feature of the target image and the first label by the RNN model to obtain a second label, the second label being one of the multiple labels;
a first determination submodule, configured to determine a cycle count, the cycle count referring to the number of times the RNN model has cyclically processed the input label and the feature of the target image;
a first circulation submodule, configured to, when the cycle count is less than or equal to a preset count, continue to take the second label as the input of the RNN model and cyclically perform the above operations, until the cycle count is greater than the preset count, at which point the multiple labels are obtained.
Optionally, the second processing module 502 includes:
a first processing submodule, configured to take the feature of the target image as the input of the RNN model, and process the feature of the target image by the RNN model to obtain a first label, the first label being one of the multiple labels;
a second processing submodule, configured to take the first label as the input of the RNN model, and process the feature of the target image and the first label by the RNN model to obtain a second label, the second label being one of the multiple labels;
a second determination submodule, configured to determine the probability that the target image simultaneously belongs to the categories corresponding to all labels currently determined;
a second circulation submodule, configured to, when the probability is greater than or equal to a preset probability, continue to take the second label as the input of the RNN model and cyclically perform the above operations, until the probability is less than the preset probability, at which point the multiple labels are obtained.
Alternatively, referring to Fig. 5 B, the device 500 also includes the first determining module 503, the first training module 504, second and instructed Practice the determining module 506 of module 505 and second:
First determining module 503, for determining training sample set, the training sample set includes multiple images and is in advance every Open multiple labels that image is set;
First training module 504, for according to multiple images, being trained, being obtained after training to initialization CNN models CNN models and multiple images in every image feature;
Second training module 505, for the feature according to every image in multiple images, enters to initialization RNN models Multiple labels of every image in row training, the RNN models and multiple images after being trained;
Second determining module 506, in by obtained multiple images of training multiple labels of every image with When error between the multiple labels set in advance for correspondence image is more than default error, this is adjusted by gradient descent method initial Change CNN models and initialization RNN models, and return according to multiple images, the step being trained to initialization CNN models Suddenly, until multiple labels of every image are with being in advance the multiple of correspondence image setting in multiple images obtained by training Untill error between label is less than or equal to the default error, and it is advance finally to train obtained CNN models to be defined as this The CNN models of training, by the RNN models for finally training obtained RNN models to be defined as the training in advance.
In the disclosed embodiments, the feature of target image is determined by CNN models, and by RNN models to target figure The feature of picture is handled, to obtain multiple labels of target image.Because the feature of target image is included most by CNN models Latter pond layer output, that is, CNN models are after the feature of target image is obtained, not according to the target image Feature determines the probability for each classification that the target image belongs to, but the feature of the target image directly is passed through into RNN models Handled, to obtain multiple labels of the target image, so as to realize to the image to the content including at least two classifications Classified.
With regard to the device in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method, and will not be elaborated here.
Fig. 6 is a block diagram of another image classification device 600 provided by an embodiment of the present disclosure. For example, the device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.
Referring to Fig. 6, the device 600 may include one or more of the following components: a processing component 602, a memory 604, a power supply component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls the overall operation of the device 600, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 602 may include one or more processors 620 to execute instructions so as to complete all or part of the steps of the above method. In addition, the processing component 602 may include one or more modules to facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation of the device 600. Examples of such data include instructions for any application or method operated on the device 600, contact data, phonebook data, messages, pictures, video, and so on. The memory 604 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power supply component 606 provides power to the various components of the device 600. The power supply component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 600.
The multimedia component 608 includes a screen that provides an output interface between the device 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the device 600 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front camera and the rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC), which is configured to receive external audio signals when the device 600 is in an operating mode such as a call mode, a recording mode, or a speech recognition mode. The received audio signal may be further stored in the memory 604 or transmitted via the communication component 616. In some embodiments, the audio component 610 also includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, or the like. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the device 600. For example, the sensor component 614 can detect the open/closed state of the device 600 and the relative positioning of components, such as the display and keypad of the device 600; the sensor component 614 can also detect a change in position of the device 600 or a component of the device 600, the presence or absence of user contact with the device 600, the orientation or acceleration/deceleration of the device 600, and a change in the temperature of the device 600. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the device 600 and other devices. The device 600 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the image classification method provided by the embodiments shown in Fig. 2, Fig. 3, and Fig. 4 above.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 604 including instructions, which can be executed by the processor 620 of the device 600 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium: when the instructions in the storage medium are executed by the processor of a terminal, the terminal is enabled to perform the image classification method provided by the embodiments shown in Fig. 2, Fig. 3, and Fig. 4 above.
Fig. 7 is a block diagram of another image classification device 700 provided by an embodiment of the present disclosure. For example, the device 700 may be provided as a server. Referring to Fig. 7, the device 700 includes a processor 722, which further comprises one or more processors, and memory resources represented by a memory 732 for storing instructions executable by the processor 722, such as applications. The applications stored in the memory 732 may include one or more modules, each of which corresponds to a set of instructions. In addition, the processor 722 is configured to execute instructions, to perform the image classification method provided by the embodiments shown in Fig. 2, Fig. 3, and Fig. 4 above.
The device 700 may also include a power supply component 726 configured to perform power management of the device 700, a wired or wireless network interface 750 configured to connect the device 700 to a network, and an input/output (I/O) interface 758. The device 700 can operate based on an operating system stored in the memory 732, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 732 including instructions, which can be executed by the processor 722 of the device 700 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium: when the instructions in the storage medium are executed by the processor of a server, the server is enabled to perform the image classification method provided by the embodiments shown in Fig. 2, Fig. 3, and Fig. 4 above.
Those skilled in the art will readily conceive of other embodiments of the present invention after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the invention indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims (10)

1. An image classification method, characterized in that the method comprises:
processing a target image to be classified by a pre-trained convolutional neural network (CNN) model, to determine the feature output by the last pooling layer included in the CNN model;
taking the determined feature as the feature of the target image, and processing the feature of the target image by a pre-trained recurrent neural network (RNN) model to obtain each label in multiple labels of the target image, the multiple labels being used to indicate the categories to which the image content in the target image belongs.
2. The method according to claim 1, characterized in that processing the feature of the target image by the pre-trained RNN model to obtain multiple labels of the target image comprises:
taking the feature of the target image as input to the RNN model, and processing the feature of the target image by the RNN model to obtain a first label, the first label being one of the multiple labels;
taking the first label as input to the RNN model, and processing the feature of the target image and the first label by the RNN model to obtain a second label, the second label being one of the multiple labels;
determining a cycle count, the cycle count referring to the number of times the RNN model has cyclically processed an input label and the feature of the target image;
when the cycle count is less than or equal to a preset number of times, continuing to take the second label as input to the RNN model and cyclically performing the above operations, until the cycle count is greater than the preset number of times, at which point the multiple labels are obtained.
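The count-based stopping rule of claim 2 amounts to the following loop. This is an illustrative sketch only: step_fn is a hypothetical callback standing in for one pass of the RNN model over the image feature and the previously emitted label, and is not a name used by the patent.

```python
def labels_by_cycle_count(step_fn, feature, preset_times):
    """Feed the previous label back into the RNN step until the cycle
    count exceeds the preset number of times (claim 2, sketched)."""
    labels = []
    prev_label = None          # first cycle uses only the image feature
    cycles = 0
    while cycles <= preset_times:
        prev_label = step_fn(feature, prev_label)
        labels.append(prev_label)
        cycles += 1
    return labels
```

With a preset count of N, the loop performs N + 1 cycles in total: one initial pass driven by the feature alone, then N passes driven by the fed-back label.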
3. The method according to claim 1, characterized in that processing the feature of the target image by the pre-trained RNN model to obtain multiple labels of the target image comprises:
taking the feature of the target image as input to the RNN model, and processing the feature of the target image by the RNN model to obtain a first label, the first label being one of the multiple labels;
taking the first label as input to the RNN model, and processing the feature of the target image and the first label by the RNN model to obtain a second label, the second label being one of the multiple labels;
determining the probability that the target image simultaneously belongs to the categories corresponding to all labels currently determined;
when the probability is greater than or equal to a preset probability, continuing to take the second label as input to the RNN model and cyclically performing the above operations, until the probability is less than the preset probability, at which point the multiple labels are obtained.
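Claim 3 replaces the fixed cycle count of claim 2 with a joint-probability threshold. The sketch below makes the same assumptions as before, and additionally assumes a hypothetical joint_prob_fn callback that scores how likely the image is to belong simultaneously to every label emitted so far; in this sketch the label whose addition drops the probability below the threshold is discarded.

```python
def labels_by_probability(step_fn, joint_prob_fn, feature, preset_prob):
    """Keep emitting labels while the probability that the image belongs
    to ALL labels emitted so far stays at or above the preset
    probability; stop once it drops below (claim 3, sketched)."""
    labels = []
    prev_label = None
    while True:
        candidate = step_fn(feature, prev_label)
        if joint_prob_fn(labels + [candidate]) < preset_prob:
            break              # joint probability fell below the threshold
        labels.append(candidate)
        prev_label = candidate
    return labels
```

Unlike the fixed count of claim 2, this rule lets the number of labels vary per image: an image spanning many categories keeps accumulating labels for as long as the joint probability stays high.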
4. The method according to any one of claims 1 to 3, characterized in that before processing the target image by the pre-trained CNN model, the method further comprises:
determining a training sample set, the training sample set including multiple images and multiple labels set in advance for each image;
training an initialized CNN model according to the multiple images, to obtain a trained CNN model and the feature of each image in the multiple images;
training an initialized RNN model according to the feature of each image in the multiple images, to obtain a trained RNN model and multiple labels of each image in the multiple images;
when the error between the multiple labels of each image in the multiple images obtained by training and the multiple labels set in advance for the corresponding image is greater than a preset error, adjusting the initialized CNN model and the initialized RNN model by gradient descent, and returning to the step of training the initialized CNN model according to the multiple images, until the error between the multiple labels of each image in the multiple images obtained by training and the multiple labels set in advance for the corresponding image is less than or equal to the preset error; and determining the finally trained CNN model as the pre-trained CNN model, and the finally trained RNN model as the pre-trained RNN model.
5. An image classification device, characterized in that the device comprises:
a first processing module, configured to process a target image to be classified by a pre-trained convolutional neural network (CNN) model, to determine the feature output by the last pooling layer included in the CNN model;
a second processing module, configured to take the determined feature as the feature of the target image and process the feature of the target image by a pre-trained recurrent neural network (RNN) model, to obtain multiple labels of the target image, each label in the multiple labels being used to indicate a category to which the image content in the target image belongs.
6. The device according to claim 5, characterized in that the second processing module comprises:
a first processing submodule, configured to take the feature of the target image as input to the RNN model, and process the feature of the target image by the RNN model to obtain a first label, the first label being one of the multiple labels;
a second processing submodule, configured to take the first label as input to the RNN model, and process the feature of the target image and the first label by the RNN model to obtain a second label, the second label being one of the multiple labels;
a first determination submodule, configured to determine a cycle count, the cycle count referring to the number of times the RNN model has cyclically processed an input label and the feature of the target image;
a first circulation submodule, configured to, when the cycle count is less than or equal to a preset number of times, continue to take the second label as input to the RNN model and cyclically perform the above operations, until the cycle count is greater than the preset number of times, at which point the multiple labels are obtained.
7. The device according to claim 5, characterized in that the second processing module comprises:
a first processing submodule, configured to take the feature of the target image as input to the RNN model, and process the feature of the target image by the RNN model to obtain a first label, the first label being one of the multiple labels;
a second processing submodule, configured to take the first label as input to the RNN model, and process the feature of the target image and the first label by the RNN model to obtain a second label, the second label being one of the multiple labels;
a second determination submodule, configured to determine the probability that the target image simultaneously belongs to the categories corresponding to all labels currently determined;
a second circulation submodule, configured to, when the probability is greater than or equal to a preset probability, continue to take the second label as input to the RNN model and cyclically perform the above operations, until the probability is less than the preset probability, at which point the multiple labels are obtained.
8. The device according to any one of claims 5 to 7, characterized in that the device further comprises:
a first determining module, configured to determine a training sample set, the training sample set including multiple images and multiple labels set in advance for each image;
a first training module, configured to train an initialized CNN model according to the multiple images, to obtain a trained CNN model and the feature of each image in the multiple images;
a second training module, configured to train an initialized RNN model according to the feature of each image in the multiple images, to obtain a trained RNN model and multiple labels of each image in the multiple images;
a second determining module, configured to: when the error between the multiple labels of each image in the multiple images obtained by training and the multiple labels set in advance for the corresponding image is greater than a preset error, adjust the initialized CNN model and the initialized RNN model by gradient descent, and return to the step of training the initialized CNN model according to the multiple images, until the error between the multiple labels of each image in the multiple images obtained by training and the multiple labels set in advance for the corresponding image is less than or equal to the preset error; and determine the finally trained CNN model as the pre-trained CNN model, and the finally trained RNN model as the pre-trained RNN model.
9. An image classification device, characterized in that the device comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the method according to any one of claims 1-4.
10. A computer-readable storage medium having instructions stored thereon, characterized in that the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-4.
CN201710388824.4A 2017-05-24 2017-05-24 Image classification method and device and computer readable storage medium Active CN107220667B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710388824.4A CN107220667B (en) 2017-05-24 2017-05-24 Image classification method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN107220667A true CN107220667A (en) 2017-09-29
CN107220667B CN107220667B (en) 2020-10-30

Family

ID=59946959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710388824.4A Active CN107220667B (en) 2017-05-24 2017-05-24 Image classification method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN107220667B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832368A (en) * 2017-10-26 2018-03-23 广东欧珀移动通信有限公司 Picture classification method, device, terminal and storage medium
CN107993191A (en) * 2017-11-30 2018-05-04 腾讯科技(深圳)有限公司 A kind of image processing method and device
CN108171268A (en) * 2018-01-02 2018-06-15 联想(北京)有限公司 A kind of image processing method and electronic equipment
CN108229535A (en) * 2017-12-01 2018-06-29 百度在线网络技术(北京)有限公司 Relate to yellow image audit method, apparatus, computer equipment and storage medium
CN108509781A (en) * 2018-03-27 2018-09-07 百度在线网络技术(北京)有限公司 Method and device for unlock
CN108833313A (en) * 2018-07-12 2018-11-16 北京邮电大学 A kind of radio channel estimation method and device based on convolutional neural networks
CN108876726A (en) * 2017-12-12 2018-11-23 北京旷视科技有限公司 Method, apparatus, system and the computer storage medium of image procossing
CN108921811A (en) * 2018-04-03 2018-11-30 阿里巴巴集团控股有限公司 Detect method and apparatus, the article damage detector of article damage
CN109909998A (en) * 2017-12-12 2019-06-21 北京猎户星空科技有限公司 A kind of method and device controlling manipulator motion
CN110160210A (en) * 2019-05-15 2019-08-23 北京上格云技术有限公司 Fault detection method, device, storage medium and the electronic equipment of fan coil
WO2019184471A1 (en) * 2018-03-27 2019-10-03 北京达佳互联信息技术有限公司 Image tag determination method and device, and terminal
CN110874771A (en) * 2018-08-29 2020-03-10 北京京东尚科信息技术有限公司 Method and device for matching commodities
CN111401464A (en) * 2020-03-25 2020-07-10 北京字节跳动网络技术有限公司 Classification method, classification device, electronic equipment and computer-readable storage medium
CN111414966A (en) * 2020-03-25 2020-07-14 北京字节跳动网络技术有限公司 Classification method, classification device, electronic equipment and computer storage medium
WO2021061045A3 (en) * 2019-09-27 2021-05-20 商汤国际私人有限公司 Stacked object recognition method and apparatus, electronic device and storage medium
CN113109666A (en) * 2021-04-09 2021-07-13 河南省博海大数据科技有限公司 Track circuit fault diagnosis method based on deep convolutional neural network
CN113537339A (en) * 2021-07-14 2021-10-22 中国地质大学(北京) Method and system for identifying symbiotic or associated minerals based on multi-label image classification

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170032224A1 (en) * 2015-07-31 2017-02-02 Xiaomi Inc. Method, device and computer-readable medium for sensitive picture recognition
CN106548208A (en) * 2016-10-28 2017-03-29 杭州慕锐科技有限公司 A kind of quick, intelligent stylizing method of photograph image

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
JIANG WANG et al.: "CNN-RNN: A Unified Framework for Multi-label Image Classification", 2016 IEEE Conference on Computer Vision and Pattern Recognition *
ZHANG Huina et al.: "Research on natural scene image classification based on the Haar-CNN model", Journal of Sichuan Normal University *
LI Jun et al.: "Image retrieval combining visual attention mechanism and recurrent neural network", Journal of Image and Graphics *
WANG Hongxiang et al.: "A survey of single-object tracking algorithms", Straits Science *
CHEN Rui et al.: "Facial landmark localization based on cascaded convolutional neural networks", Journal of Sichuan University of Science & Engineering *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832368A (en) * 2017-10-26 2018-03-23 广东欧珀移动通信有限公司 Picture classification method, device, terminal and storage medium
CN107993191B (en) * 2017-11-30 2023-03-21 腾讯科技(深圳)有限公司 Image processing method and device
CN107993191A (en) * 2017-11-30 2018-05-04 腾讯科技(深圳)有限公司 A kind of image processing method and device
CN108229535B (en) * 2017-12-01 2019-07-23 百度在线网络技术(北京)有限公司 Relate to yellow image audit method, apparatus, computer equipment and storage medium
CN108229535A (en) * 2017-12-01 2018-06-29 百度在线网络技术(北京)有限公司 Relate to yellow image audit method, apparatus, computer equipment and storage medium
CN108876726A (en) * 2017-12-12 2018-11-23 北京旷视科技有限公司 Method, apparatus, system and the computer storage medium of image procossing
CN109909998A (en) * 2017-12-12 2019-06-21 北京猎户星空科技有限公司 A kind of method and device controlling manipulator motion
CN109909998B (en) * 2017-12-12 2020-10-02 北京猎户星空科技有限公司 Method and device for controlling movement of mechanical arm
CN108171268A (en) * 2018-01-02 2018-06-15 联想(北京)有限公司 A kind of image processing method and electronic equipment
CN108509781A (en) * 2018-03-27 2018-09-07 百度在线网络技术(北京)有限公司 Method and device for unlock
WO2019184471A1 (en) * 2018-03-27 2019-10-03 北京达佳互联信息技术有限公司 Image tag determination method and device, and terminal
US11436449B2 (en) 2018-03-27 2022-09-06 Beijing Dajia Internet Information Tech. Co., Ltd. Method and electronic apparatus for processing image and training image tag classification model
CN108921811A (en) * 2018-04-03 2018-11-30 阿里巴巴集团控股有限公司 Detect method and apparatus, the article damage detector of article damage
US10929717B2 (en) 2018-04-03 2021-02-23 Advanced New Technologies Co., Ltd. Article damage detection
CN108833313A (en) * 2018-07-12 2018-11-16 北京邮电大学 A kind of radio channel estimation method and device based on convolutional neural networks
CN110874771A (en) * 2018-08-29 2020-03-10 北京京东尚科信息技术有限公司 Method and device for matching commodities
CN110160210A (en) * 2019-05-15 2019-08-23 北京上格云技术有限公司 Fault detection method, device, storage medium and the electronic equipment of fan coil
WO2021061045A3 (en) * 2019-09-27 2021-05-20 商汤国际私人有限公司 Stacked object recognition method and apparatus, electronic device and storage medium
CN111414966A (en) * 2020-03-25 2020-07-14 北京字节跳动网络技术有限公司 Classification method, classification device, electronic equipment and computer storage medium
CN111401464A (en) * 2020-03-25 2020-07-10 北京字节跳动网络技术有限公司 Classification method, classification device, electronic equipment and computer-readable storage medium
CN111414966B (en) * 2020-03-25 2023-08-15 抖音视界有限公司 Classification method, classification device, electronic equipment and computer storage medium
CN113109666A (en) * 2021-04-09 2021-07-13 河南省博海大数据科技有限公司 Track circuit fault diagnosis method based on deep convolutional neural network
CN113109666B (en) * 2021-04-09 2024-03-15 河南省博海大数据科技有限公司 Rail circuit fault diagnosis method based on deep convolutional neural network
CN113537339A (en) * 2021-07-14 2021-10-22 中国地质大学(北京) Method and system for identifying symbiotic or associated minerals based on multi-label image classification
CN113537339B (en) * 2021-07-14 2023-06-02 中国地质大学(北京) Method and system for identifying symbiotic or associated minerals based on multi-label image classification

Also Published As

Publication number Publication date
CN107220667B (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN107220667A (en) Image classification method, device and computer-readable recording medium
CN107798669A (en) Image defogging method, device and computer-readable recording medium
CN104008635B (en) Apparatus control method and device
CN109670397A (en) Detection method, device, electronic equipment and the storage medium of skeleton key point
CN104267877B (en) The display methods and device of expression picture, electronic equipment
CN107527053A (en) Object detection method and device
CN105205479A (en) Human face value evaluation method, device and terminal device
CN108010060A (en) Object detection method and device
CN108037863A (en) A kind of method and apparatus for showing image
CN106778531A (en) Face detection method and device
CN107679483A (en) Number plate recognition methods and device
CN106250921A (en) Image processing method and device
CN107609560A (en) Character recognition method and device
CN106789461A (en) The method and device of intelligent home device connection
CN107944447A (en) Image classification method and device
WO2020114236A1 (en) Keypoint detection method and apparatus, electronic device, and storage medium
CN107493426A (en) A kind of information collecting method, equipment and computer-readable recording medium
CN107133354A (en) The acquisition methods and device of description information of image
CN107392166A (en) Skin color detection method, device and computer-readable recording medium
CN104243814A (en) Analysis method for object layout in image and image shoot reminding method and device
CN108022274A (en) Image processing method, device, computer equipment and computer-readable recording medium
CN107527024A (en) Face face value appraisal procedure and device
CN107463903A (en) Face key independent positioning method and device
CN107529699A (en) Control method of electronic device and device
EP3644177A1 (en) Input method, device, apparatus, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant