CN108427957A - image classification method and system - Google Patents

Image classification method and system

Info

Publication number
CN108427957A
CN108427957A (application CN201710081054.9A; granted publication CN108427957B)
Authority
CN
China
Prior art keywords
network
image
classification
probability value
original image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710081054.9A
Other languages
Chinese (zh)
Other versions
CN108427957B (en)
Inventor
樊春玲 (Fan Chunling)
张云 (Zhang Yun)
姜青山 (Jiang Qingshan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201710081054.9A
Publication of CN108427957A
Application granted
Publication of CN108427957B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F 18/25: Fusion techniques
    • G06F 18/254: Fusion techniques of classification results, e.g. of results related to same input data
    • G06F 18/256: Fusion techniques of classification results relating to different input data, e.g. multimodal recognition

Abstract

The present invention relates to an image classification method, comprising: predicting the probability that an original image belongs to each class; judging whether a low-resolution network needs to be opened; down-sampling the original image to obtain a low-resolution image and predicting the probability that the low-resolution image belongs to each class; judging whether a salient-region network needs to be opened; fusing the prediction results of the original-image network and the low-resolution network to obtain the image class; performing saliency detection on the original image to obtain a salient-region image and predicting the probability that the salient-region image belongs to each class; and fusing the prediction results of the original-image network, the low-resolution network, and the salient-region network to obtain the image class. The invention further relates to an image classification system. By exploiting multi-scale image information and visual saliency, the present invention obtains multi-dimensional image information and improves the classification accuracy of images.

Description

Image classification method and system
Technical field
The present invention relates to an image classification method and system.
Background technology
Image classification is widely used in many application fields, such as object recognition, image understanding, and content-based image retrieval. In recent years, deep learning has achieved breakthroughs in image processing, and image classification with deep learning has become a research hotspot.
Currently, many convolutional neural networks require the input of the convolutional layers to be an image patch of fixed size (e.g., 224*224), because the fully connected layers require features of identical dimension, and the feature dimension is determined by the size of the input patch and by the sizes and strides of the convolutional and pooling layers.
When a person views an object, close viewing yields more image detail, while distant viewing yields more contour information; information at multiple scales is therefore complementary for image recognition. Meanwhile, the recipient of an image is a human, and human visual attention greatly influences image understanding. However, there is currently no method or system that exploits multi-scale image information and visual saliency to obtain multi-dimensional image information and perform image classification.
Summary of the invention
In view of this, it is necessary to provide an image classification method and system that can exploit multi-scale image information and visual saliency to obtain multi-dimensional image information and perform image classification.
The present invention provides an image classification method comprising the following steps: a. inputting an original image into a deep convolutional network to predict the probability that the original image belongs to each class, the probabilities being the prediction result of the original-image network; b. judging whether a low-resolution network needs to be opened: if so, proceeding to step c; otherwise, the flow ends; c. down-sampling the original image to obtain a low-resolution image and inputting it into the deep convolutional network to predict the probability that the low-resolution image belongs to each class, the probabilities being the prediction result of the low-resolution network; d. judging whether a salient-region network needs to be opened: if not, proceeding to step e, after which the flow ends; if so, proceeding to step f; e. fusing the prediction results of the original-image network and the low-resolution network to obtain the image class; f. performing saliency detection on the original image to obtain a salient-region image and inputting it into the deep convolutional network to predict the probability that the salient-region image belongs to each class, the probabilities being the prediction result of the salient-region network; g. fusing the prediction results of the original-image network, the low-resolution network, and the salient-region network to obtain the image class.
Specifically, the deep convolutional network comprises one input layer, five convolutional layers, and three fully connected layers, wherein a max-pooling layer follows each of the 1st, 2nd, and 5th convolutional layers, and the last fully connected layer is the output layer.
Specifically, the output layer predicts the probability of each class using the softmax function:

p_i = exp(x_i) / Σ_{j=1..n} exp(x_j)

where x_i is the input of the i-th node in the last layer and n is the number of nodes in the last layer.
Specifically, step b comprises: setting a threshold on the maximum of the probabilities predicted at each dimension to decide whether to open the next-level dimension network, namely:

open the next-level network if max(P_i) < T_i

where P_i = {P_i1, P_i2, …, P_iC} are the probabilities predicted by the i-th-level network, with values in [0, 1], and C is the number of classes. If the maximum probability output by the original-image network is less than the set threshold T1, it is judged that the low-resolution network needs to be automatically opened.
Specifically, step e comprises weighted fusion of the probabilities of the original-image network and the low-resolution network:

P_{i,j} = w_i·P_i + w_j·P_j

where w_i is the weight of the i-th-level network, P_i are the probabilities predicted by the i-th-level network, w_j is the weight of the j-th-level network, P_j are the probabilities predicted by the j-th-level network, and P_{i,j} are the fused class-probability values obtained by the weighted mixture of the i-th-level and j-th-level networks.
The present invention also provides an image classification system comprising an original-image network module, a judgment module, a low-resolution network module, a salient-region network module, and a fusion classification module, wherein: the original-image network module inputs an original image into a deep convolutional network to predict the probability that the original image belongs to each class, the probabilities being the prediction result of the original-image network; the judgment module judges whether a low-resolution network needs to be opened; the low-resolution network module, after the low-resolution network is opened, down-samples the original image to obtain a low-resolution image and inputs it into the deep convolutional network to predict the probability that the low-resolution image belongs to each class, the probabilities being the prediction result of the low-resolution network; the judgment module further judges whether a salient-region network needs to be opened; the fusion classification module fuses the prediction results of the original-image network and the low-resolution network to obtain the image class; the salient-region network module, after the salient-region network is opened, performs saliency detection on the original image to obtain a salient-region image and inputs it into the deep convolutional network to predict the probability that the salient-region image belongs to each class, the probabilities being the prediction result of the salient-region network; and the fusion classification module further fuses the prediction results of the original-image network, the low-resolution network, and the salient-region network to obtain the image class.
Specifically, the convolutional network structure comprises one input layer, five convolutional layers, and three fully connected layers, wherein a max-pooling layer follows each of the 1st, 2nd, and 5th convolutional layers, and the last fully connected layer is the output layer.
Specifically, the output layer predicts the probability of each class using the softmax function:

p_i = exp(x_i) / Σ_{j=1..n} exp(x_j)

where x_i is the input of the i-th node in the last layer and n is the number of nodes in the last layer.
Specifically, the judgment module is configured to set a threshold on the maximum of the probabilities predicted at each dimension to decide whether to open the next-level dimension network, namely:

open the next-level network if max(P_i) < T_i

where P_i = {P_i1, P_i2, …, P_iC} are the probabilities predicted by the i-th-level network, with values in [0, 1], and C is the number of classes. If the maximum probability output by the original-image network is less than the set threshold T1, it is judged that the low-resolution network needs to be automatically opened.
Specifically, the fusion classification module performs weighted fusion of the probabilities of the original-image network and the low-resolution network:

P_{i,j} = w_i·P_i + w_j·P_j

where w_i is the weight of the i-th-level network, P_i are the probabilities predicted by the i-th-level network, w_j is the weight of the j-th-level network, P_j are the probabilities predicted by the j-th-level network, and P_{i,j} are the fused class-probability values obtained by the weighted mixture of the i-th-level and j-th-level networks.
The present invention exploits multi-scale image information and visual saliency to obtain multi-dimensional image information and perform image classification. Using the original-resolution image, the down-sampled low-resolution image, and the saliency detection result, multi-dimensional image features are extracted by the deep convolutional network; by setting a reasonable decision strategy and fusion mechanism, the class probabilities of the different channels are fused, thereby improving the classification accuracy of images.
Description of the drawings
Fig. 1 is a flowchart of the image classification method of the present invention;
Fig. 2 is a schematic diagram of the convolutional network structure of an embodiment of the present invention;
Fig. 3 is a hardware architecture diagram of the image classification system of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, which is a flowchart of a preferred embodiment of the image classification method of the present invention:
Step S1: a trained deep convolutional network automatically extracts, through layer-by-layer abstraction, the high-level semantic features of the original image; the softmax classifier in the last layer of the deep convolutional network then predicts the probability that the original image belongs to each class. In this embodiment, the deep convolutional network into which the original image is input is called the original-image network, i.e., the first-level network. The classes are denoted K = {1, 2, …, k}. Specifically:
The convolutional network structure used in this embodiment (see Fig. 2) comprises one input layer, five convolutional layers, and three fully connected layers, wherein a max-pooling layer follows each of the 1st, 2nd, and 5th convolutional layers, and the last fully connected layer is the output layer. Its number of neurons is K, the number of output classes, and it predicts the probability of each class using the softmax function:

p_i = exp(x_i) / Σ_{j=1..n} exp(x_j)

where x_i is the input of the i-th node in the last layer and n is the number of nodes in the last layer. The softmax function predicts the probability that the image belongs to each class. Although this embodiment uses the convolutional network structure of Fig. 2, the protection scope of the present invention is not limited to this network; other deep networks may also be used.
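As a minimal NumPy sketch (ours, not the patented implementation), the softmax output layer above can be reproduced as follows; the shift by the maximum is a standard numerical-stability trick we add, and the scores are hypothetical node inputs:

```python
import numpy as np

def softmax(x):
    """Map the last layer's node inputs x_1..x_n to class probabilities."""
    x = np.asarray(x, dtype=float)
    z = x - x.max()          # stability shift; does not change the result
    e = np.exp(z)
    return e / e.sum()

scores = [2.0, 1.0, 0.1]     # hypothetical node inputs for K = 3 classes
probs = softmax(scores)      # probabilities sum to 1; largest score wins
```

The output is a valid probability distribution over the K classes, which is what the decision thresholds T1 and T2 of the later steps operate on.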
Step S2: judge whether the low-resolution network needs to be opened. If so, proceed to step S3; if not, the flow ends. Specifically:
In this embodiment, each network constitutes one dimension, and each dimension predicts a group of probabilities P = {p_i, i = 1, …, K}, where p_i is the probability that the dimension's input image belongs to the i-th class.
This embodiment sets a threshold on the maximum of the probabilities predicted at each dimension to decide whether to open the next-level dimension network, namely:

open the next-level network if max(P_i) < T_i

where P_i = {P_i1, P_i2, …, P_iC} are the probabilities predicted by the i-th-level network, with values in [0, 1], and C is the number of classes. If the maximum probability output by the original-image network is less than the set threshold T1, it is judged that the low-resolution network needs to be automatically opened, and the flow proceeds to step S3.
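The gating rule of steps S2 and S4 can be sketched as below; the threshold value and probability vectors are illustrative assumptions of ours, not values taken from the patent:

```python
def needs_next_level(probs, threshold):
    """Open the next-level network only when the current prediction is
    unconfident, i.e. the maximum class probability is below the threshold."""
    return max(probs) < threshold

T1 = 0.9                            # hypothetical threshold for the first level
confident   = [0.95, 0.03, 0.02]    # clear winner: stop at the original-image network
unconfident = [0.40, 0.35, 0.25]    # ambiguous: open the low-resolution network
```

The same rule, with threshold T2, decides in step S4 whether the salient-region network is opened.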
Step S3: down-sample the original image to obtain low-resolution images at multiple different scales, input the low-resolution image into the trained deep convolutional network described above to automatically extract its high-level semantic features, and use the softmax classifier in the last layer of the deep convolutional network to predict the probability that the low-resolution image belongs to each class. In this embodiment, the deep convolutional network into which the low-resolution image is input is called the low-resolution network, i.e., the second-level network. Specifically:
This embodiment generates the low-resolution image from the original image by down-sampling; considering computational performance and complexity, this embodiment uses bilinear interpolation to obtain the low-resolution image.
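A self-contained sketch of bilinear down-sampling for a single-channel image follows; this is our illustration of the standard technique, not the patent's code, and a real pipeline would typically call an image library's resize routine instead:

```python
import numpy as np

def bilinear_downsample(img, out_h, out_w):
    """Down-sample a 2-D (grayscale) image with bilinear interpolation."""
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)      # sample positions in the source grid
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]                   # fractional weights per row
    wx = (xs - x0)[None, :]                   # fractional weights per column
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

ramp = np.tile(np.arange(8.0), (8, 1))        # toy image whose rows ramp 0..7
small = bilinear_downsample(ramp, 4, 4)       # bilinear is exact on linear ramps
```

Because bilinear interpolation reproduces linear intensity ramps exactly, the ramp image above down-samples to the evenly spaced values `np.linspace(0, 7, 4)` in each row.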
The extraction of the high-level semantic features of the low-resolution image and the prediction of its class probabilities by the low-resolution network are similar to step S1 and are not repeated here.
Step S4: judge whether the salient-region network needs to be opened. If not, proceed to step S5, after which the flow ends; if so, proceed to step S6. Specifically:
If the maximum probability in the fused output of the original-image network and the low-resolution network is not less than the set threshold T2, the salient-region network does not need to be opened, and the flow proceeds to step S5. If that maximum probability is still less than the threshold T2, the system automatically opens the salient-region network, and the flow proceeds to step S6.
The specific computation is similar to step S2 and is not repeated here.
Step S5: fuse the prediction results of the original-image network and the low-resolution network to obtain the image class. Specifically, this embodiment performs weighted fusion of the two networks' probabilities:

P_{i,j} = w_i·P_i + w_j·P_j

where w_i is the weight of the i-th-level network, P_i are the probabilities predicted by the i-th-level network, w_j is the weight of the j-th-level network, P_j are the probabilities predicted by the j-th-level network, and P_{i,j} are the fused class-probability values obtained by the weighted mixture of the i-th-level and j-th-level networks.
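Under the assumption (ours; the patent does not fix the weights) that w_i + w_j = 1, the weighted fusion of step S5 can be sketched as:

```python
def fuse(p_i, p_j, w_i=0.5, w_j=0.5):
    """Weighted mixture P_{i,j} = w_i*P_i + w_j*P_j of two probability vectors."""
    return [w_i * a + w_j * b for a, b in zip(p_i, p_j)]

p_orig = [0.40, 0.35, 0.25]   # hypothetical original-image network output
p_low  = [0.60, 0.20, 0.20]   # hypothetical low-resolution network output
p_fused = fuse(p_orig, p_low)
label = max(range(len(p_fused)), key=p_fused.__getitem__)   # predicted class index
```

With weights summing to one, the fused vector remains a valid probability distribution, so the same threshold test T2 can be applied to it in step S4.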
Step S6: perform saliency detection on the original image to obtain the salient region of the image, i.e., the salient-region image, normalize the salient-region image, input it into the trained deep convolutional network described above to extract its high-level semantic features, and use the softmax classifier in the last layer of the deep convolutional network to predict the probability that the salient-region image belongs to each class. In this embodiment, the deep convolutional network into which the salient-region image is input is called the salient-region network, i.e., the third-level network. Specifically:
This embodiment detects the salient region from the original image using a saliency detection method, choosing a histogram-contrast-based method to generate the image's salient-region mask: the method first segments the image into regions, then assigns a saliency value to each region by computing its global contrast, and finally forms a region-contrast-based saliency map. From the detected mask, this embodiment selects the minimum rectangular bounding box of the foreground and scales it to the image size required by the salient-region network.
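Cropping the minimum foreground bounding box from a binary saliency mask, as described above, can be sketched as follows; the mask itself would come from the histogram-contrast saliency detector, which is not reproduced here, and the toy image and blob are our assumptions:

```python
import numpy as np

def crop_salient_region(image, mask):
    """Return the patch covered by the minimum rectangle enclosing the mask's foreground."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return image                      # no foreground: fall back to the full image
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

image = np.arange(36).reshape(6, 6)       # toy 6x6 "image"
mask = np.zeros((6, 6), dtype=bool)
mask[2:5, 1:4] = True                     # hypothetical 3x3 foreground blob
patch = crop_salient_region(image, mask)  # 3x3 crop around the blob
```

The resulting patch would then be rescaled (e.g., with the bilinear routine above, or a library resize) to the input size of the salient-region network.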
The extraction of the high-level semantic features of the salient-region image and the prediction of its class probabilities by the salient-region network are similar to step S1 and are not repeated here.
Step S7: fuse the prediction results of the original-image network, the low-resolution network, and the salient-region network to obtain the image class. The specific fusion method is similar to step S5 and is not repeated here.
To explain the main points of the invention briefly, this embodiment lists only three dimensions; however, the protection scope of the present invention is not limited to three dimensions, and information from more dimensions may be fused.
As shown in Fig. 3, which is a hardware architecture diagram of the image classification system 10 of the present invention, the system comprises an original-image network module 101, a judgment module 102, a low-resolution network module 103, a salient-region network module 104, and a fusion classification module 105.
The original-image network module 101 uses a trained deep convolutional network to automatically extract, through layer-by-layer abstraction, the high-level semantic features of the original image, and then uses the softmax classifier in the last layer of the deep convolutional network to predict the probability that the original image belongs to each class. In this embodiment, the deep convolutional network into which the original image is input is called the original-image network. The classes are denoted K = {1, 2, …, k}. Specifically:

The convolutional network structure used in this embodiment (see Fig. 2) comprises one input layer, five convolutional layers, and three fully connected layers, wherein a max-pooling layer follows each of the 1st, 2nd, and 5th convolutional layers, and the last fully connected layer is the output layer. Its number of neurons is K, the number of output classes, and it predicts the probability of each class using the softmax function:

p_i = exp(x_i) / Σ_{j=1..n} exp(x_j)

where x_i is the input of the i-th node in the last layer and n is the number of nodes in the last layer. The softmax function predicts the probability that the image belongs to each class. Although this embodiment uses the convolutional network structure of Fig. 2, the protection scope of the present invention is not limited to this network; other deep networks may also be used.
The judgment module 102 judges whether the low-resolution network needs to be opened. Specifically: in this embodiment, each network constitutes one dimension, and each dimension predicts a group of probabilities P = {p_i, i = 1, …, K}, where p_i is the probability that the dimension's input image belongs to the i-th class. This embodiment sets a threshold on the maximum of the probabilities predicted at each dimension to decide whether to open the next-level dimension network, namely:

open the next-level network if max(P_i) < T_i

where P_i = {P_i1, P_i2, …, P_iC} are the probabilities predicted by the i-th-level network, with values in [0, 1], and C is the number of classes. If the maximum probability output by the original-image network is less than the set threshold T1, the judgment module 102 judges that the low-resolution network needs to be automatically opened; otherwise, the flow ends.
The low-resolution network module 103, after the low-resolution network is opened, down-samples the original image to obtain low-resolution images at multiple different scales, inputs the low-resolution image into the trained deep convolutional network described above to automatically extract its high-level semantic features, and uses the softmax classifier in the last layer of the deep convolutional network to predict the probability that the low-resolution image belongs to each class. In this embodiment, the deep convolutional network into which the low-resolution image is input is called the low-resolution network. Specifically, the low-resolution network module 103 generates the low-resolution image from the original image by down-sampling; considering computational performance and complexity, this embodiment uses bilinear interpolation. The extraction of high-level semantic features and the prediction of class probabilities by the low-resolution network are similar to the original-image network module 101 and are not repeated here.
The judgment module 102 further judges whether the salient-region network needs to be opened. Specifically: if the maximum probability in the fused output of the original-image network and the low-resolution network is not less than the set threshold T2, the salient-region network does not need to be opened; if it is still less than T2, the system automatically opens the salient-region network. The specific computation is similar to the earlier judgment of whether the low-resolution network needs to be opened and is not repeated here.
The fusion classification module 105 fuses the prediction results of the original-image network and the low-resolution network to obtain the image class. Specifically, it performs weighted fusion of the two networks' probabilities:

P_{i,j} = w_i·P_i + w_j·P_j

where w_i is the weight of the i-th-level network, P_i are the probabilities predicted by the i-th-level network, w_j is the weight of the j-th-level network, P_j are the probabilities predicted by the j-th-level network, and P_{i,j} are the fused class-probability values obtained by the weighted mixture of the i-th-level and j-th-level networks.
The salient-region network module 104, after the salient-region network is opened, performs saliency detection on the original image to obtain the salient region of the image, i.e., the salient-region image, normalizes it, inputs it into the trained deep convolutional network described above to extract its high-level semantic features, and uses the softmax classifier in the last layer of the deep convolutional network to predict the probability that the salient-region image belongs to each class. In this embodiment, the deep convolutional network into which the salient-region image is input is called the salient-region network. Specifically, the module detects the salient region from the original image using a histogram-contrast-based saliency detection method to generate the image's salient-region mask: the method first segments the image into regions, then assigns a saliency value to each region by computing its global contrast, and finally forms a region-contrast-based saliency map. From the detected mask, the minimum rectangular bounding box of the foreground is selected and scaled to the image size required by the salient-region network. The extraction of high-level semantic features and the prediction of class probabilities by the salient-region network are similar to the original-image network module 101 and are not repeated here.
The fusion classification module 105 further fuses the prediction results of the original-image network, the low-resolution network, and the salient-region network to obtain the image class. The specific fusion method is similar to the fusion of the original-image network's and low-resolution network's prediction results described above and is not repeated here.
To explain the main points of the invention briefly, this embodiment lists only three dimensions; however, the protection scope of the present invention is not limited to three dimensions, and information from more dimensions may be fused.
The present invention performs class prediction on the original image, the low-resolution image, and the salient-region image with trained deep convolutional networks, each network constituting one dimension; by setting a decision strategy and fusion mechanism, the low-resolution network or the salient-region network is opened or kept closed, and the prediction results of the different networks are finally fused to obtain the image class.
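The overall cascade described above can be sketched end to end; the probability vectors, thresholds, and equal fusion weights are stand-ins of ours for the trained networks and tuned parameters of the patent:

```python
def classify(p_orig, p_low, p_sal, t1=0.9, t2=0.9):
    """Cascade of steps S1-S7: open each further 'dimension' only when the
    current (fused) prediction is unconfident, then fuse with equal weights."""
    if max(p_orig) >= t1:                                    # S2: first level confident
        return max(range(len(p_orig)), key=p_orig.__getitem__)
    fused = [(a + b) / 2 for a, b in zip(p_orig, p_low)]     # S3 + S5: two-way fusion
    if max(fused) >= t2:                                     # S4: fused result confident
        return max(range(len(fused)), key=fused.__getitem__)
    fused = [(f * 2 + c) / 3 for f, c in zip(fused, p_sal)]  # S6 + S7: equal thirds
    return max(range(len(fused)), key=fused.__getitem__)
```

A confident first-level prediction stops the cascade immediately, which is the efficiency argument behind the decision strategy: the more expensive saliency branch runs only on ambiguous inputs.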
Although the present invention has been described with reference to the current preferred embodiments, those skilled in the art should understand that the above preferred embodiments are only intended to illustrate the present invention and not to limit its protection scope; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. An image classification method, characterized in that the method comprises the steps of:
a. inputting an original image into a deep convolutional network to predict the probability that the original image belongs to each class, wherein the probabilities of the original image belonging to each class are the prediction result of the original-image network;
b. judging whether a low-resolution network needs to be opened: if so, proceeding to step c; otherwise, the flow ends;
c. down-sampling the original image to obtain a low-resolution image and inputting the low-resolution image into the deep convolutional network to predict the probability that the low-resolution image belongs to each class, wherein the probabilities of the low-resolution image belonging to each class are the prediction result of the low-resolution network;
d. judging whether a salient-region network needs to be opened: if not, proceeding to step e, after which the flow ends; if so, proceeding to step f;
e. fusing the prediction results of the original-image network and the low-resolution network to obtain the image class;
f. performing saliency detection on the original image to obtain a salient-region image and inputting the salient-region image into the deep convolutional network to predict the probability that the salient-region image belongs to each class, wherein the probabilities of the salient-region image belonging to each class are the prediction result of the salient-region network;
g. fusing the prediction results of the original-image network, the low-resolution network, and the salient-region network to obtain the image class.
2. The method according to claim 1, characterized in that the deep convolutional network comprises: 1 input layer, 5 convolutional layers, and 3 fully connected layers, wherein the 1st, 2nd, and 5th convolutional layers are each followed by a max-pooling layer, and the last fully connected layer is the output layer.
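As a sketch, the layer ordering of claim 2 can be written out explicitly (the topology resembles AlexNet, though the patent states only the layer counts; the layer names below are illustrative):

```python
def build_layer_list():
    """Layer order per claim 2: 1 input layer, 5 convolutional layers with
    max-pooling after conv 1, 2, and 5, then 3 fully connected layers."""
    layers = ["input"]
    for i in range(1, 6):
        layers.append(f"conv{i}")
        if i in (1, 2, 5):            # pooling follows the 1st, 2nd, 5th conv
            layers.append("max_pool")
    layers += ["fc1", "fc2", "fc3"]   # fc3 is the output (softmax) layer
    return layers
```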
3. The method according to claim 2, characterized in that the output layer predicts the probability value of belonging to each category using the softmax function, the softmax function being:

softmax(xi) = exp(xi) / Σj=1..n exp(xj)

wherein xi is the input of the i-th node in the last layer, and n is the number of all nodes in the last layer.
4. The method according to claim 3, characterized in that step b specifically comprises:
deciding whether to open the next-stage network by setting a threshold on the maximum of the probability values predicted by the current stage, specifically:

max(Pi) < T1

wherein Pi = {Pi1, Pi2, …, PiC} are the probability values predicted by the i-th-stage network, each in the range [0, 1], and C is the number of categories;
if the maximum probability value output by the original-image network is less than the set threshold T1, it is judged that the low-resolution network needs to be automatically opened.
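The gating rule of claim 4 reduces to a single comparison; the default value of T1 below is illustrative, as the patent leaves the threshold to be set:

```python
def need_next_stage(probs, t1=0.9):
    """Open the next-stage network iff max(Pi) < T1."""
    return max(probs) < t1
```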
5. The method according to claim 4, characterized in that step e specifically comprises:
mixing the probability values of the original-image network and the low-resolution network by weighting:

Pi,j = wiPi + wjPj

wherein wi is the weight of the i-th-stage network, Pi is the probability value predicted by the i-th-stage network, wj is the weight of the j-th-stage network, Pj is the probability value predicted by the j-th-stage network, and Pi,j is the fused class-prediction probability value obtained from the weighted mixing of the i-th-stage and j-th-stage networks.
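The weighted fusion of claim 5, applied element-wise over the C category probabilities (the default weight values are illustrative, not specified by the patent):

```python
def fuse(p_i, p_j, w_i=0.6, w_j=0.4):
    """Pi,j = wi * Pi + wj * Pj for each of the C categories."""
    return [w_i * a + w_j * b for a, b in zip(p_i, p_j)]
```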
6. An image classification system, characterized in that the system comprises an original-image network module, a judgment module, a low-resolution network module, a salient-region network module, and a fusion classification module, wherein:
the original-image network module is used for inputting an original image into a deep convolutional network to predict the probability value that the original image belongs to each category, wherein the probability values of the original image belonging to each category are the prediction result of the original-image network;
the judgment module is used for judging whether the low-resolution network needs to be opened;
the low-resolution network module is used for, after the low-resolution network is opened, down-sampling the original image to obtain a low-resolution image and inputting the low-resolution image into the deep convolutional network to predict the probability value that the low-resolution image belongs to each category, wherein the probability values of the low-resolution image belonging to each category are the prediction result of the low-resolution network;
the judgment module is further used for judging whether the salient-region network needs to be opened;
the fusion classification module is used for fusing the prediction result of the original-image network with the prediction result of the low-resolution network to obtain the image category;
the salient-region network module is used for, after the salient-region network is opened, performing saliency detection on the original image to obtain a salient-region image and inputting the salient-region image into the deep convolutional network to predict the probability value that the salient-region image belongs to each category, wherein the probability values of the salient-region image belonging to each category are the prediction result of the salient-region network;
the fusion classification module is further used for fusing the prediction results of the original-image network, the low-resolution network, and the salient-region network to obtain the image category.
7. The system according to claim 6, characterized in that the convolutional network structure comprises: 1 input layer, 5 convolutional layers, and 3 fully connected layers, wherein the 1st, 2nd, and 5th convolutional layers are each followed by a max-pooling layer, and the last fully connected layer is the output layer.
8. The system according to claim 7, characterized in that the output layer predicts the probability value of belonging to each category using the softmax function, the softmax function being:

softmax(xi) = exp(xi) / Σj=1..n exp(xj)

wherein xi is the input of the i-th node in the last layer, and n is the number of all nodes in the last layer.
9. The system according to claim 8, characterized in that the judgment module is specifically used for:
deciding whether to open the next-stage network by setting a threshold on the maximum of the probability values predicted by the current stage, specifically:

max(Pi) < T1

wherein Pi = {Pi1, Pi2, …, PiC} are the probability values predicted by the i-th-stage network, each in the range [0, 1], and C is the number of categories;
if the maximum probability value output by the original-image network is less than the set threshold T1, it is judged that the low-resolution network needs to be automatically opened.
10. The system according to claim 9, characterized in that the fusion classification module is specifically used for:
mixing the probability values of the original-image network and the low-resolution network by weighting:

Pi,j = wiPi + wjPj

wherein wi is the weight of the i-th-stage network, Pi is the probability value predicted by the i-th-stage network, wj is the weight of the j-th-stage network, Pj is the probability value predicted by the j-th-stage network, and Pi,j is the fused class-prediction probability value obtained from the weighted mixing of the i-th-stage and j-th-stage networks.
CN201710081054.9A 2017-02-15 2017-02-15 Image classification method and system Active CN108427957B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710081054.9A CN108427957B (en) 2017-02-15 2017-02-15 Image classification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710081054.9A CN108427957B (en) 2017-02-15 2017-02-15 Image classification method and system

Publications (2)

Publication Number Publication Date
CN108427957A true CN108427957A (en) 2018-08-21
CN108427957B CN108427957B (en) 2021-12-21

Family

ID=63155279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710081054.9A Active CN108427957B (en) 2017-02-15 2017-02-15 Image classification method and system

Country Status (1)

Country Link
CN (1) CN108427957B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635093A (en) * 2018-12-17 2019-04-16 北京百度网讯科技有限公司 Method and apparatus for generating revert statement
US11409999B2 (en) 2020-04-21 2022-08-09 Industrial Technology Research Institute Method of labelling features for image recognition and apparatus thereof

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105320945A (en) * 2015-10-30 2016-02-10 小米科技有限责任公司 Image classification method and apparatus
CN105512683A (en) * 2015-12-08 2016-04-20 浙江宇视科技有限公司 Target positioning method and device based on convolution neural network
CN105551036A (en) * 2015-12-10 2016-05-04 中国科学院深圳先进技术研究院 Training method and device for deep learning network
US20160140424A1 (en) * 2014-11-13 2016-05-19 Nec Laboratories America, Inc. Object-centric Fine-grained Image Classification
CN105631466A (en) * 2015-12-21 2016-06-01 中国科学院深圳先进技术研究院 Method and device for image classification
CN106372648A (en) * 2016-10-20 2017-02-01 中国海洋大学 Multi-feature-fusion-convolutional-neural-network-based plankton image classification method


Also Published As

Publication number Publication date
CN108427957B (en) 2021-12-21

Similar Documents

Publication Publication Date Title
Zhao et al. Deep temporal convolutional networks for short-term traffic flow forecasting
US11367272B2 (en) Target detection method, apparatus, and system
CN109815886B (en) Pedestrian and vehicle detection method and system based on improved YOLOv3
Yuan et al. Gated CNN: Integrating multi-scale feature layers for object detection
CN109635694B (en) Pedestrian detection method, device and equipment and computer readable storage medium
CN110503112A (en) A kind of small target deteection of Enhanced feature study and recognition methods
CN113572742B (en) Network intrusion detection method based on deep learning
CN113379771B (en) Hierarchical human body analysis semantic segmentation method with edge constraint
CN110991444B (en) License plate recognition method and device for complex scene
CN111738258A (en) Pointer instrument reading identification method based on robot inspection
CN108986453A (en) A kind of traffic movement prediction method based on contextual information, system and device
CN111339849A (en) Pedestrian re-identification method integrating pedestrian attributes
CN106682092A (en) Target retrieval method and terminal
CN113362491A (en) Vehicle track prediction and driving behavior analysis method
CN107506792A (en) A kind of semi-supervised notable method for checking object
CN114612476B (en) Image tampering detection method based on full-resolution hybrid attention mechanism
CN111400572A (en) Content safety monitoring system and method for realizing image feature recognition based on convolutional neural network
CN110751195A (en) Fine-grained image classification method based on improved YOLOv3
Xu et al. BANet: A balanced atrous net improved from SSD for autonomous driving in smart transportation
CN108427957A (en) image classification method and system
Pan et al. A network traffic classification method based on graph convolution and lstm
Moghaddam et al. Jointly human semantic parsing and attribute recognition with feature pyramid structure in EfficientNets
CN113298186A (en) Network abnormal flow detection method for confluent flow model confrontation generation network and clustering algorithm
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
Kumar et al. Robust Vehicle Detection Based on Improved You Look Only Once.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant