CN108805152A - Scene classification method and device - Google Patents

Scene classification method and device

Info

Publication number
CN108805152A
Authority
CN
China
Prior art keywords
scene
multi-scale
scale
neural networks
convolutional neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710313796.XA
Other languages
Chinese (zh)
Inventor
黄欢
赵刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Jinghong Technology Co., Ltd
Original Assignee
Shanghai Jinghong Electronic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jinghong Electronic Technology Co Ltd filed Critical Shanghai Jinghong Electronic Technology Co Ltd
Priority to CN201710313796.XA priority Critical patent/CN108805152A/en
Publication of CN108805152A publication Critical patent/CN108805152A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a scene classification method and device, including: S1, based on a multi-scale convolutional neural network, extracting the scene convolution features of an input scene picture at each scale; S2, performing feature fusion on the scene convolution features at each scale to obtain a multi-scale scene feature of the scene picture; S3, based on the multi-scale scene feature, completing scene classification within the multi-scale convolutional neural network. By building a multi-scale convolutional neural network, the scene classification method and device proposed by the present invention fully exploit the relationship between scene features at different scales, extract discriminative multi-scale scene features, and improve the accuracy of scene classification.

Description

Scene classification method and device
Technical field
The present invention relates to the field of information technology, and in particular to a scene classification method and device.
Background technology
With the rapid development of multimedia technologies such as digital photography and digital storage, people acquire more and more digital image data. These image data contain a massive amount of information, and relying on manpower alone it is impossible to process this information in real time. It is therefore desirable to give machines the ability to recognize images automatically by simulating the visual information processing functions of the human visual system, so as to help or assist humans in completing many important tasks. Scene classification, which infers the correct category of a scene from the content contained in a picture, is a basic and very challenging computer vision task that plays an important role in fields such as picture retrieval, object detection and object tracking.
At present, existing scene recognition methods can be broadly divided into two classes according to whether the model used involves a convolutional neural network. The first class consists of shallow models based on hand-crafted features; such methods are devoted to designing robust scene feature operators or robust models. The second class consists of deep models based on convolutional neural networks; the core idea of such methods is to use the convolutional neural network as a scene feature extractor, extracting scene features that contain high-level semantic information, and then to classify on those features.
However, deep models based on convolutional neural networks separate feature learning from classifier training in the processing pipeline, which weakens the performance of the whole model. Moreover, such models obtain the scene feature by directly fusing the features at different scales, without fully exploiting the performance of the scene features within each single scale, so the classification accuracy is not high.
Summary of the invention
The present invention provides a scene classification method and device that overcome the above problems or at least partly solve them.
According to a first aspect of the present invention, a scene classification method is provided, including:
S1, based on a multi-scale convolutional neural network, extracting the scene convolution features of an input scene picture at each scale;
S2, performing feature fusion on the scene convolution features at each scale to obtain a multi-scale scene feature of the scene picture;
S3, based on the multi-scale scene feature, completing scene classification within the multi-scale convolutional neural network.
Wherein, the multi-scale convolutional neural network includes a convolutional layer, a fully connected layer, an activation function layer, a SoftMax layer, and a multi-scale layer built on top of the convolutional layer.
Wherein, step S2 includes:
S21, pre-fusing the scene convolution features within each scale to obtain a within-scale scene fusion feature for each scale;
S22, performing feature fusion across scales on the within-scale scene fusion features to obtain the multi-scale scene feature of the scene picture.
Wherein, the fusion processes in step S21 and step S22 both use the ReLU activation function to fuse the features.
Wherein, step S1 includes:
splitting the input scene picture into scene blocks at multiple scales in the multi-scale layer;
extracting the scene convolution features of the scene blocks at each scale in the convolutional layer.
Wherein, step S3 includes:
building an objective function in the multi-scale convolutional neural network;
based on the multi-scale scene feature and the objective function, building a scene feature classifier at the SoftMax layer to complete scene classification.
Wherein, in the objective function: {x, y} denotes an input scene picture x and its label y, M is the number of input scene pictures, C is the number of scene classes, R_ms is the multi-scale scene feature, and W and b are the weight and bias of the multi-scale convolutional neural network classifier.
Wherein, training the multi-scale convolutional neural network includes:
training the multi-scale convolutional neural network with the stochastic gradient descent method with momentum until the objective function converges.
According to a second aspect of the present invention, a scene classification device is provided, including:
an extraction module, configured to extract, based on a multi-scale convolutional neural network, the scene convolution features of an input scene picture at each scale;
a fusion module, configured to perform feature fusion on the scene convolution features at each scale to obtain a multi-scale scene feature of the scene picture;
a classification module, configured to complete scene classification in the multi-scale convolutional neural network based on the multi-scale scene feature.
According to a third aspect of the present invention, a computer program product is provided, including program code for executing the scene classification method described above.
According to a fourth aspect of the present invention, a non-transitory computer readable storage medium is provided for storing the computer program described above.
By building a multi-scale convolutional neural network, the scene classification method and device proposed by the present invention fully exploit the relationship between scene features at different scales, extract discriminative multi-scale scene features, and improve the accuracy of scene classification.
Description of the drawings
Fig. 1 is a flow chart of a scene classification method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of another scene classification method provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of a scene classification device provided by an embodiment of the present invention.
Detailed description of the embodiments
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples are intended to illustrate the present invention and do not limit its scope.
Fig. 1 is a flow chart of a scene classification method provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:
S1, based on a multi-scale convolutional neural network, extracting the scene convolution features of an input scene picture at each scale;
S2, performing feature fusion on the scene convolution features at each scale to obtain a multi-scale scene feature of the scene picture;
S3, based on the multi-scale scene feature, completing scene classification within the multi-scale convolutional neural network.
In S1, the multi-scale convolutional neural network is a trained multi-scale convolutional neural network. Specifically, a multi-scale convolutional neural network is built in advance and trained on an input training sample set; after training, a neural network structure with a scene classification function is obtained.
It should be understood that the class of the input scene picture needs to be among the classes contained in the input training sample set.
In S2, feature fusion of the scene convolution features at each scale uses a two-stage fusion strategy: on the basis of pre-fusion within each scale, a second fusion is then performed across scales, yielding the multi-scale scene feature of the scene picture. This multi-scale scene feature is highly discriminative, is very robust to geometric rotation, and effectively addresses the problem of partially occluded scene pictures.
In S3, scene classification is completed directly within the multi-scale convolutional neural network, without relying on an external classifier; the working performance of the multi-scale convolutional neural network is fully exploited, and scene classification is completed while feature extraction is performed.
By building a multi-scale convolutional neural network, the scene classification method provided by the embodiment of the present invention fully exploits the relationship between scene features at different scales, extracts discriminative multi-scale scene features, and improves the accuracy of scene classification.
On the basis of the above embodiments, the multi-scale convolutional neural network provided by an embodiment of the present invention includes:
a convolutional layer, a fully connected layer, an activation function layer and a SoftMax layer, together with a multi-scale layer built on top of the convolutional layer.
In general, the basic structure of a traditional convolutional neural network has two parts. The first is the feature extraction layer, i.e. the convolutional layer: the input of each neuron is connected to a local receptive field of the previous layer, and the local feature is extracted; once a local feature is extracted, its positional relationship to the other features is also determined. The second is the feature mapping part, which includes the fully connected layer, the activation function layer and the SoftMax layer. Each computational layer of the network consists of multiple feature maps; each feature map is a plane, and all neurons in the plane share the same weights.
The multi-scale convolutional neural network embeds a multi-scale layer above the convolutional layer of the traditional convolutional neural network. After a scene picture is input into this multi-scale layer, it is split according to the preset splitting scales of the layer.
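To make the composition of these layers concrete, the sketch below gives a PyTorch-style skeleton of such a network. It is an editorial illustration, not part of the original disclosure: every name in it (MultiScaleSceneNet, conv_features, feat_dim, fused_dim) is an assumption, the convolutional trunk is a placeholder standing in for the GoogLeNet convolutional layers used later in the description, and the block counts (16, 4, 1) follow the scales given below.

```python
import torch
import torch.nn as nn


class MultiScaleSceneNet(nn.Module):
    """Sketch of a multi-scale CNN: shared convolutional trunk, within-scale and
    cross-scale fusion, and a fully connected SoftMax classifier."""

    def __init__(self, num_classes, feat_dim=1024, fused_dim=512):
        super().__init__()
        # Placeholder convolutional trunk; the description below uses the
        # convolutional layers of GoogLeNet in this role.
        self.trunk = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        # Per-block pre-fusion weights (the W_cl / W_cm of the description), shared within a scale.
        self.pre_local = nn.Linear(feat_dim, feat_dim)
        self.pre_mid = nn.Linear(feat_dim, feat_dim)
        # Within-scale fusion (W_fl, W_fm, W_fg) and cross-scale fusion (W_ms).
        self.fuse_local = nn.Linear(16 * feat_dim, fused_dim)
        self.fuse_mid = nn.Linear(4 * feat_dim, fused_dim)
        self.fuse_global = nn.Linear(feat_dim, fused_dim)
        self.fuse_all = nn.Linear(3 * fused_dim, fused_dim)
        # Fully connected classifier; SoftMax is applied by the loss during training.
        self.classifier = nn.Linear(fused_dim, num_classes)

    def conv_features(self, blocks):
        # Run each (N, 3, h, w) scene block through the shared trunk -> (N, feat_dim).
        return [self.trunk(b).flatten(1) for b in blocks]

    def forward(self, local_blocks, mid_blocks, global_block):
        r_l = self.conv_features(local_blocks)        # 16 blocks at the small scale
        r_m = self.conv_features(mid_blocks)          # 4 blocks at the middle scale
        r_g = self.conv_features([global_block])[0]   # the whole scene picture
        R_l = torch.relu(self.fuse_local(torch.cat([torch.relu(self.pre_local(r)) for r in r_l], dim=1)))
        R_m = torch.relu(self.fuse_mid(torch.cat([torch.relu(self.pre_mid(r)) for r in r_m], dim=1)))
        R_g = torch.relu(self.fuse_global(r_g))
        R_ms = torch.relu(self.fuse_all(torch.cat([R_l, R_m, R_g], dim=1)))  # multi-scale scene feature
        return self.classifier(R_ms)
```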
On the basis of the embodiment described with reference to Fig. 1, Fig. 2 is a flow chart of another scene classification method provided by an embodiment of the present invention. As shown in Fig. 2, step S2 includes:
S21, pre-fusing the scene convolution features within each scale to obtain a within-scale scene fusion feature for each scale;
S22, performing feature fusion across scales on the within-scale scene fusion features to obtain the multi-scale scene feature of the scene picture.
In S21, the convolution features at each scale are not a single feature: there may be more than one scene block, and hence more than one convolution feature, per scale. In order to fully exploit the features within each scale, the convolution features within each scale are first pre-fused. The computation can be expressed as:
R_l = σ(W_fl [σ(W_cl r_l,1 + b_cl), …, σ(W_cl r_l,16 + b_cl)] + b_fl)
R_m = σ(W_fm [σ(W_cm r_m,1 + b_cm), …, σ(W_cm r_m,4 + b_cm)] + b_fm)
R_g = σ(W_fg r_g + b_fg)
where σ(·) is the Rectified Linear Unit (ReLU) activation function, r_l,i, r_m,i and r_g are the convolution features of the individual scene blocks at the different scales, R_l, R_m and R_g are the within-scale scene fusion features, and [A, B] denotes the matrix formed by concatenating A and B.
In S22, after the within-scale scene fusion feature of each scale is obtained, feature fusion is performed on these fusion features to obtain a discriminative multi-scale scene feature. The computation can be expressed as:
R_ms = σ(W_ms [R_l, R_m, R_g] + b_ms)
where R_ms is the multi-scale scene feature, and W_ms and b_ms denote the weight and bias of the multi-scale scene feature fusion.
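Read literally, the two fusion stages amount to a handful of matrix multiplications and ReLU applications. The standalone sketch below restates them with PyTorch linear layers standing in for the weight/bias pairs (W_cl, b_cl), (W_fl, b_fl) and so on; the function names, feature dimensions and batch size are invented for illustration and are not taken from the patent.

```python
import torch
import torch.nn as nn

def within_scale_fusion(block_feats, pre_layer, fuse_layer):
    # R = sigma(W_f [sigma(W_c r_1 + b_c), ..., sigma(W_c r_n + b_c)] + b_f)
    pre = [torch.relu(pre_layer(r)) for r in block_feats]   # shared W_c, b_c within the scale
    return torch.relu(fuse_layer(torch.cat(pre, dim=1)))    # W_f, b_f

def cross_scale_fusion(R_l, R_m, R_g, fuse_all):
    # R_ms = sigma(W_ms [R_l, R_m, R_g] + b_ms)
    return torch.relu(fuse_all(torch.cat([R_l, R_m, R_g], dim=1)))

# Example with made-up dimensions: 1024-d block features, 512-d fused features.
feat_dim, fused_dim, batch = 1024, 512, 2
pre_l, fuse_l = nn.Linear(feat_dim, feat_dim), nn.Linear(16 * feat_dim, fused_dim)
pre_m, fuse_m = nn.Linear(feat_dim, feat_dim), nn.Linear(4 * feat_dim, fused_dim)
fuse_g = nn.Linear(feat_dim, fused_dim)
fuse_all = nn.Linear(3 * fused_dim, fused_dim)

r_l = [torch.randn(batch, feat_dim) for _ in range(16)]  # 16 small-scale blocks
r_m = [torch.randn(batch, feat_dim) for _ in range(4)]   # 4 middle-scale blocks
r_g = torch.randn(batch, feat_dim)                       # the whole picture

R_l = within_scale_fusion(r_l, pre_l, fuse_l)
R_m = within_scale_fusion(r_m, pre_m, fuse_m)
R_g = torch.relu(fuse_g(r_g))                            # the global scale needs no pre-fusion
R_ms = cross_scale_fusion(R_l, R_m, R_g, fuse_all)       # multi-scale scene feature
```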
By pre-fusing the convolution features of the scene blocks within each scale, the embodiment of the present invention fully exploits the associations between the features at each scale and provides a better basis for the feature fusion across scales.
On the basis of the above embodiments, the fusion processes in step S21 and step S22 both use the ReLU activation function to fuse the features.
The ReLU activation function greatly accelerates convergence, so that the features are fused faster during feature fusion, and it alleviates the vanishing gradient problem, so that the gradients do not saturate quickly and the fusion updates are more efficient.
On the basis of the embodiment described with reference to Fig. 1, step S1 includes:
splitting the input scene picture into scene blocks at multiple scales in the multi-scale layer;
extracting the scene convolution features of the scene blocks at each scale in the convolutional layer.
Specifically, a multi-scale layer is established in the convolutional neural network. When a scene picture is input, the multi-scale layer automatically cuts the input scene picture into blocks at its preset scales, for example:
e(x) = {l_1, …, l_16, m_1, …, m_4, g}
where x is the input scene picture, e(·) is the multi-scale operation, l_i are the scene blocks at the 86 × 86 scale, m_i are the scene blocks at the 140 × 140 scale, and g is the scene block at the 224 × 224 scale.
It should be noted that the embodiment of the present invention is not limited to specific scales; the scales given above are only for reference.
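As one concrete (and purely illustrative) reading of the splitting operation e(x), the sketch below crops a 224 × 224 scene picture into sixteen 86 × 86 blocks on an overlapping 4 × 4 grid, four 140 × 140 blocks on a 2 × 2 grid, and the whole picture; the grid layout and the helper name multi_scale_split are assumptions, since the text fixes only the number of blocks and their sizes.

```python
import torch

def multi_scale_split(picture, block_size, grid):
    # Crop `grid` x `grid` blocks of side `block_size` from a (C, H, W) picture,
    # evenly spaced so the crops cover the whole picture (with overlap if needed).
    _, H, W = picture.shape
    step_h = (H - block_size) // (grid - 1) if grid > 1 else 0
    step_w = (W - block_size) // (grid - 1) if grid > 1 else 0
    blocks = []
    for i in range(grid):
        for j in range(grid):
            top, left = i * step_h, j * step_w
            blocks.append(picture[:, top:top + block_size, left:left + block_size])
    return blocks

x = torch.randn(3, 224, 224)             # input scene picture
l_blocks = multi_scale_split(x, 86, 4)   # 16 blocks at 86 x 86
m_blocks = multi_scale_split(x, 140, 2)  # 4 blocks at 140 x 140
g_block = x                              # the whole picture at 224 x 224
assert len(l_blocks) == 16 and len(m_blocks) == 4
```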
A convolution module is established in the multi-scale convolutional neural network to extract the scene block features of the scene blocks at each scale. Here, all convolutional layers of GoogLeNet are used as the convolution module of the multi-scale convolutional neural network of the present invention, and the scene block features are extracted at each scale:
r_l,i = GoogLeNet(l_i)
r_m,i = GoogLeNet(m_i)
r_g = GoogLeNet(g)
where r_l,i, r_m,i and r_g are the convolution features of the individual scene blocks at the different scales.
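Assuming torchvision's GoogLeNet as a stand-in for "all convolutional layers of GoogLeNet" (the patent does not name a particular implementation, and the builder arguments below depend on the torchvision version), per-block convolution features could be extracted roughly as follows; the dummy blocks merely stand in for the crops produced by the splitting step above.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Use torchvision's GoogLeNet, minus its final fully connected classifier, as the
# shared convolution module; the adaptive average pooling at the end yields a
# 1024-dimensional convolution feature for any block size.
googlenet = models.googlenet(weights=None, aux_logits=False, init_weights=True)
trunk = nn.Sequential(*list(googlenet.children())[:-1])
trunk.eval()

def extract_block_features(blocks):
    # blocks: list of (3, h, w) scene blocks -> list of (1, 1024) convolution features
    with torch.no_grad():
        return [trunk(b.unsqueeze(0)).flatten(1) for b in blocks]

# Dummy scene blocks standing in for the crops produced by the splitting sketch above.
l_blocks = [torch.randn(3, 86, 86) for _ in range(16)]
m_blocks = [torch.randn(3, 140, 140) for _ in range(4)]
g_block = torch.randn(3, 224, 224)

r_l = extract_block_features(l_blocks)      # r_l,i = GoogLeNet(l_i)
r_m = extract_block_features(m_blocks)      # r_m,i = GoogLeNet(m_i)
r_g = extract_block_features([g_block])[0]  # r_g   = GoogLeNet(g)
```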
By splitting the scene picture into scene blocks at multiple scales and extracting their convolution features with a shared convolution module, the embodiment of the present invention obtains the per-scale features on which the subsequent within-scale and cross-scale fusion is based.
On the basis of the embodiment described with reference to Fig. 1, step S3 includes:
building an objective function in the multi-scale convolutional neural network;
based on the multi-scale scene feature and the objective function, building a scene feature classifier at the SoftMax layer to complete scene classification.
The multi-scale convolutional neural network classifier is built on the feature mapping layers to classify the picture; in general, a softmax classifier is used as the classifier of the multi-scale convolutional neural network.
On the basis of the above embodiments, the objective function is:
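The formula itself appears only as an image in the published text, so it cannot be copied here verbatim. Given the symbol definitions in the following sentence, the natural reading is a standard softmax cross-entropy objective over the multi-scale scene feature; the LaTeX reconstruction below is an editorial assumption rather than a quotation of the patent:

J(W, b) = -\frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{C} \mathbf{1}\{ y^{(i)} = j \} \log \frac{\exp\big( W_j R_{ms}^{(i)} + b_j \big)}{\sum_{k=1}^{C} \exp\big( W_k R_{ms}^{(i)} + b_k \big)}

where \mathbf{1}\{\cdot\} is the indicator function and W_j, b_j are the classifier weight row and bias for class j.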
where {x, y} denotes an input scene picture x and its label y, M is the number of input scene pictures, C is the number of scene classes, R_ms is the multi-scale scene feature, and W and b are the weight and bias of the multi-scale convolutional neural network classifier.
According to the objective function, a classifier with the same number of outputs as the number of scene classes of the input scene pictures is built, and during training the multi-scale scene feature is adjusted according to the classifier weight and bias controlled by the objective function; when the objective function converges, the multi-scale scene feature is optimal.
By building the classifier directly within the multi-scale convolutional neural network, the embodiment of the present invention realizes a multi-task strategy for the scene classification task and enhances the performance of the whole multi-scale convolutional neural network structure.
On the basis of the above embodiments, training the multi-scale convolutional neural network includes:
training the multi-scale convolutional neural network with the stochastic gradient descent method with momentum until the objective function converges.
The stochastic gradient descent method with momentum randomly selects a subset of the training data in place of the whole training set and performs gradient descent on that randomly selected subset until the objective function converges. The advantage of this method is that it saves iteration time and improves training efficiency, so that the objective function converges more quickly.
By training the multi-scale convolutional neural network with the stochastic gradient descent method with momentum, the embodiment of the present invention accelerates training and achieves higher training efficiency.
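A minimal training-loop sketch for this step is given below. It reuses the illustrative MultiScaleSceneNet and multi_scale_split names introduced earlier, and the class count, learning rate and momentum values are placeholders rather than values disclosed in the patent.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Assumes the MultiScaleSceneNet and multi_scale_split sketches introduced earlier.
model = MultiScaleSceneNet(num_classes=67)  # e.g. 67 scene classes as in MIT Indoor 67
criterion = nn.CrossEntropyLoss()           # softmax classifier + cross-entropy objective
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_epoch(loader):
    # loader yields (pictures, labels) with pictures of shape (B, 3, 224, 224).
    model.train()
    total_loss = 0.0
    for pictures, labels in loader:
        optimizer.zero_grad()
        logits = []
        for p in pictures:  # handle one scene picture at a time, for clarity
            l_blocks = [b.unsqueeze(0) for b in multi_scale_split(p, 86, 4)]
            m_blocks = [b.unsqueeze(0) for b in multi_scale_split(p, 140, 2)]
            logits.append(model(l_blocks, m_blocks, p.unsqueeze(0)))
        loss = criterion(torch.cat(logits, dim=0), labels)
        loss.backward()
        optimizer.step()  # one step of stochastic gradient descent with momentum
        total_loss += loss.item()
    return total_loss

# Training would repeat train_epoch on randomly drawn mini-batches until the
# objective function converges.
```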
Specifically, the multi-scale scene features of the input scene pictures are extracted first, and the input scene pictures are then classified according to the extracted multi-scale scene features. In order to further verify the use of multi-scale scene features, the embodiment of the present invention simulates the above scene classification method.
1. Simulation conditions
The simulations of the embodiment of the present invention were carried out with MATLAB on a machine with an Intel(R) Core i7-5930K 3.50 GHz CPU, a GeForce GTX Titan X GPU, 64 GB of memory and a Linux operating system.
The simulation experiments use the MIT Indoor 67 database provided by A. Quattoni et al. of the Massachusetts Institute of Technology (MIT) and the SUN 397 database provided by J. Xiao et al.
2. Simulation content
80% of the pictures in each database were randomly selected as the training sample set and the remaining 20% as the test sample set. The training sample set was input into the constructed multi-scale convolutional neural network to train it; after training, the test sample set was input and classified. The obtained classification results were compared with the ground truth of the test sample set, the number r of correctly classified samples was counted, and the classification accuracy is then:
Acc = r / R * 100%
where R is the number of samples in the test sample set.
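The evaluation protocol described here reduces to a random 80/20 split and a correct-count ratio, which a short sketch can make explicit; the helper names and the predict/train_fn callables below are invented for illustration and stand in for training and running the multi-scale network.

```python
import random

def evaluate_split(samples, train_fn, predict, seed=0):
    # samples: list of (picture, label) pairs;
    # train_fn(train_set) -> trained model; predict(model, picture) -> predicted label.
    random.Random(seed).shuffle(samples)
    split = int(0.8 * len(samples))                   # 80% training, 20% test
    train_set, test_set = samples[:split], samples[split:]
    model = train_fn(train_set)                       # train the multi-scale CNN
    r = sum(1 for pic, label in test_set if predict(model, pic) == label)
    R = len(test_set)
    return 100.0 * r / R                              # Acc = r / R * 100%
```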
For comparison, the multi-scale orderless pooling network (MOP-CNN), the SFV method, the directed acyclic graph network (DAG-CNN) and the DSP method were tested on the same databases, and the classification accuracy of these five methods (the four baselines plus the present invention) on each database was counted.
The classification accuracies on the MIT Indoor 67 database are shown in Table 1:
Table 1: Scene classification accuracy on the MIT Indoor 67 database
Classification method    Classification accuracy
MOP-CNN                  68.88%
SFV                      72.86%
DAG-CNN                  77.50%
DSP                      78.28%
The present invention    80.90%
The classification accuracies on the SUN 397 database are shown in Table 2:
Table 2: Scene classification accuracy on the SUN 397 database
Classification method    Classification accuracy
MOP-CNN                  51.98%
SFV                      54.40%
DAG-CNN                  56.20%
DSP                      59.78%
The present invention    62.24%
As can be seen from the simulation data in Tables 1 and 2, the scene classification method provided by the embodiment of the present invention achieves a clear improvement in classification accuracy. This is because the present invention builds a multi-scale layer on top of the original convolutional neural network and fuses the scene features between the scale layers, obtaining a discriminative multi-scale scene feature that markedly improves the classification accuracy.
Fig. 3 shows a scene classification device provided by an embodiment of the present invention, including:
an extraction module 1, configured to extract, based on a multi-scale convolutional neural network, the scene convolution features of an input scene picture at each scale;
a fusion module 2, configured to perform feature fusion on the scene convolution features at each scale to obtain a multi-scale scene feature of the scene picture;
a classification module 3, configured to complete scene classification in the multi-scale convolutional neural network based on the multi-scale scene feature. An embodiment of the present invention further provides a storage device storing a plurality of instructions suitable for being loaded and executed by a processor.
The extraction module 1 extracts, based on a trained multi-scale convolutional neural network, the scene convolution features of the input scene picture at each scale.
The fusion module 2 performs feature fusion on the scene convolution features at each scale using a two-stage fusion strategy: on the basis of pre-fusion within each scale, a second fusion is performed across scales to obtain the multi-scale scene feature of the scene picture. This multi-scale scene feature is highly discriminative, is very robust to geometric rotation, and effectively addresses the problem of partially occluded scene pictures.
The classification module 3 completes scene classification directly within the multi-scale convolutional neural network, without relying on an external classifier; the working performance of the multi-scale convolutional neural network is fully exploited, and scene classification is completed while feature extraction is performed.
By constructing the extraction module, the fusion module and the classification module, the scene classification device provided by the embodiment of the present invention fully exploits the relationship between scene features at different scales, extracts discriminative multi-scale scene features, and improves the accuracy of scene classification.
This embodiment provides a scene classification device, including: at least one processor; and at least one memory communicatively connected to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the methods provided by the above method embodiments, for example including: based on a multi-scale convolutional neural network, extracting the scene convolution features of an input scene picture at each scale; performing feature fusion on the scene convolution features at each scale to obtain a multi-scale scene feature of the scene picture; and based on the multi-scale scene feature, completing scene classification within the multi-scale convolutional neural network.
This embodiment discloses a computer program product. The computer program product includes a computer program stored on a non-transitory computer readable storage medium, and the computer program includes program instructions which, when executed by a computer, enable the computer to execute the methods provided by the above method embodiments, for example including: based on a multi-scale convolutional neural network, extracting the scene convolution features of an input scene picture at each scale; performing feature fusion on the scene convolution features at each scale to obtain a multi-scale scene feature of the scene picture; and based on the multi-scale scene feature, completing scene classification within the multi-scale convolutional neural network.
This embodiment provides a non-transitory computer readable storage medium storing computer instructions which cause the computer to execute the methods provided by the above method embodiments, for example including: based on a multi-scale convolutional neural network, extracting the scene convolution features of an input scene picture at each scale; performing feature fusion on the scene convolution features at each scale to obtain a multi-scale scene feature of the scene picture; and based on the multi-scale scene feature, completing scene classification within the multi-scale convolutional neural network.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions. The aforementioned program may be stored in a computer readable storage medium, and when the program is executed it performs the steps of the above method embodiments; the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the essence of the above technical solutions, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer readable storage medium such as ROM/RAM, a magnetic disk or an optical disk, and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, the methods of the present application are only preferred embodiments and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (11)

1. A scene classification method, characterized by including:
S1, based on a multi-scale convolutional neural network, extracting the scene convolution features of an input scene picture at each scale;
S2, performing feature fusion on the scene convolution features at each scale to obtain a multi-scale scene feature of the scene picture;
S3, based on the multi-scale scene feature, completing scene classification within the multi-scale convolutional neural network.
2. The method according to claim 1, wherein the multi-scale convolutional neural network includes a convolutional layer, a fully connected layer, an activation function layer, a SoftMax layer, and a multi-scale layer built on top of the convolutional layer.
3. The method according to claim 1, characterized in that step S2 includes:
S21, pre-fusing the scene convolution features within each scale to obtain a within-scale scene fusion feature for each scale;
S22, performing feature fusion across scales on the within-scale scene fusion features to obtain the multi-scale scene feature of the scene picture.
4. The method according to claim 3, characterized in that the fusion processes in step S21 and step S22 both use the ReLU activation function to fuse the features.
5. The method according to claim 2, characterized in that step S1 includes:
splitting the input scene picture into scene blocks at multiple scales in the multi-scale layer;
extracting the scene convolution features of the scene blocks at each scale in the convolutional layer.
6. The method according to claim 2, characterized in that step S3 includes:
building an objective function in the multi-scale convolutional neural network;
based on the multi-scale scene feature and the objective function, building a scene feature classifier at the SoftMax layer to complete scene classification.
7. The method according to claim 6, characterized in that in the objective function:
{x, y} denotes an input scene picture x and its label y, M is the number of input scene pictures, C is the number of scene classes, R_ms is the multi-scale scene feature, and W and b are the weight and bias of the multi-scale convolutional neural network classifier.
8. The method according to claim 7, characterized in that the method further includes:
training the multi-scale convolutional neural network with the stochastic gradient descent method with momentum until the objective function converges.
9. A scene classification device, characterized by including:
an extraction module, configured to extract, based on a multi-scale convolutional neural network, the scene convolution features of an input scene picture at each scale;
a fusion module, configured to perform feature fusion on the scene convolution features at each scale to obtain a multi-scale scene feature of the scene picture;
a classification module, configured to complete scene classification in the multi-scale convolutional neural network based on the multi-scale scene feature.
10. A computer program product, characterized in that the computer program product includes a computer program stored on a non-transitory computer readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to execute the method according to any one of claims 1 to 8.
11. A non-transitory computer readable storage medium, characterized in that the non-transitory computer readable storage medium stores computer instructions which cause the computer to execute the method according to any one of claims 1 to 8.
CN201710313796.XA 2017-05-05 2017-05-05 A kind of scene classification method and device Pending CN108805152A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710313796.XA CN108805152A (en) 2017-05-05 2017-05-05 A kind of scene classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710313796.XA CN108805152A (en) 2017-05-05 2017-05-05 A kind of scene classification method and device

Publications (1)

Publication Number Publication Date
CN108805152A true CN108805152A (en) 2018-11-13

Family

ID=64054840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710313796.XA Pending CN108805152A (en) 2017-05-05 2017-05-05 A kind of scene classification method and device

Country Status (1)

Country Link
CN (1) CN108805152A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670516A (en) * 2018-12-19 2019-04-23 广东工业大学 A kind of image characteristic extracting method, device, equipment and readable storage medium storing program for executing
CN109784159A (en) * 2018-12-11 2019-05-21 北京航空航天大学 The processing method of scene image, apparatus and system
CN113033527A (en) * 2021-05-27 2021-06-25 北京三快在线科技有限公司 Scene recognition method and device, storage medium and unmanned equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105787510A (en) * 2016-02-26 2016-07-20 华东理工大学 System and method for realizing subway scene classification based on deep learning
CN105956532A (en) * 2016-04-25 2016-09-21 大连理工大学 Traffic scene classification method based on multi-scale convolution neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105787510A (en) * 2016-02-26 2016-07-20 华东理工大学 System and method for realizing subway scene classification based on deep learning
CN105956532A (en) * 2016-04-25 2016-09-21 大连理工大学 Traffic scene classification method based on multi-scale convolution neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QINGSHAN LIU et al.: "Adaptive Deep Pyramid Matching for Remote Sensing Scene Classification", IEEE Transactions on Geoscience and Remote Sensing *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784159A (en) * 2018-12-11 2019-05-21 北京航空航天大学 The processing method of scene image, apparatus and system
CN109670516A (en) * 2018-12-19 2019-04-23 广东工业大学 A kind of image characteristic extracting method, device, equipment and readable storage medium storing program for executing
CN113033527A (en) * 2021-05-27 2021-06-25 北京三快在线科技有限公司 Scene recognition method and device, storage medium and unmanned equipment

Similar Documents

Publication Publication Date Title
CN109711410A (en) Three-dimensional object rapid segmentation and identification method, device and system
CN110084173A (en) Number of people detection method and device
Salman et al. Classification of real and fake human faces using deep learning
CN102722713B (en) Handwritten numeral recognition method based on lie group structure data and system thereof
CN106529499A (en) Fourier descriptor and gait energy image fusion feature-based gait identification method
CN111160473A (en) Feature mining method and device for classified labels
CN111833322B (en) Garbage multi-target detection method based on improved YOLOv3
CN105701120A (en) Method and apparatus for determining semantic matching degree
CN110288605A (en) Cell image segmentation method and device
CN110059656B (en) Method and system for classifying white blood cells based on convolution countermeasure generation neural network
CN112215696A (en) Personal credit evaluation and interpretation method, device, equipment and storage medium based on time sequence attribution analysis
CN114341880A (en) Techniques for visualizing operation of neural networks
CN110008853A (en) Pedestrian detection network and model training method, detection method, medium, equipment
CN111339935A (en) Optical remote sensing picture classification method based on interpretable CNN image classification model
CN108805152A (en) A kind of scene classification method and device
CN107368526A (en) A kind of data processing method and device
CN108763096A (en) Software Defects Predict Methods based on depth belief network algorithm support vector machines
CN112132014A (en) Target re-identification method and system based on non-supervised pyramid similarity learning
CN109993187A (en) A kind of modeling method, robot and the storage device of object category for identification
CN112308825A (en) SqueezeNet-based crop leaf disease identification method
US11615321B2 (en) Techniques for modifying the operation of neural networks
CN108229505A (en) Image classification method based on FISHER multistage dictionary learnings
CN117235633A (en) Mechanism classification method, mechanism classification device, computer equipment and storage medium
CN109934352B (en) Automatic evolution method of intelligent model
CN103473562B (en) Automatic training and identifying system for specific human body action

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20191227

Address after: 518109 first floor, building 1b, yunantong Industrial Park, langrong Road, Dalang community, Dalang street, Longhua District, Shenzhen City, Guangdong Province

Applicant after: Shenzhen Jinghong Technology Co., Ltd

Address before: 201203 Shanghai Pudong New Area Shanghai free trade trial area, 1 spring 3, 400 Fang Chun road.

Applicant before: Shanghai Jinghong Electronic Technology Co., Ltd.

RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20181113