CN111652262A

CN111652262A - Image object recognition method and device, computer equipment and storage medium

Info

Publication number: CN111652262A
Application number: CN202010198070.8A
Authority: CN
Inventors: 王国彬; 胡鹏; 侯兴兴
Original assignee: Shenzhen Bincent Technology Co Ltd
Current assignee: Shenzhen Bincent Technology Co Ltd
Priority date: 2020-03-19
Filing date: 2020-03-19
Publication date: 2020-09-11

Abstract

The invention provides an image object recognition method and apparatus, a computer device, and a storage medium. The image object recognition method comprises: building a convolutional neural network and inputting an image to be recognized into the convolutional neural network to obtain m convolutional feature maps; compressing each convolutional feature map into an n-dimensional vector to obtain m n-dimensional vectors, flattening the m n-dimensional vectors into one dimension, and learning through a fully connected layer to obtain an image representation; and classifying the image representation and outputting the classification probability of the image to be recognized. The method achieves end-to-end learning and extracts features more conveniently than manual extraction; it uses transfer learning, which reduces the training cost; and by computing correlation features of the feature maps and learning them with a shallow network, it obtains better combined features and improves object recognition performance.

Description

Image object recognition method and device, computer equipment and storage medium

Technical Field

The present invention relates to the field of image recognition technologies, and in particular, to an image object recognition method, an image object recognition apparatus, a computer device, and a storage medium.

Background

In the prior art, image objects are recognized either from manually extracted features or from image features extracted by a common CNN classification model. However, manually extracting image texture features as the style representation of the image and building a mathematical or statistical model on them is too inefficient to be widely applied. The high-level features extracted by a CNN classification model, in turn, often contain combinations of abstract features such as color, texture, and style, and this high degree of abstraction loses many details, so that image object recognition performs poorly.

Disclosure of Invention

The invention aims to provide an image object identification method, an image object identification device, computer equipment and a storage medium, which can improve the performance of image object identification.

The present invention is achieved as follows. A first aspect of the present invention provides an image object recognition method, comprising:

building a convolutional neural network, and inputting an image to be recognized into the convolutional neural network to obtain m convolutional feature maps, wherein m is greater than 1;

compressing each convolutional feature map into an n-dimensional vector to obtain m n-dimensional vectors, flattening the m n-dimensional vectors into one dimension, and learning through a fully connected layer to obtain an image representation;

and classifying the image representation and outputting the classification probability of the image to be recognized.

A second aspect of the present invention provides an image object recognition apparatus comprising:

a feature map acquisition module, configured to build a convolutional neural network and input an image to be recognized into the convolutional neural network to obtain m convolutional feature maps, wherein m is greater than 1;

an image representation output module, configured to compress each convolutional feature map into an n-dimensional vector to obtain m n-dimensional vectors, flatten the m n-dimensional vectors into one dimension, and learn through a fully connected layer to obtain an image representation;

and a classification module, configured to classify the image representation and output the classification probability of the image to be recognized.

A third aspect of the invention provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to the first aspect of the invention when executing the computer program.

A fourth aspect of the invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method according to the first aspect of the invention.

The invention provides an image object recognition method and apparatus, a computer device, and a storage medium. The image object recognition method comprises: building a convolutional neural network and inputting an image to be recognized into the convolutional neural network to obtain m convolutional feature maps; compressing each convolutional feature map into an n-dimensional vector to obtain m n-dimensional vectors, flattening the m n-dimensional vectors into one dimension, and learning through a fully connected layer to obtain an image representation; and classifying the image representation and outputting the classification probability of the image to be recognized. The method achieves end-to-end learning and extracts features more conveniently than manual extraction; it uses transfer learning, which reduces the training cost; and by computing correlation features of the feature maps and learning them with a shallow network, it obtains better combined features and improves object recognition performance.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a flowchart of an image object recognition method according to embodiment 1 of the present invention;

Fig. 2 is a flowchart of an implementation of step S11 in the image object recognition method according to embodiment 1 of the present invention;

Fig. 3 is a schematic diagram of an implementation of step S12 in the image object recognition method according to embodiment 1 of the present invention;

Fig. 4 is a schematic diagram of the working process of the image object recognition method according to embodiment 1 of the present invention;

Fig. 5 is a schematic structural diagram of an image object recognition apparatus according to embodiment 2 of the present invention;

Fig. 6 is a schematic structural diagram of a computer device according to embodiment 4 of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In order to explain the technical means of the present invention, the following description will be given by way of specific examples.

Example 1

Embodiment 1 of the present invention provides an image object recognition method. As shown in fig. 1, the image object recognition method includes:

S11, building a convolutional neural network, and inputting the image to be recognized into the convolutional neural network to obtain m convolutional feature maps, wherein m is greater than 1.

In step S11, building the convolutional neural network means building a VGG16 model with 5 convolutional blocks, and the image to be recognized is input into the VGG16 model to obtain the m convolutional feature maps.

Inputting the image to be recognized into the VGG16 model to obtain the m convolutional feature maps comprises:

inputting the image to be recognized into the VGG16 model to obtain 512 convolutional feature maps of size 7 × 7.

As shown in fig. 2, a 224 × 224 × 3 picture is input. After two convolutions with 64 convolution kernels, one pooling operation yields a 112 × 112 × 64 output; after two convolutions with 128 convolution kernels, one pooling operation yields a 56 × 56 × 128 output; after three convolutions with 256 convolution kernels, one pooling operation yields a 28 × 28 × 256 output; after three convolutions with 512 convolution kernels, one pooling operation yields a 14 × 14 × 512 output; and after three further convolutions with 512 convolution kernels, one pooling operation yields a 7 × 7 × 512 output, that is, 512 convolutional feature maps of size 7 × 7.
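The size progression in fig. 2 can be checked with a short shape trace. The following is only a sketch: the function name `trace_shapes` and the table `VGG16_BLOCKS` are hypothetical, and it assumes what the standard VGG16 design uses but the text does not state, namely that each convolution preserves height and width ("same" padding) and that each pooling halves them (2 × 2 window, stride 2).

```python
# Hypothetical helper that traces output shapes through five VGG16-style blocks.
# Assumptions (not stated in the text): convolutions keep height and width,
# and every pooling step halves height and width.

# (number of convolutions, number of kernels) for each block
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_shapes(height=224, width=224, blocks=VGG16_BLOCKS):
    """Return the (height, width, channels) shape after each block's pooling."""
    shapes = []
    for _num_convs, kernels in blocks:
        # The convolutions only change the channel count to `kernels`;
        # the pooling that follows halves the spatial size.
        height, width = height // 2, width // 2
        shapes.append((height, width, kernels))
    return shapes


shapes = trace_shapes()
for index, shape in enumerate(shapes, start=1):
    print(f"block {index}: {shape}")
```

Under these assumptions the last entry is (7, 7, 512), so each of the 512 feature maps holds 7 × 7 = 49 values.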

S12, compressing each convolutional feature map into an n-dimensional vector to obtain m n-dimensional vectors, flattening the m n-dimensional vectors into one dimension, and learning through a fully connected layer to obtain an image representation.

Specifically, each convolutional feature map is compressed into a 49-dimensional vector to obtain 512 49-dimensional vectors, the 512 49-dimensional vectors are flattened into a single 1 × 25088 vector, and the 1 × 25088 vector is learned through a fully connected network to obtain the image representation.
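The compression and flattening in step S12 amount to a reshape followed by a concatenation. A minimal sketch with dummy data follows; the function name `compress_and_flatten` is hypothetical, and concatenating the vectors map by map is an assumption, since the text does not fix the element order.

```python
def compress_and_flatten(feature_maps):
    """Compress each H x W map into an (H*W)-vector, then join all vectors."""
    # One vector per feature map, read row by row.
    vectors = [[value for row in fmap for value in row] for fmap in feature_maps]
    # Concatenate the vectors map by map (assumed order).
    flat = [value for vector in vectors for value in vector]
    return vectors, flat


# Dummy input: 512 feature maps of size 7 x 7, where map k is filled with k.
feature_maps = [[[float(k)] * 7 for _ in range(7)] for k in range(512)]
vectors, flat = compress_and_flatten(feature_maps)
print(len(vectors), len(vectors[0]), len(flat))
```

With 512 maps of 49 values each, the flattened vector has 512 × 49 = 25088 elements.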

Specifically, as shown in fig. 4, the fully connected network includes a fully connected (Dense) layer, a batch normalization (BatchNormalization) layer, a Dropout layer, and a softmax layer.
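The order of layers in the fully connected network can be illustrated with a small forward pass in plain Python. This is a sketch under stated assumptions, not the network of the embodiment: all function names, sizes, and weights are hypothetical; the softmax layer is assumed to include its own projection to the number of classes; batch normalization uses stored statistics; and dropout is inactive at inference time.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def batch_norm(x, mean, var, gamma, beta, eps=1e-3):
    """Batch normalization at inference time, using stored statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, m, s, g, b in zip(x, mean, var, gamma, beta)]


def dropout(x, rate, training, rng):
    """Inverted dropout: active only in training, identity at inference."""
    if not training:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def softmax(x):
    """Numerically stable softmax."""
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def head(x, params, training=False, rng=None):
    """Dense -> BatchNormalization -> Dropout -> softmax (with assumed projection)."""
    rng = rng or random.Random(0)
    h = dense(x, params["w1"], params["b1"])
    h = batch_norm(h, params["mean"], params["var"], params["gamma"], params["beta"])
    h = dropout(h, params["rate"], training, rng)
    logits = dense(h, params["w2"], params["b2"])  # assumed class projection
    return softmax(logits)


# Toy sizes for illustration; the embodiment's input would have 25088 elements.
rng = random.Random(42)
in_dim, hidden, classes = 8, 4, 3
params = {
    "w1": [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hidden)],
    "b1": [0.0] * hidden,
    "mean": [0.0] * hidden, "var": [1.0] * hidden,
    "gamma": [1.0] * hidden, "beta": [0.0] * hidden,
    "rate": 0.5,
    "w2": [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(classes)],
    "b2": [0.0] * classes,
}
x = [rng.uniform(0, 1) for _ in range(in_dim)]
probs = head(x, params)
print(probs)
```

The output is one probability per class, and the probabilities sum to 1. The weights here are random, so the values themselves carry no meaning.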

S13, classifying the image representation and outputting the classification probability of the image to be recognized.

The combined features are classified by a classifier, and the label corresponding to the maximum probability value is taken as the output label of the image to be recognized.

Specifically, as shown in fig. 3 and 4, the working process of the present embodiment is as follows:

Firstly, a picture is input and passed through the VGG16 model to obtain the output of "block5_conv1", namely 512 feature maps of size 7 × 7.

Secondly, each obtained feature map is compressed into a 49-dimensional vector to obtain 512 vectors, which are concatenated into a single 1 × 25088 vector. This vector is then passed through a fully connected (Dense) layer, a batch normalization (BatchNormalization) layer, a Dropout layer, and a softmax layer.

Thirdly, the object label to which the picture belongs is output using the mapping between features and objects learned by the classifier. For example, if the probability of tatami is 80% and the probabilities of the remaining classes are each less than 10%, the output indicates that the object in the picture includes a tatami.
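Selecting the output label reduces to taking the class with the largest probability. The sketch below mirrors the tatami example; the function name `predict_label`, the other class names, and the exact probability values are hypothetical.

```python
def predict_label(probabilities, labels):
    """Return the label with the highest probability, and that probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return labels[best], probabilities[best]


# Hypothetical classes and probabilities: tatami at 80%, all others below 10%.
labels = ["tatami", "sofa", "bed", "desk"]
probabilities = [0.80, 0.08, 0.07, 0.05]

label, probability = predict_label(probabilities, labels)
print(label, probability)
```

Here the selected label is "tatami" because 0.80 is the largest value in the list.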

The embodiment of the invention provides an image object recognition method, which comprises: building a convolutional neural network and inputting an image to be recognized into the convolutional neural network to obtain m convolutional feature maps; compressing each convolutional feature map into an n-dimensional vector to obtain m n-dimensional vectors, flattening the m n-dimensional vectors into one dimension, and learning through a fully connected layer to obtain an image representation; and classifying the image representation and outputting the classification probability of the image to be recognized. The method achieves end-to-end learning and extracts features more conveniently than manual extraction; it uses transfer learning, which reduces the training cost; and by computing correlation features of the feature maps and learning them with a shallow network, it obtains better combined features and improves object recognition performance.

Example 2

Embodiment 2 of the present invention provides an image object recognition apparatus, as shown in fig. 5, the image object recognition apparatus including:

a feature map acquisition module 200, configured to build a convolutional neural network and input an image to be recognized into the convolutional neural network to obtain m convolutional feature maps, wherein m is greater than 1;

an image representation output module 201, configured to compress each convolutional feature map into an n-dimensional vector to obtain m n-dimensional vectors, flatten the m n-dimensional vectors into one dimension, and learn through a fully connected layer to obtain an image representation;

and a classification module 202, configured to classify the image representation and output the classification probability of the image to be recognized.

Further, the feature map acquisition module 200 is specifically configured to build a VGG16 model with 5 convolutional blocks and input the image to be recognized into the VGG16 model to obtain the m convolutional feature maps.

Further, the image representation output module 201 is specifically configured to compress each convolutional feature map into a 49-dimensional vector to obtain 512 49-dimensional vectors, flatten the 512 49-dimensional vectors into a single 1 × 25088 vector, and pass the 1 × 25088 vector through the fully connected layer to output the combined features.

Further, the classification module 202 is specifically configured to classify the image representation by a classifier and take the label corresponding to the maximum probability value as the output label of the image to be recognized.

For the specific working process of the module in the computer device, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
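The division into three modules can be pictured as a simple composition of callables. The following is only an illustrative sketch: the class `ImageObjectRecognizer` and the stand-in functions are hypothetical, and the stand-ins are toy computations rather than a trained network.

```python
class ImageObjectRecognizer:
    """Chains three callables that play the roles of modules 200, 201 and 202."""

    def __init__(self, feature_map_module, representation_module, classification_module):
        self.feature_map_module = feature_map_module        # role of module 200
        self.representation_module = representation_module  # role of module 201
        self.classification_module = classification_module  # role of module 202

    def recognize(self, image):
        feature_maps = self.feature_map_module(image)
        representation = self.representation_module(feature_maps)
        return self.classification_module(representation)


# Stand-in modules, for illustration only.
def fake_feature_maps(image):
    # Pretend backbone: one 2 x 2 "feature map" per input value.
    return [[[v, v], [v, v]] for v in image]


def flatten(feature_maps):
    # Compress each map to a vector and join the vectors.
    return [value for fmap in feature_maps for row in fmap for value in row]


def classify(representation):
    # Toy two-class classifier: normalizes two partial sums into probabilities.
    half = len(representation) // 2
    scores = [sum(representation[:half]), sum(representation[half:])]
    total = sum(scores)
    return [s / total for s in scores]


recognizer = ImageObjectRecognizer(fake_feature_maps, flatten, classify)
probabilities = recognizer.recognize([1.0, 3.0])
print(probabilities)
```

Keeping the three stages behind separate callables means any one of them, for example the backbone, could be swapped without touching the others.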

Example 3

Embodiment 3 of the present invention provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the method of the foregoing embodiments; to avoid repetition, details are not described here again. Alternatively, when executed by the processor, the computer program implements the functions of the modules/units in the above embodiments, which are likewise not described here again.

Example 4

Fig. 6 is a schematic diagram of a computer device in embodiment 4 of the present invention. As shown in fig. 6, the computer device 6 comprises a processor 63, a memory 61 and a computer program 62 stored in the memory 61 and executable on the processor 63. The processor 63, when executing the computer program 62, implements the steps in the above embodiments, such as steps S11, S12 and S13 shown in fig. 1. Alternatively, the functions of the modules/units in the above-described embodiments are implemented when the processor 63 executes the computer program 62.

Illustratively, the computer program 62 may be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 63 to perform the data processing procedures of the present invention. One or more of the modules/units may be a series of computer program segments capable of performing certain functions, which are used to describe the execution of the computer program 62 in the computer device 6. For example, the computer program 62 may be divided into modules as shown in fig. 6, and the specific functions of each module correspond to the steps of the method in embodiment 1 one by one, which are not repeated herein to avoid repetition.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims

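1. An image object recognition method, characterized by comprising:

building a convolutional neural network, and inputting an image to be recognized into the convolutional neural network to obtain m convolutional feature maps, wherein m is greater than 1;

compressing each convolutional feature map into an n-dimensional vector to obtain m n-dimensional vectors, flattening the m n-dimensional vectors into one dimension, and learning through a fully connected layer to obtain an image representation;

and classifying the image representation and outputting the classification probability of the image to be recognized.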

2. The image object recognition method of claim 1, wherein building the convolutional neural network and inputting the image to be recognized into the convolutional neural network to obtain m convolutional feature maps comprises:

building a VGG16 model with 5 convolutional blocks, and inputting the image to be recognized into the VGG16 model to obtain the m convolutional feature maps.

3. The image object recognition method of claim 2, wherein inputting the image to be recognized into the VGG16 model to obtain the m convolutional feature maps comprises:

inputting the image to be recognized into the VGG16 model to obtain 512 convolutional feature maps of size 7 × 7.

4. The image object recognition method of claim 3, wherein compressing each convolutional feature map into an n-dimensional vector to obtain m n-dimensional vectors, flattening the m n-dimensional vectors into one dimension, and learning through a fully connected layer to obtain the image representation comprises:

compressing each convolutional feature map into a 49-dimensional vector to obtain 512 49-dimensional vectors, flattening the 512 49-dimensional vectors into a single 1 × 25088 vector, and learning the 1 × 25088 vector through a fully connected layer to obtain the image representation.

5. The image object recognition method of claim 1, wherein classifying the image representation and outputting the classification probability of the image to be recognized comprises:

classifying the image representation by a classifier, and taking the label corresponding to the maximum probability value as the output label of the image to be recognized.

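6. An image object recognition apparatus, characterized by comprising:

a feature map acquisition module, configured to build a convolutional neural network and input an image to be recognized into the convolutional neural network to obtain m convolutional feature maps, wherein m is greater than 1;

an image representation output module, configured to compress each convolutional feature map into an n-dimensional vector to obtain m n-dimensional vectors, flatten the m n-dimensional vectors into one dimension, and learn through a fully connected layer to obtain an image representation;

and a classification module, configured to classify the image representation and output the classification probability of the image to be recognized.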

7. The image object recognition apparatus of claim 6, wherein the feature map acquisition module is specifically configured to build a VGG16 model with 5 convolutional blocks and input the image to be recognized into the VGG16 model to obtain the m convolutional feature maps.

8. The image object recognition apparatus of claim 6, wherein the image representation output module is specifically configured to compress each convolutional feature map into a 49-dimensional vector to obtain 512 49-dimensional vectors, flatten the 512 49-dimensional vectors into a single 1 × 25088 vector, and learn the 1 × 25088 vector through a fully connected layer to obtain the image representation.

9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 5 when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.