CN108304821B - Image recognition method and device, image acquisition method and device, computer device and non-volatile computer-readable storage medium - Google Patents

Image recognition method and device, image acquisition method and device, computer device and non-volatile computer-readable storage medium Download PDF

Info

Publication number
CN108304821B
CN108304821B
Authority
CN
China
Prior art keywords
image
scene
layer
neural network
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810151420.8A
Other languages
Chinese (zh)
Other versions
CN108304821A (en)
Inventor
张弓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201810151420.8A priority Critical patent/CN108304821B/en
Publication of CN108304821A publication Critical patent/CN108304821A/en
Application granted granted Critical
Publication of CN108304821B publication Critical patent/CN108304821B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses an image recognition method and apparatus based on a multilayer convolutional neural network, an image acquisition method and apparatus, a computer device, and a non-volatile computer-readable storage medium. They construct a multilayer convolutional neural network model with three convolutional layers and two pooling layers, train the model with training images whose resolution is normalized to a first resolution, and test it with test images whose resolution is normalized to a second resolution. Recognition of an image scene can thus be achieved without a fully connected layer, which reduces the complexity of the scene recognition algorithm; the computation required for scene recognition is small and the computation time is short.

Description

Image recognition method and device, image acquisition method and device, computer device and non-volatile computer-readable storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image recognition method based on a multilayer convolutional neural network model, an image recognition apparatus based on a multilayer convolutional neural network model, an image acquisition method, an image acquisition device, a computer device, and a non-volatile computer-readable storage medium.
Background
Existing methods that identify image scenes with manually designed features suffer from long design cycles and poor robustness, and have poor recognition capability for complex image scenes. Existing scene recognition methods based on convolutional neural networks require a fully connected layer, and therefore suffer from a large amount of computation and long computation time.
Disclosure of Invention
The embodiment of the invention provides an image identification method based on a multilayer convolutional neural network model, an image identification device based on the multilayer convolutional neural network model, an image acquisition method, image acquisition equipment, computer equipment and a non-volatile computer readable storage medium.
The invention provides an image identification method based on a multilayer convolutional neural network model, which comprises the following steps:
marking a target category for each pre-collected training image, and preprocessing each training image to obtain a plurality of training images with a first resolution;
setting an initial structure of the multilayer convolutional neural network model, wherein the initial structure comprises a first convolutional layer, a second pooling layer, a third convolutional layer, a fourth pooling layer and a fifth convolutional layer which are sequentially arranged;
calculating to obtain at least one first characteristic image according to the training image with the first resolution and a first parameter of the first layer of convolutional layer;
inputting the first characteristic image into the second layer of pooling layer to calculate a second characteristic image corresponding to the first characteristic image one by one;
calculating to obtain at least one third characteristic image according to the second characteristic image and a third parameter of the third layer of convolutional layer;
inputting the third feature image into the fourth pooling layer to calculate a fourth feature image corresponding to the third feature image one by one;
calculating to obtain at least one fifth characteristic image according to the fourth characteristic image and a fifth parameter of the fifth layer convolution layer;
confirming a scene recognition result of each training image according to the fifth characteristic image;
calculating a loss value of the multilayer convolutional neural network according to the target class and the scene recognition result;
confirming convergence of the multilayer convolutional neural network model when the loss value is smaller than a preset loss value;
pre-processing an acquired test image to obtain a plurality of test images at a second resolution, the second resolution being greater than the first resolution;
inputting the test image to the converged multilayer convolutional neural network model to test the converged multilayer convolutional neural network model; and
and identifying scene categories in the scene images by adopting the tested multilayer convolutional neural network model.
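With the 3×3 receptive fields and window sliding step of 2 used throughout the embodiment, the spatial sizes reported later in the description (31×31, 15×15, 7×7, 3×3) follow from the standard valid-window size formula, and the fifth convolutional layer reduces each map to 1×1, one score per scene category, which is why no fully connected layer is needed. A minimal sketch of this arithmetic (layer names are illustrative, not from the patent):

```python
def conv_out(n, k, stride):
    """Spatial output size of a valid convolution/pooling window of size k."""
    return (n - k) // stride + 1

# 3x3 kernels and window sliding step 2 at every layer, as in the embodiment
size = 64  # the first resolution
for name in ("conv1", "pool2", "conv3", "pool4", "conv5"):
    size = conv_out(size, 3, 2)
    print(name, size)  # conv1 31, pool2 15, conv3 7, pool4 3, conv5 1
```

The final 1×1 output per fifth feature matrix means each scene category receives a single score directly from convolution, with no fully connected layer.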
The invention provides an image acquisition method, which comprises the following steps:
acquiring a scene image;
identifying scene types in the scene images by adopting the multilayer convolutional neural network model;
and adjusting shooting parameters of the camera according to the scene category to acquire a new scene image corresponding to the scene image, wherein the shooting parameters comprise at least one of color temperature, exposure time, sensitivity and exposure compensation.
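The adjustment step above can be sketched as a lookup from the recognized scene category to shooting parameters. All concrete values below are hypothetical; the patent only states that at least one of color temperature, exposure time, sensitivity and exposure compensation is adjusted:

```python
# Hypothetical per-scene shooting parameters (illustrative values only)
SCENE_PARAMS = {
    "sky":         {"color_temp_k": 6500, "exposure_ms": 4,  "iso": 100, "ev": -0.3},
    "forest":      {"color_temp_k": 5200, "exposure_ms": 8,  "iso": 200, "ev": 0.0},
    "dining_room": {"color_temp_k": 3800, "exposure_ms": 16, "iso": 400, "ev": 0.3},
}

def adjust_camera(scene: str) -> dict:
    """Return shooting parameters for the recognized scene (defaults if unknown)."""
    default = {"color_temp_k": 5500, "exposure_ms": 8, "iso": 100, "ev": 0.0}
    return SCENE_PARAMS.get(scene, default)
```

A new scene image would then be captured with the returned parameters applied to the camera.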
The invention provides an image recognition device based on a multilayer convolutional neural network model. The image recognition device comprises a first preprocessing module, a setting module, a first calculating module, a second calculating module, a third calculating module, a fourth calculating module, a fifth calculating module, a first confirming module, a sixth calculating module, a second confirming module, a second preprocessing module, a testing module and a recognition module. The first preprocessing module is used for marking each pre-collected training image with a target class and preprocessing each training image to obtain a plurality of training images with a first resolution. The setting module is used for setting an initial structure of the multilayer convolutional neural network model, and the initial structure comprises a first convolutional layer, a second pooling layer, a third convolutional layer, a fourth pooling layer and a fifth convolutional layer which are sequentially arranged. The first calculation module is used for calculating at least one first feature image according to the training image with the first resolution and a first parameter of the first convolutional layer. The second calculation module is used for inputting the first feature image into the second pooling layer to calculate second feature images corresponding to the first feature images one by one. The third calculation module is used for calculating at least one third feature image according to the second feature image and a third parameter of the third convolutional layer. The fourth calculation module is used for inputting the third feature images into the fourth pooling layer to calculate fourth feature images corresponding to the third feature images one by one.
The fifth calculation module is used for calculating at least one fifth feature image according to the fourth feature image and the fifth parameter of the fifth convolutional layer. The first confirming module is used for confirming the scene recognition result of each training image according to the fifth characteristic image. The sixth calculation module is used for calculating a loss value of the multilayer convolutional neural network according to the target class and the scene recognition result. The second confirming module is used for confirming the convergence of the multilayer convolutional neural network model when the loss value is smaller than a preset loss value. The second preprocessing module is configured to preprocess the acquired test image to obtain a plurality of test images at a second resolution, where the second resolution is greater than the first resolution. The test module is used for inputting the test image to the converged multilayer convolutional neural network model to test the converged multilayer convolutional neural network model. The recognition module is used for recognizing scene categories in the scene images by adopting the tested multilayer convolutional neural network model.
The invention provides an image acquisition apparatus including an acquisition unit and an image recognition device. The acquisition unit is used for acquiring a scene image. The image identification device is used for identifying the scene type in the scene image by adopting the multilayer convolutional neural network model. The acquisition unit is further used for adjusting shooting parameters of the camera according to the scene type to acquire a new scene image corresponding to the scene image, wherein the shooting parameters comprise at least one of color temperature, exposure time, sensitivity and exposure compensation.
The invention provides a computer device, which comprises a memory and a processor, wherein computer readable instructions are stored in the memory, and when the computer readable instructions are executed by the processor, the processor executes the image recognition method and the image acquisition method.
The present invention provides one or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the image recognition methods described above and the image acquisition methods described above.
The image recognition method and apparatus based on the multilayer convolutional neural network model, the image acquisition method and device, the computer device, and the non-volatile computer-readable storage medium construct a multilayer convolutional neural network model with three convolutional layers and two pooling layers, train the model with training images whose resolution is normalized to the first resolution, and test it with test images whose resolution is normalized to the second resolution. Recognition of an image scene can thus be achieved without a fully connected layer, which reduces the complexity of the scene recognition algorithm; the computation required for scene recognition is small and the computation time is short.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of an image recognition method based on a multilayer convolutional neural network model according to some embodiments of the present invention.
FIG. 2 is a block diagram of an image recognition apparatus based on a multi-layer convolutional neural network model according to some embodiments of the present invention.
FIG. 3 is a block diagram of a computer device in accordance with certain embodiments of the invention.
Fig. 4 is a scene schematic diagram of an image recognition method based on a multilayer convolutional neural network model according to some embodiments of the present invention.
FIG. 5 is a flow chart of an image recognition method based on a multi-layer convolutional neural network model according to some embodiments of the present invention.
FIG. 6 is a block diagram of an image recognition apparatus based on a multi-layer convolutional neural network model according to some embodiments of the present invention.
FIG. 7 is a flowchart illustrating an image recognition method based on a multi-layer convolutional neural network model according to some embodiments of the present invention.
FIG. 8 is a block diagram of an image recognition apparatus based on a multi-layer convolutional neural network model according to some embodiments of the present invention.
FIG. 9 is a schematic flow chart diagram of an image acquisition method in accordance with certain embodiments of the present invention.
FIG. 10 is a block schematic diagram of an image acquisition device according to some embodiments of the invention.
FIG. 11 is a block diagram of a computer device in accordance with certain embodiments of the invention.
FIG. 12 is a block diagram of an image processing circuit according to some embodiments of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
Referring to fig. 1, the present invention provides an image recognition method based on a multilayer convolutional neural network model. The image recognition method comprises the following steps:
00: marking a target type of each pre-collected training image, and preprocessing each training image to obtain a plurality of training images with a first resolution;
01: setting an initial structure of a multilayer convolutional neural network model, wherein the initial structure comprises a first convolutional layer, a second pooling layer, a third convolutional layer, a fourth pooling layer and a fifth convolutional layer which are sequentially arranged;
02: calculating to obtain at least one first characteristic image according to the training image with the first resolution and the first parameter of the first layer of convolutional layer;
03: inputting the first characteristic image into a second layer of pooling layer to calculate to obtain second characteristic images corresponding to the first characteristic images one by one;
04: calculating to obtain at least one third characteristic image according to the second characteristic image and a third parameter of the third layer of the convolutional layer;
05: inputting the third characteristic image into a fourth pooling layer to calculate to obtain fourth characteristic images corresponding to the third characteristic images one by one;
06: calculating to obtain at least one fifth characteristic image according to the fourth characteristic image and the fifth parameter of the fifth layer convolution layer;
07: confirming a scene recognition result of each training image according to the fifth characteristic image;
08: calculating a loss value of the multilayer convolutional neural network according to the target category and the scene recognition result;
09: confirming the convergence of the multilayer convolutional neural network model when the loss value is smaller than the preset loss value;
011: preprocessing the acquired test images to obtain a plurality of test images with a second resolution, wherein the second resolution is greater than the first resolution;
012: inputting a test image to the converged multilayer convolutional neural network model to test the converged multilayer convolutional neural network model; and
013: and identifying the scene type in the scene image by adopting the tested multilayer convolutional neural network model.
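Steps 08 and 09 compute a loss from the target category and the scene recognition result, then compare it against a preset loss value to decide convergence. The patent does not name a specific loss function; a softmax cross-entropy loss is one common choice and is sketched here under that assumption:

```python
import numpy as np

def cross_entropy_loss(scores, target):
    """Softmax cross-entropy between per-class scores and the labelled target
    class. An assumed loss: the patent leaves the loss function unspecified."""
    scores = scores - scores.max()            # subtract max for numerical stability
    p = np.exp(scores) / np.exp(scores).sum() # softmax probabilities
    return -np.log(p[target])

scores = np.array([0.2, 2.5, -1.0, 0.3])      # hypothetical 4-class output
loss = cross_entropy_loss(scores, target=1)
converged = loss < 0.5                        # step 09: compare to a preset loss value
```

Training would repeat the forward pass and a parameter update until `converged` holds.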
Referring to fig. 2, the present invention further provides an image recognition apparatus 100 based on the multi-layer convolutional neural network model. The image recognition method based on the multilayer convolutional neural network according to the embodiment of the present invention may be implemented by the image recognition apparatus 100 based on the multilayer convolutional neural network according to the embodiment of the present invention. The image recognition apparatus 100 includes a first preprocessing module 30, a setting module 31, a first calculating module 32, a second calculating module 33, a third calculating module 34, a fourth calculating module 35, a fifth calculating module 36, a first confirming module 37, a sixth calculating module 38, a second confirming module 39, a second preprocessing module 41, a testing module 42, and a recognition module 43. Step 00 may be implemented by the first pre-processing module 30. Step 01 may be implemented by the setting module 31. Step 02 may be implemented by the first calculation module 32. Step 03 may be implemented by the second calculation module 33. Step 04 may be implemented by the third calculation module 34. Step 05 may be implemented by the fourth calculation module 35. Step 06 may be implemented by the fifth calculation module 36. Step 07 may be implemented by the first validation module 37. Step 08 may be implemented by the sixth calculation module 38. Step 09 may be implemented by the second validation module 39. Step 011 can be implemented by the second preprocessing module 41. Step 012 may be implemented by test module 42. Step 013 can be implemented by the recognition module 43.
That is, the first pre-processing module 30 may be configured to label each pre-acquired training image with a target class and pre-process each training image to obtain a plurality of training images at the first resolution. The setting module 31 may be configured to set an initial structure of the multi-layer convolutional neural network model, where the initial structure is a first convolutional layer, a second pooling layer, a third convolutional layer, a fourth pooling layer, and a fifth convolutional layer, which are sequentially arranged. The first calculating module 32 is configured to calculate at least one first feature image according to the training image with the first resolution and the first parameter of the first convolutional layer. The second calculation module 33 may be configured to input the first feature image into the second pooling layer to calculate a second feature image corresponding one-to-one to the first feature image. The third calculating module 34 is configured to calculate at least one third feature image according to the second feature image and a third parameter of the third convolutional layer. The fourth calculating module 35 may be configured to input the third feature image into the fourth pooling layer to calculate a fourth feature image corresponding one-to-one to the third feature image. The fifth calculating module 36 may be configured to calculate at least one fifth feature image according to the fourth feature image and the fifth parameter of the fifth convolutional layer. The first confirming module 37 may be configured to confirm the scene recognition result of each training image according to the fifth feature image. The sixth calculation module 38 may be configured to calculate a loss value of the multi-layer convolutional neural network according to the target class and the scene recognition result.
The second validation module 39 may be configured to validate convergence of the multi-layer convolutional neural network model when the loss value is less than a preset loss value. The second pre-processing module 41 may be configured to pre-process the acquired test images to obtain a plurality of test images at a second resolution, the second resolution being greater than the first resolution. The test module 42 may be used to input the test image to the converged multi-layer convolutional neural network model to test the converged multi-layer convolutional neural network model. The identification module 43 may be configured to identify a scene class in the scene image using the tested multi-layer convolutional neural network model.
Referring to fig. 3, the present invention provides a computer device 1000. The computer device 1000 comprises a memory 61 and a processor 62. The memory 61 stores computer readable instructions 611. The computer readable instructions 611, when executed by the processor 62, cause the processor 62 to: mark a target class for each pre-collected training image, and preprocess each training image to obtain a plurality of training images at a first resolution; set an initial structure of the multilayer convolutional neural network model, the initial structure comprising a first convolutional layer, a second pooling layer, a third convolutional layer, a fourth pooling layer and a fifth convolutional layer arranged in sequence; calculate at least one first feature image from the training images at the first resolution and a first parameter of the first convolutional layer; input the first feature images into the second pooling layer to calculate second feature images corresponding to the first feature images one by one; calculate at least one third feature image from the second feature images and a third parameter of the third convolutional layer; input the third feature images into the fourth pooling layer to calculate fourth feature images corresponding to the third feature images one by one; calculate at least one fifth feature image from the fourth feature images and a fifth parameter of the fifth convolutional layer; confirm a scene recognition result of each training image from the fifth feature images; calculate a loss value of the multilayer convolutional neural network from the target class and the scene recognition result; confirm convergence of the multilayer convolutional neural network model when the loss value is smaller than a preset loss value; preprocess acquired test images to obtain a plurality of test images at a second resolution, the second resolution being greater than the first resolution; input the test images to the converged multilayer convolutional neural network model to test it; and identify the scene category in a scene image using the tested multilayer convolutional neural network model. A loss value smaller than the preset loss value indicates that the recognition accuracy of the multilayer convolutional neural network model is high.
The multilayer convolutional neural network model is used for scene recognition. The first parameters in the model comprise a first feature matrix and a first bias term, the third parameters comprise a third feature matrix and a third bias term, and the fifth parameters comprise a fifth feature matrix and a fifth bias term. There may be a plurality of first, third and fifth feature matrices. The feature matrices extract features from the image so that the image can be classified according to those features; more features are beneficial for image classification.
Referring to fig. 4, specifically, a large number of training images containing scenes are collected first. The training images may come from new media platforms such as Weibo and WeChat, and cover various common scenes, such as sky, coast, grassland, forest and dining room. Each training image may contain one or more scenes, but each training image must have a primary scene, and the proportion of the primary scene in the training image is larger than that of any other scene in the image. Denoting each training image as X, the primary scene in each training image is labelled as the target class Y.
Next, each training image X is preprocessed. The preprocessing operation includes normalizing the resolution of all training images X. It can be understood that training images X obtained from various channels differ widely in resolution; normalizing the resolution before recognizing the scene of a training image X facilitates the training of the multilayer convolutional neural network model and accelerates its convergence. The resolution normalization specifically down-samples each training image X. In an embodiment of the invention, the resolution of all training images X is uniformly normalized to 64×64, that is, the first resolution is 64×64.
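A minimal sketch of the resolution normalization, assuming simple nearest-neighbour down-sampling (the description only states that each training image X is down-sampled to the first resolution of 64×64, not which resampling method is used):

```python
import numpy as np

def normalize_resolution(img: np.ndarray, size: int = 64) -> np.ndarray:
    """Down-sample an HxW(xC) image to size x size by nearest-neighbour index
    sampling -- a stand-in for whatever down-sampling the patent intends."""
    h, w = img.shape[:2]
    rows = (np.arange(size) * h) // size  # source row index for each output row
    cols = (np.arange(size) * w) // size  # source column index for each output column
    return img[rows][:, cols]

batch = [normalize_resolution(np.zeros((480, 640, 3)))]
print(batch[0].shape)  # (64, 64, 3)
```

Every training image, whatever its original resolution, ends up at the same 64×64 first resolution before entering the network.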
Subsequently, the 64×64 training images X are input to the first convolutional layer. The first feature matrices in the first convolutional layer are W_layer1, and their number N1 may be plural; in one embodiment of the invention N1 = 32. Each first feature matrix W_layer1 has a receptive field of size k1×k1; in a specific embodiment of the invention k1 = 3. Defining the first feature image output by the first convolutional layer as F_layer1, then F_layer1 = f(X*W_layer1 + b_layer1), where b_layer1 ∈ R^N1 is the first bias term, R^N1 is the real space associated with N1, and f(x) = max(x, 0) is the activation function. The first feature matrices W_layer1 serve to fit a function that can classify the scene in the image: each W_layer1 extracts a feature from the training image X, so a plurality of first feature matrices extract a plurality of features, which can be used for scene classification and improve its accuracy. The first bias term b_layer1 acts like the intercept of a linear function and improves the accuracy of the fitted classification function. The activation function f(x) = max(x, 0) increases the nonlinearity of the fitted function, which further improves the accuracy of the fitted classification function. In an embodiment of the invention, when the training image X is convolved with a first feature matrix W_layer1, the window sliding step is 2. Since the number N1 of first feature matrices in the first convolutional layer is 32, the first convolutional layer outputs 32 first feature images F_layer1; the i-th first feature image is F^i_layer1 = f(X*W^i_layer1 + b^i_layer1), where i ∈ [1, 32] is a positive integer, and each first feature image F_layer1 has a resolution of 31×31.
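The computation F_layer1 = f(X*W_layer1 + b_layer1) for one feature matrix can be sketched as follows. A single-channel image and a single 3×3 matrix are assumed for brevity; with a window sliding step of 2, a 64×64 input indeed yields a 31×31 first feature image:

```python
import numpy as np

def conv2d_relu(x, w, b, stride=2):
    """Valid convolution of single-channel image x with kernel w plus bias b,
    followed by the activation f(x) = max(x, 0) from the description."""
    k = w.shape[0]
    out = (x.shape[0] - k) // stride + 1
    y = np.empty((out, out))
    for i in range(out):
        for j in range(out):
            patch = x[i*stride:i*stride + k, j*stride:j*stride + k]
            y[i, j] = np.sum(patch * w) + b
    return np.maximum(y, 0.0)  # f(x) = max(x, 0)

x = np.random.rand(64, 64)    # one normalized training image
w = np.random.randn(3, 3)     # one 3x3 first feature matrix W_layer1
f1 = conv2d_relu(x, w, b=0.1)
print(f1.shape)  # (31, 31)
```

In the full model this is repeated for each of the 32 first feature matrices to produce the 32 first feature images.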
Subsequently, the 32 first feature images F_layer1 output by the first convolutional layer are input to the second pooling layer for pooling. In an embodiment of the invention, a maximum pooling method is used to perform the pooling operation on each first feature image F_layer1. Specifically, with a kernel size of 3×3 and a window sliding step of 2 for the second pooling layer, each first feature image F_layer1 passed through the second pooling layer yields a corresponding second feature image F_layer2, and each second feature image F_layer2 has a resolution of 15×15. The number of second feature images F_layer2 output by the second pooling layer is 32.
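The maximum pooling of the second pooling layer (3×3 kernel, window sliding step 2) can be sketched in the same style; a 31×31 first feature image yields a 15×15 second feature image:

```python
import numpy as np

def max_pool(x, k=3, stride=2):
    """3x3 max pooling with window sliding step 2, as in the second pooling layer."""
    out = (x.shape[0] - k) // stride + 1
    y = np.empty((out, out))
    for i in range(out):
        for j in range(out):
            y[i, j] = x[i*stride:i*stride + k, j*stride:j*stride + k].max()
    return y

f2 = max_pool(np.random.rand(31, 31))
print(f2.shape)  # (15, 15)
```

The fourth pooling layer applies the same operation, taking the 7×7 third feature images down to 3×3.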
Subsequently, the 32 second feature images F_layer2 are input to the third convolutional layer. The feature matrices in the third convolutional layer are denoted W_layer3, and their number N_3 may be plural; in one embodiment of the invention, N_3 is 32. Each third feature matrix W_layer3 has a perceptual domain of size k_3×k_3; in a specific embodiment of the invention, k_3 is 3. Defining the third feature image output by the third convolutional layer as F_layer3, then F_layer3 = f(F_layer2 * W_layer3 + b_layer3), where b_layer3 ∈ R^(N_3) is the third bias term, R^(N_3) is the real space associated with N_3, and f(x) = max(x, 0) is the activation function. The role of the third feature matrix W_layer3 is to fit a function that can classify the scene in the image. The third bias term b_layer3 plays a role similar to the intercept in a linear function and can improve the accuracy with which the fitted function classifies scenes. The activation function f(x) = max(x, 0) increases the nonlinearity of the fitted function, which further improves the accuracy of scene classification. In a specific embodiment of the invention, when the second feature images F_layer2 are convolved with the third feature matrices W_layer3, the window sliding step is 2. Since the number N_3 of third feature matrices W_layer3 in the third convolutional layer is 32, the third convolutional layer outputs 32 third feature images F_layer3. For the i-th third feature image F_layer3^i:

F_layer3^i = f( Σ_{j=1}^{32} F_layer2^j * W_layer3^i + b_layer3^i )

That is, each third feature image F_layer3 output by the third convolutional layer is obtained by convolving the plural second feature images F_layer2 with the same third feature matrix W_layer3 and summing the results, where i takes values in [1, 32] (i a positive integer), j takes values in [1, 32] (j a positive integer), and each third feature image F_layer3 has a resolution of 7×7.
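A rough NumPy sketch of this convolution step (an illustrative re-implementation, not the patented embodiment: the weight and bias values are random placeholders, and the convolution is implemented as cross-correlation, as is conventional in CNN code):

```python
import numpy as np

def relu(x):
    # activation function f(x) = max(x, 0)
    return np.maximum(x, 0)

def conv_layer(feature_images, weights, biases, stride):
    """Each output image i is the sum, over all input images j, of the
    correlation of F^j with the same feature matrix W^i, plus a
    per-output bias, passed through the ReLU activation."""
    n_in, h, w = feature_images.shape
    n_out, k, _ = weights.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.zeros((n_out, out_h, out_w))
    for i in range(n_out):
        for y in range(out_h):
            for x in range(out_w):
                patch = feature_images[:, y*stride:y*stride+k, x*stride:x*stride+k]
                # broadcasting applies W^i to every input image j, np.sum adds over j
                out[i, y, x] = np.sum(patch * weights[i]) + biases[i]
    return relu(out)

# 32 second feature images of 15x15, 32 third feature matrices of 3x3,
# window sliding step 2 -> 32 third feature images of 7x7.
rng = np.random.default_rng(0)
F2 = rng.random((32, 15, 15))
W3 = rng.random((32, 3, 3))
b3 = np.zeros(32)
F3 = conv_layer(F2, W3, b3, stride=2)
print(F3.shape)  # (32, 7, 7)
```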
Subsequently, the 32 third feature images F_layer3 output by the third convolutional layer are input to the fourth pooling layer for pooling. In an embodiment of the invention, a maximum pooling method is used to perform the pooling operation on each third feature image F_layer3. Specifically, the kernel size of the fourth pooling layer is 3×3 and the window sliding step is 2, so that each third feature image F_layer3, after passing through the fourth pooling layer, yields a corresponding fourth feature image F_layer4, and each fourth feature image F_layer4 has a resolution of 3×3. The number of fourth feature images F_layer4 output by the fourth pooling layer is 32.
Subsequently, the 32 fourth feature images F_layer4 are input to the fifth convolutional layer. The feature matrices in the fifth convolutional layer are denoted W_layer5, and their number N_5 may be plural; each fifth feature matrix W_layer5 corresponds to one scene category, and in an embodiment of the invention N_5 is 10. Each fifth feature matrix has a perceptual domain of size k_5×k_5; in a specific embodiment of the invention, k_5 is 3. Defining the fifth feature image output by the fifth convolutional layer as F_layer5, then F_layer5 = f(F_layer4 * W_layer5 + b_layer5), where b_layer5 ∈ R^(N_5) is the fifth bias term, R^(N_5) is the real space associated with N_5, and f(x) = max(x, 0) is the activation function. The role of the fifth feature matrix W_layer5 is to fit a function that can classify the scene in the image. The fifth bias term b_layer5 plays a role similar to the intercept in a linear function and can improve the accuracy with which the fitted function classifies scenes. The activation function f(x) = max(x, 0) increases the nonlinearity of the fitted function, which further improves the accuracy of scene classification. In a specific embodiment of the invention, when the fourth feature images F_layer4 are convolved with the fifth feature matrices W_layer5, the window sliding step is 1. Since the number N_5 of fifth feature matrices W_layer5 in the fifth convolutional layer is 10, the fifth convolutional layer outputs 10 fifth feature images F_layer5. For the i-th fifth feature image F_layer5^i:

F_layer5^i = f( Σ_{j=1}^{32} F_layer4^j * W_layer5^i + b_layer5^i )

That is, each fifth feature image F_layer5 output by the fifth convolutional layer is obtained by convolving the plural fourth feature images F_layer4 with the same fifth feature matrix W_layer5 and summing the results, where i takes values in [1, 10] (i a positive integer), j takes values in [1, 32] (j a positive integer), and each fifth feature image F_layer5 has a resolution of 1×1.
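The per-layer resolutions quoted above (15×15 → 7×7 → 3×3 → 1×1) are mutually consistent, which can be checked with the same no-padding window arithmetic (assuming 31×31 first feature images):

```python
def out_size(in_size: int, kernel: int, stride: int) -> int:
    # Output resolution of a valid (no-padding) convolution or pooling window.
    return (in_size - kernel) // stride + 1

size = 31                    # assumed resolution of the first feature images
size = out_size(size, 3, 2)  # second pooling layer       -> 15
size = out_size(size, 3, 2)  # third convolutional layer  -> 7
size = out_size(size, 3, 2)  # fourth pooling layer       -> 3
size = out_size(size, 3, 1)  # fifth convolutional layer  -> 1
print(size)  # 1: each fifth feature image is a single value, one per scene category
```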
Subsequently, from the 10 obtained fifth feature images F_layer5, the fifth feature image with the largest value is selected, and the category corresponding to the fifth feature matrix W_layer5 from which that fifth feature image was calculated is the identified scene category Ŷ. For each training image X, scene recognition can be performed by the above method to obtain a unique scene category Ŷ; thus, the scene category Ŷ of every training image X is obtained by using the multi-layer convolutional neural network model. Then, a loss value Loss of the multi-layer convolutional neural network model is calculated from the target category Y and the recognized category Ŷ:

Loss = (1/N) Σ_{k=1}^{N} I(Ŷ^k ≠ Y^k)

where I(·) takes the value 1 when the recognized category Ŷ^k of the k-th training image differs from its target category Y^k, and 0 otherwise,
wherein N is the number of training images X, k denotes the k-th training image X, and k is a positive integer. When the loss value Loss is smaller than the preset loss value, the multi-layer convolutional neural network model is confirmed to have converged. The preset loss value represents the recognition error rate of the multi-layer convolutional neural network model when it is used for scene recognition. When the loss value Loss is smaller than the preset loss value, the error rate of the multi-layer convolutional neural network model in recognizing scene categories is low; in other words, its recognition accuracy is high. The construction and training of the multi-layer convolutional neural network model are thus completed.
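Since the preset loss value is described as a recognition error rate, the loss computation can plausibly be sketched as the fraction of misclassified training images (an assumption consistent with, though not literally stated by, the text; the category names are hypothetical):

```python
def loss_value(target, recognized):
    """Fraction of training images whose recognized scene category differs
    from the marked target category, i.e. the recognition error rate."""
    assert len(target) == len(recognized)
    n = len(target)
    return sum(1 for y, y_hat in zip(target, recognized) if y != y_hat) / n

# Hypothetical target categories Y and recognized categories Y-hat:
Y     = ["coast", "forest", "night", "coast", "indoor"]
Y_hat = ["coast", "forest", "coast", "coast", "indoor"]

loss = loss_value(Y, Y_hat)
print(loss)  # 0.2 -- one of five training images misclassified

PRESET_LOSS = 0.25
print(loss < PRESET_LOSS)  # True: the model is confirmed to have converged
```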
In the initial stage of model construction, a large number of images are acquired in advance, any two of which are different from each other. These images are divided into training images and test images at a ratio of 4:1. For example, of 3000 collected images, 2400 are used as training images for training the multi-layer convolutional neural network model and 600 are used as test images for testing it. A 4:1 ratio of training images to test images is moderate: it satisfies the recognition-accuracy requirement of the trained multi-layer convolutional neural network model while keeping the time complexity of model construction low.
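The 4:1 division can be sketched as follows (the file names are hypothetical placeholders):

```python
import random

def split_images(images, ratio=(4, 1), seed=0):
    """Split a list of pairwise-distinct images into training and test
    sets at the stated 4:1 ratio."""
    rng = random.Random(seed)
    shuffled = images[:]
    rng.shuffle(shuffled)
    n_train = len(shuffled) * ratio[0] // (ratio[0] + ratio[1])
    return shuffled[:n_train], shuffled[n_train:]

images = [f"img_{i:04d}.jpg" for i in range(3000)]  # hypothetical names
train, test = split_images(images)
print(len(train), len(test))  # 2400 600
```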
Therefore, after the multi-layer convolutional neural network model is trained, it is further tested with the test images. Specifically, a test image is subjected to resolution normalization to obtain a test image at the second resolution; the resolution normalization may be performed by downsampling. Subsequently, the test image at the second resolution is input to the converged multi-layer convolutional neural network model to test it. Testing the converged model with test images helps avoid overfitting of the multi-layer convolutional neural network model. In the training stage, the model is trained at the lower first resolution, which reduces the amount of calculation during model construction; in the testing stage, the model is tested at the higher second resolution, so that the fifth feature images output by the fifth convolutional layer have a higher resolution and can be used for the subsequent extraction of the scene contour.
According to the image recognition method based on the multi-layer convolutional neural network model, the image recognition device 100, and the computer device 1000 described above, a multi-layer convolutional neural network model with three convolutional layers and two pooling layers is constructed, trained with training images whose resolution is normalized to the first resolution, and tested with test images whose resolution is normalized to the second resolution. The image scene can thus be recognized without a fully connected layer, which reduces the complexity of the scene recognition algorithm, so the amount of calculation for scene recognition is small and the calculation time is short.
Referring to fig. 5, in some embodiments, the image recognition method based on the multi-layer convolutional neural network model further includes:
010: modifying the first parameter, the third parameter and the fifth parameter when the loss value is greater than or equal to the preset loss value;
and returning to step 02: calculating at least one first feature image according to the training image at the first resolution and the first parameter of the first convolutional layer.
Referring to fig. 6, in some embodiments, the image recognition apparatus 100 further includes a modification module 40. Step 010 may be implemented by the modification module 40. That is, the modification module 40 may be configured to modify the first parameter, the third parameter, and the fifth parameter when the loss value is greater than or equal to the preset loss value, and enter step 02 after the parameters are modified.
Referring back to fig. 3, in some embodiments, the computer readable instructions 611, when executed by the processor 62, further cause the processor 62 to modify the first parameter, the third parameter, and the fifth parameter when the loss value is greater than or equal to the predetermined loss value, and enter step 02 after the parameters are modified.
Specifically, when the loss value is greater than or equal to the preset loss value, the first characteristic matrix and the first offset item in the first layer of convolutional layer, the third characteristic matrix and the third offset item in the third layer of convolutional layer, and the fifth characteristic matrix and the fifth offset item in the fifth layer of convolutional layer are modified.
It can be understood that when the loss value is larger, the accuracy of the multilayer convolutional neural network model for identifying the image scene is lower, and therefore, the feature matrix and the bias term of each layer should be modified, so that the fitting function formed by the feature matrix and the bias term of each layer can identify the image scene more accurately, and the accuracy of the whole multilayer convolutional neural network model for identifying the image scene is further improved.
Referring to fig. 7, in some embodiments, the image recognition method based on the multi-layer convolutional neural network model further includes, after step 013:
014: performing dilation or erosion processing on the fifth feature image corresponding to the scene category to obtain the contour of the scene.
Referring to fig. 8, in some embodiments, the image recognition apparatus 100 further includes a processing module 44. Step 014 may be implemented by processing module 44. That is, the processing module 44 may be configured to perform dilation or erosion processing on the fifth feature image corresponding to the scene category to obtain the contour of the scene.
Referring back to fig. 3, in some embodiments, the computer readable instructions 611, when executed by the processor 62, further cause the processor 62 to perform an operation of performing a dilation or erosion process on the fifth feature image corresponding to the scene category to obtain a contour of the scene.
Specifically, after the training of the multi-layer convolutional neural network model is completed, an image to be recognized is input. The image to be recognized may be downsampled to reduce its resolution, but the resolution should remain greater than 64×64 so that the resolution of the finally selected fifth feature image is not too small, which facilitates obtaining the scene contour. After the image to be recognized is input, the 10 fifth feature images corresponding to it are output by the fifth convolutional layer, the fifth feature image F_layer5 with the largest value is selected from among them, and the category indicated by the feature matrix corresponding to that fifth feature image is the identified scene category. Subsequently, the regions in the image to be recognized corresponding to the pixels of the selected largest fifth feature image F_layer5 are determined, and dilation and erosion processing is performed on these regions to obtain the contour of the scene.
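Dilation followed by erosion (a morphological closing) cleans up the recovered region before its contour is read off. A self-contained sketch with a pure-NumPy square structuring element (the mask is a hypothetical stand-in for the region recovered from the fifth feature image; a production implementation would typically use a library such as OpenCV):

```python
import numpy as np

def dilate(mask: np.ndarray, k: int = 3) -> np.ndarray:
    """Binary dilation with a k x k square structuring element."""
    pad = k // 2
    padded = np.pad(mask, pad)
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def erode(mask: np.ndarray, k: int = 3) -> np.ndarray:
    """Binary erosion with a k x k square structuring element."""
    pad = k // 2
    padded = np.pad(mask, pad, constant_values=1)
    out = np.ones_like(mask)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

# Hypothetical binary mask marking the region of the image to be recognized
# that corresponds to the pixels of the selected fifth feature image.
mask = np.zeros((16, 16), dtype=np.uint8)
mask[4:12, 4:12] = 1
mask[7, 7] = 0                 # a small hole inside the region

closed = erode(dilate(mask))   # dilation then erosion fills the hole
print(int(closed[7, 7]))       # -> 1: the hole is filled, leaving a clean contour
```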
The present invention also provides one or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors 62, cause the processors 62 to perform a method of constructing a model of a multi-layer convolutional neural network as described in any of the above embodiments.
For example, the computer-executable instructions, when executed by the one or more processors 62, cause the processors 62 to perform the operations of:
00: marking a target type of each pre-collected training image, and preprocessing each training image to obtain a plurality of training images with a first resolution;
01: setting an initial structure of a multilayer convolutional neural network model, wherein the initial structure comprises a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer and a fifth convolutional layer which are sequentially arranged;
02: calculating to obtain at least one first characteristic image according to the training image with the first resolution and the first parameter of the first layer of convolutional layer;
03: inputting the first characteristic image into a second layer of pooling layer to calculate to obtain second characteristic images corresponding to the first characteristic images one by one;
04: calculating to obtain at least one third characteristic image according to the second characteristic image and a third parameter of the third layer of the convolutional layer;
05: inputting the third characteristic image into the fourth pooling layer to calculate to obtain a fourth characteristic image corresponding to the third characteristic image one by one;
06: calculating to obtain at least one fifth characteristic image according to the fourth characteristic image and the fifth parameter of the fifth layer convolution layer;
07: confirming a scene recognition result of each training image according to the fifth characteristic image;
08: calculating a loss value of the multilayer convolutional neural network according to the target category and the scene recognition result;
09: confirming the convergence of the multilayer convolutional neural network model when the loss value is smaller than the preset loss value;
011: preprocessing the acquired test images to obtain a plurality of test images with a second resolution, wherein the second resolution is greater than the first resolution;
012: inputting a test image to the converged multilayer convolutional neural network model to test the converged multilayer convolutional neural network model; and
013: and identifying the scene type in the scene image by adopting the tested multilayer convolutional neural network model.
As another example, the computer-executable instructions, when executed by the one or more processors 62, cause the processors 62 to perform the operations of:
014: performing dilation or erosion processing on the fifth feature image corresponding to the scene category to obtain the contour of the scene.
Referring to fig. 9, the present invention further provides an image capturing method. The image acquisition method comprises the following steps:
21: acquiring a scene image;
22: identifying scene types in the scene images by adopting the multilayer convolutional neural network model in any one of the above embodiments; and
23: and adjusting shooting parameters of the camera according to the scene category to acquire a new scene image corresponding to the scene image, wherein the shooting parameters comprise at least one of color temperature, exposure time, sensitivity and exposure compensation.
Referring to fig. 10, the present invention further provides an image capturing apparatus 200. The image acquisition method of the embodiment of the present invention can be realized by the image acquisition apparatus 200 of the embodiment of the present invention. The image acquisition apparatus 200 includes an acquisition unit 50 and an image recognition device 100. Both step 21 and step 23 may be implemented by the obtaining unit 50. Step 22 may be implemented by image recognition apparatus 100. That is, the acquiring unit 50 may be used to acquire a scene image. The image recognition apparatus 100 may be configured to recognize a scene type in a scene image by using the multi-layer convolutional neural network model according to any of the above embodiments. The obtaining unit 50 may further be configured to adjust shooting parameters of the camera according to the scene category to obtain a new scene image corresponding to the scene image, where the shooting parameters include at least one of color temperature, exposure time, sensitivity, and exposure compensation.
Referring back to fig. 3, when the computer readable instructions 611 are executed by the processor 62, the processor 62 further executes operations of acquiring a scene image, identifying a scene type in the scene image by using the multi-layer convolutional neural network model according to any one of the above embodiments, and adjusting shooting parameters of the camera according to the scene type to acquire a new scene image corresponding to the scene image. Wherein a scene image is captured by the camera 81 (shown in fig. 12), and the processor 62 is connected to the camera 81 to read the scene image.
The photographing parameters including at least one of color temperature, exposure time, sensitivity, and exposure compensation refer to: the photographing parameters may include only color temperature, exposure time, sensitivity, or exposure compensation. The shooting parameters may also include color temperature and exposure time, or color temperature, exposure time and sensitivity, or color temperature, exposure time, sensitivity and exposure compensation.
Specifically, for example, after the camera 81 captures an image of a scene, the processor 62 performs an operation of identifying a scene type in the image of the scene using the multi-layer convolutional neural network model, and if it is identified that the scene is a coast, since the scene on the coast usually has strong sunlight, the exposure time may be appropriately reduced to prevent overexposure of the captured image, and a new image of the scene may be captured with a shorter exposure time.
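One plausible way to organize this scene-dependent adjustment is a lookup table from the recognized scene category to shooting-parameter overrides; the category names and numeric values below are purely illustrative assumptions, not values from the patent:

```python
# Hypothetical shooting-parameter presets keyed by recognized scene category.
SCENE_PARAMS = {
    "coast": {"exposure_time_ms": 2, "iso": 100},    # strong sunlight: short exposure
    "night": {"exposure_time_ms": 50, "iso": 800},   # low light: longer exposure, higher ISO
    "indoor": {"exposure_time_ms": 20, "iso": 400},
}

DEFAULT_PARAMS = {"exposure_time_ms": 10, "iso": 200}

def shooting_params(scene: str) -> dict:
    # Fall back to the defaults for scenes without a dedicated preset.
    return {**DEFAULT_PARAMS, **SCENE_PARAMS.get(scene, {})}

print(shooting_params("coast"))  # shorter exposure to prevent overexposure
```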
Therefore, the multilayer convolution neural network model provided by the embodiment of the invention is used for identifying the scene image, the category of the scene image can be obtained, and the outline of the scene image can be extracted. Furthermore, the shooting parameters of the camera 81 are adjusted according to the identified scene, so that the quality of the new scene image after shooting can be improved, and the use experience of the user is improved.
In addition, in some embodiments, after the new scene image is obtained, since the scene in the new scene image is known, the new scene image can be further processed according to that scene. For example, if the scene in the new scene image is a coast, a section of sea-wave audio is assigned to the new scene image, and the sea-wave audio is played in real time while the user browses the stored new scene image. For another example, if the scene in the new scene image is a forest, a section of audio containing bird song is assigned to the new scene image and played in real time while the user browses it. By adding audio information corresponding to the scene to the new scene image in this way, the enjoyment of shooting and the use experience of the user are improved.
The present invention also provides one or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors 62, cause the processors 62 to perform the following image acquisition methods:
for example, the computer-executable instructions, when executed by the one or more processors 62, cause the processors 62 to perform the operations of:
21: acquiring a scene image;
22: identifying scene types in the scene images by adopting the multilayer convolutional neural network model in any one of the above embodiments; and
23: and adjusting shooting parameters of the camera according to the scene category to acquire a new scene image corresponding to the scene image, wherein the shooting parameters comprise at least one of color temperature, exposure time, sensitivity and exposure compensation.
FIG. 11 is a schematic diagram of the internal modules of the computer device 1000 in one embodiment. As shown in fig. 11, the computer device 1000 includes a processor 62, a memory 61 (e.g., a non-volatile storage medium), an internal memory 63, a display screen 65, and an input device 64, which are connected by a system bus 66. The memory 61 of the computer device 1000 stores an operating system and computer readable instructions 611 (shown in FIG. 3). The computer readable instructions 611 can be executed by the processor 62 to implement the image recognition method based on the multi-layer convolutional neural network model of any of the above embodiments and the image acquisition method of any of the above embodiments. The processor 62 may be used to provide computing and control capabilities that support the operation of the entire computer device 1000. The internal memory 63 of the computer device 1000 provides an environment in which the computer readable instructions 611 in the memory 61 run. The display screen 65 of the computer device 1000 may be a liquid crystal display screen or an electronic ink display screen, and the input device 64 may be a touch layer covering the display screen 65, a button, trackball, or touch pad arranged on the housing of the computer device 1000, or an external keyboard, touch pad, or mouse. The computer device 1000 may be a mobile phone, a tablet computer, a notebook computer, a personal digital assistant, or a wearable device (e.g., a smart bracelet, a smart watch, a smart helmet, or smart glasses). It will be understood by those skilled in the art that the configuration shown in fig. 11 is only a schematic diagram of a part of the configuration related to the solution of the present invention and does not constitute a limitation on the computer device 1000 to which the solution of the present invention is applied; a specific computer device 1000 may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
Referring to fig. 12, the computer device 1000 according to an embodiment of the present invention includes an image processing circuit 80. The image processing circuit 80 may be implemented using hardware and/or software components, and may include various processing units that define an ISP (Image Signal Processing) pipeline. FIG. 12 is a schematic diagram of the image processing circuit 80 in one embodiment. As shown in fig. 12, for convenience of explanation, only the aspects of the image processing technique related to the embodiment of the present invention are shown.
As shown in fig. 12, the image processing circuit includes an ISP processor (which may be the processor 62 or part of the processor 62) and control logic 84. The image data captured by the camera 81 is first processed by the ISP processor 83, and the ISP processor 83 analyzes the image data to capture image statistics that may be used to determine one or more control parameters of the camera 81. The camera 81 may include a lens 811 and an image sensor 812. The image sensor 812 may acquire light intensity and wavelength information captured by each imaging pixel and provide a set of raw image data that may be processed by the ISP processor 83. The sensor 82 (e.g., a gyroscope) may provide parameters of the acquired image processing (e.g., anti-shake parameters) to the ISP processor 83 based on the type of sensor interface. The sensor interface may be an SMIA (Standard Mobile Imaging Architecture) interface, other serial or parallel camera interfaces, or a combination of the above.
In addition, the image sensor 812 may also send raw image data to the sensor 82, the sensor 82 may provide raw image data to the ISP processor 83 based on the sensor interface type, or the sensor may store raw image data in the memory 61.
The ISP processor 83 processes the raw image data pixel by pixel in a variety of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the ISP processor 83 may perform one or more image processing operations on the raw image data, gathering statistical information about the image data. Wherein the image processing operations may be performed with the same or different bit depth precision.
The ISP processor 83 may also receive image data from the memory 61. For example, the sensor interface sends raw image data to the memory 61, and the raw image data in the memory 61 is then provided to the ISP processor 83 for processing.
Upon receiving raw image data from the image sensor interface or from the sensor 82 interface or from the memory 61, the ISP processor 83 may perform one or more image processing operations, such as temporal filtering. The processed image data may be sent to memory 61 for additional processing before being displayed. The ISP processor 83 receives the processing data from the memory 61 and performs image data processing on the processing data. The image data processed by the ISP processor 83 may be output to a display screen for viewing by a user and/or further processed by a Graphics Processing Unit (GPU). Further, the output of the ISP processor 83 may also be transmitted to the memory 61, and the display screen 65 may read the image data from the memory 61. In one embodiment, memory 61 may be configured to implement one or more frame buffers. In addition, the output of the ISP processor 83 may be transmitted to an encoder/decoder 85 for encoding/decoding image data. The encoded image data may be saved and decompressed before being displayed on the display screen 65. The encoder/decoder 85 may be implemented by a CPU or GPU or coprocessor.
The statistics determined by the ISP processor 83 may be sent to the control logic unit 84. For example, the statistical data may include image sensor statistics such as auto-exposure, auto-focus, flicker detection, black level compensation, lens shading correction, and the like. Control logic 84 may include a processing element and/or microcontroller that executes one or more routines (e.g., firmware) that determine control parameters for camera 81 and control parameters for ISP processor 83 based on the received statistical data. For example, the control parameters of the camera 81 may include sensor 82 control parameters (e.g., gain, integration time for exposure control, anti-shake parameters, etc.), camera flash control parameters, lens control parameters (e.g., focal length for focusing or zooming), or a combination of these parameters.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which can be stored in a non-volatile computer readable storage medium, and when executed, can include the processes of the above embodiments of the methods. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), or the like.
The above-mentioned embodiments express only several implementations of the present application and are described specifically and in detail, but this should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (14)

1. An image identification method based on a multilayer convolutional neural network model is characterized by comprising the following steps:
marking a target category for each pre-collected training image, and preprocessing each training image to obtain a plurality of training images with a first resolution;
setting an initial structure of the multilayer convolutional neural network model, wherein the initial structure comprises a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer and a fifth convolutional layer which are sequentially arranged;
calculating to obtain at least one first characteristic image according to the training image with the first resolution and a first parameter of the first layer of convolutional layer;
inputting the first characteristic image into the second layer of pooling layer to calculate a second characteristic image corresponding to the first characteristic image one by one;
calculating to obtain at least one third characteristic image according to the second characteristic image and a third parameter of the third layer of convolutional layer;
inputting the third feature image into the fourth pooling layer to calculate a fourth feature image corresponding to the third feature image one by one;
calculating to obtain at least one fifth characteristic image according to the fourth characteristic image and a fifth parameter of the fifth layer convolution layer;
confirming a scene recognition result of each training image according to the fifth characteristic image;
calculating a loss value of the multilayer convolutional neural network according to the target class and the scene recognition result;
confirming convergence of the multilayer convolutional neural network model when the loss value is smaller than a preset loss value;
pre-processing an acquired test image to obtain a plurality of test images at a second resolution, the second resolution being greater than the first resolution;
inputting the test image to the converged multilayer convolutional neural network model to test the converged multilayer convolutional neural network model;
performing downsampling processing on a scene image to be identified to obtain a scene image with a resolution greater than the first resolution, and identifying the scene category in the downsampled scene image by using the tested multilayer convolutional neural network model; and
determining regions in the scene image to be identified corresponding to the pixels of the fifth feature image corresponding to the scene category, and performing dilation and erosion processing on the regions to obtain the contour of the scene in the scene image.
2. The image recognition method of claim 1, wherein the training images comprise a plurality of images, the test images comprise a plurality of images, and a ratio of the number of training images to the number of test images is 4: 1;
the preprocessing the training image comprises normalizing the training image;
the preprocessing of the acquired test image comprises the normalization processing of the test image.
3. The image recognition method of claim 1, wherein the first parameter includes a first feature matrix and a first bias term; the third parameters comprise a third feature matrix and a third bias term; the fifth parameter includes a fifth feature matrix and a fifth bias term.
4. The image recognition method according to claim 3, wherein the number of the first feature matrices is 32, and the receptive field of each first feature matrix is 3×3; and/or
the number of the third feature matrices is 32, and the receptive field of each third feature matrix is 3×3; and/or
the number of the fifth feature matrices is 10, and the receptive field of each fifth feature matrix is 3×3.
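A feature matrix with a 3×3 receptive field plus a bias term (claims 3 and 4) amounts to a "valid" 2-D correlation over the input. A naive sketch with illustrative names, making no claim about the patent's actual implementation:

```python
import numpy as np

def conv2d_valid(image, kernel, bias):
    # 'Valid' 2-D correlation plus bias: one feature matrix with a
    # 3x3 receptive field applied to a single-channel image.
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w), dtype=np.float32)
    for y in range(h):
        for x in range(w):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel) + bias
    return out

image = np.ones((5, 5), dtype=np.float32)
kernel = np.full((3, 3), 1.0 / 9.0, dtype=np.float32)  # averaging kernel
out = conv2d_valid(image, kernel, bias=0.5)
print(out.shape)  # (3, 3)
```

With 32 such feature matrices in the first layer, the first calculation step would produce 32 first feature images per training image.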
5. The image recognition method according to claim 3, further comprising:
modifying the first parameter, the third parameter and the fifth parameter when the loss value is greater than or equal to the preset loss value; and
returning to the step of calculating at least one first feature image according to the training image at the first resolution and the first parameter of the first convolutional layer.
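Claims 1 and 5 together describe a loop: compute the loss, then either declare convergence (loss below the preset value) or modify the parameters and return to the forward pass. The sketch below shows that control flow on a one-parameter toy model; the squared-error loss and learning rate are stand-ins, not the patent's:

```python
def train_until_converged(w, target, lr=0.1, preset_loss=1e-4, max_iters=1000):
    # Toy stand-in for the claimed loop: the single weight w plays the
    # role of the first/third/fifth parameters, and a squared error
    # plays the role of the network's loss value.
    for i in range(max_iters):
        loss = (w - target) ** 2
        if loss < preset_loss:        # claim 1: confirm convergence
            return w, loss, i
        w -= lr * 2 * (w - target)    # claim 5: modify the parameters,
                                      # then return to the forward pass
    return w, (w - target) ** 2, max_iters

w, loss, iters = train_until_converged(0.0, 1.0)
print(loss < 1e-4, iters < 1000)  # True True
```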
6. An image acquisition method, characterized in that the image acquisition method comprises:
acquiring a scene image;
recognizing a scene category in the scene image using the multilayer convolutional neural network model of any one of claims 1 to 5; and
adjusting shooting parameters of a camera according to the scene category to acquire a new scene image corresponding to the scene image, wherein the shooting parameters comprise at least one of color temperature, exposure time, sensitivity and exposure compensation.
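Claim 6 maps a recognized scene category to shooting parameters. One simple realization is a lookup table; every category name and parameter value below is hypothetical, since the patent only lists the parameter kinds:

```python
# Hypothetical scene-to-parameter table; the values are illustrative,
# not taken from the patent.
SCENE_PARAMS = {
    "night":    {"color_temp_k": 3200, "exposure_time_ms": 100, "iso": 1600, "ev_comp": 1.0},
    "daylight": {"color_temp_k": 5500, "exposure_time_ms": 10,  "iso": 100,  "ev_comp": 0.0},
}

def adjust_shooting_parameters(scene_category, current=None):
    # Overlay the recognized category's settings on the current ones;
    # an unknown category leaves the current parameters unchanged.
    params = dict(current or {})
    params.update(SCENE_PARAMS.get(scene_category, {}))
    return params

print(adjust_shooting_parameters("night")["iso"])  # 1600
```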
7. An image recognition apparatus based on a multilayer convolutional neural network model, the image recognition apparatus comprising:
a first preprocessing module configured to mark a target class of each pre-acquired training image and to preprocess each training image to obtain a plurality of training images at a first resolution;
a setting module configured to set an initial structure of the multilayer convolutional neural network model, the initial structure comprising a first convolutional layer, a second pooling layer, a third convolutional layer, a fourth pooling layer and a fifth convolutional layer arranged in sequence;
a first calculation module configured to calculate at least one first feature image according to the training image at the first resolution and a first parameter of the first convolutional layer;
a second calculation module configured to input the first feature image into the second pooling layer to calculate a second feature image corresponding to the first feature image one to one;
a third calculation module configured to calculate at least one third feature image according to the second feature image and a third parameter of the third convolutional layer;
a fourth calculation module configured to input the third feature image into the fourth pooling layer to calculate a fourth feature image corresponding to the third feature image one to one;
a fifth calculation module configured to calculate at least one fifth feature image according to the fourth feature image and a fifth parameter of the fifth convolutional layer;
a first confirmation module configured to confirm a scene recognition result of each training image according to the fifth feature image;
a sixth calculation module configured to calculate a loss value of the multilayer convolutional neural network according to the target class and the scene recognition result;
a second confirmation module configured to confirm convergence of the multilayer convolutional neural network model when the loss value is smaller than a preset loss value;
a second preprocessing module configured to preprocess acquired test images to obtain a plurality of the test images at a second resolution, the second resolution being greater than the first resolution;
a test module configured to input the test image into the converged multilayer convolutional neural network model to test the converged multilayer convolutional neural network model;
a recognition module configured to perform down-sampling processing on a scene image to be recognized to obtain a scene image with a resolution greater than the first resolution, and to recognize the scene category in the down-sampled scene image using the tested multilayer convolutional neural network model; and
a processing module configured to determine, according to the pixel points of the fifth feature image corresponding to the scene category, the regions corresponding to those pixel points in the scene image to be recognized, and to perform dilation and erosion processing on the regions to obtain the contour of the scene in the scene image.
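The processing module refines the recognized regions with binary dilation and erosion before extracting the scene contour. A NumPy sketch using a 3×3 structuring element; the element size and the order of operations (dilation then erosion, i.e. a morphological closing) are assumptions, as the claims do not specify them:

```python
import numpy as np

def dilate(mask):
    # 3x3 binary dilation: a pixel turns on if any neighbour is on.
    out = np.zeros_like(mask)
    p = np.pad(mask, 1)
    for dy in range(3):
        for dx in range(3):
            out |= p[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def erode(mask):
    # 3x3 binary erosion: a pixel stays on only if every neighbour is on.
    out = np.ones_like(mask)
    p = np.pad(mask, 1)
    for dy in range(3):
        for dx in range(3):
            out &= p[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

mask = np.zeros((7, 7), dtype=np.uint8)
mask[2:5, 2:5] = 1                # region recognized as the scene
closed = erode(dilate(mask))      # closing smooths the region boundary
print(closed.sum())               # 9
```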
8. The image recognition apparatus of claim 7, wherein the training images comprise a plurality of images, the test images comprise a plurality of images, and the ratio of the number of the training images to the number of the test images is 4:1;
the preprocessing of each training image comprises normalizing the training image; and
the preprocessing of the acquired test image comprises normalizing the test image.
9. The image recognition apparatus according to claim 7, wherein the first parameter comprises a first feature matrix and a first bias term; the third parameter comprises a third feature matrix and a third bias term; and the fifth parameter comprises a fifth feature matrix and a fifth bias term.
10. The image recognition apparatus according to claim 9, wherein the number of the first feature matrices is 32, and the receptive field of each first feature matrix is 3×3; and/or
the number of the third feature matrices is 32, and the receptive field of each third feature matrix is 3×3; and/or
the number of the fifth feature matrices is 10, and the receptive field of each fifth feature matrix is 3×3.
11. The image recognition apparatus of claim 9, further comprising a modification module configured to:
modify the first parameter, the third parameter and the fifth parameter when the loss value is greater than or equal to the preset loss value; and
return to the step of calculating at least one first feature image according to the training image at the first resolution and the first parameter of the first convolutional layer.
12. An image acquisition apparatus, characterized in that the image acquisition apparatus comprises:
an acquisition unit configured to acquire a scene image; and
image recognition means for recognizing a scene category in the scene image using the multilayer convolutional neural network model of any one of claims 1 to 5;
the acquisition unit being further configured to adjust shooting parameters of a camera according to the scene category to acquire a new scene image corresponding to the scene image, wherein the shooting parameters comprise at least one of color temperature, exposure time, sensitivity and exposure compensation.
13. A computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions which, when executed by the processor, cause the processor to perform the image recognition method of any one of claims 1 to 5 and the image acquisition method of claim 6.
14. One or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the image recognition method of any one of claims 1 to 5 and the image acquisition method of claim 6.
CN201810151420.8A 2018-02-14 2018-02-14 Image recognition method and device, image acquisition method and device, computer device and non-volatile computer-readable storage medium Expired - Fee Related CN108304821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810151420.8A CN108304821B (en) 2018-02-14 2018-02-14 Image recognition method and device, image acquisition method and device, computer device and non-volatile computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN108304821A CN108304821A (en) 2018-07-20
CN108304821B true CN108304821B (en) 2020-12-18

Family

ID=62865152

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109166141A (en) * 2018-08-10 2019-01-08 Oppo广东移动通信有限公司 Dangerous based reminding method, device, storage medium and mobile terminal
CN109101931A * 2018-08-20 2018-12-28 Oppo广东移动通信有限公司 Scene recognition method, scene recognition apparatus and terminal device
CN109344715A (en) * 2018-08-31 2019-02-15 北京达佳互联信息技术有限公司 Intelligent composition control method, device, electronic equipment and storage medium
CN109272442B (en) * 2018-09-27 2023-03-24 百度在线网络技术(北京)有限公司 Method, device and equipment for processing panoramic spherical image and storage medium
CN109525859B (en) * 2018-10-10 2021-01-15 腾讯科技(深圳)有限公司 Model training method, image sending method, image processing method and related device equipment
CN111091593B (en) * 2018-10-24 2024-03-22 深圳云天励飞技术有限公司 Image processing method, device, electronic equipment and storage medium
WO2020098360A1 (en) * 2018-11-15 2020-05-22 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method, system, and computer-readable medium for processing images using cross-stage skip connections
CN109583445A (en) * 2018-11-26 2019-04-05 平安科技(深圳)有限公司 Character image correction processing method, device, equipment and storage medium
CN110232313A * 2019-04-28 2019-09-13 南京览视医疗科技有限公司 Eye recommendation method, system, electronic device and storage medium
CN110348291A * 2019-05-28 2019-10-18 华为技术有限公司 Scene recognition method, scene recognition apparatus and electronic device
CN110288030B (en) * 2019-06-27 2023-04-07 重庆大学 Image identification method, device and equipment based on lightweight network model
CN111027489B (en) * 2019-12-12 2023-10-20 Oppo广东移动通信有限公司 Image processing method, terminal and storage medium
CN111597476B (en) * 2020-05-06 2023-08-22 北京金山云网络技术有限公司 Image processing method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845549A * 2017-01-22 2017-06-13 珠海习悦信息技术有限公司 Method and device for scene and target recognition based on multi-task learning
CN107464216A * 2017-08-03 2017-12-12 济南大学 Medical image super-resolution reconstruction method based on a multilayer convolutional neural network



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 18 Wusha Beach Road, Chang'an Town, Dongguan, Guangdong 523860

Applicant after: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS Corp.,Ltd.

Address before: No. 18 Wusha Beach Road, Chang'an Town, Dongguan, Guangdong 523860

Applicant before: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS Corp.,Ltd.

GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20201218