CN110751162B - Image identification method and device and computer equipment - Google Patents


Info

Publication number
CN110751162B
Authority
CN
China
Prior art keywords
channel
feature map
image
image identification
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810821175.7A
Other languages
Chinese (zh)
Other versions
CN110751162A
Inventor
张鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201810821175.7A
Publication of CN110751162A
Application granted
Publication of CN110751162B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image identification method, an image identification device, and computer equipment. The method comprises: extracting image features from an image to be recognized to obtain a multi-channel feature map; determining an image recognition parameter for each feature value in the multi-channel feature map, where the parameter corresponding to a feature value represents the degree of correlation between that feature value and the target object to be recognized; correcting the multi-channel feature map according to these parameters to obtain a corrected multi-channel feature map; and recognizing the target object in the image to be recognized using the corrected multi-channel feature map. The method, device, and computer equipment achieve high recognition accuracy.

Description

Image identification method and device and computer equipment
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to an image recognition method, an image recognition device, and a computer device.
Background
Image recognition refers to a technique of processing, analyzing, and understanding an image with a computer in order to detect and recognize target objects of various patterns. With its ongoing development, image recognition technology has been widely applied in many fields, such as security.
Related image recognition methods generally first extract image features and then perform image recognition using the extracted features to obtain a target object. When such a method is used on an image to be identified, the extracted features usually contain redundant or useless information, so recognition accuracy is low when the extracted features are used directly.
Disclosure of Invention
In view of this, the present application provides an image recognition method, an image recognition apparatus, and a computer device that achieve high recognition accuracy.
A first aspect of the present application provides an image recognition method, including:
extracting image features from an image to be identified to obtain a multi-channel feature map;
determining an image identification parameter corresponding to each feature value in the multi-channel feature map, where the image identification parameter corresponding to a feature value represents the degree of correlation between that feature value and the recognized target object;
correcting the multi-channel feature map according to the image identification parameters corresponding to the feature values in the multi-channel feature map to obtain a corrected multi-channel feature map;
and identifying the target object in the image to be identified by using the corrected multi-channel feature map.
A second aspect of the present application provides an image recognition apparatus comprising an extraction module, a determination module, a processing module, and a recognition module, wherein,
the extraction module is used for extracting image features from the image to be identified to obtain a multi-channel feature map;
the determining module is used for determining an image identification parameter corresponding to each feature value in the multi-channel feature map, where the image identification parameter corresponding to a feature value represents the degree of correlation between that feature value and the recognized target object;
the processing module is used for correcting the multi-channel feature map according to the image identification parameters corresponding to the feature values in the multi-channel feature map to obtain a corrected multi-channel feature map;
and the identification module is used for identifying the target object in the image to be identified by using the corrected multi-channel feature map.
A third aspect of the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods provided in the first aspect of the present application.
A fourth aspect of the present application provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods provided in the first aspect of the present application when executing the program.
According to the image identification method and device and the computer equipment provided by the application, a multi-channel feature map is obtained by extracting image features from the image to be identified; an image identification parameter is determined for each feature value in the multi-channel feature map; and the multi-channel feature map is corrected according to these parameters, so that the target object in the image to be identified is recognized using the corrected multi-channel feature map. In the correction, feature values highly correlated with the target object are enhanced and feature values weakly correlated with it are suppressed, so recognition accuracy is improved when image recognition is performed with the corrected multi-channel feature map.
Drawings
Fig. 1 is a flowchart of a first embodiment of an image recognition method provided in the present application;
fig. 2 is a flowchart of a second embodiment of an image recognition method provided in the present application;
FIG. 3 is a schematic diagram illustrating a weighting process performed on a multi-channel feature map by a fully-connected layer in a first neural network according to an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a weighting process performed on a multi-channel feature map by a fully-connected layer in a first neural network according to another exemplary embodiment of the present application;
fig. 5 is a flowchart of a third embodiment of an image recognition method provided in the present application;
FIG. 6 is a schematic diagram illustrating convolution processing performed on a multi-channel feature map by a convolution layer in a second neural network according to an exemplary embodiment of the present application;
fig. 7 is a hardware configuration diagram of a computer device in which an image recognition apparatus according to an exemplary embodiment of the present application is located;
fig. 8 is a schematic structural diagram of a first embodiment of an image recognition apparatus provided in the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
The application provides an image identification method, an image identification device and computer equipment, and aims to provide an image identification method with high identification accuracy.
The technical solution of the present application will be described in detail with specific embodiments. The following embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a flowchart of a first embodiment of an image recognition method provided in the present application. Referring to fig. 1, the image recognition method provided in this embodiment may include:
s101, extracting image features from the image to be identified to obtain a multi-channel feature map.
Specifically, conventional methods may be employed to extract image features from the image to be identified. For example, the Histogram of Oriented Gradients (HOG) algorithm may be used to extract HOG features from the image to be identified. For another example, the Scale-Invariant Feature Transform (SIFT) algorithm may be used to extract image features from the image to be identified. For the specific implementation principles and procedures of the HOG and SIFT algorithms, reference may be made to descriptions in the related art; details are not described here.
Optionally, in a possible implementation manner of the present application, a specific implementation process of the step may include:
(1) Inputting an image to be recognized into a trained third neural network, and performing feature extraction on the image to be recognized by a specified layer in the third neural network; the designated layer includes a convolutional layer, or the designated layer includes a convolutional layer and at least one of a pooling layer and a fully-connected layer.
(2) And determining the output result of the specified layer as the multi-channel feature map.
Specifically, the third neural network may include a convolutional layer for performing a filtering process on the input image to be recognized. Furthermore, at this time, the filtering processing result output by the convolutional layer is the multi-channel feature map extracted. In addition, the third neural network may further include a pooling layer and/or a fully-connected layer. For example, in an embodiment, the third neural network includes a convolutional layer, a pooling layer, and a fully-connected layer, where the convolutional layer is configured to perform filtering processing on an input image to be recognized; the pooling layer is used for compressing the filtering result; and the full connection layer is used for carrying out aggregation processing on the compression processing result. Further, at this time, the aggregation processing result output by the full connection layer is the multi-channel feature map extracted.
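As a concrete sketch of how a convolutional layer turns an input image into a multi-channel feature map, the following pure-Python example applies one small kernel per output channel. The kernels, their sizes, and the image values are illustrative assumptions for demonstration only, not the patent's trained third neural network.

```python
# Minimal sketch: a single "designated" convolutional layer producing a
# C-channel feature map from a 1-channel input. Illustrative values only.

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution (really cross-correlation) of one channel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out

def extract_multichannel_feature_map(image, kernels):
    """One filtered map per kernel -> a C x H x W multi-channel feature map."""
    return [conv2d_valid(image, k) for k in kernels]

image = [[1, 2, 0],
         [0, 1, 3],
         [4, 0, 1]]
kernels = [
    [[1, 0], [0, 1]],   # channel 1: toy diagonal detector
    [[0, 1], [1, 0]],   # channel 2: toy anti-diagonal detector
    [[1, 1], [1, 1]],   # channel 3: local sum
]
fmap = extract_multichannel_feature_map(image, kernels)
print(len(fmap), len(fmap[0]), len(fmap[0][0]))  # C=3, H=2, W=2
```

A pooling layer and a fully-connected layer, when present, would simply post-process `fmap` before it is taken as the extracted multi-channel feature map.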
S102, determining an image identification parameter corresponding to each feature value in the multi-channel feature map, where the image identification parameter corresponding to a feature value represents the degree of correlation between that feature value and the recognized target object.
Specifically, each feature value in the multi-channel feature map refers to each element in the multi-channel feature map. For example, when the dimension of the multi-channel feature map is C × H × W (where C is the number of channels of the multi-channel feature map, and H × W is the height and width of the feature map corresponding to each channel), the multi-channel feature map includes C × H × W feature values. In addition, the image identification parameter corresponding to the characteristic value is used for representing the degree of correlation between the characteristic value and the identified target object. That is, the larger the image recognition parameter corresponding to a certain feature value is, the higher the degree of correlation between the feature value and the recognized target object is.
Furthermore, the image recognition parameters corresponding to the feature values in the multi-channel feature map can be determined through a pre-trained neural network. The detailed implementation procedure and principle of this step are described in the following embodiments and are not repeated here.
The dimension of an image recognition parameter set composed of image recognition parameters corresponding to all feature values is the same as the dimension of the multi-channel feature map, and each image recognition parameter in the image recognition parameter set is an image recognition parameter corresponding to each feature value in the multi-channel feature map. For example, when the multi-channel feature map is a one-dimensional vector, the image identification parameter set is also a one-dimensional vector; when the multi-channel feature map is a three-dimensional tensor, the image identification parameter set is also a three-dimensional tensor. For example, in connection with the above example, when the dimension of the multi-channel feature map is C × H × W, the dimension of the image recognition parameter set is also C × H × W.
S103, correcting the multi-channel feature map according to the image identification parameters corresponding to the feature values in the multi-channel feature map to obtain a corrected multi-channel feature map.
In specific implementation, for each feature value in the multi-channel feature map, the feature value may be corrected according to the image identification parameter corresponding to the feature value to obtain a corrected feature value, and all corrected feature values constitute the corrected multi-channel feature map. The corrected feature value is positively correlated with the image recognition parameter corresponding to the feature value. That is, the larger the image recognition parameter corresponding to a certain feature value is, the larger the feature value after correction processing is performed on the feature value is.
As described above, by performing the correction processing on the multi-channel feature map, it is possible to enhance the feature value with a large image recognition parameter, suppress the feature value with a small image recognition parameter, that is, enhance the feature value with a high degree of correlation with the target object, and suppress the feature value with a low degree of correlation with the target object. Therefore, useful information can be enhanced, and useless information can be suppressed, so that when the corrected multi-channel feature map is used for image recognition, the recognition accuracy can be improved.
Specifically, in an embodiment, a specific implementation process of this step may include: for each feature value in the multi-channel feature map, correcting the feature value according to a first formula, where the first formula is:
F1(xi) = a + F(xi) * Bi
where F(xi) is the i-th feature value in the multi-channel feature map;
Bi is the image identification parameter corresponding to the i-th feature value;
a is a constant;
F1(xi) is the corrected i-th feature value.
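The first formula can be sketched directly in Python over a flattened feature map. The constant `a` and all numeric values below are illustrative assumptions, not parameters from the patent.

```python
# Element-wise correction per the first formula: F1(xi) = a + F(xi) * Bi.

def correct_feature_map(features, params, a=0.0):
    """Correct each feature value by its image-recognition parameter Bi."""
    assert len(features) == len(params)
    return [a + f * b for f, b in zip(features, params)]

features = [0.5, 2.0, -1.0, 3.0]   # F(xi): flattened feature values
params   = [1.2, 0.1, 0.9, 1.5]    # Bi: larger = more relevant to target
corrected = correct_feature_map(features, params, a=0.1)
```

Note the positive correlation the text describes: for a fixed feature value, a larger Bi yields a larger corrected value.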
Optionally, in a possible implementation manner of the present application, a specific implementation process of the step may include:
for each feature value in the multi-channel feature map, weighting the feature value by the image identification parameter corresponding to it, to obtain the corrected multi-channel feature map.
Specifically, the weighting process may be represented by a second formula, where the second formula is:
F1(xi) = F(xi) * Bi
where F(xi) is the i-th feature value in the multi-channel feature map;
Bi is the image identification parameter corresponding to the i-th feature value;
F1(xi) is the corrected i-th feature value.
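Applied over a whole C x H x W map, the second formula is a plain element-wise product, under which large parameters enhance values and small parameters suppress them. The shapes and numbers here are illustrative.

```python
# Second formula over a nested C x H x W map: F1(xi) = F(xi) * Bi.

def weight_feature_map(fmap, params):
    """Element-wise product of feature map and identification parameters."""
    return [[[v * b for v, b in zip(row_f, row_b)]
             for row_f, row_b in zip(ch_f, ch_b)]
            for ch_f, ch_b in zip(fmap, params)]

fmap   = [[[1.0, 1.0], [1.0, 1.0]]]   # C=1, H=2, W=2
params = [[[2.0, 0.1], [1.0, 0.0]]]   # one Bi per feature value
weighted = weight_feature_map(fmap, params)
# weighted[0] == [[2.0, 0.1], [1.0, 0.0]]
```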
S104, identifying the target object in the image to be identified by using the corrected multi-channel feature map.
Specifically, depending on the application, image recognition may include processing such as target detection, target classification, and target comparison; accordingly, the recognized target object differs with the type of recognition performed, which is not limited in this embodiment. For the specific implementation process and principle of this step, reference may be made to descriptions in the related art; details are not described here.
In the method provided by this embodiment, image features are extracted from the image to be recognized to obtain a multi-channel feature map; an image recognition parameter is determined for each feature value in the map; the map is then corrected according to these parameters; and the target object in the image to be recognized is identified using the corrected multi-channel feature map. Because the correction enhances feature values highly correlated with the target object and suppresses those weakly correlated with it, recognition accuracy is improved when the corrected multi-channel feature map is used for image recognition.
Several specific embodiments are given below for describing in detail the process of determining the image recognition parameters corresponding to the feature values in the multi-channel feature map.
Fig. 2 is a flowchart of a second embodiment of an image recognition method provided in the present application. In the method provided by this embodiment, on the basis of the above embodiment, in step S102, determining the image identification parameter corresponding to each feature value in the multi-channel feature map may include:
s201, inputting the multichannel feature map into a trained first neural network, and performing weighting processing on the multichannel feature map by a full connection layer in the first neural network.
S202, determining image identification parameters corresponding to all characteristic values in the multi-channel characteristic diagram according to the weighting processing result output by the full connection layer.
Specifically, in an embodiment, the fully-connected layer of the first neural network includes pre-learned fully-connected coefficients corresponding to each channel of the multi-channel feature map, and the coefficients corresponding to each channel include a coefficient for each feature value in the multi-channel feature map. The specific implementation process of this step may include:
the fully-connected layer performs weighting processing on the multi-channel feature map using the pre-learned fully-connected coefficients corresponding to the k-th channel of the multi-channel feature map, and outputs the image identification parameter corresponding to the k-th channel.
Note that the fully-connected coefficients are model parameters of the fully-connected layer, and the k-th channel refers to an arbitrary channel of the multi-channel feature map.
Specifically, at this time, the weighting process may be represented by a third formula, where the third formula is:
Mk = Σ(i,j,n) F(i, j, n) * w(i, j, n, k)
where Mk is the image identification parameter corresponding to the k-th channel;
F(i, j, n) is the (i, j)-th feature value in the feature map corresponding to the n-th channel of the multi-channel feature map;
w(i, j, n, k) is, among the fully-connected coefficients corresponding to the k-th channel, the coefficient corresponding to the (i, j, n)-th feature value, where the (i, j, n)-th feature value refers to the (i, j)-th feature value in the feature map corresponding to the n-th channel of the multi-channel feature map.
Accordingly, in this embodiment, determining the image identification parameter corresponding to each feature value in the multi-channel feature map according to the weighting processing result output by the fully-connected layer in the first neural network includes:
for the target feature map corresponding to the k-th channel in the multi-channel feature map, determining the image identification parameter corresponding to the k-th channel output by the fully-connected layer as the image identification parameter corresponding to each feature value in that target feature map.
With reference to the above example, in this step Mk is determined as the image recognition parameter corresponding to each feature value in the target feature map corresponding to the k-th channel.
A specific example is given below to explain the implementation of this embodiment in detail. Fig. 3 is a schematic diagram illustrating a weighting process performed on a multi-channel feature map by a fully-connected layer in a first neural network according to an exemplary embodiment of the present application. Referring to fig. 3, the dimension of the multi-channel feature map is 3 × 2 × 2 (the map includes 3 × 2 × 2 feature values): the number of channels is 3, and the feature map corresponding to each channel includes 2 × 2 feature values. For example, in one embodiment, the three channels are the R, G, and B channels, respectively. Further, the fully-connected layer in the first neural network includes 3 groups of pre-learned fully-connected coefficients (only one group is shown in fig. 3, drawn as straight lines), and each group includes 3 × 2 × 2 coefficients. Each group of coefficients corresponds to one channel of the multi-channel feature map, and the 3 × 2 × 2 coefficients in a group are the coefficients corresponding to each feature value in the multi-channel feature map for that channel. For example, in the embodiment shown in fig. 3, w(1, 1, 2, 1) is, among the fully-connected coefficients corresponding to the first channel, the coefficient corresponding to the (1, 1, 2)-th feature value, where the (1, 1, 2)-th feature value represents the (1, 1)-th feature value in the 2nd channel, i.e., G11 in fig. 3. Further, referring to fig. 3:
M1 = R11*w(1,1,1,1) + R12*w(1,2,1,1) + … + G22*w(2,2,2,1) + … + B22*w(2,2,3,1)
in addition, the specific calculation procedures and calculation principles related to M2 and M3 are similar to M1, and are not described herein again.
With reference to the above description, after M1 is obtained through calculation, i.e., M1 is determined as the image identification parameter corresponding to each feature value in the feature map corresponding to the 1 st channel. Namely, the image recognition parameter corresponding to R11, R12, R21, and R22 is determined to be M1. Likewise, the image recognition parameter corresponding to G11, G12, G21, and G22 is determined to be M2. And determining that the image identification parameters corresponding to the B11, the B12, the B21 and the B22 are M3.
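The channel-wise computation of the third formula, using the Fig. 3 setup of 3 channels of 2 × 2 feature values, can be sketched in pure Python as follows. The feature values and coefficients are illustrative, not learned parameters.

```python
# Third formula: Mk is a weighted sum over EVERY feature value of the
# whole multi-channel map, using channel k's own coefficient set.

def channel_parameter(fmap, w, k):
    """Mk = sum over (i, j, n) of F(i, j, n) * w(i, j, n, k)."""
    total = 0.0
    for n, channel in enumerate(fmap):       # n-th channel
        for i, row in enumerate(channel):    # i-th row
            for j, value in enumerate(row):  # j-th column
                total += value * w[(i, j, n, k)]
    return total

# 3 channels (R, G, B), each a 2x2 feature map, as in Fig. 3.
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],   # R11, R12, R21, R22
    [[5.0, 6.0], [7.0, 8.0]],   # G11, G12, G21, G22
    [[0.5, 0.5], [0.5, 0.5]],   # B11, B12, B21, B22
]
# One toy coefficient group for output channel k = 0 (all 0.1).
w = {(i, j, n, 0): 0.1 for n in range(3) for i in range(2) for j in range(2)}
M1 = channel_parameter(fmap, w, 0)
# Every feature value of channel 1 then shares the same parameter M1.
```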
Further, in another embodiment, the image to be recognized includes multiple frame images, and the fully-connected layer of the first neural network includes pre-learned fully-connected coefficients corresponding to each frame image. The fully-connected coefficients corresponding to each frame image include a coefficient corresponding to each feature value in the multi-channel feature map. A specific implementation process of this step may include:
the fully-connected layer performs weighting processing on the multi-channel feature map using the pre-learned fully-connected coefficients corresponding to the c-th frame image of the image to be identified, and outputs the image identification parameter corresponding to the c-th frame image.
Specifically, at this time, the weighting process may be represented by a fourth formula, where the fourth formula is:
Lc = Σ(i,j,n,h) F(i, j, n, h) * w(i, j, n, h, c)
where Lc is the image identification parameter corresponding to the c-th frame image;
F(i, j, n, h) is the (i, j)-th feature value in the feature map corresponding to the n-th channel extracted from the h-th frame image;
w(i, j, n, h, c) is, among the fully-connected coefficients corresponding to the c-th frame image, the coefficient corresponding to the (i, j, n, h)-th feature value, where the (i, j, n, h)-th feature value represents the (i, j)-th feature value in the feature map corresponding to the n-th channel extracted from the h-th frame image.
Correspondingly, in this embodiment, determining the image identification parameter corresponding to each feature value in the multi-channel feature map according to the weighting processing result output by the fully-connected layer in the first neural network includes:
for the target feature map extracted from the c-th frame image in the multi-channel feature map, determining the image identification parameter corresponding to the c-th frame image output by the fully-connected layer as the image identification parameter corresponding to each feature value in that target feature map.
With reference to the above example, in this step Lc is determined as the image recognition parameter corresponding to each feature value in the feature map extracted from the c-th frame image.
A specific example is given below to explain the implementation of this embodiment in detail. Fig. 4 is a schematic diagram illustrating a weighting process performed on a multi-channel feature map by a fully-connected layer in a first neural network according to another exemplary embodiment of the present application. Referring to fig. 4, the dimension of the multi-channel feature map is 2 × 3 × 2 × 2 (the map includes 2 × 3 × 2 × 2 feature values): the time dimension is 2 (the map is extracted from two frame images), the number of channels is 3, and the feature map corresponding to each channel includes 2 × 2 feature values. For example, in one embodiment, the image to be recognized includes 2 frame images, and R, G, and B three-channel image features are extracted from each of the 2 frame images. Further, the fully-connected layer in the first neural network includes 2 groups of pre-learned fully-connected coefficients (only one group is shown in fig. 4), and each group includes 2 × 3 × 2 × 2 coefficients. Each group of coefficients corresponds to one frame image of the image to be identified, and the 2 × 3 × 2 × 2 coefficients in a group are the coefficients corresponding to each feature value in the multi-channel feature map for that frame image. Referring to fig. 4:
L1 = w(1,1,1,1,1)*R11 + … + w(1,1,2,1,1)*G11 + … + w(1,1,3,1,1)*B11 + … + w(1,1,1,2,1)*R21 + … + w(1,1,2,2,1)*G21 + … + w(1,1,3,2,1)*B21
With reference to the above description, after L1 is obtained through calculation, L1 is determined as the image recognition parameter corresponding to each feature value in the feature map extracted from the first frame image. That is, the image identification parameters corresponding to R11, R12, R13, R14, ……, and B14 are all determined to be L1. Similarly, the image identification parameters corresponding to R21, R22, R23, R24, ……, and B24 are all determined to be L2.
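The frame-wise weighting above can be sketched in a few lines of NumPy. The shapes follow the example (2 frames, 3 channels, 2 × 2 values per channel); the coefficient layout and random values are illustrative assumptions, not the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# 2 frames, 3 channels (R, G, B), 2 x 2 feature values per channel.
T, C, H, W = 2, 3, 2, 2
feature_map = rng.random((T, C, H, W))    # multi-channel feature map
fc_coeffs = rng.random((T, T, C, H, W))   # one group of coefficients per frame

# L[t] is a single scalar per frame: the weighted sum, over every feature
# value of the whole multi-channel feature map, using group t's coefficients.
L = np.array([np.sum(fc_coeffs[t] * feature_map) for t in range(T)])

# Every feature value extracted from frame t is then assigned the same
# image recognition parameter L[t].
params = np.broadcast_to(L[:, None, None, None], feature_map.shape)
```

With this layout, `params` has the same dimensions as the feature map, so the later element-wise correction step applies directly.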
This embodiment provides a method for determining the image recognition parameter corresponding to each feature value. With the parameters so determined, each feature value can be corrected according to its corresponding image recognition parameter, and image recognition can be performed using the corrected multi-channel feature map, so that the recognition accuracy is improved.
Fig. 5 is a flowchart of a third embodiment of an image recognition method provided in the present application. Referring to fig. 5, in the method provided in this embodiment, on the basis of the above embodiment, in step S102, determining the image identification parameter corresponding to each feature value in the multi-channel feature map may include:
s501, inputting the multi-channel feature map into a trained second neural network, performing convolution processing on the multi-channel feature map by a convolution layer in the second neural network, and outputting a convolution processing result; and the dimension of the convolution processing result is the same as that of the multi-channel feature map.
And S502, sequentially determining each convolution value in the convolution processing result as an image identification parameter corresponding to each characteristic value in the multi-channel characteristic diagram.
Note that a convolution value is an element of the convolution processing result. For example, in one embodiment, the dimension of the multi-channel feature map is C × H × W, and the multi-channel feature map includes C × H × W feature values. After the multi-channel feature map is convolved, the dimension of the convolution processing result is also C × H × W, so the convolution processing result includes C × H × W convolution values. The convolution values in the convolution processing result are then sequentially determined as the image identification parameters corresponding to the feature values in the multi-channel feature map.
Further, the convolutional layer in the second neural network contains n convolution kernels, each of size 1 × n and moved with a stride of 1, where n is equal to the number of channels of the multi-channel feature map. The convolution processing can be characterized by a fifth formula, where the fifth formula is:
M(i, j, k) = Σₙ F(i, j, n) × w(n, k), the sum being taken over all channels n of the multi-channel feature map
where M(i, j, k) is the image identification parameter corresponding to the (i, j)-th feature value in the feature map corresponding to the k-th channel of the multi-channel feature map;
F(i, j, n) is the (i, j)-th feature value in the feature map corresponding to the n-th channel of the multi-channel feature map;
w(n, k) is the convolution coefficient corresponding to the n-th channel in the k-th convolution kernel.
The convolution coefficient refers to a model parameter in the convolution layer.
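The fifth formula is a cross-channel 1 × 1 weighting and can be sketched with a single `einsum`; the shapes and random values below are illustrative assumptions, not the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# F is the multi-channel feature map (C x H x W); w[n, k] is the convolution
# coefficient corresponding to channel n in the k-th convolution kernel.
C, H, W = 3, 2, 2
F = rng.random((C, H, W))
w = rng.random((C, C))   # n kernels, n equal to the number of channels

# Fifth formula: M(i, j, k) = sum_n F(i, j, n) * w(n, k)
M = np.einsum('nij,nk->kij', F, w)

# The result has the same dimensions as the multi-channel feature map,
# so each feature value gets its own image recognition parameter.
assert M.shape == F.shape
```

For k = 1 this reduces to the worked example in fig. 6: M(1, 1, 1) = R11 × w(1, 1) + G11 × w(2, 1) + B11 × w(3, 1).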
A specific example is given below to explain the implementation of this embodiment in detail. Specifically, fig. 6 is a schematic diagram illustrating the convolution processing performed on a multi-channel feature map by the convolutional layer in the second neural network according to an exemplary embodiment of the present application. Referring to fig. 6, in this embodiment, the dimension of the multi-channel feature map is 3 × 2 × 2, where 3 represents the number of channels and 2 × 2 represents the number of feature values included in the feature map corresponding to each channel. Further, referring to fig. 6, the convolutional layer includes 3 convolution kernels (only one is shown in fig. 6), each of size 1 × 3, where 3 characterizes the number of channels. Referring to fig. 6, M(1, 1, 1) = R11 × w(1, 1) + G11 × w(2, 1) + B11 × w(3, 1), where w(n, k) is the convolution coefficient corresponding to the n-th channel in the k-th convolution kernel. It should be noted that M(1, 1, 1) is the image identification parameter corresponding to the (1, 1)-th feature value in the feature map corresponding to the 1st channel, i.e., in this example, the image identification parameter corresponding to R11.
This embodiment likewise provides a method for determining the image recognition parameter corresponding to each feature value. With the parameters so determined, each feature value can be corrected according to its corresponding image recognition parameter, and image recognition can be performed using the corrected multi-channel feature map, so that the recognition accuracy is improved.
Corresponding to the embodiment of the image identification method, the application also provides an embodiment of the image identification device.
The embodiment of the image recognition apparatus can be applied to a computer device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking software implementation as an example, the apparatus is formed as a logical device by the processor of the computer device where it is located reading the corresponding computer program instructions from non-volatile storage into memory and running them. From the hardware level, as shown in fig. 7, which is a hardware structure diagram of the computer device where the image recognition apparatus is located according to an exemplary embodiment of the present application, in addition to the storage 710, the processor 720, the memory 730, and the network interface 740 shown in fig. 7, the computer device where the apparatus is located may further include other hardware according to the actual function of the image recognition apparatus, which is not described again.
Fig. 8 is a schematic structural diagram of a first embodiment of an image recognition apparatus provided in the present application. Referring to fig. 8, the apparatus provided in this embodiment may include an extracting module 810, a determining module 820, a processing module 830, and an identifying module 840, wherein,
the extraction module 810 is configured to extract image features from an image to be identified to obtain a multi-channel feature map;
the determining module 820 is configured to determine an image identification parameter corresponding to each feature value in the multi-channel feature map, where the image identification parameter corresponding to the feature value is used to represent a degree of correlation between the feature value and the identified target object;
the processing module 830 is configured to correct the multi-channel feature map according to the image identification parameter corresponding to each feature value in the multi-channel feature map, so as to obtain a corrected multi-channel feature map;
the identifying module 840 is configured to identify the target object in the image to be identified by using the corrected multi-channel feature map.
The apparatus of this embodiment may be used to implement the technical solution of the method embodiment shown in fig. 1, and the implementation principle and the technical effect are similar, which are not described herein again.
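The cooperation of the four modules can be sketched as a minimal pipeline. Every internal computation below is a placeholder assumption for illustration only, not the patented networks.

```python
import numpy as np

class ImageRecognizer:
    """Minimal sketch of the extract -> determine -> correct -> identify
    pipeline; all internals are placeholders, not the patented networks."""

    def extract(self, image):
        # Extraction module 810: stand-in for the feature-extraction network;
        # here the image itself serves as a multi-channel feature map.
        return np.asarray(image, dtype=float)

    def determine(self, fmap):
        # Determining module 820: one image recognition parameter per feature
        # value (placeholder: normalized magnitude as a relevance score).
        return fmap / (np.abs(fmap).max() + 1e-12)

    def correct(self, fmap, params):
        # Processing module 830: element-wise weighting of the feature map.
        return fmap * params

    def identify(self, fmap):
        # Identifying module 840: placeholder decision rule.
        return bool(fmap.sum() > 0)

recognizer = ImageRecognizer()
fmap = recognizer.extract(np.ones((3, 2, 2)))
params = recognizer.determine(fmap)
result = recognizer.identify(recognizer.correct(fmap, params))
```

The point of the structure is that `determine` and `correct` sit between extraction and identification, so the identification step only ever sees the corrected feature map.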
Further, the determining module 820 is configured to input the multi-channel feature map into a trained first neural network, perform weighting processing on the multi-channel feature map by the fully-connected layer in the first neural network, and determine the image identification parameters corresponding to the feature values in the multi-channel feature map according to the weighting processing result output by the fully-connected layer.
Further, a fully-connected layer in the first neural network performs weighting processing on the multi-channel feature map, including:
the fully-connected layer performs weighting processing on the multi-channel feature map by adopting a pre-learned fully-connected coefficient corresponding to the k-th channel in the multi-channel feature map, and outputs an image identification parameter corresponding to the k-th channel;
the determining the image identification parameters corresponding to each feature value in the multi-channel feature map according to the weighting processing result output by the full connection layer comprises the following steps:
for the target feature map corresponding to the k-th channel in the multi-channel feature map, determining the image identification parameter corresponding to the k-th channel output by the fully-connected layer as the image identification parameter corresponding to each feature value in the target feature map.
Further, a fully-connected layer in the first neural network performs weighting processing on the multi-channel feature map, including:
the fully-connected layer performs weighting processing on the multi-channel feature map by adopting a pre-learned fully-connected coefficient corresponding to the c-th frame image in the image to be identified, and outputs an image identification parameter corresponding to the c-th frame image;
determining image identification parameters corresponding to each characteristic value in the multi-channel characteristic diagram according to the weighting processing result output by the full connection layer, wherein the determining comprises the following steps:
for the target feature map extracted from the c-th frame image in the multi-channel feature map, determining the image identification parameter corresponding to the c-th frame image output by the fully-connected layer as the image identification parameter corresponding to each feature value in the target feature map.
Further, the determining module 820 is configured to input the multi-channel feature map into a trained second neural network, perform convolution processing on the multi-channel feature map by using convolution layers in the second neural network, and output a convolution processing result; sequentially determining each convolution value in the convolution processing result as an image identification parameter corresponding to each characteristic value in the multi-channel characteristic diagram; and the dimension of the convolution processing result is the same as that of the multi-channel feature map.
Further, the processing module 830 is configured to, for each feature value in the multi-channel feature map, perform weighting processing on the feature value according to an image identification parameter corresponding to the feature value, so as to obtain a corrected multi-channel feature map.
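The correction performed by the processing module is an element-wise weighting of the feature map by its recognition parameters; a toy numeric sketch (all values invented for illustration):

```python
import numpy as np

# One channel with 2 x 2 feature values, and image recognition parameters
# of the same dimensions (toy numbers, not learned values).
feature_map = np.array([[[1.0, 2.0],
                         [3.0, 4.0]]])
params = np.array([[[0.5, 0.5],
                    [2.0, 0.0]]])

# Element-wise correction: each feature value is weighted by its parameter,
# amplifying values relevant to the target and suppressing the rest.
corrected = feature_map * params
# -> [[[0.5, 1.0], [6.0, 0.0]]]
```

The corrected map keeps the original dimensions, so it can be fed to the identification step unchanged.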
Further, the extracting module 810 is configured to input the image to be recognized into a trained third neural network, perform feature extraction on the image to be recognized by a specified layer in the third neural network, and determine an output result of the specified layer as the multi-channel feature map; wherein the designated layer comprises a convolutional layer, or the designated layer comprises a convolutional layer and at least one of a pooling layer and a fully-connected layer.
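The designated layer's role can be illustrated with a toy convolution whose output is taken directly as the multi-channel feature map; the kernels and input below are invented stand-ins, not the trained third neural network.

```python
import numpy as np

def conv2d_valid(image, kernels):
    """Toy 'valid' convolution standing in for the designated convolutional
    layer: one output channel per kernel, stride 1, no padding."""
    n_k, kh, kw = kernels.shape
    H, W = image.shape
    out = np.zeros((n_k, H - kh + 1, W - kw + 1))
    for c in range(n_k):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = np.sum(image[i:i + kh, j:j + kw] * kernels[c])
    return out

image = np.arange(16.0).reshape(4, 4)             # single-channel input image
kernels = np.stack([np.ones((3, 3)), np.eye(3)])  # 2 hypothetical 3x3 kernels
feature_map = conv2d_valid(image, kernels)        # 2-channel feature map, 2 x 2
```

Each kernel yields one channel of the output, which is why the designated layer's result is inherently a multi-channel feature map.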
The present application further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the image recognition methods provided herein.
In particular, computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
With continued reference to fig. 7, the present application further provides a computer device, which includes a memory 710, a processor 720 and a computer program stored in the memory 710 and executable on the processor 720, wherein the processor 720 implements the steps of any of the image recognition methods provided herein when executing the program.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (12)

1. An image recognition method, characterized in that the method comprises:
extracting image features from an image to be identified to obtain a multi-channel feature map, wherein the image to be identified comprises at least two frames of images;
inputting the multi-channel feature map into a first neural network, and performing weighting processing on the multi-channel feature map by a fully-connected layer in the first neural network, wherein the weighting processing includes: the fully-connected layer performs weighting processing on the multi-channel feature map by adopting a pre-learned fully-connected coefficient corresponding to the c-th frame image in the image to be identified, and outputs an image identification parameter corresponding to the c-th frame image;
determining image identification parameters corresponding to all characteristic values in the multi-channel characteristic diagram according to the weighting processing result output by the full connection layer, wherein the image identification parameters corresponding to the characteristic values are used for representing the correlation degree of the characteristic values and the identified target object;
the determining the image identification parameters corresponding to the feature values in the multi-channel feature map according to the weighting processing result output by the fully-connected layer includes: for a target feature map extracted from the c-th frame image in the multi-channel feature map, determining the image identification parameter corresponding to the c-th frame image output by the fully-connected layer as the image identification parameter corresponding to each feature value in the target feature map;
correcting the multi-channel feature map according to the image identification parameters corresponding to the feature values in the multi-channel feature map to obtain a corrected multi-channel feature map;
and identifying the target object in the image to be identified by using the corrected multi-channel feature map.
2. The method of claim 1, wherein the weighting of the multi-channel feature map by the fully-connected layer in the first neural network comprises:
the fully-connected layer performs weighting processing on the multi-channel feature map by adopting a pre-learned fully-connected coefficient corresponding to the k-th channel in the multi-channel feature map, and outputs an image identification parameter corresponding to the k-th channel;
determining image identification parameters corresponding to each characteristic value in the multi-channel characteristic diagram according to the weighting processing result output by the full connection layer, wherein the determining comprises the following steps:
for the target feature map corresponding to the k-th channel in the multi-channel feature map, determining the image identification parameter corresponding to the k-th channel output by the fully-connected layer as the image identification parameter corresponding to each feature value in the target feature map.
3. The method according to claim 1, wherein the determining the image recognition parameters corresponding to the feature values in the multi-channel feature map comprises:
inputting the multi-channel feature map into a trained second neural network, carrying out convolution processing on the multi-channel feature map by a convolution layer in the second neural network, and outputting a convolution processing result; wherein the dimension of the convolution processing result is the same as the dimension of the multi-channel feature map;
and sequentially determining each convolution value in the convolution processing result as an image identification parameter corresponding to each characteristic value in the multi-channel characteristic diagram.
4. The method according to claim 1, wherein the modifying the multi-channel feature map according to the image recognition parameters corresponding to the feature values in the multi-channel feature map to obtain a modified multi-channel feature map comprises:
and for each characteristic value in the multi-channel characteristic diagram, carrying out weighting processing on the characteristic value according to the image identification parameter corresponding to the characteristic value to obtain the corrected multi-channel characteristic diagram.
5. The method according to claim 1, wherein the extracting image features from the image to be recognized to obtain a multi-channel feature map comprises:
inputting the image to be recognized into a trained third neural network, and performing feature extraction on the image to be recognized by a specified layer in the third neural network; the designated layer comprises a convolutional layer, or the designated layer comprises a convolutional layer and at least one of a pooling layer and a fully-connected layer;
and determining the output result of the specified layer as the multi-channel feature map.
6. An image recognition apparatus, characterized in that the apparatus comprises an extraction module, a determination module, a processing module and a recognition module, wherein,
the extraction module is used for extracting image features from the image to be identified to obtain a multi-channel feature map;
the determining module is configured to input the multi-channel feature map into a first neural network, and perform weighting processing on the multi-channel feature map by a fully-connected layer in the first neural network, wherein the weighting processing includes: the fully-connected layer performs weighting processing on the multi-channel feature map by adopting a pre-learned fully-connected coefficient corresponding to the c-th frame image in the image to be identified, and outputs an image identification parameter corresponding to the c-th frame image;
determining image identification parameters corresponding to all characteristic values in the multi-channel characteristic diagram according to the weighting processing result output by the full connection layer, wherein the image identification parameters corresponding to the characteristic values are used for representing the correlation degree of the characteristic values and the identified target object;
the determining the image identification parameters corresponding to the feature values in the multi-channel feature map according to the weighting processing result output by the fully-connected layer includes: for a target feature map extracted from the c-th frame image in the multi-channel feature map, determining the image identification parameter corresponding to the c-th frame image output by the fully-connected layer as the image identification parameter corresponding to each feature value in the target feature map;
the processing module is configured to correct the multi-channel feature map according to the image identification parameters corresponding to the feature values in the multi-channel feature map, so as to obtain a corrected multi-channel feature map;
and the identification module is used for identifying the target object in the image to be identified by using the corrected multi-channel feature map.
7. The apparatus of claim 6, wherein the fully-connected layer in the first neural network performs weighting processing on the multi-channel feature map, and wherein the weighting processing comprises:
the fully-connected layer performs weighting processing on the multi-channel feature map by adopting a pre-learned fully-connected coefficient corresponding to the k-th channel in the multi-channel feature map, and outputs an image identification parameter corresponding to the k-th channel;
the determining the image identification parameters corresponding to each feature value in the multi-channel feature map according to the weighting processing result output by the full connection layer comprises the following steps:
for the target feature map corresponding to the k-th channel in the multi-channel feature map, determining the image identification parameter corresponding to the k-th channel output by the fully-connected layer as the image identification parameter corresponding to each feature value in the target feature map.
8. The apparatus of claim 6, wherein the determining module is configured to input the multi-channel feature map into a trained second neural network, perform convolution processing on the multi-channel feature map by convolution layers in the second neural network, and output a result of the convolution processing; sequentially determining each convolution value in the convolution processing result as an image identification parameter corresponding to each characteristic value in the multi-channel characteristic diagram; and the dimension of the convolution processing result is the same as that of the multi-channel feature map.
9. The apparatus according to claim 6, wherein the processing module is configured to, for each feature value in the multi-channel feature map, perform weighting processing on the feature value according to an image identification parameter corresponding to the feature value, so as to obtain a modified multi-channel feature map.
10. The apparatus according to claim 6, wherein the extracting module is configured to input the image to be recognized into a trained third neural network, perform feature extraction on the image to be recognized by a specified layer in the third neural network, and determine an output result of the specified layer as the multi-channel feature map; wherein the designated layer comprises a convolutional layer, or the designated layer comprises a convolutional layer and at least one of a pooling layer and a fully-connected layer.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
12. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any of claims 1-5 are implemented when the program is executed by the processor.
CN201810821175.7A 2018-07-24 2018-07-24 Image identification method and device and computer equipment Active CN110751162B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810821175.7A CN110751162B (en) 2018-07-24 2018-07-24 Image identification method and device and computer equipment

Publications (2)

Publication Number Publication Date
CN110751162A CN110751162A (en) 2020-02-04
CN110751162B true CN110751162B (en) 2023-04-07

Family

ID=69275405







Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant