CN113553905A - Image recognition method, device and system - Google Patents


Info

Publication number
CN113553905A
Authority
CN
China
Prior art keywords
image
attention
feature
characteristic
recognized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110667481.1A
Other languages
Chinese (zh)
Other versions
CN113553905B (en)
Inventor
王强昌
郭国栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110667481.1A
Publication of CN113553905A
Application granted
Publication of CN113553905B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411: Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention discloses an image recognition method, device and system, relating to the field of artificial intelligence, in particular to computer vision and deep learning, and applicable to smart-city and smart-finance scenarios. The specific implementation scheme is as follows: acquiring a feature image of an image to be recognized, and generating an attention distribution heat map from the feature image; generating an attention feature vector and an attention relation feature value from the attention distribution heat map; and generating the category of the image to be recognized from the attention feature vector and the attention relation feature value. Embodiments of the disclosure can perceive different attention areas in the image to be recognized and detect the face category of the image. They avoid the limitations of key-point detection, perform better on faces that are not directly facing the camera and on occluded faces, and improve the robustness of the face recognition system.

Description

Image recognition method, device and system
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to the fields of computer vision and deep learning, can be applied to smart-city and smart-finance scenarios, and particularly relates to an image recognition method, device and system.
Background
Face recognition is a technology that performs identity recognition based on the feature information of a human face. An image acquisition device collects pictures or videos containing human faces. Based on the collected picture information, a face detection technique locates the face, and face recognition is then performed on the detected face. However, current face detection techniques have limited generality, and a highly general face detection technique is still lacking.
Disclosure of Invention
The disclosure provides an image recognition method, an image recognition apparatus, an electronic device, and a storage medium.
According to an aspect of the present disclosure, there is provided an image recognition method including:
acquiring a feature image of an image to be recognized, and generating an attention distribution heat map from the feature image;
generating an attention feature vector and an attention relation feature value from the attention distribution heat map;
and generating the category of the image to be recognized from the attention feature vector and the attention relation feature value.
Optionally, the acquiring a feature image of the image to be recognized includes:
acquiring the feature image of the image to be recognized through a feature extraction network.
Optionally, the generating an attention distribution heat map from the feature image includes:
generating an attention image from the feature image;
and generating the attention distribution heat map from the attention image and the feature image.
Optionally, the generating an attention image from the feature image includes:
reducing the dimensionality of the feature image through a convolution kernel to generate the attention image, wherein the attention image is a single-channel image.
Optionally, the generating an attention distribution heat map from the attention image and the feature image includes:
multiplying the attention image and the feature image element-wise to obtain the attention distribution heat map.
Optionally, the generating an attention feature vector and an attention relation feature value from the attention distribution heat map includes:
performing global average pooling on the attention distribution heat map to obtain attention feature values;
and concatenating the attention feature values to obtain the attention feature vector.
Optionally, the method further includes:
constructing a graph neural network with the attention feature values as nodes;
and generating an attention relation feature value of the attention distribution heat map from the graph neural network.
Optionally, the generating a category of the image to be recognized from the attention feature vector and the attention relation feature value includes:
multiplying the attention feature vector and the attention relation feature value to generate a feature vector of the image to be recognized;
and inputting the feature vector of the image to be recognized into a fully connected network to obtain the category of the feature vector, which is the category of the image to be recognized.
According to a second aspect of the present disclosure, there is provided an image recognition apparatus including:
a feature extraction module, configured to acquire a feature image of an image to be recognized and generate an attention distribution heat map from the feature image;
an attention relation obtaining module, configured to generate an attention feature vector and an attention relation feature value from the attention distribution heat map;
and an image recognition module, configured to generate the category of the image to be recognized from the attention feature vector and the attention relation feature value.
Optionally, the feature extraction module includes:
a feature extraction submodule, configured to acquire the feature image of the image to be recognized through a feature extraction network.
Optionally, the feature extraction module includes:
an attention image acquisition submodule, configured to generate an attention image from the feature image;
and a heat map acquisition submodule, configured to generate the attention distribution heat map from the attention image and the feature image.
Optionally, the attention image acquisition submodule includes:
an image dimension reduction unit, configured to reduce the dimensionality of the feature image through a convolution kernel to generate the attention image, wherein the attention image is a single-channel image.
Optionally, the heat map acquisition submodule includes:
a multiplying unit, configured to multiply the attention image and the feature image element-wise to obtain the attention distribution heat map.
Optionally, the attention relation obtaining module includes:
a pooling submodule, configured to perform global average pooling on the attention distribution heat map to obtain attention feature values;
and a concatenation submodule, configured to concatenate the attention feature values to obtain the attention feature vector.
Optionally, the apparatus further includes:
a graph neural network submodule, configured to construct a graph neural network with the attention feature values as nodes;
and an attention relation feature value acquisition submodule, configured to generate the attention relation feature value of the attention distribution heat map from the graph neural network.
Optionally, the image recognition module includes:
a feature vector acquisition submodule, configured to multiply the attention feature vector and the attention relation feature value to generate a feature vector of the image to be recognized;
and an image classification submodule, configured to input the feature vector of the image to be recognized into a fully connected network to obtain the category of the feature vector, which is the category of the image to be recognized.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first aspects.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of the first aspects.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of any one of the first aspects.
The embodiments of the invention avoid the limitations of key-point detection, allocate different attention to different areas of the image to be recognized, pay more attention to high-quality areas when detecting the face category, perform better on faces that are not directly facing the camera and on occluded faces, and improve the robustness of the face recognition system.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
Fig. 1 is a schematic diagram of an image recognition method provided according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of an image recognition method provided according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of an image recognition method provided according to an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of an image recognition method provided according to an embodiment of the present disclosure;
Fig. 5 is a schematic diagram of an image recognition method provided according to an embodiment of the present disclosure;
Fig. 6 is a schematic diagram of an image recognition apparatus provided according to an embodiment of the present disclosure;
Fig. 7 is a schematic diagram of an image recognition apparatus provided according to an embodiment of the present disclosure;
Fig. 8 is a schematic diagram of an image recognition apparatus provided according to an embodiment of the present disclosure;
Fig. 9 is a schematic diagram of an image recognition apparatus provided according to an embodiment of the present disclosure;
Fig. 10 is a schematic diagram of an image recognition apparatus provided according to an embodiment of the present disclosure;
Fig. 11 is a schematic diagram of an image recognition apparatus provided according to an embodiment of the present disclosure;
Fig. 12 is a block diagram of an electronic device for implementing an image recognition method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Image recognition technology includes face recognition, a technology that performs identity recognition based on the feature information of a human face. It is implemented as follows: an image acquisition device collects pictures or videos containing human faces; based on the collected picture information, a face detection technique first locates the face, and face recognition is then performed on the detected face. Face recognition is highly convenient and is widely applied in fields such as access control systems and face payment. Traditional face recognition methods are: 1. Extracting global face features. However, face images are easily affected by pose changes and occlusion, and the global face structure can change considerably, so recognition performance drops significantly. 2. Extracting local face features based on key points. Key points are first detected with a face key-point detection method, and local image blocks are then cropped centered on the key points. Although the global face structure is susceptible to large variations, local face regions may remain unchanged, so key-point-based local face features have a certain robustness in some challenging face scenarios. However, under extreme conditions such as overexposure and heavy occlusion, key-point detection may be inaccurate or even fail, which limits the generality of this approach to some extent.
Wearing a mask is an effective safeguard in daily life, but it poses a great challenge to current face recognition systems. The performance of face recognition systems in mask-wearing scenarios therefore needs to be greatly improved.
When part of the face is occluded, local face information must be extracted and the face recognized from that local information in order to improve recognition accuracy. The following two schemes are generally adopted for extracting local face information:
1. Extracting local face information based on face key points: face key points, such as the eyes, mouth and nose, are detected, and local features are then extracted from cropped regions centered on the key points.
2. Extracting local face information based on attention: without relying on key points, an attention mechanism automatically locates discriminative face regions such as the eyes and mouth, while suppressing the response in non-informative face regions such as those covered by sunglasses or a mask.
The two schemes for extracting local face information suit different use scenarios and have different drawbacks:
The key-point-based method depends heavily on the accuracy of key-point detection, which may fail under severe pose changes or heavy occlusion. When a person wears a mask, some facial parts, such as the nose or mouth, are invisible, so key-point detection becomes inaccurate or fails. Furthermore, even if a key point is detected, the cropped block will inevitably contain occlusion that degrades the local face information.
The attention-based method does not rely on key-point detection, but it is prone to two problems. On the one hand, multiple attention modules tend to respond only to a limited set of local face regions and miss other valid ones; in challenging scenarios such as cross-pose and heavily occluded face recognition, some important local face regions are not visible, so performance degrades. On the other hand, in such scenarios valid local face regions and useless information are spatially close; for example, the background and occluders lie very close to the eye region. The extracted features are therefore prone to contain useless or even noisy information.
According to an embodiment of the present disclosure, an image recognition method is provided, which can be applied in smart-city and smart-finance scenarios. Fig. 1 is a schematic diagram of the image recognition method according to an embodiment of the present disclosure.
As shown in fig. 1, the image recognition method includes:
Step 101: acquiring a feature image of an image to be recognized, and generating an attention distribution heat map from the feature image.
In a possible implementation, in order to extract the features of the image to be recognized, the image is input into a stem convolutional neural network (stem CNN). The CNN extracts global abstract facial features of the image and generates the corresponding feature image, so that useful local information can be extracted later.
In a possible implementation, in order to obtain the attention features of different areas of the image to be recognized, the feature image corresponding to the image is reduced in dimensionality. The data format of the feature image is h × w × c, where h is its height, w its width, and c its number of channels. The attention image is obtained by reducing c to 1 with a 1 × 1 convolution kernel (conv). This emphasizes useful information in the feature map while suppressing useless, even noisy, information.
The attention image and the feature image are then multiplied element-wise to obtain the attention distribution heat map.
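The dimension reduction and element-wise multiplication described above can be sketched in NumPy. The patent gives no code, so the function name, the toy 7 × 7 × 16 shape and the random kernel below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_heat_map(feature_image, kernel_1x1):
    """Reduce an h x w x c feature image to a single-channel attention
    image with a 1 x 1 convolution, then multiply it element-wise with
    the feature image to obtain the attention distribution heat map."""
    # A 1 x 1 convolution over channels is a per-pixel weighted sum:
    # (h, w, c) @ (c,) -> (h, w), kept here as (h, w, 1).
    attention_image = (feature_image @ kernel_1x1)[..., np.newaxis]
    # Element-wise multiplication broadcasts the single attention
    # channel across all c channels of the feature image.
    heat_map = attention_image * feature_image          # (h, w, c)
    return attention_image, heat_map

feature_image = rng.standard_normal((7, 7, 16))         # toy feature image
kernel = rng.standard_normal(16)                        # 1 x 1 conv weights
attention_image, heat_map = attention_heat_map(feature_image, kernel)
```

Broadcasting the (h, w, 1) attention image over the (h, w, c) feature image is what lets one scalar weight per pixel scale every channel at that location.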
Step 102: generating an attention feature vector and an attention relation feature value from the attention distribution heat map.
To reduce the number of parameters and the amount of computation, and to mitigate overfitting, global average pooling (GAP) is applied to the attention distribution heat map: all pixel values of a feature map are summed and averaged to obtain a single value that represents that feature map. This value is the attention feature value.
To associate the features of the different parts of the image to be recognized, the attention feature values are concatenated (concat) to obtain the attention feature vector.
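The pooling and concatenation steps can be sketched as follows. This is a NumPy illustration; the number of heat maps and their shapes are assumed for the example, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(1)

def attention_feature_vector(heat_maps):
    """Global-average-pool each attention distribution heat map into one
    attention feature value per channel, then concatenate the values
    from all heat maps into the attention feature vector."""
    values = []
    for hm in heat_maps:                        # each hm: (h, w, c)
        # GAP: average all pixel values of each channel's feature map,
        # representing that map by a single number.
        values.append(hm.mean(axis=(0, 1)))     # (c,)
    return np.concatenate(values)               # (n_maps * c,)

heat_maps = [rng.standard_normal((7, 7, 16)) for _ in range(4)]
vec = attention_feature_vector(heat_maps)       # length 4 * 16 = 64
```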
Meanwhile, a graph neural network (GNN) is constructed from the attention feature values to obtain the attention relation feature value. This allows each local part, and its relationship to the other parts, to be considered simultaneously: each local block is encouraged to make full use of the information in the other local blocks, which makes them more discriminative. The GNN can also estimate the quality of a localized image block, emphasizing information-rich parts and suppressing noisier ones.
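The patent does not specify the GNN architecture. The sketch below is one minimal, hypothetical reading: a single round of message passing on a fully connected graph whose nodes are the attention feature values, producing one relation value per node:

```python
import numpy as np

rng = np.random.default_rng(2)

def attention_relation_values(node_values, w_self, w_neigh):
    """One round of message passing on a fully connected graph whose
    nodes are the attention feature values.  Each node combines its own
    value with the mean of all other nodes, so every local block can
    use information from the other blocks.  The outputs stand in for
    the attention relation feature values."""
    n = len(node_values)
    relations = np.empty(n)
    for i in range(n):
        neighbours = np.delete(node_values, i)   # all other nodes
        msg = neighbours.mean()                  # aggregation step
        # Update step: weighted self term plus weighted neighbour term,
        # squashed to (0, 1) so it can act as a quality weight.
        relations[i] = 1.0 / (1.0 + np.exp(-(w_self * node_values[i] + w_neigh * msg)))
    return relations

node_values = rng.standard_normal(8)             # 8 attention feature values
relations = attention_relation_values(node_values, w_self=0.7, w_neigh=0.3)
```

The weights `w_self` and `w_neigh` are illustrative; in the patented system they would be learned.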
Step 103: generating the category of the image to be recognized from the attention feature vector and the attention relation feature value.
To highlight the differences between areas of the image to be recognized, the attention feature vector and the attention relation feature value are multiplied to generate the feature vector of the image to be recognized. This feature vector is input into a fully connected network to obtain its category, which is the category of the image to be recognized.
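The final weighting and classification step can be sketched as follows; the class count, block structure and random weights are illustrative assumptions, not details from the patent:

```python
import numpy as np

rng = np.random.default_rng(3)

def classify(attention_vector, relation_values, w_fc, b_fc):
    """Weight the attention feature vector by the attention relation
    feature values, then classify it with a fully connected layer."""
    c = len(relation_values)
    # Repeat each relation value over its block of the feature vector,
    # so every local block is scaled by its own quality weight.
    weights = np.repeat(relation_values, len(attention_vector) // c)
    feature_vector = attention_vector * weights   # element-wise product
    logits = w_fc @ feature_vector + b_fc         # fully connected layer
    return int(np.argmax(logits))                 # predicted category

attention_vector = rng.standard_normal(64)        # 4 blocks x 16 channels
relation_values = rng.uniform(0.1, 1.0, 4)        # one value per block
w_fc = rng.standard_normal((10, 64))              # 10 hypothetical classes
b_fc = np.zeros(10)
category = classify(attention_vector, relation_values, w_fc, b_fc)
```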
Face recognition is challenging for two main reasons. 1. Intra-class differences are large: faces from the same individual may appear in different poses or with different levels of occlusion. If only a few facial parts are located, and those parts are not visible, for example because of a mask or a pose change, the extracted local features may not be sufficient to support face recognition. 2. Inter-class differences are small: faces from different individuals may have similar local appearances, especially when many individuals are considered, so recognition performance degrades if the few emphasized facial features look similar across individuals. To alleviate these problems, features that are as rich as possible must be captured; otherwise, when a small number of emphasized local regions are invisible or look similar across subjects, the lost local information degrades face recognition performance.
Optionally, the acquiring a feature image of the image to be recognized includes:
acquiring the feature image of the image to be recognized through a feature extraction network.
In a possible implementation, in order to extract the features of the image to be recognized, the image is input into a stem convolutional neural network (stem CNN). The CNN extracts global abstract facial features of the image and generates the corresponding feature image, so that useful local information can be extracted later.
The features of the image to be recognized are extracted in order to obtain a global abstract face feature map from which useful local information can subsequently be extracted.
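As an illustration of what the feature extraction network does, the sketch below implements one convolution-plus-ReLU layer in NumPy. A real stem CNN would stack many such layers; all shapes and names here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def conv2d_relu(image, kernels):
    """Minimal valid 3 x 3 convolution with ReLU, standing in for one
    layer of the feature extraction network (stem CNN)."""
    h, w, c_in = image.shape
    c_out, kh, kw, _ = kernels.shape           # kernels: (c_out, 3, 3, c_in)
    out = np.zeros((h - kh + 1, w - kw + 1, c_out))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw, :]            # (3, 3, c_in)
            # Contract kernel and patch over the spatial and channel axes.
            out[i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return np.maximum(out, 0.0)                # ReLU non-linearity

image = rng.standard_normal((9, 9, 3))         # toy image to be recognized
kernels = rng.standard_normal((16, 3, 3, 3))   # 16 output channels
feature_image = conv2d_relu(image, kernels)    # (7, 7, 16) feature image
```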
According to an embodiment of the present disclosure, an image recognition method is provided, and fig. 2 is a schematic diagram of an image recognition method provided according to an embodiment of the present disclosure.
As shown in fig. 2, the image recognition method includes:
Step 201: generating an attention image from the feature image.
In order to obtain the attention features of different local areas of the image to be recognized, the feature image corresponding to the image is reduced in dimensionality.
In a possible implementation, the feature image is reduced in dimensionality with a 1 × 1 convolution kernel. This emphasizes useful information in the feature map while suppressing useless, even noisy, information.
Step 202: generating the attention distribution heat map from the attention image and the feature image.
In a possible implementation, in order to highlight the features of the corresponding regions of the feature image, the attention image and the feature image are multiplied element-wise to obtain the attention distribution heat map. The heat distribution in the map represents how strongly the attention image attends to each part of the image to be recognized: the greater the heat, the more concentrated the attention on the corresponding region.
Optionally, the generating an attention image from the feature image includes:
reducing the dimensionality of the feature image through a convolution kernel to generate the attention image, wherein the attention image is a single-channel image.
In a possible implementation, in order to obtain the attention features of different areas of the image to be recognized, the feature image is reduced in dimensionality. The data format of the feature image is h × w × c, where h is its height, w its width, and c its number of channels. The attention image is obtained by reducing c to 1 with a 1 × 1 convolution kernel (conv).
Optionally, the generating an attention distribution heat map from the attention image and the feature image includes:
multiplying the attention image and the feature image element-wise to obtain the attention distribution heat map.
According to an embodiment of the present disclosure, an image recognition method is provided, and fig. 3 is a schematic diagram of an image recognition method provided according to an embodiment of the present disclosure.
As shown in fig. 3, the image recognition method includes:
Step 301: performing global average pooling on the attention distribution heat map to obtain the attention feature values.
To reduce the number of parameters and the amount of computation, and to mitigate overfitting, global average pooling (GAP) is applied to the attention distribution heat map: all pixel values of a feature map are summed and averaged to obtain a single value that represents that feature map. This value is the attention feature value.
Step 302: concatenating the attention feature values to obtain the attention feature vector.
To associate the features of the different parts of the image to be recognized, the attention feature values are concatenated (concat) to obtain the attention feature vector.
According to an embodiment of the present disclosure, an image recognition method is provided, and fig. 4 is a schematic diagram of an image recognition method provided according to an embodiment of the present disclosure.
As shown in fig. 4, the image recognition method includes:
Step 401: constructing a graph neural network with the attention feature values as nodes;
Step 402: generating the attention relation feature value of the attention distribution heat map from the graph neural network.
A graph neural network (GNN) is constructed with the attention feature values as nodes to obtain the attention relation feature value. These steps perform quality perception at the level of local image blocks and use the graph neural network to explore the relationship between a single local feature and the remaining local features. Through steps 401 and 402, each local part of the image to be recognized, and its relationship to the other local parts, can be considered simultaneously. Each local image block is encouraged to make full use of the information in the other blocks, which makes them more discriminative. The quality of a localized image block can also be estimated, emphasizing information-rich parts and suppressing noisier ones.
According to an embodiment of the present disclosure, an image recognition method is provided, and fig. 5 is a schematic diagram of an image recognition method provided according to an embodiment of the present disclosure.
As shown in fig. 5, the image recognition method includes:
Step 501: multiplying the attention feature vector and the attention relation feature value to generate the feature vector of the image to be recognized.
To highlight the differences between areas of the image to be recognized, the attention feature vector and the attention relation feature value are multiplied to generate the feature vector of the image to be recognized.
Step 502: inputting the feature vector of the image to be recognized into a fully connected network to obtain the category of the feature vector, which is the category of the image to be recognized.
The feature vector of the image to be recognized is input into a fully connected network to obtain its category, namely the category of the image to be recognized.
The embodiments of the invention avoid the limitations of key-point detection, allocate different attention to different areas of the image to be recognized, pay more attention to high-quality areas when detecting the face category, perform better on faces that are not directly facing the camera and on occluded faces, and improve the robustness of the face recognition system.
According to an embodiment of the present disclosure, an image recognition apparatus is provided, which can be applied in smart-city and smart-finance scenarios. Fig. 6 is a schematic diagram of the image recognition apparatus according to an embodiment of the present disclosure.
As shown in fig. 6, the image recognition apparatus 600 includes:
the feature extraction module 610 is configured to obtain a feature image of an image to be identified, and generate an attention distribution thermodynamic diagram according to the feature image.
In a possible implementation manner, in order to extract features of an image to be recognized, the feature extraction module includes stemCNN, the image to be recognized is input into stemCNN, global abstract face features of the image to be recognized are extracted by using the CNN, and a feature image corresponding to the image to be recognized is generated, so that useful local information can be extracted later.
In a possible implementation, in order to obtain the attention features of different areas in the image to be recognized, the embodiment of the present disclosure reduces the dimension of the feature image corresponding to the image to be recognized. The data format of the feature image is h × w × c, where h is the height of the feature image, w is its width, and c is its number of channels. The attention image is obtained by reducing c to 1 with a 1 × 1 convolution kernel (conv). This emphasizes useful information in the feature map while compressing useless, or even noisy, information.
The attention image and the feature image are then multiplied point by point (element-wise) to obtain the attention distribution thermodynamic diagram.
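The 1 × 1 convolution and the point-by-point multiplication can be sketched as below. The sigmoid squashing of the attention values and the random weights are illustrative assumptions; the patent only specifies the channel reduction to 1 and the element-wise product.

```python
import numpy as np

def attention_image(feature, w):
    """1x1 convolution reducing c channels to 1: feature (h, w, c), w (c,)."""
    a = feature @ w                      # (h, w): one attention value per spatial position
    return 1.0 / (1.0 + np.exp(-a))     # sigmoid keeps attention in (0, 1) -- an assumption

def attention_heatmap(feature, w):
    """Element-wise (point-by-point) product of attention image and feature image."""
    att = attention_image(feature, w)
    return feature * att[..., None]      # broadcast the single-channel attention over c channels

rng = np.random.default_rng(1)
feature = rng.standard_normal((6, 6, 4))  # feature image h x w x c (placeholder sizes)
w = rng.standard_normal(4)                # 1x1 conv weights, one per input channel
heat = attention_heatmap(feature, w)      # attention distribution heat map, same shape as feature
```

Positions where the attention image is close to 1 keep their features almost unchanged; positions close to 0 are suppressed, which is exactly the "emphasize useful, compress noisy" behaviour described above.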
An attention relationship obtaining module 620, configured to generate an attention feature vector and an attention relationship feature value according to the attention distribution thermodynamic diagram.
In order to reduce the number of parameters and the amount of computation, and at the same time reduce overfitting, the attention relationship obtaining module may perform Global Average Pooling (GAP) on the attention distribution thermodynamic diagram. Specifically, all pixel values of a feature map are summed and averaged to obtain a single value, and that value represents the corresponding feature map. This value is the attention feature value.
In order to associate the features of each part of the image to be recognized, the attention relationship obtaining module concatenates the attention feature values to obtain the attention feature vector.
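The pooling and concatenation steps can be sketched as follows. The number of local heat maps (here 3) and their sizes are placeholders; the operations themselves match the GAP-then-concat description above.

```python
import numpy as np

def global_average_pool(heatmap):
    """Average all pixel values of each channel's map into one number (the attention feature value)."""
    return heatmap.mean(axis=(0, 1))     # (h, w, c) -> (c,)

rng = np.random.default_rng(2)
# K local attention heat maps, e.g. one per local branch (K = 3 is a placeholder)
heatmaps = [rng.standard_normal((6, 6, 4)) for _ in range(3)]
values = [global_average_pool(h) for h in heatmaps]   # one attention feature value per channel
attention_vector = np.concatenate(values)             # concat -> attention feature vector
```

Replacing each h × w map by a single scalar is what removes most of the parameters a fully connected head would otherwise need, which is the overfitting-reduction argument made above.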
Meanwhile, the attention relationship obtaining module may construct a Graph Neural Network (GNN) from the attention feature values and use it to obtain the attention relationship feature values. This allows each local part, and its relationship to the other parts, to be considered simultaneously: each local block is encouraged to make full use of the information in the other local blocks, making the blocks more discriminative. In addition, the GNN can estimate the quality of each local image block, emphasizing information-rich parts and suppressing noisier parts.
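One round of message passing over the attention feature values might look like the sketch below. The fully connected adjacency, the single layer, the scalar weight, and the tanh non-linearity are all illustrative assumptions; the patent does not specify the GNN architecture, only that nodes are the attention feature values.

```python
import numpy as np

def gnn_layer(node_values, weight):
    """One round of message passing on a fully connected graph.

    node_values: (k,) attention feature values, one node per local block.
    weight: scalar mixing weight, a placeholder for learned parameters.
    Each node aggregates the mean of the other nodes, so every local block
    uses information from the rest -- the relationship described above.
    """
    k = node_values.shape[0]
    adjacency = np.ones((k, k)) - np.eye(k)        # fully connected, no self-loops
    messages = adjacency @ node_values / (k - 1)   # mean over neighbours
    return np.tanh(weight * (node_values + messages))  # attention relationship feature values

values = np.array([0.2, 0.8, 0.5])        # pooled attention feature values (placeholders)
relation_values = gnn_layer(values, weight=1.0)
```

Because every output mixes a node's own value with its neighbours' mean, a block surrounded by strongly attended blocks is pulled up and an outlier is damped, which is one simple way to realize the quality-estimation behaviour described above.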
An image identification module 630, configured to generate a category of the image to be identified according to the attention feature vector and the attention relationship feature value.
In order to highlight the differences between different areas in the image to be recognized, the image recognition module multiplies the attention feature vector by the attention relationship feature values to generate the feature vector of the image to be recognized. This feature vector is then input into a fully connected network, which outputs the category of the feature vector, namely the category of the image to be recognized.
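The final multiply-and-classify step can be sketched as a single fully connected layer. The vector length, the number of categories (4), and the random weights are placeholder assumptions; a real head would use learned weights and possibly several layers.

```python
import numpy as np

def classify(attention_vector, relation_values, fc_weight, fc_bias):
    """Weight the attention feature vector by the relationship values, then apply a fully connected layer."""
    feature_vector = attention_vector * relation_values  # highlight differing areas
    logits = fc_weight @ feature_vector + fc_bias        # fully connected network (one layer here)
    return int(np.argmax(logits))                        # category index of the image to be recognized

rng = np.random.default_rng(3)
attention_vector = rng.standard_normal(6)   # from GAP + concat (placeholder length)
relation_values = rng.standard_normal(6)    # from the graph neural network
fc_weight = rng.standard_normal((4, 6))     # 4 hypothetical categories
fc_bias = rng.standard_normal(4)
category = classify(attention_vector, relation_values, fc_weight, fc_bias)
```

The element-wise product means a region's contribution to the logits is scaled by its relationship/quality score, so low-quality regions influence the final category less.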
Optionally, the feature extraction module includes:
The feature extraction submodule is used for acquiring the feature image of the image to be recognized through a feature extraction network.
In a possible implementation, the feature extraction sub-module includes a stem convolutional neural network (stemCNN). The image to be recognized is input into the stemCNN, which extracts global abstract face features of the image and generates the feature image corresponding to the image to be recognized.
The features are extracted as a global abstract face feature map so that useful local information can be extracted subsequently.
According to an embodiment of the present disclosure, an image recognition apparatus is provided, and fig. 7 is a schematic diagram of an image recognition apparatus provided according to an embodiment of the present disclosure.
As shown in fig. 7, the image recognition apparatus 700 includes:
an attention image acquisition sub-module 710 for generating an attention image from the feature image;
In one possible embodiment, the feature image is reduced in dimension using a 1 × 1 convolution kernel. This emphasizes useful information in the feature map while compressing useless, or even noisy, information.
The thermodynamic diagram acquisition sub-module 720 is used for generating the attention distribution thermodynamic diagram according to the attention image and the feature image.
In a possible implementation, in order to highlight the features of the corresponding regions of the feature image, the attention image and the feature image are multiplied point by point to obtain the attention distribution thermodynamic diagram. The heat distribution in the diagram represents how strongly the attention image attends to each part of the image to be recognized; greater heat indicates that attention is more concentrated in the corresponding region.
Optionally, the attention map image acquisition sub-module comprises:
The image dimension reduction unit is used for reducing the dimension of the feature image through a convolution kernel to generate the attention image, wherein the attention image is a single-channel image.
In a possible implementation, in order to obtain the attention features of different areas in the image to be recognized, the image dimension reduction unit reduces the dimension of the feature image corresponding to the image to be recognized. The data format of the feature image is h × w × c, where h is the height of the feature image, w is its width, and c is its number of channels. The attention image is obtained by reducing c to 1 with a 1 × 1 convolution kernel (conv).
Optionally, the thermodynamic diagram acquisition sub-module includes:
and the multiplying unit is used for carrying out point-to-point multiplication on the attention image and the feature image point so as to obtain the attention distribution thermodynamic diagram.
According to an embodiment of the present disclosure, an image recognition apparatus is provided, and fig. 8 is a schematic diagram of an image recognition apparatus provided according to an embodiment of the present disclosure.
As shown in fig. 8, the image recognition apparatus 800 includes:
a pooling sub-module 810, configured to perform global average pooling on the attention distribution thermodynamic diagram to obtain an attention feature value;
In order to reduce the number of parameters and the amount of computation, and to reduce overfitting, the pooling sub-module performs Global Average Pooling (GAP) on the attention distribution thermodynamic diagram. Specifically, all pixel values of a feature map are summed and averaged to obtain a single value, and that value represents the corresponding feature map. This value is the attention feature value.
A splicing sub-module 820, configured to splice the attention feature values and obtain the attention feature vector.
In order to associate the features of each part of the image to be recognized, the splicing submodule concatenates (concat) the attention feature values to obtain the attention feature vector.
According to an embodiment of the present disclosure, an image recognition apparatus is provided, and fig. 9 is a schematic diagram of an image recognition apparatus provided according to an embodiment of the present disclosure.
As shown in fig. 9, the image recognition apparatus 900 includes:
the graph neural network sub-module 910 is configured to construct a graph neural network by using the attention feature values as nodes;
and the Graph Neural network sub-module is used for constructing a Graph Neural Network (GNN) according to the attention characteristic value and acquiring the attention characteristic value. This enables each local part itself and its relationship to the other parts to be considered simultaneously.
An attention relationship characteristic value obtaining submodule 920, configured to generate an attention relationship characteristic value of the attention distribution thermodynamic diagram according to the graph neural network.
The graph neural network submodule constructs the graph neural network from the attention feature values, and the attention relationship feature value obtaining submodule obtains the attention relationship feature values from it. This allows each local part, and its relationship to the other parts, to be considered simultaneously: each local block is encouraged to make full use of the information in the other local blocks, making the blocks more discriminative. In addition, the quality of each local image block can be estimated, emphasizing information-rich parts and suppressing noisier parts.
Together, these submodules form a quality perception module at the local image block level. The quality perception module uses the graph neural network to explore the relationship between each local feature and the remaining local features, so that each local part and its relationships to the other parts are taken into account. Each local block is thereby encouraged to make full use of the information in the other blocks, making the blocks more discriminative. The quality perception module can also estimate the quality of each located local image block, emphasizing local parts with rich information and suppressing those with less.
According to an embodiment of the present disclosure, an image recognition apparatus is provided, and fig. 10 is a schematic diagram of an image recognition apparatus provided according to an embodiment of the present disclosure.
As shown in fig. 10, the image recognition apparatus 1000 includes:
a feature vector obtaining sub-module 1010, configured to multiply the attention feature vector and the attention relationship feature value to generate a feature vector of the image to be recognized;
The image classification sub-module 1020 is configured to input the feature vector of the image to be recognized into a fully connected network to obtain the category of the feature vector of the image to be recognized, where the category of the feature vector of the image to be recognized is the category of the image to be recognized.
The embodiment of the invention avoids the limitations of key-point detection by assigning different attention to different areas in the image to be recognized. When determining the face category, more attention is paid to high-quality areas, so that a better recognition effect is achieved on faces that do not directly face the camera and on occluded faces, improving the robustness of the face recognition system.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 11 is a schematic diagram of an image recognition apparatus provided according to an embodiment of the present disclosure. In order to capture features that are as rich as possible, the embodiment of the present disclosure provides an image recognition apparatus that can be applied in smart city and smart finance scenarios. As shown in fig. 11, the image recognition apparatus 1100 includes a Contrastive Attention Learning (CAL) module 1110. The contrastive attention learning module encourages differences between different attention maps by applying a penalty term based on the difference between any two of them. In particular, if the distance between two attention maps is small, they are similar; in that case a large penalty is imposed to drive them apart. This ensures that rich local information is searched for across the face image and that different, distinctive local information is emphasized.
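One way to realize the penalty described above is a hinge-style diversity term over pairs of attention maps, sketched below. The hinge form, the margin value, and the squared penalty are illustrative assumptions; the patent only states that a small distance between two attentions incurs a large penalty.

```python
import numpy as np

def contrastive_attention_penalty(attention_maps, margin=1.0):
    """Penalize pairs of attention maps that are too similar.

    attention_maps: (k, h, w). For each pair, if the L2 distance is below
    `margin` (similar maps), add a penalty that grows as the distance
    shrinks, driving the attention maps apart.
    """
    k = attention_maps.shape[0]
    flat = attention_maps.reshape(k, -1)
    penalty = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            d = np.linalg.norm(flat[i] - flat[j])
            penalty += max(0.0, margin - d) ** 2   # large when d is small, zero past the margin
    return penalty

# Two identical maps and one distinct map: only the identical pair is penalized.
maps = np.stack([np.zeros((4, 4)), np.zeros((4, 4)), np.ones((4, 4))])
loss = contrastive_attention_penalty(maps)
```

During training such a term would be added to the recognition loss, so the attention branches are pushed to cover different, complementary face regions rather than collapsing onto the same area.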
A local feature representation can provide discriminative information for face recognition. However, directly concatenating local feature representations without considering the relationships between different local information may not be the best choice; to overcome the limitation of independent local feature information, local structural relevance is constructed to enhance the discriminative power of the local features. Moreover, different local blocks are observed to have varying quality under occlusion, blurring, or pose change, where quality refers to how many discriminative features a local block contains; if the local features are concatenated directly, performance degrades. To address both issues, the present disclosure designs a Quality-Aware Network (QAN) 1120. It introduces quality awareness at the local image block level, exploring the relationship between each local feature and the remaining local features by using a graph neural network, so that each local part and its relationships to the other parts are taken into account. Each local block is thereby encouraged to make full use of the information in the other blocks, making the blocks more discriminative. The quality-aware network can also estimate the quality of the located local image blocks, emphasizing information-rich locations and suppressing noisy ones.
FIG. 12 shows a schematic block diagram of an example electronic device 1200 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the device 1200 includes a computing unit 1201, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read-Only Memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data required for the operation of the device 1200 may also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Various components in the device 1200 are connected to the I/O interface 1205 including: an input unit 1206 such as a keyboard, a mouse, or the like; an output unit 1207 such as various types of displays, speakers, and the like; a storage unit 1208, such as a magnetic disk, optical disk, or the like; and a communication unit 1209 such as a network card, modem, wireless communication transceiver, etc. The communication unit 1209 allows the device 1200 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 1201 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 1201 performs the respective methods and processes described above, such as the image recognition method. For example, in some embodiments, the image recognition method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into the RAM 1203 and executed by the computing unit 1201, one or more steps of the image recognition method described above may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured to perform the image recognition method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host; it is a host product in a cloud computing service system that addresses the drawbacks of high management difficulty and weak service scalability in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. An image recognition method, comprising:
acquiring a characteristic image of an image to be identified, and generating an attention distribution thermodynamic diagram according to the characteristic image;
generating an attention feature vector and an attention relation feature value according to the attention distribution thermodynamic diagram;
and generating the category of the image to be recognized according to the attention feature vector and the attention relation feature value.
2. The method of claim 1, the obtaining a feature image of an image to be identified, comprising:
and acquiring the characteristic image of the image to be identified through a characteristic extraction network.
3. The method of claim 1, the generating an attention distribution thermodynamic diagram from the feature images, comprising:
generating an attention image according to the characteristic image;
generating the attention distribution thermodynamic diagram from the attention image and the feature image.
4. The method of claim 3, the generating an attention image from the feature image, comprising:
and performing dimension reduction on the characteristic image through convolution kernel to generate the attention image, wherein the attention image is a single-channel image.
5. The method of claim 3, the generating an attention distribution thermodynamic diagram from the attention image and the feature image, comprising:
and multiplying the attention image and the feature image point by point to obtain the attention distribution thermodynamic diagram.
6. The method of claim 1, the generating an attention feature vector and an attention relationship feature value from the attention distribution thermodynamic diagram, comprising:
performing global average pooling processing on the attention distribution thermodynamic diagram to obtain an attention characteristic value;
and splicing the attention characteristic values and acquiring the attention characteristic vector.
7. The method of claim 6, further comprising:
constructing a graph neural network by taking the attention characteristic values as nodes;
and generating an attention relation characteristic value of the attention distribution thermodynamic diagram according to the graph neural network.
8. The method of claim 1 or 7, the generating a class of the image to be recognized from the attention feature vector and attention relationship feature values, comprising:
multiplying the attention feature vector and the attention relation feature value to generate a feature vector of the image to be recognized;
and inputting the feature vector of the image to be recognized into a fully connected network to obtain the category of the feature vector of the image to be recognized, wherein the category of the feature vector of the image to be recognized is the category of the image to be recognized.
9. An image recognition apparatus comprising:
the characteristic extraction module is used for acquiring a characteristic image of an image to be identified and generating an attention distribution thermodynamic diagram according to the characteristic image;
an attention relation obtaining module, configured to generate an attention feature vector and an attention relation feature value according to the attention distribution thermodynamic diagram;
and the image identification module is used for generating the category of the image to be identified according to the attention feature vector and the attention relation feature value.
10. The apparatus of claim 9, the feature extraction module, comprising:
and the feature extraction submodule is used for acquiring the feature image of the image to be identified through a feature extraction network.
11. The apparatus of claim 9, the feature extraction module, comprising:
the attention image acquisition sub-module is used for generating an attention image according to the characteristic image;
and the thermodynamic diagram acquisition sub-module is used for generating the attention distribution thermodynamic diagram according to the attention image and the feature image.
12. The apparatus of claim 11, the attention map image acquisition sub-module, comprising:
and the image dimension reduction unit is used for reducing the dimension of the characteristic image through convolution kernel so as to generate the attention image, wherein the attention image is a single-channel image.
13. The apparatus of claim 11, the thermodynamic diagram acquisition sub-module, comprising:
and the multiplying unit is used for multiplying the attention image and the feature image point by point to obtain the attention distribution thermodynamic diagram.
14. The apparatus of claim 9, the attention relationship acquisition module, comprising:
the pooling sub-module is used for carrying out global average pooling on the attention distribution thermodynamic diagram to obtain an attention characteristic value;
and the splicing submodule is used for splicing the attention characteristic values and acquiring the attention characteristic vector.
15. The apparatus of claim 14, further comprising:
the graph neural network submodule is used for constructing a graph neural network by taking the attention characteristic value as a node;
and the attention relation characteristic value acquisition submodule is used for generating the attention relation characteristic value of the attention distribution thermodynamic diagram according to the diagram neural network.
16. The apparatus of claim 9 or 15, the image recognition module, comprising:
the feature vector acquisition submodule is used for multiplying the attention feature vector and the attention relation feature value to generate a feature vector of the image to be identified;
and the image classification submodule is used for inputting the feature vector of the image to be recognized into a fully connected network to obtain the category of the feature vector of the image to be recognized, wherein the category of the feature vector of the image to be recognized is the category of the image to be recognized.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
CN202110667481.1A 2021-06-16 2021-06-16 Image recognition method, device and system Active CN113553905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110667481.1A CN113553905B (en) 2021-06-16 2021-06-16 Image recognition method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110667481.1A CN113553905B (en) 2021-06-16 2021-06-16 Image recognition method, device and system

Publications (2)

Publication Number Publication Date
CN113553905A true CN113553905A (en) 2021-10-26
CN113553905B CN113553905B (en) 2024-04-26

Family

ID=78102157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110667481.1A Active CN113553905B (en) 2021-06-16 2021-06-16 Image recognition method, device and system

Country Status (1)

Country Link
CN (1) CN113553905B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019183191A1 (en) * 2018-03-22 2019-09-26 Michael Bronstein Method of news evaluation in social media networks
CN111428726A (en) * 2020-06-10 2020-07-17 中山大学 Panorama segmentation method, system, equipment and storage medium based on graph neural network
WO2020155606A1 (en) * 2019-02-02 2020-08-06 深圳市商汤科技有限公司 Facial recognition method and device, electronic equipment and storage medium
CN111931859A (en) * 2020-08-28 2020-11-13 中国科学院深圳先进技术研究院 Multi-label image identification method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIAO Shuangjian; WANG Zhiyuan: "Design of a face recognition access control system based on a convolutional neural network" (卷积神经网络的人脸识别门禁系统设计), Microcontrollers &amp; Embedded Systems (单片机与嵌入式系统应用), no. 09, 1 September 2020 (2020-09-01) *

Also Published As

Publication number Publication date
CN113553905B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN114550177B (en) Image processing method, text recognition method and device
CN113570606B (en) Target segmentation method and device and electronic equipment
CN113971751A (en) Training feature extraction model, and method and device for detecting similar images
CN113343826A (en) Training method of human face living body detection model, human face living body detection method and device
CN113221771B (en) Living body face recognition method, device, apparatus, storage medium and program product
CN112802037A (en) Portrait extraction method, device, electronic equipment and storage medium
CN113326773A (en) Recognition model training method, recognition method, device, equipment and storage medium
CN113378712A (en) Training method of object detection model, image detection method and device thereof
CN112561879A (en) Ambiguity evaluation model training method, image ambiguity evaluation method and device
CN112989987A (en) Method, apparatus, device and storage medium for identifying crowd behavior
CN113435408A (en) Face living body detection method and device, electronic equipment and storage medium
CN111932530A (en) Three-dimensional object detection method, device and equipment and readable storage medium
CN114140320B (en) Image migration method and training method and device of image migration model
CN114387651B (en) Face recognition method, device, equipment and storage medium
CN113553905A (en) Image recognition method, device and system
CN114093006A (en) Training method, device and equipment of living human face detection model and storage medium
CN113591718A (en) Target object identification method and device, electronic equipment and storage medium
CN113570607B (en) Target segmentation method and device and electronic equipment
JP7372487B2 (en) Object segmentation method, object segmentation device and electronic equipment
CN113642428B (en) Face living body detection method and device, electronic equipment and storage medium
CN114332416B (en) Image processing method, device, equipment and storage medium
CN114140319A (en) Image migration method and training method and device of image migration model
CN114327346A (en) Display method, display device, electronic apparatus, and storage medium
CN114005181A (en) Interactive relationship identification method and device and electronic equipment
CN114120417A (en) Face image processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant