CN111860250A - Image identification method and device based on character fine-grained features - Google Patents
- Publication number
- CN111860250A CN111860250A CN202010655258.0A CN202010655258A CN111860250A CN 111860250 A CN111860250 A CN 111860250A CN 202010655258 A CN202010655258 A CN 202010655258A CN 111860250 A CN111860250 A CN 111860250A
- Authority
- CN
- China
- Prior art keywords
- image
- sample
- layer
- feature
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention relates to the technical field of image processing, and discloses an image identification method and device based on fine-grained features of persons. The method comprises the following steps: acquiring a person image to be identified; performing feature extraction on the person image to be identified to obtain a person feature layer; inputting the person feature layer into a preset supercolumn feature recognition model to obtain a corresponding image recognition result; obtaining an image recognition accuracy according to the image recognition result; and when the image recognition accuracy is greater than or equal to a preset standard threshold, taking the image recognition result as the image recognition result based on person fine-grained features. Compared with the prior art, in which an attention mechanism network used for image processing cannot accurately acquire key region information and therefore cannot accurately identify the image category, the method locates key regions of the image accurately and efficiently.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to an image identification method and device based on fine-grained characteristics of people.
Background
In daily life, users need to identify the persons in captured images. In the prior art, image recognition pipelines extract a large number of category-level semantic features, which makes them suitable only for coarse-grained image classification tasks, while a large amount of low-level spatial information of the image, such as position, texture and contour, is lost. As a result, attention mechanism networks used for fine-grained image feature localization cannot efficiently and accurately acquire key region information and cannot accurately identify person images. How to efficiently and accurately acquire the key region information of an image so as to accurately identify person images is therefore a technical problem to be solved urgently.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide an image identification method and device based on character fine-grained characteristics, and aims to solve the technical problem of how to efficiently and accurately acquire information of a key area of an image so as to accurately identify a character image.
In order to achieve the above object, the present invention provides an image recognition method based on fine-grained features of a person, including the following steps:
Acquiring a person image to be identified;
performing feature extraction on the person image to be identified to obtain a person feature layer;
inputting the person feature layer into a preset supercolumn feature recognition model to obtain a corresponding image recognition result;
obtaining an image recognition accuracy according to the image recognition result;
and when the image recognition accuracy is greater than or equal to a preset standard threshold, taking the image recognition result as the image recognition result based on person fine-grained features.
Preferably, before the step of acquiring the image to be recognized of the person, the method further includes:
acquiring image training sets corresponding to different characters, and traversing the image training sets to obtain traversed current training images;
obtaining a corresponding sample convolution layer according to the current training image;
extracting a sample characteristic layer from the sample convolution layer;
obtaining a layer pixel point set corresponding to the sample characteristic layer;
superposing the layer pixel point set by a preset up-sampling method to obtain a sample super-column set;
when the traversal is finished, constructing a sample super-column set according to all the obtained sample super-column sets;
respectively preprocessing each sample super-column set in the sample super-column set to obtain a sample target image set;
Obtaining a sample person recognition result corresponding to each sample target image contained in the sample target image set;
and constructing a preset supercolumn feature recognition model according to the training image set and the sample person recognition result.
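A minimal sketch of this model-construction pipeline is given below, assuming a PyTorch-style workflow; the helper names (extract_sample_feature_layers, build_hypercolumn_set and so on), the choice of intermediate layers, and the use of torch are illustrative assumptions rather than the patent's actual implementation.

```python
# Hypothetical sketch of the training-data preparation described above.
from typing import Dict, List, Tuple
import torch
import torch.nn.functional as F

def extract_sample_feature_layers(image: torch.Tensor, backbone_stages) -> List[torch.Tensor]:
    """Run the backbone stage by stage and keep every intermediate feature map."""
    feats, x = [], image
    for stage in backbone_stages:          # "sample convolution layers"
        x = stage(x)
        feats.append(x)
    return feats

def build_hypercolumn_set(feature_layers: List[torch.Tensor], size=(224, 224)) -> torch.Tensor:
    """Upsample selected feature maps to the input size and stack their channels."""
    up = [F.interpolate(f, size=size, mode="bilinear", align_corners=False) for f in feature_layers]
    return torch.cat(up, dim=1)            # per-pixel hypercolumns ("sample super-column set")

def prepare_training_samples(training_sets: Dict[str, List[torch.Tensor]],
                             backbone_stages, preprocess, recognize) -> List[Tuple[torch.Tensor, str]]:
    samples = []
    for person, images in training_sets.items():        # traverse the image training sets
        for img in images:                               # traversed current training image, (1, 3, 224, 224)
            feats = extract_sample_feature_layers(img, backbone_stages)
            hc = build_hypercolumn_set(feats[2:5])       # assumed choice of sample feature layers
            target_img = preprocess(hc, img)             # down-sample, locate, crop and zoom
            samples.append((target_img, recognize(target_img, person)))
    return samples                                       # used to fit the supercolumn recognition model
```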
Preferably, the step of preprocessing each sample supercolumn set in the sample supercolumn set to obtain a sample target image set includes:
traversing the sample super-column set to obtain a traversed current sample super-column set;
preprocessing the current sample super-column set by a preset down-sampling method to obtain a target super-column set;
flattening the target super-column set to obtain a target area;
determining attention area positioning parameters according to the target area;
when the traversal is finished, constructing an attention area positioning parameter set according to all acquired attention area positioning parameters;
and respectively processing each sample target image contained in the sample target image set according to each attention area positioning parameter in the attention area positioning parameter set to obtain a sample target image set.
Preferably, the step of respectively processing each sample target image included in the sample target image set according to each attention area positioning parameter in the attention area positioning parameter set to obtain a sample target image set includes:
Traversing the attention area positioning parameter set to obtain a traversed current attention area positioning parameter;
determining the position of a target area according to the current attention area positioning parameter;
performing area cutting on the current training image according to the position of the target area to obtain a target area image;
amplifying the target area image by a preset bilinear interpolation method to obtain a sample target image;
and at the end of the traversal, constructing a sample target image set according to all the obtained sample target images.
Preferably, after the step of obtaining the sample person recognition result corresponding to each sample target image included in the sample target image set, the method further includes:
inputting the sample feature layer into a preset residual error model to obtain a sample high-dimensional feature layer;
determining a sample class probability loss value according to the sample high-dimensional feature layer;
judging whether the sample class probability loss value is larger than a preset probability threshold value or not;
and when the sample class probability loss value is larger than the preset probability threshold value, executing the step of constructing a preset supercolumn feature recognition model according to the training image set and the sample person recognition result.
Preferably, after the step of determining whether the sample class probability loss value is greater than a preset probability threshold, the method further includes:
and returning to the step of extracting the sample feature layer from the sample convolution layer when the sample class probability loss value is less than or equal to the preset probability threshold.
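The loss-threshold check described in the two preceding paragraphs can be sketched as follows; treating the "preset residual model" as a ResNet-style head and using cross-entropy as the class probability loss are assumptions made only for illustration.

```python
# Hypothetical sketch of the class-probability-loss check that decides whether to
# build the model or go back to feature extraction (per the text: loss > threshold -> build).
import torch
import torch.nn.functional as F

def loss_threshold_check(sample_feature_layer: torch.Tensor, label: torch.Tensor,
                         residual_model, classifier, prob_threshold: float = 0.5) -> str:
    high_dim = residual_model(sample_feature_layer)     # sample high-dimensional feature layer
    logits = classifier(high_dim.flatten(1))
    loss = F.cross_entropy(logits, label)               # sample class probability loss value
    if loss.item() > prob_threshold:
        return "construct_supercolumn_model"            # proceed to model construction
    return "re_extract_sample_feature_layer"            # return to the extraction step
```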
Preferably, the step of performing feature extraction on the person image to be identified to obtain a person feature layer includes:
inputting the person image to be identified into a preset convolutional neural network model to obtain an initial feature layer;
pooling the initial feature layer to obtain an attention image;
and obtaining the person feature layer according to the attention image and the initial feature layer.
In addition, in order to achieve the above object, the present invention further provides an image recognition apparatus based on fine-grained features of a person, including:
the acquisition module is used for acquiring a person image to be identified;
the extraction module is used for performing feature extraction on the person image to be identified to obtain a person feature layer;
the recognition module is used for inputting the person feature layer into a preset supercolumn feature recognition model to obtain a corresponding image recognition result;
the acquisition module is further used for obtaining an image recognition accuracy according to the image recognition result;
and the judging module is used for taking the image recognition result as the image recognition result based on person fine-grained features when the image recognition accuracy is greater than or equal to a preset standard threshold.
In addition, in order to achieve the above object, the present invention further provides an image recognition apparatus based on fine-grained features of a person, including: the image recognition method comprises the steps of a memory, a processor and an image recognition program based on human fine-grained features, wherein the image recognition program based on human fine-grained features is stored on the memory and can run on the processor, and when being executed by the processor, the image recognition program based on human fine-grained features realizes the steps of the image recognition method based on human fine-grained features.
Furthermore, in order to achieve the above object, the present invention further provides a storage medium having stored thereon an image recognition program based on fine-grained features of persons, which when executed by a processor implements the steps of the image recognition method based on fine-grained features of persons as described above.
In the method, a person image to be recognized is first acquired and feature extraction is performed on it to obtain a person feature layer. The person feature layer is then input into a preset supercolumn feature recognition model, which can accurately locate the key region of the image and thus quickly and accurately obtain the image recognition result corresponding to the person image. Finally, an image recognition accuracy is obtained according to the image recognition result, and when the accuracy is greater than or equal to a preset standard threshold, the result is taken as the image recognition result based on person fine-grained features. In this way, person image recognition efficiency is improved while the recognition result remains accurate.
Drawings
Fig. 1 is a schematic structural diagram of an image recognition device based on human fine-grained features in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a first embodiment of an image recognition method based on fine-grained features of a person according to the present invention;
FIG. 3 is a flowchart illustrating a second embodiment of an image recognition method based on fine-grained features of a person according to the present invention;
fig. 4 is a block diagram of a first embodiment of an image recognition apparatus based on fine-grained features of a person according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an image recognition device based on human fine-grained features in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the image recognition apparatus based on fine-grained features of a person may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display), and the optional user interface 1003 may further include a standard wired interface and a wireless interface, and the wired interface for the user interface 1003 may be a USB interface in the present invention. The network interface 1004 may optionally include a standard wired interface, a WIreless interface (e.g., a WIreless-FIdelity (WI-FI) interface). The Memory 1005 may be a Random Access Memory (RAM) Memory or a Non-volatile Memory (NVM), such as a disk Memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in fig. 1 does not constitute a limitation of image recognition apparatuses based on fine-grained features of persons, and may include more or fewer components than those shown, or some components in combination, or a different arrangement of components.
As shown in fig. 1, a memory 1005, identified as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an image recognition program based on fine-grained features of a person.
In the image recognition device based on the fine-grained character features shown in fig. 1, the network interface 1004 is mainly used for connecting with a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting user equipment; the image recognition device based on the fine-grained character of the person calls an image recognition program based on the fine-grained character of the person stored in the memory 1005 through the processor 1001 and executes the image recognition method based on the fine-grained character of the person provided by the embodiment of the invention.
Based on the hardware structure, the embodiment of the image identification method based on the character fine-grained characteristics is provided.
Referring to fig. 2, fig. 2 is a schematic flowchart of a first embodiment of an image recognition method based on fine-grained features of a person according to the present invention, and the first embodiment of the image recognition method based on fine-grained features of a person according to the present invention is provided.
In a first embodiment, the image recognition method based on fine-grained features of a person comprises the following steps:
Step S10: acquiring the person image to be identified.
It should be noted that the execution subject of the present embodiment is an image recognition device based on fine-grained features of a person, where the device is an image recognition device based on fine-grained features of a person with functions of image processing, data communication, program execution, and the like, and may also be other devices, which is not limited in this embodiment.
Before the step of obtaining the image of the person to be identified, a preset supercolumn feature identification model (HCA-CNN) needs to be established, wherein the preset supercolumn feature identification model is a Convolutional Neural Network model established based on the supercolumn feature idea of image segmentation and fine-grained positioning.
Image training sets corresponding to different persons are acquired and traversed to obtain the traversed current training image. A corresponding sample convolution layer is obtained according to the current training image, and a sample feature layer is extracted from the sample convolution layer. A layer pixel point set corresponding to the sample feature layer is obtained, and the layer pixel point sets are superposed by a preset up-sampling method to obtain a sample super-column set. When the traversal is finished, a sample super-column set is constructed according to all the obtained sample super-column sets. Each sample super-column set in the sample super-column set is preprocessed to obtain a sample target image set, the sample person recognition result corresponding to each sample target image contained in the sample target image set is obtained, and a preset supercolumn feature recognition model is constructed according to the training image set and the sample person recognition result.
In the above, extracting the sample feature layer from the sample convolution layer may be done by inputting the current training image into a convolutional neural network to obtain the corresponding sample convolution layer, or by inputting the sample feature layer into a preset residual model (a deep residual network model) to obtain a sample high-dimensional feature layer. In the feature extraction stage, the magnitude of the network parameters is reduced through convolutional feature discrimination, weight sharing and pooling, and the features are finally input into a conventional neural network structure to complete the classification task.
The step of preprocessing each sample super-column set in the sample super-column set to obtain a sample target image set may be understood as follows. The sample super-column set is traversed to obtain the traversed current sample super-column set, and the current sample super-column set is preprocessed by a preset down-sampling method to obtain a target super-column set. The target super-column set is flattened to obtain a target area, and an attention area positioning parameter is determined according to the target area. When the traversal is finished, an attention area positioning parameter set is constructed according to all the acquired attention area positioning parameters, and each sample target image contained in the sample target image set is processed according to each attention area positioning parameter in the attention area positioning parameter set to obtain the sample target image set.
The step of processing each sample target image included in the sample target image set according to each attention area positioning parameter in the attention area positioning parameter set to obtain a sample target image set may be understood as follows. The attention area positioning parameter set is traversed to obtain the traversed current attention area positioning parameter, and the target area position is determined according to the current attention area positioning parameter. The current training image is then cropped according to the target area position to obtain a target area image, and the target area image is amplified by a preset bilinear interpolation method to obtain a sample target image. When the traversal is finished, the sample target image set is constructed according to all the obtained sample target images.
For ease of understanding, the following specific steps for constructing the HCA-CNN network model may be:
the character feature data set may be a movie and television character image set, a character image set in daily life, or the like, and this embodiment is not limited thereto.
The following is exemplified by the Beijing opera character feature data set:
according to the different visual characteristics of Beijing Opera characters, a Beijing Opera Role (BJOR) data set facing to the Beijing Opera character recognition task is manufactured, more than 300 Beijing Opera videos of classical Opera are sorted and classified, different video frames are set by a control variable method for image capture, and 273100 pictures are obtained in total; 40000 single target pictures for the image classification task are obtained through screening; the classification is 8, 5000 pieces in each category.
Further, the data set is input to the HCA-CNN for image recognition.
The HCA-CNN network is formed by iterating three scale sub-networks, each with the same structure. The input picture first passes through a lightweight network (the MobileNetV2 classification network), and a feature map of each feature layer is obtained after a series of feature extraction operations. On the one hand, the last feature map layer is input into a classifier as the classification task of the current scale; on the other hand, selected stage feature maps are superposed to form a HyperColumn Set, which is input into a super-column-based attention mechanism network (HC-APN). The HC-APN network performs down-sampling and fully connected operations on the obtained HyperColumn Set features, and then the key region image is amplified according to the extracted key region parameters. The amplified image is used as the input of the next scale, and the iteration is repeated in this way, so that the proportion of the key region is increased and fine classification of fine-grained images is finally realized.
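As a rough sketch (and only a sketch: the module classes, the selected stages and whether weights are shared across scales are all assumptions not stated here), the three-scale iteration can be organized as follows:

```python
# Hypothetical sketch of the three-scale HCA-CNN forward pass.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleNet(nn.Module):
    """One scale: backbone stages -> classifier, plus HC-APN for region localization."""
    def __init__(self, backbone_stages: nn.ModuleList, classifier: nn.Module, hc_apn: nn.Module):
        super().__init__()
        self.stages, self.classifier, self.hc_apn = backbone_stages, classifier, hc_apn

    def forward(self, x):
        feats, h = [], x
        for stage in self.stages:
            h = stage(h)
            feats.append(h)
        logits = self.classifier(feats[-1])                      # classification task of this scale
        selected = feats[2:5]                                    # assumed choice of stage feature maps
        hc = torch.cat([F.interpolate(f, size=x.shape[-2:], mode="bilinear", align_corners=False)
                        for f in selected], dim=1)               # HyperColumn Set
        return logits, self.hc_apn(hc)                           # (tx, ty, tl) region parameters

def hca_cnn_forward(scales, x, crop_and_zoom):
    """Each scale classifies its input and proposes the amplified key region fed to the next scale."""
    all_logits = []
    for scale in scales:                                         # typically three ScaleNet instances
        logits, params = scale(x)
        all_logits.append(logits)
        x = crop_and_zoom(x, *params)                            # amplified key region as next input
    return all_logits
```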
The Recurrent Attention Convolutional Neural Network (RA-CNN) showed that spatial features contribute to the identification of fine-grained images, so the large number of spatial features present in the feature extraction stage of Beijing opera images is worth intensive study. In the feature extraction stage, the magnitude of the network parameters is reduced through convolutional feature discrimination, weight sharing and pooling, and the features are finally input into a conventional neural network structure to complete the classification task. The image characteristics of Beijing opera characters can then be shown through the middle-layer feature map information extracted for visual characteristics. Taking the MobileNetV2 network as an example, the feature map strength at different feature extraction stages can be shown in a red-blue map, where red represents stronger features and blue represents weaker features. It can be observed that the closer to the input layer (ImageInput), the weaker the low-level category information (category characteristics) and the stronger the spatial characteristics; the closer to the output layer (classifier), the stronger the high-level category information and the weaker the spatial features.
Because the sub-network APN of RA-CNN only adopts the last feature map layer of the backbone depth model (VGG) as its input feature, it does not need much processing of spatial features. Based on this change of input features, the present method makes corresponding improvements on the basis of the sub-network APN and proposes a new attention mechanism sub-network, HC-APN.
The HC-APN sub-network first down-samples the input features of size 224 x 2024 to size 7 x 2024, and then performs two fully connected operations: the first flattens the features to a size of 1 x 16192, and the second reduces them to a size of 1 x 3 (the 3 channels represent the three parameters tx, ty, tl for attention area localization).
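A minimal sketch of such a sub-network is shown below; the sizes quoted above look partly garbled in this text, so the sketch simply assumes spatial down-sampling to 7 x 7 followed by two fully connected layers ending in the three localization parameters, with the hidden width chosen arbitrarily.

```python
# Hypothetical sketch of the HC-APN sub-network (sizes are illustrative assumptions).
import torch
import torch.nn as nn

class HCAPN(nn.Module):
    def __init__(self, in_channels: int, hidden: int = 1024):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(7)                  # down-sample hypercolumn features to 7x7
        self.fc1 = nn.Linear(in_channels * 7 * 7, hidden)    # first flatten / fully connected step
        self.fc2 = nn.Linear(hidden, 3)                      # second step -> (tx, ty, tl)

    def forward(self, hypercolumns: torch.Tensor):
        x = self.pool(hypercolumns).flatten(1)
        x = torch.relu(self.fc1(x))
        tx, ty, tl = self.fc2(x).unbind(dim=1)               # attention-region localization parameters
        return tx, ty, tl
```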
Furthermore, the attention network is used to determine the position of the target region. After the coordinate relation is defined, the cropped attention region is obtained by element-wise multiplication of the Mask M function and the input image X. The determined target region is then amplified by a bilinear interpolation method (the formulas are given in the second embodiment below). Through these steps the corresponding image recognition result is obtained from the region-amplified image, and the preset supercolumn feature recognition model is then constructed according to the training image set and the sample person recognition result.
And performing joint loss function formula calculation on the successfully constructed preset supercolumn feature recognition model to verify whether the preset supercolumn feature recognition model meets the requirements, and when the preset supercolumn feature recognition model does not meet the requirements, adjusting parameters in the preset supercolumn feature recognition model to obtain the preset supercolumn feature recognition model with higher accuracy.
Step S20: performing feature extraction on the person image to be identified to obtain a person feature layer.
Extracting the person feature layer from the person image to be identified may be done by inputting the person image to be identified into a preset convolutional neural network model to obtain an initial feature layer, pooling the initial feature layer to obtain an attention image, and obtaining the person feature layer according to the attention image and the initial feature layer; alternatively, the feature layer may be input into a preset residual model (a deep residual network model) to obtain a high-dimensional feature layer. This embodiment is not limited in this respect.
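The patent does not specify how the attention image and the initial feature layer are combined, so the sketch below treats an element-wise weighting as one plausible reading, and uses mobilenet_v2 merely as a stand-in for the "preset convolutional neural network model".

```python
# Hypothetical sketch of Step S20: backbone features + pooled attention image -> person feature layer.
import torch
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

backbone = mobilenet_v2().features          # stand-in for the preset convolutional neural network model

def person_feature_layer(image: torch.Tensor) -> torch.Tensor:
    """image: (1, 3, 224, 224) tensor of the person image to be identified."""
    feats = backbone(image)                                    # initial feature layer
    attn = torch.sigmoid(F.adaptive_avg_pool2d(feats, 1))      # pooling -> attention image (assumed form)
    return feats * attn                                        # combine attention image with initial features
```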
Step S30: inputting the person feature layer into a preset supercolumn feature recognition model to obtain a corresponding image recognition result.
In the structural schematic diagram of the lightweight classification model, the extracted feature map (namely, a character feature map layer) is subjected to multi-Task learning, Task1 represents a super column Set (HyperColumn Set) for learning HC-APN, and Task2 represents a feature map for learning classification tasks.
In Task1, the feature maps differ in size from the original image, so before the feature maps of the individual layers can be superimposed with the original image pixel by pixel, upsampling is required first.
After each layer's feature map has been upsampled, it gives the features of that layer at every pixel position, and the accumulation over the k feature maps means that the channels of the different feature maps are stacked together rather than added numerically. The value fi is not the whole feature map but the hypercolumn at a single pixel position i; since the input image size is 224 x 224, the HyperColumn Set consists of the 224 x 224 hypercolumns fi, that is, superposition can be performed after each layer's feature map has been upsampled.
Because the sub-network APN of RA-CNN only adopts the last feature map layer of the backbone network VGG as its input feature, it does not need much processing of spatial features. Based on this change of input features, the present method makes corresponding improvements on the basis of the sub-network APN and proposes a new attention mechanism sub-network, HC-APN.
The HC-APN sub-network first down-samples the input features of size 224 x 2024 to size 7 x 2024, and then performs two fully connected operations: the first flattens the features to a size of 1 x 16192, and the second reduces them to a size of 1 x 3 (the 3 channels represent the three parameters tx, ty, tl for attention area localization).
Further, the attention network is used for determining the position of the target area, and after a coordinate relation is defined, the clipped attention area is obtained by a method of element multiplication of a Mask m function and the input image X.
The Mask m function can select the most important region in forward propagation, then the determined target region is subjected to region amplification by a bilinear interpolation method, and finally image recognition is carried out according to the amplified image so as to obtain a corresponding image recognition result.
Step S40: obtaining the image recognition accuracy according to the image recognition result.
Step S50: when the image recognition accuracy is greater than or equal to a preset standard threshold, taking the image recognition result as the image recognition result based on person fine-grained features.
It can be understood that the preset supercolumn feature recognition model can also output an image recognition accuracy corresponding to the image recognition result; the accuracy may be, for example, 50%, 60% or 90%.
Assume the image recognition accuracy corresponding to the current image recognition result is 80% and the preset standard threshold is 70% (the threshold is user-defined and not limited by this embodiment). Since 80% is greater than 70%, the image recognition result is taken as the image recognition result based on person fine-grained features.
In this embodiment, a person image to be recognized is acquired and feature extraction is performed on it to obtain a person feature layer. The person feature layer is input into the preset supercolumn feature recognition model, which can accurately locate the key region of the image and thus quickly and accurately obtain the image recognition result corresponding to the person image. An image recognition accuracy is then obtained according to the image recognition result, and when the accuracy is greater than or equal to the preset standard threshold, the result is taken as the image recognition result based on person fine-grained features, so that person image recognition efficiency is improved while the recognition result remains accurate.
In addition, referring to fig. 3, fig. 3 is a first embodiment of the image recognition method based on the fine-grained feature of the person, and a second embodiment of the image recognition method based on the fine-grained feature of the person is proposed.
In the second embodiment, before the step S10 in the image recognition method based on fine-grained features of a person, the method further includes:
step S001: and acquiring image training sets corresponding to different characters, traversing the image training sets, and acquiring traversed current training images.
The character feature data set may be a movie and television character image set, a character image set in daily life, or the like, and this embodiment is not limited thereto.
The following uses a person feature data set from daily life and a Beijing opera character feature data set as examples:
1. data set classification
The person feature set from daily life can be classified according to characteristics of the persons such as age, gender and occupation. Category labels can be set, including: "Zhongnian_Nanxing_Bailing", "Qingnian_Nanxing_Junren", "Qingnian_Nanxing_Xuesheng", "Zhongnian_Nvxing_Getihu", "Qingnian_Nvxing_Xuesheng", "Laonian_Nvxing_Zhufu".
The head (Headwear), the Face (Face), the Beard (Beard), the clothing (Clothes) and other parts are adopted to distinguish the characteristics of the Type.
Part of characteristics are selected from the following characteristics for introduction:
(1) Middle-aged male white collar (Zhongnian_Nanxing_Bailing): the head is characterized by no cap and a neat hairstyle; the clothes are mostly a black business suit, a white shirt and a tie;
(2) young male soldiers (Qingnian _ Nanxing _ Junren): the head is characterized in that the clothes are mostly characterized by wearing army caps and flat-head hairstyles; the color of the clothes is characterized by army green;
(3) Young male students (Qingnian _ Nanxing _ Xuesheng): the head is characterized by a flat head; most clothes are dark blue and white school uniforms;
(4) middle-aged female teacher (Zhongnian _ Nvxing _ Getihu): the face features are mostly glasses; the clothes are mostly characterized by carrying textbooks;
(5) young female students (Qingnian _ Nvxing _ Xuesheng): the head is characterized by long hair or student hair style; most clothes are dark blue and white school uniforms;
(6) housewives of elderly women (Laonian _ Nvxing _ Zhufu): the head is mostly grey-white; the dress features are mostly a wearing apron.
A daily-life person data set oriented to the person recognition task is created according to the differences in visual characteristics of persons in daily life; 1,200 photographed images are acquired and sorted into 6 categories of 200 images each. The collected images are collated to obtain the corresponding image training set.
According to the different visual characteristics of Beijing opera characters, a Beijing Opera Role (BJOR) data set oriented to the Beijing opera character recognition task is produced. More than 300 Beijing opera videos of classical plays are sorted and classified, image capture is performed at different video-frame intervals set by a controlled-variable method, and 273,100 pictures are obtained in total; 40,000 single-target pictures for the image classification task are obtained through screening and divided into 8 categories of 5,000 pictures each. The collected images are collated to obtain the corresponding image training set.
The Beijing opera characters are classified according to characteristics such as age, gender and personality. Eight representative role types (hangdang) are selected as category labels; the basic category labels include: "LaoSheng", "WuSheng", "XiaoSheng", "ZhengDan", "HuaDan", "LaoDan", "JingJue", "ChouJue".
By consulting related material such as Beijing opera costume atlases, parts such as the headdress (Headwear), facial makeup (Face), beard (Beard), costume (Clothes), sleeve (Sleeve) and belt (Belt) are adopted to distinguish the features of each role type (hangdang).
Some of the features are selected as examples:
Laosheng (LaoSheng): the beard is varied, in black, grey or white, and in three-strand or full shape; the facial makeup is light overall;
Xiaosheng (XiaoSheng): there is no beard, so the mouth shape can be observed; the facial makeup is heavy, with deep red lips;
Wusheng (WuSheng): there is no beard and the mouth shape can be observed; the facial makeup shows deeper red lips; the costume is often the white long kao (armour);
Zhengdan (ZhengDan): the facial makeup is heavy; the headdress is characterized by silver beads; the sleeves have water sleeves;
Huadan (HuaDan): the costume is mostly a fandan (apron) and a padded skirt; the headwear is mostly bright head-face ornaments and rhinestones; the sleeves are without water sleeves; in addition, the handkerchief is also a special identification feature;
Laodan (LaoDan): the facial makeup is light; the costume is mostly a yellow, greyish-white or dark green xuezi (a kind of informal robe); in addition, the crutch is also a special identification feature;
Jingjue (JingJue): the beard is often full; the facial makeup is heavy and includes specific categories such as the "whole face", "three-tile face" and "broken face";
Choujue (ChouJue): the facial makeup features a patch of white powder at the bridge of the nose; the beard style is also a special identification feature.
Step S002: and acquiring a corresponding sample convolution layer according to the current training image.
The method includes selecting a current training image from the current training set to extract a sample convolution layer, where the current training image may be input into a preset convolution neural network model to obtain an initial feature layer, performing pooling processing on the initial feature layer to obtain an attention image, and obtaining the sample convolution layer according to the attention image and the initial feature layer, which is not limited in this embodiment.
Step S003: and extracting a sample characteristic layer from the sample convolution layer.
A subset of sample feature layers with clear image contours is selected from the sample convolution layers, which are ordered by the clarity of the image contours. Assuming an image has three layers and the clarity of the contours gradually decreases from the lower layer through the middle layer to the upper layer, the lower layer can be selected as the sample feature layer according to user requirements.
Step S004: and acquiring a layer pixel point set corresponding to the sample characteristic layer.
Step S005: and superposing the layer pixel point set by a preset up-sampling method to obtain a sample super-column set.
In the invention, the middle-layer feature maps of the MobileNetV2 classification network are shown, where (a) represents the feature map of the bottommost layer, in which obvious contour characteristics can be seen; (b) and (c) represent feature maps of the intermediate stages, where the contour characteristics are weakened; and (d) represents a feature map of the higher layers, where features such as contours have disappeared.
In the process of feature extraction, in order to meet classification tasks, semantic information of the current class of the Beijing opera is continuously enhanced, and spatial features are weakened (including character postures, joints of limbs, stage lighting intensity, stage positions and the like).
In the process of feature extraction by the convolutional neural network, because the spatial features are gradually weakened and the category semantic features are continuously enhanced, the feature maps at different stages present large feature differences. Drawing on the idea of the gating network structure for feature fusion and the scale-dependent pooling (SDP) algorithm, the supercolumn feature used for image segmentation is applied to the task of fusing the spatial features and category features of the Beijing opera character feature maps at different stages, via the following formula:
fi = Σk aik · Fk    (1)
In the formula, i is a pixel point of the input Beijing opera image; fi is the feature vector obtained by concatenating the corresponding positions of each layer's feature map, i.e. the hypercolumn; Σk denotes the accumulation operation over the k feature maps (channel stacking, not numerical addition); and aik relates pixel position i to feature map Fk (the interpolation coefficient used when upsampling feature map k to pixel position i).
Further, the HCA-CNN network proposed for Beijing opera character image research is formed by iterating a three-layer hierarchical structure in which every layer has the same network structure, and part of the features of each layer serve as the input information of the next layer. The image input into the HCA-CNN network first passes through the MobileNetV2 classification network, and a series of feature extraction operations yields the feature maps of each intermediate layer. On the one hand, the last feature map layer is input into a classifier for the classification task of the current scale; on the other hand, the feature maps of some of the intermediate layers are superposed pixel by pixel to form the HyperColumn Set and input into the HC-APN sub-network. The HC-APN network performs down-sampling and fully connected operations on the obtained HyperColumn Set features, then amplifies the key region image according to the extracted key region parameters, and the amplified image is used as the input of the next layer.
Considering the end-to-end application scenario of the Beijing opera character recognition task and the requirement of real-time recognition, this patent proposes using the MobileNetV2 network, which has fewer parameters and higher operating efficiency, as the backbone network; MobileNetV2 is well suited to end-to-end real-time scenarios. Its composition is similar to that of VGG, stacking different conv2d and bottleneck structures.
In the structural schematic diagram of the lightweight classification model, the extracted feature map is subjected to multi-Task learning, Task1 represents a super column Set (hyperColumn Set) for learning HC-APN, and Task2 represents a feature map for learning classification tasks.
In Task1, the feature maps differ in size from the original image, so before the feature maps of the individual layers can be superimposed with the original image pixel by pixel, upsampling is required first. To obtain the value f of the function at a point P, linear interpolation is first performed in the x direction, giving the upsampling formulas:
f(R1) ≈ ((x2 − x)/(x2 − x1))·f(Q11) + ((x − x1)/(x2 − x1))·f(Q21)    (2)
f(R2) ≈ ((x2 − x)/(x2 − x1))·f(Q12) + ((x − x1)/(x2 − x1))·f(Q22)
Linear interpolation is then carried out in the y direction to obtain:
f(P) ≈ ((y2 − y)/(y2 − y1))·f(R1) + ((y − y1)/(y2 − y1))·f(R2)    (3)
where P = (x, y) denotes the point inserted by upsampling; Q11 = (x1, y1), Q12 = (x1, y2), Q21 = (x2, y1) and Q22 = (x2, y2) are the four existing pixels of the original image; and R1 = (x, y1), R2 = (x, y2) are the intermediate interpolation points.
After each layer's feature map has been upsampled, it gives the features of that layer at every pixel position, and the accumulation over the k feature maps means that the channels of the different feature maps are stacked together rather than added numerically. The value fi is not the whole feature map but the hypercolumn at a single pixel position i; since the input image size is fixed at 224 x 224, the HyperColumn Set consists of the 224 x 224 hypercolumns fi, i.e. superposition can be performed after each layer's feature map has been upsampled.
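A short sketch of this construction follows: bilinear upsampling of each selected stage feature map to the 224 x 224 input resolution, then channel stacking so that every pixel position i carries a concatenated hypercolumn fi. Which stages are selected is an assumption here.

```python
# Hypothetical sketch of HyperColumn Set construction per formula (1).
import torch
import torch.nn.functional as F

def hypercolumn_set(feature_maps, size=(224, 224)) -> torch.Tensor:
    """feature_maps: list of (N, C_k, H_k, W_k) stage feature maps."""
    upsampled = [F.interpolate(fm, size=size, mode="bilinear", align_corners=False)
                 for fm in feature_maps]
    # Channel stacking (not numerical addition): result is (N, sum_k C_k, 224, 224),
    # so each pixel position i holds its hypercolumn fi.
    return torch.cat(upsampled, dim=1)
```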
Step S006: and at the end of traversal, constructing a sample supercolumn set according to all the obtained sample supercolumn sets.
Step S007: and respectively preprocessing each sample supercolumn set in the sample supercolumn set to obtain a sample target image set.
The step of preprocessing each sample super-column set in the sample super-column set to obtain a sample target image set may be understood as follows. The sample super-column set is traversed to obtain the traversed current sample super-column set, and the current sample super-column set is preprocessed by a preset down-sampling method to obtain a target super-column set. The target super-column set is flattened to obtain a target area, and an attention area positioning parameter is determined according to the target area. When the traversal is finished, an attention area positioning parameter set is constructed according to all the acquired attention area positioning parameters, and each sample target image contained in the sample target image set is processed according to each attention area positioning parameter in the attention area positioning parameter set to obtain the sample target image set.
The step of processing each sample target image included in the sample target image set according to each attention area positioning parameter in the attention area positioning parameter set to obtain a sample target image set may be understood as follows. The attention area positioning parameter set is traversed to obtain the traversed current attention area positioning parameter, and the target area position is determined according to the current attention area positioning parameter. The current training image is then cropped according to the target area position to obtain a target area image, and the target area image is amplified by a preset bilinear interpolation method to obtain a sample target image. When the traversal is finished, the sample target image set is constructed according to all the obtained sample target images.
Because the sub-network APN of RA-CNN only adopts the last feature map layer of the backbone network VGG as its input feature, it does not need much processing of spatial features. Based on this change of input features, the present method makes corresponding improvements on the basis of the sub-network APN and proposes a new attention mechanism sub-network, HC-APN.
The HC-APN sub-network first down-samples the input features of size 224 x 2024 to size 7 x 2024, then performs two fully connected operations: the first flattens the features to a size of 1 x 16192 and the second reduces them to a size of 1 x 3 (the 3 channels represent the three parameters tx, ty, tl for attention area localization). The attention network is then used to determine the target area location by the following formula:
tx(tl) = tx − tl,  ty(tl) = ty − tl,  tx(br) = tx + tl,  ty(br) = ty + tl    (4)
In the formula, (tx, ty) is the central coordinate point of the region, tl is half of the side length of the square region, (tx(tl), ty(tl)) is the upper-left corner coordinate of the target region, and (tx(br), ty(br)) is the lower-right corner coordinate of the target region.
Further, after a coordinate relation is defined, a clipped attention area is obtained by a method of element multiplication of a Mask m function and an input image X:
Xatt = X · M(tx, ty, tl)    (5)
where X is the input image, · denotes element-wise multiplication with the Mask M function, and Xatt is the cropped attention region obtained by element-wise multiplication of the Mask M function and the input image X.
The Mask m function can select the most important region in forward propagation, and is easy to optimize in backward propagation due to the characteristic of a continuous function:
M(·) = [h(x − tx(tl)) − h(x − tx(br))] · [h(y − ty(tl)) − h(y − ty(br))]    (6)
In the Mask M function, h(x) is a step function:
h(x) = 1 / (1 + exp(−kx))
In the formula, k is a set positive integer, h(x) is the step function, and exp is the exponential function with the natural constant e as its base.
When −kx tends to positive infinity, the denominator also tends to positive infinity and h(x) tends to 0; when −kx tends to negative infinity, the second half of the denominator tends to 0, so the whole denominator tends to 1 and h(x) tends to 1. Hence h(x − tx(tl)) − h(x − tx(br)) tends to 1 only when tx(tl) ≤ x ≤ tx(br), and the same holds on the y axis. Therefore M(·) tends to 1 only when x lies between tx(tl) and tx(br) and y lies between ty(tl) and ty(br), and tends to 0 elsewhere.
Further, a bilinear interpolation method is then used to perform region amplification on the determined target region: each point (i, j) of the amplified image is obtained by bilinearly interpolating the four neighbouring pixels of the cropped attention region Xatt at the corresponding position, in the same way as the upsampling formulas (2) and (3) above. Here (i, j) is the coordinate of an interpolated point in the amplified image.
Step S008: and acquiring a sample person identification result corresponding to each sample target image contained in the sample target image set.
Step S009: and constructing a preset supercolumn feature recognition model according to the training image set and the sample character recognition result.
Through the above steps, the joint loss function is calculated for the successfully constructed preset supercolumn feature recognition model to verify whether it meets the requirements; when it does not, the parameters of the model need to be adjusted. The joint loss function is:
L(X) = Σs Lcls(Y(s), Y) + Σs Lrank(Pt(s), Pt(s+1)),  with  Lrank(Pt(s), Pt(s+1)) = Max{0, Pt(s) − Pt(s+1) + margin}
In the formula, Lcls is the category loss, i.e. the loss of the Beijing opera character category predicted by the three-layer classification network compared with the ground-truth label; Lrank is the ranking loss generated when the recognition of a coarser scale is not exceeded by the next finer scale; X is the input image; Y(s) is the predicted category probability of scale s and Y is the true category; Pt(s) is the probability of the true-label class at scale s; Pt(s) − Pt(s+1) reflects the loss generated when the category probability of the scale-s network is higher than that of scale s+1; margin is a padding value, which can be 0.05; and Max{ } yields the generated loss, which can be understood as taking the difference when it is greater than 0 and taking 0 otherwise.
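A hedged sketch of this joint loss follows; the reduction over the batch and the equal weighting of the two terms are assumptions.

```python
# Hypothetical sketch of the joint loss: per-scale classification loss plus pairwise ranking loss.
import torch
import torch.nn.functional as F

def joint_loss(logits_per_scale, target, margin: float = 0.05):
    """logits_per_scale: list of (N, num_classes) tensors from the three scales; target: (N,) labels."""
    cls_loss = sum(F.cross_entropy(logits, target) for logits in logits_per_scale)          # Lcls terms
    rank_loss = torch.zeros(())
    for s in range(len(logits_per_scale) - 1):
        p_s = F.softmax(logits_per_scale[s], dim=1).gather(1, target.view(-1, 1))            # Pt(s)
        p_next = F.softmax(logits_per_scale[s + 1], dim=1).gather(1, target.view(-1, 1))     # Pt(s+1)
        rank_loss = rank_loss + torch.clamp(p_s - p_next + margin, min=0).mean()             # Max{0, ...}
    return cls_loss + rank_loss
```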
Finally, the recognition performance of the model, namely the model accuracy, is evaluated through the Top-1 and Top-5 indexes: the class with the maximum probability is taken as the prediction result, and the prediction is correct if that class is the true class; otherwise the prediction is wrong. The evaluation formula is as follows:

Top1_accuracy = (TP + TN) / (TP + TN + FP + FN)

In the formula, Top1_accuracy is the accuracy of the maximum prediction in the predicted probability vector, TP is the number of positive samples predicted as positive, FP is the number of negative samples predicted as positive (the false alarms), FN is the number of positive samples predicted as negative (the misses), and TN is the number of negative samples predicted as negative.

Top-5 accuracy can be understood analogously: if the true class appears among the five classes with the largest predicted probabilities, the prediction is counted as correct; otherwise the prediction is wrong.
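The two indexes can be computed as in the following sketch; the function name and array layout are assumptions for illustration:

```python
import numpy as np

def topk_accuracy(probs, labels, k=5):
    """probs: (N, num_classes) predicted probability vectors; labels: (N,) true classes.
    Counts a sample as correct if the true class is among the k highest-probability classes."""
    topk = np.argsort(probs, axis=1)[:, -k:]              # indices of the k largest probabilities
    hits = np.any(topk == labels[:, None], axis=1)
    return hits.mean()

probs = np.array([[0.1, 0.6, 0.3],
                  [0.5, 0.2, 0.3]])
labels = np.array([1, 2])
print(topk_accuracy(probs, labels, k=1))   # Top-1 accuracy: 0.5
print(topk_accuracy(probs, labels, k=2))   # Top-2 accuracy: 1.0
```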
In addition, the patent also adopts a series of indexes to evaluate the complexity of the network, wherein the time complexity evaluation index comprises FLOPs, and the space complexity evaluation indexes comprise Memory Usage, Million Params and Million Mult-Adds.
The time complexity determines the training and prediction time of the model, and the space complexity determines the number of parameters and the memory access volume of the model, where the number of parameters is the total number of weights of all parameterized layers of the model. The complexity of the convolutional neural network is therefore related to the size M of the feature map output by each convolution kernel. The overall time complexity is roughly:

Time ~ O( Σl Ml^2 · Kl^2 · Cl-1 · Cl )

The overall space complexity is roughly:

Space ~ O( Σl Kl^2 · Cl-1 · Cl + Σl Ml^2 · Cl )

wherein the feature map size M = (X - K + 2·Padding)/Stride + 1, X is the input matrix size, K is the convolution kernel size, Cl-1 and Cl are the input and output channel numbers of layer l, Padding is the padding value, and Stride is the step size.
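The feature-map size formula and the per-layer cost can be illustrated with the following sketch; the helper names and the example layer configuration are assumptions, and the counts are rough per-layer figures rather than the exact FLOPs reported by profiling tools:

```python
def conv_output_size(x, k, padding, stride):
    """Feature map size M = (X - K + 2*Padding) / Stride + 1."""
    return (x - k + 2 * padding) // stride + 1

def conv_layer_cost(x, k, c_in, c_out, padding=1, stride=1):
    """Rough per-layer cost: time ~ M^2 * K^2 * C_in * C_out multiply-adds,
    space ~ K^2 * C_in * C_out weight parameters plus M^2 * C_out output activations."""
    m = conv_output_size(x, k, padding, stride)
    mult_adds = m * m * k * k * c_in * c_out
    params = k * k * c_in * c_out
    activations = m * m * c_out
    return m, mult_adds, params, activations

# Example: a 3x3 convolution on a 224x224 input with 3 -> 32 channels, stride 2
print(conv_layer_cost(224, 3, 3, 32, padding=1, stride=2))   # M = 112
```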
For the self-made BJOR data set, comparative ablation experiments are carried out by combining different weakly supervised networks with different layers (scales) of the recursive network; the selected combinations are as follows:
(1)VGG16
(2)RA-CNN(VGG16+APN)
(3)MobileNetV2
(4)MobileNetV2+APN
(5)MobileNetV2+HCAPN+HC(scale 2)
(6)MobileNetV2+HCAPN+HC(scale 3)
(7)MobileNetV2+HCAPN+HC(scale 1+2)
(8)MobileNetV2+HCAPN+HC(scale 1+2+3)
80% of the BJOR data set is selected as the network training set and the remaining 20% as the network verification set; the model accuracy is obtained from the model training results, and Table 1 gives the accuracy comparison of the ablation experiments (a minimal sketch of this 80/20 split follows the discussion of Table 1 below).
TABLE 1
As can be seen from Table 1, the MobileNetV2 network is only about 1.8% lower than VGG16; likewise, for the three-layer recursive network combined with the APN attention network, MobileNetV2 is only about 1.7% lower. The HC (hypercolumn) feature is then added, and ablation experiments are carried out on different levels (scales) of the recursive network; a certain gain of the level combinations over a single layer can be observed. The study integrates the three-scale (MobileNetV2 + HCAPN + HC) network, namely the preset supercolumn feature recognition model, whose accuracy reaches 91.58%, an improvement of 0.63% over the VGG16 + APN combination of the RA-CNN model based on the recurrent soft attention mechanism, effectively alleviating the problem that the localization of the attention mechanism is not efficient and accurate enough.
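The 80%/20% split described above can be sketched as follows; the directory layout, image size and batch size are assumptions for illustration:

```python
import torch
from torchvision import datasets, transforms

# Assumed layout: BJOR/<role_name>/<image>.jpg, one folder per Beijing opera role
dataset = datasets.ImageFolder("BJOR", transform=transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
]))

n_train = int(0.8 * len(dataset))                   # 80% for training
n_val = len(dataset) - n_train                      # 20% for verification
train_set, val_set = torch.utils.data.random_split(
    dataset, [n_train, n_val], generator=torch.Generator().manual_seed(0))

train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=32)
```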
In this embodiment, image training sets corresponding to different characters are first obtained and traversed to obtain the traversed current training image; corresponding sample convolution layers are obtained according to the current training image, sample feature layers are extracted from the sample convolution layers, the layer pixel point sets corresponding to the sample feature layers are obtained, and the layer pixel point sets are superposed by a preset up-sampling method to obtain sample super-column sets. When the traversal is finished, a sample super-column set is constructed according to all the obtained sample super-column sets, each sample super-column set therein is preprocessed to obtain a sample target image set, the sample person recognition result corresponding to each sample target image contained in the sample target image set is obtained, and a preset supercolumn feature recognition model is constructed according to the training image sets and the sample person recognition results. Compared with the prior art, in which a large number of category semantic features are extracted so that the image processing process is complex and tedious and the key area of the image cannot be accurately located, the method locates the key area accurately with a simpler processing flow.
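The super-column construction summarized above can be sketched as follows; the choice of MobileNetV2 as the backbone follows the experiments in this document, while the selected stage indices, the common resolution of 56 x 56 and the function name are assumptions for illustration:

```python
import torch
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

backbone = mobilenet_v2(weights=None).features

def hypercolumn(image, stage_ids=(3, 6, 13), out_size=56):
    """Run the backbone, collect the selected stage outputs, upsample each to
    out_size x out_size and concatenate along the channel axis, giving one
    super-column (stacked multi-layer feature vector) per pixel."""
    x = image.unsqueeze(0)                # (1, 3, H, W)
    columns = []
    for i, layer in enumerate(backbone):
        x = layer(x)
        if i in stage_ids:
            up = F.interpolate(x, size=(out_size, out_size),
                               mode="bilinear", align_corners=False)
            columns.append(up)
    return torch.cat(columns, dim=1)      # (1, sum of stage channels, out_size, out_size)

hc = hypercolumn(torch.rand(3, 224, 224))
print(hc.shape)
```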
Furthermore, an embodiment of the present invention further provides a storage medium, where an image recognition program based on fine-grained features of a person is stored, and the image recognition program based on fine-grained features of a person implements the steps of the image recognition method based on fine-grained features of a person as described above when being executed by a processor.
In addition, referring to fig. 4, an embodiment of the present invention further provides an image recognition apparatus based on a fine-grained feature of a person, where the image recognition apparatus based on a fine-grained feature of a person includes:
the acquisition module 4001 is used for acquiring a figure image to be identified;
an extraction module 4002, configured to perform feature extraction on the person image to be identified, so as to obtain a person feature map layer;
the recognition module 4003 is configured to input the character feature layer into a preset supercolumn feature recognition model to obtain a corresponding image recognition result;
the obtaining module 4001 is further configured to obtain an image recognition accuracy according to the image recognition result;
the judging module 4004 is configured to, when the image recognition accuracy is greater than or equal to a preset standard threshold, take the image recognition result as an image recognition result based on fine-grained features of a person.
In this embodiment, a character image to be recognized is obtained, feature extraction is performed on the character image to obtain a character feature layer, and the character feature layer is input into the preset supercolumn feature recognition model, which can accurately locate the key area of the image and quickly and accurately obtain the image recognition result corresponding to the character image; the image recognition accuracy is then obtained according to the image recognition result, and when the image recognition accuracy is greater than or equal to the preset standard threshold, the image recognition result is taken as the image recognition result based on character fine-grained features, thereby improving character-image recognition efficiency while keeping the recognition result accurate.
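The flow implemented by the acquisition, extraction, recognition and judging modules can be sketched as follows; the class name, the confidence-based accuracy check and the threshold value 0.9 are assumptions for illustration:

```python
import torch

class PersonFineGrainedRecognizer:
    """Minimal sketch of the device flow: extract features, run the preset
    super-column recognition model, and accept the result only if its
    confidence reaches the preset standard threshold."""

    def __init__(self, feature_extractor, hypercolumn_model, threshold=0.9):
        self.feature_extractor = feature_extractor
        self.hypercolumn_model = hypercolumn_model
        self.threshold = threshold

    @torch.no_grad()
    def recognize(self, image):
        features = self.feature_extractor(image.unsqueeze(0))      # person feature layer
        probs = self.hypercolumn_model(features).softmax(dim=1)    # image recognition result
        confidence, label = probs.max(dim=1)                       # recognition accuracy proxy
        if confidence.item() >= self.threshold:
            return label.item()                                    # accepted fine-grained result
        return None                                                # below the preset threshold
```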
Other embodiments or specific implementations of the image recognition device based on fine-grained features of a person may refer to the above method embodiments, and are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words are to be interpreted as names.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be substantially implemented or a part contributing to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g., a Read Only Memory (ROM)/Random Access Memory (RAM), a magnetic disk, an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. An image identification method based on fine-grained characteristics of people is characterized by comprising the following steps:
acquiring a figure image to be identified;
extracting the characteristics of the figure image to be identified to obtain a figure characteristic image layer;
inputting the figure feature layer into a preset supercolumn feature recognition model to obtain a corresponding image recognition result;
acquiring image identification accuracy according to the image identification result;
and when the image identification accuracy is greater than or equal to a preset standard threshold, taking the image identification result as an image identification result based on the fine-grained features of the person.
2. The method of claim 1, wherein the step of obtaining the image of the person to be identified is preceded by the step of:
acquiring image training sets corresponding to different characters, and traversing the image training sets to obtain traversed current training images;
Obtaining a corresponding sample convolution layer according to the current training image;
extracting a sample characteristic layer from the sample convolution layer;
obtaining a layer pixel point set corresponding to the sample characteristic layer;
superposing the layer pixel point set by a preset up-sampling method to obtain a sample super-column set;
when the traversal is finished, constructing a sample super-column set according to all the obtained sample super-column sets;
respectively preprocessing each sample super-column set in the sample super-column set to obtain a sample target image set;
obtaining a sample figure identification result corresponding to each sample target image contained in the sample target image set;
and constructing a preset supercolumn feature recognition model according to the training image set and the sample character recognition result.
3. The method of claim 2, wherein the step of separately pre-processing each superset of the set of supersets of samples to obtain a set of sample target images comprises:
traversing the sample super-column set to obtain a traversed current sample super-column set;
preprocessing the current sample super-column set by a preset down-sampling method to obtain a target super-column set;
Flattening the target super-column set to obtain a target area;
determining attention area positioning parameters according to the target area;
when the traversal is finished, constructing an attention area positioning parameter set according to all acquired attention area positioning parameters;
and respectively processing each sample target image contained in the sample target image set according to each attention area positioning parameter in the attention area positioning parameter set to obtain a sample target image set.
4. The method according to claim 3, wherein the step of processing each sample target image included in the sample target image set according to each attention area localization parameter in the attention area localization parameter set to obtain a sample target image set comprises:
traversing the attention area positioning parameter set to obtain a traversed current attention area positioning parameter;
determining the position of a target area according to the current attention area positioning parameter;
performing area cutting on the current training image according to the position of the target area to obtain a target area image;
amplifying the target area image by a preset bilinear interpolation method to obtain a sample target image;
And at the end of the traversal, constructing a sample target image set according to all the obtained sample target images.
5. The method of claim 2, wherein after the step of obtaining the sample person recognition result corresponding to each sample target image included in the sample target image set, the method further comprises:
inputting the sample feature layer into a preset residual error model to obtain a sample high-dimensional feature layer;
determining a sample class probability loss value according to the sample high-dimensional feature layer;
judging whether the sample class probability loss value is larger than a preset probability threshold value or not;
and when the sample class probability loss value is larger than the preset probability threshold value, executing the step of constructing a preset supercolumn feature recognition model according to the training image set and the sample person recognition result.
6. The method of claim 5, wherein the step of determining whether the sample class probability loss value is greater than a preset probability threshold further comprises:
and returning to the step of extracting the sample feature layer from the sample convolution layer when the sample class probability loss value is less than or equal to the preset probability threshold.
7. The method of claim 1, wherein the step of extracting the features of the image of the person to be recognized to obtain the person feature map layer comprises:
inputting the figure image to be identified into a preset convolutional neural network model to obtain an initial characteristic map layer;
pooling the initial feature map layer to obtain an attention image;
and obtaining a character feature layer according to the attention image and the initial feature layer.
8. An image recognition device based on human fine-grained features is characterized by comprising:
the acquisition module is used for acquiring a figure image to be identified;
the extraction module is used for extracting the characteristics of the figure image to be identified to obtain a figure characteristic map layer;
the recognition module is used for inputting the figure feature layer into a preset supercolumn feature recognition model to obtain a corresponding image recognition result;
the acquisition module is also used for acquiring the image identification accuracy according to the image identification result;
and the judging module is used for taking the image recognition result as an image recognition result based on the fine-grained features of the person when the image recognition accuracy is greater than or equal to a preset standard threshold.
9. An image recognition device based on human fine-grained features, which is characterized by comprising: a memory, a processor and an image recognition program based on human fine-grained features stored on the memory and operable on the processor, the image recognition program based on human fine-grained features implementing the steps of the image recognition method based on human fine-grained features according to any one of claims 1 to 7 when executed by the processor.
10. A storage medium, characterized in that the storage medium stores thereon an image recognition program based on fine-grained features of a person, the image recognition program based on fine-grained features of a person realizing the steps of the image recognition method based on fine-grained features of a person according to any one of claims 1 to 7 when executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010655258.0A CN111860250B (en) | 2020-07-14 | 2020-07-14 | Image recognition method and device based on fine-grained character features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111860250A true CN111860250A (en) | 2020-10-30 |
CN111860250B CN111860250B (en) | 2024-04-26 |
Family
ID=73152514
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010655258.0A Active CN111860250B (en) | 2020-07-14 | 2020-07-14 | Image recognition method and device based on fine-grained character features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111860250B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106101696A (en) * | 2016-06-16 | 2016-11-09 | 北京数智源科技股份有限公司 | Video quality diagnosis system and video quality analysis algorithm |
CN110678901A (en) * | 2017-05-22 | 2020-01-10 | 佳能株式会社 | Information processing apparatus, information processing method, and program |
CN109859209A (en) * | 2019-01-08 | 2019-06-07 | 平安科技(深圳)有限公司 | Remote Sensing Image Segmentation, device and storage medium, server |
CN111368788A (en) * | 2020-03-17 | 2020-07-03 | 北京迈格威科技有限公司 | Training method and device of image recognition model and electronic equipment |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112634231A (en) * | 2020-12-23 | 2021-04-09 | 香港中文大学深圳研究院 | Image classification method and device, terminal equipment and storage medium |
CN113080874A (en) * | 2021-04-17 | 2021-07-09 | 北京美医医学技术研究院有限公司 | Multi-angle cross validation intelligent skin measuring system |
CN113080874B (en) * | 2021-04-17 | 2023-02-07 | 北京美医医学技术研究院有限公司 | Multi-angle cross validation intelligent skin measuring system |
CN115908280A (en) * | 2022-11-03 | 2023-04-04 | 广东科力新材料有限公司 | Data processing-based performance determination method and system for PVC calcium zinc stabilizer |
Also Published As
Publication number | Publication date |
---|---|
CN111860250B (en) | 2024-04-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |