CN110991568A - Target identification method, device, equipment and storage medium


Info

Publication number
CN110991568A
Authority
CN
China
Prior art keywords
network model
feature
module
enhancement
channel
Prior art date
Legal status
Granted
Application number
CN202010133440.XA
Other languages
Chinese (zh)
Other versions
CN110991568B (en)
Inventor
吴志伟
李德紘
张少文
冯琰一
Current Assignee
Guangzhou Xinke Jiadu Technology Co Ltd
PCI Technology Group Co Ltd
Original Assignee
Guangzhou Xinke Jiadu Technology Co Ltd
PCI Suntek Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Xinke Jiadu Technology Co Ltd, PCI Suntek Technology Co Ltd filed Critical Guangzhou Xinke Jiadu Technology Co Ltd
Priority to CN202010133440.XA priority Critical patent/CN110991568B/en
Publication of CN110991568A publication Critical patent/CN110991568A/en
Application granted granted Critical
Publication of CN110991568B publication Critical patent/CN110991568B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses a target identification method, device, equipment and storage medium. The method includes: embedding a channel feature reactivation module and a fine feature self-enhancement module into a neural network structure to generate a first network model; connecting a gradient enhancement cross entropy loss function with the first network model to generate a second network model; training the second network model based on a mini-batch stochastic gradient descent algorithm; modifying the trained second network model to obtain an inference network model; and inputting an image into the inference network model to obtain a target recognition result. With this scheme, more subtle features can be learned and recognized, and the accuracy of target identification is improved.

Description

Target identification method, device, equipment and storage medium
Technical Field
The embodiments of the present application relate to the field of computers, and in particular to a target identification method, device, equipment and storage medium.
Background
Object recognition refers to the process by which a particular object (or type of object) is distinguished from other objects (or other types of objects). This includes both distinguishing between two very similar objects and distinguishing one type of object from another. With the continuous development of computer technology, the application range of target identification has become wider and wider, for example identifying automobile models, flowers, plants, birds and the like.
In the prior art, a corresponding target object can be obtained by performing target identification on an image. However, existing target identification approaches cannot identify efficiently and accurately when many fine features are involved, for example when an image contains multiple faces or multiple vehicle models, and this causes a series of problems. In particular, in social emergencies such as epidemic outbreaks and public safety incidents, highly dangerous persons, such as suspected or confirmed patients, must be accurately identified from a large number of images, and the movement track of a key vehicle must be warned about and tracked; efficient and accurate target identification is then crucial.
Disclosure of Invention
The embodiment of the invention provides a target identification method, a target identification device, target identification equipment and a storage medium, which can learn and identify more subtle features and improve the accuracy of target identification.
In a first aspect, an embodiment of the present invention provides a target identification method, where the method includes:
embedding a channel feature reactivation module and a fine feature self-enhancement module into a neural network structure to generate a first network model;
connecting a gradient enhancement cross entropy loss function with the first network model to generate a second network model;
training the second network model based on a mini-batch stochastic gradient descent algorithm;
modifying the trained second network model to obtain an inference network model;
and inputting the image into the inference network model to obtain a target recognition result.
Optionally, embedding the channel feature reactivation module and the fine feature self-enhancement module into the neural network structure to generate a first network model includes:
redistributing, through the channel feature reactivation module, the weights of the feature map output in the neural network structure on a per-channel basis;
and enhancing, through the fine feature self-enhancement module, the non-salient features of the feature map output by the channel feature reactivation module, and suppressing the salient features.
Optionally, the redistributing, by the channel feature reactivation module, of the weights of the feature map output in the neural network structure on a per-channel basis includes:
compressing the feature map output in the neural network structure at the spatial level to obtain compressed features;
reactivating the compressed features to obtain activated weights;
and multiplying the input feature map by the activated weights channel by channel.
Optionally, the fine feature self-enhancement module includes an enhancement mask and a suppression mask, and enhancing, through the fine feature self-enhancement module, the non-salient features of the feature map output by the channel feature reactivation module and suppressing the salient features includes:
enhancing, according to the enhancement mask, the regions corresponding to the non-salient features of the feature map output by the channel feature reactivation module;
and suppressing, according to the suppression mask, the regions corresponding to the salient features of the feature map output by the channel feature reactivation module.
Optionally, embedding the channel feature reactivation module and the fine feature self-enhancement module into the neural network structure to generate a first network model includes:
deleting the global pooling layer of a residual network, and modifying the last fully-connected layer into a convolutional layer with a 1x1 convolution kernel and C channels to obtain a feature map;
inputting the feature map into the channel feature reactivation module;
and inputting the feature map output by the channel feature reactivation module into the fine feature self-enhancement module, which is then connected to a global pooling layer, to generate the first network model.
Optionally, connecting the gradient enhancement cross entropy loss function with the first network model to generate a second network model includes:
adjusting the loss values of samples through a loss adjustment factor introduced into the gradient enhancement cross entropy loss function, while restricting the computation to the negative samples meeting a preset condition, to generate the second network model.
Optionally, modifying the trained second network model to obtain the inference network model includes:
deleting the fine feature self-enhancement module and the gradient enhancement cross entropy loss function from the trained second network model, and connecting the Softmax function after the global pooling layer, to obtain the inference network model.
In a second aspect, an embodiment of the present invention further provides an object recognition apparatus, where the apparatus includes:
the first processing module is used for embedding the channel feature reactivation module and the fine feature self-enhancement module into the neural network structure to generate a first network model;
the second processing module is used for connecting the gradient enhancement cross entropy loss function with the first network model to generate a second network model;
the training module is used for training the second network model based on a mini-batch stochastic gradient descent algorithm;
the third processing module is used for modifying the trained second network model to obtain an inference network model;
and the recognition module is used for inputting the image into the inference network model to obtain a target recognition result.
In a third aspect, an embodiment of the present invention further provides an apparatus, where the apparatus includes:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the object recognition method according to the embodiment of the present invention.
In a fourth aspect, the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the object recognition method according to the present invention.
In the embodiment of the invention, a channel feature reactivation module and a fine feature self-enhancement module are embedded into a neural network structure to generate a first network model, a gradient enhancement cross entropy loss function is connected with the first network model to generate a second network model, the second network model is trained based on a mini-batch stochastic gradient descent algorithm, the trained second network model is modified to obtain an inference network model, and an image is input into the inference network model to obtain a target recognition result. Introducing the channel feature reactivation module solves the problem of reduced accuracy caused by unbalanced sample categories during training, and introducing the fine feature self-enhancement module and the gradient enhancement cross entropy loss function enables the network model to learn more subtle features. This is very effective for improving the identification accuracy of similar categories, and is particularly suitable for fine-grained target identification tasks, such as recognizing the fine features of vehicle models. Targets can thus be identified quickly and accurately in complex captured images, so that their movement tracks can be further determined, which plays a vital role in the management and control of public safety events such as epidemic prevention and control.
Drawings
Fig. 1 is a flowchart of a target identification method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method for identifying objects according to an embodiment of the present invention;
FIG. 3 is a flow chart of another method for identifying objects according to an embodiment of the present invention;
FIG. 4 is a flow chart of another method for identifying objects according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a second network model according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an inference network model provided in an embodiment of the present invention;
FIG. 7 is a flow chart of another method for identifying objects according to an embodiment of the present invention;
fig. 8 is a block diagram of a target identification apparatus according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad invention. It should be further noted that, for convenience of description, only some structures, not all structures, relating to the embodiments of the present invention are shown in the drawings.
Fig. 1 is a flowchart of a target identification method according to an embodiment of the present invention, where the present embodiment is applicable to target identification, and the method may be executed by a device such as a computer, and specifically includes the following steps:
step S101, embedding the channel feature reactivation module and the fine feature self-enhancement module into a neural network structure to generate a first network model.
In one embodiment, the pre-designed channel feature reactivation module and fine feature self-enhancement module are embedded into a neural network structure, which may be an existing neural network structure, for example a ResNet-50 neural network structure. The channel feature reactivation module is used for redistributing weights to the feature map output in the neural network structure on a per-channel basis, and the fine feature self-enhancement module is used for enhancing the non-salient features of the feature map output by the channel feature reactivation module and suppressing the salient features. The fine feature self-enhancement module comprises an enhancement mask and a suppression mask, and is used for enhancing, according to the enhancement mask, the regions corresponding to the non-salient features of the feature map output by the channel feature reactivation module, and suppressing, according to the suppression mask, the regions corresponding to the salient features.
And S102, connecting the gradient enhancement cross entropy loss function with the first network model to generate a second network model.
In one embodiment, a second network model is generated by connecting a pre-designed gradient enhancement cross-entropy loss function to the first network model. The gradient enhancement cross entropy loss function is used for supervising the training of the network, so that the network model can focus on samples which are difficult to distinguish, and the recognition rate of samples of similar categories is improved.
And S103, training the second network model based on a mini-batch stochastic gradient descent algorithm.
After the second network model is obtained, it is trained based on a mini-batch stochastic gradient descent algorithm to obtain the model parameters. In one embodiment, an Adam optimizer may be used to train the neural network model, or other optimization algorithms commonly used in deep learning, such as the SGD algorithm, the RMSProp algorithm, and the like, may be used, as sketched below.
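For illustration, a minimal PyTorch sketch of one training epoch under this scheme follows; the names model, gbce_loss and train_loader are hypothetical placeholders for the second network model, its gradient enhancement cross entropy criterion and a mini-batch data loader, and all hyper-parameter values are illustrative assumptions, not taken from the patent.

```python
import torch

# Hypothetical placeholders: `model` is the second network model, `gbce_loss`
# its gradient enhancement cross entropy criterion, and `train_loader` yields
# (images, labels) mini-batches. Learning-rate values are illustrative.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# An Adam optimizer is an equally valid choice, as noted above:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for images, labels in train_loader:
    optimizer.zero_grad()
    scores = model(images)            # forward pass through the second network model
    loss = gbce_loss(scores, labels)  # gradient enhancement cross entropy loss
    loss.backward()                   # back-propagate gradients
    optimizer.step()                  # one mini-batch stochastic gradient descent update
```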
And S104, modifying the trained second network model to obtain an inference network model, and inputting the image into the inference network model to obtain a target recognition result.
In one embodiment, after the second network model is trained, it is modified to obtain the inference network model. Specifically, the modification may be to delete the fine feature self-enhancement module and the gradient enhancement cross entropy loss function from the trained second network model, and to connect the Softmax function after the global pooling layer, to obtain the inference network model.
According to the scheme, the channel feature reactivation module and the fine feature self-enhancement module are embedded into the neural network structure to generate a first network model, the gradient enhancement cross entropy loss function is connected with the first network model to generate a second network model, the second network model is trained based on a mini-batch stochastic gradient descent algorithm, the trained second network model is modified to obtain an inference network model, and an image is input into the inference network model to obtain a target recognition result. Introducing the channel feature reactivation module solves the problem of reduced accuracy caused by unbalanced sample categories during training, and introducing the fine feature self-enhancement module and the gradient enhancement cross entropy loss function enables the network model to learn more subtle features, which is very effective for improving the identification accuracy of similar categories and is particularly suitable for fine-grained target identification tasks, such as vehicle model identification, flower and plant identification, bird identification and the like.
Fig. 2 is a flowchart of another target identification method according to an embodiment of the present invention, which shows a specific method for processing data by a channel feature reactivation module. As shown in fig. 2, the technical solution is as follows:
step S201, embedding the channel feature reactivation module into a neural network structure.
Step S202, compressing, through the channel feature reactivation module, the feature map output in the neural network structure at the spatial level to obtain compressed features, reactivating the compressed features to obtain activated weights, and multiplying the input feature map by the activated weights channel by channel.
In one embodiment, the following common variables are defined: the training image is I, and its class label is $l \in L$, where L is the set of all class labels and C is the number of channels. The feature map output by the selected neural network structure (the backbone network) is $F \in \mathbb{R}^{C \times W \times H}$, expressed in set form as $F = \{F_1, F_2, \ldots, F_C\}$ with $F_i \in \mathbb{R}^{W \times H}$, where W and H are the width and height of the feature map and $\mathbb{R}$ is the real number field.
Specifically, the channel feature reactivation module is designed as follows:
The input of the channel feature reactivation module is the feature map $F$. First, $F$ is compressed at the spatial level to obtain the compressed features $z \in \mathbb{R}^{C}$; the compression is, for example, a spatial average:
$z_i = \frac{1}{W \times H} \sum_{w=1}^{W} \sum_{h=1}^{H} F_i(w, h), \qquad i = 1, \ldots, C$
Then $z$ is reactivated to obtain the activated weights $s \in \mathbb{R}^{C}$; the reactivation takes a gated form, e.g.
$s = \sigma\bigl(W_2\,\delta(W_1 z)\bigr)$
where $\delta$ and $\sigma$ are, for example, ReLU and Sigmoid activations and $W_1$, $W_2$ are learnable weights.
Finally, the input feature map is multiplied by the weights channel by channel to obtain $\tilde{F} \in \mathbb{R}^{C \times W \times H}$, expressed in set form as $\tilde{F} = \{\tilde{F}_1, \tilde{F}_2, \ldots, \tilde{F}_C\}$; the calculation formula is:
$\tilde{F}_i = s_i \cdot F_i, \qquad i = 1, \ldots, C$
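As a concrete illustration, the following PyTorch sketch implements the computation just described: spatial compression by global average pooling, a gating network for reactivation, and channel-wise re-weighting. The two-layer Linear gating and the reduction ratio are assumptions in the standard squeeze-and-excitation style, since the text above only fixes the overall structure of the module.

```python
import torch
import torch.nn as nn

class ChannelFeatureReactivation(nn.Module):
    """Sketch of the channel feature reactivation module: compress the feature
    map F at the spatial level, reactivate to obtain per-channel weights s,
    and multiply the input by s channel by channel. The two-layer gating and
    the reduction ratio are assumptions, not taken from the patent text."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.gate = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        z = x.mean(dim=(2, 3))         # z_i = (1/(W*H)) * sum of F_i over space
        s = self.gate(z)               # activated weights s in R^C
        return x * s.view(b, c, 1, 1)  # F~_i = s_i * F_i, channel by channel
```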
Step S203, embedding the fine feature self-enhancement module into the neural network structure, enhancing the non-salient features of the feature map output by the channel feature reactivation module through the fine feature self-enhancement module, and suppressing the salient features, to generate a first network model.
And step S204, connecting the gradient enhancement cross entropy loss function with the first network model to generate a second network model.
And S205, training the second network model based on a mini-batch stochastic gradient descent algorithm.
And S206, modifying the trained second network model to obtain an inference network model, and inputting the image into the inference network model to obtain a target recognition result.
According to the scheme, the designed channel feature reactivation module is embedded into the neural network structure and redistributes weights, on a per-channel basis, to the feature map output by the neural network structure, which solves the problem of reduced accuracy caused by unbalanced sample categories during training.
Fig. 3 is a flowchart of another target identification method according to an embodiment of the present invention, which shows a specific method for data processing by the fine feature self-enhancement module. As shown in fig. 3, the technical solution is as follows:
and S301, embedding the channel feature reactivation module into a neural network structure.
Step S302, compressing, through the channel feature reactivation module, the feature map output in the neural network structure at the spatial level to obtain compressed features, reactivating the compressed features to obtain activated weights, and multiplying the input feature map by the activated weights channel by channel.
Step S303, embedding the fine feature self-enhancement module into the neural network structure, enhancing the regions corresponding to the non-salient features of the feature map output by the channel feature reactivation module according to the enhancement mask, and suppressing the regions corresponding to the salient features according to the suppression mask, thereby generating a first network model.
In one embodiment, the fine feature self-enhancement module includes an enhancement mask and a suppression mask, and the specific design of the module is as follows:
the input of the fine feature self-enhancement module is an output feature diagram of the channel feature reactivation module
Figure 228263DEST_PATH_IMAGE015
Defining the output of the fine feature self-enhancement as
Figure 519567DEST_PATH_IMAGE016
. It is determined by a mask which regions need enhancement and which regions need suppression, the mask comprising an enhancement mask and a suppression mask. Defining an enhancement mask as
Figure 881541DEST_PATH_IMAGE017
Figure 814862DEST_PATH_IMAGE018
A value of 1 or 0, 1 indicating that enhancement is required, 0 indicating that enhancement is not required, and an enhancement factor of
Figure 952582DEST_PATH_IMAGE019
Indicating the degree of enhancement of the characteristic value, the suppression mask being
Figure 805000DEST_PATH_IMAGE020
Figure 28171DEST_PATH_IMAGE021
A value of 1 or 0, 1 indicating that inhibition is required, 0 indicating that inhibition is not required, and the inhibition factor is
Figure 499604DEST_PATH_IMAGE022
The degree of suppression of the characteristic value is indicated. And enhancing or suppressing the corresponding area in the input feature map according to the enhancement mask and the suppression mask, multiplying the corresponding position of the input feature map by an enhancement factor when the enhancement mask at a certain position is 1, multiplying the corresponding position of the input feature map by a suppression factor when the suppression mask at a certain position is 1, and keeping the rest positions unchanged. The calculation formula of the fine feature self-enhancement module is as follows:
Figure 616465DEST_PATH_IMAGE023
wherein the content of the first and second substances,
Figure 780730DEST_PATH_IMAGE024
and
Figure 615830DEST_PATH_IMAGE025
cannot be simultaneously 1.
The enhancement mask is used for calculating an area needing enhancement, the peak value of the input feature map represents a significant feature, besides the peak value, many non-significant features, namely subtle features exist, and the non-significant features need to be enhanced in order to improve the learning capability of the network model for the subtle features. Will input the feature map
Figure 359796DEST_PATH_IMAGE026
Is divided into
Figure 737687DEST_PATH_IMAGE027
Block, defining the m-th row and n-th column of the feature block as
Figure 164864DEST_PATH_IMAGE028
The characteristic diagram is represented in a set form by blocks
Figure 628207DEST_PATH_IMAGE029
Similarly, the enhancement mask is divided into
Figure 300496DEST_PATH_IMAGE027
Block, define m row n column mask block as
Figure 267315DEST_PATH_IMAGE030
The enhancement mask is represented in a set form by blocks
Figure 773383DEST_PATH_IMAGE031
. The non-peak area in the feature block has a certain probability of being a fine feature, so that the corresponding area in the mask block is marked as 1 at random according to the probability p, and the rest positions are 0, namely:
Figure 317497DEST_PATH_IMAGE032
where p represents a probability value, obeying a Bernoulli distribution,
Figure 934423DEST_PATH_IMAGE033
representing the maximum value of the feature block, the enhancement mask corresponding position is 1 if the probability value is greater than or equal to 0.5 and is not the peak position, otherwise 0.
The suppression mask is used for calculating the area needing to be suppressed, the peak value in the input feature diagram represents the significant feature, and the random suppression of the peak value area according to a certain probability can improve the attention of the network model to the non-significant area or the fine feature. The calculation formula of the suppression mask is:
Figure 880382DEST_PATH_IMAGE034
wherein the content of the first and second substances,
Figure 291772DEST_PATH_IMAGE035
representing probability values, obeying bernoulli distributions,
Figure 464127DEST_PATH_IMAGE036
representing the maximum value in the feature map, the suppression mask corresponds to a position of 1 if the probability value is greater than or equal to 0.5 and is the peak position, and 0 otherwise.
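A PyTorch sketch of the fine feature self-enhancement module following the mask definitions above is given below. The enhancement factor, suppression factor and block grid size are hyper-parameters whose values here are illustrative assumptions, the feature map is assumed to divide evenly into the block grid, and the module is assumed active only during training (it is deleted from the inference model, as described later).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FineFeatureSelfEnhancement(nn.Module):
    """Sketch: randomly enhance non-peak (subtle) responses within each block
    and randomly suppress the global peak (salient) response. alpha, beta and
    the block grid are illustrative assumptions; H and W are assumed to be
    divisible by `blocks`."""

    def __init__(self, alpha: float = 1.5, beta: float = 0.5, blocks: int = 4):
        super().__init__()
        self.alpha, self.beta, self.blocks = alpha, beta, blocks

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training:       # the module is removed from the inference model
            return x
        b, c, h, w = x.shape
        bh, bw = h // self.blocks, w // self.blocks
        # Per-block maxima, broadcast back to the full spatial resolution.
        block_max = F.interpolate(F.max_pool2d(x, (bh, bw)), size=(h, w), mode="nearest")
        is_block_peak = x >= block_max                       # peak inside each block
        is_global_peak = x >= x.amax(dim=(2, 3), keepdim=True)
        # Bernoulli draws per position; the two masks are disjoint because a
        # global peak is always also the peak of its own block.
        enhance = (torch.rand_like(x) >= 0.5) & ~is_block_peak   # enhancement mask
        suppress = (torch.rand_like(x) >= 0.5) & is_global_peak  # suppression mask
        out = torch.where(enhance, x * self.alpha, x)
        return torch.where(suppress, out * self.beta, out)
```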
And step S304, connecting the gradient enhancement cross entropy loss function with the first network model to generate a second network model.
And S305, training the second network model based on a mini-batch stochastic gradient descent algorithm.
And S306, modifying the trained second network model to obtain an inference network model, and inputting the image into the inference network model to obtain a target recognition result.
According to the scheme, the designed fine feature self-enhancement module can enhance the non-salient features in the feature map and suppress the salient features, achieving fine-feature self-enhancement. The module can be flexibly embedded into the backbone of a classical neural network structure, improving the existing network model's ability to recognize similar samples and raising the recognition accuracy for similar-category samples.
Fig. 4 is a flowchart of another target identification method according to an embodiment of the present invention, which shows a specific method for generating a second network model, and as shown in fig. 4, the method specifically includes the following steps:
step S401, deleting a global pooling layer of the residual network, modifying the last full-connection layer into a convolution layer with a convolution kernel size of 1x1 and a channel number of C to obtain a feature map, inputting the feature map into a channel feature reactivation module, inputting the feature map output by the channel feature reactivation module into a fine feature self-enhancement module, and then connecting the fine feature self-enhancement module with the global pooling layer to generate a first network model.
In an embodiment, the backbone network is obtained by deleting the global pooling layer and the fully-connected layer of a residual network. As shown in fig. 5, a schematic structural diagram of the second network model provided in an embodiment of the present invention, the original training data is input into the backbone network (a residual network with its global pooling layer and fully-connected layer deleted), the last fully-connected layer of the original residual network is replaced by a convolutional layer with a 1x1 convolution kernel and C channels, the obtained feature map is input into the channel feature reactivation module, and the channel feature reactivation module is connected to the fine feature self-enhancement module and then to the global pooling layer to obtain the first network model.
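To make the assembly concrete, here is a sketch that builds this training-time network from a torchvision ResNet-50, assuming the ChannelFeatureReactivation and FineFeatureSelfEnhancement sketches defined earlier are in scope; the layer width of 2048 follows torchvision's ResNet-50, and all other choices are assumptions.

```python
import torch.nn as nn
from torchvision.models import resnet50

def build_training_model(num_classes: int) -> nn.Sequential:
    """Sketch of the network of fig. 5: a ResNet-50 backbone with its global
    pooling and fully-connected layers deleted, a 1x1 convolution with C
    output channels in place of the last fully-connected layer, the channel
    feature reactivation and fine feature self-enhancement modules, and a
    global pooling layer producing the scores s_1..s_C."""
    backbone = nn.Sequential(*list(resnet50(weights=None).children())[:-2])
    return nn.Sequential(
        backbone,
        nn.Conv2d(2048, num_classes, kernel_size=1),  # replaces the last FC layer
        ChannelFeatureReactivation(num_classes),      # sketch defined earlier
        FineFeatureSelfEnhancement(),                 # sketch defined earlier
        nn.AdaptiveAvgPool2d(1),                      # global pooling layer
        nn.Flatten(),                                 # score vector of length C
    )
```

During training, the gradient enhancement cross entropy loss is then applied to the flattened score vector, giving the second network model of fig. 5.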
And S402, connecting the gradient enhancement cross entropy loss function with the first network model to generate a second network model.
As shown in fig. 5, after the fine feature self-enhancement module is connected to the global pooling layer, the gradient enhancement cross entropy loss function is connected to obtain the final second network model, which is the complete training network model.
And S403, training the second network model based on a mini-batch stochastic gradient descent algorithm.
And S404, modifying the trained second network model to obtain an inference network model, and inputting the image into the inference network model to obtain a target recognition result.
In an embodiment, as shown in fig. 6, a schematic structural diagram of the inference network model provided in an embodiment of the present invention, the fine feature self-enhancement module and the gradient enhancement cross entropy loss function are deleted from the trained second network model, and the Softmax function is connected after the global pooling layer to obtain the inference network model. Correspondingly, after the inference model is obtained, test data can be input into the model to obtain the corresponding target recognition result.
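As a sketch of this modification, assuming the Sequential training model from the earlier sketch, the conversion can be written as follows; train_model is the hypothetical name used there.

```python
import torch.nn as nn

# Sketch: drop the fine feature self-enhancement module from the trained model
# and connect Softmax after the global pooling layer. The GBCE loss is a
# training criterion and simply is not applied at inference time.
inference_model = nn.Sequential(
    *(m for m in train_model if not isinstance(m, FineFeatureSelfEnhancement)),
    nn.Softmax(dim=1),
)
inference_model.eval()  # disables any remaining training-only behaviour
```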
According to the scheme, the channel feature reactivation module and the fine feature self-enhancement module are embedded into the neural network structure to generate a first network model, and the gradient enhancement cross entropy loss function is connected with the first network model to generate a second network model; the second network model is trained based on a mini-batch stochastic gradient descent algorithm; and the trained second network model is modified to obtain an inference network model, into which an image is input to obtain a target recognition result.
Fig. 7 is a flowchart of another target identification method according to an embodiment of the present invention, which provides a specific method for generating a second network model by connecting a gradient-enhanced cross entropy loss function to a first network model, and as shown in fig. 7, the method specifically includes the following steps:
and S701, deleting a global pooling layer of the residual network, modifying the last full-link layer into a convolution layer with a convolution kernel size of 1x1 and a channel number of C to obtain a feature map, inputting the feature map into a channel feature reactivation module, and inputting the feature map output by the channel feature reactivation module into a fine feature self-enhancement module to generate a first network model.
Step S702, adjusting the loss values of samples through a loss adjustment factor introduced into the gradient enhancement cross entropy loss function, while restricting the computation to the negative samples meeting a preset condition, to generate a second network model.
In particular, the output $Y$ of the fine feature self-enhancement module is connected to a global pooling layer to obtain the scores $s \in \mathbb{R}^{C}$, expressed in set form as $s = \{s_1, s_2, \ldots, s_C\}$. The conventional cross entropy loss function is:
$L_{CE} = -\log p(s, l), \qquad p(s, l) = \frac{e^{s_l}}{\sum_{j=1}^{C} e^{s_j}}$
where $l$ is the true label of the training image I. The conventional cross entropy loss function treats all classes equally, so it cannot solve well the problem of identifying similar-class samples in fine-grained target identification tasks. In the gradient enhancement cross entropy loss function (GBCE) provided by this scheme, when $p(s, l)$ is calculated only the K (K < C) highest-scoring classes among the negative samples are considered, and a loss adjustment factor $\gamma$ is introduced to adjust the loss value of hard samples, so that the network can focus on identifying hard samples and the recognition rate of similar-category samples in the target identification task is improved. Define $l$ as the positive sample label; the set of labels of all negative sample classes is $L^{-} = L \setminus \{l\}$, and the score set of all negative sample classes is $S^{-} = \{s_j : j \in L^{-}\}$. Rank $S^{-}$ from high to low by score; the K-th score is $s_{(K)}$, and the set of classes with the top-K scores is $L_K = \{j \in L^{-} : s_j \geq s_{(K)}\}$. The GBCE is then calculated as:
$L_{GBCE} = -\gamma \log \frac{e^{s_l}}{e^{s_l} + \sum_{j \in L_K} e^{s_j}}$
where K and $\gamma$ are hyper-parameters: the smaller K and the larger $\gamma$, the larger the relative loss value of hard (hard-to-distinguish) samples and the smaller the relative loss value of easy samples, and vice versa. Increasing the relative loss value of hard samples and decreasing that of easy samples improves the network model's ability to focus on hard samples.
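The following PyTorch sketch implements the GBCE computation as written above: the softmax denominator keeps the positive class and only the K highest-scoring negative classes, and the loss adjustment factor rescales the result. The exact placement of the adjustment factor is an assumption, K is assumed smaller than the number of classes, and the hyper-parameter values are illustrative.

```python
import torch
import torch.nn as nn

class GradientEnhancedCrossEntropy(nn.Module):
    """Sketch of the gradient enhancement cross entropy (GBCE) loss: p(s, l)
    is computed over the positive class and the K highest-scoring negative
    classes only, scaled by a loss adjustment factor gamma. The placement of
    gamma is an assumption; K and gamma are hyper-parameters, and K must be
    smaller than the number of classes."""

    def __init__(self, k: int = 5, gamma: float = 2.0):
        super().__init__()
        self.k, self.gamma = k, gamma

    def forward(self, scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        pos = scores.gather(1, labels.unsqueeze(1))                  # s_l, shape (B, 1)
        neg = scores.scatter(1, labels.unsqueeze(1), float("-inf"))  # mask out the positive
        topk_neg = neg.topk(self.k, dim=1).values                    # K hardest negatives
        # log p(s, l) over {positive class} plus {top-K negative classes} only.
        log_p = pos.squeeze(1) - torch.logsumexp(torch.cat([pos, topk_neg], dim=1), dim=1)
        return -(self.gamma * log_p).mean()
```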
And S703, training the second network model based on a mini-batch stochastic gradient descent algorithm.
And step S704, modifying the trained second network model to obtain an inference network model, and inputting the image into the inference network model to obtain a target recognition result.
Therefore, by introducing the designed gradient enhancement cross entropy loss function, the problem of identifying similar-category samples in fine-grained target identification tasks can be addressed well, so that the network focuses on identifying hard samples and the recognition rate of similar-category samples in the target identification task is improved.
Fig. 8 is a block diagram of a target identification apparatus according to an embodiment of the present invention, where the apparatus is configured to execute the target identification method according to the above embodiment, and has corresponding functional modules and beneficial effects of the execution method. As shown in fig. 8, the apparatus specifically includes: a first processing module 101, a second processing module 102, a training module 103, a third processing module 104, and a recognition module 105, wherein,
the first processing module 101 is used for embedding the channel feature reactivation module and the fine feature self-enhancement module into a neural network structure to generate a first network model; the second processing module 102 is configured to connect a gradient enhancement cross entropy loss function with the first network model to generate a second network model; the training module 103 is used for training the second network model based on a mini-batch stochastic gradient descent algorithm; the third processing module 104 is configured to modify the trained second network model to obtain an inference network model; and the recognition module 105 is used for inputting the image into the inference network model to obtain a target recognition result.
According to the scheme, the channel feature reactivation module and the fine feature self-enhancement module are embedded into the neural network structure to generate a first network model, the gradient enhancement cross entropy loss function is connected with the first network model to generate a second network model, the second network model is trained based on a mini-batch stochastic gradient descent algorithm, the trained second network model is modified to obtain an inference network model, and an image is input into the inference network model to obtain a target recognition result. Introducing the channel feature reactivation module solves the problem of reduced accuracy caused by unbalanced sample categories during training, and introducing the fine feature self-enhancement module and the gradient enhancement cross entropy loss function enables the network model to learn more subtle features, which is very effective for improving the identification accuracy of similar categories and is particularly suitable for fine-grained target identification tasks, such as vehicle model identification, flower and plant identification, bird identification and the like.
In a possible embodiment, the first processing module 101 is specifically configured to:
redistributing, through the channel feature reactivation module, the weights of the feature map output in the neural network structure on a per-channel basis;
and enhancing, through the fine feature self-enhancement module, the non-salient features of the feature map output by the channel feature reactivation module, and suppressing the salient features.
In a possible embodiment, the first processing module 101 is specifically configured to:
compressing the feature map output in the neural network structure at the spatial level to obtain compressed features;
reactivating the compressed features to obtain activated weights;
and multiplying the input feature map by the activated weights channel by channel.
In a possible embodiment, the first processing module 101 is specifically configured to:
enhancing, according to the enhancement mask, the regions corresponding to the non-salient features of the feature map output by the channel feature reactivation module;
and suppressing, according to the suppression mask, the regions corresponding to the salient features of the feature map output by the channel feature reactivation module.
In a possible embodiment, the first processing module 101 is specifically configured to:
deleting the global pooling layer of a residual network, and modifying the last fully-connected layer into a convolutional layer with a 1x1 convolution kernel and C channels to obtain a feature map;
inputting the feature map into the channel feature reactivation module;
and inputting the feature map output by the channel feature reactivation module into the fine feature self-enhancement module, which is then connected to a global pooling layer, to generate a first network model.
In a possible embodiment, the second processing module 102 is specifically configured to:
adjusting the loss values of samples through a loss adjustment factor introduced into the gradient enhancement cross entropy loss function, while restricting the computation to the negative samples meeting a preset condition, to generate a second network model.
In a possible embodiment, the third processing module 104 is specifically configured to:
deleting the fine feature self-enhancement module and the gradient enhancement cross entropy loss function from the trained second network model, and connecting the Softmax function after the global pooling layer, to obtain the inference network model.
Fig. 9 is a schematic structural diagram of an apparatus according to an embodiment of the present invention, as shown in fig. 9, the apparatus includes a processor 201, a memory 202, an input device 203, and an output device 204; the number of the processors 201 in the device may be one or more, and one processor 201 is taken as an example in fig. 9; the processor 201, the memory 202, the input device 203 and the output device 204 in the apparatus may be connected by a bus or other means, and the connection by the bus is exemplified in fig. 9.
The memory 202, which is a computer-readable storage medium, may be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the object recognition method in the embodiments of the present invention. The processor 201 executes various functional applications of the device and data processing by executing software programs, instructions and modules stored in the memory 202, that is, implements the above-described object recognition method.
The memory 202 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 202 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 202 may further include memory located remotely from the processor 201, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 203 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function controls of the apparatus. The output device 204 may include a display device such as a display screen.
Embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a method of object recognition, the method comprising:
embedding a channel feature reactivation module and a fine feature self-enhancement module into a neural network structure to generate a first network model;
connecting a gradient enhancement cross entropy loss function with the first network model to generate a second network model;
training the second network model based on a mini-batch stochastic gradient descent algorithm;
modifying the trained second network model to obtain an inference network model;
and inputting the image into the inference network model to obtain a target recognition result.
From the above description of the embodiments, it is obvious for those skilled in the art that the embodiments of the present invention can be implemented by software and necessary general hardware, and certainly can be implemented by hardware, but the former is a better implementation in many cases. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device) perform the methods described in the embodiments of the present invention.
It should be noted that, in the embodiment of the object recognition apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the embodiment of the invention.
It should be noted that the foregoing is only a preferred embodiment of the present invention and the technical principles applied. Those skilled in the art will appreciate that the embodiments of the present invention are not limited to the specific embodiments described herein, and that various obvious changes, adaptations, and substitutions are possible, without departing from the scope of the embodiments of the present invention. Therefore, although the embodiments of the present invention have been described in more detail through the above embodiments, the embodiments of the present invention are not limited to the above embodiments, and many other equivalent embodiments may be included without departing from the concept of the embodiments of the present invention, and the scope of the embodiments of the present invention is determined by the scope of the appended claims.

Claims (10)

1. An object recognition method, comprising:
embedding a channel feature reactivation module and a fine feature self-enhancement module into a neural network structure to generate a first network model;
connecting a gradient enhancement cross entropy loss function with the first network model to generate a second network model;
training the second network model based on a mini-batch stochastic gradient descent algorithm;
modifying the trained second network model to obtain an inference network model;
and inputting the image into the inference network model to obtain a target recognition result.
2. The method of claim 1, wherein embedding the channel feature reactivation module and the fine feature self-enhancement module into a neural network structure to generate a first network model comprises:
redistributing, through the channel feature reactivation module, the weights of the feature map output in the neural network structure on a per-channel basis;
and enhancing, through the fine feature self-enhancement module, the non-salient features of the feature map output by the channel feature reactivation module, and suppressing the salient features.
3. The method of claim 2, wherein the redistributing, by the channel feature reactivation module, of the weights of the feature map output in the neural network structure on a per-channel basis comprises:
compressing the feature map output in the neural network structure at the spatial level to obtain compressed features;
reactivating the compressed features to obtain activated weights;
and multiplying the input feature map by the activated weights channel by channel.
4. The method of claim 2, wherein the fine feature self-enhancement module comprises an enhancement mask and a suppression mask, and wherein enhancing, through the fine feature self-enhancement module, the non-salient features of the feature map output by the channel feature reactivation module and suppressing the salient features comprises:
enhancing, according to the enhancement mask, the regions corresponding to the non-salient features of the feature map output by the channel feature reactivation module;
and suppressing, according to the suppression mask, the regions corresponding to the salient features of the feature map output by the channel feature reactivation module.
5. The method of claim 1, wherein embedding the channel feature reactivation module and the fine feature self-enhancement module into a neural network structure to generate a first network model comprises:
deleting the global pooling layer of a residual network, and modifying the last fully-connected layer into a convolutional layer with a 1x1 convolution kernel and C channels to obtain a feature map;
inputting the feature map into the channel feature reactivation module;
and inputting the feature map output by the channel feature reactivation module into the fine feature self-enhancement module, which is then connected to a global pooling layer, to generate the first network model.
6. The method of claim 5, wherein connecting the gradient enhancement cross entropy loss function with the first network model to generate a second network model comprises:
adjusting the loss values of samples through a loss adjustment factor introduced into the gradient enhancement cross entropy loss function, while restricting the computation to the negative samples meeting a preset condition, to generate the second network model.
7. The method according to any one of claims 1-6, wherein modifying the trained second network model to obtain the inference network model comprises:
deleting the fine feature self-enhancement module and the gradient enhancement cross entropy loss function from the trained second network model, and connecting the Softmax function after the global pooling layer, to obtain the inference network model.
8. An object recognition apparatus, comprising:
the first processing module is used for embedding the channel feature reactivation module and the fine feature self-enhancement module into the neural network structure to generate a first network model;
the second processing module is used for connecting the gradient enhancement cross entropy loss function with the first network model to generate a second network model;
the training module is used for training the second network model based on a mini-batch stochastic gradient descent algorithm;
the third processing module is used for modifying the trained second network model to obtain an inference network model;
and the recognition module is used for inputting the image into the inference network model to obtain a target recognition result.
9. An object recognition device, the device comprising: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the object recognition method of any one of claims 1-7.
10. A storage medium containing computer-executable instructions for performing the object recognition method of any one of claims 1-7 when executed by a computer processor.
CN202010133440.XA 2020-03-02 2020-03-02 Target identification method, device, equipment and storage medium Active CN110991568B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010133440.XA CN110991568B (en) 2020-03-02 2020-03-02 Target identification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010133440.XA CN110991568B (en) 2020-03-02 2020-03-02 Target identification method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110991568A true CN110991568A (en) 2020-04-10
CN110991568B CN110991568B (en) 2020-07-31

Family

ID=70081512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010133440.XA Active CN110991568B (en) 2020-03-02 2020-03-02 Target identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110991568B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112068004A (en) * 2020-09-16 2020-12-11 北京嘀嘀无限科技发展有限公司 Method and device for determining battery abnormity and battery charging remaining time
CN112465026A (en) * 2020-11-26 2021-03-09 深圳市对庄科技有限公司 Model training method and device for jadeite mosaic recognition
CN113239965A (en) * 2021-04-12 2021-08-10 北京林业大学 Bird identification method based on deep neural network and electronic equipment
CN113405527A (en) * 2021-06-11 2021-09-17 广州邦鑫水利科技有限公司 Unmanned aerial vehicle surveying and mapping method and system based on adaptive algorithm
CN113486977A (en) * 2021-07-26 2021-10-08 广州邦鑫水利科技有限公司 Unmanned aerial vehicle surveying and mapping method and system based on deep learning
WO2022002059A1 (en) * 2020-06-30 2022-01-06 北京灵汐科技有限公司 Initial neural network training method and apparatus, image recognition method and apparatus, device, and medium
CN115019151A (en) * 2022-08-05 2022-09-06 成都图影视讯科技有限公司 Non-salient feature region accelerated neural network architecture, method and apparatus
CN115570228A (en) * 2022-11-22 2023-01-06 苏芯物联技术(南京)有限公司 Intelligent feedback control method and system for welding pipeline gas supply

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108364023A (en) * 2018-02-11 2018-08-03 北京达佳互联信息技术有限公司 Image-recognizing method based on attention model and system
CN109785344A (en) * 2019-01-22 2019-05-21 成都大学 The remote sensing image segmentation method of binary channel residual error network based on feature recalibration
CN109871905A (en) * 2019-03-14 2019-06-11 同济大学 A kind of plant leaf identification method based on attention mechanism depth model
CN110472732A (en) * 2019-08-19 2019-11-19 杭州凝眸智能科技有限公司 Optimize feature extracting method and its neural network structure
CN110533024A (en) * 2019-07-10 2019-12-03 杭州电子科技大学 Biquadratic pond fine granularity image classification method based on multiple dimensioned ROI feature
CN110689056A (en) * 2019-09-10 2020-01-14 Oppo广东移动通信有限公司 Classification method and device, equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108364023A (en) * 2018-02-11 2018-08-03 北京达佳互联信息技术有限公司 Image-recognizing method based on attention model and system
CN109785344A (en) * 2019-01-22 2019-05-21 成都大学 The remote sensing image segmentation method of binary channel residual error network based on feature recalibration
CN109871905A (en) * 2019-03-14 2019-06-11 同济大学 A kind of plant leaf identification method based on attention mechanism depth model
CN110533024A (en) * 2019-07-10 2019-12-03 杭州电子科技大学 Biquadratic pond fine granularity image classification method based on multiple dimensioned ROI feature
CN110472732A (en) * 2019-08-19 2019-11-19 杭州凝眸智能科技有限公司 Optimize feature extracting method and its neural network structure
CN110689056A (en) * 2019-09-10 2020-01-14 Oppo广东移动通信有限公司 Classification method and device, equipment and storage medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022002059A1 (en) * 2020-06-30 2022-01-06 北京灵汐科技有限公司 Initial neural network training method and apparatus, image recognition method and apparatus, device, and medium
CN112068004A (en) * 2020-09-16 2020-12-11 北京嘀嘀无限科技发展有限公司 Method and device for determining battery abnormity and battery charging remaining time
CN112465026A (en) * 2020-11-26 2021-03-09 深圳市对庄科技有限公司 Model training method and device for jadeite mosaic recognition
CN113239965A (en) * 2021-04-12 2021-08-10 北京林业大学 Bird identification method based on deep neural network and electronic equipment
CN113239965B (en) * 2021-04-12 2023-05-02 北京林业大学 Bird recognition method based on deep neural network and electronic equipment
CN113405527A (en) * 2021-06-11 2021-09-17 广州邦鑫水利科技有限公司 Unmanned aerial vehicle surveying and mapping method and system based on adaptive algorithm
CN113405527B (en) * 2021-06-11 2022-07-22 湖北知寸航测科技有限公司 Unmanned aerial vehicle surveying and mapping method and system based on adaptive algorithm
CN113486977A (en) * 2021-07-26 2021-10-08 广州邦鑫水利科技有限公司 Unmanned aerial vehicle surveying and mapping method and system based on deep learning
CN115019151A (en) * 2022-08-05 2022-09-06 成都图影视讯科技有限公司 Non-salient feature region accelerated neural network architecture, method and apparatus
CN115570228A (en) * 2022-11-22 2023-01-06 苏芯物联技术(南京)有限公司 Intelligent feedback control method and system for welding pipeline gas supply

Also Published As

Publication number Publication date
CN110991568B (en) 2020-07-31

Similar Documents

Publication Publication Date Title
CN110991568B (en) Target identification method, device, equipment and storage medium
EP3114540B1 (en) Neural network and method of neural network training
CN111310814A (en) Method and device for training business prediction model by utilizing unbalanced positive and negative samples
CN112052837A (en) Target detection method and device based on artificial intelligence
CN111352926B (en) Method, device, equipment and readable storage medium for data processing
Marjani et al. The Large-Scale Wildfire Spread Prediction Using a Multi-Kernel Convolutional Neural Network
CN116827685B (en) Dynamic defense strategy method of micro-service system based on deep reinforcement learning
CN116777646A (en) Artificial intelligence-based risk identification method, apparatus, device and storage medium
CN113807541B (en) Fairness repair method, system, equipment and storage medium for decision system
CN115758337A (en) Back door real-time monitoring method based on timing diagram convolutional network, electronic equipment and medium
CN115879369A (en) Coal mill fault early warning method based on optimized LightGBM algorithm
CN115049019A (en) Method and device for evaluating arsenic adsorption performance of metal organic framework and related equipment
CN113283388A (en) Training method, device and equipment of living human face detection model and storage medium
CN113205158A (en) Pruning quantification processing method, device, equipment and storage medium of network model
CN112183622A (en) Method, device, equipment and medium for detecting cheating in mobile application bots installation
CN111582446B (en) System for neural network pruning and neural network pruning processing method
CN117786705B (en) Statement-level vulnerability detection method and system based on heterogeneous graph transformation network
CN113283520B (en) Feature enhancement-based depth model privacy protection method and device for membership inference attack
CN116644438B (en) Data security management method and system based on mobile storage device
CN112989057B (en) Text label determination method and device, computer equipment and storage medium
CN117690451B (en) Neural network noise source classification method and device based on ensemble learning
CN117521063A (en) Malicious software detection method and device based on residual neural network and combined with transfer learning
CN117668897A (en) Privacy protection method and device for neural network classification model
CN117692187A (en) Vulnerability restoration priority ordering method and device based on dynamics
CN115293423A (en) Processing method, device, equipment and storage medium based on natural disaster data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Room 306, zone 2, building 1, Fanshan entrepreneurship center, Panyu energy saving technology park, No. 832 Yingbin Road, Donghuan street, Panyu District, Guangzhou City, Guangdong Province

Patentee after: Jiadu Technology Group Co.,Ltd.

Patentee after: GUANGZHOU XINKE JIADU TECHNOLOGY Co.,Ltd.

Address before: Room 306, zone 2, building 1, Fanshan entrepreneurship center, Panyu energy saving technology park, No. 832 Yingbin Road, Donghuan street, Panyu District, Guangzhou City, Guangdong Province

Patentee before: PCI-SUNTEKTECH Co.,Ltd.

Patentee before: GUANGZHOU XINKE JIADU TECHNOLOGY Co.,Ltd.
