CN113705310A - Feature learning method, target object identification method and corresponding device


Info

Publication number
CN113705310A
CN113705310A
Authority
CN
China
Prior art keywords
image
loss value
target object
value
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110360265.2A
Other languages
Chinese (zh)
Inventor
曹琼 (Cao Qiong)
车翔 (Che Xiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110360265.2A
Publication of CN113705310A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Abstract

The application provides a feature learning method, a target object identification method, and corresponding devices, applied in the technical field of image recognition. The feature vector extraction model is obtained by training a neural network model based on a first loss value and a second loss value corresponding to the feature vectors of target objects contained in image samples. The first loss value represents the gap between an image sample and the class center of the class to which the image sample belongs; the second loss value represents the correlation between the class centers of the classes to which different image samples belong. By training the neural network model with the first and second loss values, the resulting feature vector extraction model, when extracting feature vectors from an image of the target object, reduces the gap between the image and the center of its class and reduces the correlation between the centers of different classes, improving the accuracy with which the model extracts the target object's feature vectors from images.

Description

Feature learning method, target object identification method and corresponding device
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to a feature learning method, a target object recognition method, and a corresponding apparatus.
Background
In recent years, with the emergence of large numbers of short videos on social media platforms, the platforms' requirements for user profiling, content recommendation, and content review have kept growing, and object recognition, one of the important sources of information in video understanding, has attracted increasing attention (face recognition, for example). However, compared with object recognition in still pictures, short videos shot and uploaded by users on social platforms generally suffer from camera shake, blur, strong illumination changes, and complex scene changes, making object recognition in short videos considerably more challenging.
In the process of object recognition, extracting the image features of the object is one of the most important steps and strongly affects recognition accuracy. However, current techniques for extracting an object's image features generally suffer from inaccurate feature extraction.
Disclosure of Invention
In view of the above, the present application provides a feature learning method, a target object identification method, and a corresponding apparatus, so as to solve the problem in the prior art that image feature extraction of an object is inaccurate.
In order to achieve the above purpose, the present application provides the following technical solutions:
a first aspect of the present application discloses a method of feature learning, comprising:
acquiring an image to be extracted; the image to be extracted contains a target object, where the target object is the object whose feature vector is to be extracted;
calling a trained feature vector extraction model to process the image to be extracted, obtaining a feature vector of the target object in the image to be extracted; the feature vector extraction model is obtained by training a neural network model based on a first loss value and a second loss value corresponding to the feature vectors of target objects contained in image samples; the first loss value and the second loss value are obtained by processing the feature vector of the target object contained in an image sample with a predetermined central loss function, and the class centers of the predetermined central loss function are the weights of the fully connected layer of the neural network model; the first loss value represents the gap between an image sample and the class center of the class to which the image sample belongs, and the second loss value represents the correlation between the class centers of the classes to which different image samples belong.
Optionally, in the foregoing method, the method further includes:
calling the neural network model to process an image sample to obtain a feature vector of a target object in the image sample;
processing the feature vector of the target object in the image sample by using the preset central loss function to obtain the first loss value and the second loss value;
and calculating the first loss value and the second loss value to obtain a total loss value, training the neural network model by taking the total loss value as a training parameter, and taking the trained neural network model as the feature vector extraction model.
Optionally, in the method, the processing the feature vector of the target object in the image sample by using the predetermined central loss function to obtain a first loss value and a second loss value includes:
calculating the feature vector of the target object contained in the image sample with the intra-class-distance-minimizing formula of the predetermined central loss function to obtain the first loss value; and calculating the feature vector of the target object contained in the image sample with the formula of the predetermined central loss function that orthogonalizes the class centers to reduce inter-class correlation, obtaining the second loss value.
Optionally, in the method, after the invoking of the neural network model to process the image sample and obtaining the feature vector of the target object in the image sample, the method further includes:
processing the feature vector of the target object in the image sample by using a first loss function to obtain a third loss value, and processing the feature vector of the target object in the image sample by using a second loss function to obtain a fourth loss value;
wherein: the calculating the first loss value and the second loss value to obtain a total loss value includes:
taking the sum of the third loss value, the correction value of the fourth loss value, the correction value of the first loss value and the correction value of the second loss value as the total loss value; the correction value of the first loss value is the product of the first loss value and a preset value, the correction value of the second loss value is the product of the second loss value and a preset value, and the correction value of the fourth loss value is the product of the fourth loss value and a preset value.
The second aspect of the present application discloses a method for identifying a target object, comprising:
processing an image to be recognized by using the feature learning method in any one of the first aspect to obtain a feature vector of a target object in the image to be recognized; wherein the image to be recognized comprises a target object;
acquiring image quality parameters of the image to be identified; the image quality parameter is used for representing the quality of the image to be identified;
and when the image quality parameter of the image to be recognized indicates that the image quality of the image to be recognized is high, searching the search library with the feature vector of the target object to obtain the object in the search library with the highest similarity to the feature vector of the target object.
Optionally, in the method, the obtaining of the image quality parameter of the image to be recognized includes:
calling an image quality evaluation model to process the feature vector of the target object, obtaining the image quality parameter of the image to be recognized; the image quality parameter represents the quality of the image to be recognized, and the image quality evaluation model is obtained by training a neural network model with image samples, where the neural network model comprises a fully connected layer and a quality weight layer running in parallel with it: the fully connected layer yields the output value of the image sample, and the quality weight layer yields the quality evaluation parameter; the total loss value used when training the neural network model into the image quality evaluation model is calculated from the output value of the image sample and the quality evaluation parameter.
Optionally, in the method, the obtaining of the image quality parameter of the image to be recognized includes:
and acquiring the confidence coefficient of the image to be recognized, wherein the confidence coefficient of the image to be recognized is used for representing the probability that the image to be recognized contains the target object.
Optionally, in the method, the obtaining of the image quality parameter of the image to be recognized includes:
acquiring coordinates of a plurality of preset key points of a target object in the image to be recognized;
calculating to obtain an angle value between a first connecting line of the preset key points and a second connecting line of the preset key points by using the coordinates of the preset key points; wherein the first connection line of the preset key points and the second connection line of the preset key points respectively refer to connection lines between specific key points in the plurality of preset key points;
and determining the display angle of the target object in the image to be recognized by utilizing the angle value between the first connecting line of the preset key point and the second connecting line of the preset key point.
A third aspect of the present application discloses an apparatus for feature learning, comprising:
an acquisition unit configured to acquire an image to be extracted; the image to be extracted comprises a target object, wherein the target object is an object with the characteristic vector extracted;
the calling unit is used for calling the trained feature vector extraction model to process the image to be extracted, obtaining the feature vector of the target object in the image to be extracted; the feature vector extraction model is obtained by training a neural network model based on a first loss value and a second loss value corresponding to the feature vectors of target objects contained in image samples; the first loss value and the second loss value are obtained by processing the feature vector of the target object contained in an image sample with a predetermined central loss function, and the class centers of the predetermined central loss function are the weights of the fully connected layer of the neural network model; the first loss value represents the gap between an image sample and the class center of the class to which the image sample belongs, and the second loss value represents the correlation between the class centers of the classes to which different image samples belong.
Optionally, the above apparatus further comprises: a training unit to:
calling the neural network model to process an image sample to obtain a feature vector of a target object in the image sample; processing the feature vector of the target object in the image sample by using the preset central loss function to obtain the first loss value and the second loss value; and calculating the first loss value and the second loss value to obtain a total loss value, training the neural network model by taking the total loss value as a training parameter, and taking the trained neural network model as the feature vector extraction model.
Optionally, in the above apparatus, the training unit processes the feature vector of the target object in the image sample by using a central loss function, and when obtaining a first loss value and a second loss value, is configured to:
calculating the feature vector of the target object with the intra-class-distance-minimizing formula of the central loss function to obtain the first loss value; and calculating the feature vector of the target object with the formula of the central loss function that orthogonalizes the class centers to reduce inter-class correlation, obtaining the second loss value.
Optionally, in the above apparatus, after the training unit invokes the neural network model to process an image sample, and obtains a feature vector of a target object in the image sample, the training unit is further configured to:
processing the feature vector of the target object in the image sample by using a first loss function to obtain a third loss value, and processing the feature vector of the target object in the image sample by using a second loss function to obtain a fourth loss value;
wherein, the training unit calculates the first loss value and the second loss value, and when obtaining a total loss value, is configured to: taking the sum of the third loss value, the correction value of the fourth loss value, the correction value of the first loss value and the correction value of the second loss value as the total loss value; the correction value of the first loss value is the product of the first loss value and a preset value, the correction value of the second loss value is the product of the second loss value and a preset value, and the correction value of the fourth loss value is the product of the fourth loss value and a preset value.
The fourth aspect of the present application discloses an apparatus for recognizing a target object, comprising:
a calling unit, configured to process an image to be recognized by using the feature learning method according to any one of the first aspects, and obtain a feature vector of a target object in the image to be recognized; wherein the image to be recognized comprises a target object;
the acquisition unit is used for acquiring the image quality parameters of the image to be identified; the image quality parameter is used for representing the quality of the image to be identified;
and the retrieval unit is used for, when the image quality parameter of the image to be recognized indicates that the image quality of the image to be recognized is high, searching the retrieval library with the feature vector of the target object to obtain the object in the retrieval library with the highest similarity to the feature vector of the target object.
Optionally, in the foregoing apparatus, the obtaining unit includes:
the calling subunit is used for calling an image quality evaluation model to process the feature vector of the target object, obtaining the image quality parameter of the image to be recognized; the image quality parameter represents the quality of the image to be recognized, and the image quality evaluation model is obtained by training a neural network model with image samples, where the neural network model comprises a fully connected layer and a quality weight layer running in parallel with it: the fully connected layer yields the output value of the image sample, and the quality weight layer yields the quality evaluation parameter; the total loss value used when training the neural network model into the image quality evaluation model is calculated from the output value of the image sample and the quality evaluation parameter.
Optionally, in the foregoing apparatus, the obtaining unit includes:
the first obtaining subunit is configured to obtain a confidence level of the image to be recognized, where the confidence level of the image to be recognized is used to characterize a probability that the image to be recognized contains a target object.
Optionally, in the foregoing apparatus, the obtaining unit includes:
the second acquisition subunit is used for acquiring the coordinates of a plurality of preset key points of the target object in the image to be identified;
the calculation subunit is used for calculating an angle value between a first connecting line of the preset key points and a second connecting line of the preset key points by using the coordinates of the preset key points; wherein the first connection line of the preset key points and the second connection line of the preset key points respectively refer to connection lines between specific key points in the plurality of preset key points;
and the determining subunit is used for determining the display angle of the target object in the image to be identified by using the angle value between the first connecting line of the preset key point and the second connecting line of the preset key point.
A fifth aspect of the present application discloses an electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the first and second aspects.
A sixth aspect of the present application discloses a computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of the first and second aspects.
According to the technical scheme, in the feature learning method provided by the application, the feature vector extraction model is obtained by training the neural network model based on a first loss value and a second loss value corresponding to the feature vectors of target objects contained in image samples. The first loss value represents the gap between an image sample and the class center of the class to which it belongs; the second loss value represents the correlation between the class centers of the classes to which different image samples belong. By training the neural network model with both loss values, the resulting feature vector extraction model, when extracting feature vectors from an image of the target object, reduces the gap between the image and its class center and reduces the correlation between the centers of different classes, improving the accuracy of the extracted feature vectors. In addition, the class centers of the center loss function are the weights of the fully connected layer of the neural network model, so no additional parameters need to be learned and parameter redundancy is avoided.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1a is a flowchart of a method for training a neural network model according to an embodiment of the present disclosure;
FIG. 1b is a schematic diagram of a training process of a neural network model disclosed in another embodiment of the present application;
FIG. 2 is a flow chart of a method of training another neural network model disclosed in another embodiment of the present application;
FIG. 3 is a diagram illustrating an effect of evaluating an image by using an image quality evaluation model according to another embodiment of the present application;
FIG. 4 is a method flow diagram of a method of feature learning disclosed in another embodiment of the present application;
FIG. 5 is a flowchart of a method for identifying a target object according to another embodiment of the present disclosure;
fig. 6 is a schematic diagram of an execution module of a method for face recognition according to another embodiment of the present application;
FIG. 7 is a flowchart of a method for face recognition according to another embodiment of the present disclosure;
FIG. 8 is a database distribution diagram of a search library as disclosed in another embodiment of the present application;
FIG. 9 is a graph comparing the face recognition of another embodiment of the present application with the prior art;
FIG. 10 is a schematic diagram of a training apparatus for neural network models, according to another embodiment of the present disclosure;
FIG. 11 is a schematic diagram of an apparatus for feature learning according to another embodiment of the present disclosure;
FIG. 12 is a schematic view of a target object recognition apparatus according to another embodiment of the present disclosure;
fig. 13 is a schematic diagram of an electronic device according to another embodiment of the disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Moreover, in this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, giving machines the functions of perception, reasoning, and decision making.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technology. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is the science of how to make machines "see": using cameras and computers instead of human eyes to identify, track, and measure targets, and performing further image processing so that the result is an image better suited to human observation or to transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of capturing information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
The scheme provided by the embodiment of the application relates to the computer vision technology of artificial intelligence, and is specifically explained by the following embodiment.
The embodiment of the application provides a feature learning method, a target object identification method and a corresponding device, and aims to solve the problem that in the prior art, the image feature extraction of an object is inaccurate.
The embodiment of the present application provides a training method for a neural network model, as shown in fig. 1a, specifically including:
s101, calling a neural network model to process the image sample to obtain a feature vector of a target object in the image sample.
It should be noted that, when performing object recognition, feature extraction on an object image is a very important step, and whether feature extraction can be accurately performed on the object image is a large factor that affects the accuracy of object recognition. Therefore, the training method of the neural network model provided by the embodiment of the application is used for training the neural network model to obtain the extraction model capable of accurately extracting the object feature vector.
It should be further noted that fig. 1b is a schematic diagram of a training process of a neural network model, and this embodiment describes a training method of the neural network model with reference to fig. 1a and fig. 1 b. The neural network model may be selected according to actual conditions, and a convolutional neural network model (ConvNet C) is taken as an example in this embodiment.
When training the neural network model, the image sample data required for training is first acquired; the neural network model is then called to process the image samples, obtaining the feature vector of the target object in each image sample.
S102, processing the feature vector of the target object by using a first loss function to obtain a first loss value, processing the feature vector of the target object by using a second loss function to obtain a second loss value, and processing the feature vector of the target object by using a third loss function to obtain a third loss value and a fourth loss value.
Wherein, the third loss function is a central loss function, and the class center of the third loss function is the weight of the full connection layer of the neural network model; the third loss value is used for representing the difference between the image sample and the class center corresponding to the class to which the image sample belongs, and the fourth loss value is used for representing the correlation between the class centers corresponding to the classes to which different image samples belong.
It should be noted that, for the feature vector of the target object extracted from the image sample by the ConvNet C model, the feature vector is processed with a cross-entropy loss function to obtain the first loss value. The specific formula of the cross-entropy loss function is:

$$L_{softmax} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{w_{y_i}^{\top}v_i}}{\sum_{j=1}^{M}e^{w_j^{\top}v_i}}$$

where $w_i$ is the 512-dimensional weight vector of class $i$, $i$ ranging from 1 to $M$, with $M$ the number of classes; $v_i$ is the 512-dimensional feature vector of the $i$-th sample ($y_i$ being its class label); and $N$ is the batch size.
The cross-entropy loss function is a classification objective; the first loss value computed from it drives the model, during training, to predict whether the target objects in multiple images belong to the same category.
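As an illustration of this step, a minimal sketch follows (assuming PyTorch; the sizes follow the embodiment's 512-dimensional features, but none of the code comes from the patent itself) of computing the first loss value through a bias-free fully connected layer whose weight rows later double as the class centers:

```python
# Sketch only: softmax cross-entropy over a bias-free fully connected layer.
# PyTorch is an assumption; M, N and the tensors are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

M, N = 1000, 64                      # number of classes, batch size
fc = nn.Linear(512, M, bias=False)   # rows of fc.weight are the w_i (and later the class centers)

features = torch.randn(N, 512)       # v_i: feature vectors from the backbone
labels = torch.randint(0, M, (N,))   # y_i: class labels

logits = fc(features)                            # w_j^T v_i for every class j
loss_softmax = F.cross_entropy(logits, labels)   # the first loss value
```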
The feature vector of the target object is also processed with the triplet loss function to obtain the second loss value. The formula of the triplet loss function is:

$$L_{triplet} = \frac{1}{N}\sum_{i=1}^{N}\max\left(\left\|v_i^{a}-v_i^{p}\right\|_2^{2}-\left\|v_i^{a}-v_i^{n}\right\|_2^{2}+a,\ 0\right)$$

where $(v_i^{a}, v_i^{p}, v_i^{n})$ is a sample triplet: samples $v_i^{a}$ and $v_i^{p}$ belong to the same class, while $v_i^{a}$ and $v_i^{n}$ belong to different classes; $a$ is a threshold. Minimizing $L_{triplet}$ drives the same-class distance $\|v_i^{a}-v_i^{p}\|_2^{2}$ to be smaller than the different-class distance $\|v_i^{a}-v_i^{n}\|_2^{2}$ by at least the threshold $a$.
The second loss value computed with the triplet loss function ensures, during training, that the distance between target objects in two images of the same category is smaller than the distance between target objects in two images of different categories.
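A short sketch of this term (again assuming PyTorch; the margin value is illustrative):

```python
# Sketch only: squared-distance triplet loss with margin a.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.3):
    # anchor/positive share a class; anchor/negative do not
    d_ap = (anchor - positive).pow(2).sum(dim=1)   # same-class distance
    d_an = (anchor - negative).pow(2).sum(dim=1)   # different-class distance
    return F.relu(d_ap - d_an + margin).mean()     # hinge on the distance gap

# anchor, positive, negative: (N, 512) feature batches from the model
loss_triplet = triplet_loss(torch.randn(8, 512), torch.randn(8, 512), torch.randn(8, 512))
```

PyTorch's built-in nn.TripletMarginLoss realizes the same idea with (non-squared) p-norm distances.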
The feature vector of the target object is processed with the center loss function to obtain the third loss value and the fourth loss value. The class centers of the center loss function are the weights of the fully connected layer of the neural network model; that is, whatever the class of an image sample, the same class centers, namely the weights of the fully connected layer, are used when calculating the loss values with the center loss function. This avoids the parameter redundancy that arises when a separate class center must be configured for each class of image samples. Of course, different class centers may also be set for different classes of image samples if parameter redundancy is not a concern.
It should be further noted that in the conventional center loss function, model training generally only drives the intra-class distance of the class to which a sample belongs to be as small as possible, without also driving the distances between samples of different classes to be as large as possible.
Optionally, in another embodiment of the present application, an implementation manner of step S102 may include:
The feature vector of the target object is calculated with the intra-class-distance-minimizing formula of the third loss function to obtain the third loss value.
The feature vector of the target object is calculated with the formula of the third loss function that orthogonalizes the class centers to reduce inter-class correlation, obtaining the fourth loss value.
Specifically, the feature vector of the target object is substituted into the intra-class-distance-minimizing formula of the center loss function to obtain the third loss value, which represents the gap between the image sample and the class center of the class to which it belongs. The specific formula is:

$$L_{intra} = \frac{1}{N}\sum_{i=1}^{N}\left\|v_i - c_{y_i}\right\|_2^{2}$$

where $L_{intra}$ is the third loss value, $v_i$ is the 512-dimensional feature vector of sample $i$, and $c_{y_i}$ is the class center of class $y_i$, also a 512-dimensional vector. Minimizing this formula minimizes the distance between sample $v_i$ and class center $c_{y_i}$, so that samples of the same class are spaced as closely as possible and gather around their class center.
The feature vector of the target object is then substituted into the formula of the center loss function that orthogonalizes the class centers, obtaining the fourth loss value, which represents the correlation between the class centers of the classes to which different image samples belong. The specific formula is:

$$G = CC^{\top},\qquad L_{inter} = \left\|G - I\right\|_F^{2}$$

where $c_i$ is a class center, i.e. $w_i$; $C$ is the matrix whose rows are the class centers $\{c_i\}$, $i$ from 1 to $M$; $L_{inter}$ is the fourth loss value; $I$ is the identity matrix; and $\|\cdot\|_F$ is the Frobenius norm. $G = CC^{\top}$ is the Gram matrix of $\{c_i\}$; the formula above pushes this Gram matrix toward the identity matrix, thereby reducing the correlation between the class centers $\{c_i\}$, i.e. separating the samples of different classes further apart.
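The two center-loss terms can be sketched as follows (PyTorch assumed; whether the centers are L2-normalized before the Gram matrix is formed is an assumption, not stated in the text):

```python
# Sketch only: the intra-class pull term and the inter-class orthogonalization
# term, with the fully connected layer's weight rows serving as the class
# centers, as the text specifies.
import torch
import torch.nn.functional as F

def center_losses(features, labels, fc_weight):
    """features: (N, 512); labels: (N,); fc_weight: (M, 512), rows = class centers."""
    centers = F.normalize(fc_weight, dim=1)         # assumed normalization of c_i
    # L_intra: pull each sample toward its own class center c_{y_i}
    l_intra = (features - centers[labels]).pow(2).sum(dim=1).mean()
    # L_inter: push the Gram matrix C C^T toward the identity, decorrelating the centers
    gram = centers @ centers.t()                    # (M, M) Gram matrix
    eye = torch.eye(gram.size(0), device=gram.device)
    l_inter = (gram - eye).pow(2).sum()             # squared Frobenius norm
    return l_intra, l_inter
```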
S103, calculating a first loss value, a second loss value, a third loss value and a fourth loss value to obtain a total loss value, and training a neural network model by using the total loss value as a training parameter to obtain a feature vector extraction model.
It should be noted that after the first, second, third, and fourth loss values are calculated, a total loss value is computed from them. The total loss value is then used as the training parameter and iterated back into the neural network model to train the model, until the subsequently calculated total loss value drops to a preset standard, at which point training of the neural network model is complete. The trained neural network model serves as the feature vector extraction model for extracting the feature vector of the target object in an image sample.
It should be further noted that the embodiment of the present application adopts the ResNet-50 network structure; similar effects can be obtained by combining the center loss function proposed in the present application with other backbone structures, for example deeper and wider networks such as ResNet-100 or EfficientNet.
Optionally, in another embodiment of the present application, an implementation manner of step S103 may include:
taking the sum of the first loss value, the correction value of the second loss value, the correction value of the third loss value, and the correction value of the fourth loss value as the total loss value;
the correction value of the second loss value is the product of the second loss value and a preset value, the correction value of the third loss value is the product of the third loss value and the preset value, and the correction value of the fourth loss value is the product of the fourth loss value and the preset value.
Specifically, the formula for calculating the total loss value is:

$$L_{total} = L_{softmax} + \alpha_1 L_{triplet} + \alpha_2 L_{intra} + \alpha_3 L_{inter}$$

where $L_{total}$ is the total loss value and $\alpha_1$, $\alpha_2$, $\alpha_3$ are the preset values that weight the second, third, and fourth loss values; the total loss value is calculated through this formula.
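Continuing the sketches above, the combination might look like this (the α values are placeholders, not values from the patent):

```python
# Sketch only: weighted sum of the four loss values, then backpropagation.
l_intra, l_inter = center_losses(features, labels, fc.weight)
alpha1, alpha2, alpha3 = 1.0, 0.01, 0.01        # placeholder preset values
loss_total = (loss_softmax + alpha1 * loss_triplet
              + alpha2 * l_intra + alpha3 * l_inter)
loss_total.backward()                            # iterate the total loss back into the network
```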
In addition, when calculating the total loss value used as the training parameter for the neural network model, the method proposed in steps S102 and S103 above need not be used; instead:
processing the characteristic vector of a target object in the image sample by using a central loss function to obtain two loss values; and calculating two loss values to obtain a total loss value, and training the neural network model by taking the total loss value as a training parameter.
Similarly, the class center of the center loss function is the weight of the full connection layer of the neural network model; one loss value is used for representing the difference between the image sample and the class center corresponding to the class to which the image sample belongs, and the other loss value is used for representing the correlation between the class centers corresponding to the classes to which different image samples belong.
When the total loss value is calculated from these two loss values, the total loss value is likewise the sum of the loss value representing the gap between the image sample and its class center and the correction value of the loss value representing the correlation between the class centers of the classes to which different image samples belong. That correction value is the product of the correlation loss value and a preset value.
Another embodiment of the present application further provides another training method for a neural network model, which is used for generating a face quality assessment model, specifically as shown in fig. 2, and includes:
s201, calling a neural network model to process the image sample to obtain a feature vector of a target object in the image sample.
It should be noted that, the content of the step S101 may be referred to in the execution process of the step S201, and is not described herein again.
S202, simultaneously inputting the feature vectors of the image samples into the fully connected layer and the quality weight layer, respectively obtaining the output values of the image samples and the quality evaluation parameters of the image samples.
It should be noted that the neural network model in this embodiment is also a convolutional neural network model, with a quality weight layer added that runs in parallel with the fully connected layer; the quality weight layer adds a sigmoid function on top of an ordinary fully connected layer. The quality weight layer maps the 512-dimensional feature vector to a 1-dimensional value and normalizes it to a value between 0 and 1. After the feature vector of an image sample is extracted, it is input into the fully connected layer and the quality weight layer simultaneously: the fully connected layer yields the output value of the image sample, that is, a loss value calculated with a loss function, while the quality weight layer outputs a quality evaluation parameter reflecting the quality of the image sample.
S203, calculating an output value of the image sample and a quality evaluation parameter of the image sample to obtain a total loss value, and training a neural network model by taking the total loss value as a training parameter to obtain an image quality evaluation model.
It should be noted that after the output value of the image sample and the quality evaluation parameter of the image sample are obtained, the total loss value of the image quality evaluation model is calculated from their product. The specific calculation formula is:

$$loss = \sum_{i} at(i)\cdot loss(i) + const$$

where $at(i)$ is the quality evaluation parameter computed for sample $i$ by the quality weight layer, $loss(i)$ is the output value computed for sample $i$ at the fully connected layer with the loss function, $loss$ is the total loss value of the image quality evaluation model, and $const$ is a constant. The total loss value is then used as the training parameter and iterated back into the neural network model to train it; when the subsequently calculated total loss value drops to a preset standard, training is complete, and the trained neural network model serves as the image quality evaluation model for evaluating the image quality of image samples. Fig. 3 shows the effect of evaluating images with the image quality evaluation model; the images are sorted by quality from left to right, top to bottom.
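A sketch of the quality-weight branch and its loss (PyTorch assumed; treating the fully connected branch's output value as a per-sample cross-entropy, and the role given to const below, are assumptions):

```python
# Sketch only: a quality weight layer (fully connected + sigmoid) running in
# parallel with the classification layer, producing at(i) in (0, 1).
import torch
import torch.nn as nn
import torch.nn.functional as F

class QualityHead(nn.Module):
    def __init__(self, num_classes, const=0.1):
        super().__init__()
        self.fc = nn.Linear(512, num_classes)   # classification branch
        self.quality = nn.Linear(512, 1)        # parallel quality weight branch
        self.const = const

    def forward(self, feats, labels):
        loss_i = F.cross_entropy(self.fc(feats), labels, reduction='none')  # loss(i)
        at = torch.sigmoid(self.quality(feats)).squeeze(1)                  # at(i) in (0, 1)
        # quality-weighted per-sample losses plus a constant-scaled term that
        # keeps at(i) from collapsing to zero (this use of const is assumed)
        return (at * loss_i).mean() + self.const * (1.0 - at).mean()
```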
The trained image quality evaluation model can evaluate image quality accurately without manual assessment. Manual judgment of image quality is subjective, especially for images between good and poor, which greatly reduces the efficiency and quality of evaluation and in turn degrades object recognition; the model avoids this problem. Meanwhile, the image quality evaluation model migrates well across data: it can easily be transferred to other application scenarios simply by adjusting the quality-score threshold for each scenario.
Another embodiment of the present application further provides a method for feature learning, as shown in fig. 4, specifically including:
s401, acquiring an image to be identified; the image to be recognized comprises a target object.
In the feature learning process, an image to be recognized is first acquired, where the image to be recognized includes a target object to be recognized.
S402, calling a feature vector extraction model to process the image to be recognized to obtain a feature vector of a target object in the image to be recognized; the feature vector extraction model is obtained by training by using the training method of the neural network model disclosed in any one of the above embodiments.
It should be noted that the image to be recognized is input into the trained feature vector extraction model, which performs feature extraction on the target object in the image to be recognized, obtaining the feature vector of the target object. The feature vector extraction model trained as described in this application can extract the features of the target object in an image accurately, improving the accuracy of object recognition.
It should be further noted that, compared with the prior art, which applies the three loss functions (cross-entropy loss, triplet loss, and the conventional center loss), extracting the feature vector of the target object with the feature learning method of the present application gives the best object comparison and recognition accuracy on the common data sets LFW, CFP-FP, and AgeDB-30, as shown in Table 1:
[Table 1: accuracy of object comparison and recognition on LFW, CFP-FP, and AgeDB-30; the values appear only as an image in the source and are not recoverable here.]
In the feature learning method provided by the embodiment of the application, the feature vector extraction model is obtained by training the neural network model based on a first loss value and a second loss value corresponding to the feature vectors of target objects contained in image samples; the first loss value represents the gap between an image sample and the class center of the class to which it belongs, and the second loss value represents the correlation between the class centers of the classes to which different image samples belong. Training the neural network model with both loss values means that, when the trained feature vector extraction model extracts feature vectors from an image of the target object, the gap between the image and its class center is reduced and the correlation between the centers of different classes is reduced, improving the accuracy of the extracted feature vectors.
It should be noted that feature learning on an image yields the feature vector of a specific object in the image. That feature vector can typically be applied to recognition tasks, i.e. identifying the specific object in the image through its feature vector, and also to attribute prediction: for example, the feature vector of a face extracted by feature learning can be used to predict attributes such as the attractiveness or age of the face.
It should be noted that, in the method for feature learning disclosed in this embodiment, before performing step S402, a step of training to obtain a feature vector extraction model may also be included.
Specifically, the method for training the feature vector extraction model includes:
calling a neural network model to process the image sample to obtain a characteristic vector of a target object in the image sample;
processing the feature vector of the target object in the image sample by using a preset center loss function to obtain a loss value representing the difference between the image sample and the class center corresponding to the class to which the image sample belongs and a loss value representing the correlation between the class centers corresponding to the classes to which different image samples belong;
and calculating the two loss values to obtain a total loss value, training a neural network model by taking the total loss value as a training parameter, and taking the trained neural network model as a feature vector extraction model.
Of course, for the specific implementation of training the feature vector extraction model, reference can be made to the embodiment of the neural network model training method above, which is not repeated here.
Another embodiment of the present application further provides a method for identifying a target object, as shown in fig. 5, specifically including:
s501, acquiring an image to be identified; the image to be recognized comprises a target object.
S502, calling a feature vector extraction model to process the image to be recognized to obtain a feature vector of a target object in the image to be recognized; the feature vector extraction model is obtained by training by adopting the training method of the neural network model disclosed in the embodiment.
It should be noted that, the content of the embodiment in step S501 and step S502 may refer to the content of the embodiment in fig. 4, and is not described herein again.
S503, acquiring image quality parameters of the image to be identified; the image quality parameter is used for representing the quality of the image to be identified.
It should be noted that image quality affects the accuracy of object recognition: the higher the image quality, the easier the recognition, and the lower the quality, the harder. Therefore, before object recognition is performed on the image to be recognized, its image quality parameters are acquired first, so that high-quality images can be screened out, improving recognition accuracy. The image quality parameter represents the quality of the image to be recognized.
Optionally, in another embodiment of the present application, an implementation manner of step S503 may include:
and calling an image quality evaluation model to process the characteristic vector of the target object to obtain the image quality parameter of the image to be identified.
The image quality parameter represents the quality of the image to be recognized. The image quality evaluation model is obtained by training a neural network model with image samples, where the neural network model comprises a fully connected layer and a quality weight layer running in parallel with it: the fully connected layer yields the output value of the image sample, and the quality weight layer yields the quality evaluation parameter. The total loss value used when training the neural network model into the image quality evaluation model is calculated from the output value of the image sample and the quality evaluation parameter.
It should be noted that the feature vector of the target object in the image to be recognized is input into the trained image quality evaluation model, which maps the 512-dimensional feature vector to a 1-dimensional value and normalizes it to a value between 0 and 1, outputting a quality evaluation parameter that reflects the quality of the image; this parameter can directly indicate whether the quality of the image to be recognized is high or low.
Optionally, in another embodiment of the application, an implementation manner of step S503 may include:
and acquiring the confidence coefficient of the image to be recognized, wherein the confidence coefficient of the image to be recognized is used for representing the probability that the image to be recognized contains the target object.
It should be noted that, when detecting a target object, an object detector can process the image to be recognized and output its confidence. A high confidence indicates a high probability that the image to be recognized contains the target object; a low confidence indicates a low probability.
Optionally, in another embodiment of the present application, an implementation manner of step S503 may include:
and acquiring the coordinates of a plurality of preset key points of the target object in the image to be recognized.
Calculating to obtain an angle value between a first connecting line of the preset key points and a second connecting line of the preset key points by using the coordinates of the preset key points; the first connecting line of the preset key points and the second connecting line of the preset key points respectively refer to connecting lines among specific key points in the plurality of preset key points.
And determining the display angle of the target object in the image to be recognized by utilizing the angle value between the first connecting line of the preset key point and the second connecting line of the preset key point.
It should be noted that the coordinates of several preset key points of the target object in the image to be recognized are acquired. For example, if the target object is a face, the coordinates of key points such as the nose, the mouth, and the eyes are obtained, and line segments are drawn between key points using those coordinates. The angle values between the line segments, such as the mouth-to-nose angle or the mouth-to-left-eye angle, are then calculated from these connecting lines. The display angle of the target object in the image is determined from the angle values between the key-point connecting lines, and the quality of the image is judged from the display angle. For a face, for instance, the display angle distinguishes a side face from a frontal face: a side-face image is of low quality, while a frontal-face image is of high quality.
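A sketch of the angle computation between two key-point connection lines (plain NumPy; the key-point pairs and coordinates chosen are only examples):

```python
# Sketch only: angle between the line p1->p2 and the line q1->q2, in degrees.
import numpy as np

def line_angle(p1, p2, q1, q2):
    u = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    v = np.asarray(q2, dtype=float) - np.asarray(q1, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

mouth, nose, left_eye = (120, 200), (118, 160), (95, 130)   # example coordinates
angle = line_angle(mouth, nose, mouth, left_eye)            # mouth-nose vs. mouth-left-eye
```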
S504, when the image quality parameter of the image to be recognized indicates that the image quality of the image to be recognized is high, searching the retrieval library with the feature vector of the target object to obtain the object in the retrieval library with the highest similarity to the feature vector of the target object.
It should be noted that if the image quality parameter indicates that the quality of the current image to be recognized is high, the extracted feature vector of the target object is used to search the retrieval library; the object corresponding to the library feature vector with the highest similarity to the target object's feature vector is found, taken as the recognition result for the current target object, and output.
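A retrieval sketch (NumPy; the library layout and names are assumptions, not from the patent): with L2-normalized feature vectors, the inner product equals the cosine similarity, and the best match is the library entry maximizing it.

```python
# Sketch only: nearest neighbor in the retrieval library by cosine similarity.
import numpy as np

def search(query_vec, library_vecs, library_ids):
    """query_vec: (512,); library_vecs: (K, 512); library_ids: K object identities."""
    q = query_vec / np.linalg.norm(query_vec)
    lib = library_vecs / np.linalg.norm(library_vecs, axis=1, keepdims=True)
    sims = lib @ q                        # inner product == cosine similarity
    best = int(np.argmax(sims))
    return library_ids[best], float(sims[best])
```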
In the target object identification method provided by the embodiment of the application, an image to be recognized containing the target object is acquired first. The feature vector extraction model trained in this application is called to process the image, obtaining the feature vector of the target object, and the image quality parameter of the image, which represents its quality, is acquired. When the image quality parameter indicates that the image quality is high, the feature vector of the target object is searched in the retrieval library to obtain the object with the highest similarity to it. In this way the feature vector extraction model extracts the target object's feature vector accurately, and the image quality parameter allows high-quality images to be screened out before recognition, avoiding the inaccurate object recognition caused by inaccurate image feature extraction and poor image quality evaluation.
For ease of understanding, another embodiment of the application provides a face recognition method, described below taking face recognition as the example. It should be noted that, referring to fig. 6, the face recognition method is executed by six modules: a face detection module 601, a key point detection module 602, an alignment module 603, a feature extraction module 604, a face clustering and quality filtering module 605, and a face retrieval module 606. The specific execution process of each module is described in the following embodiments of the face recognition method.
As shown in fig. 7, the method for face recognition provided in the embodiment of the present application specifically includes:
S701, detecting the face in the video to obtain a face image to be recognized.
It should be noted that, when performing face recognition on images in a video, the face detection module 601 first performs face detection on each video frame; if a face is detected, it determines the image area where the face is located and crops that area out separately to generate the face image to be recognized.
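As a sketch, the detect-and-crop step could look as follows. The Haar-cascade detector is only a stand-in used for illustration, since the patent does not name a specific detection model.

```python
import cv2

# Stand-in detector: an OpenCV Haar cascade, assumed here purely
# because the patent leaves the detection model unspecified.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def faces_from_frame(frame):
    """Detect faces in one video frame and crop each region separately."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Each crop becomes an independent face image to be recognized.
    return [frame[y:y + h, x:x + w] for (x, y, w, h) in boxes]
```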
S702, detecting target key points of the face to be recognized to obtain position information of the target key points.
It should be noted that the key point detection module 602 receives the face image to be recognized from the face detection module 601 and detects the preset target key points in it, such as the mouth, the nose, and the eyes, to obtain their specific coordinates.
And S703, aligning the face to be recognized by using the position information of the target key point to obtain an aligned face image.
It should be noted that, after receiving the face image and the position information of the target key points from the key point detection module 602, the alignment module 603 obtains the positional relationship of key points such as the mouth, the nose, and the eyes in the face image from the coordinates of the target key points by means of an affine transformation. These positions are then aligned with the positions of the corresponding key points in a standard face to obtain the aligned face image.
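One plausible realization of this alignment is a similarity transform fitted from the detected key points to a standard-face template, as sketched below; the template coordinates and the 112x112 output size are assumptions, not values from the patent.

```python
import cv2
import numpy as np

# Reference positions of left eye, right eye, nose, and the two mouth
# corners in a 112x112 "standard face"; the exact template values are
# an assumption.
STANDARD = np.float32([[38.3, 51.7], [73.5, 51.5], [56.0, 71.7],
                       [41.5, 92.4], [70.7, 92.2]])

def align_face(image, keypoints):
    """Warp the face so its key points match the standard-face template."""
    src = np.float32(keypoints)
    # Estimate a similarity (rotation + scale + translation) transform,
    # one common way to realize the affine alignment described above.
    matrix, _ = cv2.estimateAffinePartial2D(src, STANDARD)
    return cv2.warpAffine(image, matrix, (112, 112))
```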
And S704, extracting the features of the aligned face images to obtain feature vectors of the face images.
It should be noted that, after receiving the aligned face image from the alignment module 603, the feature extraction module 604 inputs it into the feature vector extraction model for feature extraction, obtaining the feature vector corresponding to the face image; the feature vector is 512-dimensional.
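A sketch of such an extractor is given below; the ResNet-18 backbone is an assumption, as the patent fixes only the 512-dimensional output.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Feature extractor with a 512-dimensional embedding head. The backbone
# choice is illustrative; any network ending in a 512-d layer would do.
backbone = models.resnet18(weights=None)
backbone.fc = torch.nn.Linear(backbone.fc.in_features, 512)
backbone.eval()

@torch.no_grad()
def extract_feature(face_tensor):
    """face_tensor: (1, 3, H, W) aligned face; returns a unit-norm 512-d vector."""
    embedding = backbone(face_tensor)
    return F.normalize(embedding, dim=1)
```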
S705, screening out, from the feature vectors of the face images, the feature vectors that reflect high-quality face images.
It should be noted that, after receiving the feature vectors of the face images from the feature extraction module 604, the face clustering and quality filtering module 605 clusters them. The similarity between feature vectors can be calculated by methods such as the inner product, and vectors with high similarity are grouped together, that is, the face feature vectors of the same person are gathered in one group. From the clustered feature vectors, those reflecting high-quality face images are then selected.
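The clustering step could be realized with a simple greedy scheme over inner-product similarities, as in the following sketch; the greedy algorithm and the 0.6 threshold are assumptions, since the patent names only the inner product as one possible similarity measure.

```python
import numpy as np

def greedy_cluster(features, threshold=0.6):
    """Group unit-norm feature vectors whose inner product exceeds a threshold.

    features: (n, d) array of unit-norm vectors. Returns a list of
    clusters, each a list of row indices (one cluster per identity).
    """
    clusters = []
    for i, f in enumerate(features):
        for cluster in clusters:
            centroid = features[cluster].mean(axis=0)
            centroid /= np.linalg.norm(centroid)
            if float(f @ centroid) > threshold:   # likely the same person
                cluster.append(i)
                break
        else:
            clusters.append([i])                  # start a new identity cluster
    return clusters
```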
S706, retrieving the screened feature vectors in the search library to obtain the face with the highest similarity to each screened feature vector.
It should be noted that the face retrieval module 606 retrieves, in the retrieval library, the feature vectors of the face images obtained from the face clustering and quality filtering module 605, finds the face corresponding to the feature vector with the highest similarity to each query vector, and outputs the found face as the recognition result of the face image.
It should be noted that the retrieval library accumulates a rich celebrity database, supporting the identification of more than 50,000 celebrities across 82 sub-fields in 9 fields such as entertainment, politics, sports, economy, science and technology, culture, military, english, and recent history; specific data are shown in fig. 8. The retrieval library adopts structured data storage, so a client can easily query it through an electronic device.
In addition, in practical application, the face recognition of this scheme achieves good accuracy and recall across different fields. Compared with other existing applications, the face recognition service of this scheme also delivers better recognition accuracy and recall in those fields; the specific comparison can be seen in fig. 9.
In another embodiment of the present application, a training apparatus 100 for a neural network model is further disclosed, as shown in fig. 10, including:
and the calling unit 1001 is configured to call the neural network model to process the image sample, so as to obtain a feature vector of the target object in the image sample.
A processing unit 1002, configured to process the feature vector of the target object by using a first loss function to obtain a first loss value, process the feature vector of the target object by using a second loss function to obtain a second loss value, and process the feature vector of the target object by using a third loss function to obtain a third loss value and a fourth loss value; the third loss function is a central loss function of which the class center is the weight of the full connection layer of the neural network model; the third loss value is used for representing the difference between the image sample and the class center corresponding to the class to which the image sample belongs, and the fourth loss value is used for representing the correlation between the class centers corresponding to the classes to which different image samples belong.
And the model training unit 1003 is configured to calculate a first loss value, a second loss value, a third loss value, and a fourth loss value to obtain a total loss value, train the neural network model by using the total loss value as a training parameter, and obtain a feature vector extraction model.
In this embodiment, for the specific execution processes of the calling unit 1001, the processing unit 1002 and the model training unit 1003, reference may be made to the contents of the method embodiment corresponding to fig. 1a, and details are not described here again.
In the training device for the neural network model provided by the embodiment of the application, the neural network model is called to process an image sample to obtain the feature vector of the target object in the image sample. The feature vector is processed by the first loss function to obtain a first loss value, by the second loss function to obtain a second loss value, and by the third loss function to obtain a third loss value and a fourth loss value. The four loss values are combined into a total loss value, and the neural network model is trained with the total loss value as the training parameter to obtain the feature vector extraction model. When the feature vector extraction model is trained with this device, only the weight of the full connection layer serves as the class-center parameter and no additional parameters need to be learned, so parameter redundancy is avoided. In addition, through training with the third loss value and the fourth loss value, the trained model reduces the difference between an image and the class center of the class to which it belongs and reduces the correlation between the class centers of different classes, thereby improving the accuracy with which the feature vector extraction model extracts the feature vector of the target object from an image.
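For concreteness, the sketch below shows one plausible PyTorch rendering of the third and fourth loss values, reusing the full connection layer weight rows as class centers so that no extra center parameters are learned. The exact formulas and the 0.1 weighting factors in the closing comment are assumptions, not quotations from the patent.

```python
import torch
import torch.nn.functional as F

def center_losses(features, labels, fc_weight):
    """Third and fourth loss values with the FC weight rows as class centers.

    features: (B, d) embeddings; labels: (B,) class ids; fc_weight: (C, d),
    the weight of the classification full connection layer.
    """
    centers = F.normalize(fc_weight, dim=1)
    feats = F.normalize(features, dim=1)
    # Third loss: distance between each sample and its own class center.
    intra = (feats - centers[labels]).pow(2).sum(dim=1).mean()
    # Fourth loss: correlation between centers of different classes,
    # i.e. the off-diagonal entries of the center Gram matrix.
    gram = centers @ centers.t()
    off_diag = gram - torch.diag(torch.diag(gram))
    inter = off_diag.pow(2).sum() / (gram.numel() - gram.size(0))
    return intra, inter

# Total loss with preset weighting factors (the 0.1 values are assumed):
# loss = l1 + 0.1 * l2 + 0.1 * l3_intra + 0.1 * l4_inter
```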
Optionally, in another embodiment of the present application, an implementation manner of the processing unit 1002 includes:
the first calculating subunit is used for calculating the feature vector of the target object by using the calculation formula of the minimum in-class distance of the third loss function to obtain the third loss value; and calculating the feature vector of the target object by using the calculation formula of the third loss function for reducing the inter-class correlation by orthogonalization to obtain the fourth loss value.
In this embodiment, for the specific execution process of the first calculating subunit, reference may be made to the contents of the corresponding method embodiments described above, and details are not described here again.
Optionally, in another embodiment of the present application, an implementation manner of the model training unit 1003 includes:
a second calculation subunit configured to calculate the sum of the first loss value, the correction value of the second loss value, the correction value of the third loss value, and the correction value of the fourth loss value as the total loss value; the correction value of the second loss value is the product of the second loss value and a preset value, the correction value of the third loss value is the product of the third loss value and a preset value, and the correction value of the fourth loss value is the product of the fourth loss value and a preset value.
In this embodiment, the specific execution process of the second calculating subunit may refer to the content of the corresponding method embodiment described above, and is not described herein again.
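Written out, the total loss computed by the second calculation subunit takes the following form, where the lambda factors stand for the preset values; whether they are equal or distinct constants is not fixed by the text, so the subscripted form below is one plausible reading.

```latex
L_{\mathrm{total}} = L_1 + \lambda_2 L_2 + \lambda_3 L_3 + \lambda_4 L_4
```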
Optionally, the processing unit 1002 may be further configured to process the feature vector of the target object in the image sample by using a central loss function to obtain two loss values, calculate the two loss values to obtain a total loss value, and train the neural network model by taking the total loss value as a training parameter. As before, the class center of the central loss function is the weight of the full connection layer of the neural network model; one loss value is used for representing the difference between the image sample and the class center corresponding to the class to which the image sample belongs, and the other loss value is used for representing the correlation between the class centers corresponding to the classes to which different image samples belong.
In addition, the model training unit 1003 may also be configured to process the feature vector of the target object in the image sample by using the central loss function to obtain the two loss values and calculate the total loss value. Specifically, the total loss value is the sum of the loss value characterizing the gap between the image sample and the class center of its class and the correction value of the loss value characterizing the correlation between the class centers of different classes; that correction value is the product of the correlation loss value and a preset value.
In another embodiment of the present application, there is also disclosed a feature learning apparatus 110, as shown in fig. 11, including:
an acquisition unit 1101 for acquiring an image to be extracted; the image to be extracted comprises a target object, and the target object is an object whose feature vector is to be extracted.
The calling unit 1102 is configured to call a feature vector extraction model to process the image to be extracted, so as to obtain a feature vector of a target object in the image to be extracted; the feature vector extraction model is obtained by training a neural network model based on a first loss value and a second loss value corresponding to the feature vector of a target object contained in an image sample; the first loss value and the second loss value are obtained by processing the feature vector of the target object contained in the image sample based on a preset central loss function, and the class center of the preset central loss function is the weight of a full connection layer of the neural network model; the first loss value is used for representing the difference between the image sample and the class center corresponding to the class to which the image sample belongs, and the second loss value is used for representing the correlation between the class centers corresponding to the classes to which different image samples belong.
Optionally, the apparatus further comprises a training unit, configured to:
calling a neural network model to process the image sample to obtain a feature vector of a target object in the image sample; processing the feature vector of the target object in the image sample by using a preset central loss function to obtain a first loss value and a second loss value; and calculating the first loss value and the second loss value to obtain a total loss value, training the neural network model by taking the total loss value as a training parameter, and taking the trained neural network model as the feature vector extraction model.
Optionally, in the above apparatus, the training unit processes the feature vector of the target object in the image sample by using the central loss function, and when obtaining the first loss value and the second loss value, is configured to:
calculating the feature vector of the target object by using the calculation formula of the minimum in-class distance of the central loss function to obtain the first loss value; and calculating the feature vector of the target object by using the calculation formula of the central loss function for reducing the inter-class correlation by orthogonalization to obtain the second loss value.
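One plausible concrete form of these two calculation formulas (yielding the first and second loss values of this unit) is given below, where $x_i$ is the feature vector of sample $i$, $w_{y_i}$ is the full-connection-layer weight row serving as the center of its class, $\hat{w}_j$ denotes an $\ell_2$-normalized center, $N$ is the batch size, and $C$ is the number of classes. These expressions are an assumption consistent with the text, not quoted from it.

```latex
L_{\mathrm{intra}} = \frac{1}{N} \sum_{i=1}^{N} \lVert x_i - w_{y_i} \rVert_2^2,
\qquad
L_{\mathrm{orth}} = \frac{2}{C(C-1)} \sum_{1 \le j < k \le C} \left( \hat{w}_j^{\top} \hat{w}_k \right)^2
```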
Optionally, in the above apparatus, after the training unit invokes the neural network model to process the image sample, and obtains the feature vector of the target object in the image sample, the training unit is further configured to:
processing the feature vector of the target object in the image sample by using the first loss function to obtain a third loss value, and processing the feature vector of the target object in the image sample by using the second loss function to obtain a fourth loss value;
wherein, the training unit calculates the first loss value and the second loss value, and when obtaining the total loss value, is configured to: take the sum of the third loss value, the correction value of the fourth loss value, the correction value of the first loss value, and the correction value of the second loss value as the total loss value; the correction value of the first loss value is the product of the first loss value and a preset value, the correction value of the second loss value is the product of the second loss value and a preset value, and the correction value of the fourth loss value is the product of the fourth loss value and a preset value.
In this embodiment, for specific execution processes of the obtaining unit 1101 and the invoking unit 1102, reference may be made to the contents of the method embodiment corresponding to fig. 4, which is not described herein again. Moreover, for a specific execution process of the training unit to train the neural network model, reference may be made to the content of the method embodiment in fig. 1a, which is not described herein again.
In another embodiment of the present application, there is also disclosed a target object recognition apparatus 120, as shown in fig. 12, including:
a calling unit 1201, configured to process the image to be recognized by using the feature learning method disclosed in the embodiment corresponding to fig. 4, to obtain a feature vector of a target object in the image to be recognized; the image to be recognized comprises a target object;
an obtaining unit 1202, configured to obtain an image quality parameter of an image to be identified; the image quality parameters are used for representing the quality of the image to be identified;
the retrieving unit 1203 is configured to, when the image quality parameter of the image to be recognized indicates that the image to be recognized is of high quality, retrieve the feature vector of the target object in the search library and obtain the object with the highest similarity to the feature vector of the target object in the search library.
In this embodiment, the specific execution processes of the calling unit 1201, the obtaining unit 1202, and the retrieving unit 1203 may refer to the contents of the method embodiment corresponding to fig. 5, which are not described herein again.
Optionally, in another embodiment of the present application, an implementation manner of the obtaining unit 1202 includes:
the calling subunit is used for calling the image quality evaluation model to process the feature vector of the target object to obtain the image quality parameter of the image to be identified; the image quality parameter is used for representing the quality of the image to be identified. The image quality evaluation model is obtained by training a neural network model with image samples, where the neural network model comprises a full connection layer and a quality weight layer running in parallel with the full connection layer; the full connection layer obtains the output value of the image sample, and the quality weight layer obtains the quality evaluation parameter. The loss value used in training the neural network model to obtain the image quality evaluation model is a total loss value calculated according to the output value of the image sample and the quality evaluation parameter.
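The described two-branch head might be sketched as follows in PyTorch. The layer sizes, the sigmoid activation, and the quality-weighted loss in the closing comment are assumptions; the patent states only that the two layers run in parallel over the same features.

```python
import torch
import torch.nn as nn

class QualityAwareHead(nn.Module):
    """Full connection layer plus a parallel quality weight layer over
    one shared feature vector, per the architecture described above."""

    def __init__(self, dim=512, num_classes=1000):
        super().__init__()
        self.fc = nn.Linear(dim, num_classes)   # output value of the sample
        self.quality = nn.Sequential(           # quality evaluation parameter
            nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, feature):
        return self.fc(feature), self.quality(feature)

# During training, the total loss can weight each sample's classification
# loss by its predicted quality, so low-quality samples contribute less:
# loss = (quality.squeeze(1) * per_sample_ce).mean()   (one plausible scheme)
```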
In this embodiment, the specific execution process of the calling subunit may refer to the content of the above corresponding method embodiment, which is not described herein again.
Optionally, in another embodiment of the present application, an implementation manner of the obtaining unit 1202 includes:
the first obtaining subunit is configured to obtain a confidence level of the image to be recognized, where the confidence level of the image to be recognized is used to represent a probability that the image to be recognized contains the target object.
In this embodiment, for the specific execution process of the first obtaining subunit, reference may be made to the contents of the corresponding method embodiments described above, and details are not described here again.
Optionally, in another embodiment of the present application, an implementation manner of the obtaining unit 1202 includes:
and the second acquisition subunit is used for acquiring the coordinates of a plurality of preset key points of the target object in the image to be identified.
The calculation subunit is used for calculating an angle value between a first connecting line of the preset key points and a second connecting line of the preset key points by using the coordinates of the preset key points; the first connecting line of the preset key points and the second connecting line of the preset key points respectively refer to connecting lines among specific key points in the plurality of preset key points.
And the determining subunit is used for determining the display angle of the target object in the image to be identified by using the angle value between the first connecting line of the preset key point and the second connecting line of the preset key point.
In this embodiment, the specific execution processes of the second obtaining subunit, the calculating subunit and the determining subunit may refer to the contents of the above corresponding method embodiments, and are not described herein again.
Another embodiment of the present application further provides an electronic device 130, as shown in fig. 13, specifically including:
one or more processors 1301.
A storage 1302 having one or more programs stored thereon.
The one or more programs, when executed by the one or more processors 1301, cause the one or more processors 1301 to implement the methods as in any one of the embodiments described above.
Another embodiment of the present application further provides a computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of the above embodiments.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, the system or system embodiments are substantially similar to the method embodiments and therefore are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described system and system embodiments are only illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of feature learning, comprising:
acquiring an image to be extracted; the image to be extracted comprises a target object, wherein the target object is an object whose feature vector is to be extracted;
calling a trained feature vector extraction model to process the image to be extracted to obtain a feature vector of a target object in the image to be extracted; the feature vector extraction model is obtained by training a neural network model based on a first loss value and a second loss value corresponding to a feature vector of a target object contained in an image sample; the first loss value and the second loss value are obtained by processing a feature vector of a target object contained in an image sample based on a preset central loss function, and the class center of the preset central loss function is the weight of a full connection layer of the neural network model; the first loss value is used for representing the gap between the image sample and the class center corresponding to the class to which the image sample belongs, and the second loss value is used for representing the correlation between the class centers corresponding to the classes to which different image samples belong.
2. The method of claim 1, further comprising:
calling the neural network model to process an image sample to obtain a feature vector of a target object in the image sample;
processing the feature vector of the target object in the image sample by using the preset central loss function to obtain the first loss value and the second loss value;
and calculating the first loss value and the second loss value to obtain a total loss value, training the neural network model by taking the total loss value as a training parameter, and taking the trained neural network model as the feature vector extraction model.
3. The method of claim 2, wherein the processing the feature vector of the target object in the image sample using the preset central loss function to obtain the first loss value and the second loss value comprises:
calculating a feature vector of a target object contained in the image sample by using a calculation formula of the minimum in-class distance of the preset central loss function to obtain the first loss value; and calculating a feature vector of a target object contained in the image sample by using a calculation formula of the preset central loss function for reducing the inter-class correlation by orthogonalization to obtain the second loss value.
4. The method of claim 2, wherein after the invoking the neural network model to process the image sample to obtain the feature vector of the target object in the image sample, the method further comprises:
processing the feature vector of the target object in the image sample by using a first loss function to obtain a third loss value, and processing the feature vector of the target object in the image sample by using a second loss function to obtain a fourth loss value;
wherein: the calculating the first loss value and the second loss value to obtain a total loss value includes:
taking the sum of the third loss value, the correction value of the fourth loss value, the correction value of the first loss value and the correction value of the second loss value as the total loss value; the correction value of the first loss value is the product of the first loss value and a preset value, the correction value of the second loss value is the product of the second loss value and a preset value, and the correction value of the fourth loss value is the product of the fourth loss value and a preset value.
5. A method for identifying a target object, comprising:
processing an image to be recognized by using the feature learning method of any one of claims 1 to 4 to obtain a feature vector of a target object in the image to be recognized; wherein the image to be recognized comprises a target object;
acquiring image quality parameters of the image to be identified; the image quality parameter is used for representing the quality of the image to be identified;
and when the image quality parameter of the image to be recognized indicates that the image quality of the image to be recognized is high, searching the feature vector of the target object in a search library to obtain an object with the highest similarity to the feature vector of the target object in the search library.
6. The method according to claim 5, wherein the obtaining of the image quality parameter of the image to be recognized comprises:
calling an image quality evaluation model to process the feature vector of the target object to obtain the image quality parameter of the image to be identified; the image quality parameter is used for representing the quality of the image to be identified, the image quality evaluation model is obtained by training a neural network model by using image samples, the neural network model comprises a full connection layer and a quality weight layer running in parallel with the full connection layer, the full connection layer obtains the output value of the image sample, and the quality weight layer obtains the quality evaluation parameter; and the loss value in the process of training the neural network model to obtain the image quality evaluation model is a total loss value calculated according to the output value of the image sample and the quality evaluation parameter.
7. The method according to claim 5, wherein the obtaining of the image quality parameter of the image to be recognized comprises:
and acquiring the confidence coefficient of the image to be recognized, wherein the confidence coefficient of the image to be recognized is used for representing the probability that the image to be recognized contains the target object.
8. The method according to claim 5, wherein the obtaining of the image quality parameter of the image to be recognized comprises:
acquiring coordinates of a plurality of preset key points of a target object in the image to be recognized;
calculating to obtain an angle value between a first connecting line of the preset key points and a second connecting line of the preset key points by using the coordinates of the preset key points; wherein the first connection line of the preset key points and the second connection line of the preset key points respectively refer to connection lines between specific key points in the plurality of preset key points;
and determining the display angle of the target object in the image to be recognized by utilizing the angle value between the first connecting line of the preset key point and the second connecting line of the preset key point.
9. An apparatus for feature learning, comprising:
an acquisition unit configured to acquire an image to be extracted; the image to be extracted comprises a target object, wherein the target object is an object whose feature vector is to be extracted;
the calling unit is used for calling the trained feature vector extraction model to process the image to be extracted to obtain the feature vector of the target object in the image to be extracted; the feature vector extraction model is obtained by training a neural network model based on a first loss value and a second loss value corresponding to a feature vector of a target object contained in an image sample; the first loss value and the second loss value are obtained by processing a feature vector of a target object contained in an image sample based on a preset central loss function, and the class center of the preset central loss function is the weight of a full connection layer of the neural network model; the first loss value is used for representing the gap between the image sample and the class center corresponding to the class to which the image sample belongs, and the second loss value is used for representing the correlation between the class centers corresponding to the classes to which different image samples belong.
10. An apparatus for identifying a target object, comprising:
the calling unit is used for processing the image to be recognized by using the feature learning method of any one of claims 1 to 4 to obtain a feature vector of a target object in the image to be recognized; wherein the image to be recognized comprises a target object;
the acquisition unit is used for acquiring the image quality parameters of the image to be identified; the image quality parameter is used for representing the quality of the image to be identified;
and the retrieval unit is used for retrieving the feature vector of the target object in a retrieval library when the image quality parameter of the image to be identified indicates that the image quality of the image to be identified is high, so as to obtain the object with the highest similarity to the feature vector of the target object in the retrieval library.
CN202110360265.2A 2021-04-02 2021-04-02 Feature learning method, target object identification method and corresponding device Pending CN113705310A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110360265.2A CN113705310A (en) 2021-04-02 2021-04-02 Feature learning method, target object identification method and corresponding device

Publications (1)

Publication Number Publication Date
CN113705310A true CN113705310A (en) 2021-11-26

Family ID=78647940

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110360265.2A Pending CN113705310A (en) 2021-04-02 2021-04-02 Feature learning method, target object identification method and corresponding device

Country Status (1)

Country Link
CN (1) CN113705310A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114549948A (en) * 2022-02-16 2022-05-27 北京百度网讯科技有限公司 Deep learning model training method, image recognition method, device and equipment
CN114549948B (en) * 2022-02-16 2023-06-30 北京百度网讯科技有限公司 Training method, image recognition method, device and equipment for deep learning model
CN115082740A (en) * 2022-07-18 2022-09-20 北京百度网讯科技有限公司 Target detection model training method, target detection method, device and electronic equipment
CN115082740B (en) * 2022-07-18 2023-09-01 北京百度网讯科技有限公司 Target detection model training method, target detection device and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination