CN117671760A - Training method and device of picture recognition model, storage medium and electronic equipment - Google Patents


Info

Publication number
CN117671760A
CN117671760A (application CN202311635861.2A)
Authority
CN
China
Prior art keywords
picture
vector
classification
classification vector
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311635861.2A
Other languages
Chinese (zh)
Inventor
李晓宇
李永翔
朱庆军
路红柱
刘阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN202311635861.2A priority Critical patent/CN117671760A/en
Publication of CN117671760A publication Critical patent/CN117671760A/en
Pending legal-status Critical Current

Abstract

The invention discloses a training method and device for a picture recognition model, a storage medium, and electronic equipment. The method comprises: obtaining a picture training set comprising a plurality of sample pictures; performing feature extraction on the picture training set to obtain a sample classification vector for each sample picture, the sample classification vectors comprising a first classification vector belonging to a first picture category and a second classification vector belonging to a second picture category; determining a third classification vector according to the vector included angle corresponding to the first classification vector, wherein the degree of distinction between the third classification vector and the second classification vector is greater than that between the first classification vector and the second classification vector; and determining, based on the second classification vector and the third classification vector, a model loss function of the picture recognition model, the model loss function being used for adjusting the parameter vector of the picture recognition model. The invention solves the technical problem of low training efficiency of picture recognition models.

Description

Training method and device of picture recognition model, storage medium and electronic equipment
Technical Field
The invention relates to the field of picture recognition, in particular to a training method and device of a picture recognition model, a storage medium and electronic equipment.
Background
In the field of deep learning, many face recognition training algorithms have been proposed and achieve good recognition performance. Current face recognition algorithms extract picture features through operators such as convolution, downsampling, and linear mapping, and then move the computation between the feature vector and the classification parameter vector into an angle space for classification training; that is, the dot product of the two vectors is evaluated geometrically, through the angle between them, rather than algebraically.
Examples include SphereFace, CosineFace, and ArcFace, with ArcFace being the most advanced. During training, ArcFace extracts picture features through a convolutional neural network and then moves the computation between the extracted face feature vector and the classification parameter vector into an angle space for classification; that is, the dot product of the two vectors is evaluated geometrically rather than algebraically, and an angular distance constraint is added in the angle space. ArcFace defines a hyper-parameter m (a number greater than 0) for this constraint: m is added to the angle obtained between the two vectors, and the cosine of the sum is then computed. However, the added angular distance constraint lets the angle exceed 180°, a range over which the cosine function is no longer monotonic on its domain. The resulting face features therefore lack a sufficient degree of distinction, the trained model does not converge easily, and model training efficiency is low.
Aiming at the problem of low training efficiency of picture recognition models, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the invention provide a training method and device for a picture recognition model, a storage medium, and electronic equipment, so as to at least solve the technical problem of low training efficiency of picture recognition models.
According to one aspect of the embodiments of the invention, a training method for a picture recognition model is provided, comprising: obtaining a picture training set, wherein the picture training set comprises a plurality of sample pictures and a picture label for each sample picture, the picture label represents the picture category of the sample picture, and the picture categories comprise at least a first picture category and a second picture category; performing feature extraction on the picture training set to obtain a sample classification vector for each sample picture, wherein the sample classification vector is determined from a feature vector and a parameter vector, the feature vector is extracted from the sample picture, the parameter vector is the result of training the picture recognition model on the sample classification vectors and picture categories of the sample pictures, and the sample classification vectors comprise a first classification vector belonging to the first picture category and a second classification vector belonging to the second picture category; determining a third classification vector according to the vector included angle corresponding to the first classification vector, wherein the vector included angle is the angle between the feature vector and the parameter vector of the first classification vector, and the degree of distinction between the third classification vector and the second classification vector is greater than that between the first classification vector and the second classification vector; and determining, based on the second classification vector and the third classification vector, a model loss function of the picture recognition model, wherein the model loss function is used for adjusting the parameter vector of the picture recognition model.
Optionally, extracting features from the training set of pictures to obtain a sample classification vector of each sample picture includes: randomly extracting a sample picture from the picture training set to serve as a target picture; extracting features of the target picture to obtain a target classification vector; and adding a category label to the target classification vector to obtain the sample classification vector, wherein the category label is determined according to the picture category.
Optionally, performing feature extraction on the target picture to obtain the target classification vector comprises: extracting features of the target picture to obtain the picture features of the target picture; performing linear mapping on the picture features to obtain the feature vector of the target picture; and performing dot-product processing on the feature vector and the parameter vector to obtain the target classification vector.
Optionally, performing the dot-product processing on the feature vector and the parameter vector to obtain the target classification vector comprises: performing a dot-product operation on the feature vector and the parameter vector to obtain an operation model of the target classification vector, wherein the operation model represents the product of the norm of the feature vector, the norm of the parameter vector, and the cosine of the angle between the two vectors; normalizing the feature vector and the parameter vector so that both norms equal 1; and, with both norms equal to 1, determining the target classification vector to be the cosine of the angle between the feature vector and the parameter vector.
Optionally, adding a category label to the target classification vector to obtain the sample classification vector comprises: acquiring the picture label of the target picture; detecting whether the picture label indicates that the target picture is of the first picture category; adding a first category label to the target classification vector when the target picture belongs to the first picture category, the first category label indicating that the target classification vector is the first classification vector; and, when the target picture does not belong to the first picture category, determining that it belongs to the second picture category and adding a second category label to the target classification vector, the second category label indicating that the target classification vector is the second classification vector, wherein the second picture category comprises a plurality of picture subcategories.
Optionally, determining the third classification vector according to the vector included angle corresponding to the first classification vector comprises: processing the first classification vector with an inverse trigonometric function to obtain the vector included angle, whose value range is [0°, 180°]; determining the sum of the vector included angle and a preset included angle as a target included angle, the preset included angle lying in [0°, 180°]; detecting which angle interval the target included angle falls in, the angle intervals comprising a first angle interval of (0°, 180°] and a second angle interval of (180°, 360°]; and determining the preset classification vector corresponding to that angle interval as the third classification vector, wherein a first preset vector corresponding to the first angle interval is determined from the cosine of the target included angle, and a second preset vector corresponding to the second angle interval is determined from the difference between the opposite of the cosine of the target included angle and a preset constant.
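As an illustration only, the piecewise rule above can be sketched as follows. The function name, the margin angle m_deg, and the constant c are our own choices, not values fixed by the application; with c = 2 the two branches join continuously at 180°:

```python
import math

def margin_logit(cos_theta, m_deg=30.0, c=2.0):
    """Piecewise-monotonic margin applied to the first classification vector.

    cos_theta: cosine of the angle between feature and parameter vector
    m_deg:     preset included angle (margin) in degrees - illustrative value
    c:         preset constant subtracted in the second interval - illustrative
    """
    theta = math.degrees(math.acos(cos_theta))  # vector included angle, [0, 180]
    target = theta + m_deg                      # target included angle, (0, 360)
    if target <= 180.0:
        # First angle interval: cosine is still monotonically decreasing.
        return math.cos(math.radians(target))
    # Second angle interval: negate the cosine and subtract c, so the
    # value keeps decreasing as the angle grows past 180 degrees.
    return -math.cos(math.radians(target)) - c
```

For any starting angle in [0°, 180°] the returned value decreases monotonically as the angle grows, which is exactly the monotonicity that plain cos(θ + m) loses once θ + m exceeds 180°.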
Optionally, determining the model loss function of the picture recognition model based on the second classification vector and the third classification vector comprises: determining the model loss function by analyzing the difference between the second classification vector and the third classification vector with a cross-entropy loss function.
According to another aspect of the embodiments of the invention, a training device for a picture recognition model is also provided, comprising: a picture acquisition module, configured to obtain a picture training set, wherein the picture training set comprises a plurality of sample pictures and a picture label for each sample picture, the picture label represents the picture category of the sample picture, and the picture categories comprise at least a first picture category and a second picture category; a feature extraction module, configured to perform feature extraction on the picture training set to obtain a sample classification vector for each sample picture, wherein the sample classification vector is determined from a feature vector and a parameter vector, the feature vector is extracted from the sample picture, the parameter vector is the result of training the picture recognition model on the sample classification vectors and picture categories of the sample pictures, and the sample classification vectors comprise a first classification vector belonging to the first picture category and a second classification vector belonging to the second picture category; a first determining module, configured to determine a third classification vector according to the vector included angle corresponding to the first classification vector, wherein the vector included angle is the angle between the feature vector and the parameter vector of the first classification vector, and the degree of distinction between the third classification vector and the second classification vector is greater than that between the first classification vector and the second classification vector; and a second determining module, configured to determine, based on the second classification vector and the third classification vector, a model loss function used for adjusting the parameter vector of the picture recognition model.
According to another aspect of the embodiments of the invention, a nonvolatile storage medium is also provided, the nonvolatile storage medium storing a program, wherein when the program runs, the device in which the nonvolatile storage medium is located is controlled to execute the above training method for a picture recognition model.
According to another aspect of the embodiments of the invention, electronic equipment is also provided, comprising a memory and a processor, wherein the processor is configured to run a program stored in the memory, and the above training method for a picture recognition model is executed when the program runs.
In the embodiments of the invention, feature extraction is performed on the sample pictures of the first picture category and the second picture category in the picture training set to obtain a first classification vector belonging to the first picture category and a second classification vector belonging to the second picture category. The first classification vector is then processed based on the angle between its feature vector and parameter vector to obtain a third classification vector, also belonging to the first picture category, whose degree of distinction from the second classification vector is greater than that of the first classification vector. During training, the model loss function can therefore adjust the parameter vector of the picture recognition model according to the second and third classification vectors, optimizing the model's ability to distinguish the first picture category from the second, making the training process stable and fast to converge, and improving training efficiency, thereby solving the technical problem of low training efficiency of picture recognition models.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a flow chart of a training method of a picture recognition model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a face recognition model training system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a training device for a picture recognition model according to an embodiment of the present invention;
fig. 4 is a block diagram of a computer terminal according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms or terminology appearing in the description of the embodiments of the present application are explained as follows:
softmax: a function widely used in machine learning, especially deep learning. Its basic form normalizes a K-dimensional real vector into another K-dimensional real vector in which every element lies in (0, 1) and all elements sum to 1. This allows the output of the softmax function to be interpreted as a probability distribution.
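A minimal NumPy sketch of this definition (the function and variable names are ours, not from the application):

```python
import numpy as np

def softmax(z):
    # Shift by the maximum for numerical stability; the result is
    # unchanged because softmax is invariant to adding a constant.
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
# Each element of p lies in (0, 1) and the elements sum to 1,
# so p can be read as a probability distribution over 3 classes.
```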
According to an embodiment of the present invention, an embodiment of a training method for a picture recognition model is provided. It should be noted that the steps shown in the flowcharts of the drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
Fig. 1 is a flowchart of a training method of a picture recognition model according to an embodiment of the present invention, as shown in fig. 1, the method includes the steps of:
step S102, obtaining a picture training set, wherein the picture training set comprises a plurality of sample pictures and a picture label for each sample picture, the picture label represents the picture category of the sample picture, and the picture categories comprise at least a first picture category and a second picture category;
step S104, extracting features from the picture training set to obtain a sample classification vector of each sample picture, wherein the sample classification vector is determined based on the feature vector and a parameter vector, the feature vector is extracted from the sample picture, the parameter vector is a training result of the picture recognition model according to the sample classification vector and the picture category of the sample picture, and the sample classification vector comprises: a first classification vector belonging to a first picture category and a second classification vector belonging to a second picture category;
Step S106, determining a third classification vector according to the vector included angle corresponding to the first classification vector, wherein the vector included angle is an included angle between the feature vector and the parameter vector of the first classification vector, and the distinguishing degree of the third classification vector and the second classification vector is larger than that of the first classification vector and the second classification vector;
step S108, determining, based on the second classification vector and the third classification vector, a model loss function of the picture recognition model, wherein the model loss function is used for adjusting the parameter vector of the picture recognition model.
In the embodiments of the invention, feature extraction is performed on the sample pictures of the first picture category and the second picture category in the picture training set to obtain a first classification vector belonging to the first picture category and a second classification vector belonging to the second picture category. The first classification vector is then processed based on the angle between its feature vector and parameter vector to obtain a third classification vector, also belonging to the first picture category, whose degree of distinction from the second classification vector is greater than that of the first classification vector. During training, the model loss function can therefore adjust the parameter vector of the picture recognition model according to the second and third classification vectors, optimizing the model's ability to distinguish the first picture category from the second, making the training process stable and fast to converge, and improving training efficiency, thereby solving the technical problem of low training efficiency of picture recognition models.
In the above step S102, the picture training set may include a plurality of sample pictures collected in advance, each of which has been labeled in advance with a picture label representing its picture category.
Optionally, the picture categories of the sample picture include a first picture category and a second picture category, wherein the sample picture of the first picture category is used as a positive sample of the training picture identification model, and the sample picture of the second picture category is used as a negative sample of the training picture identification model.
Optionally, the picture category of the sample picture may further include: and a third picture category, wherein a sample picture of the third picture category also serves as a negative sample of the training picture recognition model.
Optionally, the picture label marks the picture category of the sample picture according to the object in the sample picture. When a sample picture contains a plurality of objects, a picture label can be added for each object, so one sample picture may carry a plurality of picture labels. The sample picture can then be segmented according to its picture labels so that each segmented sample sub-picture contains one object; each sample sub-picture is assigned one picture label and can be placed back into the picture training set as a new sample picture.
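A minimal sketch of this segmentation step, assuming for illustration that each object's picture label comes with a bounding box; the box representation and all names below are our own assumptions, not specified by the application:

```python
import numpy as np

def split_into_sub_pictures(picture, labeled_boxes):
    """Cut a multi-object sample picture into one sub-picture per object.

    picture:       an H x W (or H x W x C) pixel array
    labeled_boxes: (picture_label, (top, bottom, left, right)) per object
    Returns (sub_picture, picture_label) pairs that can be placed back
    into the picture training set as new sample pictures.
    """
    return [(picture[t:b, l:r], label) for label, (t, b, l, r) in labeled_boxes]

picture = np.zeros((100, 100, 3))  # stand-in for a decoded sample picture
samples = split_into_sub_pictures(
    picture, [("face_A", (0, 50, 0, 50)), ("face_B", (50, 100, 50, 100))])
```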
Optionally, the object in the sample picture for determining the picture category may be a face picture.
Optionally, in the case that the sample picture is a face picture, the picture recognition model is a face recognition model, and the features extracted from the sample picture are face features.
Optionally, the image recognition model is used for extracting features of the sample image, performing model training according to the extracted features of the sample image and the image type of the sample image, determining model parameters in the image recognition model, and completing training of the image recognition model.
In the above step S104, feature extraction is performed on the sample pictures belonging to the first picture category in the picture training set to obtain the first classification vector, and on the sample pictures belonging to the second picture category to obtain the second classification vector.
Optionally, the picture recognition model comprises at least a feature model and a classification model, wherein the feature model is used for extracting the classification vector of a picture and the classification model is used for determining the picture category according to the classification vector extracted by the feature model.
Optionally, a feature vector can be extracted from a sample picture by the feature model. So that the extracted feature vectors can distinguish picture categories, each feature vector can be processed with the parameter vector to obtain a sample classification vector that separates the picture categories more strongly. The degree of distinction of the sample classification vectors can thus be increased by optimizing the parameter vector: model training is exactly this optimization process, and training is completed by determining the optimal parameter vector.
In the above step S106, in order to give the sample classification vectors of different picture categories a greater degree of distinction, a specific picture category can be selected and its sample classification vector further processed into a third classification vector, so that the distinction between the third classification vector and the second classification vector is greater than that between the first classification vector and the second classification vector, making the first picture category easier to distinguish from the second.
In the above step S108, the model loss function is used to evaluate the performance of the picture recognition model: the faster the model loss function converges, the better the model performs and the more stable the training process is, and the trained picture recognition model can recognize picture categories more accurately.
As an alternative embodiment, determining the model loss function of the picture recognition model based on the second classification vector and the third classification vector comprises: the model loss function is determined by analyzing the difference between the second classification vector and the third classification vector by the cross entropy loss function.
According to the embodiment of the invention, the cross-entropy loss function of the picture recognition model is constructed from the second classification vector and the third classification vector, and the model's ability to distinguish the two can be evaluated through this cross-entropy loss function.
Alternatively, the model loss function may be:

$$L = -\frac{1}{M}\sum_{i=0}^{M-1}\log\frac{e^{s\tilde{f}_{p_i}}}{e^{s\tilde{f}_{p_i}}+\sum_{j=0,\,j\neq p_i}^{C-1}e^{s f_j}}$$

where M is the number of sample pictures in the picture training set; i indexes the i-th sample picture of the picture training set, 0 <= i <= M-1; C is the number of picture categories in the training set; p_i is the first picture category (the ground-truth category of the i-th sample picture); j indexes the second picture categories, of which there may be several, i.e. 0 <= j <= C-1 and j != p_i; s is a hyper-parameter; f_j is the second classification vector; and $\tilde{f}_{p_i}$ is the third classification vector.
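A per-sample sketch of this cross-entropy loss in Python; the function name is ours, and s = 64 is only an illustrative scale, not a value stated in the application:

```python
import math

def model_loss(f_tilde, f_others, s=64.0):
    """Cross-entropy loss contribution of one sample picture.

    f_tilde:  third classification vector entry for the true category p_i
    f_others: second classification vector entries f_j for all j != p_i
    s:        the hyper-parameter s (illustrative default)
    """
    num = math.exp(s * f_tilde)
    denom = num + sum(math.exp(s * f) for f in f_others)
    return -math.log(num / denom)
```

The model loss over the training set is then the mean of this quantity over the M sample pictures. The larger the third classification vector entry is relative to the second classification vector entries, the closer the loss is to zero.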
As an optional embodiment, performing feature extraction on the picture training set, and obtaining a sample classification vector of each sample picture includes: randomly extracting a sample picture from the picture training set to serve as a target picture; extracting features of the target picture to obtain a target classification vector; and adding a category label to the target classification vector to obtain a sample classification vector, wherein the category label is determined according to the picture category.
According to the embodiment of the invention, in the process of training the picture recognition model on the picture training set, a sample picture can be randomly extracted from the picture training set as the target picture, and feature extraction is then performed on it to obtain a target classification vector representing its features. The picture label of the target picture is matched with its target classification vector to obtain the sample classification vector of the target picture: when the picture label indicates that the target picture belongs to the first picture category, the sample classification vector is the first classification vector, and when it indicates the second picture category, the sample classification vector is the second classification vector. Feature extraction is performed on all sample pictures in the picture training set in turn, completing the feature extraction of the picture training set and producing the sample classification vectors required for training the picture recognition model.
As an alternative embodiment, performing feature extraction on the target picture to obtain the target classification vector comprises: extracting features of the target picture to obtain the picture features of the target picture; performing linear mapping on the picture features to obtain the feature vector of the target picture; and performing dot-product processing on the feature vector and the parameter vector to obtain the target classification vector.
According to the embodiment of the invention, in the process of extracting the features of the target picture, the picture features of the target picture are first transformed into a feature vector that can be used for training the picture recognition model, and dot-product processing is then performed on the feature vector together with the parameter vector. This further increases the feature difference between sample pictures of different picture categories, yields a target classification vector with a greater degree of distinction, and thus determines the target classification vector.
As an alternative embodiment, performing dot-product processing on the feature vector and the parameter vector to obtain the target classification vector comprises: performing a dot-product operation on the feature vector and the parameter vector to obtain an operation model of the target classification vector, wherein the operation model represents the product of the norm of the feature vector, the norm of the parameter vector, and the cosine of the angle between them; normalizing the feature vector and the parameter vector so that both norms equal 1; and, with both norms equal to 1, determining the target classification vector to be the cosine of the angle between the feature vector and the parameter vector.
According to the embodiment of the invention, by performing dot multiplication on the feature vector and the parameter vector, a feature originally represented by a vector can instead be represented by the cosine of an included angle. For sample pictures of different picture categories, the extracted feature vectors differ, so the included-angle cosine values obtained after dot-multiplying different feature vectors with the same parameter vector also differ, which further increases the difference. Based on a target classification vector represented by an included-angle cosine, the complexity of classification during model training can be reduced and model efficiency improved.
Optionally, the target classification vector f_j obtained by performing a dot product of the feature vector and the parameter vector can be expressed as:

f_j = W_j · X + B_j

wherein X is the feature vector, W_j is the parameter vector of the j-th class, and B_j denotes the bias of the linear mapping layer in training, which is set to 0.

Alternatively, from a geometric point of view, the dot product of the feature vector and the parameter vector is expressed as:

f_j = ||W_j|| ||X|| cosθ_j

A normalization operation is performed on the feature vector X and the parameter vector W, so that the modular lengths of the feature vector and the parameter vector are both 1, i.e., ||W_j|| = 1 and ||X|| = 1; thus f_j can be expressed as f_j = cosθ_j.
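The normalization-then-dot-product step above can be sketched in a few lines of plain Python (a minimal illustration using only the stdlib `math` module; the function name `cosine_score` is ours, not the patent's):

```python
import math

def cosine_score(x, w):
    """L2-normalize the feature vector x and parameter vector w, then take
    their dot product; with both modular lengths forced to 1 the result is
    exactly cos(theta_j), the included angle between the two vectors."""
    nx = math.sqrt(sum(v * v for v in x))
    nw = math.sqrt(sum(v * v for v in w))
    return sum((a / nx) * (b / nw) for a, b in zip(x, w))

print(cosine_score([3.0, 0.0], [5.0, 0.0]))  # parallel vectors -> 1.0
print(cosine_score([3.0, 0.0], [0.0, 2.0]))  # orthogonal vectors -> 0.0
```

Because the bias B_j is set to 0 and both modular lengths are 1, the class score depends only on the directions of the two vectors, which is what enables the later angle-space manipulation.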
As an alternative embodiment, adding a category label to the target classification vector to obtain the sample classification vector includes: acquiring the picture label of the target picture; detecting whether the picture label indicates that the target picture is of the first picture category; adding a first category label to the target classification vector when the target picture belongs to the first picture category, wherein the first category label is used to indicate that the target classification vector is the first classification vector; and, when the target picture does not belong to the first picture category, determining that the target picture belongs to the second picture category and adding a second category label to the target classification vector, wherein the second category label is used to indicate that the target classification vector is the second classification vector, and the second picture category includes: a plurality of picture subcategories.
According to the embodiment of the invention, the first picture category is the picture category that the picture recognition model needs to recognize, so the difference between the first picture category and the other categories needs to be highlighted during training. Therefore, when three or more picture categories exist in the picture training set, all picture categories other than the first picture category are regarded as belonging to the second picture category; that is, each picture category other than the first picture category is one subcategory of the second picture category. Further, when distinguishing target classification vectors, it is only necessary to judge whether the sample picture from which the target classification vector was extracted (namely, the target picture) belongs to the first picture category: if the target picture belongs to the first category, the target classification vector extracted from it is the first classification vector; otherwise, the target classification vector extracted from a target picture of the second category is determined to be the second classification vector, thereby obtaining the category label of the target classification vector.
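The one-vs-rest labeling rule above can be sketched as follows (a hypothetical helper; the tag strings and the function name are illustrative only, not from the patent):

```python
def add_category_label(target_vector, picture_label, first_category):
    """Tag a target classification vector: 'first' when its picture belongs to
    the first picture category, otherwise 'second' (all remaining categories
    collapse into subcategories of the second picture category)."""
    if picture_label == first_category:
        return ("first", target_vector)
    return ("second", target_vector)

labels = [2, 0, 5, 2]  # ground-truth categories of four sample pictures
tags = [add_category_label(0.5, lab, first_category=2)[0] for lab in labels]
print(tags)  # ['first', 'second', 'second', 'first']
```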
As an alternative embodiment, determining the third classification vector according to the vector included angle corresponding to the first classification vector includes: processing the first classification vector with an inverse trigonometric function to obtain the vector included angle, wherein the vector included angle ranges from 0° to 180°; determining the angular sum of the vector included angle and a preset included angle as the target included angle, wherein the preset included angle ranges from 0° to 180°; and detecting the angle interval corresponding to the target included angle, wherein the angle interval comprises a first angle interval (0° to 180°) and a second angle interval (180° to 360°), and determining the preset classification vector corresponding to that angle interval as the third classification vector, wherein the preset classification vector comprises a first preset vector corresponding to the first angle interval and a second preset vector corresponding to the second angle interval, the first preset vector is determined based on the cosine of the target included angle, and the second preset vector is determined based on the difference between the opposite number of the cosine of the target included angle and a preset constant.
Optionally, the preset included angle may be a preset angular-space constraint hyperparameter.
According to the embodiment of the invention, the first classification vector can be represented by an included-angle cosine value, so a vector included angle can be obtained by applying the inverse trigonometric function to the first classification vector. Summing the vector included angle with a preset included angle further increases the difference from the second classification vector. The corresponding preset classification vector is then selected as the third classification vector according to the angle interval into which the sum of the vector included angle and the preset included angle falls, so that the determined third classification vector meets the monotonicity requirement and the degree of distinction from the second classification vector is increased, thereby realizing the conversion of the first classification vector into the third classification vector.
Optionally, the first classification vector is f_j = cosθ_j (wherein j = p_i), and p_i is the first picture category. The vector included angle can be calculated with the inverse trigonometric function θ_j = arccos(f_j), wherein θ_j satisfies 0° ≤ θ_j ≤ 180°. An angular-space constraint hyperparameter (namely, the preset included angle) γ is then input, with a value range of 0° ≤ γ ≤ 180°, so the target included angle θ_j + γ satisfies 0° ≤ θ_j + γ ≤ 360°. In order that the output third classification vector lies in the definition domain, i.e., the value range of the target included angle θ_j + γ, and satisfies monotonicity, the first preset vector is cos(θ_j + γ), wherein the first angle interval is 0° ≤ θ_j + γ ≤ 180°, and the second preset vector is -cos(θ_j + γ) - 2, wherein the second angle interval is 180° < θ_j + γ ≤ 360° (the preset constant 2 is the value that keeps the piecewise curve continuous at 180°). Therefore, the third classification vector is:

f*_j = cos(θ_j + γ),       0° ≤ θ_j + γ ≤ 180°
f*_j = -cos(θ_j + γ) - 2,  180° < θ_j + γ ≤ 360°
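This piecewise mapping can be written directly as a small Python function (a sketch only; `gamma_deg` plays the role of the preset included angle γ, and the constant 2 is assumed from the continuity requirement at 180°):

```python
import math

def adaptive_transform(f_first, gamma_deg):
    """Map the first classification vector f = cos(theta) to the third
    classification vector over the widened angle t = theta + gamma:
    cos(t) on [0, 180] degrees, and -cos(t) - 2 on (180, 360] degrees."""
    f = max(-1.0, min(1.0, f_first))          # guard the acos domain
    theta = math.degrees(math.acos(f))        # vector included angle, 0..180
    t = theta + gamma_deg                     # target included angle, 0..360
    if t <= 180.0:
        return math.cos(math.radians(t))      # first preset vector
    return -math.cos(math.radians(t)) - 2.0   # second preset vector

# monotonically decreasing across the interval boundary:
for a in (0, 90, 150, 180):
    print(round(adaptive_transform(math.cos(math.radians(a)), 60.0), 3))
# 0.5, -0.866, -1.134, -1.5
```

Note that the output decreases smoothly through the 180° boundary, which is exactly the monotonicity property the patent requires for stable training.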
The invention also provides a preferred embodiment, which provides a training device for a face recognition system. A face feature vector is obtained from the acquired picture features, and an adaptive processing function is designed so that the dot product of the face feature vector and the parameter vector, namely the classification vector of the face, takes a geometric representation and satisfies monotonicity over the definition domain of the angle space. This effectively accelerates model convergence during training and makes the face feature vectors more discriminative than those produced by other algorithms.
Fig. 2 is a schematic diagram of a training system of a face recognition system according to an embodiment of the present invention. As shown in Fig. 2, the system solves the problem that the adopted target cosine function is not monotonic over its definition domain when face features are operated on in the angle space. The system includes: a face picture input module, a feature extraction module, a feature adaptive classification module, a Softmax classification loss module, and a label receiving module. The feature adaptive classification module obtains a face feature vector (e.g., a first classification vector) according to the input picture features and maps it to an adaptive classification vector in the angle space (e.g., a third classification vector), so that the final classification result is monotonic over the angle-space definition domain. Training a face recognition model with this system yields more discriminative face features, and the training process is stable and converges quickly.
Optionally, the face picture input module is configured to receive face image data input by the system, such as receiving a plurality of sample pictures in a picture training set.
Optionally, the label receiving module is configured to receive the picture labels of the sample pictures, where each picture label is a positive integer and the number of picture labels equals the number of sample pictures in the picture training set.
It should be noted that the tag receiving module is only used in the model training phase.
Optionally, the feature extraction module includes convolution operations and downsampling operations, which are used to extract the feature information (namely, picture features) of the sample pictures.
Alternatively, the feature extraction module may be any CNN backbone network, such as ResNet, MobileNet, or DarkNet.
Optionally, the feature adaptive classification module includes: the device comprises a linear mapping module, a characteristic self-adaptive processing module and a Softmax classification loss module.
Alternatively, the linear mapping module may be fully connected layers in the deep learning framework, comprising: linear mapping FC1 and linear mapping FC2.
Optionally, the linear mapping FC1 is configured to perform linear mapping on the image features extracted by the feature extraction module, so as to obtain a feature vector that is finally used to express a face.
Alternatively, the linear mapping FC2 is used to point multiply the feature vector of the face with the classified parameter vector, thereby mapping the face feature to the classification information.
Optionally, for the sample classification vector of the output of the linear mapping FC2, matching may be performed with the picture label received by the label receiving module, determining a first classification vector belonging to the first picture category and a second classification vector belonging to the second picture category, and then conveying the first classification vector to the feature adaptive processing module and conveying the second classification vector to the Softmax classification loss module.
Optionally, the feature adaptive processing module is configured to perform transformation processing on the first classification vector in an angle space, and then output a final classification information result to obtain a third classification vector.
Optionally, the Softmax classification loss module comprises the cross entropy loss function in the deep learning framework, which performs classification optimization on the input sample pictures according to the second classification vector output by the linear mapping FC2 and the third classification vector output by the feature adaptive processing module.
Optionally, the cross entropy loss function is:

Loss = -(1/M) · Σ_{i=0}^{M-1} log( e^{s·f*_{p_i}} / ( e^{s·f*_{p_i}} + Σ_{j=0, j≠p_i}^{C-1} e^{s·f_j} ) )

wherein M is the number of sample pictures in the picture training set, i denotes the i-th sample picture of the picture training set (0 ≤ i ≤ M-1), C is the number of picture categories in the training set, p_i is the first picture category, j denotes a second picture category (there may be several, i.e., 0 ≤ j ≤ C-1 and j ≠ p_i), s is a hyperparameter, f_j is the sample classification vector output by the linear mapping FC2 module, and f*_{p_i} is the third classification vector output by the feature adaptive processing module.
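Under the same notation, the loss can be sketched in Python (illustrative values for s and γ; `adaptive_softmax_loss` is our name, and the inner transform is the piecewise cosine rule this document describes, with the constant 2 assumed from continuity):

```python
import math

def adaptive_softmax_loss(batch_scores, labels, gamma_deg=30.0, s=64.0):
    """Cross entropy over scaled cosine logits, with the true-class score
    f_{p_i} replaced by the adaptive angle-space transform."""
    def transform(f):
        t = math.degrees(math.acos(max(-1.0, min(1.0, f)))) + gamma_deg
        if t <= 180.0:
            return math.cos(math.radians(t))
        return -math.cos(math.radians(t)) - 2.0

    total = 0.0
    for scores, p in zip(batch_scores, labels):
        logits = [s * (transform(f) if j == p else f) for j, f in enumerate(scores)]
        m = max(logits)                                   # stable log-sum-exp
        lse = m + math.log(sum(math.exp(z - m) for z in logits))
        total += lse - logits[p]                          # -log softmax(true class)
    return total / len(batch_scores)
```

A batch whose true-class cosine is large (the face feature points close to its class's parameter vector) should yield a smaller loss than one where it is small, which is what drives the parameter vector adjustment during training.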
Optionally, the sample classification vector output by the linear mapping FC2 module is f_j = W_j · X + B_j, where X is the feature vector output by the linear mapping FC1 module, W_j is the parameter vector, and B_j denotes the bias of the linear mapping layer in training, which is set to 0. Further, representing the dot product of the feature vector and the parameter vector geometrically, the sample classification vector is f_j = ||W_j|| ||X|| cosθ_j. A normalization operation is performed on the feature vector X and the parameter vector W so that their modular lengths are both 1, i.e., ||W_j|| = 1 and ||X|| = 1; thus f_j can be expressed as f_j = cosθ_j.
Optionally, the sample classification vector received by the feature adaptive processing module is the first classification vector, f_j = cosθ_j (wherein j = p_i), and p_i is the first picture category. The vector included angle can be calculated with the inverse trigonometric function θ_j = arccos(f_j), wherein 0° ≤ θ_j ≤ 180°. The angular-space constraint hyperparameter (namely, the preset included angle) γ is then input, with a value range of 0° ≤ γ ≤ 180°, so the target included angle θ_j + γ satisfies 0° ≤ θ_j + γ ≤ 360°. In order that the output third classification vector lies in the definition domain, i.e., the value range of the target included angle θ_j + γ, and satisfies monotonicity, the first preset vector is cos(θ_j + γ) for the first angle interval 0° ≤ θ_j + γ ≤ 180°, and the second preset vector is -cos(θ_j + γ) - 2 for the second angle interval 180° < θ_j + γ ≤ 360°. Therefore, the third classification vector is f*_j = cos(θ_j + γ) when 0° ≤ θ_j + γ ≤ 180°, and f*_j = -cos(θ_j + γ) - 2 when 180° < θ_j + γ ≤ 360°; the third classification vector output by the feature adaptive processing module satisfies monotonicity over the definition domain.
As an alternative example, in the model training phase the face recognition system trains on face pictures (namely, sample pictures) of C classes (namely, picture categories), and the input sample pictures are marked with class sequence numbers 0 to C-1. M sample pictures are input to the system at a time during training, and after being processed by the feature extraction module, they are output to the feature adaptive classification module.
Alternatively, the feature extraction module may be a convolutional neural network structure such as ResNet, MobileNet, or DarkNet.
Optionally, the feature adaptive classification module processes the input image features through the linear mapping FC1 module to obtain an n-dimensional feature vector (n generally has a value of 512 or 128), where the feature vector is a final feature vector representing a face in the image. The M pictures are processed by the linear mapping FC1 module, the feature vectors of the faces are extracted and input to the linear mapping FC2 module, and the layer maps the feature vectors of the faces into target classification vectors.
Optionally, after a sample picture is processed by the linear mapping FC2 module, a C-dimensional target classification vector (f_1, f_2, …, f_C) is obtained, that is, outputs for C classes are obtained. In the linear mapping FC2 module, the feature vector X of the input face and the classification parameter vector W are normalized so that the modular lengths of the two vectors are 1, and the dot product of the two vectors is performed in geometric form, so that the output form of the linear mapping FC2 module satisfies f_j = cosθ_j. The resulting output of the linear mapping FC2 module is a multi-class output; the output of the class corresponding to the real label of the input picture (namely, the first classification vector) is selected and input to the feature adaptive processing module, and the outputs of the other classes are input to the Softmax classification loss module.
Optionally, the feature adaptive processing module is configured to obtain the included angle θ_j = arccos(f_j) between the feature vector of the face and the corresponding classification parameter vector. From this included angle and the angular constraint hyperparameter γ set by the training system, the range of the angle-space definition domain, 0° ≤ θ_j + γ ≤ 360°, is obtained. Finally, the result after feature adaptive processing, namely the third classification vector, is given as cos(θ_j + γ) when 0° ≤ θ_j + γ ≤ 180°, and as -cos(θ_j + γ) - 2 when 180° < θ_j + γ ≤ 360°; the obtained result satisfies the monotonically decreasing property over the definition domain. The result is input to the Softmax classification loss module, where the final loss calculation is performed to optimize the classification model of the system.
According to the embodiment of the invention, an embodiment of a training device for a picture recognition model is further provided, and it should be noted that the training device for a picture recognition model can be used for executing the training method for a picture recognition model in the embodiment of the invention.
Fig. 3 is a schematic diagram of a training device for a picture recognition model according to an embodiment of the present invention. As shown in Fig. 3, the device may include: a picture obtaining module 32, configured to obtain a picture training set, where the picture training set includes a plurality of sample pictures and a picture label for each sample picture, the picture labels are used to represent the picture categories of the sample pictures, and the picture categories at least include a first picture category and a second picture category; a feature extraction module 34, configured to perform feature extraction on the picture training set to obtain a sample classification vector for each sample picture, where the sample classification vector is determined based on a feature vector and a parameter vector, the feature vector is extracted from the sample picture, the parameter vector is a training result of the picture recognition model according to the sample classification vector and the picture category of the sample picture, and the sample classification vector includes a first classification vector belonging to the first picture category and a second classification vector belonging to the second picture category; a first determining module 36, configured to determine a third classification vector according to the vector included angle corresponding to the first classification vector, where the vector included angle is the included angle between the feature vector and the parameter vector of the first classification vector, and the degree of distinction between the third classification vector and the second classification vector is greater than that between the first classification vector and the second classification vector; and a second determining module 38, configured to determine a model loss function of the picture recognition model based on the second classification vector and the third classification vector, where the model loss function is used to adjust the parameter vector of the picture recognition model.
It should be noted that, the picture obtaining module 32 in this embodiment may be used to perform step S102 in the embodiment of the present application, the feature extracting module 34 in this embodiment may be used to perform step S104 in the embodiment of the present application, the first determining module 36 in this embodiment may be used to perform step S106 in the embodiment of the present application, and the second determining module 38 in this embodiment may be used to perform step S108 in the embodiment of the present application. The above modules are the same as examples and application scenarios implemented by the corresponding steps, but are not limited to what is disclosed in the above embodiments.
In the embodiment of the invention, feature extraction is performed on the sample pictures in the picture training set that belong to the first picture category and the second picture category, obtaining a first classification vector belonging to the first picture category and a second classification vector belonging to the second picture category. The first classification vector is then processed based on the vector included angle between its feature vector and parameter vector to obtain a third classification vector belonging to the first picture category, so that the third classification vector has a greater degree of distinction from the second classification vector than the first classification vector does. Further, in the process of training the picture recognition model, the parameter vector of the model can be adjusted according to the second classification vector and the third classification vector through the model loss function. This optimizes the model's ability to distinguish the first picture category from the second picture category and makes the training process stable and fast to converge, thereby achieving the technical effect of improving the training efficiency of the picture recognition model and solving the technical problem of low training efficiency of the picture recognition model.
As an alternative embodiment, the feature extraction module includes: the picture extraction unit is used for randomly extracting a sample picture from the picture training set to serve as a target picture; the feature extraction unit is used for extracting features of the target picture to obtain a target classification vector; and the characteristic determining unit is used for adding a category label to the target classification vector to obtain a sample classification vector, wherein the category label is determined according to the picture category.
As an alternative embodiment, the feature extraction unit comprises: the extraction subunit is used for extracting the characteristics of the target picture to obtain the picture characteristics of the target picture; the mapping subunit is used for carrying out linear mapping on the picture characteristics to obtain the characteristic vector of the target picture; and the processing subunit is used for carrying out dot multiplication processing on the feature vector and the parameter vector to obtain a target classification vector.
As an alternative embodiment, the processing subunit comprises: a dot multiplication operation subunit, configured to perform a dot product of the feature vector and the parameter vector to obtain an operation model of the target classification vector, wherein the operation model represents the product of the feature modular length of the feature vector, the parameter modular length of the parameter vector, and the cosine of the included angle between the feature vector and the parameter vector; a normalization subunit, configured to normalize the feature vector and the parameter vector so that the feature modular length and the parameter modular length are both 1; and a first determining subunit, configured to determine, when the feature modular length and the parameter modular length are both 1, the target classification vector as the cosine of the included angle between the feature vector and the parameter vector.
As an alternative embodiment, the feature determining unit comprises: an acquisition subunit, configured to acquire the picture label of the target picture; a detection subunit, configured to detect whether the picture label indicates that the target picture is of the first picture category; a second determining subunit, configured to add a first category label to the target classification vector when the target picture belongs to the first picture category, wherein the first category label is used to indicate that the target classification vector is the first classification vector; and a third determining subunit, configured to determine, when the target picture does not belong to the first picture category, that the target picture belongs to the second picture category, and add a second category label to the target classification vector, wherein the second category label is used to indicate that the target classification vector is the second classification vector, and the second picture category includes: a plurality of picture subcategories.
As an alternative embodiment, the first determining module includes: a vector processing unit, configured to process the first classification vector with an inverse trigonometric function to obtain the vector included angle, wherein the vector included angle ranges from 0° to 180°; an included-angle determining unit, configured to determine the angular sum of the vector included angle and a preset included angle as the target included angle, wherein the preset included angle ranges from 0° to 180°; an included-angle detection unit, configured to detect the angle interval corresponding to the target included angle, wherein the angle interval comprises a first angle interval (0° to 180°) and a second angle interval (180° to 360°); and a vector determining unit, configured to determine the preset classification vector corresponding to the angle interval as the third classification vector, wherein the preset classification vector includes a first preset vector corresponding to the first angle interval and a second preset vector corresponding to the second angle interval, the first preset vector is determined based on the cosine of the target included angle, and the second preset vector is determined based on the difference between the opposite number of the cosine of the target included angle and a preset constant.
As an alternative embodiment, the second determining module includes: a loss function determining unit, configured to determine the model loss function by analyzing the difference between the second classification vector and the third classification vector through the cross entropy loss function.
Embodiments of the present invention may provide a computer terminal, which may be any one of a group of computer terminals. Alternatively, in the present embodiment, the above-described computer terminal may be replaced with a terminal device such as a mobile terminal.
Alternatively, in this embodiment, the above-mentioned computer terminal may be located in at least one network device among a plurality of network devices of the computer network.
In this embodiment, the above-mentioned computer terminal may execute the program code of the following steps in the training method of the picture recognition model: obtaining a picture training set, wherein the picture training set comprises: the system comprises a plurality of sample pictures and picture labels of each sample picture, wherein the picture labels are used for representing picture categories of the sample pictures, and the picture categories at least comprise: a first picture category and a second picture category; extracting features from the picture training set to obtain a sample classification vector of each sample picture, wherein the sample classification vector is determined based on the feature vector and a parameter vector, the feature vector is extracted from the sample picture, the parameter vector is a training result of the picture recognition model according to the sample classification vector and the picture category of the sample picture, and the sample classification vector comprises: a first classification vector belonging to a first picture category and a second classification vector belonging to a second picture category; determining a third classification vector according to a vector included angle corresponding to the first classification vector, wherein the vector included angle is an included angle between a feature vector and a parameter vector of the first classification vector, and the distinguishing degree of the third classification vector and the second classification vector is larger than that of the first classification vector and the second classification vector; and determining a model loss function of the picture identification model based on the second classification vector and the third classification vector, wherein the model loss function is used for adjusting a parameter vector of the picture identification model.
Alternatively, fig. 4 is a block diagram of a computer terminal according to an embodiment of the present invention. As shown in fig. 4, the computer terminal 40 may include: one or more (only one is shown) processors 42 and memory 44.
The memory may be used to store software programs and modules, such as program instructions/modules corresponding to the training method and apparatus of the image recognition model in the embodiment of the present invention, and the processor executes the software programs and modules stored in the memory, thereby executing various functional applications and data processing, that is, implementing the training method of the image recognition model. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located with respect to the processor, which may be connected to the terminal 40 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor may call the information and the application program stored in the memory through the transmission device to perform the following steps: obtaining a picture training set, wherein the picture training set comprises: the system comprises a plurality of sample pictures and picture labels of each sample picture, wherein the picture labels are used for representing picture categories of the sample pictures, and the picture categories at least comprise: a first picture category and a second picture category; extracting features from the picture training set to obtain a sample classification vector of each sample picture, wherein the sample classification vector is determined based on the feature vector and a parameter vector, the feature vector is extracted from the sample picture, the parameter vector is a training result of the picture recognition model according to the sample classification vector and the picture category of the sample picture, and the sample classification vector comprises: a first classification vector belonging to a first picture category and a second classification vector belonging to a second picture category; determining a third classification vector according to a vector included angle corresponding to the first classification vector, wherein the vector included angle is an included angle between a feature vector and a parameter vector of the first classification vector, and the distinguishing degree of the third classification vector and the second classification vector is larger than that of the first classification vector and the second classification vector; and determining a model loss function of the picture identification model based on the second classification vector and the third classification vector, wherein the model loss function is used for adjusting a parameter vector of the picture identification model.
Optionally, the above processor may further execute program code for: randomly extracting a sample picture from the picture training set to serve as a target picture; extracting features of the target picture to obtain a target classification vector; and adding a category label to the target classification vector to obtain a sample classification vector, wherein the category label is determined according to the picture category.
Optionally, the above processor may further execute program code for: extracting features of the target picture to obtain picture features of the target picture; performing linear mapping on the picture features to obtain a feature vector of the target picture; and performing dot multiplication on the feature vector and the parameter vector to obtain a target classification vector.
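The three steps above — feature extraction, linear mapping, and dot multiplication with the parameter vector — can be sketched as follows. All dimensions, weights, and variable names here are illustrative assumptions; the patent does not fix a feature extractor architecture or an embedding size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins (assumptions, not from the patent):
picture_feature = rng.standard_normal(256)    # raw feature of the target picture
W_map = rng.standard_normal((64, 256))        # linear mapping to a 64-d feature vector
parameter_vector = rng.standard_normal(64)    # classifier weights for one picture category

# Linear mapping of the picture features gives the feature vector.
feature_vector = W_map @ picture_feature

# Dot multiplication of the feature vector and the parameter vector
# yields the target classification vector (one logit per category).
target_classification = float(feature_vector @ parameter_vector)
```

In practice the feature extractor would be a trained network and `W_map` one of its learned layers; the sketch only shows the data flow from picture feature to classification logit.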
Optionally, the above processor may further execute program code for: performing a dot multiplication operation on the feature vector and the parameter vector to obtain an operation model of the target classification vector, wherein the operation model represents the product of the feature modular length of the feature vector, the parameter modular length of the parameter vector, and the cosine of the included angle between the feature vector and the parameter vector; normalizing the feature vector and the parameter vector so that the feature modular length and the parameter modular length are both 1; and, under the condition that the feature modular length and the parameter modular length are both 1, determining the target classification vector as the cosine of the included angle between the feature vector and the parameter vector.
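Since the dot product equals the product of the two modular lengths and the cosine of their included angle, normalizing both vectors to modular length 1 reduces the classification vector to that cosine. A minimal sketch (the function name is our own):

```python
import numpy as np

def cosine_classification(feature_vector, parameter_vector):
    """L2-normalize both vectors so their modular lengths are 1; the dot
    product then equals cos(theta), the cosine of the included angle."""
    f = feature_vector / np.linalg.norm(feature_vector)
    w = parameter_vector / np.linalg.norm(parameter_vector)
    return float(f @ w)

# cos(theta) for two 2-d vectors: (3*4 + 4*3) / (5 * 5) = 0.96
cos_theta = cosine_classification(np.array([3.0, 4.0]), np.array([4.0, 3.0]))
```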
Optionally, the above processor may further execute program code for: acquiring a picture tag of a target picture; detecting whether the picture tag indicates that the target picture is of a first picture type or not; adding a first class label to the target classification vector under the condition that the target picture belongs to the first picture class, wherein the first class label is used for indicating that the target classification vector is the first classification vector; under the condition that the target picture does not belong to the first picture category, determining that the target picture belongs to a second picture category, and adding a second category label for the target classification vector, wherein the second category label is used for indicating that the target classification vector is the second classification vector, and the second picture category comprises: a plurality of picture subcategories.
Optionally, the above processor may further execute program code for: processing the first classification vector with an inverse trigonometric function to obtain the vector included angle, wherein the value range of the vector included angle is [0°, 180°]; determining the angle sum of the vector included angle and a preset included angle as a target included angle, wherein the value range of the preset included angle is [0°, 180°]; detecting the angle interval corresponding to the target included angle, wherein the angle interval comprises: a first angle interval of (0°, 180°] and a second angle interval of (180°, 360°]; and determining the preset classification vector corresponding to the angle interval as the third classification vector, wherein the preset classification vector comprises a first preset vector corresponding to the first angle interval and a second preset vector corresponding to the second angle interval, the first preset vector is determined based on the cosine value of the target included angle, and the second preset vector is determined based on the difference between the opposite number of the cosine value of the target included angle and a preset constant.
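The margin step above resembles additive angular-margin losses such as ArcFace: recover the angle with arccos, add a preset included angle, then pick the preset vector by angle interval. In the sketch below the 30° margin and the preset constant of 2 are illustrative choices, not values fixed by the patent (with a constant of 2 the two branches happen to join continuously at 180°).

```python
import numpy as np

def third_classification(cos_theta, preset_angle_deg=30.0, preset_constant=2.0):
    """Turn a first-category cosine logit into the more discriminative
    third classification vector via the piecewise rule described above."""
    # Inverse trigonometric function: vector included angle in [0, 180] degrees.
    theta = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
    target = theta + preset_angle_deg          # target included angle
    if target <= 180.0:
        # First angle interval (0, 180]: cosine of the target included angle.
        return float(np.cos(np.radians(target)))
    # Second angle interval (180, 360]: opposite number of the cosine minus a
    # preset constant, so the logit keeps decreasing past 180 degrees.
    return float(-np.cos(np.radians(target)) - preset_constant)
```

Adding the margin always shrinks the logit of the true category, which is what forces the trained parameter vectors to separate the first and second picture categories by more than the margin.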
Optionally, the above processor may further execute program code for: the model loss function is determined by analyzing the difference between the second classification vector and the third classification vector by the cross entropy loss function.
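The loss step can be sketched as a standard softmax cross-entropy over logits in which the third classification vector replaces the first classification vector while the second classification vectors fill the remaining categories; the three-category layout and the concrete logit values below are our assumptions.

```python
import numpy as np

def cross_entropy(logits, true_index):
    """Softmax cross-entropy: -log of the probability assigned to the
    true category, computed with a max-shift for numerical stability."""
    shifted = logits - np.max(logits)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    return float(-np.log(probs[true_index]))

# Index 0 holds the margin-adjusted third classification vector,
# indices 1 and 2 hold second classification vectors (illustrative values).
loss = cross_entropy(np.array([0.5, -0.2, 0.1]), true_index=0)
```

Because the angular margin lowers the true-category logit, this loss is larger than it would be for the unadjusted first classification vector, which is what drives the parameter-vector update toward a larger degree of distinction.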
By adopting the embodiment of the invention, a training scheme for a picture recognition model is provided. In the embodiment of the invention, feature extraction is performed on the sample pictures belonging to the first picture category and the second picture category in the picture training set to obtain a first classification vector belonging to the first picture category and a second classification vector belonging to the second picture category. The first classification vector is then processed based on the vector included angle between the feature vector and the parameter vector to obtain a third classification vector belonging to the first picture category, so that the third classification vector has a larger degree of distinction from the second classification vector than the first classification vector does. Further, in the process of training the picture recognition model, the parameter vector of the model can be adjusted according to the second classification vector and the third classification vector through the model loss function, which optimizes the model's ability to distinguish the first picture category from the second picture category and makes the training process converge stably and quickly, thereby achieving the technical effect of improving the training efficiency of the picture recognition model and solving the technical problem of low training efficiency of the picture recognition model.
It will be appreciated by those skilled in the art that the configuration shown in fig. 4 is only illustrative, and the computer terminal may be a smart phone (such as an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a mobile internet device (Mobile Internet Devices, MID), a PAD, etc. Fig. 4 does not limit the structure of the electronic device. For example, the computer terminal 40 may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 4, or have a different configuration from that shown in fig. 4.
Those skilled in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing the relevant hardware of a terminal device. The program may be stored in a non-volatile storage medium, and the non-volatile storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
Embodiments of the present invention also provide a non-volatile storage medium. Optionally, in this embodiment, the above-described non-volatile storage medium may be used to store program code for executing the training method of the picture recognition model provided in the above-described embodiment.
Alternatively, in this embodiment, the above-mentioned nonvolatile storage medium may be located in any one of the computer terminals in the computer terminal group in the computer network, or in any one of the mobile terminals in the mobile terminal group.
Optionally, in the present embodiment, the non-volatile storage medium is arranged to store program code for performing the steps of: obtaining a picture training set, wherein the picture training set comprises: a plurality of sample pictures, and a picture tag for each of the sample pictures, the picture tag being for representing a picture category of the sample picture, the picture category comprising at least: a first picture category and a second picture category; extracting features from the picture training set to obtain a sample classification vector of each sample picture, wherein the sample classification vector is determined based on a feature vector and a parameter vector, the feature vector is extracted from the sample picture, the parameter vector is a training result of a picture recognition model according to the sample classification vector of the sample picture and the picture category, and the sample classification vector comprises: a first classification vector belonging to the first picture category and a second classification vector belonging to the second picture category; determining a third classification vector according to a vector included angle corresponding to the first classification vector, wherein the vector included angle is an included angle between a feature vector and a parameter vector of the first classification vector, and the distinguishing degree of the third classification vector and the second classification vector is larger than that of the first classification vector and the second classification vector; and determining a model loss function of the picture identification model based on the second classification vector and the third classification vector, wherein the model loss function is used for adjusting a parameter vector of the picture identification model.
Optionally, in the present embodiment, the non-volatile storage medium is arranged to store program code for performing the steps of: randomly extracting a sample picture from the picture training set to serve as a target picture; extracting features of the target picture to obtain a target classification vector; and adding a category label to the target classification vector to obtain the sample classification vector, wherein the category label is determined according to the picture category.
Optionally, in the present embodiment, the non-volatile storage medium is arranged to store program code for performing the steps of: extracting the features of the target picture to obtain the picture features of the target picture; performing linear mapping on the picture features to obtain a feature vector of the target picture; and performing dot multiplication processing on the feature vector and the parameter vector to obtain the target classification vector.
Optionally, in the present embodiment, the non-volatile storage medium is arranged to store program code for performing the steps of: performing a dot multiplication operation on the feature vector and the parameter vector to obtain an operation model of the target classification vector, wherein the operation model represents the product of the feature modular length of the feature vector, the parameter modular length of the parameter vector, and the cosine of the included angle between the feature vector and the parameter vector; normalizing the feature vector and the parameter vector so that the feature modular length and the parameter modular length are both 1; and, under the condition that the feature modular length and the parameter modular length are both 1, determining the target classification vector as the cosine of the included angle between the feature vector and the parameter vector.
Optionally, in the present embodiment, the non-volatile storage medium is arranged to store program code for performing the steps of: acquiring a picture tag of the target picture; detecting whether the picture tag indicates that the target picture is the first picture category; adding a first class label to the target classification vector when the target picture belongs to the first picture class, wherein the first class label is used for indicating that the target classification vector is the first classification vector; determining that the target picture belongs to the second picture category and adding a second category label to the target classification vector when the target picture does not belong to the first picture category, wherein the second category label is used for indicating that the target classification vector is the second classification vector, and the second picture category comprises: a plurality of picture subcategories.
Optionally, in the present embodiment, the non-volatile storage medium is arranged to store program code for performing the steps of: processing the first classification vector with an inverse trigonometric function to obtain the vector included angle, wherein the value range of the vector included angle is [0°, 180°]; determining the angle sum of the vector included angle and a preset included angle as a target included angle, wherein the value range of the preset included angle is [0°, 180°]; detecting the angle interval corresponding to the target included angle, wherein the angle interval comprises: a first angle interval of (0°, 180°] and a second angle interval of (180°, 360°]; and determining the preset classification vector corresponding to the angle interval as the third classification vector, wherein the preset classification vector comprises a first preset vector corresponding to the first angle interval and a second preset vector corresponding to the second angle interval, the first preset vector is determined based on the cosine value of the target included angle, and the second preset vector is determined based on the difference between the opposite number of the cosine value of the target included angle and a preset constant.
Optionally, in the present embodiment, the non-volatile storage medium is arranged to store program code for performing the steps of: the model loss function is determined by cross entropy loss function analysis of the differences between the second classification vector and the third classification vector.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present invention, each embodiment is described with its own emphasis; for a part that is not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary; for example, the division of the units may be a logical function division, and there may be other division manners in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces, units or modules, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a non-volatile storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a non-volatile storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned non-volatile storage medium includes: a U-disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program code.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that several modifications and adaptations may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and adaptations are also intended to fall within the scope of the present invention.

Claims (10)

1. A training method for a picture recognition model, comprising:
obtaining a picture training set, wherein the picture training set comprises: a plurality of sample pictures, and a picture tag for each of the sample pictures, the picture tag being for representing a picture category of the sample picture, the picture category comprising at least: a first picture category and a second picture category;
extracting features from the picture training set to obtain a sample classification vector of each sample picture, wherein the sample classification vector is determined based on a feature vector and a parameter vector, the feature vector is extracted from the sample picture, the parameter vector is a training result of a picture recognition model according to the sample classification vector of the sample picture and the picture category, and the sample classification vector comprises: a first classification vector belonging to the first picture category and a second classification vector belonging to the second picture category;
determining a third classification vector according to a vector included angle corresponding to the first classification vector, wherein the vector included angle is an included angle between a feature vector and a parameter vector of the first classification vector, and the distinguishing degree of the third classification vector and the second classification vector is larger than that of the first classification vector and the second classification vector;
and determining a model loss function of the picture recognition model based on the second classification vector and the third classification vector, wherein the model loss function is used for adjusting a parameter vector of the picture recognition model.
2. The method of claim 1, wherein performing feature extraction on the training set of pictures to obtain a sample classification vector for each of the sample pictures comprises:
randomly extracting a sample picture from the picture training set to serve as a target picture;
extracting features of the target picture to obtain a target classification vector;
and adding a category label to the target classification vector to obtain the sample classification vector, wherein the category label is determined according to the picture category.
3. The method of claim 2, wherein performing feature extraction on the target picture to obtain a target classification vector comprises:
extracting the features of the target picture to obtain the picture features of the target picture;
performing linear mapping on the picture features to obtain a feature vector of the target picture;
and performing dot multiplication processing on the feature vector and the parameter vector to obtain the target classification vector.
4. The method according to claim 3, wherein performing dot multiplication processing on the feature vector and the parameter vector to obtain the target classification vector comprises:
performing a dot multiplication operation on the feature vector and the parameter vector to obtain an operation model of the target classification vector, wherein the operation model represents the product of the feature modular length of the feature vector, the parameter modular length of the parameter vector, and the cosine of the included angle between the feature vector and the parameter vector;
normalizing the feature vector and the parameter vector so that the feature modular length and the parameter modular length are both 1;
and, under the condition that the feature modular length and the parameter modular length are both 1, determining the target classification vector as the cosine of the included angle between the feature vector and the parameter vector.
5. The method of claim 2, wherein adding a class label to the target classification vector to obtain the sample classification vector comprises:
acquiring a picture tag of the target picture;
detecting whether the picture tag indicates that the target picture is the first picture category;
adding a first class label to the target classification vector when the target picture belongs to the first picture class, wherein the first class label is used for indicating that the target classification vector is the first classification vector;
determining that the target picture belongs to the second picture category and adding a second category label to the target classification vector when the target picture does not belong to the first picture category, wherein the second category label is used for indicating that the target classification vector is the second classification vector, and the second picture category comprises: a plurality of picture subcategories.
6. The method of claim 1, wherein determining a third classification vector based on the vector included angle corresponding to the first classification vector comprises:
processing the first classification vector by using an inverse trigonometric function to obtain the vector included angle, wherein the value range of the vector included angle is [0°, 180°];
determining the sum of the angles of the vector included angle and a preset included angle as a target included angle, wherein the value range of the preset included angle is [0°, 180°];
detecting an angle interval corresponding to the target included angle, wherein the angle interval comprises: a first angle interval of (0°, 180°] and a second angle interval of (180°, 360°];
and determining a preset classification vector corresponding to the angle interval as the third classification vector, wherein the preset classification vector comprises: a first preset vector corresponding to the first angle interval and a second preset vector corresponding to the second angle interval, the first preset vector is determined based on the cosine value of the target included angle, and the second preset vector is determined based on the difference between the opposite number of the cosine value of the target included angle and a preset constant.
7. The method of claim 1, wherein determining a model loss function of the picture recognition model based on the second classification vector and the third classification vector comprises:
the model loss function is determined by cross entropy loss function analysis of the differences between the second classification vector and the third classification vector.
8. A training device for a picture recognition model, comprising:
the image acquisition module is used for acquiring an image training set, wherein the image training set comprises: a plurality of sample pictures, and a picture tag for each of the sample pictures, the picture tag being for representing a picture category of the sample picture, the picture category comprising at least: a first picture category and a second picture category;
The feature extraction module is configured to perform feature extraction on the picture training set to obtain a sample classification vector of each sample picture, where the sample classification vector is determined based on a feature vector and a parameter vector, the feature vector is extracted from the sample picture, the parameter vector is a training result of the picture recognition model according to the sample classification vector and the picture category of the sample picture, and the sample classification vector includes: a first classification vector belonging to the first picture category and a second classification vector belonging to the second picture category;
the first determining module is used for determining a third classification vector according to a vector included angle corresponding to the first classification vector, wherein the vector included angle is an included angle between a feature vector and a parameter vector of the first classification vector, and the distinguishing degree of the third classification vector and the second classification vector is larger than that of the first classification vector and the second classification vector;
and the second determining module is used for determining a model loss function of the picture recognition model based on the second classification vector and the third classification vector, wherein the model loss function is used for adjusting a parameter vector of the picture recognition model.
9. A non-volatile storage medium, wherein the non-volatile storage medium is configured to store a program, and wherein the program, when executed, controls a device in which the non-volatile storage medium is located to perform the training method of the picture recognition model according to any one of claims 1 to 7.
10. An electronic device, comprising: a memory and a processor, the processor being configured to execute a program stored in the memory, wherein the program, when executed, performs the training method of the picture recognition model of any one of claims 1 to 7.
CN202311635861.2A 2023-11-30 2023-11-30 Training method and device of picture recognition model, storage medium and electronic equipment Pending CN117671760A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311635861.2A CN117671760A (en) 2023-11-30 2023-11-30 Training method and device of picture recognition model, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311635861.2A CN117671760A (en) 2023-11-30 2023-11-30 Training method and device of picture recognition model, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN117671760A true CN117671760A (en) 2024-03-08

Family

ID=90085830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311635861.2A Pending CN117671760A (en) 2023-11-30 2023-11-30 Training method and device of picture recognition model, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117671760A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination