CN115035314A - Network model training method and device and image feature determining method and device - Google Patents

Network model training method and device and image feature determining method and device

Info

Publication number
CN115035314A
Authority
CN
China
Prior art keywords
feature
image
hash
pseudo
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210685058.9A
Other languages
Chinese (zh)
Inventor
徐富荣
程远
张伟
王萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202210685058.9A
Publication of CN115035314A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present specification provide a method and a device for training a network model and for determining image features. The network model includes a feature extraction network and a classifier. When the network model is trained, a first image to be trained and a corresponding annotation label are acquired, a first feature of the first image is extracted using the feature extraction network, and the feature elements contained in the first feature are mapped to obtain a pseudo-hash feature of the first image, where the value of each feature element in the pseudo-hash feature lies between two preset values. Then, a prediction probability of the first image is determined using the pseudo-hash feature and the classifier, a prediction loss is determined based on the difference between the prediction probability and the annotation label, and the network model is updated using the prediction loss.

Description

Network model training method and device and image feature determining method and device
Technical Field
One or more embodiments of the present disclosure relate to the field of image processing technologies, and in particular, to a method and an apparatus for training a network model and determining image features.
Background
With the progress of technology and the development of society, images are used ever more widely, for example in visual processing, object comparison, and search. In recent years, as image acquisition equipment has kept improving, image resolution has risen and pictures have become sharper. Accordingly, the data volume of the image features extracted from an image has grown greatly as well. When massive numbers of images are processed, such high-volume image features impose great storage and data processing pressure.
Therefore, an improved scheme is desired that reduces the data volume of image features while losing as few important features of the image as possible.
Disclosure of Invention
One or more embodiments of the present specification describe a method and an apparatus for training a network model and for determining image features, so as to reduce the data volume of image features while losing as few important image features as possible. The specific technical scheme is as follows.
In a first aspect, an embodiment provides a method for training a network model, where the network model includes a feature extraction network and a classifier, and the method includes:
acquiring a first image to be trained and a corresponding label;
extracting a first feature of the first image by using the feature extraction network;
mapping the feature elements contained in the first feature to obtain a pseudo hash feature of the first image; the value of a characteristic element in the pseudo-hash characteristic is between two preset values;
determining a prediction probability of the first image using the pseudo-hash feature and the classifier;
determining a prediction loss based on a difference between the prediction probability and the annotation tag;
and updating the network model by using the predicted loss.
In one embodiment, the step of mapping the feature elements included in the first feature includes:
reducing the dimension of the first feature to obtain a dimension reduction feature;
and mapping the feature elements contained in the dimension reduction feature to obtain the pseudo hash feature of the first image.
In one embodiment, the step of mapping the feature elements included in the first feature includes:
inputting feature elements contained in the first feature into a first preset function to obtain feature elements in the pseudo-hash feature; the range of the first predetermined function is the range between the two predetermined values.
In one embodiment, the step of determining the prediction probability of the first image using the pseudo-hash feature and the classifier comprises:
determining a prediction probability of the first image for a plurality of classes based on an operation between the pseudo-hash feature and a plurality of class parameters in the classifier.
In one embodiment, the step of determining the prediction probability of the first image for a plurality of classes comprises:
mapping the plurality of category parameters in the classifier to obtain a plurality of mapping category parameters; the value of the characteristic element in the mapping category parameter is between the two preset values;
determining a prediction probability of the first image for a plurality of classes based on an operation between the pseudo-hash feature and the plurality of mapping class parameters.
In one embodiment, the step of determining the prediction probability of the first image for a plurality of classes comprises:
mapping the pseudo hash characteristics to obtain mapping characteristics; the value range of the characteristic elements contained in the mapping characteristics is larger than the range between the two preset values;
determining a prediction probability of the first image for a plurality of classes based on an operation between the mapped features and a plurality of class parameters in the classifier.
In one embodiment, the step of mapping the pseudo hash feature includes:
inputting the feature elements contained in the pseudo-hash feature into a second preset function to obtain the feature elements in the mapping feature; the value range of the second preset function is larger than the range between the two preset values.
In one embodiment, the step of determining a predicted loss based on the difference between the prediction probability and the annotation tag comprises:
determining a first loss based on a difference between the predicted probability and the annotation tag;
determining a second loss based on the pseudo-hash feature and a difference between the two values;
determining a third loss based on a difference between the predicted probability corresponding to the first feature and the annotation tag;
determining a predicted loss based on the first loss, the second loss, and the third loss.
In one embodiment, the two preset values include 1 and-1.
In a second aspect, an embodiment provides a method for determining an image feature, including:
acquiring a second image of the feature to be determined;
obtaining a feature extraction network trained by the method of the first aspect;
extracting a second feature of the second image by using the feature extraction network;
mapping the feature elements contained in the second features to obtain the pseudo-hash features of the second image; the value of a characteristic element in the pseudo-hash characteristic is between two preset values;
determining an image feature of the second image based on the pseudo-hash feature.
In one embodiment, the step of determining the image feature of the second image based on the pseudo-hash feature comprises:
utilizing a first threshold value to truncate the feature elements in the pseudo-hash feature to obtain a hash feature; wherein, the value of the characteristic element in the hash characteristic is any one of the two preset values;
determining an image feature of the second image based on the hash feature.
In one embodiment, the method further comprises:
matching the image features of the second image with the image features of a plurality of images in an image database;
determining image information matching the second image from the image database based on the matching result.
In a third aspect, an embodiment provides an apparatus for training a network model, where the network model includes a feature extraction network and a classifier, and the apparatus includes:
the first acquisition module is configured to acquire a first image to be trained and a corresponding annotation label;
a first extraction module configured to extract a first feature of the first image using the feature extraction network;
a first mapping module configured to map feature elements included in the first feature to obtain a pseudo hash feature of the first image; the value of a characteristic element in the pseudo-hash characteristic is between two preset values;
a first classification module configured to determine a prediction probability of the first image using the pseudo-hash feature and the classifier;
a first loss module configured to determine a predicted loss based on a difference between the predicted probability and the annotation tag;
a first update module configured to update the network model with the predicted loss.
In a fourth aspect, an embodiment provides an apparatus for determining an image feature, including:
the second acquisition module is configured to acquire a second image of the feature to be determined;
a third obtaining module configured to obtain the feature extraction network trained by the method of the first aspect;
a second extraction module configured to extract a second feature of the second image using the feature extraction network;
the second mapping module is configured to map feature elements included in the second features to obtain pseudo hash features of the second image; the value of a characteristic element in the pseudo-hash characteristic is between two preset values;
a first determination module configured to determine an image feature of the second image based on the pseudo-hash feature.
In a fifth aspect, embodiments provide a computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the method of any one of the first to second aspects.
In a sixth aspect, an embodiment provides a computing device, including a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the method of any one of the first aspect to the second aspect.
In the method and the apparatus provided in the embodiments of the present specification, after the first feature of the first image is extracted using the feature extraction network, the first feature is mapped to a pseudo-hash feature, and a prediction loss is determined using the prediction probability corresponding to the pseudo-hash feature, so as to update the network model. Because the value of each feature element in the pseudo-hash feature lies between two preset values, the data volume of the image feature is greatly reduced; and because the network model is trained with the pseudo-hash feature, the feature extraction network learns to extract image features that retain the important features as accurately as possible. Therefore, the embodiments of the present specification reduce the data volume of image features while losing as few important features of the image as possible.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the attached drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic flowchart of a method for training a network model according to an embodiment;
FIG. 2 is a schematic processing flow chart provided in this embodiment;
fig. 3 is a schematic flowchart of a method for determining image characteristics according to an embodiment;
FIG. 4 is a schematic block diagram of a training apparatus for a network model according to an embodiment;
fig. 5 is a schematic block diagram of an apparatus for determining an image feature according to an embodiment.
Detailed Description
The scheme provided by the specification is described in the following with reference to the attached drawings.
After a large number of images are captured using an image capture device, image features may be extracted and used for a variety of applications. For example, the image features may be stored, or they may be matched against the image features of multiple images in an image database to find matching images.
When extracting image features, a feature extraction network is generally adopted. For example, a Convolutional Neural Network (CNN), a Deep Neural Network (DNN), or a Transformer network may be used to extract image features, that is, to convert an image containing pixel values into a feature map of a certain dimension. In general, each feature element in the feature map is represented by a floating-point number, which makes the data volume of the feature map very large.
In order to reduce the data volume of image features, the embodiment of the specification provides a training method of a network model, wherein the network model comprises a feature extraction network and a classifier. The method comprises the following steps: step S110, acquiring a first image to be trained and a corresponding label; step S120, extracting a first feature of the first image by using a feature extraction network; step S130, mapping feature elements contained in the first features to obtain a pseudo-hash feature of the first image, wherein the value of the feature elements in the pseudo-hash feature is between two preset values; step S140, determining the prediction probability of the first image by using the pseudo-hash characteristics and the classifier; step S150, determining the prediction loss based on the difference between the prediction probability and the labeling label; step S160, the network model is updated with the predicted loss. According to the embodiment, the data volume of the image features is reduced by mapping the first features into the pseudo-hash features, and the precision of the pseudo-hash features is improved by training the feature extraction network and the classifier, so that the data volume of the image features is reduced on the premise of not losing important features of the image as much as possible. The present embodiment will be described in detail with reference to fig. 1.
Fig. 1 is a schematic flowchart of a method for training a network model according to an embodiment. The network model includes a feature extraction network and a classifier. The feature extraction network is used for extracting features of an input image and can be implemented by a CNN, a DNN, or a Transformer network. The classifier is used for determining the prediction probabilities of the input features for a plurality of classes; a softmax classifier or the like may be adopted. The method is performed by a computing device, which may be implemented by any apparatus, device, platform, or device cluster having computing and processing capabilities.
In step S110, the computing device obtains a first image A1 to be trained and a corresponding annotation label y1. The computing device may obtain the first image A1 from a training set; the first image A1 is any image used to train the network model. The first image A1 may contain objects such as cars, people, or dogs. The classification of the first image A1 may be binary or multi-class. The annotation label y1 may be represented by 0s and 1s: a class the first image A1 belongs to is marked 1, and a class it does not belong to is marked 0. For example, in a four-class task the annotation label y1 may be represented as (0, 0, 1, 0).
Next, in step S120, the computing device extracts the first feature F1 of the first image A1 using the feature extraction network. The computing device can input the first image A1 directly into the feature extraction network and obtain the first feature F1 through extraction; alternatively, the first image A1 may first be subjected to preset processing, such as scaling or grayscale conversion, and the processed first image A1 is then input into the feature extraction network.
The first feature F1 may be a feature map extracted from the first image A1. The feature elements in the first feature F1 are typically represented using floating-point numbers, each feature element being a floating-point value, so the first feature F1 may be referred to as a floating-point feature. The data volume of a floating-point feature is typically large.
In step S130, the feature elements contained in the first feature F1 are mapped to obtain a pseudo-hash feature H1 of the first image A1. The value of each feature element in the pseudo-hash feature H1 lies between two preset values. The two preset values may be 1 and -1, in which case the value of each feature element lies within [-1, 1]. Mapping the values of the feature elements between two preset values can significantly reduce the data volume of the feature. The two preset values may also be 0 and 1, or 0 and -1, and so on.
When mapping the feature elements contained in the first feature F1, the feature elements may be input into a first preset function to obtain the feature elements of the pseudo-hash feature. The range of the first preset function is the range between the two preset values. For example, when the two preset values are 1 and -1, the range of the first preset function may be [-1, 1], within which it is a monotonically increasing function, such as the tanh function or a sigmoid function. For example, the feature elements of the first feature F1 may be mapped using the formula x1 = tanh(x0), where x0 is a feature element of the first feature F1 and x1 is the mapped value, that is, a feature element of the pseudo-hash feature H1. Converting the first feature F1 into the pseudo-hash feature H1 thus converts a floating-point feature into a pseudo-hash feature.
When mapping the feature elements contained in the first feature F1, the dimension of the first feature F1 may also first be reduced to obtain a dimension-reduced feature, and the feature elements contained in the dimension-reduced feature are then mapped to obtain the pseudo-hash feature H1 of the first image A1. Reducing the dimension of the first feature F1 reduces the data volume of the feature.
When mapping the feature elements contained in the dimension-reduced feature, these feature elements may be input into the first preset function to obtain the pseudo-hash feature H1. For example, after the first feature F1 is flattened, a 512-dimensional floating-point feature vector can be obtained; dimension reduction yields a 256-dimensional floating-point feature vector, which is then input into a tanh function or a sigmoid function, so that a floating-point feature with a relatively wide value range is mapped into the range from -1 to 1.
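As an illustration, the dimension reduction and mapping just described can be sketched as follows in PyTorch. This is a minimal sketch, not the patent's implementation: the module name PseudoHashHead, the use of a linear layer for dimension reduction, and the 512/256 sizes (taken from the example above) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PseudoHashHead(nn.Module):
    def __init__(self, in_dim: int = 512, hash_dim: int = 256):
        super().__init__()
        self.reduce = nn.Linear(in_dim, hash_dim)  # dimension reduction

    def forward(self, first_feature: torch.Tensor) -> torch.Tensor:
        reduced = self.reduce(first_feature)   # dimension-reduced floating-point feature
        return torch.tanh(reduced)             # pseudo-hash feature, elements in (-1, 1)

head = PseudoHashHead()
f1 = torch.randn(8, 512)   # a batch of flattened floating-point features F1
h1 = head(f1)              # pseudo-hash features H1, shape (8, 256)
```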
In step S140, the computing device determines the prediction probability p of the first image A1 using the pseudo-hash feature H1 and the classifier. Specifically, the pseudo-hash feature H1 may be input into the classifier, and the classifier determines the prediction probability p of the first image A1.
When determining the prediction probability p of the first image A1, the prediction probabilities of the first image A1 for a plurality of classes may be determined based on operations between the pseudo-hash feature H1 and a plurality of class parameters in the classifier. Specifically, the operation may be multiplying the pseudo-hash feature H1 by each of the class parameters, or multiplying the pseudo-hash feature H1 by each class parameter and then by a certain coefficient, and so on.
It should be noted that, during training, the feature extraction network and the classifier are trained jointly. After training is complete, the feature extraction network and the classifier can be used separately; for example, the feature extraction network can be used on its own to extract the pseudo-hash feature of an image for query or storage. To further reduce the data volume of image features and make them more convenient to use, the feature elements in an image's pseudo-hash feature H1 may be truncated using a first threshold to obtain a hash feature, in which the value of each feature element is one of the two preset values. The first threshold may be a preset value, or a value determined from the feature elements of the pseudo-hash feature.
Specifically, each feature element in the pseudo-hash feature H1 may be compared with the first threshold: when the feature element is greater than the first threshold, it is updated to the larger preset value; otherwise, it is updated to the smaller preset value.
For example, when the values of the feature elements in the pseudo-hash feature H1 lie between -1 and 1, truncating the pseudo-hash feature H1 with the first threshold yields a hash feature whose feature elements are -1 or 1. Alternatively, the parameter of a sign function may be set using the first threshold, the feature elements in the pseudo-hash feature H1 are then input into the sign function, and the pseudo-hash feature H1 is truncated by the sign function to obtain the hash feature. Compared with the pseudo-hash feature, the hash feature greatly reduces the data volume.
In practical applications, the hash feature may be used directly as the image feature, or preset processing may be applied to the hash feature and the result used as the image feature. For example, the preset processing may replace every feature element whose value is -1 with 0; the feature elements of the hash feature then take values of 0 and 1, and the hash feature can be stored directly in binary form, which greatly reduces the data volume and storage space of the image feature.
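The truncation and binary storage described above can be sketched as follows, assuming the two preset values 1 and -1 and a first threshold of 0; the function names are illustrative.

```python
import numpy as np

def to_hash(pseudo_hash: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    # elements above the threshold become 1, all others become -1
    return np.where(pseudo_hash > threshold, 1, -1).astype(np.int8)

def to_binary(hash_feature: np.ndarray) -> np.ndarray:
    # replace -1 with 0, then pack 8 elements into each byte for storage
    bits = (hash_feature == 1).astype(np.uint8)
    return np.packbits(bits)

h1 = np.tanh(np.random.randn(256))   # a 256-dimensional pseudo-hash feature
stored = to_binary(to_hash(h1))      # 256 floats reduced to 32 bytes
```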
The above describes how the pseudo-hash feature is used in practical applications. During model training, because the values of the pseudo-hash feature lie between -1 and 1, some erroneous samples may arise when the classifier determines the prediction probability from the pseudo-hash feature: samples that can be separated by the classifier but cannot be separated by the sign function. That is, when the pseudo-hash feature is converted into the hash feature, the image feature changes substantially and can no longer accurately characterize the image. This problem arises because the pseudo-hash feature is not accurate enough, causing errors when it is truncated.
To improve training accuracy, the prediction process can be improved in several ways, so that the network model obtains more accurate pseudo-hash features and erroneous changes of the image features are avoided when pseudo-hash features are converted into hash features.
In one embodiment, the classifier may be improved by mapping it into a non-linear classifier, that is, by non-linearly mapping the class parameters in the classifier. When the prediction probabilities p of the first image A1 for multiple classes are determined in step S140, the multiple class parameters in the classifier may be mapped to obtain multiple mapped class parameters, and the prediction probabilities p of the first image A1 for the multiple classes are determined based on operations between the pseudo-hash feature H1 and the multiple mapped class parameters, for example by multiplying the pseudo-hash feature H1 by each of the mapped class parameters.
The value of each feature element in a mapped class parameter lies between the two preset values. When mapping the multiple class parameters, each class parameter may be input into the first preset function to obtain the corresponding mapped class parameter. Taking the tanh function as the first preset function, the prediction probability can be determined using the following formula:
$$p_j = H_i \cdot \tanh\!\left(w_j^p\right) \qquad (1)$$
where p_j is the prediction probability of the j-th class, H_i is the pseudo-hash feature of the i-th image, w_j^p is the j-th class parameter, and tanh(w_j^p) is the j-th mapped class parameter; the superscript p indicates that the class parameter belongs to the classifier corresponding to the pseudo-hash feature. Mapping the class parameters between the two preset values realizes the non-linear mapping of the classifier's class parameters, so that the class parameters and the pseudo-hash feature lie in the same range, which improves the accuracy of the prediction probability.
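A minimal sketch of formula (1) follows, assuming a learnable class-parameter matrix of shape (number of classes, feature dimension); the class count and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MappedClassifier(nn.Module):
    def __init__(self, hash_dim: int = 256, num_classes: int = 4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, hash_dim) * 0.01)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # p_j = H_i . tanh(w_j^p); tanh keeps the class parameters in the
        # same (-1, 1) range as the pseudo-hash features
        return h @ torch.tanh(self.weight).t()

clf = MappedClassifier()
logits = clf(torch.tanh(torch.randn(8, 256)))   # prediction scores, shape (8, 4)
```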
In another embodiment, the pseudo-hash feature may be improved by non-linearly mapping the pseudo-hash feature H1, whose values are distributed in [-1, 1], onto a wider value range and then classifying it with the classifier. When the prediction probabilities p of the first image A1 for multiple classes are determined in step S140, the pseudo-hash feature H1 may be mapped to obtain a mapped feature, and the prediction probabilities p of the first image A1 for the multiple classes are determined based on operations between the mapped feature and the multiple class parameters in the classifier, for example by multiplying the mapped feature by each of the class parameters. The value range of the feature elements contained in the mapped feature is larger than the range between the two preset values.
When the pseudo-hash feature H1 is mapped, the feature elements contained in the pseudo-hash feature may be input into a second preset function to obtain the feature elements of the mapped feature. The value range of the second preset function is larger than the range between the two preset values, and its domain may include the range between the two preset values. For example, the second preset function may be an exponential function of the form y = a^x, where the base a may be chosen greater than 1 so that the function is monotonically increasing. Taking base e as an example, the prediction probability may be determined using the following formula:
$$p_j = e^{H_i} \cdot w_j^p \qquad (2)$$
where p_j is the prediction probability of the j-th class, H_i is the pseudo-hash feature of the i-th image (the i-th sample), and w_j^p is the j-th class parameter. The function e^x is monotonically increasing on the domain from -1 to 1; it changes slowly near 0, and its rate of change grows for inputs greater than 0. This property helps retain information when the pseudo-hash feature is converted into the hash feature.
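A corresponding sketch of formula (2) follows; as above, the shapes and names are illustrative assumptions.

```python
import torch

def exp_mapped_logits(h: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # p_j = e^{H_i} . w_j^p; e^x stretches values near 1 more than values
    # near -1, which helps preserve information after sign truncation
    return torch.exp(h) @ weight.t()

h = torch.tanh(torch.randn(8, 256))   # pseudo-hash features in (-1, 1)
w = torch.randn(4, 256) * 0.01        # class parameters w^p for 4 classes
logits = exp_mapped_logits(h, w)      # prediction scores, shape (8, 4)
```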
In step S150, the computing device determines the prediction loss Loss based on the difference between the prediction probability p and the annotation label y1. In a binary or multi-class classification problem, cross entropy may be used to determine the prediction loss.
To improve the accuracy of the prediction loss, it may be determined according to the following steps:
Step 1, determining a first loss based on the difference between the prediction probability p and the annotation label y1;
Step 2, determining a second loss based on the differences between the pseudo-hash feature H1 and the two preset values;
Step 3, determining a third loss based on the difference between the prediction probability corresponding to the first feature F1 and the annotation label y1;
Step 4, determining the prediction loss Loss based on the first loss, the second loss, and the third loss.
In step 1, the closer the prediction probability p is to the annotation label y1, the more accurate the prediction. In step 2, the closer the feature elements of the pseudo-hash feature H1 are to the two preset values (for example, the closer their absolute values are to 1), the more accurate the hash feature obtained when the pseudo-hash feature is truncated. In step 3, the first feature F1 and the pseudo-hash feature carry the same semantic information and share the same class label, so the first feature F1 may be input into another classifier to obtain its corresponding prediction probability, and the annotation label y1 is used to supervise the training of the first feature F1; that is, the closer the prediction probability corresponding to the first feature F1 is to the annotation label y1, the more accurate the first feature F1. In step 4, the sum of the first loss, the second loss, and the third loss may be determined as the prediction loss; the result of applying preset processing to that sum may also be used, or a weighted sum of the three losses may be taken as the prediction loss.
The following describes specific formulas for determining the prediction loss, taking cross entropy as an example. Denoting the first loss by L_H, the second loss by L_Q, and the third loss by L_ss, the prediction loss Loss can be expressed by the following formula:
$$Loss = L_H + \alpha L_Q + \beta L_{ss}$$
where α and β are preset coefficients.
When the prediction probability p is calculated according to formula (1), the first loss L_H can be calculated using the following formula:

$$L_H = -\frac{1}{M}\sum_{i=1}^{M}\log\frac{e^{H_i\cdot\tanh(w_i^p)}}{\sum_{j=1}^{C}e^{H_i\cdot\tanh(w_j^p)}}$$

where M is the number of images in a batch, y_i is the annotation label of the i-th image, H_i is the pseudo-hash feature of the i-th image, w_j^p is the j-th class parameter, C is the total number of classes, and w_i^p is the class parameter corresponding to the class to which the i-th image belongs.
When the prediction probability p is calculated according to formula (2), L_H can be calculated using the following formula:

$$L_H = -\frac{1}{M}\sum_{i=1}^{M}\log\frac{\exp\!\left(e^{H_i}\cdot w_i^p\right)}{\sum_{j=1}^{C}\exp\!\left(e^{H_i}\cdot w_j^p\right)}$$

where e is the natural base.
The second loss L_Q can be calculated using the following formula:

$$L_Q = \frac{1}{M}\sum_{i=1}^{M}\left\|\,\lvert H_i\rvert - \mathbf{1}\,\right\|_2$$

where \(\|\cdot\|_2\) is the L2 norm, i.e., for each feature element contained in H_i, the square of the difference between its absolute value and 1 is computed; these squares are summed and the square root of the sum is taken.
The third loss L_ss can be calculated using the following formula:

$$L_{ss} = -\frac{1}{M}\sum_{i=1}^{M}\log\frac{e^{f_i\cdot w_i^f}}{\sum_{j=1}^{C}e^{f_i\cdot w_j^f}}$$

where f_i is the first feature of the i-th image, w_j^f is the j-th class parameter in the classifier corresponding to the first feature, and w_i^f is the class parameter in that classifier corresponding to the class to which the i-th image belongs. The classifier corresponding to the first feature and the classifier corresponding to the pseudo-hash feature have different class parameters, but the total number of classes C may be the same.
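Putting the three losses together, a minimal sketch of the combined loss (using the formula-(1) classifier for L_H) might look as follows; the alpha and beta values and all shapes are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn.functional as F

def training_loss(h, f, labels, w_p, w_f, alpha=0.1, beta=1.0):
    # L_H: cross entropy on pseudo-hash logits H_i . tanh(w_j^p)
    l_h = F.cross_entropy(h @ torch.tanh(w_p).t(), labels)
    # L_Q: quantization loss pushing |H_i| toward 1
    l_q = torch.norm(h.abs() - 1.0, dim=1).mean()
    # L_ss: cross entropy of the floating-point feature with its own classifier
    l_ss = F.cross_entropy(f @ w_f.t(), labels)
    return l_h + alpha * l_q + beta * l_ss
```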
Next, in step S160, the computing device updates the network model using the prediction loss Loss. During the update, the parameters of the network model are adjusted in the direction that reduces the prediction loss, including the parameters of the feature extraction network and the class parameters in the classifier.
Steps S110 to S160 constitute one training iteration of the model, described above using a single first image as an example. In practical applications, a batch of images may be acquired in step S110; the processing of steps S120 to S140 is performed for each image in the batch, a total prediction loss is determined in step S150 from the prediction probabilities of the whole batch, and the network model is updated in step S160 using the total prediction loss. Iterative training stops when the training process reaches a convergence condition, for example when the number of model updates exceeds a preset threshold or the prediction loss falls below a preset value.
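For illustration, one full training iteration (steps S110 to S160) might be sketched as follows, reusing the PseudoHashHead and training_loss sketches above; the backbone, image size, class count, and learning rate are all illustrative assumptions rather than the patent's implementation.

```python
import torch

backbone = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 512))
head = PseudoHashHead(512, 256)
w_p = torch.nn.Parameter(torch.randn(4, 256) * 0.01)   # classifier for pseudo-hash features
w_f = torch.nn.Parameter(torch.randn(4, 512) * 0.01)   # classifier for floating-point features
params = list(backbone.parameters()) + list(head.parameters()) + [w_p, w_f]
opt = torch.optim.SGD(params, lr=0.01)

images = torch.randn(8, 3, 32, 32)     # a batch of first images A1 (step S110)
labels = torch.randint(0, 4, (8,))     # annotation labels y1

f = backbone(images)                            # first features F1 (step S120)
h = head(f)                                     # pseudo-hash features H1 (step S130)
loss = training_loss(h, f, labels, w_p, w_f)    # steps S140 and S150
opt.zero_grad()
loss.backward()
opt.step()                                      # update the network model (step S160)
```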
Fig. 2 is a schematic processing flow diagram provided in this embodiment. An image is input into the feature extraction network to obtain a high-dimensional floating-point feature; hash mapping is performed on the floating-point feature to obtain a pseudo-hash feature. Based on the pseudo-hash feature, several losses can be calculated: the classification loss is the first loss L_H, determined based on the difference between the prediction probability and the annotation label; the quantization loss is the second loss L_Q, determined based on the differences between the pseudo-hash feature and the two preset values; and the semantic consistency loss is the third loss L_ss, determined based on the difference between the prediction probability corresponding to the first feature and the annotation label. The classification loss can be determined in two ways: one improves the classifier, the other improves the pseudo-hash feature.
The above embodiment describes the process of training the network model. When training is complete, the feature extraction network can be applied to extract image features. The following embodiment provides a method for determining image features.
Fig. 3 is a flowchart illustrating an image feature determination method according to an embodiment. The method may be performed by a computing device, which may be implemented by any apparatus, device, platform, cluster of devices, etc. having computing, processing capabilities. The method comprises the following steps.
Step S310, a second image of the feature to be determined is obtained. The second image may be any image from which features need to be extracted.
Step S320, obtaining the feature extraction network trained by the method of the embodiment of fig. 1.
Step S330, a second feature of the second image is extracted by using the feature extraction network. Specifically, the second image may be input to a feature extraction network, and a second feature of the second image is determined by the feature extraction network, where the second feature may be a floating-point feature.
Step S340, mapping the feature elements contained in the second feature to obtain a pseudo-hash feature of the second image. The value of each feature element in the pseudo-hash feature lies between two preset values. For the specific implementation of this step, refer to the description of step S130 in fig. 1, with the first feature replaced by the second feature.
In step S350, an image feature of the second image is determined based on the pseudo hash feature. In practical application, the pseudo hash feature may be directly determined as the image feature of the second image, or the feature element in the pseudo hash feature may be truncated by using the first threshold value to obtain the hash feature, and the image feature of the second image is determined based on the hash feature.
The value of each feature element in the hash feature is one of the two preset values. The first threshold may be a preset value, or a value determined from the feature elements of the pseudo-hash feature of the second image.
Specifically, each feature element in the pseudo-hash feature may be compared with the first threshold: when the feature element is greater than the first threshold, it is updated to the larger preset value; otherwise, it is updated to the smaller preset value.
For example, when the values of the feature elements in the pseudo-hash feature lie between -1 and 1, truncating the pseudo-hash feature with the first threshold yields a hash feature whose feature elements are -1 or 1. Alternatively, the parameter of a sign function may be set using the first threshold, the feature elements in the pseudo-hash feature are input into the sign function, and the pseudo-hash feature is truncated by the sign function to obtain the hash feature. Such a hash feature, in turn, reduces the data volume further than the pseudo-hash feature.
In practical applications, the hash feature may be used directly as the image feature, or preset processing may be applied to the hash feature and the result used as the image feature. For example, the preset processing may replace every feature element whose value is -1 with 0; the feature elements of the hash feature then take values of 0 and 1, and the hash feature can be stored directly in binary form, which greatly reduces the data volume and storage space of the image feature.
In one application scenario, the image features of the second image may be matched with the image features of the plurality of images in the image database, and based on the matching result, image information matched with the second image may be determined from the image database. The image database is used for storing image characteristics of a plurality of images and corresponding image information. The matching process may specifically be matching between objects in the image, and the obtained image information may be information related to the objects. After the image features of the second image are determined, they may also be applied to a variety of scenes.
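As an illustrative sketch of this matching scenario, assuming the image features are stored as packed binary hash codes as described above, matching can use Hamming distance, computed as XOR followed by a bit count; the database contents here are hypothetical.

```python
import numpy as np

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    # number of differing bits between two packed hash codes
    return int(np.unpackbits(a ^ b).sum())

rng = np.random.default_rng(0)
database = {f"img_{i}": np.packbits(rng.integers(0, 2, 256).astype(np.uint8))
            for i in range(1000)}                 # hypothetical image database
query = np.packbits(rng.integers(0, 2, 256).astype(np.uint8))
best_match = min(database, key=lambda k: hamming(database[k], query))
```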
In this specification, the first image, the first feature, the first predetermined function, the "first" in the first loss, and the corresponding "second" in the text are for convenience of distinction and description only and do not have any limiting meaning.
The foregoing describes certain embodiments of the present specification, and other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily have to be in the particular order shown or in sequential order to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Fig. 4 is a schematic block diagram of a training apparatus for a network model according to an embodiment. The network model includes a feature extraction network and a classifier. This embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 1. The apparatus is deployed in a computing device, which may be implemented by any apparatus, device, platform, cluster of devices, etc. having computing and processing capabilities. The apparatus 400 comprises:
a first obtaining module 410 configured to obtain a first image to be trained and a corresponding label;
a first extraction module 420 configured to extract a first feature of the first image using the feature extraction network;
a first mapping module 430, configured to map feature elements included in the first feature to obtain a pseudo hash feature of the first image; the value of a characteristic element in the pseudo-hash characteristic is between two preset values;
a first classification module 440 configured to determine a prediction probability of the first image using the pseudo-hash feature and the classifier;
a first loss module 450 configured to determine a predicted loss based on a difference between the predicted probability and the annotation tag;
a first update module 460 configured to update the network model with the predicted loss.
In one embodiment, the first mapping module 430 is specifically configured to:
reducing the dimension of the first feature to obtain a dimension reduction feature;
and mapping the feature elements contained in the dimension reduction feature to obtain the pseudo hash feature of the first image.
In one embodiment, the first mapping module 430 is specifically configured to:
inputting feature elements contained in the first feature into a first preset function to obtain feature elements in the pseudo-hash feature; the range of the first predetermined function is the range between the two predetermined values.
In one embodiment, the first classification module 440 is specifically configured to:
determining a prediction probability of the first image for a plurality of classes based on an operation between the pseudo-hash feature and a plurality of class parameters in the classifier.
In one embodiment, the first classification module 440 includes:
a first mapping sub-module (not shown in the figure) configured to map the plurality of category parameters in the classifier to obtain a plurality of mapping category parameters; the value of the characteristic element in the mapping category parameter is between the two preset values;
a first determination sub-module (not shown in the figures) configured to determine a prediction probability of the first image for a plurality of classes based on an operation between the pseudo-hash feature and the plurality of mapped class parameters.
In one embodiment, the first classification module 440 includes:
a second mapping sub-module (not shown in the figure) configured to map the pseudo hash feature to obtain a mapping feature; the value range of the characteristic elements contained in the mapping characteristics is larger than the range between the two preset values;
a second determination sub-module (not shown in the figures) configured to determine prediction probabilities of the first image for a plurality of classes based on operations between the mapped features and a plurality of class parameters in the classifier.
In one embodiment, the second mapping submodule is specifically configured to:
inputting feature elements contained in the pseudo-hash features into a second preset function to obtain feature elements in the mapping features; the value range of the second preset function is larger than the range between the two preset values.
In one embodiment, the first loss module 450 includes:
a first loss sub-module (not shown) configured to determine a first loss based on a difference between the prediction probability and the annotation tag;
a second loss sub-module (not shown) configured to determine a second loss based on the pseudo-hash feature and a difference between the two values;
a third loss sub-module (not shown) configured to determine a third loss based on a difference between the predicted probability corresponding to the first feature and the annotation tag;
a third determination submodule (not shown in the figures) configured to determine a predicted loss based on said first loss, said second loss and said third loss.
In one embodiment, the two preset values include 1 and-1.
Fig. 5 is a schematic block diagram of an apparatus for determining an image feature according to an embodiment. The apparatus 500 is deployed in a computing device. This embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 3. The apparatus is deployed in a computing device, which may be implemented by any apparatus, device, platform, cluster of devices, etc. having computing and processing capabilities. The apparatus 500 comprises:
a second obtaining module 510 configured to obtain a second image of the feature to be determined;
a third obtaining module 520, configured to obtain the feature extraction network trained by the method of the embodiment shown in fig. 1;
a second extraction module 530 configured to extract a second feature of the second image using the feature extraction network;
a second mapping module 540, configured to map feature elements included in the second feature to obtain a pseudo hash feature of the second image; the value of a characteristic element in the pseudo-hash characteristic is between two preset values;
a first determining module 550 configured to determine an image feature of the second image based on the pseudo-hash feature.
In one embodiment, the first determining module 550 comprises:
a first truncation submodule (not shown in the figure) configured to truncate the feature elements in the pseudo hash feature by using a first threshold to obtain a hash feature; the value of a feature element in the hash feature is any one of the two preset values;
a fourth determination sub-module (not shown in the figures) configured to determine image features of the second image based on the hash features.
In one embodiment, the apparatus 500 further comprises:
a first matching module (not shown) configured to match image features of the second image with image features of a plurality of images in an image database;
a second determining module (not shown in the figure) configured to determine image information matching the second image from the image database based on the matching result.
The above device embodiments correspond to the method embodiments, and for specific description, reference may be made to the description of the method embodiments, which is not described herein again. The device embodiments are obtained based on the corresponding method embodiments, and have the same technical effects as the corresponding method embodiments, and specific descriptions can be found in the corresponding method embodiments.
Embodiments of the present specification also provide a computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the method of any one of fig. 1 to 3.
An embodiment of the present specification further provides a computing device, including a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the method described in any one of fig. 1 to 3.
The embodiments in the present specification are all described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the storage medium and the computing device embodiments, since they are substantially similar to the method embodiments, they are described relatively simply, and reference may be made to some descriptions of the method embodiments for relevant points.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in connection with the embodiments of the invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments further describe the objects, technical solutions and advantages of the embodiments of the present invention in detail. It should be understood that the above description is only exemplary of the embodiments of the present invention, and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements and the like based on the technical solutions of the present invention should be included in the scope of the present invention.

Claims (16)

1. A method of training a network model, the network model comprising a feature extraction network and a classifier, the method comprising:
acquiring a first image to be trained and a corresponding label;
extracting a first feature of the first image by using the feature extraction network;
mapping the feature elements contained in the first feature to obtain a pseudo hash feature of the first image; the value of a characteristic element in the pseudo-hash characteristic is between two preset values;
determining a prediction probability of the first image using the pseudo-hash feature and the classifier;
determining a prediction loss based on a difference between the prediction probability and the annotation tag;
and updating the network model by using the predicted loss.
2. The method of claim 1, the step of mapping the feature elements included in the first feature comprising:
reducing the dimension of the first feature to obtain a dimension reduction feature;
and mapping the feature elements contained in the dimension reduction feature to obtain the pseudo hash feature of the first image.
3. The method of claim 1, the step of mapping the feature elements included in the first feature comprising:
inputting feature elements contained in the first feature into a first preset function to obtain feature elements in the pseudo-hash feature; the range of the first predetermined function is the range between the two predetermined values.
4. The method of claim 1, the step of determining the prediction probability of the first image using the pseudo-hash feature and the classifier, comprising:
determining a prediction probability of the first image for a plurality of classes based on an operation between the pseudo-hash feature and a plurality of class parameters in the classifier.
5. The method of claim 4, the step of determining the prediction probability of the first image for a plurality of classes comprising:
mapping the plurality of category parameters in the classifier to obtain a plurality of mapping category parameters; the value of the characteristic element in the mapping category parameter is between the two preset values;
determining a prediction probability of the first image for a plurality of classes based on an operation between the pseudo-hash feature and the plurality of mapping class parameters.
6. The method of claim 4, the step of determining the prediction probability of the first image for a plurality of classes comprising:
mapping the pseudo hash characteristics to obtain mapping characteristics; the value range of the characteristic elements contained in the mapping characteristics is larger than the range between the two preset values;
determining a prediction probability of the first image for a plurality of classes based on an operation between the mapped feature and a plurality of class parameters in the classifier.
7. The method of claim 6, the step of mapping the pseudo-hash feature comprising:
inputting feature elements contained in the pseudo-hash features into a second preset function to obtain feature elements in the mapping features; the value range of the second preset function is larger than the range between the two preset values.
8. The method of claim 1, the step of determining a prediction loss based on a difference between the prediction probability and the annotation tag, comprising:
determining a first loss based on a difference between the prediction probability and the annotation tag;
determining a second loss based on the pseudo-hash feature and a difference between the two values;
determining a third loss based on a difference between the prediction probability corresponding to the first feature and the annotation tag;
determining a predicted loss based on the first loss, the second loss, and the third loss.
9. The method of claim 1, wherein the two preset values comprise 1 and-1.
10. A method of determining image features, comprising:
acquiring a second image of the feature to be determined;
obtaining a feature extraction network trained using the method of claim 1;
extracting a second feature of the second image by using the feature extraction network;
mapping the feature elements contained in the second features to obtain the pseudo-hash features of the second image; the value of a characteristic element in the pseudo-hash characteristic is between two preset values;
determining an image feature of the second image based on the pseudo-hash feature.
11. The method of claim 10, wherein the step of determining an image feature of the second image based on the pseudo-hash feature comprises:
truncating the feature elements in the pseudo-hash feature using a first threshold to obtain a hash feature; wherein the value of each feature element in the hash feature is one of the two preset values;
determining an image feature of the second image based on the hash feature.
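Assuming the two preset values are 1 and -1 and the first threshold sits between them (0 is the natural choice), the truncation of claim 11 can be sketched as:

    import torch

    def binarize(pseudo_hash, first_threshold=0.0):
        # Elements above the threshold become 1, the rest become -1,
        # so every element of the hash feature is one of the two preset values.
        return torch.where(pseudo_hash > first_threshold,
                           torch.ones_like(pseudo_hash),
                           -torch.ones_like(pseudo_hash))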
12. The method of claim 10, further comprising:
matching the image features of the second image with the image features of a plurality of images in an image database;
determining image information matching the second image from the image database based on the matching result.
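One common way to realize the matching step of claim 12 (a sketch under the assumption that hash features take values in {-1, 1}): for such features, the inner product is monotone in Hamming distance (distance = (bits - <q, d>) / 2), so the nearest matches in the image database can be found with a single matrix product:

    import torch

    def match_against_database(query_hash, db_hashes, top_k=5):
        # query_hash: (hash_bits,); db_hashes: (num_images, hash_bits)
        scores = db_hashes @ query_hash             # higher score = smaller Hamming distance
        return torch.topk(scores, k=top_k).indices  # indices of the best-matching images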
13. An apparatus for training a network model, the network model including a feature extraction network and a classifier, the apparatus comprising:
a first acquisition module configured to acquire a first image for training and a corresponding annotation label;
a first extraction module configured to extract a first feature of the first image using the feature extraction network;
a first mapping module configured to map the feature elements included in the first feature to obtain a pseudo-hash feature of the first image; wherein the value of each feature element in the pseudo-hash feature lies between two preset values;
a first classification module configured to determine a prediction probability of the first image using the pseudo-hash feature and the classifier;
a first loss module configured to determine a prediction loss based on a difference between the prediction probability and the annotation label;
a first update module configured to update the network model using the prediction loss.
14. An apparatus for determining image features, comprising:
a second acquisition module configured to acquire a second image whose image features are to be determined;
a third acquisition module configured to obtain a feature extraction network trained using the method of claim 1;
a second extraction module configured to extract a second feature of the second image using the feature extraction network;
a second mapping module configured to map the feature elements included in the second feature to obtain a pseudo-hash feature of the second image; wherein the value of each feature element in the pseudo-hash feature lies between two preset values;
a first determination module configured to determine an image feature of the second image based on the pseudo-hash feature.
15. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-12.
16. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-12.
CN202210685058.9A 2022-06-15 2022-06-15 Network model training method and device and image feature determining method and device Pending CN115035314A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210685058.9A CN115035314A (en) 2022-06-15 2022-06-15 Network model training method and device and image feature determining method and device

Publications (1)

Publication Number Publication Date
CN115035314A true CN115035314A (en) 2022-09-09

Family

ID=83124486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210685058.9A Pending CN115035314A (en) 2022-06-15 2022-06-15 Network model training method and device and image feature determining method and device

Country Status (1)

Country Link
CN (1) CN115035314A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711422A (en) * 2017-10-26 2019-05-03 北京邮电大学 Image real time transfer, the method for building up of model, device, computer equipment and storage medium
CN110363049A (en) * 2018-04-10 2019-10-22 阿里巴巴集团控股有限公司 The method and device that graphic element detection identification and classification determine
WO2021232752A1 (en) * 2020-05-22 2021-11-25 深圳前海微众银行股份有限公司 Hash encoding method, apparatus and device, and readable storage medium
CN111930980A (en) * 2020-08-21 2020-11-13 深圳市升幂科技有限公司 Training method of image retrieval model, image retrieval method, device and medium
CN114549907A (en) * 2022-02-28 2022-05-27 携程旅游信息技术(上海)有限公司 Model training method, image feature extraction method, system, device and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIANG, HONGHUA; WANG, PENGFEI; ZHANG, ZHAO; MAO, WENHUA; ZHAO, BO; QI, PENG: "Rapid recognition method for weeds in maize field based on convolutional network and hash code", Transactions of the Chinese Society for Agricultural Machinery, no. 11, 10 September 2018 (2018-09-10) *

Similar Documents

Publication Publication Date Title
US11748619B2 (en) Image feature learning device, image feature learning method, image feature extraction device, image feature extraction method, and program
CN113657425B (en) Multi-label image classification method based on multi-scale and cross-modal attention mechanism
JP7131195B2 (en) Object recognition device, object recognition learning device, method, and program
CN111738169B (en) Handwriting formula recognition method based on end-to-end network model
CN111079847B (en) Remote sensing image automatic labeling method based on deep learning
CN110188827B (en) Scene recognition method based on convolutional neural network and recursive automatic encoder model
CN111191583A (en) Space target identification system and method based on convolutional neural network
CN111079683A (en) Remote sensing image cloud and snow detection method based on convolutional neural network
CN115496928B (en) Multi-modal image feature matching method based on multi-feature matching
EP3191980A1 (en) Method and apparatus for image retrieval with feature learning
CN113313173B (en) Human body analysis method based on graph representation and improved transducer
CN113313179B (en) Noise image classification method based on l2p norm robust least square method
Wang et al. Recognizing handwritten mathematical expressions as LaTex sequences using a multiscale robust neural network
CN113642602B (en) Multi-label image classification method based on global and local label relation
EP4285281A1 (en) Annotation-efficient image anomaly detection
Hemanth et al. CNN-RNN BASED HANDWRITTEN TEXT RECOGNITION.
CN115187839B (en) Image-text semantic alignment model training method and device
Kishan et al. Handwritten character recognition using CNN
CN115035314A (en) Network model training method and device and image feature determining method and device
Rothacker et al. Robust output modeling in bag-of-features HMMs for handwriting recognition
CN116030295A (en) Article identification method, apparatus, electronic device and storage medium
CN113592045B (en) Model adaptive text recognition method and system from printed form to handwritten form
CN115565177A (en) Character recognition model training method, character recognition device, character recognition equipment and medium
CN113920291A (en) Error correction method and device based on picture recognition result, electronic equipment and medium
CN109344279B (en) Intelligent handwritten English word recognition method based on Hash retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination