CN117557822A - Image classification method, apparatus, electronic device, and computer-readable medium - Google Patents
- Publication number
- CN117557822A (application CN202210923199.XA)
- Authority
- CN
- China
- Prior art keywords
- image
- sample
- tag
- feature vector
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
Abstract
Embodiments of the present disclosure disclose an image classification method, apparatus, electronic device, and computer-readable medium. One embodiment of the method comprises: extracting image features of a target image in target image information to obtain an image feature vector, wherein the target image information comprises a category tag group and the target image; extracting tag features of each category tag in the category tag group to obtain a tag feature vector group; generating an image tag matching information group based on the image feature vector and the tag feature vector group; and, in response to determining that image tag matching information satisfying a preset matching condition exists in the image tag matching information group, generating an image classification result based on that image tag matching information. This embodiment relates to artificial intelligence and can improve the accuracy of the image classification result.
Description
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to an image classification method, an apparatus, an electronic device, and a computer readable medium.
Background
Image classification is a technique for distinguishing between different classes of images. Currently, image classification generally proceeds as follows: feature description information of an image is extracted and input into a classification model to determine the category of the image.
However, classifying images in this manner often suffers from the following technical problem:
If the training set of the classification model does not include the class of the image to be classified, the image cannot be accurately classified, which lowers the accuracy of the image classification result.
Disclosure of Invention
This section is intended to introduce concepts in a simplified form that are further described below in the Detailed Description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose image classification methods, apparatuses, electronic devices, and computer-readable media to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide an image classification method, the method comprising: extracting image features of a target image in target image information to obtain an image feature vector, wherein the target image information comprises a category tag group and the target image; extracting tag features of each category tag in the category tag group to obtain a tag feature vector group; generating an image tag matching information group based on the image feature vector and the tag feature vector group; and, in response to determining that image tag matching information satisfying a preset matching condition exists in the image tag matching information group, generating an image classification result based on the image tag matching information satisfying the preset matching condition.
Optionally, generating the image tag matching information group based on the image feature vector and the tag feature vector group includes: determining the similarity between the image feature vector and each tag feature vector in the tag feature vector group as image tag matching information, to obtain the image tag matching information group.
Optionally, extracting the tag features of each category tag in the category tag group to obtain a tag feature vector group includes: performing expansion processing on each category tag in the category tag group to obtain an expanded category tag group; and extracting tag features of each expanded category tag in the expanded category tag group based on a pre-trained image classification model to obtain the tag feature vector group, wherein the image classification model comprises an image feature extractor and a tag feature extractor.
Optionally, the image classification model is obtained through training of the following steps: inputting each sample information in the sample information group into an initial image classification model to generate a sample result pair to obtain a sample result pair set, wherein the sample result pair in the sample result pair set comprises a sample image feature vector and a sample label feature vector; for each sample information, determining a sample loss value according to the distance between the sample image feature vector corresponding to the sample information and the sample label feature vector, the distance between the sample image feature vector corresponding to the sample information and the sample label feature vectors corresponding to the rest of sample information, and the distance between the sample label feature vector corresponding to the sample information and the sample image feature vector corresponding to the rest of sample information; determining a target loss value corresponding to a sample loss value of each sample information in the sample information group; and adjusting related parameters in the initial image classification model in response to determining that the target loss value does not meet a preset training condition.
Optionally, the method further comprises: in response to determining that the generated sample loss value satisfies the training condition, the initial image classification model is determined as an image classification model.
Optionally, the sample information in the sample information group includes a sample image and a sample label, and the initial image classification model includes an initial image feature extractor and an initial label feature extractor; and inputting each sample information in the sample information group into the initial image classification model to generate a sample result pair includes: inputting the sample image in the sample information into the initial image feature extractor to obtain a sample image feature vector; inputting the sample label in the sample information into the initial label feature extractor to obtain a sample label feature vector; and determining the sample image feature vector and the sample label feature vector as a sample result pair.
Optionally, the preset training condition is that the target loss value is greater than or equal to a preset loss threshold.
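The training steps above describe a per-sample loss built from three quantities: the distance between a sample's own image and label feature vectors, the distances from its image feature vector to the other samples' label feature vectors, and the distances from its label feature vector to the other samples' image feature vectors. The source does not give a formula. The following is a minimal sketch that assumes a symmetric contrastive (InfoNCE-style) loss over cosine similarities; the function names, the `temperature` parameter, and the use of the mean as the target loss value are all illustrative assumptions, not the disclosed method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def _cross_entropy(logits, target_index):
    """Negative log-softmax of the target entry, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target_index]


def sample_losses(image_vecs, label_vecs, temperature=0.1):
    """One loss per sample; pair i is (image_vecs[i], label_vecs[i]).

    Assumption: the loss is low when a sample's own pair is closer than
    every cross pair, in both the image-to-label and label-to-image
    directions.
    """
    n = len(image_vecs)
    sims = [[cosine(image_vecs[i], label_vecs[j]) / temperature
             for j in range(n)] for i in range(n)]
    losses = []
    for i in range(n):
        image_to_label = _cross_entropy(sims[i], i)
        label_to_image = _cross_entropy([sims[k][i] for k in range(n)], i)
        losses.append(0.5 * (image_to_label + label_to_image))
    return losses


def target_loss(losses):
    """Assumed aggregation of the sample loss values: their mean."""
    return sum(losses) / len(losses)
```

With correctly paired vectors the target loss is close to zero, while swapping the labels between two samples makes it large, which is the behavior a training loop would use to decide whether to keep adjusting parameters.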
In a second aspect, some embodiments of the present disclosure provide an image classification apparatus, the apparatus comprising: an image feature extraction unit configured to perform image feature extraction on a target image in target image information to obtain an image feature vector, wherein the target image information includes a category tag group and the target image; a tag feature extraction unit configured to perform tag feature extraction on each category tag in the category tag group to obtain a tag feature vector group; a first generation unit configured to generate an image tag matching information group based on the image feature vector and the tag feature vector group; and a second generation unit configured to, in response to determining that image tag matching information satisfying a preset matching condition exists in the image tag matching information group, generate an image classification result based on the image tag matching information satisfying the preset matching condition.
Optionally, the first generation unit is further configured to determine the similarity between the image feature vector and each tag feature vector in the tag feature vector group as image tag matching information, to obtain the image tag matching information group.
Optionally, the tag feature extraction unit is further configured to perform expansion processing on each category tag in the category tag group to obtain an expanded category tag group; and to extract tag features of each expanded category tag in the expanded category tag group based on a pre-trained image classification model to obtain the tag feature vector group, wherein the image classification model comprises an image feature extractor and a tag feature extractor.
Optionally, the image classification model is obtained through training of the following steps: inputting each sample information in the sample information group into an initial image classification model to generate a sample result pair to obtain a sample result pair set, wherein the sample result pair in the sample result pair set comprises a sample image feature vector and a sample label feature vector; for each sample information, determining a sample loss value according to the distance between the sample image feature vector corresponding to the sample information and the sample label feature vector, the distance between the sample image feature vector corresponding to the sample information and the sample label feature vectors corresponding to the rest of sample information, and the distance between the sample label feature vector corresponding to the sample information and the sample image feature vector corresponding to the rest of sample information; determining a target loss value corresponding to a sample loss value of each sample information in the sample information group; and adjusting related parameters in the initial image classification model in response to determining that the target loss value does not meet a preset training condition.
Optionally, the training step further comprises determining the initial image classification model as an image classification model in response to determining that the generated sample loss value satisfies the training condition.
Optionally, the sample information in the sample information group includes a sample image and a sample label, and the initial image classification model includes an initial image feature extractor and an initial label feature extractor; and, in the training step, inputting each sample information in the sample information group into the initial image classification model to generate a sample result pair includes: inputting the sample image in the sample information into the initial image feature extractor to obtain a sample image feature vector; inputting the sample label in the sample information into the initial label feature extractor to obtain a sample label feature vector; and determining the sample image feature vector and the sample label feature vector as a sample result pair.
Optionally, the preset training condition is that the target loss value is greater than or equal to a preset loss threshold.
In a third aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors causes the one or more processors to implement the method described in any of the implementations of the first aspect above.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect above.
The above embodiments of the present disclosure have the following advantageous effects: the image classification method of some embodiments of the present disclosure can improve the accuracy of the image classification result. Specifically, the accuracy of image classification results is lowered because, if the training set of the classification model does not include the class of the image to be classified, the image cannot be accurately classified. Based on this, in the image classification method of some embodiments of the present disclosure, image feature extraction is first performed on a target image in target image information to obtain an image feature vector, wherein the target image information includes a category tag group and the target image. Next, tag feature extraction is performed on each category tag in the category tag group to obtain a tag feature vector group. Extracting the image features of the target image and the tag features of the category tags separately makes it convenient to subsequently classify the image using the image feature vector and the tag feature vectors. Then, an image tag matching information group is generated based on the image feature vector and the tag feature vector group. Finally, in response to determining that image tag matching information satisfying a preset matching condition exists in the image tag matching information group, an image classification result is generated based on that image tag matching information. Through the image tag matching information and the preset matching condition, the required degree of matching between the image feature vector and the tag feature vectors can be flexibly configured according to actual needs. For example, when no matching category tag is found, the matching condition may be adjusted to lower the required matching degree, so that a category tag can be determined for any target image and an image classification result can be generated. This avoids the situation in which an image cannot be accurately classified because the training set of the classification model does not include its class, and thus the accuracy of the image classification result can be improved.
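The idea of adjusting the matching condition when no category tag matches can be illustrated with a short sketch. The source does not say how the condition is adjusted; the version below assumes the condition is a similarity threshold that is lowered in fixed steps until some tag qualifies. The function name and the `threshold`, `step`, and `floor` parameters are hypothetical.

```python
def classify_with_relaxation(category_tags, similarities,
                             threshold=0.8, step=0.1, floor=0.0):
    """Return (best_tag, threshold_used), lowering the threshold if needed.

    Assumption: a tag matches when its similarity is strictly greater
    than the current threshold. If nothing matches, the threshold is
    reduced by `step`, never going below `floor`.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    while True:
        matches = [(s, t) for t, s in zip(category_tags, similarities)
                   if s > threshold]
        if matches:
            best_similarity, best_tag = max(matches, key=lambda pair: pair[0])
            return best_tag, threshold
        if threshold <= floor:
            # Nothing exceeds even the lowest allowed threshold.
            return None, threshold
        threshold = max(floor, threshold - step)
```

For instance, with similarities of 0.55 and 0.30 and a starting threshold of 0.8, the threshold is relaxed until the first tag qualifies. The `floor` guards against returning a tag that is not similar to the image at all.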
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a schematic illustration of one application scenario of an image classification method of some embodiments of the present disclosure;
FIG. 2 is a flow chart of some embodiments of an image classification method according to the present disclosure;
FIG. 3 is a flow chart of other embodiments of an image classification method according to the present disclosure;
FIG. 4 is a schematic illustration of model training in some embodiments of an image classification method according to the present disclosure;
FIG. 5 is a schematic structural view of some embodiments of an image classification apparatus according to the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that the modifiers "one" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 is a schematic diagram of one application scenario of an image classification method of some embodiments of the present disclosure.
In the application scenario of FIG. 1, first, the computing device 101 may perform image feature extraction on the target image 102 in the target image information to obtain the image feature vector 103, where the target image information includes the category tag group 104 and the target image 102. For example, the target image 102 may be an image of a toy car. The computing device 101 may then perform tag feature extraction on each of the category tags in the category tag group 104 to obtain a tag feature vector group 105. For example, the category tags may include, but are not limited to, at least one of: toy car, remote control car, cell phone, etc. Thereafter, the computing device 101 may generate an image tag matching information group 106 based on the image feature vector 103 and the tag feature vector group 105. Finally, the computing device 101 may, in response to determining that image tag matching information satisfying the preset matching condition exists in the image tag matching information group 106, generate the image classification result 107 based on that image tag matching information. For example, the image classification result 107 may be: "category tag: toy car".
The computing device 101 may be hardware or software. When the computing device is hardware, it may be implemented as a distributed cluster formed by a plurality of servers or terminal devices, or as a single server or a single terminal device. When the computing device is software, it may be installed in the hardware devices listed above, and may be implemented as a plurality of pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
It should be understood that the number of computing devices in fig. 1 is merely illustrative. There may be any number of computing devices, as desired for an implementation.
With continued reference to fig. 2, a flow 200 of some embodiments of an image classification method according to the present disclosure is shown. The image classification method flow 200 includes the steps of:
Step 201, extracting image features of the target image in the target image information to obtain an image feature vector.
In some embodiments, the execution body of the image classification method (such as the computing device 101 shown in FIG. 1) may perform image feature extraction on the target image in the target image information to obtain an image feature vector. The target image information may be image information that requires image classification, and may include a category tag group and the target image. The target image may be an image to be classified, and each category tag in the category tag group may be object information in the image. The image features of the target image may be extracted through a feature extraction algorithm to obtain the image feature vector.
As an example, the feature extraction algorithm described above may include, but is not limited to, at least one of: a residual network (ResNet) model, a VGG (Visual Geometry Group network, a convolutional neural network) model, a GoogLeNet (a deep neural network) model, and the like.
Step 202, extracting tag features of each category tag in the category tag group to obtain a tag feature vector group.
In some embodiments, the execution body may perform tag feature extraction on each category tag in the category tag group to obtain a tag feature vector group. The tag features of each category tag in the category tag group may be extracted through a tag feature extraction algorithm.
As an example, the tag feature extraction algorithm described above may include, but is not limited to, at least one of: a Transformer, one-hot encoding, etc.
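Of the tag feature extraction options named above, one-hot encoding is the simplest to show. The following is a minimal sketch, assuming the vocabulary is just the set of tags in the category tag group; the function name and the example tags are illustrative.

```python
def one_hot_tag_vectors(category_tags):
    """Encode each tag as a one-hot vector over the sorted tag vocabulary."""
    vocabulary = sorted(set(category_tags))
    index = {tag: i for i, tag in enumerate(vocabulary)}
    vectors = []
    for tag in category_tags:
        vec = [0.0] * len(vocabulary)
        vec[index[tag]] = 1.0  # Single active position identifies the tag.
        vectors.append(vec)
    return vectors


tag_group = ["toy car", "remote control car", "cell phone"]
tag_vectors = one_hot_tag_vectors(tag_group)
```

One-hot vectors only identify a tag and carry no semantic information, so matching them against image feature vectors would presumably need a learned extractor (such as the Transformer option) that maps tags and images into a shared feature space.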
Step 203, generating an image tag matching information group based on the image feature vector and the tag feature vector group.
In some embodiments, the execution body may generate the image tag matching information group based on the image feature vector and the tag feature vector group. First, a tag feature vector matched with the image feature vector is selected from the tag feature vector group through a vector matching algorithm, so as to obtain a matched tag feature vector group and a corresponding image tag similarity group. Then, each matching tag feature vector in the matching tag feature vector set and the corresponding image tag similarity can be determined as image tag matching information, and an image tag matching information set is obtained.
As an example, the vector matching algorithm described above may include, but is not limited to, at least one of: VSM (Vector Space Model), BPDN-VF (an efficient vector matching algorithm), etc.
In some optional implementations of some embodiments, generating the image tag matching information group based on the image feature vector and the tag feature vector group may include the following step:
Determining the similarity between the image feature vector and each tag feature vector in the tag feature vector group as image tag matching information, to obtain the image tag matching information group. The image tag similarity between the image feature vector and each tag feature vector may be determined by a similarity algorithm. Each image tag similarity may then be used as image tag matching information to obtain the image tag matching information group.
As an example, the similarity algorithm described above may include, but is not limited to, at least one of: cosine similarity, Euclidean distance, Manhattan distance, etc.
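The three measures listed above can be sketched in a few lines. This is an illustrative sketch only; the function names and the choice of cosine similarity as the default are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either is a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences between u and v."""
    return sum(abs(a - b) for a, b in zip(u, v))


def image_tag_matching_group(image_vec, tag_vecs, measure=cosine_similarity):
    """One matching value per tag feature vector, in the same order."""
    return [measure(image_vec, tag_vec) for tag_vec in tag_vecs]
```

Note that cosine similarity grows as vectors become more alike, whereas the two distances shrink. If a distance is used, a "greater than threshold" matching condition would need to be inverted, or the distance converted into a similarity first.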
Step 204, in response to determining that image tag matching information satisfying a preset matching condition exists in the image tag matching information group, generating an image classification result based on the image tag matching information satisfying the preset matching condition.
In some embodiments, the execution body may, in response to determining that image tag matching information satisfying the preset matching condition exists in the image tag matching information group, generate the image classification result based on the image tag matching information satisfying the preset matching condition. The preset matching condition may be that the image tag similarity included in the image tag matching information is greater than a preset similarity threshold. The image tag matching information with the largest similarity value may be selected from the image tag matching information satisfying the preset matching condition as target image tag matching information, and the category tag corresponding to the target image tag matching information may be determined as the image classification result.
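The selection logic of this step (filter by threshold, then take the largest similarity) can be sketched as follows. The function name, the default threshold of 0.5, and the example values are assumptions for illustration.

```python
def classify(category_tags, similarities, threshold=0.5):
    """Return the tag with the highest similarity above the threshold.

    Returns None when no image tag matching information satisfies the
    preset matching condition (similarity strictly greater than threshold).
    """
    candidates = [(s, t) for t, s in zip(category_tags, similarities)
                  if s > threshold]
    if not candidates:
        return None
    best_similarity, best_tag = max(candidates, key=lambda pair: pair[0])
    return best_tag


tags = ["toy car", "remote control car", "cell phone"]
result = classify(tags, [0.82, 0.64, 0.11])
```

Here two tags exceed the threshold, and the one with the larger similarity is returned as the image classification result.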
With further reference to fig. 3, a flow 300 of further embodiments of an image classification method is shown. The image classification method flow 300 comprises the steps of:
Step 301, extracting image features of the target image in the target image information to obtain an image feature vector.
In some embodiments, the specific implementation manner and the technical effects of step 301 may refer to step 201 in those embodiments corresponding to fig. 2, which are not described herein.
Step 302, performing expansion processing on each category label in the category label group to obtain an expanded category label group.
In some embodiments, the execution body may expand each category label (typically, a keyword) in the category label group into a descriptive sentence to obtain an expanded category label group. The expanded category label therefore carries more information than the original category label. The expansion processing may be performed on each category label in the category label group through a preset template or a pre-trained sentence generation network. The preset template may be, for example: "this is a picture of [category label]". The expansion processing enriches the category label into a sentence and thereby increases its length. For example, the category label may be: "toy vehicle". The resulting expanded category label may be: "this is a picture of a toy vehicle".
In practice, the expansion processing increases the sentence length of the category label and thereby the amount of information that the category label carries.
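As an example, template-based expansion processing may be sketched as follows. The template wording follows the "toy vehicle" example above. The names `TEMPLATE` and `expand_labels` are hypothetical, and the sketch does not cover the alternative of a pre-trained sentence generation network.

```python
from typing import List, Sequence

# Assumed template wording, modeled on the example in the text.
TEMPLATE = "this is a picture of a {label}"


def expand_labels(labels: Sequence[str], template: str = TEMPLATE) -> List[str]:
    """Expand each keyword-style category label into a descriptive sentence."""
    return [template.format(label=label.strip()) for label in labels]
```

For instance, `expand_labels(["toy vehicle"])` yields a one-element list containing the sentence "this is a picture of a toy vehicle".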
Step 303, extracting tag features of each extended category tag in the extended category tag group based on a pre-trained image classification model to obtain a tag feature vector group.
In some embodiments, the executing body may perform label feature extraction on each of the extended category labels in the extended category label set based on a pre-trained image classification model, to obtain a label feature vector set. The image classification model may include an image feature extractor and a label feature extractor. The extended class labels may be input to the label feature extractor described above to obtain label feature vectors. The image feature extractor may be configured to perform image feature extraction on the target image to generate an image feature vector.
As an example, the image feature extractor may include, but is not limited to, at least one of: a VGG (Visual Geometry Group) convolutional neural network, a ResNet (residual neural network) model, and the like. The tag feature extractor may include, but is not limited to, at least one of: a word2vec (word vector) model, an LSA (Latent Semantic Analysis) model, and the like.
In some optional implementations of some embodiments, the image classification model is trained by:
First step, each sample information in the sample information group is input to an initial image classification model to generate a sample result pair, and a sample result pair set is obtained. The sample result pairs in the sample result pair set may include a sample image feature vector and a sample label feature vector. In practice, the sample image feature vector and the sample label feature vector in the sample result pair may be the same length.
Second step, for each sample information, a sample loss value is determined according to: the distance between the sample image feature vector and the sample label feature vector corresponding to the sample information; the distance between the sample image feature vector corresponding to the sample information and the sample label feature vectors corresponding to the remaining sample information; and the distance between the sample label feature vector corresponding to the sample information and the sample image feature vectors corresponding to the remaining sample information. The distance between the sample image feature vector and the sample label feature vector corresponding to the sample information may be: the exponential value of the product of the sample image feature vector and the sample label feature vector. The distance between the sample image feature vector corresponding to the sample information and the sample label feature vectors corresponding to the remaining sample information may be: the sum of the exponential values of the products of that sample image feature vector and the sample label feature vector corresponding to each remaining sample information. The distance between the sample label feature vector corresponding to the sample information and the sample image feature vectors corresponding to the remaining sample information may be: the sum of the exponential values of the products of that sample label feature vector and the sample image feature vector corresponding to each remaining sample information. The exponential value may be obtained by an exponential function. The sample loss value for each sample may be determined by a preset loss function. For example, the loss function may include, but is not limited to, at least one of: a cross entropy loss function, a maximum likelihood function, and the like.
As an example, if the loss function is a cross entropy loss function, the sample loss value may be generated as follows. First, the negative of the logarithm of the ratio of (i) the distance between the sample image feature vector and the sample label feature vector corresponding to the sample information to (ii) the distance between that sample image feature vector and the sample label feature vectors corresponding to the remaining sample information may be determined as a first parameter. Then, the negative of the logarithm of the ratio of (i) the distance between the sample image feature vector and the sample label feature vector corresponding to the sample information to (ii) the distance between that sample label feature vector and the sample image feature vectors corresponding to the remaining sample information may be determined as a second parameter. Finally, the sum of the first parameter and the second parameter may be determined as the sample loss value.
In practice, training the model in this way brings the sample image feature vector and the sample label feature vector that the image classification model generates for the same sample information close to each other, while pushing the sample image feature vector of a given sample information away from the sample label feature vectors of the other sample information, and the sample label feature vector of a given sample information away from the sample image feature vectors of the other sample information. This can improve the degree of linear correlation between an image feature vector generated by the image classification model and its corresponding label feature vector, so that the category of an image can be determined more accurately when the image is classified.
As an example, as shown in fig. 4, a sample image 401 is input to an image feature extractor 402, resulting in a sample image feature vector 403. Each sample tag 404 is input to a tag feature extractor 405, resulting in a sample tag feature vector set 406.
Third step, a target loss value is determined from the sample loss values of the sample information in the sample information group. For example, the average value of the sample loss values of the respective sample information may be determined as the target loss value.
Fourth step, in response to determining that the target loss value does not meet the preset training condition, related parameters in the initial image classification model are adjusted. The training condition may be that a standard deviation between the target loss value and each of the historical loss values in a historical loss value group is less than a preset standard deviation threshold. The historical loss value group may store the target loss values obtained in previous training iterations. A standard deviation less than the preset standard deviation threshold may indicate that model training has converged.
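As an illustrative example only, the sample loss value of the second training step, in its cross entropy form, may be sketched as follows. The sketch reads the text literally: the "distance" is the exponential of the inner product of two vectors (so a larger value indicates greater similarity), and each denominator sums over the remaining sample information only. Taking the "product" to be the inner product is an assumption, and the names `dot` and `sample_losses` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors (assumed meaning of 'product')."""
    return sum(x * y for x, y in zip(a, b))


def sample_losses(image_vecs: Sequence[Vector], label_vecs: Sequence[Vector]) -> List[float]:
    """Compute one loss value per sample from paired image and label feature vectors."""
    n = len(image_vecs)
    if n < 2 or n != len(label_vecs):
        raise ValueError("need at least two samples and equally many label vectors")

    losses = []
    for i in range(n):
        # Distance between the sample's own image vector and label vector.
        positive = math.exp(dot(image_vecs[i], label_vecs[i]))
        # Distance from this image vector to the label vectors of the remaining samples.
        image_to_other_labels = sum(
            math.exp(dot(image_vecs[i], label_vecs[j])) for j in range(n) if j != i
        )
        # Distance from this label vector to the image vectors of the remaining samples.
        label_to_other_images = sum(
            math.exp(dot(label_vecs[i], image_vecs[j])) for j in range(n) if j != i
        )
        first_parameter = -math.log(positive / image_to_other_labels)
        second_parameter = -math.log(positive / label_to_other_images)
        losses.append(first_parameter + second_parameter)
    return losses
```

Because the denominators exclude the sample's own pair, a value computed this way can be negative for well-aligned pairs. Only the relative ordering matters for training: matched image and label vectors yield a lower value than mismatched ones.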
In some optional implementations of some embodiments, the preset training condition may further be that the target loss value is greater than or equal to a preset loss threshold.
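As an example, the averaging of the third training step and the standard-deviation-based training condition of the fourth training step may be sketched as follows. Interpreting the condition as the population standard deviation over the historical loss values together with the current target loss value is an assumption, and the names `target_loss` and `has_converged` are hypothetical.

```python
import statistics
from typing import Sequence


def target_loss(sample_loss_values: Sequence[float]) -> float:
    """Average the per-sample loss values into a single target loss value."""
    return statistics.fmean(sample_loss_values)


def has_converged(target: float, history: Sequence[float], std_threshold: float) -> bool:
    """Check whether the target loss has stabilized relative to earlier target losses."""
    if not history:
        # Without any historical loss value, convergence cannot be assessed.
        return False
    spread = statistics.pstdev([*history, target])
    return spread < std_threshold
```

When `has_converged` returns `False`, the related parameters of the initial image classification model would be adjusted and training would continue.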
In some optional implementations of some embodiments, the sample information in the set of sample information may include a sample image and a sample label, and the initial image classification model may include an initial image feature extractor and an initial label feature extractor. Inputting each sample information in the sample information set to the initial image classification model to generate a sample result pair, resulting in a sample result pair set, may comprise the steps of:
First step, the sample image in the sample information is input to the initial image feature extractor to obtain a sample image feature vector. For example, the sample image may be an image of a "toy vehicle".
Second step, the sample label in the sample information is input to the initial label feature extractor to obtain a sample label feature vector. For example, the sample label may be: "this is an image of a toy vehicle".
Third step, the sample image feature vector and the sample label feature vector are determined as a sample result pair.
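As an example, the three steps above may be sketched as follows, with the two extractors passed in as plain callables. The function name `build_sample_result_pairs` is hypothetical, and the equal-length check reflects the earlier statement that the two vectors in a sample result pair may be the same length.

```python
from typing import Any, Callable, List, Sequence, Tuple

Vector = List[float]


def build_sample_result_pairs(
    samples: Sequence[Tuple[Any, str]],
    image_extractor: Callable[[Any], Vector],
    label_extractor: Callable[[str], Vector],
) -> List[Tuple[Vector, Vector]]:
    """Run both extractors on every (sample image, sample label) and pair the results."""
    pairs = []
    for sample_image, sample_label in samples:
        image_vec = image_extractor(sample_image)
        label_vec = label_extractor(sample_label)
        if len(image_vec) != len(label_vec):
            raise ValueError("image and label feature vectors must have the same length")
        pairs.append((image_vec, label_vec))
    return pairs
```

In an actual implementation the callables would be the initial image feature extractor and the initial label feature extractor of the initial image classification model.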
Step 304, generating an image tag matching information group based on the image feature vector and the tag feature vector group.
Step 305, in response to determining that image tag matching information satisfying the preset matching condition exists in the image tag matching information group, generating an image classification result based on the image tag matching information satisfying the preset matching condition.
In some embodiments, the specific implementation manner of steps 304-305 and the technical effects thereof may refer to steps 203-204 in those embodiments corresponding to fig. 2, which are not described herein.
As can be seen from fig. 3, compared with the description of some embodiments corresponding to fig. 2, the flow 300 of the image classification method in some embodiments corresponding to fig. 3 highlights the step of expanding category labels: ordinary keywords are expanded into descriptive sentences, which increases the information content of the labels. Thereby, the probability of matching the target image to a category label can be increased. Further, the accuracy of the image classification result can be improved.
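As an illustrative example only, step 304 may be sketched for the optional implementation in which the similarity between the image feature vector and each tag feature vector serves as the image tag matching information. The disclosure does not fix a particular similarity measure. Cosine similarity is used here as an assumption, and the names `cosine_similarity` and `matching_info_group` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def matching_info_group(image_vec: Vector, tag_vecs: Sequence[Vector]) -> List[float]:
    """One similarity score per tag feature vector, in the order of the tag group."""
    return [cosine_similarity(image_vec, tag_vec) for tag_vec in tag_vecs]
```

The resulting list has one entry per category tag, so the entry that satisfies the preset matching condition identifies the category tag used for the image classification result.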
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present disclosure provides embodiments of an image classification apparatus, which correspond to those method embodiments shown in fig. 2, and which are particularly applicable in various electronic devices.
As shown in fig. 5, the image classification apparatus 500 of some embodiments includes: an image feature extraction unit 501, a tag feature extraction unit 502, a first generation unit 503, and a second generation unit 504. Wherein, the image feature extraction unit 501 is configured to perform image feature extraction on a target image in target image information, so as to obtain an image feature vector, wherein the target image information includes a category tag group and the target image; a tag feature extraction unit 502 configured to extract tag features of each category tag in the category tag group to obtain a tag feature vector group; a first generating unit 503 configured to generate an image tag matching information group based on the image feature vector and the tag feature vector group; the second generating unit 504 is configured to generate, in response to determining that image tag matching information satisfying a preset matching condition exists in the image tag matching information group, an image classification result based on the image tag matching information satisfying the preset matching condition.
In an alternative implementation manner of some embodiments, the first generating unit 503 is further configured to determine a similarity between the image feature vector and each tag feature vector in the tag feature vector set as image tag matching information, to obtain an image tag matching information set.
In an optional implementation manner of some embodiments, the tag feature extraction unit 502 is further configured to perform an expansion process on each category tag in the category tag group to obtain an expanded category tag group; and extracting tag features of each extended category tag in the extended category tag group based on a pre-trained image classification model to obtain a tag feature vector group, wherein the image classification model comprises an image feature extractor and a tag feature extractor.
In an alternative implementation of some embodiments, the image classification model may be obtained through training: inputting each sample information in the sample information group into an initial image classification model to generate a sample result pair to obtain a sample result pair set, wherein the sample result pair in the sample result pair set comprises a sample image feature vector and a sample label feature vector; for each sample information, determining a sample loss value according to the distance between the sample image feature vector corresponding to the sample information and the sample label feature vector, the distance between the sample image feature vector corresponding to the sample information and the sample label feature vectors corresponding to the rest of sample information, and the distance between the sample label feature vector corresponding to the sample information and the sample image feature vector corresponding to the rest of sample information; determining a target loss value corresponding to a sample loss value of each sample information in the sample information group; and adjusting related parameters in the initial image classification model in response to determining that the target loss value does not meet a preset training condition.
In an alternative implementation of some embodiments, the training step may further include determining the initial image classification model as an image classification model in response to determining that the generated sample loss value satisfies the training condition.
In an optional implementation of some embodiments, the sample information in the sample information set includes a sample image and a sample label, and the initial image classification model includes an initial image feature extractor and an initial label feature extractor; and inputting each sample information in the preset sample information group to the initial image classification model in the training step to generate a sample result pair, which may include: inputting the sample image in the sample information into the initial image feature extractor to obtain a sample image feature vector; inputting the sample label in the sample information into the initial label feature extractor to obtain a sample label feature vector; and determining the sample image feature vector and the sample label feature vector as a sample result pair.
In an optional implementation manner of some embodiments, the preset training condition is that the target loss value is greater than or equal to a preset loss threshold.
It will be appreciated that the elements described in the apparatus 500 correspond to the various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting benefits described above with respect to the method are equally applicable to the apparatus 500 and the units contained therein, and are not described in detail herein.
Referring now to fig. 6, a schematic diagram of an electronic device 600 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is merely an example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 shows an electronic device 600 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 6 may represent one device or a plurality of devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communications device 609, or from storage device 608, or from ROM 602. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing device 601.
It should be noted that, the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: extracting image features of a target image in target image information to obtain an image feature vector, wherein the target image information comprises a category label group and the target image; extracting tag features of each category tag in the category tag group to obtain a tag feature vector group; generating an image tag matching information group based on the image feature vector and the tag feature vector group; and generating an image classification result based on the image tag matching information meeting the preset matching condition in response to determining that the image tag matching information meeting the preset matching condition exists in the image tag matching information group.
Computer program code for carrying out operations for some embodiments of the present disclosure may be written in one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The described units may also be provided in a processor, for example, described as: a processor includes an image feature extraction unit, a tag feature extraction unit, a first generation unit, and a second generation unit. The names of these units do not constitute limitations on the unit itself in some cases, and for example, the image feature extraction unit may also be described as "a unit that extracts an image feature vector".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
Claims (10)
1. An image classification method, comprising:
extracting image features of a target image in target image information to obtain an image feature vector, wherein the target image information comprises a category label group and the target image;
extracting tag features of each category tag in the category tag group to obtain a tag feature vector group;
generating an image tag matching information group based on the image feature vector and the tag feature vector group;
and generating an image classification result based on the image tag matching information meeting the preset matching condition in response to determining that the image tag matching information meeting the preset matching condition exists in the image tag matching information group.
2. The method of claim 1, wherein the generating a set of image tag matching information based on the image feature vector and the set of tag feature vectors comprises:
and determining the similarity between the image feature vector and each tag feature vector in the tag feature vector group as image tag matching information to obtain an image tag matching information group.
3. The method of claim 1, wherein the extracting tag features for each category tag in the category tag set to obtain a tag feature vector set includes:
performing expansion processing on each category label in the category label group to obtain an expanded category label group;
and extracting tag features of each extended category tag in the extended category tag group based on a pre-trained image classification model to obtain a tag feature vector group, wherein the image classification model comprises an image feature extractor and a tag feature extractor.
4. A method according to claim 3, wherein the image classification model is trained by:
inputting each sample information in a sample information group into an initial image classification model to generate a sample result pair to obtain a sample result pair set, wherein the sample result pair in the sample result pair set comprises a sample image feature vector and a sample label feature vector;
for each sample information, determining a sample loss value according to the distance between a sample image feature vector corresponding to the sample information and a sample label feature vector, the distance between the sample image feature vector corresponding to the sample information and sample label feature vectors corresponding to the rest of sample information, and the distance between the sample label feature vector corresponding to the sample information and the sample image feature vector corresponding to the rest of sample information;
determining a target loss value corresponding to a sample loss value of each sample information in the sample information set;
and adjusting related parameters in the initial image classification model in response to determining that the target loss value does not meet a preset training condition.
5. The method of claim 4, wherein the method further comprises:
in response to determining that the generated sample loss value satisfies the training condition, determining the initial image classification model as the image classification model.
6. The method of claim 4, wherein the sample information in the set of sample information comprises a sample image and a sample label, the initial image classification model comprising an initial image feature extractor and an initial label feature extractor; and
the inputting each sample information in the preset sample information set to the initial image classification model to generate a sample result pair includes:
inputting the sample image in the sample information to the initial image feature extractor to obtain a sample image feature vector;
inputting the sample label in the sample information to the initial label feature extractor to obtain a sample label feature vector;
and determining the sample image feature vector and the sample label feature vector as a sample result pair.
7. The method of claim 4, wherein the preset training condition is that the target loss value is greater than or equal to a preset loss threshold.
8. An image classification apparatus comprising:
an image feature extraction unit configured to perform image feature extraction on a target image in target image information to obtain an image feature vector, wherein the target image information includes a category tag group and the target image;
a tag feature extraction unit configured to extract tag features of each category tag in the category tag group to obtain a tag feature vector group;
a first generation unit configured to generate an image tag matching information group based on the image feature vector and the tag feature vector group;
and a second generation unit configured to generate an image classification result based on the image tag matching information satisfying a preset matching condition in response to determining that the image tag matching information satisfying the preset matching condition exists in the image tag matching information group.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210923199.XA CN117557822A (en) | 2022-08-02 | 2022-08-02 | Image classification method, apparatus, electronic device, and computer-readable medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117557822A true CN117557822A (en) | 2024-02-13 |
Family
ID=89811581
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210923199.XA Pending CN117557822A (en) | 2022-08-02 | 2022-08-02 | Image classification method, apparatus, electronic device, and computer-readable medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117557822A (en) |
- 2022-08-02: CN application CN202210923199.XA, publication CN117557822A (en), status: active, Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111444340B (en) | Text classification method, device, equipment and storage medium | |
CN111523640B (en) | Training method and device for neural network model | |
CN111428010A (en) | Man-machine intelligent question and answer method and device | |
CN112650841A (en) | Information processing method and device and electronic equipment | |
CN113408507B (en) | Named entity identification method and device based on resume file and electronic equipment | |
CN113449070A (en) | Multimodal data retrieval method, device, medium and electronic equipment | |
CN112182255A (en) | Method and apparatus for storing media files and for retrieving media files | |
CN115578570A (en) | Image processing method, device, readable medium and electronic equipment | |
CN116128055A (en) | Map construction method, map construction device, electronic equipment and computer readable medium | |
CN113591490B (en) | Information processing method and device and electronic equipment | |
CN111008213A (en) | Method and apparatus for generating language conversion model | |
CN114765025A (en) | Method for generating and recognizing speech recognition model, device, medium and equipment | |
CN111797263A (en) | Image label generation method, device, equipment and computer readable medium | |
CN112633004A (en) | Text punctuation deletion method and device, electronic equipment and storage medium | |
CN114970470B (en) | Method and device for processing file information, electronic equipment and computer readable medium | |
CN116843991A (en) | Model training method, information generating method, device, equipment and medium | |
CN112651231B (en) | Spoken language information processing method and device and electronic equipment | |
CN111460214B (en) | Classification model training method, audio classification method, device, medium and equipment | |
CN114625876A (en) | Method for generating author characteristic model, method and device for processing author information | |
CN113986958A (en) | Text information conversion method and device, readable medium and electronic equipment | |
CN117557822A (en) | Image classification method, apparatus, electronic device, and computer-readable medium | |
CN111581455A (en) | Text generation model generation method and device and electronic equipment | |
CN117172220B (en) | Text similarity information generation method, device, equipment and computer readable medium | |
CN113656573B (en) | Text information generation method, device and terminal equipment | |
CN115661238B (en) | Method and device for generating travelable region, electronic equipment and computer readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||