CN111191691B - Fine granularity image classification method based on deep user click characteristics of part-of-speech decomposition - Google Patents
- Publication number: CN111191691B
- Application number: CN201911296150.0A
- Authority: CN (China)
- Prior art keywords: click, speech, idf, image, word
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F18/241: Pattern recognition; Analysing; Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045: Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
- G06N3/08: Computing arrangements based on biological models; Neural networks; Learning methods
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a fine-grained image classification method based on deep user click features with part-of-speech decomposition. The method first takes user click data obtained from the Internet and applies natural language processing techniques such as word segmentation, stemming, and stop-word removal to obtain words together with their parts of speech. Suitable keywords are then selected from the obtained words according to part of speech, and term frequency-inverse document frequency (TF-IDF) features are computed from the selected keywords and their frequencies. The feature vectors obtained in this way are fused into a feature tensor, and a network constructed specifically for this feature is used for classification. The invention achieves high accuracy while effectively addressing the semantic gap problem that traditional methods cannot overcome. Because the network architecture is small and easy to deploy, the method is also well suited to practical production use. The method achieves excellent results on the Clickture-Dog dataset.
Description
Technical Field
The invention belongs to the field of fine-grained image categorization (FGIC). Unlike traditional methods that classify images by visual features, the method classifies images using text data, a modality different from the image itself, and constructs an end-to-end deep neural network. The invention meets high-accuracy classification requirements using only user click data acquired from the Internet, without the complex visual features used by traditional methods.
Background
Fine-grained image classification is a classic computer vision task. Unlike conventional classification, its goal is to distinguish different subcategories within the same species. The task is very challenging because the differences between subcategories are subtle, while images within the same subcategory are affected by factors such as lighting, background, and occlusion. In real life there is also a strong need to identify subcategories of different species. In ecological protection, for example, effectively identifying different species of organisms is an important prerequisite for ecological research. Achieving low-cost fine-grained image recognition with computer vision techniques would therefore be of great significance to both academia and industry.
In terms of feature construction, fine-grained image classification has evolved from manual feature engineering, to multi-stage classification, and then to end-to-end learning. In terms of research approach, it has evolved from using only image data, to incorporating additional annotation information, to using only other types of data such as text. Because fine-grained classification involves large intra-class differences and subtle inter-class differences, traditional manual feature engineering cannot achieve satisfactory results. The development of deep learning in recent years has brought great opportunities to fine-grained classification, and the emergence of a large number of deep neural network models has driven rapid progress in the field.
One goal of classifying fine-grained images with user click data is to address the semantic gap problem. The semantic gap is a weakness of existing algorithms that classify on visual features: one can forge an image that is meaningless from a human perspective but that a computer is likely to treat as meaningful data. Furthermore, user click data is text, which is easier to store than images, and in actual production, classification models based on text data are easier to deploy than image-based ones. The rapid development of natural language processing (NLP) technology also supports text-based fine-grained image classification. These are distinctive advantages of text data and can be used to address the challenges of conventional fine-grained image classification based on visual features.
In existing methods that construct image features from click data, an image is usually represented as its vector of click counts in the query text space. Since the text in click data consists of one or more words, feature construction from click data generally falls into two types: methods based on query texts (the original text space) and methods based on query keywords (the words obtained by segmenting the text). The key problem of query-text-based methods is that the number of query texts is huge, so the user click features are too sparse and too high-dimensional, which hinders the extraction of deep features. Keyword-based methods not only solve these problems but also take the inherent relations between words into account. The present invention is likewise based on query keywords.
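The contrast between the two feature spaces can be illustrated with a small sketch. The click records below are hypothetical, and whitespace splitting stands in for real word segmentation; the point is only that indexing clicks by word gives fewer dimensions and more overlap between images than indexing by whole query text.

```python
from collections import Counter

# Hypothetical click records: (image_id, query_text, click_count).
clicks = [
    ("img1", "cute husky", 3),
    ("img1", "husky puppy", 2),
    ("img1", "cute husky puppy", 1),
    ("img2", "cute puppy", 4),
    ("img2", "golden puppy", 2),
    ("img2", "cute golden puppy", 1),
]

def query_features(records):
    """Click vector per image, indexed by whole query texts."""
    feats = {}
    for image, query, count in records:
        feats.setdefault(image, Counter())[query] += count
    return feats

def keyword_features(records):
    """Click vector per image, indexed by individual words."""
    feats = {}
    for image, query, count in records:
        for word in query.split():  # naive segmentation (assumption)
            feats.setdefault(image, Counter())[word] += count
    return feats

# Dimensionality of each feature space.
query_dims = {q for _, q, _ in clicks}
word_dims = {w for _, q, _ in clicks for w in q.split()}
```

In this toy data the two images share no query text, so their query-space vectors have no common nonzero dimension, while their keyword-space vectors overlap on shared words.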
Disclosure of Invention
As noted above, the invention follows the keyword-based approach to click feature construction and provides a fine-grained image classification method based on deep user click features with part-of-speech decomposition (fine-grained image classification with Factorized Deep Click feature, FDC). The method first takes user click data obtained from the Internet and applies natural language processing techniques such as word segmentation, stemming, and stop-word removal to obtain words together with their parts of speech. Suitable keywords are then selected from the obtained words according to part of speech, and term frequency-inverse document frequency (TF-IDF) features are computed from the selected keywords and their frequencies. The feature vectors obtained in this way are fused into a feature tensor, and a network constructed specifically for this feature is used for classification. The method mainly comprises the following steps:
Step (1): Construct the part-of-speech dictionary set
First, natural language processing techniques such as word segmentation, stemming, and part-of-speech tagging are applied to the user query texts of the training images. Several parts of speech are selected, and for the word set of each part of speech, the words with the highest click counts are selected to construct a dictionary. Next, a number of keywords are extracted from each dictionary to build the corpus used to generate TF-IDF features.
Step (2): Construct the TF-IDF click tensor of the image
First, for each part of speech, the image is represented as a TF-IDF click vector using the TF-IDF algorithm, the corresponding word corpus, and the user click data. Second, the TF-IDF click vectors under the different parts of speech are combined into a TF-IDF click tensor using an outer product operation.
Step (3): Train the deep click network
A relatively shallow click neural network classification model is built from convolutional layers and fully connected layers, with each convolutional layer followed by an activation function and a batch normalization (BN) layer. The network is trained using stochastic gradient descent.
Step (4): Fine-grained image classification based on the deep click network
The operations of steps (1)-(3) are applied to the image to extract its deep click feature vector, thereby achieving fine-grained image classification based on deep click features.
Further, the part-of-speech dictionary set of step (1) is constructed as follows:
1-1. A click dataset comprising n images and m' query texts is used. Any piece of text click data is written $(x_i, q_j, c_{i,j})$, where $x_i$, $q_j$, and $c_{i,j}$ are the image, the query text, and the number of clicks between that image and that text, respectively. Using word segmentation, any piece of text click data $(x_i, q_j, c_{i,j})$ is converted into the following word click set:
$$\big((x_i, w_{i,j,1}, c_{i,j}),\ (x_i, w_{i,j,2}, c_{i,j}),\ (x_i, w_{i,j,3}, c_{i,j}),\ \dots\big) \qquad \text{formula (1)}$$
1-2. The operation of formula (1) is applied to all text click data, yielding a word click set composed of triples $(x_i, w_{i,j,k}, c_{i,j})$. Lemmatization, merging of repeated words, and part-of-speech tagging are performed in turn on the words in this set, giving a click matrix C' composed of images, words, and the corresponding image-word click counts. The words in C' are divided into M mutually disjoint sets according to part of speech. For the m-th part-of-speech set, the top $\rho_m$ words with the most clicks are selected to form the m-th part-of-speech dictionary.
[The defining expression for the m-th dictionary is not preserved in the source text.]
The words in the m-th dictionary constitute the corpus required to generate TF-IDF features under the m-th part of speech, and the j-th word of the dictionary is identified by the index j.
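Steps 1-1 and 1-2 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the click records, the `LEMMAS` and `POS_TAGS` lookup tables, and the `top_k` values are all hypothetical stand-ins for a real segmenter, lemmatizer, part-of-speech tagger, and the $\rho_m$ settings.

```python
from collections import Counter, defaultdict

# Hypothetical click records (image x_i, query q_j, click count c_ij).
CLICKS = [
    ("img1", "cute husky puppies", 5),
    ("img1", "white husky", 2),
    ("img2", "cute golden retriever", 4),
    ("img2", "running retriever puppy", 1),
]

# Toy stand-ins for a real lemmatizer, tagger, and stop list (assumptions).
LEMMAS = {"puppies": "puppy"}
POS_TAGS = {"cute": "ADJ", "white": "ADJ", "golden": "ADJ",
            "husky": "NOUN", "puppy": "NOUN", "retriever": "NOUN",
            "running": "VERB"}
STOP_WORDS = {"the", "a", "of"}

def word_clicks(records):
    """Formula (1): expand each (image, query, count) into (image, word, count)."""
    for image, query, count in records:
        for token in query.lower().split():
            word = LEMMAS.get(token, token)
            if word not in STOP_WORDS:
                yield image, word, count

def build_click_matrix(records):
    """Merge repeated words: matrix[image][word] = total clicks (the matrix C')."""
    matrix = defaultdict(Counter)
    for image, word, count in word_clicks(records):
        matrix[image][word] += count
    return matrix

def build_dictionaries(matrix, top_k):
    """For each part of speech, keep the top_k[pos] most-clicked words."""
    totals = defaultdict(Counter)
    for counts in matrix.values():
        for word, count in counts.items():
            pos = POS_TAGS.get(word)
            if pos is not None:  # skip words the toy tagger does not know
                totals[pos][word] += count
    return {pos: [w for w, _ in totals[pos].most_common(k)]
            for pos, k in top_k.items()}

C = build_click_matrix(CLICKS)
dictionaries = build_dictionaries(C, {"NOUN": 2, "ADJ": 2})
```

Each query's click count is credited to every word it contains, as formula (1) prescribes, so frequently clicked words accumulate counts across queries before the per-part-of-speech ranking.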
Further, the TF-IDF click tensor of the image in step (2) is constructed as follows:
2-1. Using C' and the m-th part-of-speech dictionary from step 1-2, construct the TF-IDF click vector of image $x_i$ under the m-th part of speech. For the j-th word $w_j$ of the dictionary, the corresponding click count is the total number of clicks on image $x_i$ under the m-th part-of-speech word $w_j$. Using C' and the TF-IDF algorithm, the TF-IDF click vector of image $x_i$ under part of speech m is constructed, and its j-th element is defined as follows:
[The element-wise equation is not preserved in the source text.]
The definition uses an indicator function; n is the total number of images; and $\rho'_j$ denotes the frequency of the j-th element across all n images, which is used to compute the inverse document frequency in the TF-IDF algorithm.
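Because the element-wise formula did not survive in this text, the sketch below assumes the textbook TF-IDF form, which is consistent with the surrounding definitions but is not confirmed as the patent's exact expression: the term frequency is the word's share of the image's clicks within the dictionary, $\rho'_j$ is the number of images with at least one click on word j (a sum of indicator values), and the inverse document frequency is $\log(n / \rho'_j)$. The function name and sample data are hypothetical.

```python
import math

def tfidf_click_vectors(click_matrix, dictionary):
    """Return {image: [tf-idf per dictionary word]} for one part of speech.

    click_matrix: {image: {word: clicks}}; dictionary: ordered list of words.
    """
    n = len(click_matrix)  # total number of images
    # rho'_j: number of images with at least one click on word j.
    doc_freq = [sum(1 for counts in click_matrix.values() if counts.get(w, 0) > 0)
                for w in dictionary]
    vectors = {}
    for image, counts in click_matrix.items():
        total = sum(counts.get(w, 0) for w in dictionary)
        vec = []
        for w, df in zip(dictionary, doc_freq):
            tf = counts.get(w, 0) / total if total else 0.0
            idf = math.log(n / df) if df else 0.0  # assumed IDF form
            vec.append(tf * idf)
        vectors[image] = vec
    return vectors

# Hypothetical click matrix restricted to the noun dictionary.
C_NOUN = {
    "img1": {"husky": 6, "puppy": 2},
    "img2": {"retriever": 5, "puppy": 5},
}
V = tfidf_click_vectors(C_NOUN, ["husky", "puppy", "retriever"])
```

Under this assumed form, a word clicked for every image receives zero weight, which is the usual TF-IDF behavior of down-weighting words that do not discriminate between images.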
2-2. After the above operation has been carried out for all M parts of speech, TF-IDF click vectors under M different parts of speech are obtained. The set composed of these part-of-speech TF-IDF click vectors is denoted $V_i$.
[The defining expression for $V_i$ is not preserved in the source text.]
2-3. Using the part-of-speech-decomposed TF-IDF click vector set $V_i$, the TF-IDF click tensor t of the image is constructed element by element.
[The element-wise equation for t is not preserved in the source text.]
The construction uses a part-of-speech fusion function, which can be defined as any reasonable fusion operation (e.g., product, sum, average, or maximum). Note in particular that the TF-IDF click tensor t is an M-mode tensor.
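A minimal NumPy sketch of this fusion step follows, assuming that element $(j_1, \dots, j_M)$ of the tensor is the fusion function applied to the $j_1$-th entry of the first vector through the $j_M$-th entry of the M-th vector. With the product as the fusion function this is the outer product mentioned in step (2). The function name `click_tensor` and the sample vectors are hypothetical.

```python
from functools import reduce

import numpy as np

def click_tensor(vectors, fuse=np.multiply):
    """Fuse M part-of-speech vectors into an M-mode tensor.

    Element [j1, ..., jM] is `fuse` folded over
    vectors[0][j1], ..., vectors[M-1][jM].
    """
    arrays = [np.asarray(v, dtype=float) for v in vectors]
    m = len(arrays)
    # Reshape the k-th vector so that it varies along axis k only;
    # broadcasting then expands every operand to the full tensor shape.
    shaped = [a.reshape([-1 if axis == k else 1 for axis in range(m)])
              for k, a in enumerate(arrays)]
    return reduce(fuse, shaped)

# Hypothetical TF-IDF click vectors for two parts of speech.
nouns = [0.5, 0.0, 0.25]
adjectives = [1.0, 2.0]
t = click_tensor([nouns, adjectives])                   # product fusion
t_max = click_tensor([nouns, adjectives], np.maximum)   # maximum fusion
```

Passing `np.add` gives sum fusion, and dividing that result by M gives the average, so the four fusion operations named in the text are all covered by the same broadcasting pattern.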
Further, the specific structure of the network in step (3) is as follows:
3-1. Overall network structure:
The network consists of 4 convolutional layers followed by 2 fully connected layers: the first half of the network is convolutional and the second half is fully connected. Each convolutional layer is followed by a pooling layer, a BN layer, and a ReLU layer. A Dropout layer with a drop rate of 0.8 is placed between the two fully connected layers.
Each of the four convolutional layers described in step 3-1 uses an M-mode one-dimensional convolution, a convolution module composed of M consecutive one-dimensional convolution kernels in which the m-th kernel operates on the mode-m unfolding of the click tensor. This adapts the network to the click tensor data and yields excellent recognition performance; it is one of the core innovations of the present invention.
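The idea of M consecutive one-dimensional kernels, one per tensor mode, can be sketched in NumPy as below. This is an illustration under stated assumptions rather than the patented layer: kernel sizes, channels, bias terms, and learned weights are not given in the text and are omitted, the kernels are fixed example values, and "same" padding is chosen so that the tensor shape is preserved (which holds when each kernel is no longer than the axis it slides along).

```python
import numpy as np

def conv1d_along_mode(tensor, kernel, mode):
    """Slide a 1-D kernel along one mode (axis) of the tensor, 'same' padding.

    np.convolve flips the kernel (true convolution); deep learning layers
    usually compute cross-correlation, which differs only by that flip.
    """
    return np.apply_along_axis(
        lambda fiber: np.convolve(fiber, kernel, mode="same"), mode, tensor)

def m_mode_conv(tensor, kernels):
    """Apply one 1-D kernel per mode in sequence: kernels[m] acts on mode m."""
    out = np.asarray(tensor, dtype=float)
    for mode, kernel in enumerate(kernels):
        out = conv1d_along_mode(out, np.asarray(kernel, dtype=float), mode)
    return out

# Hypothetical 2-mode click tensor and one example kernel per mode.
t = np.arange(12.0).reshape(3, 4)
out = m_mode_conv(t, [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
```

Convolving along each mode separately keeps the parameter count proportional to the sum of the kernel lengths rather than their product, which fits the text's emphasis on a small, easily deployed network.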
The beneficial effects of the invention are as follows:
Unlike traditional fine-grained image classification methods, the invention uses semantically richer click data to construct the training data and provides a deep neural network designed specifically for this data. On this basis it provides a fine-grained image classification method based on deep user click features with part-of-speech decomposition (fine-grained image classification with Factorized Deep Click feature, FDC). The method achieves high accuracy while effectively addressing the semantic gap problem that traditional methods cannot overcome. Because the network architecture is small and easy to deploy, the method is also well suited to practical production use. The method achieves excellent results on the Clickture-Dog dataset.
Drawings
Fig. 1 is a schematic diagram of the overall framework and network architecture of the present invention.
FIG. 2 is a comparison of the accuracy achieved by the present invention on the Clickture-Dog dataset with other advanced methods.
Detailed description of the preferred embodiments
The present invention will be described in further detail with reference to the accompanying drawings.
As shown in the framework diagram of Fig. 1, the fine-grained image classification method based on deep user click features with part-of-speech decomposition is implemented as follows:
The triplet (i, q, c) is first used as the input of step (1). After the processing of step (1), the result shown in the first part of the framework diagram, labeled "Click Counts", is obtained; this is the click vector of one image.
The output of step (1) is then taken as the input of step (2). After the processing of step (2), the per-part-of-speech TF-IDF features of an image are obtained, labeled "Factorized Features" in the second part.
Finally, the result of step (2) is input into step (3), which yields the required click tensor, as shown in the third part. This is also a highlight of the present invention.
Fig. 2 compares the accuracy of the present invention on Clickture-Dog with that of other advanced methods. The network structure is shown in the solid black box in Fig. 1, and the specific structure of the convolution module is shown in the dashed box. The click tensor obtained in step (3) only needs to be input into the network to train it. In the training stage, the network parameters are trained by stochastic gradient descent, with gradients computed by the backpropagation algorithm; in the test stage, the data is input into the network and the classification result is obtained through a Softmax layer.
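The test-stage Softmax step can be sketched as follows. The class names and scores are hypothetical, and the function stands in only for the final layer: it turns the network's raw class scores into probabilities and picks the most probable subcategory.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits, class_names):
    """Return the most probable class and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

# Hypothetical network outputs for three dog subcategories.
label, confidence = predict([2.0, 1.0, 0.1], ["husky", "beagle", "pug"])
```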
To illustrate the effectiveness of the present invention, it was compared with advanced methods in the current fine-grained image classification field, with convincing results. Among them, LSTM (Long Short-Term Memory) is a representative method for processing text data; VGG (Visual Geometry Group Net) is a classic model in the vision field and serves as an important reference; HBP (Hierarchical Bilinear Pooling) and NTS (Navigator-Teacher-Scrutinizer Network) are recently proposed models in the field with excellent performance. The results indicate that the present invention makes a significant contribution to the field of fine-grained image classification.
Claims (3)
1. A fine-grained image classification method based on deep user click features with part-of-speech decomposition, comprising the following steps:
Step (1): Construct the part-of-speech dictionary set
First, word segmentation, stemming, and part-of-speech tagging are performed on the user query texts of the training images; several parts of speech are selected, and for the word set of each part of speech, the words with the highest click counts are selected to construct a dictionary;
second, a set number of keywords is extracted from each dictionary to build the corpus used to generate TF-IDF features;
Step (2): Construct the TF-IDF click tensor of the image
First, for each part of speech, the image is represented as a TF-IDF click vector using the TF-IDF algorithm, the corresponding word corpus, and the user click data; second, the TF-IDF click vectors under the different parts of speech are combined into a TF-IDF click tensor using an outer product operation;
Step (3): Train the deep click network
A relatively shallow click neural network classification model is built from convolutional layers and fully connected layers, with each convolutional layer followed by an activation function and a normalization layer, and the network is trained using stochastic gradient descent;
Step (4): Fine-grained image classification based on the deep click network
The operations of steps (1)-(3) are applied to the image to extract its deep click feature vector, thereby achieving fine-grained image classification based on deep click features;
the part-of-speech dictionary set of step (1) is constructed as follows:
1-1. A click dataset comprising n images and m' query texts is used; any piece of text click data is written $(x_i, q_j, c_{i,j})$, where $x_i$, $q_j$, and $c_{i,j}$ are the image, the query text, and the number of clicks between that image and that text, respectively; using word segmentation, any piece of text click data $(x_i, q_j, c_{i,j})$ is converted into the following word click set:
$$\big((x_i, w_{i,j,1}, c_{i,j}),\ (x_i, w_{i,j,2}, c_{i,j}),\ (x_i, w_{i,j,3}, c_{i,j}),\ \dots\big) \qquad \text{formula (1)}$$
1-2. The operation of formula (1) is applied to all text click data, yielding a word click set composed of triples $(x_i, w_{i,j,k}, c_{i,j})$; lemmatization, merging of repeated words, and part-of-speech tagging are performed in turn on the words in this set, giving a click matrix C' composed of images, words, and the corresponding image-word click counts; the words in C' are divided into M mutually disjoint sets according to part of speech; for the m-th part-of-speech set, the top $\rho_m$ words with the most clicks are selected to form the m-th part-of-speech dictionary;
[The defining expression for the m-th dictionary is not preserved in the source text.]
the words in the m-th dictionary constitute the corpus required to generate TF-IDF features under the m-th part of speech, and the j-th word of the dictionary is identified by the index j.
2. The fine-grained image classification method based on deep user click features with part-of-speech decomposition according to claim 1, wherein the TF-IDF click tensor of the image in step (2) is constructed as follows:
2-1. Using C' and the m-th part-of-speech dictionary from step 1-2, construct the TF-IDF click vector of image $x_i$ under the m-th part of speech; for the j-th word $w_j$ of the dictionary, the corresponding click count is the total number of clicks on image $x_i$ under the m-th part-of-speech word $w_j$; using C' and the TF-IDF algorithm, the TF-IDF click vector of image $x_i$ under part of speech m is constructed, and its j-th element is defined as follows:
[The element-wise equation is not preserved in the source text.]
the definition uses an indicator function; n is the total number of images; $\rho'_j$ denotes the frequency of the j-th element across all images, which is used to compute the inverse document frequency in the TF-IDF algorithm;
2-2. After the above operation has been carried out for all M parts of speech, TF-IDF click vectors under M different parts of speech are obtained; the set composed of these part-of-speech TF-IDF click vectors is denoted $V_i$;
[The defining expression for $V_i$ is not preserved in the source text.]
2-3. Using the part-of-speech-decomposed TF-IDF click vector set $V_i$, the TF-IDF click tensor t of the image is constructed element by element;
[The element-wise equation for t is not preserved in the source text.]
the construction uses a part-of-speech fusion function, which can be defined as any reasonable fusion operation, including product, sum, average, and maximum; the TF-IDF click tensor t is an M-mode tensor.
3. The fine-grained image classification method based on deep user click features with part-of-speech decomposition according to claim 2, wherein the specific structure of the network in step (3) is as follows:
3-1. Overall network structure:
The network consists of 4 convolutional layers followed by 2 fully connected layers; the first half of the network is convolutional and the second half is fully connected; each convolutional layer is followed by a pooling layer, a BN layer, and a ReLU layer; a Dropout layer with a drop rate of 0.8 is placed between the two fully connected layers;
each of the four convolutional layers described in step 3-1 uses an M-mode one-dimensional convolution, a convolution module composed of M consecutive one-dimensional convolution kernels in which the m-th kernel operates on the mode-m unfolding of the click tensor, thereby adapting the network to the data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911296150.0A CN111191691B (en) | 2019-12-16 | 2019-12-16 | Fine granularity image classification method based on deep user click characteristics of part-of-speech decomposition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111191691A (en) | 2020-05-22 |
CN111191691B (en) | 2023-09-29 |
Family ID: 70707313
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911296150.0A Active CN111191691B (en) | 2019-12-16 | 2019-12-16 | Fine granularity image classification method based on deep user click characteristics of part-of-speech decomposition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111191691B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108647691A (en) * | 2018-03-12 | 2018-10-12 | Hangzhou Dianzi University | A kind of image classification method based on click feature prediction |
CN109947864A (en) * | 2018-06-27 | 2019-06-28 | Huaiyin Institute of Technology | One kind being based on the heuristic short text feature extraction and classifying method of TF-IDF and CNN |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10353950B2 (en) * | 2016-06-28 | 2019-07-16 | Google Llc | Visual recognition using user tap locations |
Non-Patent Citations (2)
Title |
---|
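Lei Qi et al. Exploiting spatial relation for fine-grained image classification. Pattern Recognition. 2019, pp. 47-55. * |
Yu Jun; Tan Min; Zhang Hongyuan; Zhang Haichao. Overview of fine-grained image recognition methods based on user click data. Journal of Nanjing University of Information Science and Technology (Natural Science Edition). 2017, (No. 06), pp. 567-574. * |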
Also Published As
Publication number | Publication date |
---|---|
CN111191691A (en) | 2020-05-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||