CN106919951B - Weak supervision bilinear deep learning method based on click and vision fusion


Info

Publication number
CN106919951B
CN106919951B (application CN201710059373.XA)
Authority
CN
China
Prior art keywords
click
sample
features
learning
formula
Prior art date
Legal status
Active
Application number
CN201710059373.XA
Other languages
Chinese (zh)
Other versions
CN106919951A (en)
Inventor
俞俊 (Jun Yu)
谭敏 (Min Tan)
郑光剑 (Guangjian Zheng)
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201710059373.XA priority Critical patent/CN106919951B/en
Publication of CN106919951A publication Critical patent/CN106919951A/en
Application granted granted Critical
Publication of CN106919951B publication Critical patent/CN106919951B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features

Abstract

The invention discloses a weakly supervised bilinear deep learning method based on click and vision fusion, comprising the following steps: 1. extracting the click feature formed by the texts of each image from the click data set, and constructing a new low-dimensional, compact click feature in a merged text space by merging texts with similar semantics; 2. constructing a depth model fusing click and visual features; 3. learning the network model parameters by back-propagation (BP); 4. calculating the model prediction loss of each training sample, constructing a similarity matrix of the sample set, learning sample reliability from the sample losses and the similarity matrix, and weighting the samples by their reliability; 5. repeating steps 3 and 4, iteratively optimizing the neural network model and the sample weights, thereby training the entire network model until convergence. The method fuses click data and visual features into a new bilinear convolutional neural network framework and can better identify fine-grained images.

Description

Weak supervision bilinear deep learning method based on click and vision fusion
Technical Field
The invention relates to a fine-grained image classification method, in particular to a weak supervision bilinear deep learning method based on click and vision fusion.
Background
Fine-grained visual classification (FGVC) is a research direction that forms a sub-problem of object recognition. It distinguishes different subclasses within the same category of objects; the related objects are extremely similar in overall appearance, and certain prior knowledge is needed to tell them apart. This is difficult for inexperienced people, and making a computer classify such objects automatically is even more challenging.
In fine-grained image recognition research, Tsung-Yu Lin et al. of the University of Massachusetts proposed the bilinear convolutional neural network model (BCNN) and found that it achieves very good results on fine-grained recognition tasks. Built on recent advances in deep learning, the model consists of two different CNN network frameworks: two features with different expressive properties are obtained by applying different convolutions to one image, and combining them with an outer product yields a feature vector with stronger representational power, realizing a better recognition effect on fine-grained images.
Although BCNN has proven to be a very effective model for fine-grained image recognition, it still falls short in exploiting the semantic information of images, so designing an effective semantic feature is pressing. Many researchers try to compensate by manually labeling attributes, but the excessive labor cost makes this approach unpromising. To address the problem, Microsoft released a new large-scale click data set, Clickture, drawn from the logs of a commercial search engine. It consists of three parts: the text query, the clicked pictures, and the corresponding click counts. Together these express the correlation between a user's query text and a picture, with the click count quantifying the degree of correlation. With the help of such click data, an image can treat each query text as an attribute and obtain a feature tied to semantic information, the click count supplying the value of each corresponding dimension (i.e., attribute).
As data collected from the Internet, a click data set has the advantages of large volume, low labor cost, and good ability to express semantic information. Taking the visual features extracted by BCNN as the main body and complementing them with the semantic features brought by click data is a feasible way to promote fine-grained image classification, and is worth studying. Moreover, click data is a hot direction in current research, and its reasonable use gives the invention a certain frontier and innovative character.
Disclosure of Invention
The invention provides a weakly supervised bilinear deep learning method based on click and vision fusion, which fuses click data and visual features to construct a new bilinear convolutional neural network framework and can better identify fine-grained images.
A weak supervision bilinear deep learning method based on click and vision fusion comprises the following steps:
step (1), click data preprocessing:
extracting click features formed by texts of each image from the click data set, and constructing new low-dimensional compact click features in a merged text space by merging texts with similar semantics;
step (2), constructing a depth model fused with clicking and visual features:
weighting the samples based on reliability, and constructing a weighted three-channel deep neural network model in which two channels extract visual features of the image and the third channel processes the click features from step (1); the visual and click features are fused through a feature connection layer;
step (3), BP learning network model parameters:
training the network model parameters of the neural network in step (2) through a back-propagation algorithm until the whole network model converges;
Step (4), learning sample reliability:
calculating the model prediction loss of each training sample according to the neural network model in the step (2), constructing a similarity matrix of the sample set, learning the reliability of the samples by using the sample loss and the similarity matrix, and weighting the samples by using the reliability;
step (5), model training:
repeating steps 3 and 4, iteratively optimizing the neural network model and the sample weights, thereby training the entire network model until convergence.
In step (1), the click features corresponding to the images are extracted from the click data set and clustered and merged according to meaning, specifically as follows:
1-1. Extract the texts corresponding to image i from the click data set to form the click feature $\tilde{u}_i$:

$$\tilde{u}_i = (c_{i,1}, c_{i,2}, \ldots, c_{i,m}) \qquad \text{(formula 1)}$$

where $c_{i,j}$ is the click count corresponding to image i and text j, and m is the number of query texts.
1-2. To obtain short, compact feature vectors, reduce the dimension of the click features (and hence the computation), and resolve semantically duplicated texts, the texts are indirectly clustered with the K-means method to obtain a text-cluster index G, and the click counts of texts in the same class are added to obtain the new click feature $u_i$:

$$u_{i,j} = \sum_{t \in G_j} c_{i,t} \qquad \text{(formula 2)}$$

where $G_j$ represents the j-th text class.
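As an illustration of steps 1-1 and 1-2, the following Python sketch builds the raw click feature of formula 1 from a toy click log and merges it with K-means as in formula 2. The triplet log, the bag-of-words text vectors, and the cluster count are illustrative assumptions; the patent does not fix the text representation used for clustering.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical (image id, query text, click count) triplets in Clickture style
click_log = [
    (0, "husky dog", 12), (0, "siberian husky", 7),
    (1, "poodle puppy", 3), (1, "toy poodle", 9),
]
texts = sorted({t for _, t, _ in click_log})
n_images = 2
text_idx = {t: j for j, t in enumerate(texts)}

# Formula 1: raw click feature, one dimension per query text
u_raw = np.zeros((n_images, len(texts)))
for i, t, c in click_log:
    u_raw[i, text_idx[t]] = c

# Cluster semantically similar texts, then add the click counts inside
# each cluster (formula 2) to get the compact feature u_i
k = 2  # number of merged text classes (4318 in the embodiment below)
text_vecs = CountVectorizer().fit_transform(texts).toarray()
G = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(text_vecs)

u = np.zeros((n_images, k))
for j in range(len(texts)):
    u[:, G[j]] += u_raw[:, j]
print(u)  # merged, low-dimensional click features
```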
In step (2), a depth model fusing the click and visual features is constructed, connecting the two kinds of features together, specifically as follows:
2-1. Construct a three-channel network frame structure W-C-BCNN. The first two channels use a bilinear convolutional neural network to extract the visual feature $z_i$ of an image, and the third channel extracts the click feature $u_i$ of the corresponding image obtained in step (1). The extracted visual and click features are then spliced together through a connection layer, outputting a feature $o_i$ with both visual and semantic expressive power:

$$o_i = (z_i, \mu u_i) = (z_{i,1}, z_{i,2}, \ldots, \mu u_{i,1}, \mu u_{i,2}, \ldots) \qquad \text{(formula 3)}$$

where $\mu$ represents a weight parameter.
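The connection layer of formula 3 is a plain concatenation of the bilinear visual feature with the $\mu$-scaled click feature. A minimal PyTorch sketch, assuming precomputed features of the dimensions used later in the embodiment (512 x 512 visual, 4318 click):

```python
import torch

def fuse(z, u, mu=1.0):
    """Formula 3: o_i = (z_i, mu * u_i), a plain feature concatenation."""
    return torch.cat([z, mu * u], dim=1)

z = torch.randn(8, 512 * 512)  # bilinear visual features z_i (step 2-1)
u = torch.randn(8, 4318)       # merged click features u_i (step 1)
o = fuse(z, u, mu=1.0)         # mu = 1 as in the embodiment below
print(o.shape)                 # torch.Size([8, 266462])
```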
2-2. Given n training data $(I_i, y_i)$, where $y_i \in \{1, 2, \ldots, N\}$ is the class label of each datum, the network model parameters $\theta$ and the sample reliability variable $w^*$ are obtained by solving the weakly supervised bilinear deep learning problem, so that the whole network model is trained until convergence:

$$(\theta, w^*) = \arg\min_{\theta, w} \sum_{i=1}^{n} w_i\, \ell(y_i, o_i; \theta) + \alpha P(w) + \beta S(G, w) \qquad \text{(formula 4)}$$
where the weight $w^*$ represents the reliability of the training samples obtained after optimization and w represents the weights before optimization. In particular, when the weights are fixed at 1 the network framework is called C-BCNN; since the weights are otherwise obtained by learning under continual iterative optimization, the task is called a weakly supervised learning problem. P(w) is a weighted prior term, modeled and estimated from the click counts of the click data:

$$P(w) = \lVert w - T(w^{c}) \rVert^{2} \qquad \text{(formula 5)}$$
where $w^c$ is the normalized click vector, and $T(\cdot)$ is a logarithmic scale-transformation function that controls the range of $w^c$ and handles imbalanced picture click counts. S(G, w) is a smoothing term based on the assumption of visual consistency of images (i.e., weights are close when visual features are close), which regularizes the weights:
$$S(G, w) = \sum_{i,j} g_{i,j} (w_i - w_j)^2 \qquad \text{(formula 6)}$$
where $g_{i,j}$ are the entries of the sample similarity matrix G, a graph computed and constructed from the similarity of the deep visual features z.
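Putting formulas 4-6 together, the following sketch evaluates the weakly supervised objective for a fixed $\theta$. The log transform chosen for $T(\cdot)$ and the constant factors are assumptions; only the structure (weighted loss plus click prior plus graph smoothing) is taken from the text.

```python
import numpy as np

def objective(w, losses, wc, G, alpha=0.1, beta=1.0):
    """Formula 4 for fixed theta: weighted loss + alpha*P(w) + beta*S(G, w)."""
    T = np.log1p                                   # assumed form of T(.)
    prior = np.sum((w - T(wc)) ** 2)               # formula 5: P(w)
    n = len(w)
    smooth = sum(G[i, j] * (w[i] - w[j]) ** 2      # formula 6: S(G, w)
                 for i in range(n) for j in range(n))
    return np.dot(w, losses) + alpha * prior + beta * smooth

rng = np.random.default_rng(0)
n = 4
G = rng.random((n, n)); G = (G + G.T) / 2          # toy similarity matrix
print(objective(np.ones(n), rng.random(n), rng.random(n), G))
```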
In step (3), the network model parameters are trained with the back-propagation algorithm until convergence, specifically as follows:
3-1. The model parameter $\theta$ is obtained by training with the back-propagation algorithm. Let $\frac{d\ell}{dx}$ be the gradient of the loss function with respect to the bilinear output x; the back-propagation formulas for the two deep networks A and B then follow from the chain rule:

$$\frac{d\ell}{dA} = B\left(\frac{d\ell}{dx}\right)^{T}, \qquad \frac{d\ell}{dB} = A\,\frac{d\ell}{dx} \qquad \text{(formula 7)}$$

where A and B are the feature outputs of the two convolutional channels and $x = A^{T}B$ is their bilinear combination.
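A numerical sketch of formula 7 for an outer-product bilinear layer $x = A^{T}B$, following the BCNN convention of Lin et al. (rows are spatial locations, columns are channels); the shapes are illustrative:

```python
import numpy as np

s, cA, cB = 49, 512, 512        # spatial locations, channels of nets A and B
A = np.random.randn(s, cA)      # feature output of deep net A
B = np.random.randn(s, cB)      # feature output of deep net B
x = A.T @ B                     # bilinear combination, cA x cB

dl_dx = np.random.randn(cA, cB) # gradient of the loss w.r.t. x
dl_dA = B @ dl_dx.T             # formula 7: dl/dA
dl_dB = A @ dl_dx               # formula 7: dl/dB
print(dl_dA.shape, dl_dB.shape) # (49, 512) (49, 512)
```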
and (4) learning the reliability variable w of the sample by using the sample loss and the similarity matrix*The method comprises the following steps:
4-1, extracting the softmax loss value of any training sample i in the network constructed based on the step (2) through inputting the data into the network for calculation
Figure GDA0002007514340000054
4-2. With $\theta$ fixed, formula 4 is converted into the following quadratic programming problem, from which the sample reliability parameters are learned:

$$\min_{w}\; w^{T}\left(\alpha E + \beta L_{lap}\right) w + \left(l - 2\alpha\, T(w^{c})\right)^{T} w \qquad \text{(formula 8)}$$

where l is the vector of per-sample losses.
where I denotes the all-ones (unit) vector, E the identity matrix, and $L_{lap}$ the Laplacian matrix of G:

$$L_{lap} = \mathrm{diag}(G\,I) - G \qquad \text{(formula 9)}$$
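Under the formula 8/9 reconstruction above, the reliability update has a simple closed form when the quadratic term is positive definite. The sketch below assumes $T(\cdot) = \log(1 + \cdot)$, drops any unspecified constraints apart from non-negativity, and absorbs constant factors into $\alpha$ and $\beta$:

```python
import numpy as np

def solve_w(losses, wc, G, alpha=0.1, beta=1.0):
    """Step 4-2 under the formula 8/9 reconstruction: minimize the quadratic
    w^T (alpha*E + beta*L_lap) w + (l - 2*alpha*T(wc))^T w in closed form."""
    n = len(losses)
    L_lap = np.diag(G @ np.ones(n)) - G        # formula 9: L_lap = diag(G*I) - G
    H = alpha * np.eye(n) + beta * L_lap       # positive definite for alpha > 0
    b = losses - 2 * alpha * np.log1p(wc)      # linear term, T(.) = log1p assumed
    w = np.linalg.solve(2 * H, -b)             # stationary point of the quadratic
    return np.clip(w, 0.0, None)               # keep reliabilities non-negative

rng = np.random.default_rng(0)
G = rng.random((5, 5)); G = (G + G.T) / 2
print(solve_w(rng.random(5), rng.random(5), G).round(3))
```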
iteratively optimizing the model parameters and the sample weights until convergence in the step (5), wherein the specific process is as follows:
and 5-1, according to the weak supervised learning problem, iteratively optimizing the steps 3 and 4 in two steps in a variable control mode, so as to train the whole network model until convergence: 1) each weight w is fixediLearning by solving the problem of W-C-BCNN to obtain a network model parameter theta; 2) fixing each theta, converting the formula 4 into quadratic programming, and learning to obtain a sample reliability variable w*
The invention has the beneficial effects that:
the method integrates click data and visual features to construct a bilinear convolutional neural network framework, improves the defect that the conventional single visual feature is used for identifying the image, not only can obtain the feature with more representation capability by simultaneously capturing visual and semantic information of the image, but also can automatically weight training data based on the reliability of a data sample, and improves the effect of fine-grained image identification; in addition, the click data is taken as a current research hotspot, and the reasonable use also enables the invention to have more advanced and innovative scientific research.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
Fig. 2 is a schematic diagram of a network framework constructed in the method of the present invention.
FIG. 3 is a schematic diagram of network model training for the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
As shown in fig. 1, the weakly supervised bilinear deep learning method based on click and vision fusion specifically includes the following steps:
extracting click features corresponding to the images from the click data set and clustering and combining the click features according to meanings in the step (1), wherein the specific steps are as follows:
1-1. To meet the experimental needs, we separated all dog-related samples from the click data set Clickture released by Microsoft, forming a new data set Clickture-Dog. The data set has 344 classes of dog pictures; classes with fewer than 5 pictures are filtered out, leaving 283 classes. The data set is then split into training, validation, and test sets at a 5:3:2 ratio. To mitigate the imbalance in the number of pictures per class during training, for classes with more than 300 pictures only 300 are randomly selected for training. A sketch of this preparation follows.
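The sketch below drops classes with fewer than 5 images, splits 5:3:2, and caps each training class at 300 images; the (image path, label) input format is a hypothetical layout:

```python
import random
from collections import defaultdict

def prepare(samples, seed=0):
    """samples: list of (image_path, class_label) pairs (hypothetical layout)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)
    train, val, test = [], [], []
    for label, paths in by_class.items():
        if len(paths) < 5:                  # filter rare classes (344 -> 283)
            continue
        rng.shuffle(paths)
        a, b = int(0.5 * len(paths)), int(0.8 * len(paths))  # 5:3:2 split
        train += [(p, label) for p in paths[:a][:300]]  # cap at 300 per class
        val += [(p, label) for p in paths[a:b]]
        test += [(p, label) for p in paths[b:]]
    return train, val, test
```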
1-2. The texts corresponding to image i are extracted from Clickture-Dog to form the click feature $\tilde{u}_i$ as in formula 1; its length is 480,000.
1-3. To obtain short, compact feature vectors, reduce the dimension of the click features (and the computation), and resolve semantically duplicated texts, the texts are indirectly clustered with K-means to obtain the text-cluster index G, and the click counts of same-class texts are added to give the new click feature as in formula 2; the final click feature has 4318 dimensions.
In step (2), a depth model fusing the click and visual features is constructed, connecting the two kinds of features together, specifically as follows:
2-1. Construct the three-channel network frame structure W-C-BCNN shown in figure 2. The first two channels use a bilinear convolutional neural network to extract the visual feature $z_i$ of an image; they adopt VGG-M and VGG-16 respectively, yielding a visual feature of 512 x 512 dimensions. The third channel extracts the click feature $u_i$ of the corresponding image obtained in step (1). The extracted visual and click features are then spliced through the connection layer of formula 3, with $\mu$ set to 1; a dropout layer with parameter 0.1 (i.e., a keep ratio of 0.1) is added after the feature connection layer.
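A minimal PyTorch sketch of this three-channel structure is given below. VGG-M is not available in torchvision, so both visual channels use off-the-shelf VGG-16 feature extractors here, purely to keep the sketch short; the spatial normalization of the bilinear feature is likewise an assumed detail:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class WCBCNN(nn.Module):
    def __init__(self, n_classes, click_dim=4318, mu=1.0, dropout=0.1):
        super().__init__()
        self.cnn_a = vgg16().features  # channel 1 (VGG-M in the patent)
        self.cnn_b = vgg16().features  # channel 2 (VGG-16 in the patent)
        self.mu = mu
        self.drop = nn.Dropout(dropout)            # dropout parameter 0.1
        self.fc = nn.Linear(512 * 512 + click_dim, n_classes)

    def forward(self, img, u):
        fa = self.cnn_a(img).flatten(2)            # (B, 512, HW)
        fb = self.cnn_b(img).flatten(2)            # (B, 512, HW)
        z = torch.bmm(fa, fb.transpose(1, 2))      # bilinear pooling (B, 512, 512)
        z = z.flatten(1) / fa.shape[2]             # assumed spatial normalization
        o = torch.cat([z, self.mu * u], dim=1)     # formula 3 connection layer
        return self.fc(self.drop(o))

model = WCBCNN(n_classes=283)
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 4318))
print(logits.shape)  # torch.Size([2, 283])
```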
2-2. For given n training data $(I_i, y_i)$, where $y_i \in \{1, 2, \ldots, N\}$ is the class label of each datum, the network model parameters $\theta$ and the sample reliability variable $w^*$ are obtained by solving the weakly supervised learning problem of formula 4. When the weight $w^*$ is fixed at 1 throughout, the experiments give the performance of C-BCNN; when the weight $w^*$ is initialized to 1 and then learned under continual iterative optimization, the experiments give the performance of W-C-BCNN.
2-3. For α and β in formula 4 we try a grid of values, α ∈ {0.01, 0.1, 1, 10} and β ∈ {0.001, 0.01, 0.1, 1, 10}; the experiments show that the best-performing combination is α = 0.1, β = 1.
2-4. The similarity matrix G in formula 6 is computed and constructed from the similarity of the deep visual features z, which are extracted by a VGG network.
In step (3), the network model parameters are trained with the back-propagation algorithm until convergence, specifically as follows:
3-1. As shown in fig. 3, the model parameter $\theta$ is obtained by back-propagation training, taking $\frac{d\ell}{dx}$ as the gradient of the loss function with respect to the bilinear output; the back-propagation formulas for the two deep networks A and B follow from the chain rule as in formula 7.
In step (4), the sample reliability variable $w^*$ is learned by using the sample losses and the similarity matrix, as follows:

4-1. Feed the data through the network constructed in step (2) and extract the softmax loss value $\ell(y_i, o_i)$ of each training sample i.
4-2. With $\theta$ fixed, formula 4 is converted into the quadratic programming problem of formula 8 and the sample reliability parameters are learned; the matrix G appearing in formula 9 is obtained by the computation of formula 6.
In step (5), the model parameters and the sample weights are iteratively optimized until convergence, with the following specific process:
5-1. Following the weakly supervised learning problem, steps 3 and 4 are optimized alternately, holding one set of variables fixed at a time, so as to train the whole network model until convergence: 1) with each weight $w_i$ fixed, the network model parameter $\theta$ is learned by solving the W-C-BCNN problem; 2) with $\theta$ fixed, formula 4 is converted into the quadratic program and the sample reliability variable $w^*$ is learned.
5-2. Testing the network model: for the learned weight vector, a threshold (2 in the experiment) is set to control its range, and the weight mass exceeding the threshold is evenly redistributed to the corresponding terms. We compare the effect achieved by this method with other methods in table 2. In addition, to improve computational efficiency, max-pooling is used to shorten the visual features to 4096 dimensions, and recognition accuracies are then compared uniformly under this standard.
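The max-pooling compression mentioned above can be done as follows; the grouping of the 512 x 512 = 262144 bilinear dimensions into 4096 bins of 64 consecutive values is an assumption, since the text does not specify the pooling layout:

```python
import numpy as np

z = np.random.randn(512 * 512)            # bilinear visual feature (262144 dims)
z4096 = z.reshape(4096, 64).max(axis=1)   # max over 64 consecutive values per bin
print(z4096.shape)                        # (4096,)
```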
Table 1 compares the recognition accuracy of C-BCNN with BCNN and the improvement ratio.

Model          BCNN    C-BCNN   Ratio
Accuracy (%)   33.20   50.80    53%
Table 2 compares the recognition accuracy of C-BCNN and W-C-BCNN under different treatments of the weights, where W-C-BCNN(T) controls the range of the weight vector and W-C-BCNN does not.

Method         C-BCNN   W-C-BCNN   W-C-BCNN(T)
Accuracy (%)   47.10    48.90      48.90

Claims (4)

1. A weak supervision bilinear deep learning method based on click and vision fusion is characterized by comprising the following steps:
step (1), click data preprocessing:
extracting click features formed by texts of each image from the click data set, and constructing new low-dimensional compact click features in a merged text space by merging texts with similar semantics;
step (2), constructing a depth model fused with clicking and visual features:
weighting the sample based on reliability, and constructing a weighted three-channel deep neural network model, wherein two channels extract image visual features, and the third channel processes the click features in the step (1); fusing the visual and click characteristics through a characteristic connection layer;
step (3), BP learning model parameters:
training the network model parameters of the neural network in the step (2) through a back propagation algorithm until the whole network model converges;
step (4), learning sample reliability:
calculating the model prediction loss of each training sample according to the neural network model in the step (2), constructing a similarity matrix of the sample set, learning the reliability of the samples by using the sample loss and the similarity matrix, and weighting the samples by using the reliability;
step (5), model training:
repeating steps (3) and (4), iteratively optimizing the neural network model and the sample weights, and thus training the whole network model until convergence;
in step (1), the click features corresponding to the images are extracted from the click data set and clustered and merged according to meaning, specifically as follows:
1-1. extracting the texts corresponding to image i from the click data set to form the click feature $\tilde{u}_i$:

$$\tilde{u}_i = (c_{i,1}, c_{i,2}, \ldots, c_{i,m}) \qquad \text{(formula 1)}$$

where $c_{i,j}$ is the click count corresponding to image i and text j;
1-2. in order to obtain short, compact feature vectors, reduce the dimension of the click features so as to reduce the computation, and resolve repeated text semantics, the texts are indirectly clustered by the K-means clustering method to obtain the text-cluster index G, and the click counts of texts in the same class are added to obtain the new click feature $u_i$:

$$u_{i,j} = \sum_{t \in G_j} c_{i,t} \qquad \text{(formula 2)}$$

where $G_j$ represents the j-th text class;
in step (2), a depth model fusing the click and visual features is constructed, connecting the two kinds of features together, specifically as follows:
2-1. constructing a three-channel network frame structure W-C-BCNN, wherein the first two channels adopt a bilinear convolutional neural network to extract the visual feature $z_i$ of an image, and the third channel extracts the click feature $u_i$ of the corresponding image obtained in step (1); the extracted visual and click features are then spliced together through a connection layer, outputting a feature $o_i$ with both visual and semantic expressive power:

$$o_i = (z_i, \mu u_i) = (z_{i,1}, z_{i,2}, \ldots, \mu u_{i,1}, \mu u_{i,2}, \ldots) \qquad \text{(formula 3)}$$

where $\mu$ represents a weight parameter;
2-2. given n training data $(I_i, y_i)$, where $y_i \in \{1, 2, \ldots, N\}$ is the class label of each datum, the network model parameters $\theta$ and the sample reliability variable $w^*$ are obtained by solving the weakly supervised bilinear deep learning problem, so that the whole network model is trained until convergence:

$$(\theta, w^*) = \arg\min_{\theta, w} \sum_{i=1}^{n} w_i\, \ell(y_i, o_i; \theta) + \alpha P(w) + \beta S(G, w) \qquad \text{(formula 4)}$$
wherein the weight $w^*$ represents the reliability of the training samples obtained after optimization and w represents the weights before optimization; in particular, when the weights are fixed at 1 the network framework is called C-BCNN, and since the weights are otherwise obtained by learning under continual iterative optimization, the task is called a weakly supervised learning problem; P(w) is a weighted prior term, modeled and estimated from the click counts of the click data:

$$P(w) = \lVert w - T(w^{c}) \rVert^{2} \qquad \text{(formula 5)}$$
where $w^c$ is the normalized click vector, and $T(\cdot)$ is a logarithmic scale-transformation function that controls the range of $w^c$ and handles imbalanced picture click counts; S(G, w) is a smoothing term based on the assumption of visual consistency of the image, which regularizes the weights:

$$S(G, w) = \sum_{i,j} g_{i,j} (w_i - w_j)^2 \qquad \text{(formula 6)}$$
where $g_{i,j}$ are the entries of the sample similarity matrix G, which is computed and constructed from the similarities of the deep visual features z.
2. The weakly supervised bilinear deep learning method based on click and vision fusion as claimed in claim 1, wherein the network model parameters are trained by using a back propagation algorithm until convergence in step (3), specifically as follows:
3-1. the model parameter $\theta$ is obtained by training with the back-propagation algorithm; taking $\frac{d\ell}{dx}$ as the gradient of the loss function with respect to the bilinear output x, the back-propagation formulas for the two deep networks A and B follow from the chain rule:

$$\frac{d\ell}{dA} = B\left(\frac{d\ell}{dx}\right)^{T}, \qquad \frac{d\ell}{dB} = A\,\frac{d\ell}{dx} \qquad \text{(formula 7)}$$

where A and B are the feature outputs of the two convolutional channels and $x = A^{T}B$ is their bilinear combination.
3. The weakly supervised bilinear deep learning method based on click and vision fusion as claimed in claim 2, wherein in step (4) the sample reliability variable $w^*$ is learned by using the sample losses and the similarity matrix, as follows:

4-1. the softmax loss value $\ell(y_i, o_i)$ of any training sample i is extracted by feeding the data through the network constructed in step (2);
4-2. with $\theta$ fixed, formula 4 is converted into the following quadratic programming problem, from which the sample reliability parameters are learned:

$$\min_{w}\; w^{T}\left(\alpha E + \beta L_{lap}\right) w + \left(l - 2\alpha\, T(w^{c})\right)^{T} w \qquad \text{(formula 8)}$$
wherein I denotes the all-ones (unit) vector, E the identity matrix, and $L_{lap}$ the Laplacian matrix of G:

$$L_{lap} = \mathrm{diag}(G\,I) - G \qquad \text{(formula 9)}$$
4. The weakly supervised bilinear deep learning method based on click and vision fusion as claimed in claim 3, wherein in step (5) the model parameters and the sample weights are iteratively optimized until convergence, with the following specific process:
5-1. following the weakly supervised learning problem, steps 3 and 4 are optimized alternately, holding one set of variables fixed at a time, so as to train the whole network model until convergence: 1) with each weight $w_i$ fixed, the network model parameter $\theta$ is learned by solving the W-C-BCNN problem; 2) with $\theta$ fixed, formula 4 is converted into the quadratic program and the sample reliability variable $w^*$ is learned.
CN201710059373.XA 2017-01-24 2017-01-24 Weak supervision bilinear deep learning method based on click and vision fusion Active CN106919951B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710059373.XA CN106919951B (en) 2017-01-24 2017-01-24 Weak supervision bilinear deep learning method based on click and vision fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710059373.XA CN106919951B (en) 2017-01-24 2017-01-24 Weak supervision bilinear deep learning method based on click and vision fusion

Publications (2)

Publication Number Publication Date
CN106919951A CN106919951A (en) 2017-07-04
CN106919951B true CN106919951B (en) 2020-04-21

Family

ID=59453478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710059373.XA Active CN106919951B (en) 2017-01-24 2017-01-24 Weak supervision bilinear deep learning method based on click and vision fusion

Country Status (1)

Country Link
CN (1) CN106919951B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107506426A (en) * 2017-08-18 2017-12-22 四川长虹电器股份有限公司 A kind of implementation method of intelligent television automated intelligent response robot
CN107766794B (en) * 2017-09-22 2021-05-14 天津大学 Image semantic segmentation method with learnable feature fusion coefficient
CN108197561B (en) * 2017-12-29 2020-11-03 智慧眼科技股份有限公司 Face recognition model optimization control method, device, equipment and storage medium
CN108647691B (en) * 2018-03-12 2020-07-17 杭州电子科技大学 Image classification method based on click feature prediction
CN109002845B (en) * 2018-06-29 2021-04-20 西安交通大学 Fine-grained image classification method based on deep convolutional neural network
CN109447098B (en) * 2018-08-27 2022-03-18 西北大学 Image clustering algorithm based on deep semantic embedding
CN109086753B (en) * 2018-10-08 2022-05-10 新疆大学 Traffic sign identification method and device based on two-channel convolutional neural network
CN109582782A (en) * 2018-10-26 2019-04-05 杭州电子科技大学 A kind of Text Clustering Method based on Weakly supervised deep learning
CN109685115B (en) * 2018-11-30 2022-10-14 西北大学 Fine-grained conceptual model with bilinear feature fusion and learning method
CN109583507B (en) * 2018-12-07 2023-04-18 浙江工商大学 Pig body identification method based on deep convolutional neural network
CN109815973A (en) * 2018-12-07 2019-05-28 天津大学 A kind of deep learning method suitable for the identification of fish fine granularity
CN109933788B (en) * 2019-02-14 2023-05-23 北京百度网讯科技有限公司 Type determining method, device, equipment and medium
CN109886345B (en) * 2019-02-27 2020-11-13 清华大学 Self-supervision learning model training method and device based on relational reasoning
CN110245662B (en) * 2019-06-18 2021-08-10 腾讯科技(深圳)有限公司 Detection model training method and device, computer equipment and storage medium
CN113096023B (en) * 2020-01-08 2023-10-27 字节跳动有限公司 Training method, image processing method and device for neural network and storage medium
CN111598155A (en) * 2020-05-13 2020-08-28 北京工业大学 Fine-grained image weak supervision target positioning method based on deep learning


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002007854A (en) * 2000-06-21 2002-01-11 Nippon Telegr & Teleph Corp <Ntt> Method of displaying advertisement, and advertisement system
CN102880729A (en) * 2012-11-02 2013-01-16 深圳市宜搜科技发展有限公司 Figure image retrieval method and device based on human face detection and recognition
CN104317827A (en) * 2014-10-09 2015-01-28 深圳码隆科技有限公司 Picture navigation method of commodity
CN105653701A (en) * 2015-12-31 2016-06-08 百度在线网络技术(北京)有限公司 Model generating method and device as well as word weighting method and device

Also Published As

Publication number Publication date
CN106919951A (en) 2017-07-04

Similar Documents

Publication Publication Date Title
CN106919951B (en) Weak supervision bilinear deep learning method based on click and vision fusion
Yang et al. Visual sentiment prediction based on automatic discovery of affective regions
Cetinic et al. A deep learning perspective on beauty, sentiment, and remembrance of art
Wang et al. A multi-scene deep learning model for image aesthetic evaluation
Chen et al. A deep learning framework for time series classification using Relative Position Matrix and Convolutional Neural Network
Yang et al. Deep relative attributes
Kao et al. Visual aesthetic quality assessment with a regression model
WO2017113232A1 (en) Product classification method and apparatus based on deep learning
Mittal et al. Image sentiment analysis using deep learning
CN110363253A (en) A kind of Surfaces of Hot Rolled Strip defect classification method based on convolutional neural networks
CN109492750B (en) Zero sample image classification method based on convolutional neural network and factor space
CN108846047A (en) A kind of picture retrieval method and system based on convolution feature
Tian et al. Diagnosis of typical apple diseases: a deep learning method based on multi-scale dense classification network
Zhang et al. Structured weak semantic space construction for visual categorization
Liang et al. Comparison detector for cervical cell/clumps detection in the limited data scenario
CN109815920A (en) Gesture identification method based on convolutional neural networks and confrontation convolutional neural networks
Gehlot et al. Ednfc-net: Convolutional neural network with nested feature concatenation for nuclei-instance segmentation
CN107491782A (en) Utilize the image classification method for a small amount of training data of semantic space information
CN110110724A (en) The text authentication code recognition methods of function drive capsule neural network is squeezed based on exponential type
Menaka et al. Chromenet: A CNN architecture with comparison of optimizers for classification of human chromosome images
Alamsyah et al. Object detection using convolutional neural network to identify popular fashion product
Huang et al. Fine-art painting classification via two-channel deep residual network
Xu et al. Weakly supervised facial expression recognition via transferred DAL-CNN and active incremental learning
Zhong et al. An emotion classification algorithm based on SPT-CapsNet
Chen et al. Bottom-up improved multistage temporal convolutional network for action segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant