CN106919951B - Weak supervision bilinear deep learning method based on click and vision fusion - Google Patents
Weak supervision bilinear deep learning method based on click and vision fusion
- Publication number
- CN106919951B — application CN201710059373A
- Authority
- CN
- China
- Prior art keywords
- click
- sample
- features
- learning
- formula
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5846—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Abstract
The invention discloses a weak supervision bilinear deep learning method based on click and vision fusion. The invention comprises the following steps: 1. extracting click features formed by texts of each image from the click data set, and constructing new low-dimensional compact click features in a merged text space by merging texts with similar semantics; 2. constructing a depth model fused with the click and visual features; 3. BP learning network model parameters; 4. calculating model prediction loss of each training sample, constructing a similarity matrix of the sample set, learning sample reliability by using the sample loss and the similarity matrix, and weighting the samples by using the reliability; 5. repeating steps 3 and 4, iteratively optimizing the neural network model and the sample weights, thereby training the entire network model until convergence. The method integrates click data and visual features to construct a new bilinear convolutional neural network framework, and can be used for better identifying fine-grained images.
Description
Technical Field
The invention relates to a fine-grained image classification method, in particular to a weak supervision bilinear deep learning method based on click and vision fusion.
Background
Fine-grained visual classification (FGVC) is a research direction and sub-problem of object recognition. It aims to distinguish different subclasses within the same category of objects. The objects concerned are extremely similar in overall appearance, and some relevant prior knowledge is needed to tell them apart; this is not easy even for people, and making a computer classify them automatically is even more challenging.
In research on fine-grained image recognition, Tsung-Yu Lin et al. of the University of Massachusetts proposed the bilinear convolutional neural network model (BCNN) and found that applying it to fine-grained image recognition achieves very good results. Built on deep learning techniques that have become popular in recent years, the model consists of two different CNN streams: applying different convolutions to the same image yields two features with different expressive properties, which are combined through an outer product into a feature vector with stronger representation capability, achieving better recognition of fine-grained images.
Although BCNN has proven to be a very effective model for fine-grained image recognition given the current state of the art, it still falls short in exploiting the semantic information of images, so designing an effective semantic feature is pressing. Many researchers try to compensate by manually labeling attributes, but this approach is unpromising because of its excessive labor cost. To address the problem, Microsoft released a new large-scale click data set, Clickture. This data set comes from the logs of a commercial search engine and consists of three parts: the text query, the clicked picture, and the corresponding click count. Together, the three parts express the correlation between a user's query text and a picture, and the click count quantifies the degree of that correlation. With the help of the click data, an image can treat each query text as an attribute to obtain a feature carrying semantic information, with the click count giving the value of the corresponding dimension (i.e., attribute) of the feature.
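The per-query-text click feature described above can be sketched as follows (an illustrative sketch, not part of the patent; the triplet data and vocabulary are hypothetical):

```python
from collections import defaultdict

def build_click_features(triplets, query_vocab):
    """Build per-image click features: one dimension per query text,
    valued by the click count linking that text to the image."""
    qidx = {q: j for j, q in enumerate(query_vocab)}
    feats = defaultdict(lambda: [0.0] * len(query_vocab))
    for query, image_id, clicks in triplets:
        feats[image_id][qidx[query]] += clicks
    return dict(feats)

# Hypothetical (query, image, click-count) triplets
triplets = [("husky dog", "img1", 12), ("siberian husky", "img1", 7),
            ("husky dog", "img2", 3)]
vocab = ["husky dog", "siberian husky"]
f = build_click_features(triplets, vocab)
```

Each image thus becomes a sparse vector over the query vocabulary, with zeros for texts it was never clicked for.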
As data collected from the Internet, a click data set has the advantages of large volume, low labor cost, and good ability to express semantic information. Taking the visual features extracted by BCNN as the main body and complementing them with the semantic features brought by click data to promote fine-grained image classification is both feasible and worth studying. In addition, click data is a hot direction in current research, and its reasonable use gives the invention a degree of frontier relevance and innovation.
Disclosure of Invention
The invention provides a weak supervision bilinear deep learning method based on click and vision fusion, which fuses click data and vision characteristics to construct a new bilinear convolutional neural network framework and can be used for better identifying fine-grained images.
A weak supervision bilinear deep learning method based on click and vision fusion comprises the following steps:
step (1), click data preprocessing:
extracting click features formed by texts of each image from the click data set, and constructing new low-dimensional compact click features in a merged text space by merging texts with similar semantics;
step (2), constructing a depth model fused with clicking and visual features:
and (3) weighting the samples based on reliability, constructing a weighted three-channel deep neural network model, wherein two channels extract image visual features, and the third channel processes the click features in the step (1). Fusing the visual and click characteristics through a characteristic connection layer;
step (3), BP learning network model parameters:
and (3) training the network model parameters of the neural network in the step (2) through a back propagation algorithm until the whole network model converges.
Step (4), learning sample reliability:
calculating the model prediction loss of each training sample according to the neural network model in the step (2), constructing a similarity matrix of the sample set, learning the reliability of the samples by using the sample loss and the similarity matrix, and weighting the samples by using the reliability;
step (5), model training:
repeating steps 3 and 4, iteratively optimizing the neural network model and the sample weights, thereby training the entire network model until convergence.
Extracting click features corresponding to the images from the click data set and clustering and combining the click features according to meanings in the step (1), wherein the specific steps are as follows:
1-1. Extracting the texts corresponding to image i from the click data set to form the click feature x_i, specifically as shown in formula 1:

x_i = (c_{i,1}, c_{i,2}, ..., c_{i,m}) (formula 1)

wherein c_{i,j} is the click count corresponding to image i and text j, and m is the size of the text vocabulary.
1-2. In order to obtain short and compact feature vectors, reduce the dimension of the click features (so as to reduce computation and alleviate problems such as repeated text semantics), the texts are clustered indirectly with the K-means method to obtain an index G of the text clusters, and the click counts of texts in the same class are added to obtain the new click feature u_i, specifically as shown in formula 2:

u_{i,j} = Σ_{k ∈ G_j} c_{i,k} (formula 2)

wherein G_j represents the jth text class.
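Steps 1-1 and 1-2 can be sketched as follows (an illustrative sketch; the cluster index is assumed to be precomputed, e.g. by K-means over text embeddings):

```python
import numpy as np

def merge_click_features(C, cluster_index, n_clusters):
    """Merge click features by text cluster (formula 2):
    u[i, j] = sum of click counts c[i, k] over texts k in cluster j."""
    n_images = C.shape[0]
    U = np.zeros((n_images, n_clusters))
    for k, j in enumerate(cluster_index):
        U[:, j] += C[:, k]
    return U

C = np.array([[5., 2., 1.],
              [0., 4., 3.]])       # 2 images x 3 texts
cluster_index = [0, 0, 1]          # texts 0 and 1 merged into one cluster
U = merge_click_features(C, cluster_index, 2)
# U -> [[7., 1.], [4., 3.]]
```

Summing the columns of each cluster is what shrinks the raw per-text feature to the compact merged-text space.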
Constructing a depth model fusing the click and the visual features, and connecting the visual features and the click features together, wherein the depth model is specifically as follows:
2-1. A three-channel network framework W-C-BCNN is constructed, wherein the first two channels adopt a bilinear convolutional neural network to extract the visual feature z_i of an image, and the third channel extracts the click feature u_i of the corresponding image obtained in step (1); then the extracted visual features and click features are spliced together through a connection layer, outputting a feature o_i with both visual and semantic expression capability, specifically as shown in formula 3:

o_i = (z_i, μu_i) = (z_{i,1}, z_{i,2}, ..., μu_{i,1}, μu_{i,2}, ...) (formula 3)

wherein μ represents a weight parameter.
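The connection layer of formula 3 amounts to a scaled concatenation; a minimal sketch:

```python
import numpy as np

def connect_features(z, u, mu=1.0):
    """Feature connection layer (formula 3): concatenate the bilinear visual
    feature z with the mu-scaled click feature u into one vector o."""
    return np.concatenate([z, mu * u])

o = connect_features(np.array([0.5, -1.0]), np.array([2.0, 0.0, 4.0]), mu=0.5)
# o -> [0.5, -1.0, 1.0, 0.0, 2.0]
```

The scalar μ balances the contribution of the click channel against the visual channels before the classifier sees the fused vector.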
2-2. Given n training data (I_i, y_i), where y_i ∈ {1, 2, ..., N} is the class label of each datum, the network model parameters θ and the sample reliability variable w* are obtained by solving the weakly supervised bilinear deep learning problem, thereby training the whole network model until convergence, as shown in formula 4:

(θ, w*) = argmin_{θ,w} Σ_{i=1}^{n} w_i·l(y_i, o_i) + α·P(w) + β·S(G, w) (formula 4)

wherein the weight w* represents the reliability of the training samples obtained after optimization and w represents the weights before optimization; in particular, when the weights are always 1 the network framework is called C-BCNN, and since the weights are learned through continuous iterative optimization, the problem is called a weakly supervised learning problem. P(w) is a weighted prior term, estimated by modeling the click counts of the click data, as shown in formula 5:

P(w) = ||w − T(w^c)||² (formula 5)

wherein w^c is the normalized click vector; T(·) is a logarithmic scale-transformation function that controls the range of w^c and handles the imbalance in the number of clicks per picture. S(G, w) is a smoothing term based on the assumption of visual consistency of the images (i.e., weights are close when visual features are close), regularizing the weights, as shown in formula 6:

S(G, w) = Σ_{i,j} g_{i,j}·(w_i − w_j)² (formula 6)

wherein g_{i,j} represents the values in the sample similarity matrix G, which is computed and constructed from the similarity of the deep visual features z.
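The two regularization terms can be sketched as follows (an illustrative sketch; the exact form of the scale transform T(·) is our assumption, as noted in the comments):

```python
import numpy as np

def prior_term(w, click_counts):
    """P(w) (formula 5), under one plausible reading: squared distance from
    a log-rescaled, normalized click vector. The exact scale transform T(.)
    is not given in the text, so the log1p rescaling here is an assumption."""
    wc = click_counts / click_counts.sum()      # normalized click vector
    t = np.log1p(wc / wc.mean())                # logarithmic transform T(.)
    return float(np.sum((w - t) ** 2))

def smoothness_term(G, w):
    """S(G, w) (formula 6): samples with similar visual features
    (large g_ij) are pushed toward similar reliability weights."""
    diff = w[:, None] - w[None, :]
    return float(np.sum(G * diff ** 2))
```

The prior anchors the weights to the click counts, while the smoothing term spreads reliability along the visual-similarity graph.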
Training the network model parameters by using a back propagation algorithm until convergence in the step (3), which is specifically as follows:
3-1. The model parameter θ is obtained by training with the back propagation algorithm. Let dl/dx be the gradient of the loss function with respect to the bilinear output x = AᵀB; by the chain rule, the back propagation formulas for the two deep networks A and B are obtained, as shown in formula 7:

dl/dA = B·(dl/dx)ᵀ, dl/dB = A·(dl/dx) (formula 7)
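The chain rule for a bilinear layer x = AᵀB can be checked numerically on toy data (an illustrative sketch, not the patent's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))    # features from deep net A (channels x m)
B = rng.standard_normal((4, 5))    # features from deep net B (channels x n)
R = rng.standard_normal((3, 5))    # dl/dx for the loss l = sum(R * (A.T @ B))

# Chain rule for the bilinear layer x = A^T B (formula 7)
dA = B @ R.T                       # dl/dA, same shape as A
dB = A @ R                         # dl/dB, same shape as B

# Numerical check of one entry of dl/dA by finite differences
eps = 1e-6
Ap = A.copy()
Ap[1, 2] += eps
num = (np.sum(R * (Ap.T @ B)) - np.sum(R * (A.T @ B))) / eps
assert abs(num - dA[1, 2]) < 1e-4
```

Because the loss is linear in A for fixed B, the finite-difference estimate agrees with the analytic gradient to numerical precision.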
and (4) learning the reliability variable w of the sample by using the sample loss and the similarity matrix*The method comprises the following steps:
4-1. For any training sample i, the softmax loss value l(y_i, o_i) is extracted by feeding the data into the network constructed in step (2) and computing the forward pass.
4-2. By fixing θ, formula 4 is converted into the following quadratic programming optimization problem, and the sample reliability parameter is obtained by learning, specifically as shown in formula 8:

w* = argmin_w wᵀ(αE + β·L_lap)w + (l − 2α·T(w^c))ᵀw, s.t. Iᵀw = n (formula 8)

wherein l = (l_1, ..., l_n) is the vector of sample losses, I represents the unit (all-ones) vector, E represents the identity matrix, and L_lap represents the Laplacian matrix of G, which is defined as shown in formula 9:

L_lap = D − G, D = diag(d_1, ..., d_n), d_i = Σ_j g_{i,j} (formula 9)
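An unconstrained reading of this quadratic program has a closed-form solution (a sketch under stated assumptions; the constraint handling of formula 8 is omitted):

```python
import numpy as np

def solve_reliability(loss, t, L, alpha=0.1, beta=1.0):
    """Unconstrained reading of the quadratic program (formula 8):
    minimize w^T (alpha*E + beta*L_lap) w + (loss - 2*alpha*t)^T w.
    Setting the gradient to zero yields a linear system; any simplex-style
    constraint on w is omitted in this sketch."""
    n = len(loss)
    H = alpha * np.eye(n) + beta * L        # positive definite quadratic term
    return np.linalg.solve(2.0 * H, 2.0 * alpha * t - loss)
```

Intuitively, samples with large loss are pushed toward small weights, while the Laplacian term keeps visually similar samples at similar weights.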
iteratively optimizing the model parameters and the sample weights until convergence in the step (5), wherein the specific process is as follows:
and 5-1, according to the weak supervised learning problem, iteratively optimizing the steps 3 and 4 in two steps in a variable control mode, so as to train the whole network model until convergence: 1) each weight w is fixediLearning by solving the problem of W-C-BCNN to obtain a network model parameter theta; 2) fixing each theta, converting the formula 4 into quadratic programming, and learning to obtain a sample reliability variable w*。
The invention has the beneficial effects that:
the method integrates click data and visual features to construct a bilinear convolutional neural network framework, improves the defect that the conventional single visual feature is used for identifying the image, not only can obtain the feature with more representation capability by simultaneously capturing visual and semantic information of the image, but also can automatically weight training data based on the reliability of a data sample, and improves the effect of fine-grained image identification; in addition, the click data is taken as a current research hotspot, and the reasonable use also enables the invention to have more advanced and innovative scientific research.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
Fig. 2 is a schematic diagram of a network framework constructed in the method of the present invention.
FIG. 3 is a schematic diagram of network model training for the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
As shown in fig. 1, a weak supervised bilinear deep learning method based on click and visual fusion specifically includes the following steps:
extracting click features corresponding to the images from the click data set and clustering and combining the click features according to meanings in the step (1), wherein the specific steps are as follows:
1-1. To meet the experimental needs, we extracted all dog-related samples from the click data set Clickture released by Microsoft, forming a new data set Clickture-Dog. The data set contains 344 classes of dog pictures; classes with fewer than 5 pictures are filtered out, finally leaving 283 classes. The data set was then split into training, validation, and test sets in a 5:3:2 ratio. To reduce the imbalance in the number of pictures per class during training, for any class with more than 300 pictures only 300 are randomly selected for training.
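The data preparation above can be sketched as follows (helper names are ours; loading of Clickture-Dog itself is not shown):

```python
import random

def split_and_cap(class_to_images, ratios=(0.5, 0.3, 0.2), cap=300, seed=0):
    """Per-class 5:3:2 train/val/test split with a 300-image training cap,
    mirroring the Clickture-Dog preparation described above."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, imgs in class_to_images.items():
        if len(imgs) < 5:                  # filter classes with < 5 pictures
            continue
        imgs = list(imgs)
        rng.shuffle(imgs)
        n = len(imgs)
        a, b = int(n * ratios[0]), int(n * (ratios[0] + ratios[1]))
        tr = imgs[:a]
        if len(tr) > cap:                  # keep at most `cap` for training
            tr = rng.sample(tr, cap)
        train += [(i, label) for i in tr]
        val += [(i, label) for i in imgs[a:b]]
        test += [(i, label) for i in imgs[b:]]
    return train, val, test
```

Splitting per class (rather than globally) keeps the 5:3:2 ratio inside every dog breed.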
1-2. The texts corresponding to image i are extracted from the click data set Clickture-Dog to form the click feature, specifically as shown in formula 1; the resulting feature has 480,000 dimensions.
1-3. In order to obtain short and compact feature vectors, reduce the dimension of the click features (thus reducing computation and alleviating problems such as repeated text semantics), the texts are clustered indirectly with the K-means method to obtain an index G of the text clusters, and the click counts of texts in the same class are added to obtain the new click features, specifically as shown in formula 2; the final click feature has 4318 dimensions.
Constructing a depth model fusing the click and the visual features, and connecting the visual features and the click features together, wherein the depth model is specifically as follows:
2-1. A three-channel network framework W-C-BCNN is constructed, as shown in Fig. 2. The first two channels adopt a bilinear convolutional neural network to extract the visual feature z_i of the image, using VGG-M and VGG-16 respectively; the resulting visual feature has 512 × 512 dimensions. The third channel extracts the click feature u_i of the corresponding image obtained in step (1). The extracted visual and click features are then spliced together through a connection layer, specifically as shown in formula 3, with μ set to 1; a dropout layer with parameter 0.1 (i.e., a keep ratio of 0.1) is added after the feature connection layer.
2-2. For the given n training data (I_i, y_i), where y_i ∈ {1, 2, ..., N} is the class label of each datum, the network model parameters θ and the sample reliability variable w* are obtained by solving the weakly supervised learning problem, specifically as shown in formula 4. When the weight w* is always set to 1, experiments yield the network performance of C-BCNN; when the weight w* is initialized to 1 and continuously learned through iterative optimization, experiments yield the network performance of W-C-BCNN.
2-3. For α and β in formula 4, we select from a series of candidate values, α ∈ {0.01, 0.1, 1, 10} and β ∈ {0.001, 0.01, 0.1, 1, 10}; experiments show that the best-performing combination is α = 0.1 and β = 1.
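The parameter selection amounts to a plain grid search over α and β (a sketch; the `evaluate` callback standing in for a full train-and-validate run is hypothetical):

```python
from itertools import product

def grid_search(evaluate, alphas=(0.01, 0.1, 1, 10),
                betas=(0.001, 0.01, 0.1, 1, 10)):
    """Select the (alpha, beta) pair of formula 4 that maximizes a
    caller-supplied validation-accuracy function `evaluate`."""
    return max(product(alphas, betas), key=lambda ab: evaluate(*ab))
```

In practice each call to `evaluate` would train the model with those trade-off weights and return validation accuracy.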
2-4. The similarity matrix G in formula 6 is computed and constructed from the similarity of the deep visual features z, which are extracted by a VGG network.
Training the network model parameters by using a back propagation algorithm until convergence in the step (3), which is specifically as follows:
3-1. As shown in Fig. 3, the model parameters θ are obtained by training with the back propagation algorithm. Taking dl/dx as the gradient of the loss function with respect to the input, the back propagation formulas for the two deep networks A and B are obtained by the chain rule, as shown in formula 7.
In step (4), the reliability variable w* of the samples is learned by using the sample loss and the similarity matrix, specifically as follows:
4-1. For any training sample i, the softmax loss value l(y_i, o_i) is extracted by feeding the data into the network constructed in step (2) and computing the forward pass.
4-2. By fixing θ, formula 4 is converted into an optimization problem of solving a quadratic program, and the sample reliability parameter is obtained by learning, specifically as shown in formula 8; the matrix G appearing in formula 9 is obtained by the computation in formula 6.
Iteratively optimizing the model parameters and the sample weights until convergence in the step (5), wherein the specific process is as follows:
and 5-1, according to the weak supervised learning problem, iteratively optimizing the steps 3 and 4 in two steps in a variable control mode, so as to train the whole network model until convergence: 1) each weight w is fixediLearning by solving the problem of W-C-BCNN to obtain a network model parameter theta; 2) fixing each theta, converting the formula 3 into quadratic programming, and learning to obtain a sample reliability variable w*。
5-2. Testing the network model: for the learned weight vector, a threshold (2 in the experiments) is set to control its range, and the portion of any weight exceeding the threshold is evenly redistributed over the corresponding terms. We compared the effect achieved by this method with other methods; the results are shown in Table 2. In addition, to improve computational efficiency, max pooling is used to shorten the visual features to 4096 dimensions, after which recognition accuracy is compared uniformly under this standard.
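The weight range control can be sketched as follows (our reading of the redistribution rule is an assumption, noted in the comment):

```python
import numpy as np

def clip_and_redistribute(w, threshold=2.0):
    """Clip each learned reliability weight at the threshold and spread the
    clipped-off excess evenly over all samples (our reading of 'the part of
    the weight exceeding the threshold is evenly assigned')."""
    w = np.asarray(w, dtype=float)
    excess = np.clip(w - threshold, 0.0, None).sum()
    return np.minimum(w, threshold) + excess / w.size

out = clip_and_redistribute([3.0, 1.0, 1.0], threshold=2.0)
```

Note that the total weight mass is preserved: what is clipped off the large weights is returned evenly to all samples.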
Table 1 compares the recognition accuracy of C-BCNN with BCNN and the relative improvement.

Model | BCNN | C-BCNN | Improvement
---|---|---|---
Accuracy (%) | 33.20 | 50.80 | 53%
Table 2 compares the recognition accuracy of C-BCNN and W-C-BCNN, showing the effect of different weight treatments, where W-C-BCNN(T) controls the range of the weight vector and W-C-BCNN does not.

Method | C-BCNN | W-C-BCNN | W-C-BCNN(T)
---|---|---|---
Accuracy (%) | 47.10 | 48.90 | 48.90
Claims (4)
1. A weak supervision bilinear deep learning method based on click and vision fusion is characterized by comprising the following steps:
step (1), click data preprocessing:
extracting click features formed by texts of each image from the click data set, and constructing new low-dimensional compact click features in a merged text space by merging texts with similar semantics;
step (2), constructing a depth model fused with clicking and visual features:
weighting the sample based on reliability, and constructing a weighted three-channel deep neural network model, wherein two channels extract image visual features, and the third channel processes the click features in the step (1); fusing the visual and click characteristics through a characteristic connection layer;
step (3), BP learning model parameters:
training the network model parameters of the neural network in the step (2) through a back propagation algorithm until the whole network model converges;
step (4), learning sample reliability:
calculating the model prediction loss of each training sample according to the neural network model in the step (2), constructing a similarity matrix of the sample set, learning the reliability of the samples by using the sample loss and the similarity matrix, and weighting the samples by using the reliability;
step (5), model training:
repeating steps (3) and (4), iteratively optimizing the neural network model and the sample weights, and thus training the whole network model until convergence;
extracting click features corresponding to the images from the click data set and clustering and combining the click features according to meanings in the step (1), wherein the specific steps are as follows:
1-1. Extracting the texts corresponding to image i from the click data set to form the click feature x_i, specifically as shown in formula 1:

x_i = (c_{i,1}, c_{i,2}, ..., c_{i,m}) (formula 1)

wherein c_{i,j} is the click count corresponding to image i and text j;
1-2. In order to obtain short and compact feature vectors, reduce the dimension of the click features so as to reduce computation, and alleviate the problem of repeated text semantics, the texts are clustered indirectly with the K-means method to obtain an index G of the text clusters, and the click counts of texts in the same class are added to obtain the new click feature u_i, specifically as shown in formula 2:

u_{i,j} = Σ_{k ∈ G_j} c_{i,k} (formula 2)

wherein G_j represents the jth text class;
constructing a depth model fusing the click and the visual features, and connecting the visual features and the click features together, wherein the depth model is specifically as follows:
2-1. A three-channel network framework W-C-BCNN is constructed, wherein the first two channels adopt a bilinear convolutional neural network to extract the visual feature z_i of an image, and the third channel extracts the click feature u_i of the corresponding image obtained in step (1); then the extracted visual features and click features are spliced together through a connection layer, outputting a feature o_i with both visual and semantic expression capability, specifically as shown in formula 3:

o_i = (z_i, μu_i) = (z_{i,1}, z_{i,2}, ..., μu_{i,1}, μu_{i,2}, ...) (formula 3)

wherein μ represents a weight parameter;
2-2. Given n training data (I_i, y_i), where y_i ∈ {1, 2, ..., N} is the class label of each datum, the network model parameters θ and the sample reliability variable w* are obtained by solving the weakly supervised bilinear deep learning problem, thereby training the whole network model until convergence, as shown in formula 4:

(θ, w*) = argmin_{θ,w} Σ_{i=1}^{n} w_i·l(y_i, o_i) + α·P(w) + β·S(G, w) (formula 4)

wherein the weight w* represents the reliability of the training samples obtained after optimization and w represents the weights before optimization; in particular, when the weights are always 1 the network framework is called C-BCNN, and since the weights are learned through continuous iterative optimization, the problem is called a weakly supervised learning problem; P(w) is a weighted prior term, estimated by modeling the click counts of the click data, as shown in formula 5:

P(w) = ||w − T(w^c)||² (formula 5)

wherein w^c is the normalized click vector; T(·) is a logarithmic scale-transformation function that controls the range of w^c and handles the imbalance in the number of clicks per picture; S(G, w) is a smoothing term based on the assumption of visual consistency of the images, regularizing the weights, as shown in formula 6:

S(G, w) = Σ_{i,j} g_{i,j}·(w_i − w_j)² (formula 6)

wherein g_{i,j} represents the values in the sample similarity matrix G, which is computed and constructed from the similarity of the deep visual features z.
2. The weakly supervised bilinear deep learning method based on click and vision fusion as claimed in claim 1, wherein the network model parameters are trained by using a back propagation algorithm until convergence in step (3), specifically as follows:
3-1. The model parameter θ is obtained by training with the back propagation algorithm; taking dl/dx as the gradient of the loss function with respect to the bilinear output x = AᵀB, the back propagation formulas for the two deep networks A and B are obtained by the chain rule, as shown in formula 7:

dl/dA = B·(dl/dx)ᵀ, dl/dB = A·(dl/dx) (formula 7)
3. The weakly supervised bilinear deep learning method based on click and vision fusion as claimed in claim 2, wherein the reliability variable w* of the samples is learned in step (4) by using the sample loss and the similarity matrix, specifically as follows:
4-1. For any training sample i, the softmax loss value l(y_i, o_i) is extracted by feeding the data into the network constructed in step (2) for calculation;
4-2. By fixing θ, formula 4 is converted into the following quadratic programming optimization problem, and the sample reliability parameter is obtained by learning, specifically as shown in formula 8:

w* = argmin_w wᵀ(αE + β·L_lap)w + (l − 2α·T(w^c))ᵀw, s.t. Iᵀw = n (formula 8)

wherein l = (l_1, ..., l_n) is the vector of sample losses, I represents the unit (all-ones) vector, E represents the identity matrix, and L_lap represents the Laplacian matrix of G, which is defined as shown in formula 9:

L_lap = D − G, D = diag(d_1, ..., d_n), d_i = Σ_j g_{i,j} (formula 9)
4. the weakly supervised bilinear deep learning method based on click and vision fusion as claimed in claim 3, wherein the model parameters and the sample weights are iteratively optimized until convergence in step (5), and the specific process is as follows:
and 5-1, according to the weak supervised learning problem, iteratively optimizing the steps 3 and 4 in two steps in a variable control mode, so as to train the whole network model until convergence: 1) each weight w is fixediLearning by solving the problem of W-C-BCNN to obtain a network model parameter theta; 2) fixing each theta, converting the formula 4 into quadratic programming, and learning to obtain a sample reliability variable w*。
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710059373.XA CN106919951B (en) | 2017-01-24 | 2017-01-24 | Weak supervision bilinear deep learning method based on click and vision fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106919951A CN106919951A (en) | 2017-07-04 |
CN106919951B true CN106919951B (en) | 2020-04-21 |
Family
ID=59453478
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710059373.XA Active CN106919951B (en) | 2017-01-24 | 2017-01-24 | Weak supervision bilinear deep learning method based on click and vision fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106919951B (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107506426A (en) * | 2017-08-18 | 2017-12-22 | 四川长虹电器股份有限公司 | A kind of implementation method of intelligent television automated intelligent response robot |
CN107766794B (en) * | 2017-09-22 | 2021-05-14 | 天津大学 | Image semantic segmentation method with learnable feature fusion coefficient |
CN108197561B (en) * | 2017-12-29 | 2020-11-03 | 智慧眼科技股份有限公司 | Face recognition model optimization control method, device, equipment and storage medium |
CN108647691B (en) * | 2018-03-12 | 2020-07-17 | 杭州电子科技大学 | Image classification method based on click feature prediction |
CN109002845B (en) * | 2018-06-29 | 2021-04-20 | 西安交通大学 | Fine-grained image classification method based on deep convolutional neural network |
CN109447098B (en) * | 2018-08-27 | 2022-03-18 | 西北大学 | Image clustering algorithm based on deep semantic embedding |
CN109086753B (en) * | 2018-10-08 | 2022-05-10 | 新疆大学 | Traffic sign identification method and device based on two-channel convolutional neural network |
CN109582782A (en) * | 2018-10-26 | 2019-04-05 | 杭州电子科技大学 | A kind of Text Clustering Method based on Weakly supervised deep learning |
CN109685115B (en) * | 2018-11-30 | 2022-10-14 | 西北大学 | Fine-grained conceptual model with bilinear feature fusion and learning method |
CN109583507B (en) * | 2018-12-07 | 2023-04-18 | 浙江工商大学 | Pig body identification method based on deep convolutional neural network |
CN109815973A (en) * | 2018-12-07 | 2019-05-28 | 天津大学 | A kind of deep learning method suitable for the identification of fish fine granularity |
CN109933788B (en) * | 2019-02-14 | 2023-05-23 | 北京百度网讯科技有限公司 | Type determining method, device, equipment and medium |
CN109886345B (en) * | 2019-02-27 | 2020-11-13 | 清华大学 | Self-supervision learning model training method and device based on relational reasoning |
CN110245662B (en) * | 2019-06-18 | 2021-08-10 | 腾讯科技(深圳)有限公司 | Detection model training method and device, computer equipment and storage medium |
CN113096023B (en) * | 2020-01-08 | 2023-10-27 | 字节跳动有限公司 | Training method, image processing method and device for neural network and storage medium |
CN111598155A (en) * | 2020-05-13 | 2020-08-28 | 北京工业大学 | Fine-grained image weak supervision target positioning method based on deep learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002007854A (en) * | 2000-06-21 | 2002-01-11 | Nippon Telegr & Teleph Corp <Ntt> | Method of displaying advertisement, and advertisement system |
CN102880729A (en) * | 2012-11-02 | 2013-01-16 | 深圳市宜搜科技发展有限公司 | Figure image retrieval method and device based on human face detection and recognition |
CN104317827A (en) * | 2014-10-09 | 2015-01-28 | 深圳码隆科技有限公司 | Picture navigation method of commodity |
CN105653701A (en) * | 2015-12-31 | 2016-06-08 | 百度在线网络技术(北京)有限公司 | Model generating method and device as well as word weighting method and device |
- 2017-01-24 CN CN201710059373.XA patent/CN106919951B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN106919951A (en) | 2017-07-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106919951B (en) | Weak supervision bilinear deep learning method based on click and vision fusion | |
Yang et al. | Visual sentiment prediction based on automatic discovery of affective regions | |
Cetinic et al. | A deep learning perspective on beauty, sentiment, and remembrance of art | |
Wang et al. | A multi-scene deep learning model for image aesthetic evaluation | |
Chen et al. | A deep learning framework for time series classification using Relative Position Matrix and Convolutional Neural Network | |
Yang et al. | Deep relative attributes | |
Kao et al. | Visual aesthetic quality assessment with a regression model | |
WO2017113232A1 (en) | Product classification method and apparatus based on deep learning | |
Mittal et al. | Image sentiment analysis using deep learning | |
CN110363253A (en) | A kind of Surfaces of Hot Rolled Strip defect classification method based on convolutional neural networks | |
CN109492750B (en) | Zero sample image classification method based on convolutional neural network and factor space | |
CN108846047A (en) | A kind of picture retrieval method and system based on convolution feature | |
Tian et al. | Diagnosis of typical apple diseases: a deep learning method based on multi-scale dense classification network | |
Zhang et al. | Structured weak semantic space construction for visual categorization | |
Liang et al. | Comparison detector for cervical cell/clumps detection in the limited data scenario | |
CN109815920A (en) | Gesture identification method based on convolutional neural networks and confrontation convolutional neural networks | |
Gehlot et al. | Ednfc-net: Convolutional neural network with nested feature concatenation for nuclei-instance segmentation | |
CN107491782A (en) | Utilize the image classification method for a small amount of training data of semantic space information | |
CN110110724A (en) | The text authentication code recognition methods of function drive capsule neural network is squeezed based on exponential type | |
Menaka et al. | Chromenet: A CNN architecture with comparison of optimizers for classification of human chromosome images | |
Alamsyah et al. | Object detection using convolutional neural network to identify popular fashion product | |
Huang et al. | Fine-art painting classification via two-channel deep residual network | |
Xu et al. | Weakly supervised facial expression recognition via transferred DAL-CNN and active incremental learning | |
Zhong et al. | An emotion classification algorithm based on SPT-CapsNet | |
Chen et al. | Bottom-up improved multistage temporal convolutional network for action segmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |