CN107665356A - A kind of image labeling method - Google Patents
A kind of image labeling method
- Publication number
- CN107665356A
- Authority
- CN
- China
- Prior art keywords
- image
- labeling method
- word
- image labeling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The present invention relates to an image labeling method comprising the following steps: step 1) define the objective function of the image labeling model; step 2) input the image into a CNN model to obtain original image features; step 3) weight the original image features; step 4) input the information into an LSTM model; step 5) back-propagate the error produced by the prediction result. In the image labeling method provided by the invention, low-level image features are first extracted by a convolutional neural network; a focus (attention) mechanism then extracts the image features of the specific image regions related to the annotation words, and these are input into a long short-term memory network model to generate the corresponding predicted annotation words, finally realizing image labeling. The method offers excellent labeling performance and high labeling precision, and can well meet the needs of practical applications.
Description
Technical field
The invention belongs to the technical field of image processing, and in particular relates to an image labeling method.
Background art
In recent years, researchers have worked continuously on computers' semantic understanding of images. Automatic image annotation lets a computer automatically assign keywords to the entities in an image, and it is a key technology in the field of image retrieval. With the rapid development of multimedia and Internet information technology, hundreds of millions of new images appear on the Internet every day. Compared with text, images convey information more intuitively and accurately, so in today's era of information explosion images let users obtain the information they need more conveniently, quickly, and accurately. Image information has become one of the most important channels of information dissemination. How to help users quickly and accurately find the images they need in such massive image data has therefore become a research hotspot in the multimedia information field in recent years, and automatic image annotation, as one of the key technologies of image retrieval, has become an important topic for many researchers.
As an important technology in image retrieval, automatic image annotation has high research significance and commercial value. Since it was proposed around the year 2000, many researchers have devoted themselves to it and many automatic image annotation methods have been put forward. Although these methods improve the accuracy and efficiency of image retrieval to a certain extent, because of the image "semantic gap" the accuracy of current retrieval systems based on automatic image annotation is still unsatisfactory: the technology is still at a development stage, and insufficient annotation performance and precision are defects of the prior art. Image information has become an important channel of information dissemination on the Internet. At present the world's largest image sharing platform, Flickr, has close to one billion users and holds more than ten billion images. Retrieving the images a user needs quickly and accurately from such a huge image library is an urgent demand of the big-data era, yet most current automatic image annotation techniques generalize poorly on such huge image libraries, so researching new automatic image annotation techniques for big data is of great significance.
Summary of the invention
In view of the above problems in the prior art, the object of the present invention is to provide an image labeling method with excellent labeling performance and high labeling precision.
To achieve the above object, the technical scheme provided by the invention is as follows:
An image labeling method comprising the following steps:
Step 1) define the objective function of the image labeling model;
Step 2) input the image into a CNN model to obtain original image features;
Step 3) weight the original image features;
Step 4) input the information into an LSTM model;
Step 5) back-propagate the error produced by the prediction result.
Further, the objective function in step 1) is θ* = argmax_θ Σ_{t=1}^{N} log p_t(y_t), the log-likelihood of the correct annotation words given image I, where y = {y_1, ..., y_N}, θ represents all parameters that need to be trained in the model, I represents an image, y represents the finally predicted annotation combination, i.e. the final annotation words, K represents the number of words in the vocabulary, and N represents the number of annotation words.
Further, the original image features in step 2) are the feature map of a certain convolutional layer before the fully connected layers of the CNN; the original image features consist of L D-dimensional features, and each D-dimensional feature maps to a different location region of the original image.
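For concreteness, the L x D feature layout described above can be sketched as follows. This is a minimal NumPy illustration with hypothetical sizes (D = 512 channels, a 14x14 spatial grid), not values taken from the patent:

```python
import numpy as np

# Hypothetical conv feature map from a CNN layer before the fully
# connected layers: (channels D, height H, width W).
D, H, W = 512, 14, 14
feature_map = np.random.rand(D, H, W)

# Flatten the H*W spatial grid into L = H*W location vectors of
# dimension D, so each row of `a` corresponds to one region of the
# original image.
L = H * W
a = feature_map.reshape(D, L).T  # shape (L, D)
```

Each of the L = 196 rows of `a` is the D-dimensional feature of one spatial location, which is exactly the structure the focusing weights operate on.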
Further, step 3) weights the original image features using a focusing weight vector α_t. The focusing weight vector α_t is an L-dimensional vector, and the value of each dimension represents the weight of the image feature at the corresponding location.
The focusing weight vector α_t = softmax(W_e e_t), where
e_t represents the intermediate state information of the focus mechanism at time t, a represents the original image features, and h_{t-1} represents the output of the LSTM model at time t-1.
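A minimal NumPy sketch of the focusing weights α_t = softmax(W_e e_t) and the resulting weighted feature; all shapes, the random W_e and e_t, and the feature matrix a here are illustrative assumptions, not parameters from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

# L location regions, E-dim intermediate attention state (sizes assumed).
L, E = 196, 256
W_e = rng.standard_normal((L, E)) * 0.01   # focusing-weight decoding parameter
e_t = rng.standard_normal(E)               # intermediate state of the focus mechanism

# alpha_t = softmax(W_e e_t): one weight per location region, summing to 1.
scores = W_e @ e_t
alpha_t = np.exp(scores - scores.max())
alpha_t /= alpha_t.sum()

# Weighted image feature z_t: attention-weighted sum of the L region features.
a = rng.standard_normal((L, 512))   # original features, one D-dim row per region
z_t = alpha_t @ a                   # shape (512,)
```

The key invariant is that α_t is a probability distribution over the L regions, so z_t emphasizes the regions the model currently attends to.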
Further, in step 4) the LSTM input information is x_t = [W_y y_{t-1}, W_z z_t], where W_y is the word-encoding parameter, W_z is the image-feature encoding parameter, y_{t-1} is the correct annotation word of the image, and z_t is the image feature at the current time weighted with the focusing weight parameter.
Further, the correct annotation phrase of the image, Y = (y_0, y_1, y_2, ..., y_t, ..., y_n), is input into the LSTM model in order starting from time t = 1, where y_0 is a special word "start" marking the beginning of the annotation process and y_n is another special word "end" marking its end; y_{t-1} is input into the LSTM model after encoding with the word-vector encoding parameter W_y; z_t is input into the LSTM model after encoding with the image-feature encoding parameter W_z.
Further, the correct annotation words use one-hot encoding: each consists of an N-dimensional vector, where N represents the number of words in the word dictionary; except for the position of the corresponding annotation word, which is 1, all remaining positions are 0.
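The one-hot scheme can be illustrated as follows; the five-word vocabulary is a made-up example:

```python
import numpy as np

# A tiny illustrative word dictionary, including the special "start"
# and "end" words used by the annotation process.
vocab = ["start", "dog", "grass", "run", "end"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """N-dim vector: 1 at the word's dictionary position, 0 elsewhere."""
    v = np.zeros(len(vocab))
    v[word_to_idx[word]] = 1.0
    return v

y = one_hot("grass")
```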
Further, step 5) uses a loss function that sums the log-likelihood probabilities of all predicted annotation words being correct and takes the negative; the loss function is defined as L(I, y) = -Σ_{t=1}^{N} log p_t(y_t).
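Numerically, the loss is just the negated sum of the log probabilities the model assigns to the correct words; the probability values below are made-up stand-ins for model outputs p_t(y_t):

```python
import numpy as np

# Illustrative per-step probabilities p_t(y_t) for t = 1..N (N = 3 here).
p_correct = np.array([0.9, 0.7, 0.8])

# L(I, y) = -sum_{t=1}^{N} log p_t(y_t)
loss = -np.sum(np.log(p_correct))
```

Confident correct predictions (probabilities near 1) drive the loss toward 0; any low probability on a correct word inflates it.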
Further, step 5) also includes continuously updating the parameters in the model using stochastic gradient descent and the chain rule of derivation.
Further, the calculation formulas of the LSTM model are as follows:
i_t = σ(W_ix x_t + W_ih h_{t-1}),
o_t = σ(W_ox x_t + W_oh h_{t-1}),
f_t = σ(W_fx x_t + W_fh h_{t-1}),
c_t = f_t ⊙ c_{t-1} + i_t ⊙ h(W_cx x_t + W_ch h_{t-1}),
h_t = o_t ⊙ c_t,
y_{t+1} = Softmax(W_y h_t),
where σ(·) and h(·) are activation functions and ⊙ is the elementwise (Hadamard) product; i_t is the input gate, controlling the input information at time t; f_t is the forget gate, controlling the selective forgetting of the memory information of the hidden layer at time t-1; o_t is the output gate, controlling the output information at time t; c_t is the memory information of the hidden layer at time t, determined jointly by the hidden-layer information of the previous time and the input information of the current time, and is the core unit of the LSTM; h_t is the output information of the hidden layer at time t; y_{t+1} is the prediction result obtained from h_t through the softmax classifier.
The image labeling method provided by the invention, in order to effectively alleviate the semantic gap between low-level image features and high-level semantics, proposes a deep neural network image labeling method based on a focus mechanism. The method first extracts low-level image features with a convolutional neural network (CNN), then uses the focus mechanism to extract the image features of specific image regions related to the annotation words and inputs them into a long short-term memory (LSTM) model to generate the corresponding predicted annotation words, finally realizing image labeling. Through the focus mechanism the method effectively combines the ability of the CNN to extract image features with the ability of the LSTM to extract image semantic features, exploiting both low-level image features and high-level image semantic features; it can better extract the image features related to image semantics, effectively improves image labeling precision, offers excellent labeling performance and high precision, and can well meet the needs of practical applications.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention;
Fig. 2 is a structural diagram of the deep neural network image labeling model based on the focus mechanism;
Fig. 3 is a schematic diagram of the basic structure of a traditional neural network model;
Fig. 4 is a schematic diagram of the conventional structure of an RNN neural network model;
Fig. 5 is a schematic diagram of the internal structure of an LSTM neural unit.
Detailed description of the embodiments
In order to make the purpose, technical scheme, and advantages of the present invention clearer, the present invention will be further described below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative work belong to the scope of protection of the present invention.
As shown in Fig. 1, an image labeling method comprises the following steps:
Step 1) establish the deep neural network image labeling model based on the focus mechanism and define the objective function of the model:
θ* = argmax_θ Σ_{t=1}^{N} log p_t(y_t),
where y = {y_1, ..., y_N}, θ represents all parameters that need to be trained in the model, I represents an image, y represents the finally predicted annotation combination, i.e. the final annotation words, K represents the number of words in the vocabulary, and N represents the number of annotation words. For the structure of the image labeling model, refer to Fig. 2.
Step 2) input image I into the CNN model to obtain the original image features a.
Considering that the focus mechanism weights the features of different image locations, the extracted original features should contain location information; the feature maps of the layers before the fully connected layers in a CNN model have a positional mapping relationship with the original image. The present invention selects the feature map of a certain convolutional layer before the fully connected layers of the CNN as the original image features; these consist of L D-dimensional features, each of which maps to a different location region of the original image.
Step 3) weight the original image features using the focusing weight vector.
The focus mechanism realizes different degrees of attention to the features of different location regions at different times; this attention to different locations is controlled by the focusing weight α_t. As shown in Fig. 2, starting from time t = 1 the model produces a focusing weight vector α_t at each time. α_t is an L-dimensional vector whose elements sum to 1, i.e. Σ_i α_ti = 1, and the value of each dimension represents the weight of the image feature at the corresponding location. Its calculation is shown in formulas (4.4) and (4.5):
α_t = softmax(W_e e_t) (4.5);
where e_t represents the intermediate state information of the focus mechanism at time t. When t = 0, e_0 is obtained from the image features a. When t > 0, e_t is determined jointly by the output h_{t-1} of the LSTM model at time t-1 and the intermediate state information e_{t-1} of the focus mechanism at time t-1. e_{t-1} can be understood as the memory module of the focus mechanism model: it remembers the attention information for image location regions at all times before t. This process can be intuitively understood as follows: the image region to attend to at the current time is determined by the image regions attended to at previous times (provided by e_{t-1}) and the semantic information remembered in the LSTM model at previous times (provided by h_{t-1}). α_t is obtained from e_t after decoding with the focusing-weight decoding parameter W_e and then applying the softmax classifier. When training starts, the focusing weight α_t obtained by the focus mechanism model cannot accurately focus the image features on the position in the image of the annotation word predicted at the current time; that is, there is a gap between the weighted image features obtained by applying α_t and the weighted image features that would focus exactly on the currently predicted annotation word. As training proceeds, the parameters W_a, W_h, and W_e in the focus mechanism model are continuously updated, this gap steadily decreases, and the final focus mechanism model can focus accurately.
The image feature input into the LSTM model after weighting is z_t; the z_t at time t is obtained by multiplying the original image features a by the focusing weight α_t at time t, and controls the attention to the features of different image locations at time t. The position on which the weighted image feature z_t input to the LSTM model at time t focuses is exactly the location of the annotation word output by the LSTM model at time t.
Step 4) input the information into the LSTM model: the correct annotation phrase of the image and the weighted image features are input into the LSTM model.
The LSTM input information is x_t = [W_y y_{t-1}, W_z z_t], where W_y is the word-encoding parameter and W_z is the image-feature encoding parameter. x_t consists of two parts. y_{t-1} is the correct annotation word of the image, in one-hot form: an N-dimensional vector, where N represents the number of words in the word dictionary, with a 1 at the position of the corresponding annotation word and 0 elsewhere. The correct annotation phrase of the image, Y = (y_0, y_1, y_2, ..., y_t, ..., y_n), is input into the LSTM model in order starting from time t = 1, where y_0 is a special word "start" marking the beginning of the annotation process and y_n is another special word "end" marking its end. y_{t-1} is input into the LSTM model after encoding with the word-vector encoding parameter W_y. The other part of x_t is the image feature z_t at the current time weighted with the focusing weight parameter; z_t is input into the LSTM model after encoding with the image-feature encoding parameter W_z.
The output information h_t of the LSTM hidden layer at each time is decoded with the output decoding parameter W_p to obtain the prediction result p_{t+1} = g(W_p · h_t + b_p), where g(·) represents the softmax classifier. p_{t+1} is the predicted probability, obtained with the LSTM model, of the annotation word following the annotation word input to the LSTM at the current time. However, there is a gap between the predicted annotation word obtained from p_{t+1} and the correct annotation word following the current LSTM input word; that is, the prediction result produces an error. This error needs to be back-propagated to ensure that, as the model trains, the gap between the LSTM model's prediction at each time and the correct prediction becomes smaller and smaller, finally yielding an image labeling model of higher precision.
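The decoding step p_{t+1} = g(W_p · h_t + b_p), with g the softmax classifier, can be sketched as follows; the vocabulary size K, hidden size, and random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

K, n = 1000, 512                    # vocabulary size, hidden size (assumed)
W_p = rng.standard_normal((K, n)) * 0.01   # output decoding parameter
b_p = np.zeros(K)
h_t = rng.standard_normal(n)        # stand-in for the LSTM hidden output

# Softmax over the K vocabulary words gives the next-word distribution.
logits = W_p @ h_t + b_p
p_next = np.exp(logits - logits.max())
p_next /= p_next.sum()
predicted_word_idx = int(np.argmax(p_next))   # the predicted annotation word
```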
Step 5) back-propagate the error produced by the prediction result: sum the log-likelihood probabilities of all predicted annotation words being correct and take the negative.
The training process of this model is the back-propagation of the error and the updating of the model parameters. Define the loss function L(I, y) = -Σ_{t=1}^{N} log p_t(y_t); the loss function is the result of summing the log-likelihood probabilities of all predicted annotation words being correct and taking the negative.
Parameters are updated using stochastic gradient descent (SGD) and the chain rule of derivation. Training continuously updates the parameters in the model so that the loss value L(I, y) is as small as possible. These parameters include the LSTM internal parameters, the focusing weight parameters (W_a, W_h, W_e), the word-encoding parameter W_y, the image-feature encoding parameter W_z, and the output decoding parameter W_p (the present invention directly uses a trained CNN model to extract image features, so the CNN model parameters are not updated); the above parameters are shared at every time during model training.
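The SGD update rule itself is simply θ ← θ - η · ∂L/∂θ; the toy quadratic loss below is only to show the rule converging and is not the model's actual loss function:

```python
import numpy as np

def sgd_step(theta, grad, lr=0.1):
    """Vanilla SGD: move parameters against the gradient of the loss."""
    return theta - lr * grad

# Toy loss L = theta^2, so dL/dtheta = 2*theta (chain rule would supply
# this gradient for each shared parameter in the real model).
theta = np.array([4.0])
for _ in range(50):
    grad = 2 * theta
    theta = sgd_step(theta, grad)
# theta shrinks toward the minimizer 0 as training proceeds
```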
A CNN is a kind of feedforward neural network containing two unique hidden-layer structures, the convolutional layer and the pooling layer. CNNs have good feature-extraction ability and are currently widely used in fields such as image, video, and speech.
A CNN has a unique network structure, mainly reflected in two aspects. First, the neurons of one layer are not fully connected to the neurons of the previous layer; that is, the connections between neurons are locally perceiving. Second, the neuron connections share identical weights; that is, the connections are weight-sharing. This unique locally-perceiving, weight-sharing network structure approximates a biological neural network, and such a model can effectively reduce the number of parameters in the network and its complexity. A convolutional layer in a CNN is composed of several convolution kernels; a convolution kernel is a filter of size M*M used to extract a certain local feature from each local position of the receptive field of the previous layer. The pooling layer reduces the dimensionality of the previous layer's convolutional features; concretely, the convolutional features of the previous layer are divided into multiple N*N regions, and the average (or maximum) value of each region is extracted as the feature after dimensionality reduction. After a series of convolutional, pooling, and fully connected layers, a CNN usually ends with a softmax classifier to handle multi-class classification problems.
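A toy NumPy illustration of the two hidden-layer operations just described: a small convolution kernel slid over local receptive fields, followed by max pooling for dimensionality reduction. The 4x4 input and 2x2 kernel values are made up for the example:

```python
import numpy as np

img = np.arange(16, dtype=float).reshape(4, 4)   # toy single-channel input
kernel = np.array([[1.0, 0.0],
                   [0.0, 1.0]])                  # a 2x2 filter

# Valid convolution (cross-correlation, as in most CNN libraries):
# slide the kernel over each 2x2 receptive field.
conv = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        conv[i, j] = np.sum(img[i:i+2, j:j+2] * kernel)

# 2x2 max pooling over the 4x4 input: keep the maximum of each region.
pooled = img.reshape(2, 2, 2, 2).max(axis=(1, 3))   # shape (2, 2)
```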
A recurrent neural network (Recurrent Neural Network, hereinafter RNN) has a unique memory structure. A neural network model contains three layers: input layer, hidden layer, and output layer. In a traditional neural network model, from input layer to hidden layer to output layer, the nodes within each layer are unconnected while nodes between layers are connected; the specific structure is shown in Fig. 3. Such a traditional neural network model has no memory function and is helpless for problems that must be computed from information already produced. For example, to predict the next word in a sentence, one usually needs the words produced before it: in a sentence such as "I am a basketball player, and I like to play basketball", the "play basketball" in the latter clause must be inferred from the "basketball player" in the former. An RNN model can remember the information produced at previous times and apply it in the computation at the current time. This benefit comes from the structural change of the RNN compared to the traditional neural network model: the input of the RNN hidden layer contains not only the output of the input layer at the current time but also the output information of the hidden layer at the previous time; that is, the nodes inside the hidden layer are connected. The specific structure is shown in Fig. 4.
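The recurrence that gives the RNN its memory, with the hidden state depending on both the current input and the previous hidden state, can be sketched in a few lines of NumPy; sizes, the tanh activation, and weight values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3                                   # hidden size, input size
W_x = rng.standard_normal((n, m)) * 0.5       # input-to-hidden weights
W_h = rng.standard_normal((n, n)) * 0.5       # hidden-to-hidden (recurrent) weights

# h_t = tanh(W_x x_t + W_h h_{t-1}): the hidden state carries information
# from earlier inputs forward, which a feedforward net cannot do.
h = np.zeros(n)
for x_t in rng.standard_normal((5, m)):       # a sequence of 5 inputs
    h = np.tanh(W_x @ x_t + W_h @ h)
```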
LSTM (Long Short-Term Memory) is an improved model of the RNN; its internal cell structure is shown in Fig. 5. The calculation of the LSTM model is given by formulas (3.1)-(3.6), where σ(·) and h(·) are activation functions and ⊙ is the elementwise (Hadamard) product. i_t is the input gate, controlling the input information at time t. f_t is the forget gate, controlling the selective forgetting of the memory information of the hidden layer at time t-1. o_t is the output gate, controlling the output information at time t. c_t is the memory information of the hidden layer at time t, determined jointly by the hidden-layer information of the previous time and the input information of the current time, and is the core unit of the LSTM. h_t is the output information of the hidden layer at time t. y_{t+1} is the prediction result obtained from h_t through the softmax classifier.
i_t = σ(W_ix x_t + W_ih h_{t-1}) (3.1);
o_t = σ(W_ox x_t + W_oh h_{t-1}) (3.2);
f_t = σ(W_fx x_t + W_fh h_{t-1}) (3.3);
c_t = f_t ⊙ c_{t-1} + i_t ⊙ h(W_cx x_t + W_ch h_{t-1}) (3.4);
h_t = o_t ⊙ c_t (3.5);
y_{t+1} = Softmax(W_y h_t) (3.6).
The image labeling method provided by the invention, in order to effectively alleviate the semantic gap between low-level image features and high-level semantics, proposes a deep neural network image labeling method based on a focus mechanism. The method first extracts low-level image features with a convolutional neural network (CNN), then uses the focus mechanism to extract the image features of specific image regions related to the annotation words and inputs them into a long short-term memory (LSTM) model to generate the corresponding predicted annotation words, finally realizing image labeling. This image labeling method fuses the semantic information of the image and reduces the difference between low-level image features and high-level image semantics, which is significant for the accurate understanding of image semantics and can effectively improve image labeling precision. Through the focus mechanism the method effectively combines the ability of the CNN to extract image features with the ability of the LSTM to extract image semantic features, exploiting both low-level image features and high-level image semantic features; it can better extract the image features related to image semantics, effectively improves image labeling precision, offers excellent labeling performance and high precision, and can well meet the needs of practical applications.
The embodiments described above only express implementations of the present invention; their description is specific and detailed, but should not therefore be construed as limiting the scope of the claims of the present invention. It should be noted that persons of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all belong to the scope of protection of the present invention. Therefore, the scope of protection of the present patent shall be determined by the appended claims.
Claims (10)
1. An image labeling method, characterized by comprising the following steps:
Step 1) define the objective function of the image labeling model;
Step 2) input the image into a CNN model to obtain original image features;
Step 3) weight the original image features;
Step 4) input the information into an LSTM model;
Step 5) back-propagate the error produced by the prediction result.
2. The image labeling method according to claim 1, characterized in that the objective function in step 1) is θ* = argmax_θ Σ_{t=1}^{N} log p_t(y_t).
3. The image labeling method according to any one of claims 1 to 2, characterized in that the original image features in step 2) are the feature map of a certain convolutional layer before the fully connected layers of the CNN; the original image features consist of L D-dimensional features, and each D-dimensional feature maps to a different location region of the original image.
4. The image labeling method according to any one of claims 1 to 3, characterized in that step 3) weights the original image features using a focusing weight vector α_t; the focusing weight vector α_t is an L-dimensional vector, and the value of each dimension represents the weight of the image feature at the corresponding location;
the focusing weight vector α_t = softmax(W_e e_t), where
e_t represents the intermediate state information of the focus mechanism at time t, a represents the original image features, and h_{t-1} represents the output of the LSTM model at time t-1.
5. The image labeling method according to any one of claims 1 to 4, characterized in that in step 4) the LSTM input information is x_t = [W_y y_{t-1}, W_z z_t], where W_y is the word-encoding parameter, W_z is the image-feature encoding parameter, y_{t-1} is the correct annotation word of the image, and z_t is the image feature at the current time weighted with the focusing weight parameter.
6. The image labeling method according to any one of claims 1 to 5, characterized in that the correct annotation phrase of the image, Y = (y_0, y_1, y_2, ..., y_t, ..., y_n), is input into the LSTM model in order starting from time t = 1, where y_0 is a special word "start" marking the beginning of the annotation process and y_n is another special word "end" marking its end; y_{t-1} is input into the LSTM model after encoding with the word-vector encoding parameter W_y; z_t is input into the LSTM model after encoding with the image-feature encoding parameter W_z.
7. The image labeling method according to any one of claims 1 to 5, characterized in that the correct annotation words use one-hot encoding: each consists of an N-dimensional vector, where N represents the number of words in the word dictionary; except for the position of the corresponding annotation word, which is 1, all remaining positions are 0.
8. The image labeling method according to any one of claims 1 to 7, characterized in that step 5) uses a loss function that sums the log-likelihood probabilities of all predicted annotation words being correct and takes the negative; the loss function is defined as
L(I, y) = -Σ_{t=1}^{N} log p_t(y_t).
9. The image labeling method according to any one of claims 1 to 8, characterized in that step 5) also includes continuously updating the parameters in the model using stochastic gradient descent and the chain rule of derivation.
10. The image labeling method according to any one of claims 1 to 9, characterized in that the calculation formulas of the LSTM model are as follows:
i_t = σ(W_ix x_t + W_ih h_{t-1}),
o_t = σ(W_ox x_t + W_oh h_{t-1}),
f_t = σ(W_fx x_t + W_fh h_{t-1}),
c_t = f_t ⊙ c_{t-1} + i_t ⊙ h(W_cx x_t + W_ch h_{t-1}),
h_t = o_t ⊙ c_t,
y_{t+1} = Softmax(W_y h_t).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710969648.3A CN107665356A (en) | 2017-10-18 | 2017-10-18 | A kind of image labeling method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710969648.3A CN107665356A (en) | 2017-10-18 | 2017-10-18 | A kind of image labeling method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107665356A true CN107665356A (en) | 2018-02-06 |
Family
ID=61098761
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710969648.3A Pending CN107665356A (en) | 2017-10-18 | 2017-10-18 | A kind of image labeling method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107665356A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108665506A (en) * | 2018-05-10 | 2018-10-16 | 腾讯科技(深圳)有限公司 | Image processing method, device, computer storage media and server |
CN109032356A (en) * | 2018-07-27 | 2018-12-18 | 深圳绿米联创科技有限公司 | Sign language control method, apparatus and system |
CN109146858A (en) * | 2018-08-03 | 2019-01-04 | 诚亿电子(嘉兴)有限公司 | The secondary method of calibration of automatic optical inspection device problem |
CN109343920A (en) * | 2018-09-10 | 2019-02-15 | 深圳市腾讯网络信息技术有限公司 | A kind of image processing method and its device, equipment and storage medium |
WO2020186484A1 (en) * | 2019-03-20 | 2020-09-24 | 深圳大学 | Automatic image description generation method and system, electronic device, and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105469065A (en) * | 2015-12-07 | 2016-04-06 | 中国科学院自动化研究所 | Recurrent neural network-based discrete emotion recognition method |
CN105701516A (en) * | 2016-01-20 | 2016-06-22 | 福州大学 | Method for automatically marking image on the basis of attribute discrimination |
CN105938485A (en) * | 2016-04-14 | 2016-09-14 | 北京工业大学 | Image description method based on convolution cyclic hybrid model |
CN107066464A (en) * | 2016-01-13 | 2017-08-18 | 奥多比公司 | Semantic Natural Language Vector Space |
CN107076567A (en) * | 2015-05-21 | 2017-08-18 | 百度(美国)有限责任公司 | Multilingual image question and answer |
2017-10-18: Application CN201710969648.3A filed in China (CN); published as CN107665356A; status: Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107076567A (en) * | 2015-05-21 | 2017-08-18 | Baidu USA LLC | Multilingual image question answering |
CN105469065A (en) * | 2015-12-07 | 2016-04-06 | Institute of Automation, Chinese Academy of Sciences | Recurrent neural network-based discrete emotion recognition method |
CN107066464A (en) * | 2016-01-13 | 2017-08-18 | Adobe Inc. | Semantic Natural Language Vector Space |
CN105701516A (en) * | 2016-01-20 | 2016-06-22 | Fuzhou University | Automatic image annotation method based on attribute discrimination |
CN105938485A (en) * | 2016-04-14 | 2016-09-14 | Beijing University of Technology | Image description method based on a convolutional-recurrent hybrid model |
Non-Patent Citations (4)
Title |
---|
Kelvin Xu et al.: "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", online publication: https://arxiv.org/abs/1502.03044 * |
Oriol Vinyals et al.: "Show and Tell: A Neural Image Caption Generator", 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) * |
Zhao Guo et al.: "Attention-based LSTM with Semantic Consistency for Videos Captioning", Proceedings of the 24th ACM International Conference on Multimedia * |
Liang Huan: "Research on Image Semantic Understanding Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108665506A (en) * | 2018-05-10 | 2018-10-16 | Tencent Technology (Shenzhen) Co., Ltd. | Image processing method, device, computer storage medium and server |
CN108665506B (en) * | 2018-05-10 | 2021-09-28 | 腾讯科技(深圳)有限公司 | Image processing method, image processing device, computer storage medium and server |
CN109032356A (en) * | 2018-07-27 | 2018-12-18 | Shenzhen Lumi United Technology Co., Ltd. | Sign language control method, apparatus and system |
CN109146858A (en) * | 2018-08-03 | 2019-01-04 | Chengyi Electronics (Jiaxing) Co., Ltd. | Secondary checking method for problem points of automatic optical checking equipment |
CN109146858B (en) * | 2018-08-03 | 2021-09-17 | 诚亿电子(嘉兴)有限公司 | Secondary checking method for problem points of automatic optical checking equipment |
CN109343920A (en) * | 2018-09-10 | 2019-02-15 | Shenzhen Tencent Network Information Technology Co., Ltd. | Image processing method and device, equipment and storage medium thereof |
CN109343920B (en) * | 2018-09-10 | 2021-09-07 | 深圳市腾讯网络信息技术有限公司 | Image processing method and device, equipment and storage medium thereof |
WO2020186484A1 (en) * | 2019-03-20 | 2020-09-24 | Shenzhen University | Automatic image description generation method and system, electronic device, and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110609891A (en) | Visual dialog generation method based on context awareness graph neural network | |
CN107665356A (en) | An image labeling method | |
CN110390397B (en) | Textual entailment recognition method and device | |
CN110134946B (en) | Machine reading comprehension method for complex data | |
CN108628935B (en) | Question-answering method based on end-to-end memory network | |
CN111291556B (en) | Chinese entity relation extraction method based on character and word feature fusion of entity meaning item | |
CN106855853A (en) | Entity relation extraction system based on deep neural network | |
CN108875074A (en) | Answer selection method, device and electronic equipment based on cross-attention neural network | |
CN110222163A (en) | An intelligent question-answering method and system fusing CNN and bidirectional LSTM | |
CN111966812B (en) | Automatic question answering method based on dynamic word vector and storage medium | |
CN110096567A (en) | Multi-turn dialogue reply selection method and system based on QA knowledge base reasoning | |
CN109597876A (en) | A multi-turn dialogue answer selection model and method based on reinforcement learning | |
CN112699216A (en) | End-to-end language model pre-training method, system, device and storage medium | |
CN109214006A (en) | Natural language inference method based on image-enhanced hierarchical semantic representation | |
CN113743099B (en) | System, method, medium and terminal for extracting terms based on self-attention mechanism | |
CN114428850B (en) | Text retrieval matching method and system | |
CN112200664A (en) | Repayment prediction method based on ERNIE model and DCNN model | |
CN114969278A (en) | Knowledge enhancement graph neural network-based text question-answering model | |
CN112215017A (en) | Mongolian-Chinese machine translation method based on pseudo-parallel corpus construction | |
CN114510946B (en) | Deep neural network-based Chinese named entity recognition method and system | |
CN115964459B (en) | Multi-hop reasoning question-answering method and system based on a food safety cognitive graph | |
CN115688753A (en) | Knowledge injection method and interaction system of Chinese pre-training language model | |
CN112417890B (en) | Fine granularity entity classification method based on diversified semantic attention model | |
Li et al. | Multimodal fusion with co-attention mechanism | |
Xu et al. | CNN-based skip-gram method for improving classification accuracy of Chinese text |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180206 ||